Tuesday, April 18, 2017

New MySQL JSON Functions (more)

MySQL 8 is going to have new batch of JSON functions and last time JSON_PRETTY() was covered in some details. The recent release of 8.0.1 provides an opportunity to try these new functions and a few that might have been missed with 8.0.0.

Unquoting

The -> shortcut for JSON_EXTRACT() was introduced with MySQL 5.7. And now there is the unquoting extraction operator or ->> to simplify things again! Think of it as JSON_UNQUOTE wrapped around JSON EXTRACT. The following there queries produce the same output.

Aggregation

The new JSON_ARRAYAGG() and JSON_OBJECTAGG() takes a column or column argument and crates an array or object.

Clear as mud?

Well, examine this example:

The two rows from table foo are combined to make a two element array.

The JSON_OBJECT() function takes pairs of columns, assumes they are a key/value pair, and combines them. Note that non-JSON columns and data from JSON columns can be combined, as well as literal strings.

Monday, April 17, 2017

Pretty JSON with JSON_PRETTY()

JSON is a great way to share data and is highly praised in being human readable. Well, compared to XML and some other standards it is easier to read. But sometimes the data looks mushed or wilted. And 'easy' is relative, right? Well, what if you had a way to make it easier to read, er, pretty?

MySQL 8.0.1 has a few new JSON functions but one of them is designed to help make your JSON data look pretty. Place JSON_PRETTY around the column desired and out comes better looking JSON data.

Wow! That is an improvement!

But wait! There's more

JSON_PRETTY can also help make non-JSON data pretty JSON. Using JSON_ARRAY() or JSON_OBJECT to create JSON-ized versions of data and then passing those to JSON_PRETTY produces the desired effect.

Cool!

Being a long time programming with languages that used a lot of braces and brackets, I have found it necessary when not using a help IDA that counted matches for brackets/braces for me to hunt for that missing or extra bracket/braces. Looking at JSON data from a simple select can quickly tire you eyes trying to guess how things are grouped in objects. But it is evident between the first and second examples how nicely JSON_PRETTY presents the data.

Wednesday, February 8, 2017

MySQL 8 Invisible Indexes

MySQL 8 features invisible indexes. But what good is an index if you can see it?

Is It Really Invisible

You probably know by now that indexes can really speed up data searches. And many of you have mastered the use of EXPLAIN to see if the optimizer is using an index for your query. Which means you should know that sometimes for some queries an index does not help.

mysql> CREATE TABLE t2 (
    ->   i INT NOT NULL,
    ->   j INT NOT NULL,
    ->   UNIQUE j_idx (j)
    -> ) ENGINE = InnoDB;
Query OK, 0 rows affected (0.01 sec)

mysql> insert into t2 values (1,2),(3,4),(5,6),(7,8);
Query OK, 4 rows affected (0.00 sec)

mysql> explain select j from t2 where j>2;
+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+--------------------------+
| id | select_type | table | partitions | type  | possible_keys | key   | key_len | ref  | rows | filtered | Extra                    |
+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+--------------------------+
|  1 | SIMPLE      | t2    | NULL       | range | j_idx         | j_idx | 4       | NULL |    3 |   100.00 | Using where; Using index |
+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)

mysql> 

So a simple select of j with values over 2 uses j_idx.

Hocus pocus - Your Index is now Invisible

But what if we are not sure if that index is really helping? Deleting an index for testing (and then rebuilding) can be a time consuming task. With MySQL 8 you simple make the index invisible.

mysql> alter table t2 alter index j_idx invisible;
Query OK, 0 rows affected (0.01 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> explain select j from t2 where j>2;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | t2    | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    4 |    33.33 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

mysql> 

In this overly simple example it is easy to see that the query is not using a key, does not know of any possible keys, and had to read more rows to answer the query. So the j_idx query was helping for this case. You can also know if this is true by looking for errors occuring for queries that include index hints that refer to the invisible index, Performance Schema data shows an increase in workload for affected queries, or suddenly the query starts appearing in the slow query log.

Not for Primary keys

Any key can be made invisible except implicit or explicit primary keys. And it is storage engine neutral from MySQL 8.0.1 onward but 8.0.0 can only works with InnoDB. Please read the manual for more detail.

Presto-Change-o

But how do you reverse the invisibility?


mysql> alter table t2 alter index j_idx visible;
Query OK, 0 rows affected (0.00 sec)
Records: 0  Duplicates: 0  Warnings: 0

Monday, January 30, 2017

PHP and MySQL Basics IV -- SQL Injection and Prepared Statements

SQL Injection is a highly feared and often misunderstood problem. The basic phobia is that someone hijacks your SQL request and suddenly has full access to everything in your database. Well, it usually is not that easy and it is actually easy to avoid.

Rule 1: Never Trust User Supplied

The usual example is something like a query SELECT * FROM customer_data WHERE customer_id='$id' and the programmer was expecting an integer for the customer_id. But a dastardly use inserts some horrible SQL code to pirate the information so the query looks like SELECT * FROM customer_data WHERE customer_id=1 OR customer_id > 0 and suddenly all your customer data is out free in the universe waiting for who knows what.

The code could have checked to see if the value of customer_id was truly an integer or returning an error if not. The is_int function was designed to do just this.

if is_int($customer_id)  {
  //Do all the stuff we want to do if we have a integer
  //submitted for a customer_id
} else {
  echo "Hey! I want an INTEGER for a customer identification number!";
}
Even more dastardly and from the PHP Manual where the dastardly injector resets all the passwords:

<?php
But a malicious user subits the value ' or uid like'%admin% to $uid to change the admin's password, or simply 
sets $pwd to hehehe', trusted=100, admin='yes to gain more privileges. Then, the query will be twisted:
>?php

// $uid: ' or uid like '%admin%
$query = "UPDATE usertable SET pwd='...' WHERE uid='' or uid like '%admin%';"
// $pwd: hehehe', trusted=100, admin='yes
$query = "UPDATE usertable SET pwd='hehehe', trusted=100, admin='yes' WHERE
...;";?>

Paranoid yet? This is another case checking the values for reasonableness can save grief. Is the uid an integer, is that integer in a proper range for uids? If too low or too high, you need to suspect someone is doing something bad.

On the database side

One thing I have recommended and implemented for years is separate users for SELECTs and INSERT/UPDATE/DELETEs. It is too easy to use one connection string over and over. But if you are in a situation where you need to take 'loose' information from an user and use that information against you data base, you need to split the queries. Set up a cust_read account on the MySQL server that can read customer data ONLY. The a corresponding cust_mod for all the other queries. This way you are assured that they can not delete or update records with that account if things get compromised. Also do not user superuser accounts like root for production database interactions -- save these accounts for maintenance functions.

Bound variables

The MySQL server supports using anonymous, positional placeholder with ?. Again from the PHP Manual:

/* Prepared statement, stage 1: prepare */
if (!($stmt = $mysqli->prepare("INSERT INTO test(id) VALUES (?)"))) {
    echo "Prepare failed: (" . $mysqli->errno . ") " . $mysqli->error;
}
Or as used in a SELECT

/* Prepared statement, stage 1: prepare */
if (!($stmt = $mysqli->prepare("SELECT * FROM test WHERE id = ?"))) {
    echo "Prepare failed: (" . $mysqli->errno . ") " . $mysqli->error;
}
/* Prepared statement, stage 2: bind and execute */
$id = 1;
if (!$stmt->bind_param("i", $id)) {
    echo "Binding parameters failed: (" . $stmt->errno . ") " . $stmt->error;
}

if (!$stmt->execute()) {
    echo "Execute failed: (" . $stmt->errno . ") " . $stmt->error;
} else {
    echo "Got it!!!\n";
}

Prepared Statements - Help But Not A Cure

Prepared statements make it nearly impossible to cram lots of data into one variable. That was nearly impossible.

Take the extra time to make sure that integers are really integers and that they are in the correct range of numbers. If you are looking for a string of up to say 30 characters that you do something when the length is over run. MySQL used to get a lot of flack with people running servers in less that strict mode and having excess data truncated with only a warning generated. Now sometimes that data is valuable. But if your corporate standard is to store emails in 45 characters and the user is trying to store 50, you need to programmatically warn those with long emails that their data is too long (and the Ops folks that the email to the account if going to bounce (if you do accept that truncated email)).

RTFM

The PHP Manual's section on SQL Injection is a must read. Do use separate MySQL accounts for SELECTs and INSERT/UPDATE/DELETEs. Keep asking yourself how to keep from exposing more data than the absolute minimum needed. Do not use SELECT * FROM foo but instead explicitly name the columns in your select statement; If something goes wrong you are not exposing data column that may have private information. And be paranoid -- keep asking if there is something else you can do to protect the data.

Wednesday, January 18, 2017

Python and the MySQL Document Store

The MySQL Document Store and X Devapi have a lot of very interesting features but right now my programming language of choice, PHP, is not yet supported. My Python is rusty and learning Node.JS is progressing. But the ability to search data from a database without knowing Structured Query Language (SQL) is going to appeal to many.

Example One

import mysqlx
import string

session = mysqlx.get_session({
'host': 'localhost',
'port': 33060,
'user': 'dstokes',
'password': 'Hell0Dave!'})

schema = session.get_schema('world_x');
collection = schema.get_collection('countryinfo')

print "Find three records***\n"
result = collection.find().limit(3).execute()
docs = result.fetch_all()

for i, data in enumerate(docs):
    print "{iteration}: {data}".format(iteration = i, data=data)

print "Find USA***\n"
result = collection.find('_id = "USA"').execute()
row = result.fetch_all()

for i, data in enumerate(row):
    print "{iteration}: {data}".format(iteration = i, data=data)

session.close()
Is result = collection.find().limit(3).execute() simpler than SELECT * FROM countryinfo LIMIT 3;? Maybe not. But the second query result = collection.find('_id = "USA"').execute() is more likely a better example. Now SELECT * FROM countryinfo WHERE _ID = 'USA'; is where you can start to see the value of the document store.

Variables can be bound to queries:

country = "Mexico"
result = collection.find('Name = :param').bind('param', country).execute()

Or an expanded version:

for country in ("Mexico", "Canada", "Brazil"):
    result = collection.find('Name = :param').bind('param', country).execute()

This is closer to what most programming languages teach today. SQL is a descriptive language. The differences between object-oriented/procedural languages and descriptive are subtle. But the subtlety escapes novices (and is often ignored by those at higher skill levels.

Of course it would be more efficient to make one dive into the database with a
SELECT * FROM countryinfo WHERE Name = 'Mexico' OR Name = 'Canada' OR Name = 'Brazil'
or
SELECT FROM countryinfo WHERE Name in ('Mexico', 'Canada', 'Brazil')

But those are not going to work.

Remember the data we are peeking at is in a JSON column.

So we would use
SELECT * FROM countryinfo where doc->"$.Name" = "Mexico" OR doc->"$.Name" = "Canada" OR doc->"$.Name" = "Brazil"
or
select * from countryinfo WHERE doc->"$.Name" IN ("Canada","Mexico","Brazil")
(note use of the -> shorthand in place of JSON_EXPLAIN)

That gets much more complicated. And complication can introduce errors. Novices should have few problems if they keep working in their chosen language without have to make a context switch mentally to write some SQL. And they days very few are being taught SQL.

Friday, January 13, 2017

PHP and MySQL Basics III -- Resulting Results

In the first two blogs entries on this series we set up a connection to MySQL and sent off a query. Now we need to get the data back from the database and into the application.

An Embarrassment of Riches

PHP has many options for what we want to do. But for the best place to start with was checking that rows were actually returned from a query. Below the results from a query are returned to a variable named $result. We can find out how many rows were returned from the server by examining $result->num_rows.
if (!$result = $mysqli->query($sql)) {
    
    // Again, do not do this on a public site, but we'll show you how
    // to get the error information
    echo "Error: Our query failed to execute and here is why: \n";
    echo "Query: " . $sql . "\n";
    echo "Errno: " . $mysqli->errno . "\n";
    echo "Error: " . $mysqli->error . "\n";
    exit;
}

// succeeded, but do we have a result?
if ($result->num_rows === 0) {
    // Oh, no rows! Sometimes that's expected and okay, sometimes
    // it is not. You decide.
    echo "No data returned.";
    exit;
}

This is a case where a programmer needs to know their data. In some cases you will not have a record or records returned because there is no data. Other times no data returned is a sign of big problems. So you have to have some education on what you expect back, and what you do not expect back.

Example

<?php
$mysqli = new mysqli("localhost", "root", "hidave", "world_x");

/* check connection */
if ($mysqli->connect_errno) {
    printf("Connect failed: %s\n", $mysqli->connect_error);
    exit();
}

/* Select queries return a resultset */
$query="SELECT Name, CountryCode, District FROM city LIMIT 10";

if ($result = $mysqli->query($query)) {

        if ($result->num_rows){
                printf("Select returned %d rows.\n", $result->num_rows);

                /* free result set */
                $result->close();
        } else {
                echo "No data returned";
        }
} else {   // if ($result)
   printf("Query failed: %s", $mysqli_error);
}

$mysqli->close();
?>

Sometime you just need the number of records, like number of outstanding customer orders. But in this case we are making sure we have some data to work with before proceedings.

So Now We Have Data

Now you have at least three choices -- rare, medium, or well done. Err, make that an associative array, an array or an object. Each have their uses and it is okay to have a favorite you use more.
$query="SELECT Name, CountryCode, District FROM city LIMIT 10";

if ($result = $mysqli->query($query)) {

        if ($result->num_rows){
                printf("Select returned %d rows.\n", $result->num_rows);
                $assoc = $result->fetch_assoc();
                $row = $result->fetch_row();
                $obj = $result->fetch_object();
        } else {
                echo "No data returned";
        }
} else {   // if ($result)
   printf("Query failed: %s", $mysqli_error);
}

So you make you choice of method and take the results. Here we use fetch_assoc(), fetch_row(), or fetch_object(). Depending on how you want to refer to the data, you use the one that fits the situation. Of course they are similar in use.

//associated array keys = column name, data = data from DB
printf("Sample assoc array %s -> %s\n", $assoc['Name'], $assoc['CountryCode']);

// simple row
printf("Sample row array %s -> %s\n", $row[0], $row[1]);

//object
printf("Sample object %s -> %s\n", $obj->Name, $obj->CountryCode);

Yes, you need to know all three as you will be looking at old code or someone else code that does not use your favorite. And sometimes you may need an object rather than a simple row.

Full Listing

<?php
$mysqli = new mysqli("localhost", "root", "hidave", "world_x");

/* check connection */
if ($mysqli->connect_errno) {
    printf("Connect failed: %s\n", $mysqli->connect_error);
    exit();
}

/* Select queries return a resultset */
$query="SELECT Name, CountryCode, District FROM city LIMIT 10";

if ($result = $mysqli->query($query)) {

        if ($result->num_rows){
                printf("Select returned %d rows.\n", $result->num_rows);
                $assoc = $result->fetch_assoc();
                $row = $result->fetch_row();
                $obj = $result->fetch_object();
        } else {
                echo "No data returned";
        }
} else {   // if ($result)
   printf("Query failed: %s", $mysqli_error);
}
//associated array keys = column name, data = data from DB
printf("Sample assoc array %s -> %s\n", $assoc['Name'], $assoc['CountryCode']);

// simple row
printf("Sample row array %s -> %s\n", $row[0], $row[1]);

//object
printf("Sample object %s -> %s\n", $obj->Name, $obj->CountryCode);

$result->close();
$mysqli->close();
?>

Wednesday, January 11, 2017

PHP and MySQL Basics II - Case Sense

Last time we set up a connection from a PHP program to a MySQL server. This time we will progress a little further in that direction.

Query

Data is asked for from the MySQL server by using a query written in a language named Structured Query Language (SQL). Now that we have a connection open to the server, we can pass out request to the server.

Manual Labor

The PHP Manual is wonderful 99% of time. If you take a peek at the page for mysqli::query there is a great example of a simple query. Many of learned to program by copying/pasting from books/manuals and this is a great us of the examples in the PHP manual. Except it may not work for you.

MySQL is usually case SeNsATiVe, so 'A' may not be the same thing as 'a'. But this is dependent to some extent on your operating system where 'A' = 'a'. I was using the example from the manual and ... it did not work.

What Happened

Here is an excerpt of the code, somewhat cut down:
<?php
$mysqli = new mysqli("localhost", "user", "secret", "world_x");

/* check connection */
if ($mysqli->connect_errno) {
    printf("Connect failed: %s\n", $mysqli->connect_error);
    exit();
}

$mysqli->close();
?>

Run the program and ... nothing.

So What Happened?

What happened is a subtle problem that novices will smack into very hard. Take a look at this section of the example.
/* Select queries return a resultset */
if ($result = $mysqli->query("SELECT Name FROM City LIMIT 10")) {
    printf("Select returned %d rows.\n", $result->num_rows);

    /* free result set */
    $result->close();
}

If you try the query SELECT Name FROM City LIMIT 10; with the MySQL command line client program you will get the answer. And the answer is:

mysql> SELECT Name FROM City LIMIT 10;
ERROR 1146 (42S02): Table 'world_x.City' doesn't exist
mysql>

I am using the new world_x example database where the city is NOT capitalized instead of the old world database where it is! This lesson can be summed as check you schema/table/column names for case sensitivity. Except that there is another problem here.

In the real world occasional the database/table/column that you carefully double checked was spelled correctly and with the proper case sensitivity will go away. It may have been renamed, deleted, munged, or what have you. What is needed is a way to check to see if there was an error if the query can not run.

Lets change the code slightly:

/* Select queries return a resultset */
if ($result = $mysqli->query("SELECT Name FROM City LIMIT 10")) {
    printf("Select returned %d rows.\n", $result->num_rows);

    /* free result set */
    $result->close();
} else {
    printf("Query failed: %s\n", $mysqli->error);
}

Always Check for Return Codes

By simply adding about 40 characters, the reliability of the program shoots up immensely AND we get an exact answer if what went wrong.
Query failed: Table 'world_x.City' doesn't exist
Same error as when we tried by query by hand. But now our code can handle this issue. We could even try to catch the error, send a note via a message queue to the operations staff about the nature of the problem, and possible limp along until things are resolved. Or we could just call exit()

When you are offered a return code be sure to check it. Yes, it may be over kill in simple examples. But the payoff comes when things go bad and you are scrambling to find out eleven months from now why your program is suddenly not working.