Thursday, August 10, 2017

Handy JSON to MySQL Loading Script

JSON in Flat File to MySQL Database

So how do you load that JSON data file into MySQL. Recently I had this question presented to me and I thought I would share a handy script I use to do such work. For this example I will use the US Zip (postal) codes from JSONAR. Download and unzip the file. The data file is named zips.json and it can not be bread directly into MySQL using the SOURCE command. It needs to have the information wrapped in a more palatable fashion.

head zips.json 
{ "city" : "AGAWAM", "loc" : [ -72.622739, 42.070206 ], "pop" : 15338, "state" : "MA", "_id" : "01001" }
{ "city" : "CUSHMAN", "loc" : [ -72.51564999999999, 42.377017 ], "pop" : 36963, "state" : "MA", "_id" : "01002" }
{ "city" : "BARRE", "loc" : [ -72.10835400000001, 42.409698 ], "pop" : 4546, "state" : "MA", "_id" : "01005" }
{ "city" : "BELCHERTOWN", "loc" : [ -72.41095300000001, 42.275103 ], "pop" : 10579, "state" : "MA", "_id" : "01007" }
{ "city" : "BLANDFORD", "loc" : [ -72.936114, 42.182949 ], "pop" : 1240, "state" : "MA", "_id" : "01008" }
{ "city" : "BRIMFIELD", "loc" : [ -72.188455, 42.116543 ], "pop" : 3706, "state" : "MA", "_id" : "01010" }
{ "city" : "CHESTER", "loc" : [ -72.988761, 42.279421 ], "pop" : 1688, "state" : "MA", "_id" : "01011" }
{ "city" : "CHESTERFIELD", "loc" : [ -72.833309, 42.38167 ], "pop" : 177, "state" : "MA", "_id" : "01012" }
{ "city" : "CHICOPEE", "loc" : [ -72.607962, 42.162046 ], "pop" : 23396, "state" : "MA", "_id" : "01013" }
{ "city" : "CHICOPEE", "loc" : [ -72.576142, 42.176443 ], "pop" : 31495, "state" : "MA", "_id" : "01020" }

Follow the Document Store Example

The MySQL Document Store is designed for storing JSON data and this example will follow its practices by having a two column table -- a JSON column, and another column for a primary key (remember InnoDB wants so badly to have a primary key on each table that it will create one for you but it is better practice to make it yourself; besides we want to search on the zipcode which is labeled as _id in the data. So we use a stored generated column that uses JSON_UNQUOTE(JSON_EXTRACT(doc,"$_id")) and saves that info in a column named zip.

So a simple table is created and it looks like this:

mysql> desc zipcode\g
+-------------+-------------+------+-----+---------+-------------------+
| Field       | Type        | Null | Key | Default | Extra             |
+-------------+-------------+------+-----+---------+-------------------+
| doc         | json        | YES  |     | NULL    |                   |
| zip         | char(5)     | NO   | PRI | NULL    | STORED GENERATED  |
+-------------+-------------+------+-----+---------+-------------------+
2 rows in set (0.00 sec)

Handy Script

So now we have the data, we have the table, and now we need to convert the data into something MySQL can use to laod the data.

Bash is one of those shells with so many rich built-in tools that is hard to remember them all. But it does have a hand read line feature that can be used for the task.


#!/bin/bash
file="/home/dstokes/Downloads/zips.json"
while IFS= read line
do
 echo "INSERT INTO zipcode (doc) VALUES ('$line');"
done <"$file"
Run the script and output the data to a file named foo, ./loader.sh > foo. The output shows how the data is wrapped:
$head foo
INSERT INTO zipcode (doc) VALUES ('{ "city" : "AGAWAM", "loc" : [ -72.622739, 42.070206 ], "pop" : 15338, "state" : "MA", "_id" : "01001" }');
INSERT INTO zipcode (doc) VALUES ('{ "city" : "CUSHMAN", "loc" : [ -72.51564999999999, 42.377017 ], "pop" : 36963, "state" : "MA", "_id" : "01002" }');
INSERT INTO zipcode (doc) VALUES ('{ "city" : "BARRE", "loc" : [ -72.10835400000001, 42.409698 ], "pop" : 4546, "state" : "MA", "_id" : "01005" }');
INSERT INTO zipcode (doc) VALUES ('{ "city" : "BELCHERTOWN", "loc" : [ -72.41095300000001, 42.275103 ], "pop" : 10579, "state" : "MA", "_id" : "01007" }');
INSERT INTO zipcode (doc) VALUES ('{ "city" : "BLANDFORD", "loc" : [ -72.936114, 42.182949 ], "pop" : 1240, "state" : "MA", "_id" : "01008" }');
INSERT INTO zipcode (doc) VALUES ('{ "city" : "BRIMFIELD", "loc" : [ -72.188455, 42.116543 ], "pop" : 3706, "state" : "MA", "_id" : "01010" }');
INSERT INTO zipcode (doc) VALUES ('{ "city" : "CHESTER", "loc" : [ -72.988761, 42.279421 ], "pop" : 1688, "state" : "MA", "_id" : "01011" }');
INSERT INTO zipcode (doc) VALUES ('{ "city" : "CHESTERFIELD", "loc" : [ -72.833309, 42.38167 ], "pop" : 177, "state" : "MA", "_id" : "01012" }');
INSERT INTO zipcode (doc) VALUES ('{ "city" : "CHICOPEE", "loc" : [ -72.607962, 42.162046 ], "pop" : 23396, "state" : "MA", "_id" : "01013" }');
INSERT INTO zipcode (doc) VALUES ('{ "city" : "CHICOPEE", "loc" : [ -72.576142, 42.176443 ], "pop" : 31495, "state" : "MA", "_id" : "01020" }');

So now the data can be loaded with mysql -u itisme test < foo.

Tuesday, July 25, 2017

PHP and MySQL Without the SQL

Embedding Structured Query Language (SQL) within PHP, or other programming languages, has been problematic for some. Mixing two programming languages together is just plainly not aesthetically pleasing. Especially when you have a declarative language (SQL) mixed with a procedural-object oriented language. But now, with the MySQL XDevAPI PECL extension, PHP developers can now stop mixing the two languages together together.

MySQL Document Store

The MySQL Document Store eliminates the heavy burden for SQL skills. It is designed to be a high speed, schema-less data store and is based on the MySQL JSON data type. This gives you roughly a gigabyte of store in a document format to do with as needed. So you do not need to architect you data before hand when you have no idea how it will evolve. No need to normalize your data. Now behind the scenes is a power MySQL database server but you are no longer writing SQL to use it!

But Is The Code Ugly?

If you looked at previous editions of this blog then you have seen examples of using the MySQL XDevAPI PECL extension. There is another example below of how to search for the information under various keys in a JSON document. The great news is that the code is all very modern looking PHP with no messy SQL statements thumb-tacked onto the code. This should ongoing support by those with little or no SQL skills.

Previous you would have had to stick SELECT JSON_EXTRACT(doc,'Name') AS 'Country', JSON_EXTRACT(doc,geography) as 'Geo', JSON_EXTACT(doc,'geography.Region) FROM world_x WHERE _id = "USA" as a string in the PHP code. If you prefer the -> operator to replace JSON_EXTRACT, the code can be trimmed down to SELECT doc->"$.Name" AS 'Country', doc->"$.geography" AS 'Geo', doc->"$.geography.Region" FROM world_x WHERE _id = "USA".

But the XDevAPI simplifies these queries into $result = $collection->find('_id = "USA"')->fields(['Name as Country','geography as Geo','geography.Region'])->execute();. This is much easier to understand than the previous two queries for most. And this example shows how to chain down the document path as it specifies all of the geography hey's values and also just the data under geography.Region. It also show how to alias columns from the document store to a label of the developers choice.


#!/usr/bin/php
<?PHP
// Connection parameters
  $user = 'root';
  $passwd = 'hidave';
  $host = 'localhost';
  $port = '33060';
  $connection_uri = 'mysqlx://'.$user.':'.$passwd.'@'.$host.':'.$port; 
  echo $connection_uri . "\n";

// Connect as a Node Session
  $nodeSession = mysql_xdevapi\getNodeSession($connection_uri);
// "USE world_x"
  $schema = $nodeSession->getSchema("world_x");
// Specify collection to use
  $collection = $schema->getCollection("countryinfo");

// Query the Document Store
  $result = $collection->find('_id = "USA"')->fields(['Name as Country','geography as Geo','geography.Region'])->execute();

// Fetch/Display data
  $data = $result->fetchAll();
  var_dump($data);
?>

And The Output


mysqlx://root:hidave@localhost:33060
array(1) {
  [0]=>
  array(3) {
    ["Geo"]=>
    array(3) {
      ["Region"]=>
      string(13) "North America"
      ["Continent"]=>
      string(13) "North America"
      ["SurfaceArea"]=>
      int(9363520)
    }
    ["Country"]=>
    string(13) "United States"
    ["geography.Region"]=>
    string(13) "North America"
  }
}

User Guide

The MySQL Shell User Guide is a great place to start learning how to interactively start using the Document Store.

Tuesday, July 18, 2017

Using find() with the MySQL Document Store

The Video

The find() function for the MySQL Document Store is a very powerful tool and I have just finished a handy introductory video. By the way -- please let me have feed back on the pace, the background music, the CGI special effects (kidding!), and the amount of the content.

The Script

For those who want to follow along with the videos, the core examples are below. The first step is to connect to a MySQL server to talk to the world_x schema (Instructions on loading that schema at the first link above).

\connect root@localhost/world_x

db is an object to points to the world_x schema. To find the records in the countryinfo collection, use db.countryinfo.find(). But that returns 237 JSON documents, too many! So lets cut it down to one record by qualifying that we only want the record where the _id is equal to USA, db.countryinfo.find(‘_id = “USA”’)

Deeper Level items in the Document

You can reach deeper level items by providing the path to the object such as db.countryinfo.find(‘geography.Continent = “North America”). Be sure to remember Case Sensitivity!!

Limits, Offsets, and Specifications

It is very simple to place restrictions on the amount of output by post pending .limit(5).skip(6). And you can specify which parameters meet you specification by using find(‘GNP > 8000000’)

Limiting Fields

But what if you do not want all the document but just a few certain fields. Then post pend .fields([“Name”, “GNP”]) to find().

And once again you can dig deeper into the document with specifying the path, such as.fields([“Name”, “GNP”, “geography.Continent”]).sort(“GNP”).

Sorting

Yes, you can easily sort the output from find() by adding .sort(“GNP”,”Name”) at the end.

More Complex Searches

Of course you can make the find() function perform more complex dives into the data such as db.countryinfo.find(‘GNP> 500000 and IndepYear > 1500”).

Parameter Passed Values

And find of parameter passed values will be happy to find they have not been forgotten. db.countryinfo.find(“Name = :country”).bind(“Country”,”Canada”)

Sunday, July 16, 2017

MySQL Document Store Video Series

I am starting a series of videos on the MySQL Document Store. The Document Store allows those who do not know Structured Query Language (SQL) to use a database without having to know the basics of relational databases, set theory, or data normalization. The goal is to have sort 2-3 minute episodes on the various facets of the Document Store including the basics, using various programming languages (Node.JS, PHP, Python), and materializing free form schemaless, NoSQL data into columns for use with SQL.

The first Episode, Introduction, can be found here.

Please provide feedback and let me know if there are subjects you would want covered in the near future.

Thursday, July 6, 2017

MySQL Document Store Concepts for PHP Developers

Docstore versus traditional MySQL

I have had quite a few questions from PHP Developers who are very interested in the new MySQL Document Store versus the traditional use of MySQL with PHP. You used to have to write queries in Structured Query Language (SQL) by outing them in strings withing your PHP code. In the past I have heard many folks complain about having to use a programming language within a programming language. And several of you have just skipped all than an started using an ORM. And roughly two percent of the developers at conferences tell me they have had any formal training in SQL, relational theory, or sets!

So what changes with Document Store?

The first and biggest changing is that you stop writing queries in SQL. SQL is surprisingly hard for many programmers. Part of this is that SQL is a declarative language. CSS is also a declarative language. These languages describe what the desired output looks like. Most other languages assemble the parts to get the desired output.

The second is that you your code stops looking like string handing routines. Compare $result = $mysqli->query("SELECT doc FROM countryinfo WHERE _id='USA'")) of the traditional embedded SQL code embedded in the PHP code to $result = $collection->find('_id = "USA"')->execute() of the Document store.

TraditionalDocument Store

<?PHP
// Connection parameters
$host='127.0.0.1';
$user='root';
$pass='hidave';
$db  = 'world_x';

// connect to database server
$mysqli = mysqli_connect('localhost','root','hidave');

// Choose schema
$mysqli->select_db('world_x');


// send SQL query
if ($result = $mysqli->query("SELECT doc FROM countryinfo WHERE _id='USA'")) {
    $row = mysqli_fetch_row($result);
    var_dump($row);

    /* free result set */
    $result->close();
}


$mysqli->close();
?>

<?PHP
// Connection parameters
  $user = 'root';
  $passwd = 'hidave';
  $host = 'localhost';
  $port = '33060';

  $connection_uri = 'mysqlx://'.$user.':'.$passwd.'@'.$host.':'.$port; 
  

// Connect as a Node Session
  $nodeSession = mysql_xdevapi\getNodeSession($connection_uri);
// Choose schema
  $schema = $nodeSession->getSchema("world_x");
// Specify collection to use
  $collection = $schema->getCollection("countryinfo");
// Find desired record
  $result = $collection->find('_id = "USA"')->execute();
// Fetch/Display data
  $data = $result->fetchAll();
  var_dump($data);
?>

Choice

The good old mysqli extension is still a powerful way of getting data in and out of a MySQL server. But now you have another option that just may fit better with your programming paradigm.

Monday, July 3, 2017

How to Use PHP and MySQL Document Store

PHP Developers can now try the MySQL Document Store by using the MySQL X DevAPI for PHP PECL Extension. Developers in other languages have had access for a while but now PHP coders can get in on the action and use the MySQL Document Store.

What Does the Code Look Like?


#!/usr/bin/php
<?PHP
// Connection parameters
  $user = 'root';
  $passwd = 'S3cret#';
  $host = 'localhost';
  $port = '33060';
  $connection_uri = 'mysqlx://'.$user.':'.$passwd.'@'.$host.':'.$port;
  echo $connection_uri . "\n";

// Connect as a Node Session
  $nodeSession = mysql_xdevapi\getNodeSession($connection_uri);
// "USE world_x"
  $schema = $nodeSession->getSchema("world_x");
// Specify collection to use
  $collection = $schema->getCollection("countryinfo");
// SELECT * FROM world_x WHERE _id = "USA"
  $result = $collection->find('_id = "USA"')->execute();
// Fetch/Display data
  $data = $result->fetchAll();
  var_dump($data);
?>
Well, PHP code is PHP code. The big changes is that the developer no longer needs to use Structured Query Language to talk with the database. Now one connects to the schema of choice, sets the collection to use used, and then finds the record(s) of choice. Zero SQL involved.

PECL Extension

PECL is the PHP Community Library which houses all sorts of treasures. Now it also houses the MySQL XDevAPI extension. I have had the best luck with downloading the software and building it on my system following the directions in the README file. This is not an easy build but keep plugging and you will get it built.

Output


dstokes@davelaptop:~/phpxdev$ php 001.php
mysqlx://root:hidave@localhost:33060
array(1) {
  [0]=>
  array(7) {
    ["GNP"]=>
    int(8510700)
    ["_id"]=>
    string(3) "USA"
    ["Name"]=>
    string(13) "United States"
    ["IndepYear"]=>
    int(1776)
    ["geography"]=>
    array(3) {
      ["Region"]=>
      string(13) "North America"
      ["Continent"]=>
      string(13) "North America"
      ["SurfaceArea"]=>
      int(9363520)
    }
    ["government"]=>
    array(2) {
      ["HeadOfState"]=>
      string(14) "George W. Bush"
      ["GovernmentForm"]=>
      string(16) "Federal Republic"
    }
    ["demographics"]=>
    array(2) {
      ["Population"]=>
      int(278357000)
      ["LifeExpectancy"]=>
      float(77.099998474121)
    }
  }
}

Next Time

More PHP and MySQL Document Store code!

Monday, June 26, 2017

Indexing the MySQL Document Store

Indexing and the MySQL Document Store

The MySQL Document Store allows developers who do not know Structured Query Language (SQL) to use MySQL as a high efficient NoSQL document store. It has several great features but databases, NoSQL and SQL, have a problem searching through data efficiently. To help searching, you can add an index on certain fields to go directly to certain records. Traditional databases, like MySQL, allow you to add indexes and NoSQL databases, for example MongoDB, lets you add indexes. The MySQL Document Store also allows indexing.

So lets take a quick look at some simple data and then create an index.

mysql-js> db.foo.find()
[
    {
        "Name": "Carrie",
        "_id": "888881f14651e711940d0800276cdda9",
        "age": 21
    },
    {
        "Name": "Alex",
        "_id": "cc8a81f14651e711940d0800276cdda9",
        "age": 24
    },
    {
        "Last": "Stokes",
        "Name": "Dave",
        "_id": "davestokes"
    }
]
3 documents in set (0.01 sec)

mysql-js> db.foo.createIndex("ageidx").field("age","INTEGER", false).execute()
Query OK (0.01 sec)

The _id field was already indexed by default and I chose the age key for a new index. By the way you can crate UNIQUE and NON UNIQUE indexes. The arguments for the createIndex function are as follows. The first is the key in the JSON data to index. Second comes the index data type and age is an integer. And the third specifies if NOT NULL is supported and setting it to false means the column can contain NULL. BTW note that the last record has no age key which would be noted as a null; so if some of your records do not have the key to be indexed you should have this set to false.

So What Happened?

So lets take a look at what happened behind the scenes, using SQL.
mysql> DESC foo;
+------------------------------------------------+-------------+------+-----+---------+-------------------+
| Field                                          | Type        | Null | Key | Default | Extra             |
+------------------------------------------------+-------------+------+-----+---------+-------------------+
| doc                                            | json        | YES  |     | NULL    |                   |
| _id                                            | varchar(32) | NO   | PRI | NULL    | STORED GENERATED  |
| $ix_i_F177B50B40803DD7D3962E25071AC5CAA3D1139C | int(11)     | YES  | MUL | NULL    | VIRTUAL GENERATED |
+------------------------------------------------+-------------+------+-----+---------+-------------------+
3 rows in set (0.00 sec)

A VIRTUAL generated column was created. You may recall that virtual generated columns are not created until referenced, hence do not take up the space of a stored generated column. So what if $ix_i_F177B50B40803DD7D3962E25071AC5CAA3D1139C is not as human friendly as ageidx.

So lets try a search.

mysql-js> db.foo.find("age = 24")
[
    {
        "Name": "Alex",
        "_id": "cc8a81f14651e711940d0800276cdda9",
        "age": 24
    }
]
1 document in set (0.00 sec)

Sadly there is no corresponding function to the EXPLAIN SQL command. Which means there is no nice and easy way to see how much the index gains us in terms of performance.

Drop the index

But what if you want to remove that new index? Well, it is as simple as creating the index in the first place.

mysql-js> db.foo.dropIndex("ageidx").execute()
Query OK (0.01 sec)