Thursday, August 10, 2017

Handy JSON to MySQL Loading Script

JSON in Flat File to MySQL Database

So how do you load that JSON data file into MySQL? Recently I was asked this question, and I thought I would share a handy script I use for such work. For this example I will use the US Zip (postal) codes from JSONAR. Download and unzip the file. The data file is named zips.json, and it cannot be read directly into MySQL with the SOURCE command, which expects SQL statements. The documents first need to be wrapped in something more palatable.

$ head zips.json
{ "city" : "AGAWAM", "loc" : [ -72.622739, 42.070206 ], "pop" : 15338, "state" : "MA", "_id" : "01001" }
{ "city" : "CUSHMAN", "loc" : [ -72.51564999999999, 42.377017 ], "pop" : 36963, "state" : "MA", "_id" : "01002" }
{ "city" : "BARRE", "loc" : [ -72.10835400000001, 42.409698 ], "pop" : 4546, "state" : "MA", "_id" : "01005" }
{ "city" : "BELCHERTOWN", "loc" : [ -72.41095300000001, 42.275103 ], "pop" : 10579, "state" : "MA", "_id" : "01007" }
{ "city" : "BLANDFORD", "loc" : [ -72.936114, 42.182949 ], "pop" : 1240, "state" : "MA", "_id" : "01008" }
{ "city" : "BRIMFIELD", "loc" : [ -72.188455, 42.116543 ], "pop" : 3706, "state" : "MA", "_id" : "01010" }
{ "city" : "CHESTER", "loc" : [ -72.988761, 42.279421 ], "pop" : 1688, "state" : "MA", "_id" : "01011" }
{ "city" : "CHESTERFIELD", "loc" : [ -72.833309, 42.38167 ], "pop" : 177, "state" : "MA", "_id" : "01012" }
{ "city" : "CHICOPEE", "loc" : [ -72.607962, 42.162046 ], "pop" : 23396, "state" : "MA", "_id" : "01013" }
{ "city" : "CHICOPEE", "loc" : [ -72.576142, 42.176443 ], "pop" : 31495, "state" : "MA", "_id" : "01020" }

Follow the Document Store Example

The MySQL Document Store is designed for storing JSON data, and this example follows its practice of a two-column table: a JSON column and a primary key column. (InnoDB wants a primary key on every table so badly that it will create a hidden one for you, but it is better practice to define your own.) Besides, we want to search on the zip code, which is labeled _id in the data. So we use a stored generated column that computes JSON_UNQUOTE(JSON_EXTRACT(doc,"$._id")) and saves the result in a column named zip.

So a simple table is created and it looks like this:

mysql> desc zipcode\g
+-------------+-------------+------+-----+---------+-------------------+
| Field       | Type        | Null | Key | Default | Extra             |
+-------------+-------------+------+-----+---------+-------------------+
| doc         | json        | YES  |     | NULL    |                   |
| zip         | char(5)     | NO   | PRI | NULL    | STORED GENERATED  |
+-------------+-------------+------+-----+---------+-------------------+
2 rows in set (0.00 sec)
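The CREATE TABLE statement itself is not shown. One definition that should produce the DESC output above is sketched below; it is a reconstruction consistent with that output (MySQL 5.7 or later assumed), not necessarily the exact statement used. It writes the SQL to a file, hypothetically named zipcode.sql, so it can be fed to the client the same way the data is.

```shell
#!/bin/bash
# Sketch: write a CREATE TABLE consistent with the DESC output to zipcode.sql.
# The file name is hypothetical. The quoted 'SQL' delimiter stops bash from
# expanding the $ in the JSON path.
cat > zipcode.sql <<'SQL'
CREATE TABLE zipcode (
  doc JSON,
  -- Stored generated column holding the unquoted "_id" value of each document.
  zip CHAR(5) GENERATED ALWAYS AS (JSON_UNQUOTE(JSON_EXTRACT(doc, '$._id'))) STORED NOT NULL,
  PRIMARY KEY (zip)
);
SQL
```

It can then be run with the same kind of call used for the data, for example mysql -u itisme test < zipcode.sql. The column is STORED rather than VIRTUAL because InnoDB does not allow a primary key on a virtual generated column.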

Handy Script

So now we have the data and the table; next we need to convert the data into something MySQL can load.

Bash has so many rich built-in tools that it is hard to remember them all, but its read builtin, which reads input one line at a time, is just what this task needs.


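#!/bin/bash
# loader.sh: wrap each line of zips.json (one JSON document per line) in an INSERT
file="/home/dstokes/Downloads/zips.json"
while IFS= read -r line || [[ -n "$line" ]]   # -r keeps backslashes; the || test keeps a final line that lacks a newline
do
 line=${line//'\'/'\\'}    # double backslashes, since MySQL treats them as escapes in string literals
 line=${line//"'"/"''"}    # double single quotes so they cannot close the SQL string early
 echo "INSERT INTO zipcode (doc) VALUES ('$line');"
done <"$file"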
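Two details of bash's read builtin matter when the input is JSON: IFS= stops leading and trailing whitespace from being trimmed, and -r stops backslashes from being treated as escape characters. The short demonstration below uses a made-up document containing escaped quotes; without -r the result is no longer valid JSON.

```shell
#!/bin/bash
# Made-up JSON document containing escaped quotes (\").
sample='{ "note" : "a \"quoted\" word" }'

IFS= read plain <<<"$sample"    # without -r: each backslash is consumed as an escape
IFS= read -r raw <<<"$sample"   # with -r: the line is kept exactly as written

printf 'without -r: %s\n' "$plain"   # prints: without -r: { "note" : "a "quoted" word" }
printf 'with -r:    %s\n' "$raw"     # prints: with -r:    { "note" : "a \"quoted\" word" }
```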
Save the script as loader.sh, make it executable, and run it with the output redirected to a file named foo: ./loader.sh > foo. The output shows how each document is wrapped:
$ head foo
INSERT INTO zipcode (doc) VALUES ('{ "city" : "AGAWAM", "loc" : [ -72.622739, 42.070206 ], "pop" : 15338, "state" : "MA", "_id" : "01001" }');
INSERT INTO zipcode (doc) VALUES ('{ "city" : "CUSHMAN", "loc" : [ -72.51564999999999, 42.377017 ], "pop" : 36963, "state" : "MA", "_id" : "01002" }');
INSERT INTO zipcode (doc) VALUES ('{ "city" : "BARRE", "loc" : [ -72.10835400000001, 42.409698 ], "pop" : 4546, "state" : "MA", "_id" : "01005" }');
INSERT INTO zipcode (doc) VALUES ('{ "city" : "BELCHERTOWN", "loc" : [ -72.41095300000001, 42.275103 ], "pop" : 10579, "state" : "MA", "_id" : "01007" }');
INSERT INTO zipcode (doc) VALUES ('{ "city" : "BLANDFORD", "loc" : [ -72.936114, 42.182949 ], "pop" : 1240, "state" : "MA", "_id" : "01008" }');
INSERT INTO zipcode (doc) VALUES ('{ "city" : "BRIMFIELD", "loc" : [ -72.188455, 42.116543 ], "pop" : 3706, "state" : "MA", "_id" : "01010" }');
INSERT INTO zipcode (doc) VALUES ('{ "city" : "CHESTER", "loc" : [ -72.988761, 42.279421 ], "pop" : 1688, "state" : "MA", "_id" : "01011" }');
INSERT INTO zipcode (doc) VALUES ('{ "city" : "CHESTERFIELD", "loc" : [ -72.833309, 42.38167 ], "pop" : 177, "state" : "MA", "_id" : "01012" }');
INSERT INTO zipcode (doc) VALUES ('{ "city" : "CHICOPEE", "loc" : [ -72.607962, 42.162046 ], "pop" : 23396, "state" : "MA", "_id" : "01013" }');
INSERT INTO zipcode (doc) VALUES ('{ "city" : "CHICOPEE", "loc" : [ -72.576142, 42.176443 ], "pop" : 31495, "state" : "MA", "_id" : "01020" }');
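With autocommit on (the default), InnoDB commits each of these INSERTs separately, which adds overhead for every row. A variant, sketched here as a hypothetical function named wrap_json_lines that reads JSON lines on standard input, wraps the whole load in one transaction, which usually makes a load of this size noticeably faster. It also escapes backslashes and single quotes, assuming MySQL's default sql_mode (without NO_BACKSLASH_ESCAPES).

```shell
#!/bin/bash
# Hypothetical helper: wrap JSON lines from stdin in INSERTs inside one transaction.
# Assumes the zipcode table from this post and MySQL's default sql_mode,
# where backslash is an escape character inside string literals.
wrap_json_lines() {
  echo "START TRANSACTION;"
  local line
  while IFS= read -r line || [[ -n "$line" ]]; do
    line=${line//'\'/'\\'}   # double each backslash so MySQL keeps JSON escapes such as \"
    line=${line//"'"/"''"}   # double each single quote so it cannot end the SQL string
    printf "INSERT INTO zipcode (doc) VALUES ('%s');\n" "$line"
  done
  echo "COMMIT;"
}
```

Usage would be wrap_json_lines < zips.json > foo, after which foo is loaded exactly as before.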

So now the data can be loaded with mysql -u itisme test < foo, where itisme is the MySQL user and test is the database.
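Because zip is the table's primary key, a repeated _id in the file would make that INSERT fail with a duplicate-key error, and when reading statements from a redirected file the mysql client stops at the first error unless it is run with --force. A quick pre-check, sketched here as a hypothetical function named find_duplicate_ids and assuming the "_id" : "01001" layout seen in zips.json, lists any _id that occurs more than once:

```shell
#!/bin/bash
# Hypothetical helper: print every "_id" value that appears on more than one line.
# Assumes one JSON document per line, as in zips.json.
find_duplicate_ids() {
  grep -o '"_id" *: *"[^"]*"' "$1" |   # keep only the "_id" : "NNNNN" pair
    sed 's/.*: *"\([^"]*\)"$/\1/' |     # reduce it to the bare value
    sort | uniq -d                      # report values seen more than once
}
```

Running find_duplicate_ids zips.json before loading should print nothing if every zip code is unique.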