Friday, August 12, 2016

Moving data from MongoDB to MySQL's JSON Document Store

I had an interesting phone call from someone wanting to move from MongoDB to MySQL's new JSON Document Store. The big question was "How do I export my Mongo data into something I can read into MySQL?"

The good news is that the folks at Mongo have given us the tools for the hard part of a simple process. For this example I am going to use some test data created previously in a database creatively named dave. The numbers are made up and I am using my name and the names of the canines residing in my home. So a quick peek at the data:


$ mongo
MongoDB shell version: 3.2.8
connecting to: test
> use dave
switched to db dave
> db.dave.find()
{ "_id" : 123, "name" : "Dave" }
{ "_id" : 456, "name" : "Jack" }
{ "_id" : 789, "name" : "Dexter" }
{ "_id" : 787, "name" : "Boo" }
> 
bye

Dumping That Data

First use mongodump -d dave to write out the data, much as you would with mysqldump. Under your current working directory (assuming you are on Unix/Linux) a directory named dump will be created, and under dump is a directory named after the example database, dave.

A dave.bson file was created with the data.

BSON to Text

Executing bsondump dave.bson > output.file will convert the MongoDB BSON formatted data into something much easier to work with.

$ cat output.file 
{"_id":123.0,"name":"Dave"}
{"_id":456.0,"name":"Jack"}
{"_id":789.0,"name":"Dexter"}
{"_id":787.0,"name":"Boo"}

The output.file can be processed in a number of ways to pull it into MySQL, such as using your favorite text editor to wrap INSERT statements around the data or using a simple program that reads a line from the text file and then sends the data to the database.
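As a sketch of the second approach, a short Python helper could wrap each line of the bsondump output in an INSERT statement (the table name dave with a single JSON column doc is an assumption for illustration). Note that bsondump writes the numeric _id values as floats (123.0), so the sketch casts whole numbers back to integers:

```python
import json

def to_insert(line, table="dave"):
    """Wrap one line of bsondump output in an INSERT for a JSON column.
    Assumes a table created with something like:
        CREATE TABLE dave (doc JSON);
    bsondump emits whole-number _id values as floats, so cast them back."""
    doc = json.loads(line)
    if isinstance(doc.get("_id"), float) and doc["_id"].is_integer():
        doc["_id"] = int(doc["_id"])
    # Escape single quotes so the JSON can sit inside a SQL string literal.
    payload = json.dumps(doc).replace("'", "''")
    return "INSERT INTO %s (doc) VALUES ('%s');" % (table, payload)
```

The resulting statements can then be piped straight into the mysql client, or the same loop could hand each document to a connector instead of building SQL text.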

Thursday, August 4, 2016

MySQL Document Store -- The NoSQL Zipcodes

The MySQL Document Store functionality allows developers to use a relational database with or without SQL (Structured Query Language), also known as NoSQL. The example in this blog is hopefully a simple look at this new feature of MySQL. The example data used is from JSONStudio.com and is a JSON formatted data set for US zip (postal) codes (656K compressed). So download your copy of this data set and let's get to work.

Create a collection

Collections are tables, and below we create a collection named 'zip' in the test database using the Python dialect.

mysqlsh -u root -p --py test
Creating an X Session to root@localhost:33060/test
Enter password:
Default schema `test` accessible through db.

Welcome to MySQL Shell 1.0.4 Development Preview

Copyright (c) 2016, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type '\help', '\h' or '\?' for help.

Currently in Python mode. Use \sql to switch to SQL mode and execute queries.
mysql-py> db.createCollection("zip")


Is it there?

As soon as most of us create a table we want to see if it is there.

mysql-py> db.getCollections();
[
    <Collection:zip>
]
mysql-py> 
So it is there. But what is the underlying structure of this table? Switch to the SQL dialect (or open a mysql client).

mysql> SHOW CREATE TABLE zip;
+-------+--------------------------------+
| Table | Create Table                                                                                                                                                                                                       |
+-------+--------------------------------+
| zip   | CREATE TABLE `zip` (
  `doc` json DEFAULT NULL,
  `_id` varchar(32) GENERATED ALWAYS AS (json_unquote(json_extract(`doc`,'$._id'))) STORED NOT NULL,
  PRIMARY KEY (`_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 |
+-------+--------------------------------+
1 row in set (0.00 sec)

mysql> 
If you peeked at the zip code file you downloaded, you may have noticed that it has an _id field already. But what if your data set has no _id, or you want to use another key/value pair from the data as an index? Simply use a stored generated column on the field of your choice. Remember that good indexing practices still count, as the underlying relational database still has to keep that index infrastructure up to date.
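As a sketch of what that could look like in the SQL dialect (the column name city_gen and index name city_idx are made up for illustration), a generated column over the city field might be added like this:

```sql
-- Materialize $.city from the JSON document and index it.
ALTER TABLE zip
  ADD COLUMN city_gen VARCHAR(64)
    GENERATED ALWAYS AS (json_unquote(json_extract(doc, '$.city'))) STORED,
  ADD INDEX city_idx (city_gen);
```

This mirrors what the server already did for _id in the SHOW CREATE TABLE output above.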

Loading data

I will skip over the loading of the zip code data (I can address that in a later blog post if there is any interest). For now let's take it as a given that the data has been moved into the new collection.
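If you do want to try loading it yourself, one simple approach is to read the downloaded file, which has one JSON document per line, and add each document to the collection. This is only a sketch; the file name zips.json and the normalization of _id to a string (to match the VARCHAR _id column shown above) are assumptions:

```python
import json

def read_documents(path):
    """Read a file with one JSON document per line (the format of the
    JSONStudio zip data) and normalize _id to a string, since the
    collection's generated _id column is a VARCHAR."""
    docs = []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line:
                doc = json.loads(line)
                doc["_id"] = str(doc["_id"])
                docs.append(doc)
    return docs

# Inside mysqlsh's Python mode the documents could then be inserted with
# something like:
#   for doc in read_documents("zips.json"):
#       db.zip.add(doc).execute()
```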

Finding a Rainbow

So let's look for a particular zip code. For our data set the zip code corresponds to the _id field. And remember that this column is a generated column using that field from the JSON document.

mysql-py> db.zip.find("_id = '76077'")
[
    {
        "_id": "76077",
        "city": "RAINBOW",
        "loc": [
            -97.70652,
            32.281216
        ],
        "pop": 722,
        "state": "TX"
    }
]
1 document in set (0.00 sec)

mysql-py> 

How About Searching Non-indexed JSON Data?

Let's look for the state of Texas, or TX, in the JSON data. Previously we had the _id field as a materialized column extracted from the JSON data. Now we are asking the MySQL server to read all the records and return the ones meeting the criteria. This does perform a full table scan of the data (not as efficient as an index) but, thanks to the relatively small number of records, it does return fairly quickly.

mysql-py> db.zip.find("state = 'TX'")
[
    .
    .  (output omitted)
    .
    {
        "_id": "79935",
        "city": "EL PASO",
        "loc": [
            -106.330258,
            31.771847
        ],
        "pop": 20465,
        "state": "TX"
    },
    {
        "_id": "79936",
        "city": "EL PASO",
        "loc": [
            -106.30159,
            31.767655
        ],
        "pop": 52031,
        "state": "TX"
    }
]
1676 documents in set (0.06 sec)

mysql-py> 
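For comparison, the same search can be run from the SQL side against the underlying table, using the same JSON functions the generated _id column is built on. A sketch:

```sql
-- Equivalent of db.zip.find("state = 'TX'") in the SQL dialect;
-- this also scans the whole table since $.state is not indexed.
SELECT doc
FROM   zip
WHERE  json_unquote(json_extract(doc, '$.state')) = 'TX';
```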

Wrap Up

So now we can create a collection and search it. But what happens when we add records, especially records without our indexable key? That will be covered in another blog post soon.