Monday, April 25, 2016

What happens when you create a MySQL Document Store

The MySQL Document Store introduced with version 5.7.12 allows developers to create document collections without have to know Structured Query Language. The new feature also comes with a new set of terminology. So let us create a collection and see what it in it (basically creating a table for us SQL speakin' old timers).

So start the mysqlsh program, connect to the server, change to the world-x schema (database) switch to Python mode, a create a collection (table).

What did the server do for us? Switching to SQL mode, we can use describe to see what the server has done for us.

We have a two column table. The first is named doc and is used to store JSON. And there is also a column named _id and please notice this column is notated as STORED GENERATED.

The generated column extracts values from a JSON document and materializes that information into a new column that then can be indexed. But what did the system extract for us to create this new column?

Lets use SHOW CREATE TABLE to find out.

mysql-sql> SHOW CREATE TABLE foobar;
+--------+----------------------------------------------------------------------
--------------------------------------------------------------------------------
----------------------------------------------------------------------+
| Table  | Create Table

                                                                      |
+--------+----------------------------------------------------------------------
--------------------------------------------------------------------------------
----------------------------------------------------------------------+
| foobar | CREATE TABLE `foobar` (
  `doc` json DEFAULT NULL,
  `_id` varchar(32) GENERATED ALWAYS AS (json_unquote(json_extract(`doc`,'$._id'
))) STORED NOT NULL,
  UNIQUE KEY `_id` (`_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 |
+--------+----------------------------------------------------------------------
--------------------------------------------------------------------------------
----------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql-sql>
So the 5.7.12 document store is creating an index for us on a field named _id in our JSON document. Hmm, what if I do not have an _id field in my data. So I added two records ("Name" : "Dave" and "Name" : "Jack") into my new collection and then took a peek.
mysql> select * from foobar;
+-------------------------------------------------------------+-----------------
-----------------+
| doc                                                         | _id
                 |
+-------------------------------------------------------------+-----------------
-----------------+
| {"_id": "819a19383d9fd111901100059a3c7a00", "Name": "Dave"} | 819a19383d9fd111
901100059a3c7a00 |
| {"_id": "d639274c3d9fd111901100059a3c7a00", "Name": "Jack"} | d639274c3d9fd111
901100059a3c7a00 |
+-------------------------------------------------------------+-----------------
-----------------+
2 rows in set (0.00 sec)

mysql>

But what if i do have a _id of my own?

The system picked up the _id for the Dexter record. Remember that the index on the _id field is marked UNIQUE which means you can not reuse that number.

So we know the document store wants is creating an unique identification number (that we can also use).

Update: The client generates the identification number, the server can not due to possible conflicts in future sharding projects.

2 comments:

  1. Thanks for the details.

    The unique secondary index on _id means that the minimum number of indexes for a DocStore table with InnoDB is 2 -- the hidden PK and the unique secondary on _id. This means the cost of the DocStore feature is an extra index, and unique secondary indexes have extra costs:
    1) change buffer cannot be used for them with InnoDB
    2) page reads must be done to validate unique constraint for all engines

    For MongoDB, there is a PK on _id so we don't get the cost of an extra index. Although for another reason there is an extra hidden index in MongoDB when using WiredTIger & RocksDB engines.

    ReplyDelete
  2. Someone please correct me if I'm wrong here but, Why are we storing what essentially amounts to a GUID as a UTF8 VARCHAR(32) rather than a VARBINARY(16)? 32 * 3 bytes for utf8 max length = 96 bytes vs 16 bytes that are actually needed for the primary key. This is going cut the branching factor of the GEN_CLUST_INDEX by 1/6 and create some very deep trees for no reason (this can kill performance especially load time for very large tables)... Am I missing something here or is there good justification for this design decision?

    ReplyDelete