
Monday, May 20, 2019

Structuring Your Unstructured JSON data

The world seems awash in unstructured NoSQL data, mainly of the JSON variety.  While this has great benefits for data mutability and not being locked into a rigid structure, some things common in the structured world of SQL databases are missing.

What if there was a way to take this unstructured NoSQL JSON data and cast it, temporarily, into a structured table?  Then you could use all the processing functions and features found in a relational database on your data.  There is a way, and it is the JSON_TABLE function.

JSON_TABLE


You can find the documentation for JSON_TABLE here, but the examples below may make learning this valuable function easier than simply reading the manual.

I will be using the world_x dataset for the next examples.

If we run a simple SELECT JSON_PRETTY(doc) FROM countryinfo LIMIT 1;  the server will return something similar to the following:

{
  "GNP": 828,
  "_id": "ABW",
  "Name": "Aruba",
  "IndepYear": null,
  "geography": {
    "Region": "Caribbean",
    "Continent": "North America",
    "SurfaceArea": 193
  },
  "government": {
    "HeadOfState": "Beatrix",
    "GovernmentForm": "Nonmetropolitan Territory of The Netherlands"
  },
  "demographics": {
    "Population": 103000,
    "LifeExpectancy": 78.4000015258789
  }
}


We can use JSON_TABLE to extract the Name, the Head of State, and the Governmental Form easily with the following query. If you are not used to the MySQL JSON data type, the "$" references the entire document in the doc column (and doc is our JSON data type column in the table).  And notice that $.government.HeadOfState and $.government.GovernmentForm are the full paths to the keys in the document.

select jt.* FROM countryinfo,
json_table(doc, "$" COLUMNS (
          name varchar(20) PATH "$.Name",
          hofstate varchar(20) PATH '$.government.HeadOfState',
          gform varchar(50) PATH '$.government.GovernmentForm')) as jt
limit 1;

The syntax is JSON_TABLE(expr, path COLUMNS (column_list) [AS] alias) where expr is either a column from a table or a JSON document passed to the function ('{"Name": "Dave"}' as an example).  Then the desired columns are specified: we name the new column, give it a relational type, and specify the path of the JSON values we want to cast.

And the results are in the form of a relational table.

+-------+----------+----------------------------------------------+
| name  | hofstate | gform                                        |
+-------+----------+----------------------------------------------+
| Aruba | Beatrix  | Nonmetropolitan Territory of The Netherlands |
+-------+----------+----------------------------------------------+

This is JSON_TABLE in its most basic form.  The only thing I would like to emphasize is that the keys of the JSON data are case sensitive, so it is important to check your spelling!
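The path lookups JSON_TABLE performs can be mimicked in a few lines of Python. This is only a sketch of the idea, and json_path is a made-up helper, not anything MySQL ships:

```python
import json

# Made-up helper: walk a dotted JSON path ($.a.b) into a parsed document,
# the way each COLUMNS entry of JSON_TABLE resolves its PATH.
def json_path(doc, path):
    value = doc
    for key in path.lstrip("$.").split("."):
        value = value[key]
    return value

doc = json.loads('{"Name": "Aruba", "government": '
                 '{"HeadOfState": "Beatrix", '
                 '"GovernmentForm": "Nonmetropolitan Territory of The Netherlands"}}')

# One output "row" per document, one value per column definition.
row = (json_path(doc, "$.Name"), json_path(doc, "$.government.HeadOfState"))
print(row)  # ('Aruba', 'Beatrix')
```

Note that the dictionary lookup is case sensitive, just like the JSON keys in MySQL.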

Data Problems

There is also a nice feature of JSON_TABLE where you can assign a default value if a key/value pair is missing, and another value if something cannot be cast. If we use a non-existent key named 'xyz' as an example, we can insert the value '888' for any JSON document missing that key.

select jt.* FROM countryinfo,
json_table(doc, "$" COLUMNS (
          name varchar(20) PATH "$.Name",
          hofstate varchar(20) PATH '$.government.HeadOfState',
          xyz int(4) PATH '$.xyz' DEFAULT '999' ON ERROR DEFAULT '888' ON EMPTY,
          gform varchar(50) PATH '$.government.GovernmentForm')) as jt
limit 1;

And how the result looks:

+-------+----------+-----+----------------------------------------------+
| name  | hofstate | xyz | gform                                        |
+-------+----------+-----+----------------------------------------------+
| Aruba | Beatrix  | 888 | Nonmetropolitan Territory of The Netherlands |
+-------+----------+-----+----------------------------------------------+


NULL Handling


Now be careful with NULL values. If you change the new line to xyz int(4) PATH '$.IndepYear' DEFAULT '999' ON ERROR DEFAULT '888' ON EMPTY, we can easily see that the NULL value for Aruba's year of independence will return the default '999' value.  And if you change the path to '$.Name' to try and force the string value into an integer, it will take the ON ERROR path.

And you can assign missing or erroneous values to NULL instead, using NULL ON EMPTY or NULL ON ERROR.
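The two branches are easy to model. Here is a minimal Python sketch of the behavior described above (lookup_int is an invented helper, not MySQL code): a missing key takes the ON EMPTY branch, while a NULL or a failed cast takes the ON ERROR branch.

```python
# Invented helper modeling JSON_TABLE's DEFAULT ... ON EMPTY / ON ERROR logic.
def lookup_int(doc, key, on_empty, on_error):
    if key not in doc:              # key absent -> ON EMPTY branch
        return on_empty
    try:
        return int(doc[key])        # NULL or a bad cast -> ON ERROR branch
    except (TypeError, ValueError):
        return on_error

doc = {"Name": "Aruba", "IndepYear": None}
print(lookup_int(doc, "xyz", 888, 999))        # 888: key is missing
print(lookup_int(doc, "IndepYear", 888, 999))  # 999: NULL cannot be cast
print(lookup_int(doc, "Name", 888, 999))       # 999: 'Aruba' is not an int
```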

Nested Path Data


Iterating nested arrays can be painful but JSON_TABLE makes it very simple.  So creating some dummy data, we can start work on digging through the nested information.

select * from a;
+----+-----------------------+
| id | x                     |
+----+-----------------------+
|  1 | {"a": 1, "b": [1, 2]} |
|  2 | {"a": 2, "b": [3, 4]} |
|  3 | {"a": 3, "b": [5, 6]} |
+----+-----------------------+

The query features the NESTED PATH argument:

select d.* FROM a,
JSON_TABLE(x, "$" columns
        (mya varchar(50) PATH "$.a",
         NESTED PATH "$.b[*]"
                columns (myb int path '$'))
) as d;

The output.

+-----+-----+
| mya | myb |
+-----+-----+
| 1   |   1 |
| 1   |   2 |
| 2   |   3 |
| 2   |   4 |
| 3   |   5 |
| 3   |   6 |
+-----+-----+
6 rows in set (0.0013 sec)

Not bad, but let's add another level.

select * from b;
+----+-----------------------------------------------------+
| id | x                                                   |
+----+-----------------------------------------------------+
|  1 | {"a": 2, "b": [{"c": 101, "d": [44, 55, 66]}]}      |
|  2 | {"a": 1, "b": [{"c": 100, "d": [11, 22, 33]}]}      |
|  3 | {"a": 3, "b": [{"c": 102, "d": [77, 88, 99, 101]}]} |
+----+-----------------------------------------------------+
3 rows in set (0.0009 sec)

So let's embed another level:

select d.* FROM b,
       JSON_TABLE(x, "$" columns
        (mya varchar(50) PATH "$.a",
         NESTED PATH "$.b[*]"
           columns (myc int path '$.c',
           NESTED PATH '$.d[*]'
           columns (dpath int path '$')))
) as d order by myc;
+-----+-----+-------+
| mya | myc | dpath |
+-----+-----+-------+
| 1   | 100 |    22 |
| 1   | 100 |    33 |
| 1   | 100 |    11 |
| 2   | 101 |    44 |
| 2   | 101 |    55 |
| 2   | 101 |    66 |
| 3   | 102 |    77 |
| 3   | 102 |    88 |
| 3   | 102 |    99 |
| 3   | 102 |   101 |
+-----+-----+-------+
10 rows in set (0.0006 sec)

And we can get ordinal numbers too.

select d.* FROM b,
       JSON_TABLE(x, "$" columns
        (mya varchar(50) PATH "$.a",
         NESTED PATH "$.b[*]"
           columns (myc int path '$.c',
           NESTED PATH '$.d[*]'
           columns (dcount for ordinality,
                    dpath int path '$'))) ) as d
order by dpath;
+-----+-----+--------+-------+
| mya | myc | dcount | dpath |
+-----+-----+--------+-------+
| 1   | 100 |      1 |    11 |
| 1   | 100 |      2 |    22 |
| 1   | 100 |      3 |    33 |
| 2   | 101 |      1 |    44 |
| 2   | 101 |      2 |    55 |
| 2   | 101 |      3 |    66 |
| 3   | 102 |      1 |    77 |
| 3   | 102 |      2 |    88 |
| 3   | 102 |      3 |    99 |
| 3   | 102 |      4 |   101 |
+-----+-----+--------+-------+
10 rows in set (0.0009 sec)

And now that we have the data structured, we can start using the WHERE clause, such as WHERE myc > 100 AND dpath < 100.
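To see why NESTED PATH produces ten rows, here is the same flattening written out in Python (a sketch, using the three documents from table b above):

```python
# The three documents from table b, flattened the way the NESTED PATH
# clauses flatten them: one output row per (a, c, d-element) combination.
docs = [{"a": 2, "b": [{"c": 101, "d": [44, 55, 66]}]},
        {"a": 1, "b": [{"c": 100, "d": [11, 22, 33]}]},
        {"a": 3, "b": [{"c": 102, "d": [77, 88, 99, 101]}]}]

rows = [(doc["a"], b["c"], d)
        for doc in docs
        for b in doc["b"]        # NESTED PATH "$.b[*]"
        for d in b["d"]]         # NESTED PATH "$.d[*]"

# The WHERE clause from the text: myc > 100 AND dpath < 100
kept = [r for r in rows if r[1] > 100 and r[2] < 100]
print(len(rows), len(kept))  # 10 6
```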

Sunday, March 10, 2019

MySQL and PHP Basics Part I

I have had some requests to write some blogs on the basics of using PHP and MySQL together.  This will not be a series for the experienced; it will start at a level with a lot of detail and very few prerequisites expected of the reader.  If this is not you, please move on. If it is you and you read something you do not understand, please contact me and show me where I assumed too much.

PHP and MySQL are both in their mid-twenties and both vital to developers.  With the big improvements in PHP 7 and MySQL 8, I have found a lot of developers flocking to both but stymied by the examples they see, as there are many details not explained. So let's get to the explaining!

1. Use the latest software

If you are not using PHP 7.2 or 7.3 (or maybe 7.1) then you are missing out on features and performance.  The PHP 5.x series is deprecated, no longer supported, and is quickly disappearing.

MySQL 8.0 is likewise a big advancement but many sites are using earlier versions.  If you are not on 5.6 with plans to upgrade to 5.7 then you are about to be left behind.  If you are running an earlier version then you are using antique code. Also get your MySQL from MySQL, as your Linux distro may not be current, especially the smaller ones.  The APT and DEB repos can be found here and there are Docker containers available too.

In many cases it is a fight to keep your core software tools up to date, or fairly up to date.  The time and heartache spent fighting problems resolved in a later version of software, or creating a workaround for a feature not in your older version, will eventually bite you hard and should be viewed as a CLM (Career Limiting Move).  BTW hiring managers look for folks with current skills, not skills for a decade-old version.

2. Do not pick one connector over another (yet!)

PHP is a very rich environment for developers and it has three viable options for connecting to MySQL databases.  Please note that the older mysql connector is deprecated, no longer supported, and is to be avoided.  It was replaced by mysqli, also known as mysqlnd (native driver), and is officially supported by Oracle's MySQL engineers.  Next is the PDO (PHP Data Objects) connector, designed to be database agnostic; there is no overall maintainer who watches out for the code, but Oracle MySQL engineers do try to fix MySQL-related issues if they do not impinge on other PDO code. And the newest, using the new MySQL X DevAPI and X Protocol, is the X DevAPI connector, which supports both SQL and NoSQL interfaces.

The good news for developers is that you can install all three, side by side, with no problem.  For those starting out, switching between connectors can be a little confusing as they work just a bit differently from each other, but the ability to use two or more is a good skill to have.  Much like being able to drive a car with an automatic or manual transmission, it gives you more professional skills.

Next time we will install PHP, MySQL, the three connectors, and some other cool stuff so you can start using PHP to access your MySQL servers.

Wednesday, January 30, 2019

What the Heck is a Lateral Derived Table?

MySQL 8.0.14 came with Lateral Derived Tables as a new feature.  But what is that and how do you use them?

Let's start with what a derived table is.  According to the manual (link below), a derived table is an expression that generates a table within the scope of a query FROM clause.  You are probably used to a subquery or JSON_TABLE feeding data into another query.

Derived Tables

Derived tables can return a scalar, column, row, or table. But there are limits to their use.
A derived table cannot be a correlated subquery, and a derived table cannot contain references to other tables of the same SELECT. And prior to MySQL 8.0.14, a derived table could not contain outer references.

Okay, so what is an 'outer reference'? The SQL-92 standard states that derived tables cannot depend on other tables in the same FROM clause and cannot contain references to columns of other FROM clause tables.  But SQL-99 changed that with the keyword LATERAL (think sudo) so you can reference a previously mentioned table.

Overly Simplified Example


The following is a very contrived query to illustrate the use of the LATERAL keyword.  The  query behind the LATERAL is returning the country.Code column to the query in front of the LATERAL. I am using the good old World database from the MySQL example data sets.

SELECT Name, Population, District, x.cc 
FROM city,
 LATERAL (SELECT Code AS cc FROM country 
         WHERE city.CountryCode = Code) AS x 
 WHERE District = 'Texas' ORDER BY name;

Now if you remove the LATERAL from the above query, you will see the following error message. (Please do try the above query with and without LATERAL as a learning exercise.)

Error Code: 1054. Unknown column 'city.CountryCode' in 'where clause'

So with the LATERAL keyword in place, the left query (the one before the LATERAL) provides the city.CountryCode column to the right query (the one after the LATERAL). In other words, the query behind the LATERAL keyword depends on the previous query.  Since we named the derived table x, we need to remember to reference its columns with x too.
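The data flow can be modeled in a few lines of Python (a conceptual sketch with made-up sample rows, not how the server actually executes it): the derived table is re-evaluated for each row to its left and is allowed to read that row's columns.

```python
# Conceptual model of LATERAL: the inner query runs once per outer row
# and may reference the outer row's columns.
cities = [{"Name": "Houston", "CountryCode": "USA"},
          {"Name": "Paris",   "CountryCode": "FRA"}]
countries = [{"Code": "USA"}, {"Code": "FRA"}]

result = []
for city in cities:                                 # table before LATERAL
    x = [{"cc": c["Code"]} for c in countries
         if c["Code"] == city["CountryCode"]]       # derived table sees `city`
    for row in x:                                   # join each derived row back
        result.append((city["Name"], row["cc"]))
print(result)  # [('Houston', 'USA'), ('Paris', 'FRA')]
```

Without LATERAL, the inner list comprehension would have no `city` to look at, which is exactly the "Unknown column" error above.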

Quick Conclusion


So with lateral derived tables and Common Table Expressions (CTEs), it has become much easier to write subqueries with MySQL.

Friday, November 9, 2018

A Tale of Two JSON Implementations - MySQL and MariaDB

JSON has proven to be a very important data format with immense popularity. A good part of my time for the last two or so years has been dedicated to this area and I even wrote a book on the subject.  This is a comparison of the implementations of handling JSON data in MySQL and MariaDB. I had requests from the community and customers for this evaluation.


JSON Data Types Are Not All Equal


MySQL added a JSON data type in version 5.7 and it has proven to be very popular.  MariaDB has had JSON support since version 10.0.16, but it is actually an alias for a longtext data type so that statement-based replication from MySQL to MariaDB is possible.

MySQL converts JSON documents to an internal binary format that permits quick read access to document elements.  The binary format is structured to enable the server to look up sub-objects or nested values directly by key or array index without reading all values before or after them in the document.  From a practical standpoint, the big thing most people notice is that the keys are alphabetized.

MariaDB does no such optimization, and their documentation claims that the performance is at least equivalent.

JSON Tables

The first comparison is 'how hard is it to put non-JSON data into a JSON column?'  The standard (see below under 'Extra'), by the way, states that the data must be in the UTF8MB4 character set. And what is going on 'underneath the covers'?

The first step is to create tables to hold JSON data.


MySQL:

create table t1 (j json);
Query OK, 0 rows affected (0.1826 sec)

 MySQL  localhost:33060+ ssl  json  SQL > show create table t1;
+-------+----------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                   |
+-------+----------------------------------------------------------------------------------------------------------------+
| t1    | CREATE TABLE `t1` (
  `j` json DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+-------+----------------------------------------------------------------------------------------------------------------+
1 row in set (0.0149 sec)

 MySQL  localhost:33060+ ssl  json  SQL > desc t1;
+-------+------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+------+------+-----+---------+-------+
| j     | json | YES  |     | NULL    |       |
+-------+------+------+-----+---------+-------+
1 row in set (0.0028 sec)


MariaDB:

MariaDB [json]> create table t2 (j json);
Query OK, 0 rows affected (0.046 sec)

MariaDB [json]> show create table t2;
+-------+----------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                     |
+-------+----------------------------------------------------------------------------------------------------------------------------------+
| t2    | CREATE TABLE `t2` (
 `j` longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_bin DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
+-------+----------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.000 sec)

MariaDB [json]> desc t2;
+-------+----------+------+-----+---------+-------+
| Field | Type     | Null | Key | Default | Extra |
+-------+----------+------+-----+---------+-------+
| j     | longtext | YES  |     | NULL    |       |
+-------+----------+------+-----+---------+-------+
1 row in set (0.001 sec)

MariaDB [json]>

Conclusion: Both use UTF8MB4 and the underlying tables are roughly equivalent.

Checking Constraints

Ensuring that only JSON gets into the JSON columns is essential. MySQL does this by default for the JSON data type but MariaDB does not.

MySQL:

insert into t1 values('junk');

ERROR: 3140: Invalid JSON text: "Invalid value." at position 0 in value for column 't1.j'.

MariaDB:



MariaDB [json]> insert into t2 values ('junk');

Oops! We now have a non-JSON value in the table.

To be fair, MariaDB does let you establish a constraint check on the column to avoid this issue.


MariaDB [json]> drop table t2;
Query OK, 0 rows affected (0.046 sec)
MariaDB [json]> create table t2 (j json, check (json_valid(j)));
Query OK, 0 rows affected (0.049 sec)


MariaDB [json]> insert into t2 values ('junk');
ERROR 4025 (23000): CONSTRAINT `CONSTRAINT_1` failed for `json`.`t2`

Conclusion: MySQL does as advertised by rejecting non-JSON data by default while MariaDB can do so with a little extra work.

Loading Data


The only major complaint about JSON data is that there is just so much of it. So having a way to bulk load is important.

MySQL:

MySQL's new shell (mysqlsh) has Python, JavaScript, and SQL modes.  It is very easy to use either the Python or JavaScript modes to write a quick script to read bulk data sets line by line.  Giuseppe 'Datacharmer' Maxia has a great example of converting data from MongoDB to MySQL using the shell, and I have used that example code extensively in the past. But now 8.0.13 has a bulk loader built into the shell.
The New MySQL Shell's utility to bulk load JSON

This utility function allows JSON data to be stored in a JSON document collection (MySQL Document Store) or in a JSON column of a structured table. Very handy.


MariaDB:

MariaDB does not have an equivalent to the new MySQL Shell, nor does it have a bulk loader utility for JSON data.

I tried to use the CONNECT storage engine (not installed by default) and follow the examples on the CONNECT JSON Table Type page, without luck. The CONNECT engine is supposed to have the capability to auto-discover a table structure and define the table itself.  Mea culpa for not getting this to work, and I would encourage those who do use this feature to send me pointers, PLEASE! But after several hours of tedious attempts to follow the examples exactly, it was time to move on to something else.

Conclusion: MySQL does better loading data, especially in bulk.

JSON Functions


Both databases have functions and there are some differences.


  1. Functions That Create JSON Values
    1. JSON_ARRAY, JSON_OBJECT and JSON_QUOTE  are found in both and work the same.
  2. Functions That Search JSON Values
    1. JSON_CONTAINS, JSON_CONTAINS_PATH, JSON_EXTRACT, JSON_KEYS, and JSON_SEARCH are found in both and work the same.  However, only MySQL has the -> and ->> shortcuts for JSON_EXTRACT and JSON_UNQUOTE(JSON_EXTRACT()).
  3. Functions That Modify JSON Values
    1. JSON_ARRAY_APPEND, JSON_ARRAY_INSERT, JSON_INSERT, JSON_REMOVE, JSON_REPLACE, JSON_SET, and JSON_UNQUOTE are in both and work the same.  
    2. MariaDB has JSON_MERGE, which was deprecated in MySQL 8.0.3 and replaced with JSON_MERGE_PRESERVE and JSON_MERGE_PATCH.  MySQL 8.0 supports the JSON Merge Patch format defined in RFC 7396.
  4. Functions That Return JSON Value Attributes
    1. JSON_DEPTH, JSON_LENGTH, JSON_TYPE, and JSON_VALID are found in both and work the same.
  5. Table Functions
    1. JSON_TABLE which allows you to use unstructured JSON data in a structured temporary table is in MySQL only.
  6. JSON Utility Functions
    1. JSON_PRETTY, JSON_STORAGE_FREE, and JSON_STORAGE_SIZE are only in MySQL.
  7. Other Functions
    1. JSON_ARRAYAGG and JSON_OBJECTAGG are only in MySQL and are very handy for turning non JSON data into JSON.
    2. JSON_VALUE and JSON_QUERY are MariaDB specific and return an object/array or a scalar respectively.
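Since the merge functions come up in the list above, the JSON Merge Patch format from RFC 7396 (the semantics behind JSON_MERGE_PATCH) is worth spelling out. It is a short algorithm, transcribed here in Python:

```python
# JSON Merge Patch (RFC 7396): objects merge recursively, a null in the
# patch deletes the key, and any non-object patch replaces the target.
def merge_patch(target, patch):
    if not isinstance(patch, dict):
        return patch
    if not isinstance(target, dict):
        target = {}
    for key, value in patch.items():
        if value is None:
            target.pop(key, None)
        else:
            target[key] = merge_patch(target.get(key), value)
    return target

print(merge_patch({"a": "b", "c": {"d": "e"}},
                  {"a": "z", "c": {"d": None, "f": "g"}}))
# {'a': 'z', 'c': {'f': 'g'}}
```

JSON_MERGE_PRESERVE, by contrast, keeps both values (combining them into an array) rather than letting the patch win.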

Conclusion:  I tested both databases' functions and found they worked as advertised.  JSON_PRETTY is much missed by my eyes when dealing with documents with several levels of embedding in MariaDB.  The merge functions are richer in MySQL, especially for those who need to follow the merge patch standard. And JSON_TABLE is a valuable tool when trying to treat unstructured data as structured data temporarily, especially combined with the CTEs and windowing functions introduced in MySQL 8.

Updating Data


Updating data can be expensive and JSON data can be expansive.

MySQL:

MySQL 5.7 required a complete rewrite of the document for any update.  If this is something you do a lot, then you need to consider upgrading to MySQL 8.

In MySQL 8.0 the optimizer can perform a partial, in-place update of a JSON column instead of removing the old document and writing the new document in its entirety to the column. But there are conditions: 1) it has to be a JSON column; 2) the UPDATE statement must use one of the three functions JSON_SET(), JSON_REPLACE(), or JSON_REMOVE() to update the column (a direct set like UPDATE mytable SET jcol = '{"a": 10, "b": 25}' does not qualify); 3) the input column and the target column must be the same column; 4) all changes must replace existing array or object values with new ones, and not add any new elements to the parent object or array; and 5) the value being replaced must be at least as large as the replacement value. In other words, the new value cannot be any larger than the old one (an exception occurs when a previous partial update has left sufficient space for the larger value; you can use the function JSON_STORAGE_FREE() to see how much space has been freed by any partial update). If you can follow those rules, the optimizer will do partial rewrites.
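Rule 5 is the one that surprises people, so here is a toy model of it in Python (fits_in_place is an invented name, and the server's real accounting works on its binary format, not on serialized text):

```python
import json

# Toy model of rule 5: an in-place partial update is only possible when the
# replacement value is no larger than the value it replaces.
def fits_in_place(old_value, new_value):
    return len(json.dumps(new_value)) <= len(json.dumps(old_value))

old = "Beatrix"
print(fits_in_place(old, "Bea"))               # True: shorter, updates in place
print(fits_in_place(old, "Willem-Alexander"))  # False: forces a full rewrite
```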

MariaDB:

I searched the MariaDB docs and Jira but found no mention of partial JSON column updates.  If anyone has links, please send them to me. So it appears that MariaDB does a full rewrite.

Conclusion:  MySQL is more efficient here.

Replication 

Efficient replication is a must and goes double for JSON with the potential of very large document payloads having to cross network connections.

MySQL:

In MySQL 5.7 an update to a JSON column was written to the binary log as the complete document. In MySQL 8.0 it is possible to log partial updates to JSON documents.  In statement-based replication, JSON partial updates are always logged as partial updates.

But in row-based replication they are logged as complete documents by default. To enable the logging of partial updates, set binlog_row_value_options=PARTIAL_JSON. Please note that when the replication master has this variable set, partial updates received from that master are handled and applied by a replication slave regardless of the slave's own setting for the variable.


MariaDB:

I searched the MariaDB docs and Jira but found no mention of partial JSON Replication updates.  If anyone has links, please send them to me. So expect the full document to be replicated.

Conclusion: MySQL can be more efficient.

X DevAPI / Document Store

Only MySQL has the X DevAPI and the underlying Document Store, and it is proving to be very popular with customers.  The ability to use a database as both a NoSQL document store and a relational database is popular.  Not having SQL strings embedded in their code, and having their IDEs help, is driving developers to this approach.

MariaDB's CTO said at ZendCon that they will support it if customers demand it.

Conclusion:  MySQL is the sole choice here.


Overall Conclusion

I have been using MySQL's JSON data type since the 5.7 DMRs and know it well.  MariaDB's implementation seems very familiar and works as expected.  But MySQL is superior in partial updates of data and replication, in functions such as JSON_TABLE, in the X DevAPI, and in bulk loading of data.

Extra


The IETF's standard is here and makes pretty quick reading.

Sunday, July 29, 2018

A Kind Introduction MySQL Windowing Functions Part I


Windowing functions are a critical tool for grouping rows of data that are related to other rows, and they go far beyond the regular aggregate functions found in MySQL 5.7 and earlier. In MySQL 8 you do not have to collapse all the information down into a single output row: each row can retain its individual identity while the server analyzes the data as a unit.

Statistics and Damned Lies

Finding the total Population of the District Texas from the world.city table is simple. 

SQL> select District, sum(Population)  
from city where district = 'Texas';
+----------+-----------------+
| District | sum(Population) |
+----------+-----------------+
| Texas    |         9208281 |
+----------+-----------------+
1 row in set (0.0068 sec)

Simple.  But try to expand it to the entire USA and you get problems.

SQL> select District, sum(Population)  
from city where CountryCode = 'USA';
+----------+-----------------+
| District | sum(Population) |
+----------+-----------------+
| New York |        78625774 |
+----------+-----------------+
1 row in set (0.0046 sec)


The query sums the population of all the cities but lumps them under New York. This is not the desired answer. By the way, the only time New York (and New York City in particular) has 78 million people is when they are all on the road in front of me when I am trying to take a taxi to an airport.

With a windowing function it is easy to iterate over a subset of the entire data.  Imagine reading the data through a page with a section cut out to form a window that is just the right size to read only the group of rows desired!

SQL> select name, District, Population, sum(Population) over() as p0, 
     District, sum(population) over( partition by District) as p1
     from city where CountryCode = 'USA' limit 10;
+------------+----------+------------+----------+----------+---------+
| name       | District | Population | p0       | District | p1      |
+------------+----------+------------+----------+----------+---------+
| Birmingham | Alabama  |     242820 | 78625774 | Alabama  |  801519 |
| Montgomery | Alabama  |     201568 | 78625774 | Alabama  |  801519 |
| Mobile     | Alabama  |     198915 | 78625774 | Alabama  |  801519 |
| Huntsville | Alabama  |     158216 | 78625774 | Alabama  |  801519 |
| Anchorage  | Alaska   |     260283 | 78625774 | Alaska   |  260283 |
| Phoenix    | Arizona  |    1321045 | 78625774 | Arizona  | 3178903 |
| Tucson     | Arizona  |     486699 | 78625774 | Arizona  | 3178903 |
| Mesa       | Arizona  |     396375 | 78625774 | Arizona  | 3178903 |
| Glendale   | Arizona  |     218812 | 78625774 | Arizona  | 3178903 |
| Scottsdale | Arizona  |     202705 | 78625774 | Arizona  | 3178903 |
+------------+----------+------------+----------+----------+---------+
10 rows in set (0.0075 sec)


The above query has two windows.  The keyword to notice is OVER().  The window defined as OVER(), with nothing within the parentheses, should be understood to mean 'the window is open wide enough to see all the data'. So sum(Population) over() as p0 will give us the sum of all the Population columns and name the column p0.

The second window, defined as sum(population) over (partition by District) as p1, will provide the Population of each district summed in a column named p1.
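If it helps to see the two windows procedurally, here is a Python sketch of the same computation on a few of the rows (sample numbers taken from the output above):

```python
from collections import defaultdict

# (name, district, population) rows, as in the query output above.
rows = [("Birmingham", "Alabama", 242820),
        ("Montgomery", "Alabama", 201568),
        ("Anchorage",  "Alaska",  260283)]

p0 = sum(pop for _, _, pop in rows)      # OVER (): window covers every row
p1 = defaultdict(int)
for _, district, pop in rows:            # OVER (PARTITION BY District)
    p1[district] += pop

for name, district, pop in rows:         # each row keeps its identity
    print(name, district, pop, p0, p1[district])
```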

The Indianapolis 500 winning Lotus driven by Jimmy Clark. Before this car, all other winners of the race had the engine in front of the driver.  What does this have to do with windowing functions, databases, SQL, or anything else in this blog post? Well, nothing, but while at Detroit PHP I stopped by the Henry Ford Museum and took this picture of a very iconic car that changed Indy forever.  Windowing functions can change your SQL skills.

Different Types of Windows

The OVER clause has two forms: window_spec and window_name.  The window_spec option declares the specified window within the parentheses, while window_name refers to a window defined elsewhere in the query.  The second window in the above query (where the output is named p1) is a window_spec.

So here is a window_name example:

SQL> SELECT District, Sum(Population) OVER w 
    FROM city where CountryCode ='USA' 
    WINDOW w AS (partition by District) limit 10;
+----------+------------------------+
| District | Sum(Population) OVER w |
+----------+------------------------+
| Alabama  |                 801519 |
| Alabama  |                 801519 |
| Alabama  |                 801519 |
| Alabama  |                 801519 |
| Alaska   |                 260283 |
| Arizona  |                3178903 |
| Arizona  |                3178903 |
| Arizona  |                3178903 |
| Arizona  |                3178903 |
| Arizona  |                3178903 |
+----------+------------------------+
10 rows in set (0.0032 sec)


The window was given the name w and then defined as WINDOW w AS (partition by District).  By the way, the declaration within the window_name is itself a window_spec.

So what is a window_spec?

The definition from the manual (https://dev.mysql.com/doc/refman/8.0/en/window-functions-usage.html) informs that a window_spec is defined as:

window_spec: [window_name] [partition_clause] [order_clause] [frame_clause]

The window_name is an alias that can be used elsewhere in the query.

The partition_clause is how the data is to be divided up into groups. If this is unspecified it makes one big group.  

The order_clause provides the sort_order. Remember the ORDER BY from regular SQL queries?  This is how you can order the groups.

And the frame_clause determines sub groups within the big group.

And as it so often happens, there is a lot of material to be covered under the frame_clause and that will be in a future blog.

Thursday, June 14, 2018

MongoDB versus MySQL Document Store command comparisons I

Both MongoDB and the MySQL Document Store are JSON document stores.  The syntax differences between the two products are very interesting.  This blog will be a comparison of how commands differ between these two products and may evolve into a 'cheat sheet' if there is demand.

I found an excellent Mongo tutorial Getting Started With MongoDB that I use as a framework to explore these two JSON document stores.

The Data

I am using the primer-dataset.json file that MongoDB has been using for years in their documentation, classes, and examples. (MySQL similarly created the world_x data set based on the world database used for years in documentation, classes, and examples.)  The primer data set is a collection of JSON documents describing restaurants around Manhattan.

For the Mongo examples the schema name is test and the collection is named restaurants, while the corresponding MySQL schema name is nyeats and the collection is also named restaurants.  I kept the collection names the same between the two products and hope that the difference in schema names causes no problems. Please see my previous entry if you seek details on loading this data into the MySQL Document Store.

Starting the Shells

The first step in comparing how the two work is to access the data through their respective shells.
The MySQL mysqlsh connected to the nyeats schema

The MongoDB mongo shell connected to the test schema
I have windows with both shells ready to go and now it is time to start the comparison.

All The Records in a Collection

Both use db as a global variable to point to the current schema. Simply typing db at the command prompt will report back the current active schema for both.

But what if you want to see all the records in the collection restaurants?  With both you can issue db.restaurants.find(), but where MySQL returns all the documents in the collection, Mongo has a pager that requires you to type 'it' to continue.

Find Documents by Cuisine

So let's pick restaurants by their cuisine, and since Red Beans and Rice is one of my favorites we will use Cajun as the cuisine of choice.  The argument to the find() function is a JSON object in Mongo and an equation for MySQL.

MySQL:  db.restaurants.find("cuisine = 'Cajun'")
Mongo:   db.restaurants.find( { "cuisine" : "Cajun" })

The output is shown below under 'Output From Cajun Cuisine' as it takes up a lot of real estate on a computer screen.  The big difference, for those who do not want to page down, is that MySQL pretty-prints the output while Mongo does not. The pretty print is much easier on my old eyes.

Restaurants By Zipcode

How about we look for restaurants in one Zipcode (or postal code for those outside the USA)? By the way, a Zipcode can cover a lot of territory.

Mongo takes a JSON object as the search parameter while MySQL wants an equation.  Note that we are using a second-tier key, 'address.zipcode', to reach the desired information.

MySQL:  db.restaurants.find("address.zipcode = '10075'")
MongoDB:  db.restaurants.find( { "address.zipcode": "10075" })

When gt Is Not Greater Than >!!!

I wanted to tinker with the above by changing the equal sign to a greater than. It is easy to change the equal sign in the MySQL argument to any other relational operator like <, >, or >= intuitively.  I am still working on getting Mongo's $gt to work (not intuitive or easy).
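To see what the two comparison styles are driving at, here is a minimal sketch that emulates a greater-than filter on a tiny in-memory collection (the sample documents below are a made-up subset, and no live server is assumed):

```javascript
// A sketch of the two range-query shapes, emulated in memory.
// The shell forms would be roughly:
//   MySQL: db.restaurants.find("address.zipcode > '10305'")
//   Mongo: db.restaurants.find({ "address.zipcode": { $gt: "10305" } })
const restaurants = [
  { name: "Bayou",           address: { zipcode: "10305" } },
  { name: "Soco Restaurant", address: { zipcode: "11205" } },
  { name: "Catfish",         address: { zipcode: "11216" } },
];

// Walk a dotted path like "address.zipcode" down into a document.
const getPath = (doc, path) =>
  path.split(".").reduce((v, k) => (v == null ? undefined : v[k]), doc);

// Emulate find({ path: { $gt: value } }): keep documents whose value
// at the path compares greater than the supplied value.
const findGt = (docs, path, value) =>
  docs.filter((d) => getPath(d, path) > value);

console.log(findGt(restaurants, "address.zipcode", "10305").map((d) => d.name));
```

Note that the zipcodes are strings, so the comparison here (like in the shells) is lexicographic, not numeric.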

Logical OR

So far there has not been a whole lot of difference between the two. But now we start to see differences. The $or operator for Mongo wants a JSON array with the conditions wrapped in JSON objects.  MySQL looks more like traditional SQL.

MongoDB: db.restaurants.find( 
     { $or : [ { "cuisine": "Cajun"}, { "address.zipcode": "10075" } ] } ) 
MySQL:  db.restaurants.find(
     "cuisine = 'Cajun' OR address.zipcode = '10075'")

To me the MySQL argument looks more like every other programming language I am used to.  
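To make Mongo's array-of-conditions shape concrete, here is a small emulation of $or on in-memory documents (the sample documents are a made-up subset, no server assumed):

```javascript
// Emulate find({ $or: [ {key: value}, ... ] }): a document matches if
// ANY of the {key: value} conditions holds (dotted keys allowed).
const restaurants = [
  { name: "Bayou",    cuisine: "Cajun",   address: { zipcode: "10305" } },
  { name: "Le Coin",  cuisine: "French",  address: { zipcode: "10075" } },
  { name: "Wo Hop",   cuisine: "Chinese", address: { zipcode: "10013" } },
];

// Walk a dotted path like "address.zipcode" down into a document.
const getPath = (doc, path) =>
  path.split(".").reduce((v, k) => (v == null ? undefined : v[k]), doc);

const findOr = (docs, conditions) =>
  docs.filter((d) =>
    conditions.some((cond) =>
      Object.entries(cond).every(([path, value]) => getPath(d, path) === value)
    )
  );

console.log(
  findOr(restaurants, [{ cuisine: "Cajun" }, { "address.zipcode": "10075" }])
    .map((d) => d.name)
);
```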

Sorting on Two Keys

Let's sort the restaurants by borough and zipcode, both ascending.   Mongo is looking for JSON objects with the key name and sort order (1 for ascending, -1 for descending!) while MySQL defaults to ascending on the keys provided.

MongoDB: db.restaurants.find().sort( { "borough" : 1, "address.zipcode" : 1 })
MySQL:     db.restaurants.find().sort("borough","address.zipcode")
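The two-key ascending sort both shells perform can be sketched with a compound comparator on in-memory documents (made-up subset, no server assumed):

```javascript
// Sort by borough then zipcode, both ascending, the way a
// two-key sort specification orders documents: the second key
// only breaks ties on the first.
const docs = [
  { borough: "Queens",   address: { zipcode: "11432" } },
  { borough: "Brooklyn", address: { zipcode: "11216" } },
  { borough: "Brooklyn", address: { zipcode: "11205" } },
];

docs.sort(
  (a, b) =>
    a.borough.localeCompare(b.borough) ||
    a.address.zipcode.localeCompare(b.address.zipcode)
);

console.log(docs.map((d) => `${d.borough} ${d.address.zipcode}`));
```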


End of Part I


I am going to spend some time diving deeper into the differences between the two, especially Mongo's confusing (at least to me) greater than expression.

Output From Cajun Cuisine

MySQL:
JS > db.restaurants.find("cuisine = 'Cajun'")
[
    {
        "_id": "00005b2176ae00000000000010ec",
        "address": {
            "building": "1072",
            "coord": [
                -74.0683798,
                40.6168076
            ],
            "street": "Bay Street",
            "zipcode": "10305"
        },
        "borough": "Staten Island",
        "cuisine": "Cajun",
        "grades": [
            {
                "date": {
                    "$date": 1408579200000
                },
                "grade": "A",
                "score": 13
            },
            {
                "date": {
                    "$date": 1391644800000
                },
                "grade": "A",
                "score": 9
            },
            {
                "date": {
                    "$date": 1375142400000
                },
                "grade": "A",
                "score": 12
            },
            {
                "date": {
                    "$date": 1338336000000
                },
                "grade": "A",
                "score": 8
            }
        ],
        "name": "Bayou",
        "restaurant_id": "40974392"
    },
    {
        "_id": "00005b2176ae000000000000128a",
        "address": {
            "building": "9015",
            "coord": [
                -73.8706606,
                40.7342757
            ],
            "street": "Queens Boulevard",
            "zipcode": "11373"
        },
        "borough": "Queens",
        "cuisine": "Cajun",
        "grades": [
            {
                "date": {
                    "$date": 1420848000000
                },
                "grade": "A",
                "score": 11
            },
            {
                "date": {
                    "$date": 1400457600000
                },
                "grade": "A",
                "score": 7
            },
            {
                "date": {
                    "$date": 1384473600000
                },
                "grade": "A",
                "score": 12
            },
            {
                "date": {
                    "$date": 1370390400000
                },
                "grade": "B",
                "score": 16
            },
            {
                "date": {
                    "$date": 1338249600000
                },
                "grade": "A",
                "score": 7
            }
        ],
        "name": "Big Easy Cajun",
        "restaurant_id": "41017839"
    },
    {
        "_id": "00005b2176ae0000000000002146",
        "address": {
            "building": "90-40",
            "coord": [
                -73.7997187,
                40.7042655
            ],
            "street": "160 Street",
            "zipcode": "11432"
        },
        "borough": "Queens",
        "cuisine": "Cajun",
        "grades": [
            {
                "date": {
                    "$date": 1416873600000
                },
                "grade": "A",
                "score": 9
            },
            {
                "date": {
                    "$date": 1384732800000
                },
                "grade": "A",
                "score": 10
            },
            {
                "date": {
                    "$date": 1366070400000
                },
                "grade": "B",
                "score": 16
            },
            {
                "date": {
                    "$date": 1345507200000
                },
                "grade": "B",
                "score": 18
            }
        ],
        "name": "G & L Cajun Grill",
        "restaurant_id": "41336510"
    },
    {
        "_id": "00005b2176ae0000000000002ce7",
        "address": {
            "building": "2655",
            "coord": [
                -74.1660553,
                40.5823983
            ],
            "street": "Richmond Avenue",
            "zipcode": "10314"
        },
        "borough": "Staten Island",
        "cuisine": "Cajun",
        "grades": [
            {
                "date": {
                    "$date": 1412035200000
                },
                "grade": "A",
                "score": 10
            },
            {
                "date": {
                    "$date": 1392768000000
                },
                "grade": "B",
                "score": 18
            },
            {
                "date": {
                    "$date": 1371772800000
                },
                "grade": "B",
                "score": 16
            },
            {
                "date": {
                    "$date": 1335916800000
                },
                "grade": "A",
                "score": 11
            },
            {
                "date": {
                    "$date": 1322611200000
                },
                "grade": "A",
                "score": 11
            }
        ],
        "name": "Cajun Cafe & Grill",
        "restaurant_id": "41485811"
    },
    {
        "_id": "00005b2176ae000000000000352d",
        "address": {
            "building": "509",
            "coord": [
                -73.964513,
                40.693846
            ],
            "street": "Myrtle Avenue",
            "zipcode": "11205"
        },
        "borough": "Brooklyn",
        "cuisine": "Cajun",
        "grades": [
            {
                "date": {
                    "$date": 1417651200000
                },
                "grade": "A",
                "score": 13
            },
            {
                "date": {
                    "$date": 1386028800000
                },
                "grade": "A",
                "score": 9
            },
            {
                "date": {
                    "$date": 1370390400000
                },
                "grade": "A",
                "score": 4
            },
            {
                "date": {
                    "$date": 1355529600000
                },
                "grade": "A",
                "score": 13
            }
        ],
        "name": "Soco Restaurant",
        "restaurant_id": "41585575"
    },
    {
        "_id": "00005b2176ae0000000000003579",
        "address": {
            "building": "36-18",
            "coord": [
                -73.916912,
                40.764514
            ],
            "street": "30 Avenue",
            "zipcode": "11103"
        },
        "borough": "Queens",
        "cuisine": "Cajun",
        "grades": [
            {
                "date": {
                    "$date": 1418256000000
                },
                "grade": "A",
                "score": 10
            },
            {
                "date": {
                    "$date": 1394668800000
                },
                "grade": "A",
                "score": 0
            },
            {
                "date": {
                    "$date": 1375488000000
                },
                "grade": "B",
                "score": 17
            },
            {
                "date": {
                    "$date": 1358467200000
                },
                "grade": "A",
                "score": 10
            },
            {
                "date": {
                    "$date": 1341446400000
                },
                "grade": "A",
                "score": 12
            },
            {
                "date": {
                    "$date": 1324080000000
                },
                "grade": "A",
                "score": 10
            }
        ],
        "name": "Sugar Freak",
        "restaurant_id": "41589054"
    },
    {
        "_id": "00005b2176ae0000000000004172",
        "address": {
            "building": "1433",
            "coord": [
                -73.9535815,
                40.6741202
            ],
            "street": "Bedford Avenue",
            "zipcode": "11216"
        },
        "borough": "Brooklyn",
        "cuisine": "Cajun",
        "grades": [
            {
                "date": {
                    "$date": 1397001600000
                },
                "grade": "A",
                "score": 8
            },
            {
                "date": {
                    "$date": 1365033600000
                },
                "grade": "A",
                "score": 10
            }
        ],
        "name": "Catfish",
        "restaurant_id": "41685267"
    }
]
7 documents in set (0.0488 sec)



Mongo:

db.restaurants.find( { "cuisine" : "Cajun" })
{ "_id" : ObjectId("5b2293b5f46382c40db834ce"), "address" : { "building" : "1072", "coord" : [ -74.0683798, 40.6168076 ], "street" : "Bay Street", "zipcode" : "10305" }, "borough" : "Staten Island", "cuisine" : "Cajun", "grades" : [ { "date" : ISODate("2014-08-21T00:00:00Z"), "grade" : "A", "score" : 13 }, { "date" : ISODate("2014-02-06T00:00:00Z"), "grade" : "A", "score" : 9 }, { "date" : ISODate("2013-07-30T00:00:00Z"), "grade" : "A", "score" : 12 }, { "date" : ISODate("2012-05-30T00:00:00Z"), "grade" : "A", "score" : 8 } ], "name" : "Bayou", "restaurant_id" : "40974392" }
{ "_id" : ObjectId("5b2293b5f46382c40db8366b"), "address" : { "building" : "9015", "coord" : [ -73.8706606, 40.7342757 ], "street" : "Queens Boulevard", "zipcode" : "11373" }, "borough" : "Queens", "cuisine" : "Cajun", "grades" : [ { "date" : ISODate("2015-01-10T00:00:00Z"), "grade" : "A", "score" : 11 }, { "date" : ISODate("2014-05-19T00:00:00Z"), "grade" : "A", "score" : 7 }, { "date" : ISODate("2013-11-15T00:00:00Z"), "grade" : "A", "score" : 12 }, { "date" : ISODate("2013-06-05T00:00:00Z"), "grade" : "B", "score" : 16 }, { "date" : ISODate("2012-05-29T00:00:00Z"), "grade" : "A", "score" : 7 } ], "name" : "Big Easy Cajun", "restaurant_id" : "41017839" }
{ "_id" : ObjectId("5b2293b5f46382c40db84528"), "address" : { "building" : "90-40", "coord" : [ -73.7997187, 40.7042655 ], "street" : "160 Street", "zipcode" : "11432" }, "borough" : "Queens", "cuisine" : "Cajun", "grades" : [ { "date" : ISODate("2014-11-25T00:00:00Z"), "grade" : "A", "score" : 9 }, { "date" : ISODate("2013-11-18T00:00:00Z"), "grade" : "A", "score" : 10 }, { "date" : ISODate("2013-04-16T00:00:00Z"), "grade" : "B", "score" : 16 }, { "date" : ISODate("2012-08-21T00:00:00Z"), "grade" : "B", "score" : 18 } ], "name" : "G & L Cajun Grill", "restaurant_id" : "41336510" }
{ "_id" : ObjectId("5b2293b5f46382c40db850c6"), "address" : { "building" : "2655", "coord" : [ -74.1660553, 40.5823983 ], "street" : "Richmond Avenue", "zipcode" : "10314" }, "borough" : "Staten Island", "cuisine" : "Cajun", "grades" : [ { "date" : ISODate("2014-09-30T00:00:00Z"), "grade" : "A", "score" : 10 }, { "date" : ISODate("2014-02-19T00:00:00Z"), "grade" : "B", "score" : 18 }, { "date" : ISODate("2013-06-21T00:00:00Z"), "grade" : "B", "score" : 16 }, { "date" : ISODate("2012-05-02T00:00:00Z"), "grade" : "A", "score" : 11 }, { "date" : ISODate("2011-11-30T00:00:00Z"), "grade" : "A", "score" : 11 } ], "name" : "Cajun Cafe & Grill", "restaurant_id" : "41485811" }
{ "_id" : ObjectId("5b2293b5f46382c40db8590d"), "address" : { "building" : "509", "coord" : [ -73.964513, 40.693846 ], "street" : "Myrtle Avenue", "zipcode" : "11205" }, "borough" : "Brooklyn", "cuisine" : "Cajun", "grades" : [ { "date" : ISODate("2014-12-04T00:00:00Z"), "grade" : "A", "score" : 13 }, { "date" : ISODate("2013-12-03T00:00:00Z"), "grade" : "A", "score" : 9 }, { "date" : ISODate("2013-06-05T00:00:00Z"), "grade" : "A", "score" : 4 }, { "date" : ISODate("2012-12-15T00:00:00Z"), "grade" : "A", "score" : 13 } ], "name" : "Soco Restaurant", "restaurant_id" : "41585575" }
{ "_id" : ObjectId("5b2293b5f46382c40db8596b"), "address" : { "building" : "36-18", "coord" : [ -73.916912, 40.764514 ], "street" : "30 Avenue", "zipcode" : "11103" }, "borough" : "Queens", "cuisine" : "Cajun", "grades" : [ { "date" : ISODate("2014-12-11T00:00:00Z"), "grade" : "A", "score" : 10 }, { "date" : ISODate("2014-03-13T00:00:00Z"), "grade" : "A", "score" : 0 }, { "date" : ISODate("2013-08-03T00:00:00Z"), "grade" : "B", "score" : 17 }, { "date" : ISODate("2013-01-18T00:00:00Z"), "grade" : "A", "score" : 10 }, { "date" : ISODate("2012-07-05T00:00:00Z"), "grade" : "A", "score" : 12 }, { "date" : ISODate("2011-12-17T00:00:00Z"), "grade" : "A", "score" : 10 } ], "name" : "Sugar Freak", "restaurant_id" : "41589054" }
{ "_id" : ObjectId("5b2293b6f46382c40db86551"), "address" : { "building" : "1433", "coord" : [ -73.9535815, 40.6741202 ], "street" : "Bedford Avenue", "zipcode" : "11216" }, "borough" : "Brooklyn", "cuisine" : "Cajun", "grades" : [ { "date" : ISODate("2014-04-09T00:00:00Z"), "grade" : "A", "score" : 8 }, { "date" : ISODate("2013-04-04T00:00:00Z"), "grade" : "A", "score" : 10 } ], "name" : "Catfish", "restaurant_id" : "41685267" }