Monday, July 23, 2018

Advanced MySQL JSON_TABLE

JSON_TABLE is one of the more complex functions that arrived in MySQL 8.  It takes schemaless JSON data and turns it into a relational table. So your NoSQL data becomes SQL data and you can use good ol' SQL where clauses on that temporary relational table! 

I stated writing about JSON_TABLE here and here last December.  And you can find details on using JSON_TABLE in my book.  The following examples cover what to do when key/value pairs are missing or bad, traversing nested paths, and adding an ordinal number to nested values.  These operations provide a great deal of muscle when wrestling NoSQL data into a relational format.

JSON_TABLE

The first argument to JSON_TABLE is the name of the JSON column in the table.  In the following example '$" denotes the entire JSON data type column or JSON document. 

COLUMNS is used to call out the various key/values and assign them to a temporary 'column' and a definition of that column.  In this example the JSON key Population is being called out as an integer to be named Pop.  Oh, if the values are missing for Population, JSON_TABLE will insert the DEFAULT value, here 999, to the results. Or if there is an error in the data, like a string instead of something integer-ish, it output a NULL, a valid JSON string, or the DEFAULT value, which is used in this case.  And if the field is empty (NULL), a valid JSON string, or the DEFAULT value, as you direct -- '987' in the example that follows.

Example 1 -Handling bad or missing data

This example has four records where two of the records need some direction.  The has a bad value ('fish') and an empty value (no Population key in the JSON column). 

mysql> SELECT name, 
          Info->>"$.Population", 
          Pop FROM city2,   
          JSON_TABLE(Info,"$" COLUMNS 
          ( Pop INT PATH "$.Population" 
          DEFAULT '999' 
          ON ERROR DEFAULT 
          '987' ON EMPTY))  
         AS x1;
+-------+-----------------------+------+
| name  | Info->>"$.Population" | Pop  |
+-------+-----------------------+------+
| alpha | 100                   |  100 |
| beta  | fish                  |  999 |
| delta | 15                    |   15 |
| gamma | NULL                  |  987 |
+-------+-----------------------+------+
4 rows in set, 1 warning (0.00 sec)

So 'fish' becomes '999' and NULL becomes '987'.

Example 2

Nested values in JSON data are very common and there is a NESTED PATH operator to allow extracting those values.  The restaurants data set from the MongoDB world has some ratings scores in an array named 'grades' that are nested.

{"_id": "00005b2176ae0000000000000001",
"name": "Morris Park Bake Shop",
"grades": [
{"date": {"$date": 1393804800000}, "grade": "A", "score": 2}, 
{"date": {"$date": 1378857600000}, "grade": "A", "score": 6}, 
{"date": {"$date": 1358985600000}, "grade": "A", "score": 10}, 
{"date": {"$date": 1322006400000}, "grade": "A", "score": 9}, 
{"date": {"$date": 1299715200000}, "grade": "B", "score": 14}],
"address": {"coord": [-73.856077, 40.848447],
"street": "Morris Park Ave",
"zipcode": "10462", "
"cuisine": "Bakery",
"restaurant_id": "30075445"}

The NESTED PATH operator allows access to each of the grades.

mysql> select aaaa.*  from restaurants,  
             json_table(doc, "$" COLUMNS
             (name char(50) path "$.name",   
              style varchar(50) path "$.cuisine",  
             NESTED PATH '$.grades[*]' COLUMNS 
             (Grading char(10) path "$.grade", 
             Score INT path "$.score"))) 
      as aaaa ;
+--------------------------------+------------+---------+-------+
| name                           | style      | Grading | Score |
+--------------------------------+------------+---------+-------+
| Morris Park Bake Shop          | Bakery     | A       |     2 |
| Morris Park Bake Shop          | Bakery     | A       |     6 |
| Morris Park Bake Shop          | Bakery     | A       |    10 |
| Morris Park Bake Shop          | Bakery     | A       |     9 |
| Morris Park Bake Shop          | Bakery     | B       |    14 |
| Wendy'S                        | Hamburgers | A       |     8 |
| Wendy'S                        | Hamburgers | B       |    23 |
| Wendy'S                        | Hamburgers | A       |    12 |
| Wendy'S                        | Hamburgers | A       |    12 |
| Dj Reynolds Pub And Restaurant | Irish      | A       |     2 |
+--------------------------------+------------+---------+-------+
10 rows in set (0.00 sec)

From here the average scores could be computed with a Windowing Function and a Common Table Expression. A big tip of the hat to LeFred for the following

WITH cte1 AS (SELECT doc->>"$.name" AS 'name',
doc->>"$.cuisine" AS 'cuisine',
        (SELECT AVG(score) FROM 
        JSON_TABLE(doc, "$.grades[*]"
        COLUMNS (score INT PATH "$.score")) as r ) AS avg_score
 FROM restaurants)
 SELECT *, rank() OVER 
  (PARTITION BY cuisine ORDER BY avg_score) AS `rank`
  FROM cte1 
  ORDER by `rank`, avg_score DESC limit 10;


Example 3

Not all the restaurant have the same number of grades. But it is very easy to see the number of grades. We can get ordinal numbers next to each grade with  FOR ORDINALITY


mysql> select aaaa.name, aaaa.ordinal, aaaa.Grading  
       FROM restaurants,  json_table(doc, "$" COLUMNS(
         name char(50) path "$.name",   
         style varchar(50) path "$.cuisine",  
         NESTED PATH '$.grades[*]'    COLUMNS (
           ordinal FOR ORDINALITY, 
           Grading char(10) path "$.grade", 
           Score INT path "$.score"))) 
        as aaaa  limit 10;
+--------------------------------+---------+---------+
| name                           | ordinal | Grading |
+--------------------------------+---------+---------+
| Morris Park Bake Shop          |       1 | A       |
| Morris Park Bake Shop          |       2 | A       |
| Morris Park Bake Shop          |       3 | A       |
| Morris Park Bake Shop          |       4 | A       |
| Morris Park Bake Shop          |       5 | B       |
| Wendy'S                        |       1 | A       |
| Wendy'S                        |       2 | B       |
| Wendy'S                        |       3 | A       |
| Wendy'S                        |       4 | A       |
| Dj Reynolds Pub And Restaurant |       1 | A       |
+--------------------------------+---------+---------+
10 rows in set (0.00 sec)

Conclusion

JSON_TABLE provides a way to structure and manipulate unstructured, NoSQL data.  It is a powerful way to present data at the database level that might hamper an application.  Plus it provides a best of both the SQL and NoSQL worlds that leverages the best of MySQL.