Monday, July 22, 2019

JSON Schema Validation with MySQL 8.0.17

JSON has become the standard document interchange format over the last several years.  MySQL 5.7 added a native JSON data type and it has been  greatly enhanced with version 8.0.  But many in the relational world have complained the the NoSQL approach does not allow you to have rigor on your data. That is to make sure an integer value is really an integer and within specified ranges or string of the proper length. And there was no way to make sure that email addresses are not listed under a combination of E-mail, e-mail, eMail, and eMAIL.  JSON is great for many things but traditional, normalized data was better for making certain that your data matched what was specified.

If only there was a way to enforce come rigor on JSON data! Or a way to annotate (pronounced 'document') your JSON data. Well there is. MySQL 8.0.17 has added the ability to validate JSON documents against a schema following the guidelines of the JSON-Schema.org's fourth draft standard. You can find both the manual page 12.17.7 JSON Schema Validation Functions and the JSON Schema information online.


Valid JSON and Really Valid JSON


As you are probably already aware, MySQL will reject an invalid JSON document when using the JSON data type.  But there is a difference between syntactically valid and validation against a schema. With schema validation you can define how the data should be formatted. This will help with automated testing and help ensure the quality of your data.


Overly Simple Example


Lets create a simple document schema that looks at a key named 'myage' and set up rules that the minimum value is 28 and the maximum value is 99.

set @s='{"type": "object",
     "properties": {
       "myage": {
       "type" : "number",
       "minimum": 28,
       "maximum": 99
   }
}
}';

And here is our test document where we use a value for 'myage' what is between the minimum and the maximum.

set @d='{  "myage": 33}';

Now we use JSON_SCHEMA_VALID() to test if the test document passes the validation test, with 1 or true as a pass and 0 or false as a fail.

select JSON_SCHEMA_VALID(@s,@d);
+--------------------------+
| JSON_SCHEMA_VALID(@s,@d) |
+--------------------------+
|                        1 |
+--------------------------+
1 row in set (0.00 sec)


Now try with a non-numeric value.

set @d='{  "myage": "foo"}';
Query OK, 0 rows affected (0.00 sec)

mysql> select JSON_SCHEMA_VALID(@s,@d);
+--------------------------+
| JSON_SCHEMA_VALID(@s,@d) |
+--------------------------+
|                        0 |
+--------------------------+

And a value below the minimum.

mysql> set @d='{  "myage": 16}';
Query OK, 0 rows affected (0.00 sec)

mysql> select JSON_SCHEMA_VALID(@s,@d);
+--------------------------+
| JSON_SCHEMA_VALID(@s,@d) |
+--------------------------+
|                        0 |
+--------------------------+
1 row in set (0.00 sec)

Validity Report

We can use JSON_SCHEMA_VALIDATION_REPORT() to get more information on why a document is failing with JSON_SCHEMA_VALID().

mysql> select JSON_SCHEMA_VALIDATION_REPORT(@s,@d)\G
*************************** 1. row ***************************
JSON_SCHEMA_VALIDATION_REPORT(@s,@d): {"valid": false, "reason": "The JSON document location '#/myage' failed requirement 'minimum' at JSON Schema location '#/properties/myage'", "schema-location": "#/properties/myage", "document-location": "#/myage", "schema-failed-keyword": "minimum"}
1 row in set (0.00 sec)

And, you should note, the response is in JSON format. And you can neaten the output up with JSON_PRETTY() wrapped around the above query.


select JSON_PRETTY(JSON_SCHEMA_VALIDATION_REPORT(@s,@d))\G
*************************** 1. row ***************************
JSON_PRETTY(JSON_SCHEMA_VALIDATION_REPORT(@s,@d)): {
  "valid": false,
  "reason": "The JSON document location '#/myage' failed requirement 'minimum' at JSON Schema location '#/properties/myage'",
  "schema-location": "#/properties/myage",
  "document-location": "#/myage",
  "schema-failed-keyword": "minimum"
}

Required Keys


If you want to make sure certain keys are included in a document, you can use a the required option in your schema definition.  So if you are working with GIS information, you can specify requiring longitude and latitude.

""required": ["latitude", "longitude"]


So we can no have required fields and specify their value ranges. And we can verify BEFORE committing the JSON document to the MySQL server that the data conforms to our schema.





Using JSON SCHEMA Validation with Check Constraint

SO the next logical step is to use the CONSTRAINT CHECK option on table creation to assure that we are not only getting a valid JSON document but a verified JSON document.

 
CREATE TABLE `testx` (
`col` JSON,
CONSTRAINT `myage_inRange`
CHECK (JSON_SCHEMA_VALID('{"type": "object",
"properties": {
"myage": {
"type" : "number",
"minimum": 28,
"maximum": 99
}
},"required": ["myage"]
}', `col`) = 1)
);


And the proof that it works.

mysql> insert into testx values('{"myage":27}');
ERROR 3819 (HY000): Check constraint 'myage_inRange' is violated.
mysql> insert into testx values('{"myage":97}');
Query OK, 1 row affected (0.02 sec)
 


So two of the big criticisms on using JSON in a relational database are now gone. We can add rigor and value checks.  While not as easy to do as with normalized  relational data, this is a huge win for those using JSON.
 


More on JSON Schema


I highly recommend going through the basics of JSON Schema as they is a lot of material that can not be covered in a simple blog.