Wednesday, October 12, 2016

NomadPHP and MySQL JSON Lightening Talk

I will be presenting a short lightening talk on MySQL's JSON Data Type for NomandPHP on Thursday the 13th of October
. Chuck Reeves will be doing the main presentation on How the 3rd Normal Form Destroyed a Family. Get your tickets here.

Friday, August 12, 2016

Moving data from MongoDB to MySQL's JSON Document Store

I had an interesting phone call from someone wanting to move from MongoDB to MySQL's new JSON Document Store. The big question was 'How do I export my Mongo data into something I can read into MySQL?"

The good news is that the folks at Mongo have given us the tools for the hard part of a simple process. For this example I am going to use some test data create previously in a database creatively named dave. The numbers are made up and I am using my name and the names of the canines residing in my home. So a quick peek at the data:

$ mongo
MongoDB shell version: 3.2.8
connecting to: test
> use dave
switched to db dave
> db.dave.find()
{ "_id" : 123, "name" : "Dave" }
{ "_id" : 456, "name" : "Jack" }
{ "_id" : 789, "name" : "Dexter" }
{ "_id" : 787, "name" : "Boo" }

Dumping That Data

First use mongodump -d dave to write out the data much as you would with mysqldump. Under you current working directory of your shell (assuming you are on Unix/Linux) there will be created a directory named dump. And under dump is a directory named after the example database dave.

A dave.bson file was created with the data.

BSON to Text

Executing bsondump dave.bson > output.file will convert the MongoDB BSON formatted data into something much easier to work with.

$ cat output.file 

The output.file can be processed in a number of ways to pull it into MySQL such as using your favorite text editor to wrap insert statements around the data or using a simple program that reads a line from the text file and then send data to the database.

Thursday, August 4, 2016

MySQL Document Store -- The NoSQL Zipcodes

The MySQL Document Store functionality allows developers to use a relation database with or without SQL (structured Query Language), also known as NoSQL. The example in this blog is hopefully a simple look at this new feature of MySQL. The example data used is from and is a JSON formatted data set for US zip (postal) codes (656K compressed). So download your copy of this data set and lets get to work.

Create a collection

Collections are tables and below we create a collection name 'zip' in the test database in the Python dialect.

mysqlsh -u root -p --py test
Creating an X Session to root@localhost:33060/test
Enter password:
Default schema `test` accessible through db.

Welcome to MySQL Shell 1.0.4 Development Preview

Copyright (c) 2016, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective

Type '\help', '\h' or '\?' for help.

Currently in Python mode. Use \sql to switch to SQL mode and execute queries.
mysql-py> db.createCollection("zip")

Is it there?

As soon as most of use create a table we want to see if it is there.

mysql-py> db.getCollections();
So it is there. But what is the underlying structure of this table. Switch to SQL dialect (or open a mysql client.

| Table | Create Table                                                                                                                                                                                                       |
| zip   | CREATE TABLE `zip` (
  `doc` json DEFAULT NULL,
  `_id` varchar(32) GENERATED ALWAYS AS (json_unquote(json_extract(`doc`,'$._id'))) STORED NOT NULL,
  PRIMARY KEY (`_id`)
1 row in set (0.00 sec)

If you peeked at the zip code file you downloaded, you may have noticed that it has an _id field already. But what if your data set has no _id or you want to use another key/value pair from the data as an index? Simply use a stored generated column on the field of your choice. Remember good indexing practices still count as the underlying relational database still has to keep the infrastructure underneath up to date.

Loading data

I will skip over the loading of the zip code data (I can address that in a later blog post if there is any interest. For now lets take it as a given that the data has been moved into the new collection.

Finding a Rainbow

So lets look for a particular zip code. For out data set the zip code corresponds with _id field.And remember that this column is a generated column using that field from the JSON document.

mysql-py>"_id = '76077'")
        "_id": "76077",
        "city": "RAINBOW",
        "loc": [
        "pop": 722,
        "state": "TX"
1 document in set (0.00 sec)


How About Searching a Non-indexed JSON data

Lets look for the state of Texas, or TX in the JSON data. Previous we had the _ID field as a materialized column extracted from the JSON data. Now we are asking the MySQL server to read all the records and return the ones meeting the criteria. This does perform a full table scale of the data (not as efficient as as index) but, thanks to the relatively small amount of records, it does return fairly quickly.

mysql-py>"state = 'TX'")
.  (Omitted)

        "_id": "79935",
        "city": "EL PASO",
        "loc": [
        "pop": 20465,
        "state": "TX"
        "_id": "79936",
        "city": "EL PASO",
        "loc": [
        "pop": 52031,
        "state": "TX"
1676 documents in set (0.06 sec)


Wrap Up

So now we can create a collection and search it. But what happens when we add records and especially records without our index-able key? That will be covered in another blog soon.

Thursday, July 28, 2016

Is the new MySQL Document Store and JSON data type cheating?

Is it cheating? Is using MySQL without Structured Query Language (SQL) or putting all your data into one column proper? Impossible a year ago and probably thought as a poor/crazy practice until recently, this is a new type of MySQL usage. NoSQL has had a big impact in the SQL world with several relational products from vendors like MySQL, Microsoft, Postgresql and others offering NoSQL interfaces and JSON data types to their databases.

Several old timers have come to me asking if putting lots of data in a JSON column is cheating? After all data normalization is part of relational databases and the way to efficiency and speed is well organized data. This higgledy–piggledy fashion of putting an entire document in a column without breaking it down to its component sections does violate the first rule of data normalization. And that has worked for decades pretty well.

But things change.

Many new development projects start with no idea what their data is going to look like. Maybe, if they are lucky, they will pick an API for interchanging data. Some will decide to use a database as a persistent message queue. Sometimes the service is product and the products offered by that service are nebulous at beast at the onset so the developers do not want to be locked into a schema.

And schema changes get expensive quickly. Code written before the change needs to be updated and you have to get the DBA to run an ALTER TABLE that can take too long to complete. And rolling back is almost impossible.

Compounding this is the lack of database skills in new developers. Very few have any training in Structured Query Language, relational theory, sets, data normalization, or other skills that have been held in high esteem since the days of Codd. And these new developers argue that they do not need these skills as technology and approaches to coding have evolved. So if the foundation of relational databases is the weak link then remove it.

The JSON data type has been very popular with developers. In the past you could dump JSON documents in a CHAR column and used REGEX or have your application dig out the parts you needed from within the document. Now with MySQL 5.7 there is a native JSON data type with supporting functions so that developers can manipulate the document data easily. So storing data in a JSON format in a column of a table within a database becomes a valuable has becomes a useful part of a developers life.

Ruby on Rails was the first piece of software that allowed many developers to see the power of CRUD (Create Read Update Delete) access to a database without needing to know SQL. The MySQL Document Store features that arrived with 5.7.13 also provide CRUD so developers can use 'collections' for schema-less data storage. The developers do not need to know SQL. relational theory, or any of the other old stuff. The good news for the old timers is that the data may be stored in a schema-less fashion but data is stored in a good ol' MySQL table. Which means the decades spent gathering SQL skills still work on this new data.

But is it cheating? Yes, and no. Yes in that you would get better performance out of having all your data in fourth or fifth normalized form just like a good driver can get better performance out of good driving practices with a manual transmission. No in that the data is still in a relational database and pertinent information can still be pried out of the JSON data. Heck you can even make materialized columns from the JSON data to create indexes.

Wednesday, July 6, 2016

How to make sure that 'password' is not a valid MySQL password

Password management is an issue for many. It is not uncommon for an organization to require you to changing your password on a regular basis and many have rules on the length and content of passwords. The length and complexity (that require certain amounts of upper & lower case characters, special characters, and numerals) rules try to prevent users from using '12345', 'qwerty', or even the word 'password' itself as passwords. With MySQL 5.7 you can filter out those bad passwords, obscenities, slurs, or other words you do not want used as or part of a password.

If the STRONG password policy is set in the my.cnf file, a validate_password_dictionary_file can be specified and you can use your favorite text editor to add the words or phrases you wish to not be valid in that file. Note that words shorter than four characters are ignored. So create a file with a text editor ( mine was placed in /var/lib/mysql/dit) with the banned words, edit the my.cnf file to set the validate_password_policy to STRONG, and also add the path to your dictionary file to validate_password_dictionary_file line of the same file. Restart your server and test.

My test dictionary had words like 'foobar', 'snafu', and 'password' and trying to use password with one of the words in the dictionary file would generate a ERROR 1819 (HY000): Your password does not satisfy the current policy requirement, even if I mixed the case of the various letters in the restricted words.

MySQL 5.7 also added the ability to set lifetimes for password in age, the ability to lock accounts, and stopped adding anonymous accounts (no user name and no password) during installation.

By the way I will be speaking in Detroit at the Converge Conference on MySQL 5.7 security if you would like to know more about this and other MySQL 5.7 related information.

Wednesday, June 29, 2016

MySQL Password Security Changes for PHP Developers

MySQL 5.7 introduced many new facets to password security. The first thing most notice is that you are assigned a random root password at installation time. You then have to search the log file for this random password, use it to login, and then change it. For the examples on the post I am using a fresh install of 5.7.13 on Oracle Linux 7.1 and was provided with the easy to remember password of nLvQRk7wq-NY which to me looked like I forgot to hit escape when trying to get out of vim. A quick ALTER USER to change the password and you are on your way.


Password Lifetime and Complexity

5.7.13 now has the default password lifetime set to 0 or 'never expire'. My fresh install shows that the value of mysql.user.password_lifetime is set to NULL which means use the server default value. The lifetime is measured in days and stored in the password_last_changed column of the nysql.users table. If the password is expired, you are put into sandbox mode where the only command you can execute is to change the password. That works great for interactive users. But what about your application? It uses a username password pair to talk to the database but it is very unlikely that anyone planned on changing passwords upon expiration. I seriously doubt anyone has set up the exception routine to handle an expired password properly. And if so, how do you notify all involved about this new password --- securely.

What to do

The best thing would be to set the default password lifetime for accounts used by applications to zero. It simply does not expire. QED & out.

But what if your company wants ALL password changed on a regular basis? And they do mean ALL. Earlier there was a listing of the defaults. The test system are set to a password length of eight characters minimum, requires mixed case, requires at least one upper case letter, one special (nonalphanumeric) character, and is of MEDIUM complexity.

MEDIUM complexity means that passwords need one numeric, one lower case, one upper case, and one special character. LOW tests the password length only. And STRONG adds a condition that sub strings of the length of four characters or long do not match entries in a specified password file (use to make sure swear words, common names, etcetera are not part of a password).

Lets create a dummy account.

CREATE USER 'foobar'@'Localhost' IDENTIFIED BY 'Foo@Localhost1' PASSWORD EXPIRE;

Checking the entry in the user table, you will find that the account's password is expired. For extra credit notice what the authentication string is set to. We can't have just a password string as some authentication tokens or hashes are not really password.

So login as foobar and you will get a notice that the password must be reset before we can do anything else.

ALTER USER 'foobar'@'localhost' IDENTIFIED By '1NewP@assword';

Corporate Standard

Your corporate rules may require you to rotate password every N days and set the corresponding complexity. With MySQL 5.7 you can follow what their model is. If you do not have a standard and want to create one, be sure to DOCUMENT well what your standard is and make sure that standard is well known.

There are ways to use packages like PAM or LDAP for authentication but that is for another day.

Friday, May 27, 2016

Generated MySQL Columns and Changing Values

I was speaking at PHP[Tek] this week on the JSON Data Type and using generated columns. JSON columns can not be indexed but data from a JSON column can be extracted via a generated column and that column can be indexed. All was going well until someone asked me about modifying data in a generated column. Was it possible?

I blinked hard. I have not tried that! I had not seen any mention of that in the documentation. So I had to admit that I did not know and would have to try that.

The Test

mysql> CREATE TABLE gentest (a INT, b INT AS (a + 1) STORED, INDEX(b));
mysql> INSERT INTO gentest VALUES (1),(2),(3),(4);
So now we have a table with data to test. So lets try to modify the value of one of the generated columns.

mysql> UPDATE gentest SET b = 9 WHERE a = 1;

And what did the server do?

It returned an error and told me The value specified for generated column 'b' in table 'gentest' is not allowed.

Lesson Learned

So now I know that the server will not allow you to mangle, er, change the value of generated columns away from their definition. And yes, I find the same thing with both the VIRTUAL and STORED versions of generated columns.