
Wednesday, March 24, 2021

From Spreadsheet to Database with MySQL Workbench

In the last post I covered some of the many reasons to migrate data from a spreadsheet to a database, and now it is time to show how to do just that.  Moving data from a spreadsheet into MySQL can actually be pretty simple in many cases. I will not mislead you by telling you that all cases are easy, but there is an easy way to make this migration, and I highly recommend the following process when possible as it can save lots of time. More painful migrations will be covered in the future, but this post is my favorite 'fast and easy' process.

This is a guide to taking data from a spreadsheet and easily moving it into a MySQL database instance. The trick is the Table Data Import Wizard in MySQL Workbench, which does a lot of the detail work for you. In a future post we will go into what you have to do when you are not using Workbench and have many more steps to take to get the job done.

You will find the data for this example at https://catalog.data.gov/dataset/texas-dot-txdot-work-zone-data. This dataset provides lane closure occurrences within the Texas Department of Transportation (TxDOT) highway system in a tabular format.  By the way, Texans love to argue about who has the best BBQ and freeway construction. A continuously updated archive of the TxDOT WZDx feed data can be found at the ITS WorkZone Raw Data Sandbox and the ITS WorkZone Semi-Processed Data Sandbox. The live feed is currently compliant with the Work Zone Data Exchange (WZDx) Specification version 2.0.  But most importantly for this blog, it is a good example dataset.  So please download it and import it into the spreadsheet of your choice.

1. Export Data

There are many ways to convert the data from a spreadsheet to something MySQL can consume.  The easiest is to save it as CSV, or Comma Separated Values, where every column of data has a comma between it and the next column.  In Sheets you will use File->Download->CSV, in Excel it is File->Save As->CSV, and other spreadsheet software will have something similar.  Please note that for most programs this exports the current sheet and not the entire spreadsheet. 

road_event_feed_info_feed_update_date,road_event_feed_info_version,road_event_id,subidentifier,road_name,road_number,direction,beginning_cross_street,ending_cross_street,beginning_milepost,ending_milepost,beginning_accuracy,ending_accuracy,start_date,end_date,start_date_accuracy,end_date_accuracy,event_status,total_num_lanes,vehicle_impact,workers_present,reduced_speed_limit,restrictions,description,issuing_organization,creation_date,update_date,types_of_work,lanes,geometry_linestring,geometry_multipoint

03/17/2021 04:58:33 PM,2,ELP_2019-08-01_00014+2021-03-17,ELP,SH-20,,eastbound,US-54,,,,Estimated,Estimated,03/17/2021 12:00:00 AM,03/17/2021 11:59:59 PM,Estimated,Estimated,,3,some-lanes-closed,,,[],2019 - 08/01 On-Going Closure - Right lane closure,TxDOT,,,[],"[{""lane_edge_reference"": ""left"", ""lane_number"": 1, ""lane_status"": ""open"", ""lane_type"": ""left-lane""}, {""lane_edge_reference"": ""left"", ""lane_number"": 2, ""lane_status"": ""open"", ""lane_type"": ""middle-lane""}, {""lane_edge_reference"": ""left"", ""lane_number"": 3, ""lane_status"": ""closed"", ""lane_type"": ""right-lane""}]",,MULTIPOINT ((-106.445503 31.773454))

The data above shows only the first two rows of the file.  The first row has the column headings and the second row is sample data.  Workbench can read the first row and use those names to create the column names for the table in the database.  It will read the other lines in the file to make an educated guess at the data types for the columns of the table.

2. Import Data

Start MySQL Workbench and log in to your instance. 

Open a new SQL tab by clicking on the icon indicated below.

Issue a CREATE DATABASE texdot; command in that new SQL tab. Do not forget the semicolon at the end of the line; it indicates the end of the Structured Query Language (SQL) command.
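
For reference, the statement is a single line of SQL; the optional USE line makes the new schema the default, which lets you skip some of the clicking described below:

CREATE DATABASE texdot;
USE texdot;  -- optional: make texdot the default schema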

Next execute that create database command by hitting the second lightning bolt icon, which executes the SQL statement under the cursor.

You will see that the command has executed with the 1 row(s) affected message. 

Now we have to tell Workbench that we want to use that new database we just created. No, that does not happen automatically.  You have to select the little 'eyeball' icon on the schema list to reload the list of available schemas (schemas and databases are used interchangeably in the MySQL world).

Now scroll down in the Schemas window until you see your new database.  Click on 'texdot' to tell Workbench that this is the database we want to use. When you click on it, the active schema name appears underneath in green.

Then select the right pointing arrowhead to expand the information on the schema.

Right click on the Tables line and you will see the Table Data Import Wizard option.  

Now comes the neat stuff.

Enter the name of the file where the CSV data resides.

In my case it was in D:\tex\ and I was able to browse for it.

We want to create a new table in the texdot schema.

And I was able to enter the name zone for the new table.

The wizard starts to work its magic.  It takes the headings in the first row of the CSV file and converts them to column names. It then looks at the rest of the data file to determine each column's data type.

You will note that you can change them if you want, and it will automatically make a best guess for types like JSON. Select Next when you are ready to proceed.

You are now ready to import.  Select Show Logs from the lower left corner and then select Next.

The data is now uploaded.

The time to import is partially a function of the CSV file length but eventually you will get the Import data file finished message. Again select Next.

There are some statistics on the performance of the import that may be of use. In this case we imported just over 1,600 records. Select Finished.

Hit that eyeball icon again to reload the table list.  

And we can now use SQL to query the data. Enter SELECT * FROM texas_dzone LIMIT 5; and again select that second lightning bolt icon.  (Please note that in an earlier run I used the name 'zone' instead of 'texas_dzone'.)

And the results of that query.

In just a few moments the data went from being in a spreadsheet to being in a database.  No muss, no fuss.  In the future we will look at datasets that take a little more 'engineering' to get into the database, as there are cases where we, and not the wizard, have to do the 'heavy lifting'.  

More on MySQL Workbench

For more on MySQL Workbench please see https://www.mysql.com/products/workbench/ for the download and https://dev.mysql.com/doc/workbench/en/ for the documentation.

Friday, July 20, 2018

De-Normalization of Your Database with JSON

One of the humbling things about working at Oracle with the various MySQL personnel is that you are often blown away by something one of them says or does.  And that is on a regular basis.  In this case it is Dr. Charles Bell who gave a great series of presentations last June at the Southeast Linuxfest.

In particular he presented, in a fully formed state, some ideas that had been rattling around in my skull (but nowhere near as coherent) on how to take advantage of the MySQL JSON data type.  Below are his points from his slide deck.  I was reviewing my notes from his presentation when I realized that this information really needs to be more widely disseminated.  And I would like your feedback on these ideas.

1. We can use a JSON field to eliminate one of the issues of traditional database solutions: many-to-many joins
  • This allows more freedom to store unstructured data (data with pieces missing)
  • You still use SQL to work with the data via a database connector but the JSON documents in the table can be manipulated directly in code.
  • Joins can be expensive. Reducing how many places you need to join data can help speed up your queries.  Removing joins may result in some level of denormalization but can result in fast access to the data.
2. Plan For Mutability
  • Schemaless designs are focused on mutability. Build your applications with the ability to modify the document as needed (and within reason)
3. Remove Many-to-Many Relationships

  • Use embedded arrays and lists to store relationships among documents.  This can be as simple as embedding the data in the document or embedding an array of document ids in the document.
  • In the first case the data is available as soon as you read the document, and in the second it takes only one additional step to retrieve the data. In cases of seldom read (used) relationships, having the data linked with an array of ids can be more efficient (less data to read on the first pass). A sketch of the embedded-array approach follows this list.
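
Here is a minimal sketch of point 3 in MySQL 5.7 SQL. The table and column names are mine for illustration, not Dr. Bell's:

CREATE TABLE book (
  book_id INT PRIMARY KEY,
  doc     JSON   -- the document, including an embedded array of author ids
);

INSERT INTO book VALUES
  (1, '{"title": "MySQL Basics",  "author_ids": [10, 20]}'),
  (2, '{"title": "JSON at Scale", "author_ids": [20]}');

-- Every book by author 20, with no junction table and no join:
SELECT book_id, JSON_UNQUOTE(JSON_EXTRACT(doc, '$.title')) AS title
FROM   book
WHERE  JSON_CONTAINS(JSON_EXTRACT(doc, '$.author_ids'), '20');
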
This presentation and others from the Southeast Linuxfest will be available online and I will be sure to post about that when it happens.

And a big thank you to Dr. Chuck for these ideas.

Wednesday, April 4, 2018

Array Ranges in MySQL JSON

Pretend you have a JSON array of data that looks roughly like the following.
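
(If you want to follow along, you will need a table with a JSON column; the post assumes one already exists. Something minimal works:)

mysql> create table x (y JSON);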

mysql> insert into x(y) values('["a","b","c","d"]');
Query OK, 1 row affected (0.10 sec)


You could get all the values from that array using $[*]

mysql> select y->"$[*]" from x;
+----------------------+
| y->"$[*]"            |
+----------------------+
| ["a", "b", "c", "d"] |
+----------------------+
1 row in set (0.00 sec)
Or the individual members of the array with an index that starts with zero.

mysql> select y->"$[0]" from x;
+-----------+
| y->"$[0]" |
+-----------+
| "a"       |
+-----------+
1 row in set (0.00 sec)

But what about the times you want the last item in the array and really do not want to loop through all the items? How about using 'last'?

mysql> select y->"$[last]" as 'last' from x;
+------+
| last |
+------+
| "d"  |
+------+
1 row in set (0.00 sec)

Holey Moley! What is this? Well, Roland Bouman, who was my predecessor on the MySQL Certification Team many years ago, still contributes to MySQL on a regular basis. He put in a feature request for a JSON range operator.
So please thank Roland when you see him!!
Besides 'last' there is 'to' too!
mysql> select y->'$[last - 3 to last - 1]' as 'last three' from x;
+-----------------+
| last three      |
+-----------------+
| ["a", "b", "c"] |
+-----------------+
1 row in set (0.01 sec)
You can also use the ordinal number, here $[1], to strip off the first value of the array from the rest.
mysql> select y->'$[1 to last]' as 'strip first' from x;
+-----------------+
| strip first     |
+-----------------+
| ["b", "c", "d"] |
+-----------------+
1 row in set (0.00 sec)

This makes juggling array data in JSON columns much easier.

Tuesday, January 2, 2018

Two New MySQL Books!

There are two new MySQL books, both from Apress. One is an in-depth master course on the subject and the other is a quick introduction.


Pro MySQL NDB Cluster is subtitled Master the MySQL Cluster Lifecycle and at nearly 700 pages it is a vital resource for anyone who runs or is thinking about running NDB Cluster. The authors, Jesper Wisborg Krogh and Mikiya Okuno, have distilled their vast knowledge of this difficult subject into a detail-packed but easily readable book.  MySQL Cluster is much more complex in many areas than a regular MySQL server, and here you will find all those details. If you run MySQL NDB Cluster then you need this book. The partitioning information in chapter 2 is worth the price of the book alone.  I am only a third of the way through the book and have found it as clear and concise as any technical book I have read, and it is actually an easy read. 

MariaDB and MySQL Common Table Expressions and Window Functions Revealed by Daniel Bartholomew is a slender introduction to CTEs and window functions.  If you were raised on MySQL and do not know either subject well, then I highly recommend this book.  CTEs are going to have a big impact on the way developers write subqueries and may cause self-joins to become extinct.  Window functions will bring a new wave of analytical capabilities to MySQL. This book is just over 100 pages and has useful examples for novices in either area. 
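
For a quick taste of what the two features look like together, here is a sketch of my own against a hypothetical sales table (it is not an example from the book):

WITH monthly AS (
  SELECT MONTH(sold_on) AS mon, SUM(amount) AS total   -- the CTE
  FROM   sales
  GROUP  BY MONTH(sold_on)
)
SELECT mon, total,
       SUM(total) OVER (ORDER BY mon) AS running_total -- the window function
FROM   monthly;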

More books! Well yes, there are more MySQL books in the works so save your pocket change in order to buy them when they appear.

Thursday, August 10, 2017

Handy JSON to MySQL Loading Script

JSON in Flat File to MySQL Database

So how do you load a JSON data file into MySQL? Recently I had this question presented to me and I thought I would share a handy script I use to do such work. For this example I will use the US Zip (postal) codes from JSONAR. Download and unzip the file. The data file is named zips.json and it cannot be read directly into MySQL using the SOURCE command. It needs to have the information wrapped in a more palatable fashion.

head zips.json 
{ "city" : "AGAWAM", "loc" : [ -72.622739, 42.070206 ], "pop" : 15338, "state" : "MA", "_id" : "01001" }
{ "city" : "CUSHMAN", "loc" : [ -72.51564999999999, 42.377017 ], "pop" : 36963, "state" : "MA", "_id" : "01002" }
{ "city" : "BARRE", "loc" : [ -72.10835400000001, 42.409698 ], "pop" : 4546, "state" : "MA", "_id" : "01005" }
{ "city" : "BELCHERTOWN", "loc" : [ -72.41095300000001, 42.275103 ], "pop" : 10579, "state" : "MA", "_id" : "01007" }
{ "city" : "BLANDFORD", "loc" : [ -72.936114, 42.182949 ], "pop" : 1240, "state" : "MA", "_id" : "01008" }
{ "city" : "BRIMFIELD", "loc" : [ -72.188455, 42.116543 ], "pop" : 3706, "state" : "MA", "_id" : "01010" }
{ "city" : "CHESTER", "loc" : [ -72.988761, 42.279421 ], "pop" : 1688, "state" : "MA", "_id" : "01011" }
{ "city" : "CHESTERFIELD", "loc" : [ -72.833309, 42.38167 ], "pop" : 177, "state" : "MA", "_id" : "01012" }
{ "city" : "CHICOPEE", "loc" : [ -72.607962, 42.162046 ], "pop" : 23396, "state" : "MA", "_id" : "01013" }
{ "city" : "CHICOPEE", "loc" : [ -72.576142, 42.176443 ], "pop" : 31495, "state" : "MA", "_id" : "01020" }

Follow the Document Store Example

The MySQL Document Store is designed for storing JSON data, and this example will follow its practices by having a two-column table: a JSON column and another column for a primary key. (Remember that InnoDB wants so badly to have a primary key on each table that it will create one for you, but it is better practice to make it yourself; besides, we want to search on the zip code, which is labeled as _id in the data.) So we use a stored generated column that uses JSON_UNQUOTE(JSON_EXTRACT(doc,"$._id")) and saves that value in a column named zip.
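
Reconstructed from the description below, the CREATE TABLE statement looks roughly like this:

CREATE TABLE zipcode (
  doc JSON,
  zip CHAR(5) GENERATED ALWAYS AS (JSON_UNQUOTE(JSON_EXTRACT(doc, '$._id'))) STORED NOT NULL,
  PRIMARY KEY (zip)
);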

So a simple table is created and it looks like this:

mysql> desc zipcode\g
+-------------+-------------+------+-----+---------+-------------------+
| Field       | Type        | Null | Key | Default | Extra             |
+-------------+-------------+------+-----+---------+-------------------+
| doc         | json        | YES  |     | NULL    |                   |
| zip         | char(5)     | NO   | PRI | NULL    | STORED GENERATED  |
+-------------+-------------+------+-----+---------+-------------------+
2 rows in set (0.00 sec)

Handy Script

So now we have the data and we have the table; next we need to convert the data into something MySQL can use to load it.

Bash is one of those shells with so many rich built-in tools that it is hard to remember them all. But it does have a handy read line feature that can be used for the task.


#!/bin/bash
# Wrap each line of the JSON file in an INSERT statement.
# The -r flag keeps read from mangling any backslashes in the data.
file="/home/dstokes/Downloads/zips.json"
while IFS= read -r line
do
 echo "INSERT INTO zipcode (doc) VALUES ('$line');"
done <"$file"
Run the script and output the data to a file named foo, ./loader.sh > foo. The output shows how the data is wrapped:
$head foo
INSERT INTO zipcode (doc) VALUES ('{ "city" : "AGAWAM", "loc" : [ -72.622739, 42.070206 ], "pop" : 15338, "state" : "MA", "_id" : "01001" }');
INSERT INTO zipcode (doc) VALUES ('{ "city" : "CUSHMAN", "loc" : [ -72.51564999999999, 42.377017 ], "pop" : 36963, "state" : "MA", "_id" : "01002" }');
INSERT INTO zipcode (doc) VALUES ('{ "city" : "BARRE", "loc" : [ -72.10835400000001, 42.409698 ], "pop" : 4546, "state" : "MA", "_id" : "01005" }');
INSERT INTO zipcode (doc) VALUES ('{ "city" : "BELCHERTOWN", "loc" : [ -72.41095300000001, 42.275103 ], "pop" : 10579, "state" : "MA", "_id" : "01007" }');
INSERT INTO zipcode (doc) VALUES ('{ "city" : "BLANDFORD", "loc" : [ -72.936114, 42.182949 ], "pop" : 1240, "state" : "MA", "_id" : "01008" }');
INSERT INTO zipcode (doc) VALUES ('{ "city" : "BRIMFIELD", "loc" : [ -72.188455, 42.116543 ], "pop" : 3706, "state" : "MA", "_id" : "01010" }');
INSERT INTO zipcode (doc) VALUES ('{ "city" : "CHESTER", "loc" : [ -72.988761, 42.279421 ], "pop" : 1688, "state" : "MA", "_id" : "01011" }');
INSERT INTO zipcode (doc) VALUES ('{ "city" : "CHESTERFIELD", "loc" : [ -72.833309, 42.38167 ], "pop" : 177, "state" : "MA", "_id" : "01012" }');
INSERT INTO zipcode (doc) VALUES ('{ "city" : "CHICOPEE", "loc" : [ -72.607962, 42.162046 ], "pop" : 23396, "state" : "MA", "_id" : "01013" }');
INSERT INTO zipcode (doc) VALUES ('{ "city" : "CHICOPEE", "loc" : [ -72.576142, 42.176443 ], "pop" : 31495, "state" : "MA", "_id" : "01020" }');

So now the data can be loaded with mysql -u itisme test < foo.

Tuesday, July 18, 2017

Using find() with the MySQL Document Store

The Video

The find() function for the MySQL Document Store is a very powerful tool and I have just finished a handy introductory video. By the way, please let me have feedback on the pace, the background music, the CGI special effects (kidding!), and the amount of content.

The Script

For those who want to follow along with the video, the core examples are below. The first step is to connect to a MySQL server to talk to the world_x schema (instructions on loading that schema are at the first link above).

\connect root@localhost/world_x

db is an object that points to the world_x schema. To find the records in the countryinfo collection, use db.countryinfo.find(). But that returns 237 JSON documents, too many! So let's cut it down to one record by qualifying that we only want the record where the _id is equal to USA: db.countryinfo.find('_id = "USA"')

Deeper Level items in the Document

You can reach deeper-level items by providing the path to the object, such as db.countryinfo.find('geography.Continent = "North America"'). Be sure to remember Case Sensitivity!!

Limits, Offsets, and Specifications

It is very simple to place restrictions on the amount of output by appending .limit(5).skip(6). And you can specify which records meet your specification by using find('GNP > 8000000').

Limiting Fields

But what if you do not want the whole document, just a few certain fields? Then append .fields(["Name", "GNP"]) to find().

And once again you can dig deeper into the document by specifying the path, such as .fields(["Name", "GNP", "geography.Continent"]).sort("GNP").

Sorting

Yes, you can easily sort the output from find() by adding .sort("GNP", "Name") at the end.

More Complex Searches

Of course you can make the find() function perform more complex dives into the data, such as db.countryinfo.find('GNP > 500000 and IndepYear > 1500').

Parameter Passed Values

And fans of parameter-passed values will be happy to find they have not been forgotten: db.countryinfo.find("Name = :country").bind("country", "Canada")
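
Putting several of these pieces together, here is a sketch using the same world_x data (in the interactive shell the chained expression executes on its own):

db.countryinfo.find('geography.Continent = :cont')
  .fields(['Name', 'GNP'])
  .sort('GNP desc')
  .limit(5)
  .bind('cont', 'North America')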

Sunday, July 16, 2017

MySQL Document Store Video Series

I am starting a series of videos on the MySQL Document Store. The Document Store allows those who do not know Structured Query Language (SQL) to use a database without having to know the basics of relational databases, set theory, or data normalization. The goal is to have short 2-3 minute episodes on the various facets of the Document Store, including the basics, using various programming languages (Node.JS, PHP, Python), and materializing free-form schemaless NoSQL data into columns for use with SQL.

The first Episode, Introduction, can be found here.

Please provide feedback and let me know if there are subjects you would want covered in the near future.

Thursday, August 4, 2016

MySQL Document Store -- The NoSQL Zipcodes

The MySQL Document Store functionality allows developers to use a relational database with or without SQL (Structured Query Language), also known as NoSQL. The example in this blog is hopefully a simple look at this new feature of MySQL. The example data used is from JSONStudio.com and is a JSON formatted data set for US zip (postal) codes (656K compressed). So download your copy of this data set and let's get to work.

Create a collection

Collections are tables, and below we create a collection named 'zip' in the test database using the Python dialect.

mysqlsh -u root -p --py test
Creating an X Session to root@localhost:33060/test
Enter password:
Default schema `test` accessible through db.

Welcome to MySQL Shell 1.0.4 Development Preview

Copyright (c) 2016, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type '\help', '\h' or '\?' for help.

Currently in Python mode. Use \sql to switch to SQL mode and execute queries.
mysql-py> db.createCollection("zip")


Is it there?

As soon as most of us create a table we want to see if it is there.

mysql-py> db.getCollections();
[
    <Collection:zip>
]
mysql-py> 
So it is there. But what is the underlying structure of this table? Switch to the SQL dialect (or open a mysql client).

mysql> SHOW CREATE TABLE zip;
+-------+--------------------------------+
| Table | Create Table                                                                                                                                                                                                       |
+-------+--------------------------------+
| zip   | CREATE TABLE `zip` (
  `doc` json DEFAULT NULL,
  `_id` varchar(32) GENERATED ALWAYS AS (json_unquote(json_extract(`doc`,'$._id'))) STORED NOT NULL,
  PRIMARY KEY (`_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 |
+-------+--------------------------------+
1 row in set (0.00 sec)

mysql> 
If you peeked at the zip code file you downloaded, you may have noticed that it has an _id field already. But what if your data set has no _id, or you want to use another key/value pair from the data as an index? Simply use a stored generated column on the field of your choice. Remember that good indexing practices still count, as the underlying relational database still has to keep the infrastructure up to date.
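
For instance, to search this collection by state you could materialize and index that field yourself. A sketch, with the column name being my choice:

ALTER TABLE zip
  ADD COLUMN state CHAR(2)
      GENERATED ALWAYS AS (JSON_UNQUOTE(JSON_EXTRACT(doc, '$.state'))) STORED,
  ADD INDEX idx_state (state);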

Loading data

I will skip over the loading of the zip code data (I can address that in a later blog post if there is any interest). For now let's take it as a given that the data has been moved into the new collection.

Finding a Rainbow

So let's look for a particular zip code. For our data set the zip code corresponds to the _id field. And remember that this column is a generated column using that field from the JSON document.

mysql-py> db.zip.find("_id = '76077'")
[
    {
        "_id": "76077",
        "city": "RAINBOW",
        "loc": [
            -97.70652,
            32.281216
        ],
        "pop": 722,
        "state": "TX"
    }
]
1 document in set (0.00 sec)

mysql-py> 

How About Searching Non-indexed JSON Data?

Let's look for the state of Texas, or TX, in the JSON data. Previously we had the _id field as a materialized column extracted from the JSON data. Now we are asking the MySQL server to read all the records and return the ones meeting the criteria. This does perform a full table scan of the data (not as efficient as an index) but, thanks to the relatively small number of records, it does return fairly quickly.

mysql-py> db.zip.find("state = 'TX'")
.
.  (output omitted)
.
{
        "_id": "79935",
        "city": "EL PASO",
        "loc": [
            -106.330258,
            31.771847
        ],
        "pop": 20465,
        "state": "TX"
    },
    {
        "_id": "79936",
        "city": "EL PASO",
        "loc": [
            -106.30159,
            31.767655
        ],
        "pop": 52031,
        "state": "TX"
    }
]
1676 documents in set (0.06 sec)

mysql-py> 

Wrap Up

So now we can create a collection and search it. But what happens when we add records and especially records without our index-able key? That will be covered in another blog soon.

Thursday, July 28, 2016

Is the new MySQL Document Store and JSON data type cheating?

Is it cheating? Is using MySQL without Structured Query Language (SQL), or putting all your data into one column, proper? Impossible a year ago and probably thought of as a poor or crazy practice until recently, this is a new type of MySQL usage. NoSQL has had a big impact in the SQL world, with several relational products from vendors like MySQL, Microsoft, PostgreSQL and others offering NoSQL interfaces and JSON data types in their databases.

Several old timers have come to me asking if putting lots of data in a JSON column is cheating. After all, data normalization is part of relational databases, and the way to efficiency and speed is well organized data. This higgledy-piggledy fashion of putting an entire document in a column without breaking it down into its component sections does violate the first rule of data normalization. And that has worked for decades pretty well.

But things change.

Many new development projects start with no idea what their data is going to look like. Maybe, if they are lucky, they will pick an API for interchanging data. Some will decide to use a database as a persistent message queue. Sometimes the service is the product, and the products offered by that service are nebulous at best at the onset, so the developers do not want to be locked into a schema.

And schema changes get expensive quickly. Code written before the change needs to be updated and you have to get the DBA to run an ALTER TABLE that can take too long to complete. And rolling back is almost impossible.

Compounding this is the lack of database skills in new developers. Very few have any training in Structured Query Language, relational theory, sets, data normalization, or other skills that have been held in high esteem since the days of Codd. And these new developers argue that they do not need these skills as technology and approaches to coding have evolved. So if the foundation of relational databases is the weak link then remove it.

The JSON data type has been very popular with developers. In the past you could dump JSON documents into a CHAR column and use REGEX or have your application dig out the parts you needed from within the document. Now with MySQL 5.7 there is a native JSON data type with supporting functions so that developers can manipulate document data easily. Storing data in a JSON format in a column of a table has become a useful part of a developer's life.

Ruby on Rails was the first piece of software that allowed many developers to see the power of CRUD (Create Read Update Delete) access to a database without needing to know SQL. The MySQL Document Store features that arrived with 5.7.13 also provide CRUD, so developers can use 'collections' for schema-less data storage. The developers do not need to know SQL, relational theory, or any of the other old stuff. The good news for the old timers is that the data may be stored in a schema-less fashion, but it is stored in a good ol' MySQL table. Which means the decades spent gathering SQL skills still work on this new data.

But is it cheating? Yes and no. Yes, in that you would get better performance out of having all your data in fourth or fifth normal form, just like a good driver can get better performance out of good driving practices with a manual transmission. No, in that the data is still in a relational database and pertinent information can still be pried out of the JSON data. Heck, you can even make materialized columns from the JSON data to create indexes.

Wednesday, June 29, 2016

MySQL Password Security Changes for PHP Developers

MySQL 5.7 introduced many new facets of password security. The first thing most notice is that you are assigned a random root password at installation time. You then have to search the log file for this random password, use it to log in, and then change it. For the examples in this post I am using a fresh install of 5.7.13 on Oracle Linux 7.1 and was provided with the easy to remember password of nLvQRk7wq-NY, which to me looked like I forgot to hit escape when trying to get out of vim. A quick ALTER USER to change the password and you are on your way.

Defaults
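
The listing of the defaults that this post refers to can be reproduced with the following (output omitted here):

mysql> SHOW VARIABLES LIKE 'validate_password%';
mysql> SHOW VARIABLES LIKE 'default_password_lifetime';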

Password Lifetime and Complexity

5.7.13 now has the default password lifetime set to 0, or 'never expire'. My fresh install shows that the value of mysql.user.password_lifetime is set to NULL, which means use the server default value. The lifetime is measured in days and the last change is stored in the password_last_changed column of the mysql.user table. If the password is expired, you are put into sandbox mode where the only command you can execute is to change the password. That works great for interactive users. But what about your application? It uses a username/password pair to talk to the database, but it is very unlikely that anyone planned on changing passwords upon expiration. I seriously doubt anyone has set up an exception routine to handle an expired password properly. And if so, how do you notify all involved about this new password -- securely?

What to do

The best thing would be to set the password lifetime for accounts used by applications to zero. It simply does not expire. QED & out.
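
For a hypothetical application account named myapp, that is:

ALTER USER 'myapp'@'localhost' PASSWORD EXPIRE NEVER;

Or set default_password_lifetime=0 in my.cnf to make 'never expire' the server-wide default.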

But what if your company wants ALL passwords changed on a regular basis? And they do mean ALL. Earlier there was a listing of the defaults. The test system is set to a password length of eight characters minimum; requires mixed case, with at least one upper case letter and one special (nonalphanumeric) character; and is of MEDIUM complexity.

MEDIUM complexity means that passwords need one numeric, one lower case, one upper case, and one special character. LOW tests the password length only. And STRONG adds a condition that substrings of four characters or longer do not match entries in a specified password file (used to make sure swear words, common names, etcetera are not part of a password).

Let's create a dummy account.

CREATE USER 'foobar'@'Localhost' IDENTIFIED BY 'Foo@Localhost1' PASSWORD EXPIRE;

Checking the entry in the user table, you will find that the account's password is expired. For extra credit, notice what the authentication string is set to. We can't have just a password string, as some authentication tokens or hashes are not really passwords.

So log in as foobar and you will get a notice that the password must be reset before we can do anything else.

ALTER USER 'foobar'@'localhost' IDENTIFIED BY '1NewP@assword';

Corporate Standard

Your corporate rules may require you to rotate passwords every N days and set a corresponding complexity. With MySQL 5.7 you can follow that model. If you do not have a standard and want to create one, be sure to DOCUMENT well what your standard is and make sure that standard is well known.
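
For example, a 90-day rotation rule (the number here is purely illustrative) maps to:

SET GLOBAL default_password_lifetime = 90;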

There are ways to use packages like PAM or LDAP for authentication but that is for another day.