Thursday, September 27, 2018

MySQL Track at Southern California Linux Expo 2019 CFP Open

The Call For Papers  for the next Southern California Linux Expo aka SCaLE is open and I need your help with the MySQL Track.  We have had a MySQL track for the past few years and this year I have gotten permission from the organizers of SCaLE to get a group of MySQL community members to review the talks for this track.  This year was pretty good but to make 2019 even better we need more submissions from more people AND THIS MEANS YOU!!

The link above has the details on how to register to submit a talk submission and the process is fairly simple.  But if you would like help with your submission, want to 'rubber duck' ideas, or want a quick review before you submission please contact me  (@stoker, david.stokes @ Oracle.com) or find me at a show.

So what type of talks do we need?  We need to cover material for newbies, intermediate, and the experienced.  There are lots of developers, Devops folks, hobbyists, and students just waiting for you to share your knowledge with them. For a large show, we have a fairly intimate talk space which is friendly to first time presenters.

Some ideas:

1. So how do you do backups and restores?  What sort of things get deleted and what steps do you take to keep your data safe? Policies? Procedures?  How does a developer who fudged a row/table get their data back in your organization?

2. How do you reduce SQL injects, the N+1 problem and 0(N) searches in your code?

3. How is your Kubernetes/Docker/Ansible environment for your developers managed?

4. Five things about being an 'accidental' MySQL DBA I wish I knew when I took over the databases

5. MySQL Replication best practices.

6. Data normalization how-to

7. Query optimization for novices. Query optimization for the advanced SQL coder.

And anything else  you think that at least one other person would like to hear about (and there are more - believe me).

So please make SCaLE 17x even better! You have until the end of October to submit.

Thursday, September 6, 2018

JSON Paths and the MySQL JSON Functions

I wrote MySQL and JSON: A Practical Programming Guide to help developers find their way around the MySQL JSON data type and the supporting functions. The MySQL Documentation on the subject is very good but I had to puzzle through the examples to see how things worked.  I might be a bit 'thick' but good examples always make things easier.  Others seem to have similar difficulties.
MySQL and JSON a Practical Programming Guide should be on your desk as a handy reference to MySQL's JSON data type.

 There was a recent post on Stackoverflow.com where someone had this JSON document:

{  
   "textures":[  
      {  
         "label":"test",
         "types":{  
            "t_1":0,
            "t_2":0
         }
      },
      {  
         "label":"KEK",
         "types":{  
            "t_1":0,
            "t_2":0
         }
      }
   ],
   "weapons":[  
      {  
         "name":"WW_SHT",
         "ammo":0
      },
      {  
         "name":"WW_DSS",
         "ammo":0
      }
   ]
}

And they wanted want to update t_1 to change value from 0 to 1.  I will not repost their code but to my eyes it looked convoluted. 

So How Do You Get There From Here?

Trying to figure out how to get down to a key or value is easy.  To see the top level keys simply use JSON_KEYS()

SELECT JSON_KEYS(doc) FROM zz1 LIMIT 1;

'[\"weapons\", \"textures\"]'

But how to get deeper??

By using select doc->>"$.textures[*]" from zz1 limit 1; we get all the info under the textures key.

[{"label": "test", "types": {"t_1": 0, "t_2": 0}}, {"label": "KEK", "types": {"t_1": 0, "t_2": 0}}]

Okay so we are getting closer to the target!    Now to take one more step closer with select doc->"$.textures[*].types" from zz1 limit 1;

[{"t_1": 0, "t_2": 0}, {"t_1": 0, "t_2": 0}]

I like to use JSON_PRETTY to get a enhanced view:

select json_pretty(doc>"$.textures[*].types")  
from zz1 limit 1;
 [
  {
    "t_1": 0,
    "t_2": 0
  },
  {
    "t_1": 0,
    "t_2": 0
  }


But there are two t_1s!

The next step is to get just those t_1 values and that is done with select 
doc->"$.textures[*].types.t_1" from zz1;

Which gives us:

 [0, 0]

Not really confidence inspiring ,eh? So lets change one of those zeros to a nine.

update zz1 set doc = json_set(doc,"$.textures[0].types.t_1",9);
 
So did we change the first or the second t_1??

select json_pretty(doc>"$.textures[*].types")  
from zz1 ;
 [
  {
    "t_1": 9,
    "t_2": 0
  },
  {
    "t_1": 0,
    "t_2": 0
  }

But lets double check and change the second t_1 also. 

update zz1 set doc = json_set(doc,"$.textures[1].types.t_1",7) ;

Hopefully that second one will end up with a value of seven.

select 
json_pretty(doc->"$.textures[*].types") from zz1;
 [
  {
    "t_1": 9,
    "t_2": 0
  },
  {
    "t_1": 7,
    "t_2": 0
  }

So now we can get to the exact values we want.

Annotated JSON Document

So lets look at the section of the JSON document and annotate in red the paths in the document.

"textures":[  
      {  -- textures[0]
         "label":"test",  
         "types":{  -- textures[0].types
            "t_1":0,--textures[0].types.t_1
            "t_2":0
         }
      },
      {  -- textures[1]
         "label":"KEK",
         "types":{  
            "t_1":0,--textures[1].types.t_1
            "t_2":0
         }
      }
   ]
Hopefully this will ease someone's confusion down the line.  And please do buy my book.

Wednesday, September 5, 2018

Loading Unknown CSV Data into a Table Easily

Peter Zaitsev of Percona posted a Tweet about an interactive book Select Star SQL which 'aims to be the best place on the internet for learning SQL'.  This interactive book uses a data set that can be downloaded or you could work the queries online.  I showed the website to a neighbor who was asking 'what is the best way to learn SQL'.   The neighbor wanted their own copy of the data and asked how do you turn a raw CSV file into a table.

Well, that can get complicated.

CSV


CSV files use commas to designate each field in the file.  This is a great idea until someone plants a comma in the middle of a field but it is sort of a least common denominator way of passing data between systems.

But a CSV file does not directly fit into a relational table.  You could shove all the data into a JSON column but that is not what we want.   What is needed is a peek at the data in the CSV file to see what the various fields are and possible hints at the types of data therein.  So download the data and please follow along.

Traditionally it took a bit of data sleuthing to examine the CSV file to determine how the data needed to be handled to fit into the relational table even before you fired up your database client. 

If you open the file with a spreadsheet program like Libre Calc they usually will make a best effort at putting each field in its own column.  Luckily this data set has the first row which names the columns. Are here are the column headings:

Execution
Date of Birth
Date of Offence
Highest Education Level
Last Name
First Name
TDCJ Number
Age at Execution
Date Received
Execution Date
Race
County
Eye Color
Weight
Height
Native County
Native State
Last Statement

Looking at the first few rows under the headings we can get a good glimpse at the data types for the columns.  What we want to do on the first pass is to see if values are strings (CHAR), Integers, Realm and the like.  It also helps to look for some field that can be used for a primary key, hopefully a column with unique values.

Execution  INT Primary key??  Unique??
Date of Birth DATE
Date of Offence DATE
Highest Education Level INT
Last Name CHAR
First Name CHAR
TDCJ Number INT
Age at Execution INT
Date Received   DATE
Execution Date DATE
Race CHAR
County CHAR
Eye Color CHAR
Weight INT
Height CHAR
Native County  CHAR
Native State CHAR
Last Statement  LONG TEXT

Second Pass


In this case it looks like we could use Execution as the primary key.  Now we need to go through the data to see what fields need special handling.  Are the possible INT fields all positive values (unsigned), could we use ENUMs anywhere (Maybe Native Country or Native State or Eye Color). 

Also, for metric fans, the Height column uses imperial measurements which may not fit a company policy of metric only (or cubits). Plus Height has two components -- Feet and Inches -- that we may need to break down later or we may want to record all this data in inches. We will not cover that in this blog post (but we could cover if I get enough folks asking for how to do this).

Also we need to examine each of the CHAR fields to check their maximum lengths.  This can be dangerous for future addendum where suddenly the maximum length set by our first pass at the data is swamped by some much longer fields.  If the only Last Names we see are Smith and Jones, setting the maximum length of the CHAR field works until we run into someone named Buchanan or Schwarzenegger.    But in this case we have a closed system (NO updates to the data) we will not worry about this, at least for his data

So making a best guess, lets assign lengths to the CHAR fields.

Execution  INT Primary key??  Unique??
Date of Birth DATE
Date of Offence DATE
Highest Education Level INT  unsigned
Last Name CHAR  25
First Name CHAR  25
TDCJ Number INT
Age at Execution INT
Date Received   DATE
Execution Date DATE
Race CHAR  10
County CHAR 20
Eye Color CHAR 10
Weight INT
Height CHAR 10
Native County  CHAR 25
Native State CHAR 25
Last Statement  LONG TEXT

Please note that I will be using VARCHAR instead of CHAR in the above and use CHAR just for clarity.

Create Our Table

We could of course create a table like that below and then load the data file.



MySQL Workbench makes quick work of creating the schema.  But Workbench will also has a wizard to open the CSV, let us name the columns, and import the data.

The Easy Way To Go From CSV to Relational Table

MySQL Workbench's import wizard is reached by clicking on a table icon and selecting Table Data Import Wizard.

Tell the import wizard where the CSV is located


Start the table import wizard

The CSV is split up by fields and we could edit field types 
And the data is loaded



Now I have a big warning in that column names are copied from the CSV table first row.  We could have changed Date of Birth to something like Birth_date but chose not to. This means we have to make queries like SELECT `Date of Birth` FROM tx_deathrow.

But now we have a local copy of the data to use when working through the interactive book.