Saturday, April 21, 2018

MySQL Document Store Document IDs.

Yesterday I was presenting on the MySQL Document Store and was asked if the _id fields created by the server as an InnoDB primary key is a UUID.  I knew that it was not a UUID but I had to hit the documentations (https://dev.mysql.com/doc/x-devapi-userguide/en/understanding-automatic-document-ids.html) to find out what the document ID really is -- a very interesting piece of information.

The Details

 If you are inserting a document lacking a _id key, the server generates a value. The _id is 32 bits of a unique prefix (4 bytes), a time stamp (8 bytes), and serial number (16 bytes). The prefix is assigned by the InnoDB Cluster to help ensure uniqueness across a cluster. The timestamp is the encoded startup time of the server.  The serial numbers uses the auto increment offset and auto increment increment server variables .  From the manual page:
This document ID format ensures that:
The primary key value monotonically increments for inserts originating from a single server instance, although the interval between values is not uniform within a table.
When using multi-primary Group Replication or InnoDB cluster, inserts to the same table from different instances do not have conflicting primary key values; assuming that the instances have the auto_increment_* system variables configured properly.

Properties

Once set, the _id can not be set to another value. Inserting your own value overrides the server assignment. And if you attempt to insert a document with the same _id you the server will generate a duplicate primary key error.

The _id values must be always increasing and sequential for optimal InnoDB performance. The server will keep track of theses numbers across restarts.

The generated _id values for each table/collection  are unique across instances to avoid primary key conflicts and minimize transaction certification in multi-primary Group Replication or InnoDB cluster environments.

Required?

So, you are adding a document to a collection and you get an ERROR: 5115!  That means in the following cast that the _id key/value pair is needed:

JS> db.foobar.add(
-> {
->  Data: "This is a test!"
-> }
-> )
->
ERROR: 5115: Document is missing a required field

JS> db.foobar.add( 
{  Data: "This is a test!" , 
-> _id: "first"
} 
)
Query OK, 1 item affected (0.0076 sec)

Wednesday, April 18, 2018

Storage Engine Vendor Meeting

Oracle will be hosting a MySQL Storage Engine Vendor Meeting next week. I have invited all the usual folks who participate but with the changes in MySQL 8, I may not know if you and or your company is contemplating a new storage engine or are planned to use MySQL 8. Last year we had about ten attendees discussing various topics and I would like to have you too there is you are interested. Please contact me by close of business this Friday if you are interested and will be in the Santa Clara area.

Wednesday, April 4, 2018

Array Ranges in MySQL JSON

Pretend you have a JSON array of data that looks roughly like the following.

mysql> insert into x(y) values('["a","b","c","d"]');
Query OK, 1 row affected (0.10 sec)


You could get all the values from that array using $[*]

mysql> select y->"$[*]" from x;
+----------------------+
| y->"$[*]"            |
+----------------------+
| ["a", "b", "c", "d"] |
+----------------------+
1 row in set (0.00 sec)
Or the individual members of the array with an index that starts with zero.

mysql> select y->"$[0]" from x;
+-----------+
| y->"$[0]" |
+-----------+
| "a"       |
+-----------+
1 row in set (0.00 sec)

But what about the times you want the last item in the array and really do not want to loop through all the items? How about using 'last'?

mysql> select y->"$[last]" as 'last' from x;
+------+
| last |
+------+
| "d"  |
+------+
1 row in set (0.00 sec)

Holey Moley! What is this? Well Roland Bouman, who was my predecessor on the MySQL Certification Team many years ago, still contributes to MySQL on a regular basis. He put in a (feature request for a JSON range operator.
So please thank Roland when you see him!!
Be sides 'last' there is 'to' too!.
mysql> select y->'$[last - 3 to last - 1]' as 'last three' from x;
+-----------------+
| last three      |
+-----------------+
| ["a", "b", "c"] |
+-----------------+
1 row in set (0.01 sec)
You can also use the ordinal number, here $[1], to strip off the first value of the array from the rest.
mysql> select y->'$[1 to last]' as 'strip first' from x;
+-----------------+
| strip first     |
+-----------------+
| ["b", "c", "d"] |
+-----------------+
1 row in set (0.00 sec)

This makes juggling array data in JSON columns much easier.

Friday, March 23, 2018

Caching SHA-2 (or 256) Pluggable Authentication for MySQL 8

If you are like me and you spend chilly spring evenings relaxing by the fire, reading the manual for the upcoming MySQL 8 release, you may have seen Caching SHA-2 Pluggable Authentication in section 6.5.1.3. 

There are now TWO SHA-256 plugsins for MySQL 8 for hashing user account passwords and no, I do not know what the title of the manual pages says SHA-2 when it is SHA-256.  We have sha256_password for basic SHA-256 authentication and  caching_sha2_password that adds caching for better performance.

The default plugin is caching_sha2_password has three features not found in its non caching brother. The first is, predictably, a cache for faster authentication for repeat customers to the database. Next is a RSA-based password exchange that is independent of the SSL library you executable is linked. And it supports Unix socket-files and shared-memory protocols -- so sorry named pipe fans.

If you have been testing the release candidate and use older clients or older libmysqlclient you may have seen Authentication plugin 'caching_sha2_password' is not supported or some other similar message. You need updated clients to work with the updated server.  Old clients used the old MySQL native password password not the new chaching_ha2_password as the default.

When upgrading from 5,7,21 to the 8 GA version, existing accounts are not upgraded,  But if you are starting with a fresh install you get the chaching_sha2_password in your mysql.user entry.   I am sure this will catch someone so please take care. And this goes for new replication servers too! 

Thursday, March 15, 2018

ChickTech Austin PopUp Workshop: Database Basics with MySQL

This Saturday I will be teaching Database Basics with MySQL and there are literally just two seats  left!  The MySQL Community Team is always looking for ways to reach new audiences and we would like to make this class available to other groups (So let me know if you are interested).

And six hours is a really short time so it only scratches the surface.  Maybe we also need some intermediate classes above and beyond the introduction.

Tuesday, March 13, 2018

Windowing Function Tutorials

Windowing Functions are new to MySQL with Version 8.  I have been very lucky in the past few days to find two excellent resources that I want to share with you.  Both are worth the time and effort to study.

At the Southern California Linux Expo last week, Bruce Momjian of EnterpriseDB and PostgreSQL fame gave an amazing tutorial on Common Table Expressions and Windowing Functions (slides for both at https://momjian.us/main/presentations/sql.html).  Bruce is an amazing presenter and I highly recommend going to his sessions at conferences. So, what can a MySQL-er learn from a Postrgrestian?

Plenty.

In this case the two databases are using SQL and the features that turn that language from descriptive to imperative. But showing how these features is very hard.  Bruce said it took him six years to finish the Windowing Function presentation.  And the results are impressive.

You will have to make some changes to the presented code as MySQL does not have sequences -- create a table named x, with a single int column named x, and fill it with the numbers one through 10 for the first windowing function examples.  The big item to study is the various partitioning terminology which can get confusing but is spelled out explicitly in this tutorial.

The next tutorial is from Alex Yeskov and can be found at https://blog.statsbot.co/sql-window-functions-tutorial-b5075b87d129 and is a great addendum to the other tutorial.  He stresses learning by doing and do he does!!

His examples include calculating revenue growth, running totals, dealing with duplicated data, finding the top N rows, and examining repeat customer purchases. 

Doing these two tutorials will not turn you into a windowing function expert.  But you will have a better understanding of how they work and that will make you a better SQL programmer

Sunday, March 4, 2018

MySQL without the SQL - Oh My!

I will be speaking about the MySQL Document Store at the San Diego PHP Meetup Wednesday March 7th.  This is a great group and it is always a pleasure to head back to my hometown.

The the 8th-11th is the Southern California Linux Expo where I will be speaking again on the MySQL Document Store as part of the MySQL track and will be manning booth 617/619 with plenty of swag.

And then on March 17th I will be teaching Database Basics with MySQL at Chick Tech Austin.

Friday, February 16, 2018

A Tale of Three Computer Conferences, Two Communities

Three conferences in three weeks! FOSDEM, SunshinePHP, and PHP UK are three excellent conferences that this year are back to back to back. 

FOSDEM is to the computer world what Renaissance Fairs are to those who have their own maces and armor.  FOSDEM is held on the campus of the Free University of Brussels, there is no registration -- pre, onsite, or post -- and they attempt to guess attendees by MAC addresses on devices that connect to network.  No tickets, no badges, and no reserved seats but FOSDEM is free to attend.  Rooms are requested by various groups including my MySQL Community Team partner LeFred for the MySQL ecosystem. The MySQL and Friends Devroom was packed from early morning to evening with engaging 30 minute presentations from a number of companies.  This show in the last few years has become one of the most important technical shows on the MySQL Community Team Schedule. LeFred and the presenters did a tremendous job of putting together amazing talks for the MySQL Community.

SunshinePHP is held in Miami and organized by the amazing Adam Culp.  He and his team have an amazing knack of pulling fantastic talks together into a great show.  Be advised that this is a show where you can go from airport to hotel for the conference and then return to the airport at the end without ever leaving the venue.  I spoke on MySQL 8 and received a lot of feedback that I used to update my presentation for the next show. 

And the next show is PHP UK.  The PHP Community is very strong, supportive, and radiant in new advancements in the PHP 7 series.  As with SunshinePHP, the PHP folks are warm, supportive, and invigorated.  The organizers of the London show have also assembled a talented group of presenters and I seem to be the only carryover from the previous show with my talk on MySQL 8.

A Comparison of the Communities


The MySQL and PHP Communities are both roughly the same age.  Both are now confident twenty years olds with plenty of recent self improvement.  PHP 7 is light years ahead in speed and capabilities from the four and five series.  MySQL is about to take a giant step with MySQL 8.  Both had version sixes that never quite made it into production but the subsequent engineering have produced much stronger products.  Both face competition from newer products but still dominate what is the modern implementation of the LAMP stack.  And the two products have strong communities working hard to improve the product.

The PHP Community is much better than its counterpart in aiding novices, mentoring, stressing the basics of good style in coding.  Many members have had to add JavaScript skills of one order or another in recent years but still try to keep PHP as their core tool.  And there are many more local PHP user groups than MySQL.

Next Up

I will be talking to the San Diego PHP User Group before heading to the Southern California Linux Expo. More on those shows later.

Monday, January 29, 2018

MySQL 8 Windowing Functions

Windowing functions are way to group rows of data for analysis. This provides a 'window' to look at only the relevant data only.  Plus there are about a dozen supporting functions to take advantage of all this. There is some carry over logically from the aggregate (or group by) functions but they open up some easy ways to dig statistically into your data.

Now for a contrived example.


mysql> Select row_number() over w as '#', 
       Name, Continent, Population, 
       sum(Population) over w as 'Pop' 
       from country where continent='South America' 
       window w as (partition by Continent Order by Continent);                                                                                
+----+------------------+---------------+------------+-----------+
| #  | Name             | Continent     | Population | Pop       |
+----+------------------+---------------+------------+-----------+
|  1 | Argentina        | South America |   37032000 | 345780000 |
|  2 | Bolivia          | South America |    8329000 | 345780000 |
|  3 | Brazil           | South America |  170115000 | 345780000 |
|  4 | Chile            | South America |   15211000 | 345780000 |
|  5 | Colombia         | South America |   42321000 | 345780000 |
|  6 | Ecuador          | South America |   12646000 | 345780000 |
|  7 | Falkland Islands | South America |       2000 | 345780000 |
|  8 | French Guiana    | South America |     181000 | 345780000 |
|  9 | Guyana           | South America |     861000 | 345780000 |
| 10 | Peru             | South America |   25662000 | 345780000 |
| 11 | Paraguay         | South America |    5496000 | 345780000 |
| 12 | Suriname         | South America |     417000 | 345780000 |
| 13 | Uruguay          | South America |    3337000 | 345780000 |
| 14 | Venezuela        | South America |   24170000 | 345780000 |
+----+------------------+---------------+------------+-----------+
14 rows in set (0.00 sec)


In the above example, we created a window named 'w' to allows us to 'peek' at the data arranged by Continent. I am cheating here by only looking at the Continent of South America. You will get a much better look at all this if you remove the WHERE continent='South America' from the query but I use it here for brevity.

And I used two Windows functions on the data from that window.  The ROW_NUMBER()  function provides a nice way to do exactly what the name of this function says.  And the SUM() function adds up all the population columns.

We can even set up ranks, or buckets, to divide up the data.



mysql> Select row_number() over w as '#', Name, Population, 
      ntile(5) over w as 'tile', sum(Population) over w as 'Pop' 
      from country where continent='South America' 
      window w as (partition by Continent Order by Continent);
+----+------------------+------------+------+-----------+
| #  | Name             | Population | tile | Pop       |
+----+------------------+------------+------+-----------+
|  1 | Argentina        |   37032000 |    1 | 345780000 |
|  2 | Bolivia          |    8329000 |    1 | 345780000 |
|  3 | Brazil           |  170115000 |    1 | 345780000 |
|  4 | Chile            |   15211000 |    2 | 345780000 |
|  5 | Colombia         |   42321000 |    2 | 345780000 |
|  6 | Ecuador          |   12646000 |    2 | 345780000 |
|  7 | Falkland Islands |       2000 |    3 | 345780000 |
|  8 | French Guiana    |     181000 |    3 | 345780000 |
|  9 | Guyana           |     861000 |    3 | 345780000 |
| 10 | Peru             |   25662000 |    4 | 345780000 |
| 11 | Paraguay         |    5496000 |    4 | 345780000 |
| 12 | Suriname         |     417000 |    4 | 345780000 |
| 13 | Uruguay          |    3337000 |    5 | 345780000 |
| 14 | Venezuela        |   24170000 |    5 | 345780000 |
+----+------------------+------------+------+-----------+
14 rows in set (0.00 sec)
mysql>

Here we used NTILE(5) to divide the results into five bucks.  By the way change the 5 to a 4 and you have quartiles or 100 for percentiles.  This does not really mean much statistically since the countries are arranged alphabetically.

So lets make this a little more statistically meaningful. Lets look at the population of South America with the largest countries first and broken into quartiles.


mysql> Select row_number() over w as '#', Name, Population, ntile(4) over w as 'tile', 
       sum(Population) over w as 'Pop' 
       from country where continent='South America' 
       window w as (partition by Continent Order by Population desc);
+----+------------------+------------+------+-----------+
| #  | Name             | Population | tile | Pop       |
+----+------------------+------------+------+-----------+
|  1 | Brazil           |  170115000 |    1 | 170115000 |
|  2 | Colombia         |   42321000 |    1 | 212436000 |
|  3 | Argentina        |   37032000 |    1 | 249468000 |
|  4 | Peru             |   25662000 |    1 | 275130000 |
|  5 | Venezuela        |   24170000 |    2 | 299300000 |
|  6 | Chile            |   15211000 |    2 | 314511000 |
|  7 | Ecuador          |   12646000 |    2 | 327157000 |
|  8 | Bolivia          |    8329000 |    2 | 335486000 |
|  9 | Paraguay         |    5496000 |    3 | 340982000 |
| 10 | Uruguay          |    3337000 |    3 | 344319000 |
| 11 | Guyana           |     861000 |    3 | 345180000 |
| 12 | Suriname         |     417000 |    4 | 345597000 |
| 13 | French Guiana    |     181000 |    4 | 345778000 |
| 14 | Falkland Islands |       2000 |    4 | 345780000 |
+----+------------------+------------+------+-----------+
14 rows in set (0.00 sec)

Now notice the Pop column as it suddenly becomes a very useful running total.

 I have only touched a few of the new functions to support Windowing functions but there is much more of interest here. 

Monday, January 8, 2018

Common Table Expressions (CTEs) Part 1

Occasionally at conference or a Meetup, someone will approach me and ask me for help with a MySQL problem.  Eight out of ten times their difficulty includes a sub query. "I get an error message about a corrugated or conflabugated sub query or some such,"  they say, desperate for help.  Usually with a bit of fumbling we can get their problem solved.  The problem is not a lack of knowledge for either of us but that sub queries are often hard to write. 

MySQL 8 will be the first version of the most popular database on the web with Common Table Expressions or CTEs.  CTEs are a way to create temporary tables and then use that temporary table for queries. Think of them as easy to write sub queries!

WITH is The Magic Word

The new CTE magic is indicated with the WITH clause.

mysql> WITH myfirstCTE 
      AS (SELECT * FROM world.city WHERE CountryCode='USA')
SELECT Name, District
FROM myfirstCTE 
ORDER BY Name
LIMIT 5;
+-------------+------------+
| Name        | District   |
+-------------+------------+
| Abilene     | Texas      |
| Akron       | Ohio       |
| Albany      | New York   |
| Albuquerque | New Mexico |
| Alexandria  | Virginia   |
+-------------+------------+
5 rows in set (0.01 sec)

mysql> 

So in the above example we create at temporary table named 'myfirstCTE' and then query from that table. The CTE itself isa very simple query, SELECT * FROM world.city. And then two columns are plucked out of 'myfirstCTE'. The CTE can even have modifiers on the query for ORDER BY, LIMIT or any other SQL qualifier.

A Little More Complex Example 


Lets try a little more complex query.

mysql> WITH secondCTE AS 
     (SELECT city.Name AS A, country.Name as B, city.District AS C 
     FROM city 
     JOIN country on (city.CountryCode=country.Code)) 
     SELECT A, B, C FROM secondCTE LIMIT 5;
+----------------+-------------+----------+
| A              | B           | C        |
+----------------+-------------+----------+
| Oranjestad     | Aruba       | –        |
| Kabul          | Afghanistan | Kabol    |
| Qandahar       | Afghanistan | Qandahar |
| Herat          | Afghanistan | Herat    |
| Mazar-e-Sharif | Afghanistan | Balkh    |
+----------------+-------------+----------+
5 rows in set (0.04 sec)

mysql> 

Here we select columns A, B, C from temporary table 'secondCTE'. In this case it is easier to see that we are getting data from the CTE and not the two tables city or country.  At least directly.

The reason I use this examples is often times you have to join two or more tables and the columns with the name name have different sources, here name of country and name of city.  Yes, you could alias them in a sub query but the legibility of the query increases.  And as query complexity builds things like the old N+1 problems raise their ugly heads.

More Than One Way to Skin a Cat 

So lets play again with the information from the second example but this time we will write two separate CTEs and then join them together.  First we are going to get the Name column from the city table and the Name column from the country table. We do have to drag along the columns for a join in the CTEs.  I think for many with limited SQL experience that this version is a little easier to comprehend than the second example.

mysql> WITH threeA AS (SELECT Name AS N1, CountryCode AS x FROM city),
    -> threeB AS (SELECT Name AS N2, Code as y FROM country)
    -> SELECT N1, N2 FROM threeA JOIN threeB WHERE threeA.x = threeB.y LIMIT 5;
+----------------+-------------+
| N1             | N2          |
+----------------+-------------+
| Oranjestad     | Aruba       |
| Kabul          | Afghanistan |
| Qandahar       | Afghanistan |
| Herat          | Afghanistan |
| Mazar-e-Sharif | Afghanistan |
+----------------+-------------+
5 rows in set (0.01 sec)

mysql> 
 
Hopefully the use of CTEs will produce easier to read SQL statements. 

Tuesday, January 2, 2018

Two New MySQL Books!

There are two new MySQL books both from Apress Press. One is an in depth master course on the subject and the other is a quick introduction.


ProMySQL NDB Cluster is subtitled Master the MySQL Cluster Lifecycle and at nearly 700 pages it is vital resource to anyone that runs or is thinking about running NDB Cluster. The authors, Jesper Wisborg Krogh and Mikiya Okuno, have distilled their vast knowledge of this difficult subject in a detail packed but easily readable book.  MySQL Cluster is much more complex in many areas than a regular MySQL server and here you will find all those details. If you run MySQL NDB Cluster then you need this book. The partitioning information in chapter 2 is worth the price of the book alone.  I am only a third of the way through the book and have found it as clear and concise as any technical book I have read and it is actually an easy read. 

MariaDB and MySQL Common Table Expressions and Window Functions Revealed by Daniel Bartholomew is a slender introduction to CTEs and Window functions.  If you were raised on MySQL and do not know either subject well, then I highly recommend this book.  CTEs are going to have a big impact on the way developers write sub queries and may cause self joins to become extinct.  Windowing functions will bring a new eave of analytical analysis to MySQL. This book is just over 100 pages and has useful examples for novices in either area. 

More books! Well yes, there are more MySQL books in the works so save your pocket change in order to buy them when they appear.