Thursday, May 13, 2021

JSON Validation and ALGORITHM = Instant

Today I was presenting on JSON Schema validation, covering how MySQL implemented the JSON-Schema.org ideas for making sure JSON data meets criteria before being allowed into the database.  I was asked by Percona Live attendee Nickolay Ihalainin if the JSON_SCHEMA_VALID function worked with ALTER TABLE's ALGORITHM = INSTANT option.

I admitted that I did not know, as I had not tested it. While I was answering questions, Nickolay ran this test.

alter table foo17 add CHECK
    (json_schema_valid(
        _utf8mb4'{ "type":"object",
            "properties":{
                "latitude":{"type":"number", "minimum":-90, "maximum":90},
                "longitude":{"type":"number", "minimum":-180, "maximum":180}
            },
            "required": ["latitude", "longitude"] }', doc)
    ),
ALGORITHM=INSTANT;

And the answer (drum roll here):

ERROR: 1845: ALGORITHM=INSTANT is not supported for this operation. Try ALGORITHM=COPY.

Dang!

And yes, it does work with ALGORITHM=COPY.
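For the record, here is the variation that does work, a sketch reusing Nickolay's table and schema from above:

alter table foo17 add CHECK
    (json_schema_valid(
        _utf8mb4'{ "type":"object",
            "properties":{
                "latitude":{"type":"number", "minimum":-90, "maximum":90},
                "longitude":{"type":"number", "minimum":-180, "maximum":180}
            },
            "required": ["latitude", "longitude"] }', doc)
    ),
ALGORITHM=COPY;

Keep in mind that COPY rebuilds the table, so the existing rows get validated against the new constraint along the way.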

Wednesday, May 12, 2021

MySQL Track

May 13th is the day for the MySQL Tracks at the Percona Live  conference. 

Besides an announcement for an exciting new product, you will find sessions on:


  • MySQL Server Component Manifest Files
  • Insights into the new Oracle MySQL Database Service
  • Oracle MySQL Database Service with HeatWave for Real-Time Analytics
  • MySQL Architectures in a Nutshell
  • Successfully run your MySQL NDB Cluster in Kubernetes
  • Validating JSON
  • Migration from 5.6 to 8.xx.xx
  • Dbdeployer in action - Optimised MySQL sandboxes
  • MySQL Shell for DBAs (including Dump & Load)


Monday, May 3, 2021

May is Double Check Time

     Early in my career my boss would ask me to audit the equipment each May.  I had to double check the hardware and software versions to ensure they were up to date, or as close as we could get them.  I was working at a university and things quieted down after the students left in June which gave us a window for maintenance. And to this day the first Monday in May is when I double check the status of my equipment.

    MySQL 8.0.24 came out last month and I had not upgraded my various computers.  Fedora & Ubuntu have also recently sent out new versions and I am up to date on one and not the other.  My favorite email program had an update and the two browsers I use most also had updates.  And my corporate IT folks have some updates they wanted me to run.  

    Why make this effort?  First, there are bug fixes, updates, and new features.  A lot of engineering talent's time is wasted by folks not performing updates. The following is an example of a conversation that I have regularly.

"Hey Dave, when will MySQL have feature X? I really, really need it!" 

"We added it two years ago in version 8.0.14." 

    I understand the 'if it is not broken, do not fix it' mentality on a lot of things, but not for software updates that can be vital to your organization.  There may be a security patch you need, or a memory leak plugged, or a more efficient function in that update.

   I can understand a reluctance to upGrade (not upDate) until a new release has proven itself. By the way, those of you wanting to wait for MySQL 8.0 to mature should note that it has been out for over three years! So those wanting histograms, window functions, and many other new features should be well on their upgrade path.

  So take some time this week and double check what you are running.  This is an area where you do not want surprises.
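If you are not sure exactly what a given MySQL server is running, the check is a single query away:

SELECT VERSION();

Run it on each instance and compare the results against the latest release notes.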


Wednesday, April 21, 2021

Default Values & Check Constraint and Midwest PHP

 Consider the following table definition:

CREATE TABLE check_please (
       c1 int constraint c1_not_42 check (c1 <> 42) default 42);

CHECK Constraints are a few years old in MySQL and are very useful. You set up a constraint and the MySQL server ensures that the data being inserted does not violate that constraint. 

But what if the DEFAULT value violates that constraint?  Which takes precedence, the default value or the constraint?  We start with some direct action:

insert into check_please (c1) value(42);
ERROR: 3819: Check constraint 'c1_not_42' is violated.

That worked as expected.  There is no easy way to cram a value of 42 into column c1.

That was a direct test. But what if we try something different?  A second column is added so we can test not supplying a value to column c1.  

CREATE TABLE check_please2 (
       c1 int constraint c1x_not_42 check (c1 <> 42) default 42,
       c2 int default 42);

And now the test: supplying no value for c1 should try to insert the default value of 42, which violates the check constraint.

insert into check_please2 (c2) values (NULL);
ERROR: 3819: Check constraint 'c1x_not_42' is violated.

It does seem pretty obvious that a default value would not override a constraint but sometimes you just have to check your assumptions.
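And for one final sanity check, a value that satisfies the constraint goes in without complaint:

-- 7 satisfies the c1 <> 42 check, so this insert succeeds
insert into check_please2 (c1, c2) values (7, NULL);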

Midwest PHP


Midwest PHP is an online conference April 22nd and 23rd with 52 Sessions, Certifications, & More! And there are free tickets! MySQL is again proud to sponsor this event and I speak on the first day on new MySQL 8.0 features.


Monday, March 29, 2021

QUEST Insync & Open Source 101

 Not one but two virtual presentations on March 30th for me.

First up at 10:45 Central is MySQL Indexes, Histograms & Other Ways To Speed Up Queries; look for Session ID: 101290

Second at 3:45 Central is An Introduction to MySQL Indexes & Histograms (a shorter version of the first talk) and I will be staffing the virtual booth.

Slides will be posted at https://slideshare.net/davestokes

So if you really, really want more information on indexes and histograms then March 30th is your day!

Wednesday, March 24, 2021

From Spreadsheet to Database with MySQL Workbench

In the last post I covered some of the many reasons to migrate data from a spreadsheet to a database, and now it is time to show how to do just that.  Moving data from a spreadsheet into MySQL can actually be pretty simple in many cases. I will not mislead you by telling you that all cases are easy, but there is an easy way to make this migration, and I highly recommend the following process when possible as it can save lots of time. More painful migrations will be covered in the future; this post is my favorite 'fast and easy' process.

This is a guide to taking data from a spreadsheet and easily moving that data into a MySQL database instance. The trick is the Import Wizard in MySQL Workbench that does a lot of the detail work for you. In a future post we will go into what you have to do when you are not using Workbench and have many more steps to take to get the job done.

You will find the data for this example at https://catalog.data.gov/dataset/texas-dot-txdot-work-zone-data; this dataset provides lane closure occurrences within the Texas Department of Transportation (TxDOT) highway system in tabular format.  By the way, Texans love to argue about who has the best BBQ and freeway construction. A continuously updated archive of the TxDOT WZDx feed data can be found in the ITS WorkZone Raw Data Sandbox and the ITS WorkZone Semi-Processed Data Sandbox. The live feed is currently compliant with the Work Zone Data Exchange (WZDx) Specification version 2.0.  But most importantly for this blog, it is a good example dataset.  So please download it and import it into the spreadsheet of your choice.

1. Export Data

There are many ways to convert the data from a spreadsheet to something MySQL can consume.  The easiest is to save it as CSV, or Comma Separated Values, where every column of data has a comma between it and the next column.  In Sheets you will use File->Download->CSV, in Excel it is File->Save As->CSV, and other spreadsheet software will have something similar.  Please note that for most programs this exports the current sheet and not the entire spreadsheet.

road_event_feed_info_feed_update_date,road_event_feed_info_version,road_event_id,subidentifier,road_name,road_number,direction,beginning_cross_street,ending_cross_street,beginning_milepost,ending_milepost,beginning_accuracy,ending_accuracy,start_date,end_date,start_date_accuracy,end_date_accuracy,event_status,total_num_lanes,vehicle_impact,workers_present,reduced_speed_limit,restrictions,description,issuing_organization,creation_date,update_date,types_of_work,lanes,geometry_linestring,geometry_multipoint

03/17/2021 04:58:33 PM,2,ELP_2019-08-01_00014+2021-03-17,ELP,SH-20,,eastbound,US-54,,,,Estimated,Estimated,03/17/2021 12:00:00 AM,03/17/2021 11:59:59 PM,Estimated,Estimated,,3,some-lanes-closed,,,[],2019 - 08/01 On-Going Closure - Right lane closure,TxDOT,,,[],"[{""lane_edge_reference"": ""left"", ""lane_number"": 1, ""lane_status"": ""open"", ""lane_type"": ""left-lane""}, {""lane_edge_reference"": ""left"", ""lane_number"": 2, ""lane_status"": ""open"", ""lane_type"": ""middle-lane""}, {""lane_edge_reference"": ""left"", ""lane_number"": 3, ""lane_status"": ""closed"", ""lane_type"": ""right-lane""}]",,MULTIPOINT ((-106.445503 31.773454))

The data above shows only the first two rows.  The first row has the column headings and the second row is sample data.  Workbench can read the first row and use those column names to create the column names for the table in the database.  It will read the other lines in the file to make an educated guess on the data types for the columns of the table.

2. Import Data

Start MySQL Workbench and log in to your instance.

Open a new SQL tab by clicking on the icon indicated below.

Issue a CREATE DATABASE texdot; command in that new SQL tab. Do not forget the semicolon at the end of the line that indicates the end of the Structured Query Language (SQL) command.

Next execute that create database command by hitting the second lightning bolt icon.  The second lightning bolt icon executes the SQL under the cursor.

You will see that the command has executed with the 1 row(s) affected message.

Now we have to tell Workbench that we want to use that new database we just created. No, that does not automatically happen.  You have to select the little 'eyeball' icon on the schema list to reload the list of available schemas (schemas and databases are used interchangeably in the MySQL world).

Now scroll down in the schemas window until you see your new database.  Click on 'texdot' to tell Workbench that this is the database we want to use. When you click on it, you will see the active schema name underneath in green.

Then select the right pointing arrowhead to expand the information on the schema.

Right click on the Tables line and you will see the Table Data Import Wizard option.  

Now comes the neat stuff.

Enter the name of the file where the CSV data resides.

In my case it was in D:\tex\ and I was able to browse for it.

We want to create a new table in the texdot schema.

And I was able to enter the name zone for the new table.

The wizard starts to work its magic.  It takes the data in the first row of the CSV file and converts the entries into column names. It then looks at the rest of the data file to determine the data types.

You will note that you can change them if you want, and it will automatically make a best guess for types like JSON. Select Next when you are ready to proceed.

You are now ready to import.  Select Show Logs from the lower left corner and then select Next.

The data is now uploaded.

The time to import is partially a function of the CSV file length but eventually you will get the Import data file finished message. Again select Next.

There are some statistics on the performance of the import that may be of use. In this case we imported just over 1600 records. Select Finished.

Hit that eyeball icon again to reload the table list.

And we can now use SQL to query the data. Enter SELECT * FROM texas_dzone LIMIT 5; and again select that second lightning bolt icon.  (Please note that in an earlier run I used the name 'zone' instead of 'texas_dzone'.)

And the results of that query.

In just a few moments the data went from being in a spreadsheet to being in a database.  No muss, no fuss.  In the future we will look at datasets that take a little more 'engineering' to get into the database as there are cases where we and not the wizard have to do the 'heavy lifting'.  

More on MySQL Workbench

For more on MySQL Workbench please see https://www.mysql.com/products/workbench/  to download and https://dev.mysql.com/doc/workbench/en/  for the documentation.

Tuesday, March 16, 2021

Is a Database Just a Fancy Spreadsheet?!?

 Want to make a data person groan?  Ask them if a database is just a fancy spreadsheet.  Then ask them to convert a spreadsheet to a database.  Add in that there are lots of funny calculations, odd cells, and company 'secret' formulas.

What is a Spreadsheet?

For those of you who did not see the changes brought by early personal computer spreadsheet software packages like VisiCalc and Lotus 1-2-3 back in the 1980's, they probably sold more personal computers (who needs more than 640K of memory?) to businesses than any other application.  Accountants used to use real 'sheets' of paper 'spread' across their desks.  Summing a column by hand or with a 10 key calculator and then writing down the results all day long was a common task (or career).  Wikipedia describes a spreadsheet as a computer application for organization, analysis and storage of data in tabular form. Spreadsheets were developed as computerized analogs of paper accounting worksheets. The program operates on data entered in cells of a table. Each cell may contain either numeric or text data, or the results of formulas that automatically calculate and display a value based on the contents of other cells. A spreadsheet may also refer to one such electronic document.

Those spreadsheet power users who could program formulas, pivot data, and do other tricks were very important persons.  However there was no way for two or more people to share the same spreadsheet, at least if you wanted to ensure the integrity of the data or the programmed formulas.  No transactions, locking, or anything like that (and to be honest, not many of the databases back then had those features either).  But the computer did sums, standard deviations, calculated depreciation, net present value, and other vital calculations, not a human.

Converting a Spreadsheet to a Database

The limitations of a spreadsheet can choke the growth of a small business, so you can hear pleas for conversion help on a regular basis.

I am working on a new project to document how to do such a conversion.  Right now I am still looking for a good source of data (please share if you have a good one) in Excel format (or similar) to show how to convert the data, how to use the Import Wizard in MySQL Workbench (and the utilities in mysqlsh too), common problems in import, checking the data for completeness, setting up structure, some normalizing, and then some query optimization.

One example of a typical problem found in a spreadsheet-to-database conversion is the need to scan the columns for consistency.  If a column is supposed to be yes/no, y/n, true/false, or some binary 0/1 variation, it is almost inevitable to find an asterisk, an 'm' (for 'maybe'), or the 'See John for details on this'.  Cells in a spreadsheet are much more flexible than typed columns in a relational database.
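Once the data has landed in MySQL, a quick way to hunt for those stray values is to group on the suspect column. A minimal sketch; the table and column names here are hypothetical:

SELECT approved, COUNT(*) AS how_many
FROM imported_sheet        -- hypothetical imported table
GROUP BY approved          -- every distinct value, oddballs included
ORDER BY how_many DESC;

Anything that is not a clean yes/no (or 0/1) value will show up in that list along with its row count.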

I will be posting the steps I am taking with this project over the next little while.  Hopefully it can be the first step in a series. Please let me know if you have input.

CSV or not to CSV

Comma Separated Values (CSV) files are a common way of moving data from a spreadsheet to another spreadsheet or another platform.  If you use Excel, your version may offer three variants of CSV to pick from.  Then you might need to demark the end of each line, the start & end of each cell, and other similar encodings.

JSON is another option but not for my version of Excel.  Nor LibreOffice nor Google Sheets.  And XML is not a favorite of mine.   

So CSV becomes the least common form factor.  (Insert golf clap sound effect here.)

CIA World Factbook

The CIA World Factbook is the basis of the famous MySQL world and world_x databases used in MySQL documentation, classes, and other literature.  It is still free, but a MySQL devotee would not recognize the majority of the data.  It has evolved a lot, and I am tinkering with using this data to show how to do the conversion, use the migration wizard, et cetera.

Next Time

Next time we will look at the spreadsheet data and the CSV.  And yes, as a preview of things to come, there are issues with the data.  But that is for the next time. 

Tuesday, March 2, 2021

Invisible MySQL?

 Is MySQL going invisible?  Invisible Indexes were included in MySQL 8.0 and now with version 8.0.23 we have Invisible Columns.

Indexes You Can Not See!

The value of the invisible index is that it allows you to make an index disappear from the view of the optimizer.  In the distant days before 8.0, you would often delete an index you were pretty much positive no query was using.  And then you would find out that, not only was the index you just deleted necessary to everyone in the galaxy (except maybe you), but it was going to take some serious clock time to rebuild it.

But with Invisible Indexes, you issue a command like ALTER TABLE t1 ALTER INDEX i_idx INVISIBLE; and it is removed from use.  Now you can run EXPLAIN on your queries and compare results.  And if you want that index back among the visible, ALTER TABLE t1 ALTER INDEX i_idx VISIBLE;  returns you to full functionality.

I do recommend making an index invisible as part of the process of decommissioning it, similar to a soft delete of a column, to avoid a hurried recreation.  And the sys schema will show you indexes that have not been used; just make sure you watch over a long enough time period to let those queries that only run once a week/month/quarter or longer show themselves.
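The view for that lives in the sys schema as sys.schema_unused_indexes. A minimal sketch; the schema name is hypothetical, and remember the view only reflects activity since the server was last restarted:

SELECT * FROM sys.schema_unused_indexes
WHERE object_schema = 'mydb';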

Columns You Can Not See

MySQL 8.0.23 now allows you to have columns you can sort of not see.  They are not really invisible or obfuscated, but those columns are harder to see.  If we create a table with an invisible column, we have to explicitly call out that column to see its values, as a * wildcard will not return them.

SQL > create table stuff (id serial, c1 int, c2 int invisible);
Query OK, 0 rows affected (0.0393 sec)
SQL > insert into stuff (c1, c2) values row(1,2), row(3,4), row(5,6);
Query OK, 3 rows affected (0.0073 sec)
Records: 3  Duplicates: 0  Warnings: 0
SQL > select * from stuff;  <- the wildcard, no invisible column
+----+----+
| id | c1 |
+----+----+
|  1 |  1 |
|  2 |  3 |
|  3 |  5 |
+----+----+
3 rows in set (0.0005 sec)
SQL > select id,c1,c2 from stuff; <- have to call c2 to see c2
+----+----+----+
| id | c1 | c2 |
+----+----+----+
|  1 |  1 |  2 |
|  2 |  3 |  4 |
|  3 |  5 |  6 |
+----+----+----+
3 rows in set (0.0005 sec)

So you could somewhat hide a column using this method, but please do not call this secure.  What it really enables is adding columns, say a primary key to a table lacking one, without having to worry about modifying existing queries.
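A hedged sketch of that use case; legacy_table is hypothetical, and this assumes MySQL 8.0.23 or later:

-- add an auto-increment primary key that existing queries will never see
ALTER TABLE legacy_table
    ADD COLUMN id BIGINT UNSIGNED AUTO_INCREMENT INVISIBLE PRIMARY KEY FIRST;

Existing SELECT * queries keep returning exactly the columns they did before, while the table now has a proper primary key.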

And if you are creating a table from a table with an invisible column you need to explicitly reference it or you will not get that data.
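Using the stuff table from above, a quick sketch of the difference:

-- c2 comes along because it is named explicitly
create table stuff_copy as select id, c1, c2 from stuff;

-- a wildcard copy silently leaves the invisible c2 behind
create table stuff_partial as select * from stuff;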

Limitations?


The first limitation is that not all columns can be invisible; at least one column in the table must remain visible.  Invisible columns can be defined as NOT NULL and can have DEFAULT values.

What Else Is Going To Be Invisible?!


Well, in my job as a MySQL Community Manager, I do on rare occasion have access to software pre-launch, and I can tell you there are some really cool things in the product pipeline, but I can not let you see them yet. (Sorry, very bad joke.)

Friday, February 26, 2021

Searching JSON Data

A post on the MySQL forums caught my eye today. The distilled question was why an index was not being used to search JSON data.  The poster had done most of the steps needed almost correctly but missed the crucial steps.

Suppose we have this table with an index on a virtual generated column, following the JSON attribute index pattern:

```
CREATE TABLE applications (
  id BINARY(16) PRIMARY KEY,
  data JSON NOT NULL
) ENGINE=InnoDB;

CREATE INDEX idx ON applications ((CAST(data->>"$.name" AS CHAR(10))));
```
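The crucial step is expression matching: a functional index like idx is generally only considered when the query's WHERE clause uses the exact expression the index was defined on. A minimal sketch of a query that can use it (the value searched for is just illustrative):

```
SELECT id FROM applications
WHERE CAST(data->>"$.name" AS CHAR(10)) = 'test-app';
```

Writing the predicate as plain data->>"$.name" = 'test-app' is a different expression, so the optimizer will not use idx for it.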

Friday, February 5, 2021

MySQL & Friends FOSDEM Dev Room

 FOSDEM has become one of the biggest shows on the calendar of the MySQL Community Team.  FOSDEM itself, until this year, was barely contained chaos at a university in Brussels, with no pre-enrollment for any of the dozens of themed rooms and no head count.  This year the event is virtual, so you do not have to get to Belgium.


The MySQL and Friends Developer Room is Sunday and starts at 10:00am Belgium time with a retrospective of the first twenty-five years of MySQL. This is followed by 15 other talks of about 25 minutes each, on everything from ARM to user management.


  • 10:00-10:20  25 years of MySQL – A Retrospective, Dave Stokes
  • 10:30-10:55  Open Source Database Infrastructure with Vitess, Shlomi Noach
  • 11:00-11:25  From single MySQL instance to HA: The journey to InnoDB Cluster, Frédéric Descamps
  • 11:30-11:55  Group Replication: Best Practices for Network Deploy, Aníbal Pinto
  • 12:00-12:25  Review of the Percona Operator from a DBA prospective, Marco Tusa (the Grinch)
  • 12:30-12:55  MySQL Router REST API, Frédéric Descamps
  • 13:00-13:25  Better User Management under MySQL 8.0, Dave Stokes
  • 13:30-13:55  Running MySQL on ARM, Krunal Bauskar
  • 14:00-14:25  Making MySQL-8.0 XA transaction processing crash safe: The key to use MySQL as storage nodes for distributed databases, Wei Zhao
  • 14:30-14:55  Passing query attributes through the MySQL protocol: Extension to the wire format to allow data/metadata, Joro Kodinov
  • 15:00-15:20  Linux /proc filesystem for MySQL DBAs: Sampling /proc content for troubleshooting, Valerii Kravchuk
  • 15:30-15:55  Rewrite Your Complex MySQL Queries for Better Performance, Øystein Grøvlen
  • 16:00-16:25  Automatic Asynchronous Replication Connection Failover, Hemant Dangi
  • 16:30-16:55  Sure you can run your database in kubernetes: Successfully run your MySQL NDB Cluster in kubernetes, Bernd Ocklin
  • 17:00-17:25  18 Things To Do When You Have a MySQL Bottleneck, Peter Zaitsev
  • 17:30-17:55  Hitchhiker’s Guide to MySQL: Don’t panic., Liz van Dijk

Monday, January 25, 2021

Bye bye MySQL 5.6!

Adieu MySQL 5.6!

When you arrived in 2013 you had a lot of cool new features.  There was the NoSQL memcached plug-in that was blazingly fast, an improved Performance Schema, full-text searching for InnoDB, big improvements in the optimizer, and great enhancements to replication. You were a great replacement for 5.5!

But in a few days you become history, a museum piece.  No more updates or bug fixes for you.  You will become part of the 'back in the old MySQL 5.6 days' stories senior DBAs and developers will tell.  You were a big improvement over 5.5 and a stepping stone to 5.7 & 8.0.

You arrived with the mysql_config_editor, started the SHA256 password options, and you let us do some table alterations online.  

But now your time has passed and the MySQL Community moves onward.  A lot of us old timers will hear your echoes in MySQL 8, and the newer users will not know what you brought us.

Bye bye 5.6. Rest well knowing you did your part.

JOINs - The Basics

 JOINs confuse a lot of those new to Structured Query Language (SQL). If you read the various web forums popularly used to ask questions, then you know that using JOINs is pretty scary for a lot of folks. The sheer number of 'how do I use JOIN to link two tables together?' questions is staggering. The JOIN itself is often obfuscated with relational algebra, Venn diagrams, and the syntactical oddities of SQL -- and that is when you do find an online resource that tries to help. Adding to the frustration on top of all that, the various options for using JOINs can be downright perplexing.

Example 1

Let us start with an example of customers and their orders.  There is one table named customer for the information pertaining to the customer. There is another table with order information named orders holding the details of any orders from those folks in the customer table. 

Each of these tables has a column for the identification number of a customer.  In the customer table that column is named id and in the orders table that column is named customer_id.  Besides the inconsistencies in plurals of the table names, which is all too common in databases, there is the difference in column names for what is essentially the same data.  If you can get past that issue and realize that you can use either column to link to the other table, then you are almost ready for the 'tricky stuff'.

To find the list of orders and their corresponding customers, we can JOIN the two tables on the id/customer_id columns with a simple JOIN.

SELECT
orders.order_id
, customer.id
, customer.name  
from orders 
join customer 
on (customer.id = orders.customer_id) ;

And the results show the data from the orders table with the corresponding data from the customer table.

The results of JOIN

LEFT Joins

But sometimes the two tables do not evenly match up.  If we change the query above from join customer to LEFT join customer we get a much different result.
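The only change is the join keyword; the rest of the query stays the same:

SELECT
orders.order_id
, customer.id
, customer.name
from orders
LEFT join customer
on (customer.id = orders.customer_id) ;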

The results of LEFT JOIN

The LEFT JOIN displays all the orders and the corresponding customer information, if any.  If there is no corresponding customer data, the server reports NULL(1).  In this case NULL is used to represent that we have no data for that element; order number 2 does not have a matching entry in the customer table.

RIGHT Joins

Changing the query to a RIGHT join will display all the customers and any orders they may have placed.
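Again, only the join keyword changes:

SELECT
orders.order_id
, customer.id
, customer.name
from orders
RIGHT join customer
on (customer.id = orders.customer_id) ;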

The Results of a RIGHT JOIN

The above shows that two orders have been placed by customers 1 & 2 and no orders from the customers 3 & 4.

CROSS Join

A CROSS Join will match every row in the first table with every row in the second table. Notice that the query is modified to have no ON clause.
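The same column list works; the ON clause simply goes away:

SELECT
orders.order_id
, customer.id
, customer.name
from orders
CROSS join customer ;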

The revised query for a CROSS JOIN and the results

STRAIGHT_JOIN 

The STRAIGHT_JOIN is used to force the left table to be read first in cases where the optimizer wants to change things around for you.  With optimizer hints, this may be redundant, but it is still very handy.
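A sketch built on the same two tables:

SELECT
orders.order_id
, customer.id
, customer.name
from orders
STRAIGHT_JOIN customer
on (customer.id = orders.customer_id) ;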

The STRAIGHT_JOIN

The Natural JOIN

The NATURAL join creates an implicit join clause on the common columns in the two tables being joined. Common columns are columns that have the same name in both tables, which means that you need to be VERY certain the columns with the same names are actually using the same data -- you do not want the 'id' for customers to get confused with those from the orders, employees, or other tables.
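A sketch of the syntax; there is no ON clause because the server derives the join columns from the matching names. Note that with the example tables as described (customer.id versus orders.customer_id) no column names actually match, so treat this purely as a syntax illustration:

SELECT
orders.order_id
, customer.name
from orders
NATURAL join customer ;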

The Natural Join

(1) NULL

Null is used to indicate the absence of data. If you go back to the second example, the LEFT JOIN, we received records where there was no matching customer id number or name.  This could be caused by bad data in the tables, a badly entered record, or many other (frustrating) causes.  But the records that are holding NULLs can be found.  Hopefully I can loop back to that at a later date.



Tuesday, January 5, 2021

MySQL 5.6 End of Life in ONE MONTH!!!

 For the past year or so I have been reminding folks that MySQL 5.6 reaches End of Life status in ONE MONTH!!  No more updates or security fixes!  Nostalgia is a fine thing and I like antiques, but not for my database!

So if you are on 5.6 please upgrade to 5.7 -- https://dev.mysql.com/doc/refman/5.7/en/upgrading.html and consider going to 8.0 -- https://dev.mysql.com/doc/refman/8.0/en/upgrading.html -- plus there is a 5.7 to 8.0 upgrade checker in the new shell -- https://dev.mysql.com/doc/mysql-shell/8.0/en/mysql-shell-utilities-upgrade.html