Open Source Databases

Friday, October 23, 2020

Data Con LA 2020 and New Introductory Video Series on MySQL

Data Con LA starts October 23rd and I will be speaking on Sunday and I will be speaking on Sunday the 25th on MySQL's NoSQL JSON Store in a talk titled MySQL Without the SQL -- Oh My!

The talk is pre-recorded but I will be live for the question and answer segment. So this is your chance to ask me your question.

Dave's MySQL Basics is a new video based series I am starting to teach the basics of MySQL. I have been talking with book publishers, hiring managers, DBAs of all levels, and many others. They have all said they would like to see a simple, modular way to learn MySQL. So rather than take a few years to produce a book, I am creating a series of videos that will be short (the goal is five minutes or less), not too pedantic (still have to teach the very low level basics), and that can be updated quickly when/if the material changes. The videos will be on Youtube and course materials I create will be at https://github.com/davidmstokes/MySQLBasics, where you can also find the video links.

Sunday, October 11, 2020

The Second Edition of MySQL and JSON - A Practical Programming Guide is now Available

I am please to announce that the Second Edition of MySQL and JSON - A Practical Programming Guide is now Available. The new book is twice the length of the first edition due to the many advancements made by MySQL Engineering over the past two years.

Who needs this book: Anyone wanting to take advantage of the MySQL JSON data type for storing JSON formatted data, those wanting to turn relational data into JSON, those who want to make their JSON data relational, and those seeking a through introduction into the use of MySQL as a NoSQL JSON MySQL & JSON - A Practical Programming Guide - Second Edition

document store. The second edition features even more illustrations, code examples, and covers the many new features added in the last two years including JSON document validation (yes, you can have required fields, type checking, and constraints on your JSON data before it gets into the database), new MySQL Shell features for bulk loading JSON data in parallel, the use of multi-value indexes for JSON array, and much more.

MySQL & JSON - A Practical Programming Guide Second Edition is now available from Amazon in digital and paperback formats.

The author is a long time user of MySQL and has been a member of the Oracle MySQL Community Team for a decade.

Tuesday, September 22, 2020

MySQL + WePay MeetUp

Description

Join this virtual meetup featuring WePay and Oracle MySQL! They will review three great talks on MySQL Compatibility Check, Running MySQL on Kubernetes and Scalable Bookkeeping.

Talk 1: MySQL Compatibility Check

Talk 2: Running MySQL on Kubernetes

Talk 3: Scalable Bookkeeping

Date & Time

Oct 1, 2020 10:00 AM in Pacific Time (US and Canada)

Wednesday, September 9, 2020

MySQL 5.6 End of Life Is Less Than Six Months Away

If you are running MySQL 5.6 please note that it will hit support End of Life status in February 2021! MySQL 5.7 has been Generally Available for almost five years and MySQL 8.0 has been out for over two years. You need to upgrade as soon as possible.

Upgrading to 5.7 is pretty easy. And I heartily suggest that after upgrading to 5.7 that you also upgrade to 8.0 using the upgrade checker utility.

MySQL 8.0 has a lot of improvements including CTEs, Windowing Functions, Histograms, Multi-value indexes, upgraded JSON support, a contention aware transaction scheduler. dual passwords, and more. Plus there are InnoDB Cluster, MySQL Router, the New Shell (mysqlsh) and ReplicaSets. Heck, you no longer have to run the old mysql_upgrade script (often a forgotten step) as the install process now does that for you!

If you have not started planning to upgrade yet, please put it down on your to-do list.

MySQL Community Downloads

Monday, August 31, 2020

Three Bad MySQL Query Types You May Be Writing

One of the problems with Structured Query Language as opposed to languages like C++, Java, Python, or PHP is that it is hard to tell if a query is good or bad just by looking at it. A programmer with a few months of coding experience can be shown a snippet of code from a project and have a pretty good chance of telling if that code is good or not. Some call this a 'code smell'.

A programmer can tell if the code is legible, understandable, and many times if it performs well often times at just a glance. Want to do an experiment on this? Go out to Github and check out a handful of random projects. Good code often just looks good.

But SQL is different. It is mostly a declarative language, telling the computer what it wants rather than how to make what it wants. You can random pull SQL queries off Gitthub or Stackoverflow or Reddit and can not tell if a query is really good or really bad.

Why? SQL depends a lot on the underlying structure of the data. Looking for a phone number? Is it in a column designed to hold the phone number, or buried in a long string of data? Separate country/area code from the main number? Is a second phone number kept in different records, different columns, or not stored at all?

Is there relational logic, foreign keys, and how are they implemented? Are there keys? What do the indexes really index? Are there any histograms? These things are not evident by just looking at the query. So you end up using some variant of EXPLAIN along with DESCRIBE TABLE to start to dig into things.

Plus SQL has a unique learning curve once you get past the basic UPDATE, SELECT, and DELETE commands. So it is inevitable that people write poor queries. Or they turn to Object Relational Mappers to write bad queries for them.

So on Sunday when I got an urgent plea for help from a well practiced programmer friend who did not have a lot of SQL skills and a new 'to be updated' project filled with horrible queries. How horrible? Well, read on!

Bad Query Type Number 1 -- Wild Wildcards

SELECT * FROM foo WHERE x=y;

Everyone uses the * wildcard to get all the columns from a table. But what if you need only a few columns of data and the table is a few hundred columns wide, with a few CHAR(255)s, and a pair of BLOBS. All that data has to read off disk, transferred into memory, and then drained off a network interfaces into an application. The program code in question wanted an pair of INTs and a DATE. Not the several thousand other bits of data being pulled out of the database by the query.

This query, from a glance at the code, was the original core of the program and the proficient programmer was pretty confident that the original table had only a handful of columns when it was created. But sadly the table got bloated and the original fast performance is now history.

So use * as a quick-and-dirty to look at the data but please avoid it in production unless your really, really need all of the data.

Bad Query Type Number 2 -- No JOINS

One of the early developers left a comment that JOINs did not perform adequately. So what did they do instead? They read three different tables into memory and pared down the data within the application. So beside the problems of pulling a lot of data from the database into application the application gets bloated to perform an operation that database should be doing.

So where did the original code get the idea that JOINs do not perform? Well, on the Internet everything lives forever, especially if it is bad or outdated information. The early developer was nice enough to date their comments and the first version was built in the MySQL 4.x days. And there were JOINs back in those but I do not remember their performance being poor. But in this case this other developer was still around the company and was available for a quick phone call. Yes, they admitting writing the code and being warned by some bulletin board (Are there still BBS?) that performance of JOINs was less than optimum. While I was rewriting the query the author called back and pointed to a very old post where someone was complaining about needing to join 65 tables when the hard coded limit in MySQL was 64 as the original instigator.

And they there were other URLS of old, bad admonitions against using JOINS. Folks, a lot has changed over the 25 year life of MySQL. JOINs work very well, support the relational model that you should be using, and put the work on the DATABASE and not the application. And if you do search the internet for an answer to a problem please try to avoid coding advice that is old enough to vote or buy adult beverages.

Bad Query Type Number 3 -- Obfuscated Queries

One query was the equivalent of this:

SELECT (City.name)
   , (Country.name)
   FROM City
   JOIN Country on (City.CountryCode = Country.code)
   WHERE
       ((City.CountryCode=('USA'))
       or ((((City.District IS NULL)))))
       AND (((City.CountryCode IS NOT NULL)))
limit 10;

Wow! That is a lot of parentheses! Looking at the table description had both of the columns being tested against NULL were both created as not being null-able. The 'or' and 'and' clauses to the WHERE were not immediately evident of why they were there. Well, after another phone call to the original developer I found out the third or forth code developer was a fan of writing tricky to read code for reasons of job security. And yes they were eventually fired for writing code that even they could not maintain.

So do not obfuscate code to keep the next developer from figuring out what you are doing as most times you will be that next developer to work on that code but you will not remember you cute little code trick. Now the optimizer is smart enough to work out the proper query plan but the MySQL optimizer wants to optimize a query each time it sees it; so writing extra clauses puts more load on the optimizer and degrades performance marginally. Exercise for you -- See how the MySQL optimizer cleans up the above query against the World database, and the query plan may surprise you.

Wrap Up

So with a little bit of effort on both our parts, the well practiced programmer was able to get acceptable performance out of their software with a promise to refactor the rest of the queries very quickly. But in one setting we ran in to what I feel are three of the most common types of bad queries sent to MySQL servers, or other database servers.

Wednesday, August 26, 2020

Better VIEWs with the WITH CHECK OPTION

VIEWs have been a handy feature of MySQL for many years but do you know about the WITH CHECK OPTION clause? The WITH CHECK OPTION clause is used for a updatable views to prohibit the changes to those views that would produce rows which are not included in the defining query.

VIEW Definition

Since a lot of SQL novices read my blog, I'd like to start with the definition of a view "Views are stored queries that when invoked produce a result set. A view acts as a virtual table" according the MySQL Manual.

SQL > CREATE VIEW x_city AS

SELECT Name,

CountryCode,

District

FROM city;

Views are used to obfuscate column names, capture vital queries for later reuse, and many other reasons. For the most part we can treat it as a table. The above example joins data from two tables but we can treat the view as a separate, different table. As can be seen above we can select all the columns from the view or a subset of the columns defined in the view.

Updatable Views

Some views can be used to specify tables to be updated in data change statements. In the example below we will use a simple one integer column table.

SQL > create view v_2 as select * from t2;

Query OK, 0 rows affected (0.0072 sec)

SQL > select * from v_2;

+----+

| id |

+----+

| 1 |

| 3 |

| 5 |

| 7 |

| 9 |

| 11 |

+----+

6 rows in set (0.0009 sec)

SQL > insert into v_2 (id) values (99);

Query OK, 1 row affected (0.0060 sec)

SQL > select * from v_2;

+----+

| id |

+----+

| 1 |

| 3 |

| 5 |

| 7 |

| 9 |

| 11 |

| 99 |

+----+

The v_2 view updated the underlying table t2.

WITH CHECK OPTION

The WITH CHECK OPTION prevents inserts to rows for which the WHERE clause in the select_statement is not true. The view v_2a only retrieves data where the id value is greater than 4. If it is used in an attempt to update a value less than 5 it will raise and error.

SQL > CREATE VIEW v_2a AS

SELECT * FROM t2 WHERE id > 4 WITH CHECK OPTION;

Query OK, 0 rows affected (0.0102 sec)

SQL > SELECT * FROM v_2a;

+----+

| id |

+----+

| 5 |

| 7 |

| 9 |

| 11 |

| 99 |

+----+

5 rows in set (0.0009 sec)

SQL > insert into v_2a (id) values (2);

ERROR: 1369: CHECK OPTION failed 'demo.v_2a'

And note the error provides the name of the view and the schema to helpt track it down.

For more details please refer to the manual page on The View WITH CHECK OPTION Clause

Monday, August 24, 2020

MySQL Comparing INTs and CHARs

How does MySQL compare INTs with CHARs? I was asked this very question over the weekend and wanted to share that information plus some of the changes in MySQL 8.0.21 in this area. And yes those changes are pretty big.

Casting

Computers are good at making comparisons of two values but only if everything else the same. Comparing an integer with another integer is simple. Same data with same data type comparisons are a good thing. But what about when you need to compare a numeric 7 with a "7" where the number is in a string? In such cases one or both numbers need to be changed into the same basic data type. Imagine your favorite Harry Potter character waving their magic wand and shouting 'accio data' to change two different magical pieces of data into one common data type. No, Hogwarts was the the reason this conversion is called casting but this 'magic' needs to be made for a good comparison.

If you read the Optimizer Notes section of the MySQL 8.0.21 Release Notes you will run into a notice that MySQL injects casts to avoid mismatches for numeric and temporal data with string data. The big trick was keeping backward compatibility with previous versions while matching the SQL standard. Now when the optimizer is comparing numeric and temporal types and the expected data type does not match it will now add casting operations in the item tree inside expressions and conditions. For instance if you are comparing a YEAR to string they will both be converted to a DOUBLE.

Example

We have two tables and are comparing an INT to a CHAR. If we run EXPLAIN ANALYZE of the query we get the details.

explain analyze select *
from t1
join t2 on t2.k = t1.k\G

*************************** 1. row ***************************

EXPLAIN: -> Inner hash join (cast(t2.k as double) = cast(t1.k as double)) (cost=4.75 rows=5) (actual time=1.759..1.771 rows=5 loops=1)

-> Table scan on t1 (cost=0.22 rows=6) (actual time=1.670..1.677 rows=6 loops=1)

-> Hash

-> Table scan on t2 (cost=0.75 rows=5) (actual time=0.034..0.043 rows=5 loops=1)

If we look at the original query we are trying to join two tables where the CHAR t2.k is equal to the INT t1.k. The magenta highlighted text above shows where both the t2.k and the t1.k columns are cast as doubles.

Running EXPLAIN without the ANALYZE we can see the query plan's version of the query that has been generated by the optimizer.

explain select * from t1 join t2 on t2.k = t1.k\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t2
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 5
filtered: 100
Extra: NULL
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: t1
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 6
filtered: 16.666667938232422
Extra: Using where; Using join buffer (hash join)
2 rows in set, 1 warning (0.0010 sec)
Note (code 1003): /* select#1 */ select `demo`.`t1`.`id` AS `id`,`demo`.`t1`.`k` AS `k`,`demo`.`t2`.`id` AS `id`,`demo`.`t2`.`k` AS `k` from `demo`.`t1` join `demo`.`t2` where (cast(`demo`.`t2`.`k` as double) = cast(`demo`.`t1`.`k` as double))

We can see that the original query of select * from t1 join t2 on t2.k = t1.k has been rewritten to select `demo`.`t1`.`id` AS `id`,`demo`.`t1`.`k` AS `k`,`demo`.`t2`.`id` AS `id`,`demo`.`t2`.`k` AS `k` from `demo`.`t1` join `demo`.`t2` where (cast(`demo`.`t2`.`k` as double) = cast(`demo`.`t1`.`k` as double)) by the optimizer.

I highly recommend looking at the query plan to help understand what the MySQL server needs to do to make your query work.