Monday, August 31, 2020

Three Bad MySQL Query Types You May Be Writing

    One of the problems with Structured Query Language as opposed to languages like C++, Java, Python, or PHP is that it is hard to tell if a query is good or bad just by looking at it.  A programmer with a few months of coding experience can be shown a snippet of code from a project and have a pretty good chance of telling if that code is good or not.  Some call this a 'code smell'.

    A programmer can tell if the code is legible, understandable, and many times if it performs well often times at just a glance.  Want to do an experiment on this?  Go out to Github and check out a handful of random projects.  Good code often just looks good.

    But SQL is different.  It is mostly a declarative language, telling the computer what it wants rather than how to make what it wants. You can random pull SQL queries off Gitthub or Stackoverflow or Reddit and can not tell if a query is really good or really bad.

    Why?  SQL depends a lot on the underlying structure of the data.  Looking for a phone number? Is it in a column designed to hold the phone number, or buried in a long string of data?  Separate country/area code from the main number?  Is a second phone number kept in different records, different columns, or not stored at all?

    Is there relational logic, foreign keys, and how are they implemented?  Are there keys? What do the indexes really index? Are there any histograms?  These things are not evident by just looking at the query. So you end up using some variant of EXPLAIN along with DESCRIBE TABLE to start to dig into things.  

    Plus SQL has a unique learning curve once you get past the basic UPDATE, SELECT, and DELETE commands.  So it is inevitable that people write poor queries.  Or they turn to Object Relational Mappers to write bad queries for them.  

    So on Sunday when I got an urgent plea for help from a well practiced programmer friend who did not have a lot of SQL skills and a new 'to be updated' project filled with horrible queries.  How horrible? Well, read on!

Bad Query Type Number 1 -- Wild Wildcards

    SELECT * FROM foo WHERE x=y;

    Everyone uses the * wildcard to get all the columns from a table.  But what if you need only a few columns of data and the table is a few hundred columns wide, with a few CHAR(255)s, and a pair of BLOBS.  All that data has to read off disk, transferred into memory, and then drained off a network interfaces into an application.  The program code in question wanted an pair of INTs and a DATE.  Not the several thousand other bits of data being pulled out of the database by the query.

    This query, from a glance at the code, was the original core of the program and the proficient programmer was pretty confident that the original table had only a handful of columns when it was created.  But sadly the table got bloated and the original fast performance is now history.

    So use * as a quick-and-dirty to look at the data but please avoid it in production unless your really, really need all of the data.

Bad Query Type Number 2 -- No JOINS

    One of the early developers left a comment that JOINs did not perform adequately.  So what did they do instead?  They read three different tables into memory and pared down the data within the application.  So beside the problems of pulling a lot of data from the database into application the application gets bloated to perform an operation that database should be doing.  

    So where did the original code get the idea that JOINs do not perform?  Well, on the Internet everything lives forever, especially if it is bad or outdated information.  The early developer was nice enough to date their comments and the first version was built in the MySQL 4.x days.  And there were JOINs back in those  but I do not remember their performance being poor.  But in this case this other  developer was still around the company and was available for a quick phone call. Yes, they admitting writing the code and being warned by some bulletin board (Are there still BBS?)  that performance of JOINs was less than optimum.  While I was rewriting the query the author called back and pointed to a very old post where someone was complaining about needing to join 65 tables when the hard coded limit in MySQL was 64 as the original instigator.

    And they there were other URLS of old, bad admonitions against using JOINS.  Folks, a lot has changed over the 25 year life of MySQL. JOINs work very well, support the relational model that you should be using, and put the work on the DATABASE and not the application.  And if you do search the internet for an answer to a problem please try to avoid coding advice that is old enough to vote or buy adult beverages.

Bad Query Type Number 3 -- Obfuscated Queries

    One query was the equivalent of  this:

SELECT (City.name)
    , (Country.name) 
    FROM City 
    JOIN Country on (City.CountryCode = Country.code) 
    WHERE 
        ((City.CountryCode=('USA')) 
        or ((((City.District IS NULL))))) 
        AND (((City.CountryCode IS NOT NULL))) 
limit 10;

    Wow! That is a lot of parentheses!  Looking at the table description had both of the columns being tested against NULL were both created as not being null-able.  The 'or' and 'and' clauses to the WHERE were not immediately evident of why they were there. Well, after another phone call to the original developer I found out the third or forth code developer was a fan of writing tricky to read code for reasons of job security.  And yes they were eventually fired for writing code that even they could not maintain.

    So do not obfuscate code to keep the next developer from figuring out what you are doing as most times you will be that next developer to work on that code but you will not remember you cute little code trick.   Now the optimizer is smart enough to work out the proper query plan but the MySQL optimizer wants to optimize a query each time it sees it; so writing extra clauses puts more load on the optimizer and degrades performance marginally.  Exercise for you -- See how the MySQL optimizer cleans up the above query against the World database, and the query plan may surprise you.

Wrap Up

    So with a little bit of effort on both our parts, the well practiced programmer was able to get acceptable performance out of their software with a promise to refactor the rest of the queries very quickly.  But in one setting we ran in to what I feel are three of the most common types of bad queries sent to MySQL servers, or other database servers.