Thursday, August 19, 2021

EXPLAINing the Different EXPLAINS In MySQL

     The main tool for tuning MySQL queries is the EXPLAIN statement (https://dev.mysql.com/doc/refman/8.0/en/explain.html) and one of the hurdles on the learning curve of the EXPLAIN statement is explaining what EXPLAIN is explaining.  Basically EXPLAIN is prepended to a SELECT, TABLE, DELETE, UPDATE, INSERT or REPLACE statement.  To add to an already steep learning curve is that there are many types of EXPLAIN in MySQL.

Let Me Explain

It is very simple to add EXPLAIN at the very beginning of a query to see how the server wants to execute a query. 

EXPLAIN SELECT col1, col2 FROM my_table;

The output will show the query plan (the actual query the optimizer will instruct the MySQL server to run) and some preliminary information about how the query plan was picked among the many possible options.  Learning to use EXPLAIN to tune queries is a long process and beyond the scope of this simple article.   This article is about the various EXPLAIN variants available with MySQL 8.0.

Starting with a relatively simple query using the tried and true world database that MySQL has used forever for examples, we will join two tables.

SQL > SELECT city.name as 'City',
             country.name as 'country' 
           from city 
           join country on (city.countrycode = country.code) 
           where district = 'Texas';
+----------------+---------------+
| City           | country       |
+----------------+---------------+
| Houston        | United States |
| Dallas         | United States |
| San Antonio    | United States |
| Austin         | United States |
| El Paso        | United States |
| Fort Worth     | United States |
| Arlington      | United States |
| Corpus Christi | United States |
| Plano          | United States |
| Garland        | United States |
| Lubbock        | United States |
| Irving         | United States |
| Laredo         | United States |
| Amarillo       | United States |
| Brownsville    | United States |
| Pasadena       | United States |
| Grand Prairie  | United States |
| Mesquite       | United States |
| Abilene        | United States |
| Beaumont       | United States |
| Waco           | United States |
| Carrollton     | United States |
| McAllen        | United States |
| Wichita Falls  | United States |
| Midland        | United States |
| Odessa         | United States |
+----------------+---------------+
26 rows in set (0.0017 sec)

Prepending EXPLAIN to the query gives us the following:


SQL > EXPLAIN SELECT city.name as 'City', 
                     country.name as 'country' 
          from city j
            on country on (city.countrycode = country.code) 
         where district = 'Texas'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: city
   partitions: NULL
         type: ALL
possible_keys: CountryCode
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 4188
     filtered: 10
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: country
   partitions: NULL
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 3
          ref: world.city.CountryCode
         rows: 1
     filtered: 100
        Extra: NULL
2 rows in set, 1 warning (0.0006 sec)
Note (code 1003): /* select#1 */ select `world`.`city`.`Name` AS `City`,`world`.`country`.`Name` AS `country` from `world`.`city` join `world`.`country` where ((`world`.`country`.`Code` = `world`.`city`.`CountryCode`) and (`world`.`city`.`District` = 'Texas'))

By the way, that last line that starts with ‘Note’ is the query plan or the query the server will run to return the data requested in the query.  You may notice that the query plan’s query is the optimized version of the original query and in this case it looks a bit different.

As mentioned earlier, the peering into of the entrails of the output is past the scope of this article. But the output above may overwhelm a novice to those using EXPLAIN but there is, believe it or not, some information missing.  We can modify the EXPLAIN statement with a change in output format.  

TREE Format

FORMAT=TREE provides information on the cost of the queries. What is that important? MySQL and most other relational databases use a cost based optimizer that evaluates the many choices to retrieve the data requested in a query.

Also the TREE format is a little more verbose in some areas.  In the previous example EXPLAIN does tell those familiar with the terse out that a table scan is performed on the city table while the TREE format explicitly reports this fact. So TREE format is a little easier to read but does not have as much detail as the ‘traditional’ version.

SQL > EXPLAIN FORMAT=TREE SELECT city.name as 'City', country.name as 'country' from city join country on (city.countrycode = country.code) where district = 'Texas'\G
*************************** 1. row ***************************
EXPLAIN: -> Nested loop inner join  (cost=822.91 rows=419)
    -> Filter: (city.District = 'Texas')  (cost=425.05 rows=419)
        -> Table scan on city  (cost=425.05 rows=4188)
    -> Single-row index lookup on country using PRIMARY (Code=city.CountryCode)  (cost=0.85 rows=1)
1 row in set (0.0006 sec)

JSON Format

But what if you need more information, even more than the traditional option. Well, there is the JSON format. If you want details, this format is for you!

SQL > EXPLAIN FORMAT=JSON SELECT city.name as 'City', country.name as 'country' from city join country on (city.countrycode = country.code) where district = 'Texas'\G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "822.91"
    },
    "nested_loop": [
      {
        "table": {
          "table_name": "city",
          "access_type": "ALL",
          "possible_keys": [
            "CountryCode"
          ],
          "rows_examined_per_scan": 4188,
          "rows_produced_per_join": 418,
          "filtered": "10.00",
          "cost_info": {
            "read_cost": "383.17",
            "eval_cost": "41.88",
            "prefix_cost": "425.05",
            "data_read_per_join": "29K"
          },
          "used_columns": [
            "Name",
            "CountryCode",
            "District"
          ],
          "attached_condition": "(`world`.`city`.`District` = 'Texas')"
        }
      },
      {
        "table": {
          "table_name": "country",
          "access_type": "eq_ref",
          "possible_keys": [
            "PRIMARY"
          ],
          "key": "PRIMARY",
          "used_key_parts": [
            "Code"
          ],
          "key_length": "3",
          "ref": [
            "world.city.CountryCode"
          ],
          "rows_examined_per_scan": 1,
          "rows_produced_per_join": 418,
          "filtered": "100.00",
          "cost_info": {
            "read_cost": "355.98",
            "eval_cost": "41.88",
            "prefix_cost": "822.91",
            "data_read_per_join": "107K"
          },
          "used_columns": [
            "Code",
            "Name"
          ]
        }
      }
    ]
  }
}
1 row in set, 1 warning (0.0010 sec)
Note (code 1003): /* select#1 */ select `world`.`city`.`Name` AS `City`,`world`.`country`.`Name` AS `country` from `world`.`city` join `world`.`country` where ((`world`.`country`.`Code` = `world`.`city`.`CountryCode`) and (`world`.`city`.`District` = 'Texas'))

VISUAL Explain

Well the tree options shown so far are not exactly esthetically pleasing but they are useful for tuning queries.  But there is another EXPLAIN that is available in MySQL Workbench to provide yet another look at how the query works. VISUAL Explain provides much of the information from the other EXPLAINs but in a visually pleasing format.  And it is a great addition to documentation on queries.  

 











Estimated Performance versus Real Numbers

Now you know the difference between the various EXPLAIN formats available in MySQL 8.0.  But all those numbers and information are based on estimates from the server statistics.  Sadly those statistics may be old or skewed by changes to the database. The query itself is not run and the optimizer is only using historical data which may not be accurate.  If you need real statistics there is an ability to add ANALYZE to any EXPLAIN statement to have the query run to see the actual performance numbers.

SQL > EXPLAIN ANALYZE FORMAT=TREE SELECT city.name as 'City', country.name as 'country' from city join country on (city.countrycode = country.code) where district = 'Texas'\G
*************************** 1. row ***************************
EXPLAIN: -> Nested loop inner join  (cost=822.91 rows=419) (actual time=1.611..1.734 rows=26 loops=1)
    -> Filter: (city.District = 'Texas')  (cost=425.05 rows=419) (actual time=1.580..1.692 rows=26 loops=1)
        -> Table scan on city  (cost=425.05 rows=4188) (actual time=0.051..1.265 rows=4079 loops=1)
    -> Single-row index lookup on country using PRIMARY (Code=city.CountryCode)  (cost=0.85 rows=1) (actual time=0.001..0.001 rows=1 loops=26)
1 row in set (0.0023 sec)

Hopefully you can now explain the various EXPLAIN output formats and as you work to tune your queries. Each has their use in query tuning and making your MySQL server more responsive.