Open Source Databases: August 2021

Tuesday, August 31, 2021

The MySQL Shell and the MySQL Config Editor

One of the things I love about the MySQL Shell is that is saves your authentication credentials (no longer just passwords) in an obfuscated login path file named .mylogin.cnf. You can find that file in the %APPDATA%\MySQL directory on Windows or the current user's home directory on non-Windows systems. The file can be read later by MySQL client programs, such as the MySQL Shell, to obtain your authentication credentials for connecting to MySQL Server.

Which Credentials?

I had an email this morning from someone wanting to a) change the password stored by mysqlsh and b) find out which credentials are stored on the shell account being used.

Let's start out in reverse order. The mysql_config_editor will show you the accounts with credentials saved for that login.

$ mysql_config_editor print --all
[root@localhost]
user = "root"
password = *****
host = "localhost"
[demouser@localhost]
user = "demouser"
password = *****
host = "localhost"

The two accounts on the shell account being used are root@localhost and demouser@localhost.

Quick Password Reset

To reset the server password on the second account use ALTER USER 'demouser'@'localhost' IDENTIFIED BY '<newpasswordhere>';

If you login again from mysqlsh and the the stored password does not work, you will be prompted for the new password. And that new password will be encrypted in stores in the .mylogin.cnf file.

Start From Scratch

Is you want remove all the stored authentication information then type mysql_config_editor reset and all will be removed.

Monday, August 30, 2021

Novices and Their SQL Misconceptions

Do yourself a favor and spend ten minutes watching this video on the CER2021 paper Identifying SQL Misconceptions of Novices: Findings from a Think-Aloud study. Structured Query Language is hard for many new programmers as it is a declarative language which makes it much different that the object orientated or procedural languages with which they started their software journeys.

The authors have categories problems in four general misconceptions programmers have -- Previous Knowledge, Generalization, Language, and Bad Mental Model. Those are then refined more and I am sure anyone who has to fix poor queries will immediately recognize.

Why novices should read this: Structured Query Language or SQL is a lot different than other languages and its logic is designed around data sets. The assumptions you bring in from other languages can hinder your developing SQL skills. The syntax and keywords are not what you are used to and frankly things about data groups instead of one row of data can be hard for some to master. This video and the paper will help you hurdle those knowledge gaps.

Why SQL professional should read this: After you have mastered SQL, it is easy to forget the points that you stumbled over learning it. it will also help you identify where your journals flounder so that hopefully you can aid them.

A few times each week I run across someone asking why SQL is 'so weird', where is the 'for....next' or while (...) flow control, or why you can not have a table of tables pointing to other tables. MySQL is a very easy database to learn but there is still a learning curve and trying to make it act like other software many not work.

So a big round of applause to the authors : Daphne Miedema, Efthimia Aivaloglou and George Fletcher.

Tuesday, August 24, 2021

PHP MySQL Connectors -- MySQLi, PDO, and/or X DevAPI

Help!

I am preparing a presentation for the Longhorn PHP conference titled PHP & MySQL -- How do PDO, MySQLi, and X DevAPI do what they do. This will be a comparison of using PHP with different MySQL Connectors. As far as I can tell there are no three way comparisons of mysqli, PDO, and the X DevAPI PECL extension. And much of the PDO versus myqli stuff looks to me like it has not aged well.

I have good material in the raw presentation about overall features, parameters for prepared queries. And a good section on the how's and whys on prepared queries.

But what else would you like to see in such a presentation? I have read some postings on the web about turning off buffering (fairly simple to do). But what else are would you like to see compared?

Performance? Well, my rudimentary and preliminary benchmarks do not really so much of a difference running on the same platform against the data. But do you know a case where one shines over the others?

Coding style? Are you more object orientated or procedural? So you like the you can use the X Protocol to skip having to write structured query language.

So if you have any input on what you would like to see in this presentation PLEASE send it to me!

Thursday, August 19, 2021

EXPLAINing the Different EXPLAINS In MySQL

The main tool for tuning MySQL queries is the EXPLAIN statement (https://dev.mysql.com/doc/refman/8.0/en/explain.html) and one of the hurdles on the learning curve of the EXPLAIN statement is explaining what EXPLAIN is explaining. Basically EXPLAIN is prepended to a SELECT, TABLE, DELETE, UPDATE, INSERT or REPLACE statement. To add to an already steep learning curve is that there are many types of EXPLAIN in MySQL.

Let Me Explain

It is very simple to add EXPLAIN at the very beginning of a query to see how the server wants to execute a query.

EXPLAIN SELECT col1, col2 FROM my_table;

The output will show the query plan (the actual query the optimizer will instruct the MySQL server to run) and some preliminary information about how the query plan was picked among the many possible options. Learning to use EXPLAIN to tune queries is a long process and beyond the scope of this simple article. This article is about the various EXPLAIN variants available with MySQL 8.0.

Starting with a relatively simple query using the tried and true world database that MySQL has used forever for examples, we will join two tables.

Prepending EXPLAIN to the query gives us the following:

SQL > EXPLAIN SELECT city.name as 'City',
country.name as 'country'
from city j
on country on (city.countrycode = country.code)
where district = 'Texas'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: city
partitions: NULL
type: ALL
possible_keys: CountryCode
key: NULL
key_len: NULL
ref: NULL
rows: 4188
filtered: 10
Extra: Using where
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: country
partitions: NULL
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 3
ref: world.city.CountryCode
rows: 1
filtered: 100
Extra: NULL
2 rows in set, 1 warning (0.0006 sec)
Note (code 1003): /* select#1 */ select `world`.`city`.`Name` AS `City`,`world`.`country`.`Name` AS `country` from `world`.`city` join `world`.`country` where ((`world`.`country`.`Code` = `world`.`city`.`CountryCode`) and (`world`.`city`.`District` = 'Texas'))

By the way, that last line that starts with ‘Note’ is the query plan or the query the server will run to return the data requested in the query. You may notice that the query plan’s query is the optimized version of the original query and in this case it looks a bit different.

As mentioned earlier, the peering into of the entrails of the output is past the scope of this article. But the output above may overwhelm a novice to those using EXPLAIN but there is, believe it or not, some information missing. We can modify the EXPLAIN statement with a change in output format.

TREE Format

FORMAT=TREE provides information on the cost of the queries. What is that important? MySQL and most other relational databases use a cost based optimizer that evaluates the many choices to retrieve the data requested in a query.

Also the TREE format is a little more verbose in some areas. In the previous example EXPLAIN does tell those familiar with the terse out that a table scan is performed on the city table while the TREE format explicitly reports this fact. So TREE format is a little easier to read but does not have as much detail as the ‘traditional’ version.

SQL > EXPLAIN FORMAT=TREE SELECT city.name as 'City', country.name as 'country' from city join country on (city.countrycode = country.code) where district = 'Texas'\G
*************************** 1. row ***************************
EXPLAIN: -> Nested loop inner join (cost=822.91 rows=419)
-> Filter: (city.District = 'Texas') (cost=425.05 rows=419)
-> Table scan on city (cost=425.05 rows=4188)
-> Single-row index lookup on country using PRIMARY (Code=city.CountryCode) (cost=0.85 rows=1)
1 row in set (0.0006 sec)

JSON Format

But what if you need more information, even more than the traditional option. Well, there is the JSON format. If you want details, this format is for you!

SQL > EXPLAIN FORMAT=JSON SELECT city.name as 'City', country.name as 'country' from city join country on (city.countrycode = country.code) where district = 'Texas'\G
*************************** 1. row ***************************
EXPLAIN: {
"query_block": {
"select_id": 1,
"cost_info": {
"query_cost": "822.91"
},
"nested_loop": [
{
"table": {
"table_name": "city",
"access_type": "ALL",
"possible_keys": [
"CountryCode"
],
"rows_examined_per_scan": 4188,
"rows_produced_per_join": 418,
"filtered": "10.00",
"cost_info": {
"read_cost": "383.17",
"eval_cost": "41.88",
"prefix_cost": "425.05",
"data_read_per_join": "29K"
},
"used_columns": [
"Name",
"CountryCode",
"District"
],
"attached_condition": "(`world`.`city`.`District` = 'Texas')"
}
},
{
"table": {
"table_name": "country",
"access_type": "eq_ref",
"possible_keys": [
"PRIMARY"
],
"key": "PRIMARY",
"used_key_parts": [
"Code"
],
"key_length": "3",
"ref": [
"world.city.CountryCode"
],
"rows_examined_per_scan": 1,
"rows_produced_per_join": 418,
"filtered": "100.00",
"cost_info": {
"read_cost": "355.98",
"eval_cost": "41.88",
"prefix_cost": "822.91",
"data_read_per_join": "107K"
},
"used_columns": [
"Code",
"Name"
]
}
}
]
}
}
1 row in set, 1 warning (0.0010 sec)
Note (code 1003): /* select#1 */ select `world`.`city`.`Name` AS `City`,`world`.`country`.`Name` AS `country` from `world`.`city` join `world`.`country` where ((`world`.`country`.`Code` = `world`.`city`.`CountryCode`) and (`world`.`city`.`District` = 'Texas'))

VISUAL Explain

Well the tree options shown so far are not exactly esthetically pleasing but they are useful for tuning queries. But there is another EXPLAIN that is available in MySQL Workbench to provide yet another look at how the query works. VISUAL Explain provides much of the information from the other EXPLAINs but in a visually pleasing format. And it is a great addition to documentation on queries.

Estimated Performance versus Real Numbers

Now you know the difference between the various EXPLAIN formats available in MySQL 8.0. But all those numbers and information are based on estimates from the server statistics. Sadly those statistics may be old or skewed by changes to the database. The query itself is not run and the optimizer is only using historical data which may not be accurate. If you need real statistics there is an ability to add ANALYZE to any EXPLAIN statement to have the query run to see the actual performance numbers.

SQL > EXPLAIN ANALYZE FORMAT=TREE SELECT city.name as 'City', country.name as 'country' from city join country on (city.countrycode = country.code) where district = 'Texas'\G
*************************** 1. row ***************************
EXPLAIN: -> Nested loop inner join (cost=822.91 rows=419) (actual time=1.611..1.734 rows=26 loops=1)
-> Filter: (city.District = 'Texas') (cost=425.05 rows=419) (actual time=1.580..1.692 rows=26 loops=1)
-> Table scan on city (cost=425.05 rows=4188) (actual time=0.051..1.265 rows=4079 loops=1)
-> Single-row index lookup on country using PRIMARY (Code=city.CountryCode) (cost=0.85 rows=1) (actual time=0.001..0.001 rows=1 loops=26)
1 row in set (0.0023 sec)

Hopefully you can now explain the various EXPLAIN output formats and as you work to tune your queries. Each has their use in query tuning and making your MySQL server more responsive.

JavaScript and Friends this Friday

JavaScript and Friends starts tomorrow and I will be speaking on JavaScript and the MySQL Shell. JavaScript is one of the three modes of input for mysqlsh!

Plus you see how to register reports with the shell so they will be ready when you are.

Tickets range from $0 to $75.

Friday, August 6, 2021

Finding Your Hidden InnoDB Primary Indexes

InnoDB is the default storage engine for MySQL and InnoDB prefers that there is a PRIMARY KEY defined when a table is defined. Having a unique non-nullable primary key can vastly speed up queries and data is stored by the primary key in a B+ Tree structure.

What if a primary key is not defined, InnoDB will use the first unique key defined as NOT NULL. Failing that, InnoDB generates a hidden clustered index named GEN_CLUST_INDEX on a synthetic column that contains a 6-byte automatically increasing number when rows are inserted. This is a key that you can not use for searches (it is hidden from you!) and is not directly benefitting you. And that is probably not what you want.

To find those columns you need to look in the INFORMATION_SCHEMA with a query like this:

SELECT i.TABLE_ID,
t.NAME
FROM INNODB_INDEXES i
JOIN

INNODB_TABLES t ON (i.TABLE_ID = t.TABLE_ID)
WHERE

i.NAME='GEN_CLUST_INDEX';

Run this on your instance and see if there are any surprises where you thought you had a primary key but in reality you do not.

+----------+----------------------+
| TABLE_ID | NAME |
+----------+----------------------+
| 1204 | json/t1 |
| 1206 | so/testfloat |
| 1209 | so/tab1 |
| 1210 | so/q |
| 1211 | so/j |
| 1238 | sunshine/gentest |
| 1368 | testx/austin |
| 1374 | gis/geometries |
| 1376 | gis/xy |
| 1377 | gis/geom |
| 1386 | gis/worldmap |
| 1381 | gis/geometry_columns |
| 1407 | gis/mypoint |
| 1409 | gis/zipcode |
| 1410 | zip/zipcode |
+----------+----------------------+
15 rows in set (0.0060 sec)

Can you leave them like this, with the hidden index? Yup, sure you can. If the columns with the name GEN_CLUST_INDEX are there then the server has been making some use of them.

Or you can run ALTER TABLE and designate an existing column as a primary key or to add a new column for that purpose.

Or add a invisible column to be the new primary key if you have any doubts about not bothering existing queries that would be bothered by the sudden appearance of a new column in the data.

Thursday, August 5, 2021

Work at MySQL

I was asked today what opportunities Oracle and MySQL had for new college graduates. I referred them to Start Your Success Story as the place to start. Oracle is a big company and always on the lookout for talent. The list of internships & programs is extensive.

But specifically what about MySQL? For someone who is not a recent graduate?

The Oracle Job Search page is a great place to start. If you enter the keyword 'MySQL' today, you will find over ninety jobs listed in jobs ranging from Quality Analysis, software engineers, sales, technical writers and more.

If you don't see an exact match for your skilled, send in your resume or CV as the hiring managers are aggressively looking for skilled individuals and the web site might be just a smidgeon out of date.

Tuesday, August 3, 2021

Writing Data From the Pandas Data Analysis Library To MySQL

In the last installment the flexible and easy to use open source data analysis and manipulation tool Pandas, built on top of the Python programming language, was used with the MySQL Shell, aka mysqlsh. Pandas is a very popular open source project that features data structures and tools to manipulate numerical tables. And the name is derived from Panel Data and not the bamboo chewing critters.

Tasks like generating a series is simple:

dstokes@localhost pandas]$ python
Python 3.9.6 (default, Jul 16 2021, 00:00:00)
[GCC 11.1.1 20210531 (Red Hat 11.1.1-3)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> import numpy as np
>>> s = pd.Series([1, 3, 5, 11, 29, 42])
>>> s
0 1
1 3
2 5
3 11
4 29
5 42
dtype: int64
>>>

Data frame is a table presentation concept that is easy to use. Below some data on various guitars is presented:

>>> df = pd.DataFrame(
... { "Brand":

[ "Fender Telecaster", "Gretsch 5420", "Gibson Flying-V"],
... "color": [ "blonde", "black", "natural"],
... }
... )
>>> df
Brand color
0 Fender Telecaster blonde
1 Gretsch 5420 black
2 Gibson Flying-V natural
>>>

Hmm, does the above data frame definition look a bit like JSON to anyone besides me? I will have to experiment with loading and saving from MySQL's JSON data type. In a future installment I will dig a bit more deeply.

But How Do You Save Data??

The last blog on Pandas generated some questions about saving data from Pandas into MySQL.

In the example below the data from a series is saved to a table. This time we are not using mysqlsh and instead uses a overly simple script that uses the MySQL Python connector.

Numpy's random number generator is combined with Panda's one dimensional array class s = pd.Series(np.random.randn(1)) to produce a numeric value.

#!/usr/bin/python

import pandas as pd

import numpy as np

import mysql.connector

import random
import string

#Generate a random letter
randomLetter = random.choice(string.ascii_letters)

#Generate a random number
s = pd.Series(np.random.randn(1))

try:

mydb = mysql.connector.connect(host="localhost",

database="bear", user="panda", passwd="123Bear!")
mycursor = mydb.cursor()
sql = "INSERT INTO t1 (myletter, myvalue) VALUES (%s, %s)"
val = [randomLetter , str(s[0])]
mycursor.execute(sql,val)
mydb.commit()

except Exception as e:
mydb.close()
print(str(e))

Now there are other connectors but my preference is the MySQL Python Connector for the reasons that it works exceedingly well and in simple to use. Now this is a very simple example and yes the code works within a script and within mysqlsh.