Open Source Databases

Friday, August 6, 2021

Finding Your Hidden InnoDB Primary Indexes

InnoDB is the default storage engine for MySQL and InnoDB prefers that there is a PRIMARY KEY defined when a table is defined. Having a unique non-nullable primary key can vastly speed up queries and data is stored by the primary key in a B+ Tree structure.

What if a primary key is not defined, InnoDB will use the first unique key defined as NOT NULL. Failing that, InnoDB generates a hidden clustered index named GEN_CLUST_INDEX on a synthetic column that contains a 6-byte automatically increasing number when rows are inserted. This is a key that you can not use for searches (it is hidden from you!) and is not directly benefitting you. And that is probably not what you want.

To find those columns you need to look in the INFORMATION_SCHEMA with a query like this:

SELECT i.TABLE_ID,
t.NAME
FROM INNODB_INDEXES i
JOIN

INNODB_TABLES t ON (i.TABLE_ID = t.TABLE_ID)
WHERE

i.NAME='GEN_CLUST_INDEX';

Run this on your instance and see if there are any surprises where you thought you had a primary key but in reality you do not.

+----------+----------------------+
| TABLE_ID | NAME |
+----------+----------------------+
| 1204 | json/t1 |
| 1206 | so/testfloat |
| 1209 | so/tab1 |
| 1210 | so/q |
| 1211 | so/j |
| 1238 | sunshine/gentest |
| 1368 | testx/austin |
| 1374 | gis/geometries |
| 1376 | gis/xy |
| 1377 | gis/geom |
| 1386 | gis/worldmap |
| 1381 | gis/geometry_columns |
| 1407 | gis/mypoint |
| 1409 | gis/zipcode |
| 1410 | zip/zipcode |
+----------+----------------------+
15 rows in set (0.0060 sec)

Can you leave them like this, with the hidden index? Yup, sure you can. If the columns with the name GEN_CLUST_INDEX are there then the server has been making some use of them.

Or you can run ALTER TABLE and designate an existing column as a primary key or to add a new column for that purpose.

Or add a invisible column to be the new primary key if you have any doubts about not bothering existing queries that would be bothered by the sudden appearance of a new column in the data.

Thursday, August 5, 2021

Work at MySQL

I was asked today what opportunities Oracle and MySQL had for new college graduates. I referred them to Start Your Success Story as the place to start. Oracle is a big company and always on the lookout for talent. The list of internships & programs is extensive.

But specifically what about MySQL? For someone who is not a recent graduate?

The Oracle Job Search page is a great place to start. If you enter the keyword 'MySQL' today, you will find over ninety jobs listed in jobs ranging from Quality Analysis, software engineers, sales, technical writers and more.

If you don't see an exact match for your skilled, send in your resume or CV as the hiring managers are aggressively looking for skilled individuals and the web site might be just a smidgeon out of date.

Tuesday, August 3, 2021

Writing Data From the Pandas Data Analysis Library To MySQL

In the last installment the flexible and easy to use open source data analysis and manipulation tool Pandas, built on top of the Python programming language, was used with the MySQL Shell, aka mysqlsh. Pandas is a very popular open source project that features data structures and tools to manipulate numerical tables. And the name is derived from Panel Data and not the bamboo chewing critters.

Tasks like generating a series is simple:

dstokes@localhost pandas]$ python
Python 3.9.6 (default, Jul 16 2021, 00:00:00)
[GCC 11.1.1 20210531 (Red Hat 11.1.1-3)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> import numpy as np
>>> s = pd.Series([1, 3, 5, 11, 29, 42])
>>> s
0 1
1 3
2 5
3 11
4 29
5 42
dtype: int64
>>>

Data frame is a table presentation concept that is easy to use. Below some data on various guitars is presented:

>>> df = pd.DataFrame(
... { "Brand":

[ "Fender Telecaster", "Gretsch 5420", "Gibson Flying-V"],
... "color": [ "blonde", "black", "natural"],
... }
... )
>>> df
Brand color
0 Fender Telecaster blonde
1 Gretsch 5420 black
2 Gibson Flying-V natural
>>>

Hmm, does the above data frame definition look a bit like JSON to anyone besides me? I will have to experiment with loading and saving from MySQL's JSON data type. In a future installment I will dig a bit more deeply.

But How Do You Save Data??

The last blog on Pandas generated some questions about saving data from Pandas into MySQL.

In the example below the data from a series is saved to a table. This time we are not using mysqlsh and instead uses a overly simple script that uses the MySQL Python connector.

Numpy's random number generator is combined with Panda's one dimensional array class s = pd.Series(np.random.randn(1)) to produce a numeric value.

#!/usr/bin/python

import pandas as pd

import numpy as np

import mysql.connector

import random
import string

#Generate a random letter
randomLetter = random.choice(string.ascii_letters)

#Generate a random number
s = pd.Series(np.random.randn(1))

try:

mydb = mysql.connector.connect(host="localhost",

database="bear", user="panda", passwd="123Bear!")
mycursor = mydb.cursor()
sql = "INSERT INTO t1 (myletter, myvalue) VALUES (%s, %s)"
val = [randomLetter , str(s[0])]
mycursor.execute(sql,val)
mydb.commit()

except Exception as e:
mydb.close()
print(str(e))

Now there are other connectors but my preference is the MySQL Python Connector for the reasons that it works exceedingly well and in simple to use. Now this is a very simple example and yes the code works within a script and within mysqlsh.

Tuesday, July 20, 2021

Pandas and The MySQL Shell

Pandas is a Python software library for data analysis and manipulation. It allows you to import data from CSV files, SQL, and Excel. And it is open source.

Pandas works well with the MySQL Shell. After installing Pandas with pip install pandas on my Fedora test system, I was able to load Pandas. The documentation is excellent at pandas.pydata.org and I was quickly able to get one of their examples working from within mysqlsh.

Did I mention SQL as a source of data? Reading from MySQL is very easy.

import mysql.connector as connection
import pandas as pd

try:
mydb = connection.connect(host="localhost", database = 'world',user="panda", passwd="123Bear!")
query = "SELECT * FROM country LIMIT 5;"
result_dataFrame = pd.read_sql(query,mydb)
mydb.close() #close the connection

except Exception as e:

mydb.close()
print(str(e))

I will investigate Pandas more and report anything I find.

MySQL 8.0.26 And Interesting Items in the Release Notes

You may have noticed that MySQL 8.0.26 was released today (yea!) and I recommend reading the release notes but here are some of the highlights.

TLS version 1 and 1.1 are deprecated. Please use later versions such as 1.2 and 1.3 (you made need OpenSSL 1.1.1 or higher too).
More information in the server log on client timeouts includes the timeout value, and client user and host when that information is available
Internal functions to copy values between columns are improved and test at about11% faster (YMMV)

There are a lot of interesting bits of information in the release notes and they show the amount of work that the MySQL Engineering Teams put into a release to give you a better product.

But What Are Those Warnings?

But some of you may be seeing the following messages:

The syntax 'sync_master_info' is deprecated and will be removed in a future release. Please use sync_source_info instead.

'NO_ZERO_DATE', 'NO_ZERO_IN_DATE' and 'ERROR_FOR_DIVISION_BY_ZERO' sql modes should be used with strict mode. They will be merged with strict mode in a future release.

What does that mean?

The first warning is part of a continuing process of replacing offensive terms (see MySQL Terminology Updates) and the new syntax is sync_source_info, replacing sync_master_info. These changes are much more involved than a simple global search and replace in the source code. According to the release notes most of the changes have been made along with the corresponding help text.

The second is a warning about upcoming changes in STRICT MODE (stricter mode?) adding settings.

From the MySQL Manual

The default SQL mode in MySQL 8.0 includes these modes: ONLY_FULL_GROUP_BY, STRICT_TRANS_TABLES, NO_ZERO_IN_DATE, NO_ZERO_DATE, ERROR_FOR_DIVISION_BY_ZERO, and NO_ENGINE_SUBSTITUTION.

And also peek at Strict SQL Mode

And as of 8.0.26 you can set your session to strict mode only if you the privileges' to set restricted variables.

As usual you can download from https://www.mysql.com/downloads/

Tuesday, July 13, 2021

Do You Really Need to Upgrade from MySQL Native Password to Caching SHA256 Password Authentication?

If you have not read MDS, PHP and authentication by my brilliant colleague LeFred please do so. The TL;DR is that most recent versions of the PHP support the MySQL 8.0 standard for authentication, Caching SHA 256, but with PHP 7.3 and earlier you will have to use the older MySQL Native Password authentication.

Someone reached out to me with a direct message to ask me if they really needed to change from MySQL native to Caching SHA 256.

Well, what versions of MySQL are you using? The good news is that they are running 8.0.23 in production, '25 in test and development. But all the accounts are using MySQL Native Authentication.

What versions of PHP? There was some hesitation before this question was answered. Dev is 8.0.8, test is a mixture 8.0.8, 7.4.21, and 7.4.21, and production 7.4.21. With one little exception of 7.4.20 that they are trying to move away from, they promised, by the end of the month.

Do we have to re-do all the accounts to access MySQL to this new method? Well, it depends. In this case they had several projects using for the most part unique schemas. And each had separate usernames & authentication strings -- they were not using the MySQL root account for everything thing. Passwords were rotated a few times each year.

No direct access from the internet to the instances? Well, there is one application that directly connects through a dedicated port on the firewall. Everything else that touches the MySQL instances are within the firewall. And nobody really seems to check on just who is using that connection or audits activity on that direct connection.

The Caching SHA 256 method is much more secure hashing method. It is required to use either a TLS connection or an unencrypted connection that supports password exchange using an RSA key pair. Please see SHA-256 Pluggable Authentication. By default MySQL 8.0 sets up secure connections between the clients and servers.

I asked if there was anything in their data that was sensitive, proprietary, covered by privacy laws, or anything they just did not want made public. Well, yes there was.

Do They Need to Upgrade?

I told this person, no, you do not need to immediately upgrade from MySQL native to SHA 256 but I would suggest that they plan to upgrade over the next several months. Other than being paranoid about the one direct connection through the firewall, it sounded like they were keeping on top of things.

Being paranoid about the security of your data should be the default mode of and DBA or developer. Going with a lower standard than the MySQL 8.0 defaults means you are actively downgrading your protection.

I recommend reading Protecting MySQL Passwords With the sha256_password Plugin from the MySQL Server Team Blog for more technical details on using SHA 256 authentication.

Tuesday, July 6, 2021

SERIAL Columns for Indexes

SERIAL is a column definition attribute that I use frequently and it caught me by surprise when someone looking at a slide deck of a past presentation of mine asked about my use of it. This person is a long time user of MySQL but did not know this keyword. So I need to spread the word about SERIAL. This might save you some typing and some related fat fingering.

Defining Your Index Column

When creating a table you want a PRIMARY KEY column. And even with the SERIAL attribute you have to designate which column this is. You cannot use more than one SERIAL designated column or you will get ERROR: 1075: Incorrect table definition; there can be only one auto column and it must be defined as a key. Why PRIMARY KEY is not added to the other good stuff include with SERIAL does not make sense to me but there are probably dozens of reasons plus the SQL standard arguing against me here.

Most of the time INTEGERS are used as the data type for that key column. Having the system automatically increment the count for the column is not only handy but provides uniqueness, at least to some extent. You want to avoid NULL values as they can cause a whole bucket of problems on an indexed field. And it helps to use only positive values (up to 2^63 - 1). And most of the time you have no idea how many records there will be so you use the BIGINT type just so that you should never run out of numbers (even though smaller indexes have some performance benefits you should be aware of).

So that means you have to specify an unsigned big integer, not nullable, auto incremented column. That is a lot of things to specify and I like to take shortcuts that make sense. That is why I like SERIAL. In the following example it is easy to see that the six letter 'SERIAL' is a lot less typing than 'bigint unsigned NOT NULL AUTO_INCREMENT' string it encapsulates.

SQL > create table pp1 (id SERIAL PRIMARY KEY, doc_number INT UNSIGNED)\G
Query OK, 0 rows affected (0.0334 sec)
SQL > SHOW CREATE TABLE pp1\G
*************************** 1. row ***************************
Table: pp1
Create Table: CREATE TABLE `pp1` (
`id` bigint unsigned NOT NULL AUTO_INCREMENT,
`doc_number` int unsigned DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `id` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.0049 sec)
SQL >

Anything that improves readability and reduces the opportunity for yours truly to fat finger is a big plus.