Tuesday, July 20, 2021

Pandas and The MySQL Shell

    Pandas is a Python software library for data analysis and manipulation.  It allows you to import data from CSV files, SQL, and Excel.  And it is open source.

    Pandas works well with the MySQL Shell.  After installing it with pip install pandas on my Fedora test system, I was able to load Pandas from within mysqlsh.  The documentation at pandas.pydata.org is excellent, and I was quickly able to get one of their examples working.


Did I mention SQL as a source of data? Reading from MySQL is very easy.  


import mysql.connector as connection
import pandas as pd

mydb = None
try:
    mydb = connection.connect(host="localhost", database='world',
                              user="panda", passwd="123Bear!")
    query = "SELECT * FROM country LIMIT 5;"
    result_dataFrame = pd.read_sql(query, mydb)
except Exception as e:
    print(str(e))
finally:
    if mydb is not None and mydb.is_connected():
        mydb.close()  # close the connection

    I will investigate Pandas more and report anything I find.
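The same pd.read_sql call works with any Python DB-API connection, not just mysql.connector. Here is a self-contained sketch of the pattern using Python's built-in sqlite3 module; the table name and sample values are mine, for illustration only:

```python
import sqlite3
import pandas as pd

# A throwaway in-memory database standing in for MySQL
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE country (name TEXT, population INTEGER)")
conn.executemany("INSERT INTO country VALUES (?, ?)",
                 [("Aruba", 103000), ("Albania", 2866376)])
conn.commit()

# pd.read_sql works the same way it does with a mysql.connector connection
df = pd.read_sql("SELECT * FROM country", conn)
print(df.shape)                      # (2, 2)
print(int(df["population"].sum()))   # 2969376
conn.close()
```

Once the result is in a DataFrame, all of the usual pandas analysis tools apply.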




MySQL 8.0.26 And Interesting Items in the Release Notes

You may have noticed that MySQL 8.0.26 was released today (yay!). I recommend reading the release notes, but here are some of the highlights.

  • TLS versions 1.0 and 1.1 are deprecated. Please use later versions such as 1.2 and 1.3 (you may need OpenSSL 1.1.1 or higher too).
  • More information in the server log on client timeouts, including the timeout value and the client user and host when that information is available.
  • Internal functions to copy values between columns are improved and test at about 11% faster (YMMV).
There are a lot of interesting bits of information in the release notes and they show the amount of work that the MySQL Engineering Teams put into a release to give you a better product.

But What Are Those Warnings?

But some of you may be seeing the following messages:

The syntax 'sync_master_info' is deprecated and will be removed in a future release. Please use sync_source_info instead.

'NO_ZERO_DATE', 'NO_ZERO_IN_DATE' and 'ERROR_FOR_DIVISION_BY_ZERO' sql modes should be used with strict mode. They will be merged with strict mode in a future release.

What does that mean?

The first warning is part of a continuing process of replacing offensive terms (see MySQL Terminology Updates), and the new syntax is sync_source_info, replacing sync_master_info. These changes are much more involved than a simple global search and replace in the source code.  According to the release notes, most of the changes have been made, along with the corresponding help text.

The second is a warning about upcoming changes to STRICT MODE (stricter mode?), which will absorb these settings.

From the MySQL Manual

The default SQL mode in MySQL 8.0 includes these modes: ONLY_FULL_GROUP_BY, STRICT_TRANS_TABLES, NO_ZERO_IN_DATE, NO_ZERO_DATE, ERROR_FOR_DIVISION_BY_ZERO, and NO_ENGINE_SUBSTITUTION.

And also take a peek at Strict SQL Mode.

And as of 8.0.26 you can set your session to strict mode only if you have the privileges to set restricted variables.
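If you want to know ahead of the merge whether your sessions are ready, compare the value of @@sql_mode against the modes in question. Here is a hypothetical helper (the function name is my own invention) that parses the comma-separated string returned by SELECT @@sql_mode:

```python
# Hypothetical helper: given the string returned by SELECT @@sql_mode,
# report whether strict mode is on and which soon-to-be-merged modes are absent.
MERGING_MODES = {"NO_ZERO_DATE", "NO_ZERO_IN_DATE", "ERROR_FOR_DIVISION_BY_ZERO"}

def strict_mode_status(sql_mode: str) -> dict:
    modes = {m.strip() for m in sql_mode.upper().split(",") if m.strip()}
    strict = bool(modes & {"STRICT_TRANS_TABLES", "STRICT_ALL_TABLES"})
    return {"strict": strict, "missing": sorted(MERGING_MODES - modes)}

# Example with the MySQL 8.0 default sql_mode
default_mode = ("ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,"
                "NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION")
print(strict_mode_status(default_mode))   # {'strict': True, 'missing': []}
```

If the report shows strict mode off or modes missing, your configuration will behave differently after the merge.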

As usual you can download from https://www.mysql.com/downloads/

Tuesday, July 13, 2021

Do You Really Need to Upgrade from MySQL Native Password to Caching SHA256 Password Authentication?

    If you have not read MDS, PHP and authentication by my brilliant colleague LeFred, please do so. The TL;DR is that most recent versions of PHP support the MySQL 8.0 default for authentication, Caching SHA 256, but with PHP 7.3 and earlier you will have to use the older MySQL Native Password authentication.

    Someone reached out to me with a direct message to ask me if they really needed to change from MySQL native to Caching SHA 256.  

    Well, what versions of MySQL are you using?  The good news is that they are running 8.0.23 in production and 8.0.25 in test and development. But all the accounts are using MySQL Native Password authentication.

    What versions of PHP?  There was some hesitation before this question was answered.  Dev is 8.0.8; test is a mixture of 8.0.8, 7.4.21, and 7.4.21; and production is 7.4.21, with one little exception of 7.4.20 that they are trying to move away from, they promised, by the end of the month.

    Do we have to re-do all the accounts that access MySQL for this new method?  Well, it depends.  In this case they had several projects using, for the most part, unique schemas. And each had separate usernames and authentication strings -- they were not using the MySQL root account for everything.  Passwords were rotated a few times each year.

    No direct access from the internet to the instances?  Well, there is one application that directly connects through a dedicated port on the firewall. Everything else that touches the MySQL instances is within the firewall.  And nobody really seems to check on just who is using that connection or audit activity on it.

    The Caching SHA 256 method is a much more secure hashing method.  It requires either a TLS connection or an unencrypted connection that supports password exchange using an RSA key pair.  Please see SHA-256 Pluggable Authentication. By default, MySQL 8.0 sets up secure connections between clients and servers.

    I asked if there was anything in their data that was sensitive, proprietary, covered by privacy laws, or anything they just did not want made public. Well, yes there was.   

Do They Need to Upgrade?

    I told this person no, you do not need to immediately upgrade from MySQL Native to Caching SHA 256, but I would suggest they plan to upgrade over the next several months.  Other than being paranoid about the one direct connection through the firewall, it sounded like they were keeping on top of things.

    Being paranoid about the security of your data should be the default mode of any DBA or developer.  Going with a lower standard than the MySQL 8.0 defaults means you are actively downgrading your protection.

    I recommend reading Protecting MySQL Passwords With the sha256_password Plugin from the MySQL Server Team Blog for more technical details on using SHA 256 authentication. 
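When the migration does happen, the per-account change is a single ALTER USER statement. Below is a hypothetical Python sketch that merely generates those statements for review; the account list and placeholder password are invented for illustration, and in practice you would rotate in new secrets rather than hard-code them:

```python
# Hypothetical: build ALTER USER statements to move accounts from
# mysql_native_password to caching_sha2_password. The accounts and the
# placeholder password are examples, not a real workflow.
accounts = [("app_user", "localhost"), ("report_user", "10.0.0.%")]

def migration_statement(user: str, host: str, new_password: str) -> str:
    return (f"ALTER USER '{user}'@'{host}' "
            f"IDENTIFIED WITH caching_sha2_password BY '{new_password}';")

for user, host in accounts:
    print(migration_statement(user, host, "CHANGE-ME"))
```

Generating the statements first lets you review and schedule the change per application, rather than flipping every account at once.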

Tuesday, July 6, 2021

SERIAL Columns for Indexes

    SERIAL is a column definition attribute that I use frequently, and it caught me by surprise when someone looking at a slide deck from a past presentation of mine asked about my use of it. This person is a long-time user of MySQL but did not know this keyword. So I need to spread the word about SERIAL. It might save you some typing and some related fat-fingering.

Defining Your Index Column

    When creating a table you want a PRIMARY KEY column. And even with the SERIAL attribute you have to designate which column this is. You cannot use more than one SERIAL-designated column or you will get ERROR: 1075: Incorrect table definition; there can be only one auto column and it must be defined as a key.  Why PRIMARY KEY is not added to the other good stuff included with SERIAL does not make sense to me, but there are probably dozens of reasons, plus the SQL standard, arguing against me here.

    Most of the time INTEGERs are used as the data type for that key column. Having the system automatically increment the count for the column is not only handy but provides uniqueness, at least to some extent.  You want to avoid NULL values as they can cause a whole bucket of problems on an indexed field.  And it helps to use only positive values (up to 2^64 - 1 for an unsigned BIGINT).  And most of the time you have no idea how many records there will be, so you use the BIGINT type just so that you should never run out of numbers (even though smaller indexes have some performance benefits you should be aware of).



    So that means you have to specify an unsigned, non-nullable, auto-incremented big integer column. That is a lot of things to specify, and I like to take shortcuts that make sense. That is why I like SERIAL.  In the following example it is easy to see that the six-letter 'SERIAL' is a lot less typing than the 'bigint unsigned NOT NULL AUTO_INCREMENT' string it encapsulates (plus, as the output below shows, it also adds a UNIQUE key).

 SQL > create table pp1 (id SERIAL PRIMARY KEY, doc_number INT UNSIGNED)\G
Query OK, 0 rows affected (0.0334 sec)
SQL > SHOW CREATE TABLE pp1\G
*************************** 1. row ***************************
       Table: pp1
Create Table: CREATE TABLE `pp1` (
  `id` bigint unsigned NOT NULL AUTO_INCREMENT,
  `doc_number` int unsigned DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `id` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.0049 sec)
SQL >

Anything that improves readability and reduces the opportunity for yours truly to fat finger is a big plus. 

Wednesday, June 30, 2021

Automating MySQL Shell util.dumpInstance and util.dumpSchema Backups With CRON

    The UNIX/Linux cron utility is a popular way to schedule repetitive tasks.  There are many examples on the web of how to use cron with mysqldump to back up MySQL instances.  I am sure many systems have something similar to the following:

Local Host mysql Backup:
0 1 * * * /usr/bin/mysqldump -uroot -ppassword --opt myDatabase > /directory/MyDumpFilename.sql

    But what if you want to use the new, super-slick MySQL Shell dump utilities? Well, the good news is that it is as simple as it was with the old program. Want to use dumpSchema for the world schema?

0 1 * * * /usr/bin/mysqlsh root@localhost -e "util.dumpSchemas(['world'],'/tmp/w2')"

    Or dump the entire instance?

0 2 * * * /usr/bin/mysqlsh root@localhost -e "util.dumpInstance('/tmp/i2',{ 'showProgress' : false})"  

     Be sure to double check the path of the mysqlsh program with which:

[dstokes@localhost ~]$ which mysqlsh
/usr/bin/mysqlsh

    I highly recommend using these new utilities as they are not only much faster than the old mysqldump program but also more flexible.  The security conscious among you may have spotted the embedded password in the first example, which is a major security problem.  But mysqlsh can cache your passwords, which takes them off the crontab line.
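One refinement worth considering: since the dump utilities expect a fresh target directory, a small wrapper that stamps each night's dump with the date can be handier than a raw one-liner in the crontab. This is an illustrative sketch only; the paths and account URI are examples, not a vetted backup script:

```python
# Example cron-friendly wrapper (paths and the account URI are illustrative).
# Builds a dated target directory so each night's dump lands in its own spot.
import datetime

def build_command(base_dir="/backups"):
    stamp = datetime.date.today().isoformat()   # e.g. 2021-06-30
    target = f"{base_dir}/instance-{stamp}"
    return ["/usr/bin/mysqlsh", "root@localhost", "-e",
            f"util.dumpInstance('{target}', {{'showProgress': false}})"]

# Hand the list to subprocess.run(build_command(), check=True) from cron;
# printed here just to show the generated dump call.
print(build_command()[3])
```

With the shell's cached passwords, nothing sensitive ends up in the crontab or the script.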


 

Tuesday, June 29, 2021

Test Drive the MySQL Data Service July 1st or July 8th

 Hands-On-Lab Series

Maximize MySQL Performance with HeatWave and Oracle Cloud

Sessions on July 1st and 8th

10:30 a.m. to 12:30 p.m. CEST


MySQL is the most popular database among the tech community. However, not all instances perform the same. As systems and applications scale, IT leaders need to constantly improve the performance of their solutions without compromising security, to keep customers' data safe.

HeatWave is a new, in-memory query accelerator for MySQL Database Service available in Oracle Cloud. You will not need any additional skills, tools, or costs to move real-time data from your OLTP database into an OLAP database. Benchmark results show that HeatWave is 2.7x faster and 70% cheaper than Amazon Redshift.


This series of three hands-on-lab sessions will provide you with the following key takeaways:

  • Understand how HeatWave improves MySQL performance without the need of changing a single line of code
  • See how working with external data sources using the Oracle Data Integrator Cloud is fast and easy
  • Master the process of moving your MySQL data to the MySQL Database Service on Oracle Cloud Infrastructure
  • Select your Workshops and Register

Monday, June 14, 2021

It Is Time To Double Check Your Backups

    It is time to double check your backups!  Every so often you should randomly check the data you have been backing up.  Set up some fresh bare metal or a fresh cloud instance and get your MySQL instance running.  Simple, easy, and no problem, right?

    Well, I hope that is true. But please give it a try, just to put your mind at ease. 

Catastrophe

    I had a call this morning from an old friend who received an automated message overnight that a software build failed. After some digging, the cause of the failure was tracked to an on-premises system with hardware issues.  Luckily spare parts were on hand and the server was soon back online and the build completed.

    Then the friend double-checked their cron-based backup scripts, and the logs looked good. A spare machine was fired up, and the friend decided to show some new hires how to restore a backup. The new machine was built with a later version of the Linux distribution the company favored. By default that distro enables a stricter version of SELinux than the one used in their shop. Some coffee and RTFM time later, they were able to install their version of MySQL (8.0.24) and have it run.  They added their configuration settings to the configuration file and were ready to restore. Right?

Wrong!

    Due to some network configuration issues they could not reach their backup machine, and access to that machine is highly restricted.  That is a great practice for securing your data, but in this case they had to wait for their security team to finish their Monday morning standup before the new machine could be added to the allowed server list.

    Then there were only ten gigabytes of data to restore from a mysqldump.  Just a simple mysql -u root -p < 10gigbackup.sql and we are ready to roll, right? Alas, nope.  The accounts used for this server had not been set up on the new machine (please see https://github.com/lefred/mysqlshell-plugins/wiki/user for how to do this easily with the MySQL Shell), were not part of the dump, and it took a little more time to get them set up.

  By now the new hires were referring to the internal documentation on database restorations and making revisions.  

  The restoration from the dump file finished.  Done? Nope.  The old friend randomly checked a few dozen records and was happy for about five minutes.  Then one of the new hires asked about the Instance Dump Utility (https://dev.mysql.com/doc/mysql-shell/8.0/en/mysql-shell-utilities-dump-instance-schema.html) and started asking some good questions.

  • Do we need to update our backup methods?
  • Should we be saving the account records with the data?
  • Is the new utility feature rich enough to switch to that as the backup methodology? 
  • Can any of this be automated?
  • How does this fit with our corporate methodology, including security practices? 

    Hopefully you can do a quick check on your backups too, even if it is just for your peace of mind.
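One way to make such spot checks routine is to compare per-table row counts between the source and the restored copy. The sketch below is illustrative only; it uses two throwaway sqlite3 in-memory databases so it is self-contained, but against MySQL you would run the same COUNT(*) queries over two connections:

```python
import sqlite3

def row_counts(conn, tables):
    """Return {table: row count} for the given connection."""
    return {t: conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0]
            for t in tables}

# Stand-ins for the source instance and the restored copy
source = sqlite3.connect(":memory:")
restored = sqlite3.connect(":memory:")
for db in (source, restored):
    db.execute("CREATE TABLE t1 (id INTEGER)")
    db.executemany("INSERT INTO t1 VALUES (?)", [(i,) for i in range(100)])

src, dst = row_counts(source, ["t1"]), row_counts(restored, ["t1"])
mismatches = [t for t in src if src[t] != dst[t]]
print("backup check:", "OK" if not mismatches else mismatches)   # backup check: OK
```

Row counts alone will not catch corrupted values, so pair this with random record checks or per-table checksums.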

Tuesday, June 8, 2021

MySQL Group on LinkedIn.com Passes 20,000 Members

     A few weeks ago the MySQL Group on LinkedIn passed 20,000 members. For those of you who do not know about LinkedIn, it is a business- and employment-oriented online platform that is mainly used for professional networking; it allows job seekers to post their resumes and employers to post jobs.

    You can find job postings, announcements, relevant blog posts, and more on the MySQL page. Please join us if you already have an account.

Friday, June 4, 2021

Dutch PHP Conference -- Learn about using indexes versus histograms with MySQL

 The Dutch PHP Conference is June 17th and 18th, and I am lucky enough to be presenting again this year.  This is an amazing event to attend, though this year it is virtual. From 10:00 to 13:00 CEST on the 17th I will be presenting MySQL 8.0 Indexes, Histograms, and Other Ways to Speed Up Your Queries.  This talk assumes you are new to adding indexes or histograms, do not know how to find your query plan, and have not been introduced to ways of speeding up your database queries.  If you already have some knowledge in this area, this session will help you build on what you know.

Now if you want to learn more about indexes and histograms, you can attend this conference too!  And this conference is FREE!  However, for workshops like my session there are a limited number of virtual seats and you need to preregister.

And like all my other sessions, I love to get your questions as they arise rather than have you wait, looking for the perfect moment to quiz me, and then lose focus on the talk.  So if you have a question, please ask it when you think of it!

Dutch PHP Conference

Need a quick row number on your query output?

    Recently on Reddit someone asked how to automatically get a row number generated in front of the output from a query.  There is a ROW_NUMBER() function, and it is very simple to use with windowing functions.

    The sample data for this example is very simple.

> SELECT name, qty FROM sample;
+------+-----+
| name | qty |
+------+-----+
| a    |   5 |
| b    |   2 |
| c    |   4 |
+------+-----+
3 rows in set (0.0019 sec)


    There are two ways to write windowing functions.  The simplest is to add the ROW_NUMBER() function with the OVER() keyword to the select list.

> select row_number() over () as 'no',
         name, 
         qty 
  from sample;
+----+------+-----+
| no | name | qty |
+----+------+-----+
|  1 | a    |   5 |
|  2 | b    |   2 |
|  3 | c    |   4 |
+----+------+-----+
3 rows in set (0.0011 sec)


    Or define the window at the end of the query. I prefer this version for readability reasons.

> select row_number() over w as 'no',
         name, 
         qty 
  from sample 
  window w as ();
+----+------+-----+
| no | name | qty |
+----+------+-----+
|  1 | a    |   5 |
|  2 | b    |   2 |
|  3 | c    |   4 |
+----+------+-----+
3 rows in set (0.0009 sec)

    So now you know how to add a simple row number to your query output and two ways to format that request.
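ROW_NUMBER() and named windows are standard SQL, so you can try the same query shape outside MySQL as well. Here is a runnable check using Python's bundled sqlite3 module (SQLite 3.25 or later); note that I added an ORDER BY inside the window so the numbering is deterministic:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (name TEXT, qty INTEGER)")
conn.executemany("INSERT INTO sample VALUES (?, ?)",
                 [("a", 5), ("b", 2), ("c", 4)])

# Same shape as the second query above: named window defined at the end
rows = conn.execute(
    "SELECT ROW_NUMBER() OVER w AS no, name, qty "
    "FROM sample WINDOW w AS (ORDER BY name)").fetchall()
for row in rows:
    print(row)   # (1, 'a', 5) then (2, 'b', 2) then (3, 'c', 4)
```

With an empty OVER () the numbering order is up to the engine, which is why pinning it with ORDER BY is the safer habit.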





Tuesday, May 25, 2021

What Does This Query Really Do?

Computers are dumb, and they will do exactly what you ask them to do.  The trick often is to think as dumb as the computer.  Sadly, it is all too easy to assume that the computer is 'thinking' like you are and blunder into a head-scratching puzzle.  Recently there was a post on the MySQL Community Space of the Groundbreakers Developer Community site that shows that sometimes what is intended is not what you want -- you are getting exactly what you asked for.

Quiz -- What happens if you run the following query?

SELECT concat('CREATE TABLE IF NOT EXISTS sakila1.', 
    TABLE_NAME, 
    ' like sakila.', 
    TABLE_NAME,  ';') 
FROM information_schema.`TABLES` 
WHERE TABLE_SCHEMA = 'sakila'

A) You will create copies of the tables in the sakila schema in sakila1 schema.

B) You will create the SQL queries to create copies of the tables in the sakila schema in the sakila1 schema.

C) This will fail as you have to create sakila1 before you can run this query.

D) This query has a syntax error

Your answer??


To help you take this quiz, here is the output of the query.  And yes, I have a sakila schema but not a sakila1 schema.

The author of this query was expecting the query to create copies of all the tables in the sakila schema in the sakila1 schema. 

Since the query ran, we can eliminate D as an answer.  

And since it ran without the sakila1 schema already created, we can eliminate C.

A simple SHOW SCHEMAS will show that A is not the answer.  Yes, even if you create the new schema before running the query.

The answer is that this query creates the query to do the work but does not perform the actual work.  So the query does exactly as asked but not as intended.

So what if you want this new schema?

Well, you could save the output from the query in a file and then execute that file.

Or feed the output of the query above into PREPARE and EXECUTE statements (for example, from a stored procedure that loops over the generated DDL) to do the intended work.
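The generate-then-execute pattern is also easy to script from a client. The following self-contained Python illustration uses sqlite3 rather than MySQL (sqlite_master stands in for information_schema, and a fixed column list stands in since SQLite has no CREATE TABLE ... LIKE): step one builds the DDL strings, which is all the original query did, and step two actually executes them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_a (id INTEGER);
    CREATE TABLE src_b (id INTEGER);
""")

# Step 1: generate the DDL (this is all the SELECT concat(...) query does)
ddl = [f"CREATE TABLE copy_{name} (id INTEGER);"
       for (name,) in conn.execute(
           "SELECT name FROM sqlite_master WHERE type = 'table'")]

# Step 2: actually run each generated statement -- the missing piece
for statement in ddl:
    conn.execute(statement)

tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)   # ['copy_src_a', 'copy_src_b', 'src_a', 'src_b']
```

The separation is the whole lesson: until step two runs, nothing has been created.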

Thursday, May 20, 2021

Extending the MySQL Shell with Python

One of the problems that comes with age is that there is so much 'baggage' filling valuable brain area that it sometimes requires a mental sweeping before trying to learn a new skill. I am still getting rid of FORTRAN and COBOL factoids but need to come up to speed on extending MySQL for some talks I am giving.  So away with the PROCEDURE DIVISION stuff... 

The MySQL Shell, or mysqlsh, is very extensible, and it is easy to create a plugin for some handy routines.  Your routines go in functions, and there are examples below.  The tricky part is that you have to tell mysqlsh that you are creating an extension and then hook your function(s) up to that extension object. Finally, that extension object has to be registered with the shell.

So below are two functions that query the MySQL world database for the contents of the world.city and world.country tables.  The first thing both of these functions do is make sure we are connected to the MySQL database instance or, in other words, that our shell has a session established with the server. Then it is a quick query and a dump of the results.

Now the tricky stuff!

We need an object for our extension: plugin_obj = shell.create_extension_object()

Then we provide the information for our functions with shell.add_extension_object_member(plugin_obj, "city", show_tables, {"brief" : "Dave's demo 2 - city"} ) and shell.add_extension_object_member(plugin_obj, "countryinfo", show_country, {"brief" : "Dave's demo 2 - country"} ).

Finally, shell.register_global("world", plugin_obj, {"brief": "A 2nd demo for Dave"}) tells the shell to make these functions available under the name world.

When a new mysqlsh session is started, we can use \h to list the available plugins, and we look for our new plugin.

The world plugin



If you type 'world' and then a TAB key, you will be able to see that the two functions are ready!

The available functions


Then you can type the name of the desired function and the query will run.

Ready to run a function



I have omitted the output from the query. Please note that you can use plugins written in either JavaScript or Python, and that you can run programs written in either language from the other language mode.

The Code

def show_tables(session=None):
   """ Lists all the records in the world.city table
   Simple function to query records in a table
   Args:
      session (object): Optional session object; defaults to the current mysqlsh session
   Returns:
      Nothing
   """
   if session is None:
      session = shell.get_session()
   if session is None:
      print("No session specified - pass a session or connect shell to database")
      return
   r = session.run_sql("SELECT * FROM world.city")
   shell.dump_rows(r)

def show_country(session=None):
   """ Lists all the records in the world.country table
   Yada - yada -> see above function
   """
   if session is None:
      session = shell.get_session()
   if session is None:
      print("No session specified - pass a session or connect shell to database")
      return
   r = session.run_sql("SELECT * FROM world.country")
   shell.dump_rows(r)

plugin_obj = shell.create_extension_object()

shell.add_extension_object_member(plugin_obj, "city", show_tables, {"brief" : "Dave's demo 2 - city"} )
shell.add_extension_object_member(plugin_obj, "countryinfo", show_country, {"brief" : "Dave's demo 2 - country"} )

shell.register_global("world", plugin_obj, {"brief": "A 2nd demo for Dave"})

Thursday, May 13, 2021

JSON Validation and ALGORITHM = Instant

Today I was presenting on JSON Schema validation, covering how MySQL implemented the JSON-Schema.org ideas for making sure JSON data meets criteria before being allowed into the database.  Percona Live attendee Nickolay Ihalainin asked me whether the JSON_SCHEMA_VALID function works with ALTER TABLE's ALGORITHM = INSTANT option.

I admitted that I did not know, as I had not tested it. While I was answering questions, Nickolay ran this test.

alter table foo17 add CHECK
    (json_schema_valid(
            _utf8mb4'{ "type":"object",
            "properties":{
                "latitude":{"type":"number", "minimum":-90, "maximum":90},
                "longitude":{"type":"number", "minimum":-180, "maximum":180}
             },
      "required": ["latitude", "longitude"] }',doc)
),
ALGORITHM=INSTANT;

And the answer (drum roll here):

ERROR: 1845: ALGORITHM=INSTANT is not supported for this operation. Try ALGORITHM=COPY.

Dang!

And yes, it does work with ALGORITHM=COPY.

Wednesday, May 12, 2021

MySQL Track

May 13th is the day for the MySQL track at the Percona Live conference.

Besides an announcement for an exciting new product, you will find sessions on:


  • MySQL Server Component Manifest Files
  • Insights into the new Oracle MySQL Database Service
  • Oracle MySQL Database Service with HeatWave for Real-Time Analytics
  • MySQL Architectures in a Nutshell
  • Successfully run your MySQL NDB Cluster in Kubernetes
  • Validating JSON
  • Migration from 5.6 to 8.xx.xx
  • Dbdeployer in action - Optimised MySQL sandboxes
  • MySQL Shell for DBAs (including Dump & Load)


Monday, May 3, 2021

May is Double Check Time

     Early in my career my boss would ask me to audit the equipment each May.  I had to double check the hardware and software versions to ensure they were up to date, or as close as we could get them.  I was working at a university and things quieted down after the students left in June which gave us a window for maintenance. And to this day the first Monday in May is when I double check the status of my equipment.

    MySQL 8.0.24 came out last month and I had not upgraded my various computers.  Fedora and Ubuntu have also recently shipped new versions, and I am up to date on one but not the other.  My favorite email program had an update, and the two browsers I use most also had updates.  And my corporate IT folks have some updates they want me to run.

    Why make this effort?  First, there are bug fixes, updates, and new features.  A lot of engineering talent's time is wasted by folks not performing updates. The following is an example of a conversation that I have regularly.

"Hey Dave, when will MySQL have feature X? I really, really need it!" 

"We added it two years ago in version 8.0.14." 

    I understand the 'if it is not broke, do not fix it' mentality on a lot of things, but not for software updates that can be vital to your organization.  There may be a security patch you need, or a memory leak plugged, or a more efficient function in that update.

   I can understand a reluctance to upGrade (not upDate) until a new release has proven itself. By the way, those of you wanting to wait for MySQL 8.0 to mature should note that it has been out for over three years! So those wanting histograms, window functions, and many other new features should be well along their upgrade path.

  So take some time this week and double check what you are running.  This is an area where you do not want surprises.


Wednesday, April 21, 2021

Default Values & Check Constraint and Midwest PHP

 Consider the following table definition:

CREATE TABLE check_please (
       c1 int constraint c1_not_42 check (c1 <> 42) default 42);

CHECK constraints are a few years old in MySQL and are very useful. You set up a constraint and the MySQL server ensures that the data being inserted does not violate it.

But what if the DEFAULT value violates that constraint?  Which takes precedence - the default value or the constraint?  We start with some direct action:

insert into check_please (c1) value(42);
ERROR: 3819: Check constraint 'c1_not_42' is violated.

That worked as expected.  There is no easy way to cram a value of 42 into column c1.

That was a direct test. But what if we try something different?  A second column is added so we can test not supplying a value for column c1.

CREATE TABLE check_please2 (
       c1 int constraint c1x_not_42 check (c1 <> 42) default 42,
       c2 int default 42);

And now the test: supplying no value for c1 should try to insert the default value of 42, which violates the check constraint.

insert into check_please2 (c2) values (NULL);
ERROR: 3819: Check constraint 'c1x_not_42' is violated.

It does seem pretty obvious that a default value would not override a constraint but sometimes you just have to check your assumptions.
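As a cross-check, the same precedence can be demonstrated with Python's built-in sqlite3, which also enforces CHECK constraints against DEFAULT values:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE check_please (
                    c1 INTEGER DEFAULT 42 CHECK (c1 <> 42),
                    c2 INTEGER DEFAULT 42)""")

# Supplying no value for c1 forces the DEFAULT of 42 -- and the CHECK wins
try:
    conn.execute("INSERT INTO check_please (c2) VALUES (NULL)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)

# A non-violating value is accepted as usual
conn.execute("INSERT INTO check_please (c1) VALUES (41)")
print(conn.execute("SELECT COUNT(*) FROM check_please").fetchone()[0])   # 1
```

Different engine, same answer: the constraint is evaluated on the final value of the row, wherever that value came from.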

Midwest PHP


Midwest PHP is an online conference April 22nd and 23rd with 52 sessions, certifications, and more! And there are free tickets! MySQL is again proud to sponsor this event, and I speak on the first day on new MySQL 8.0 features.


Monday, March 29, 2021

QUEST Insync & Open Source 101

 Not one but two virtual presentations on March 30th for me.

First up at 10:45 Central is MySQL Indexes, Histograms & Other Ways To Speed Up Queries; look for Session ID: 101290.

Second, at 3:45 Central, is An Introduction to MySQL Indexes & Histograms (a shorter version of the first talk), and I will be staffing the virtual booth.

Slides will be posted at https://slideshare.net/davestokes

So if you really, really want more information on indexes and histograms then March 30th is your day!

Wednesday, March 24, 2021

From Spreadsheet to Database with MySQL Workbench

In the last post I covered some of the many reasons to migrate data from a spreadsheet to a database, and now it is time to show how to do just that.  Moving data from a spreadsheet into MySQL can actually be pretty simple in many cases. I will not mislead you by telling you that all cases are easy, but there is an easy way to make this migration, and I highly recommend the following process when possible, as it can save lots of time. More painful migrations will be covered in the future, but this post is my favorite 'fast and easy' process.

This is a guide to taking data from a spreadsheet and easily moving that data into a MySQL database instance. The trick is the Import Wizard in MySQL Workbench that does a lot of the detail work for you. In a future post we will go into what you have to do when you are not using Workbench and have many more steps to take to get the job done.

You will find the data for this example at https://catalog.data.gov/dataset/texas-dot-txdot-work-zone-data, and this dataset provides lane closure occurrences within the Texas Department of Transportation (TxDOT) highway system in a tabular format.  By the way, Texans love to argue about who has the best BBQ and freeway construction. A continuously updated archive of the TxDOT WZDx feed data can be found at the ITS WorkZone Raw Data Sandbox and the ITS WorkZone Semi-Processed Data Sandbox. The live feed is currently compliant with the Work Zone Data Exchange (WZDx) Specification version 2.0.  But most importantly for this blog, it is a good example dataset.  So please download it and import it into the spreadsheet of your choice.

1. Export Data

There are many ways to convert the data from a spreadsheet to something MySQL can consume.  The easiest is to save it as CSV, or Comma-Separated Values, where every column of data has a comma between it and the next column.  In Sheets you will use File->Download->CSV, in Excel it is File->Save As->CSV, and other spreadsheet software will have something similar.  Please note that for most programs this exports the current sheet and not the entire spreadsheet.

road_event_feed_info_feed_update_date,road_event_feed_info_version,road_event_id,subidentifier,road_name,road_number,direction,beginning_cross_street,ending_cross_street,beginning_milepost,ending_milepost,beginning_accuracy,ending_accuracy,start_date,end_date,start_date_accuracy,end_date_accuracy,event_status,total_num_lanes,vehicle_impact,workers_present,reduced_speed_limit,restrictions,description,issuing_organization,creation_date,update_date,types_of_work,lanes,geometry_linestring,geometry_multipoint

03/17/2021 04:58:33 PM,2,ELP_2019-08-01_00014+2021-03-17,ELP,SH-20,,eastbound,US-54,,,,Estimated,Estimated,03/17/2021 12:00:00 AM,03/17/2021 11:59:59 PM,Estimated,Estimated,,3,some-lanes-closed,,,[],2019 - 08/01 On-Going Closure - Right lane closure,TxDOT,,,[],"[{""lane_edge_reference"": ""left"", ""lane_number"": 1, ""lane_status"": ""open"", ""lane_type"": ""left-lane""}, {""lane_edge_reference"": ""left"", ""lane_number"": 2, ""lane_status"": ""open"", ""lane_type"": ""middle-lane""}, {""lane_edge_reference"": ""left"", ""lane_number"": 3, ""lane_status"": ""closed"", ""lane_type"": ""right-lane""}]",,MULTIPOINT ((-106.445503 31.773454))

The data above shows only the first two rows of the file.  The first row has the column headings and the second row is sample data.  Workbench can read the first row and use those headings to create the column names for the table in the database.  It will read the other lines in the file to make an educated guess at the data types for the columns of the table.
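That guessing step can be mimicked in a few lines of Python. The sketch below is my own illustration, not Workbench's actual code: it reads the header row, then inspects sample rows to classify each column as INT, DOUBLE, or TEXT.

```python
import csv
import io

def guess_column_types(csv_text, sample_size=100):
    """Guess a SQL type for each column from the header row and sample data."""
    reader = csv.reader(io.StringIO(csv_text))
    headers = next(reader)
    types = ["INT"] * len(headers)       # start optimistic, widen as needed
    for i, row in enumerate(reader):
        if i >= sample_size:
            break
        for col, value in enumerate(row):
            if value == "":
                continue                 # blanks do not change the guess
            try:
                int(value)
            except ValueError:
                try:
                    float(value)
                    if types[col] == "INT":
                        types[col] = "DOUBLE"   # widen INT to DOUBLE
                except ValueError:
                    types[col] = "TEXT"         # anything else is text
    return dict(zip(headers, types))

sample = "id,milepost,road_name\n1,3.5,SH-20\n2,4.0,US-54\n"
print(guess_column_types(sample))
# -> {'id': 'INT', 'milepost': 'DOUBLE', 'road_name': 'TEXT'}
```

A real importer would also have to cope with dates, quoted numbers, and ragged rows, which is why the wizard lets you override its guesses.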

2. Import Data

Start MySQL Workbench and log in to your instance. 

Open a new SQL tab by clicking on the icon indicated below.

Issue a CREATE DATABASE texdot; command in that new SQL tab. Do not forget the semicolon at the end of the line that indicates the end of the Structured Query Language (SQL) command.

Next, execute that create database command by hitting the second lightning bolt icon.  The second lightning bolt icon executes the SQL under the cursor.

You will see that the command has executed with the 1 row(s) affected message. 

Now we have to tell Workbench that we want to use that new database we just created. No, that does not automatically happen.  You have to select the little 'eyeball' icon on the schema list to reload the list of available schemas (schemas and databases are used interchangeably in the MySQL world).

Now scroll down in the schemas window until you see your new database.  Click on 'texdot' to tell Workbench that this is the database we want to use. When you click on it, you will see the active schema name underneath in green.

Then select the right pointing arrowhead to expand the information on the schema.

Right click on the Tables line and you will see the Table Data Import Wizard option.  

Now comes the neat stuff.

Enter the name of the file where the CSV data resides.


In my case it was in D:\tex\ and I was able to browse for it.

We want to create a new table in the texdot schema. 


And I was able to enter the name zone for the new table.


The wizard starts to work its magic.  It takes the data in the first row of the CSV file and converts the entries to column names. It then looks at the rest of the data file to determine the data types.


You will note that you can change them if you want, and it will automatically make a best guess for types like JSON. Select Next when you are ready to proceed.

You are now ready to import.  Select Show Logs from the lower left corner and then select Next.

The data is now uploaded.

The time to import is partially a function of the CSV file length but eventually you will get the Import data file finished message. Again select Next.

There are some statistics on the performance of the import that may be of use. In this case we imported just over 1,600 records. Select Finish.

Hit that eyeball icon again to reload the table list.  

And we can now use SQL to query the data. Enter SELECT * FROM texas_dzone LIMIT 5; and again select that second lightning bolt icon.  (Please note that in an earlier run I used the name 'zone' instead of 'texas_dzone'.)

And the results of that query.

In just a few moments the data went from being in a spreadsheet to being in a database.  No muss, no fuss.  In the future we will look at datasets that take a little more 'engineering' to get into the database, as there are cases where we, and not the wizard, have to do the 'heavy lifting'.  
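The round trip the wizard just performed can be sketched in a few lines of Python. This example uses the standard library's sqlite3 module purely as a stand-in database, and the table and columns are simplified stand-ins for the real zone data, but the CSV-in, SQL-out flow is the same idea:

```python
import csv
import io
import sqlite3

# A tiny stand-in for the exported CSV file
csv_data = """road_name,direction,vehicle_impact
SH-20,eastbound,some-lanes-closed
US-54,westbound,all-lanes-open
"""

rows = list(csv.DictReader(io.StringIO(csv_data)))

# Create a table matching the CSV header and load the rows
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE zone (road_name TEXT, direction TEXT, vehicle_impact TEXT)")
con.executemany(
    "INSERT INTO zone VALUES (:road_name, :direction, :vehicle_impact)", rows
)

# And we can now use SQL to query the data
for road, impact in con.execute("SELECT road_name, vehicle_impact FROM zone"):
    print(road, impact)
```

The wizard does the same three steps, create, load, and verify, just with a GUI in front and MySQL behind.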

More on MySQL Workbench

For more on MySQL Workbench, please see https://www.mysql.com/products/workbench/  to download and https://dev.mysql.com/doc/workbench/en/  for the documentation.

Tuesday, March 16, 2021

Is a Database Just a Fancy Spreadsheet?!?

 Want to make a data person groan?  Ask them if a database is just a fancy spreadsheet.  Then ask them to convert a spreadsheet to a database.  Add in that there are lots of funny calculations, odd cells, and company 'secret' formulas.

What is a Spreadsheet?

For those of you who did not see the changes brought by early personal computer spreadsheet packages like VisiCalc and Lotus 1-2-3 back in the 1980s: they probably sold more personal computers (who needs more than 640K of memory?) to businesses than any other application.  Accountants used to use real 'sheets' of paper 'spread' across their desks.  Summing a column by hand or with a ten-key calculator and then writing down the results all day long was a common task (or career).  Wikipedia describes a spreadsheet as a computer application for organization, analysis and storage of data in tabular form. Spreadsheets were developed as computerized analogs of paper accounting worksheets. The program operates on data entered in cells of a table. Each cell may contain either numeric or text data, or the results of formulas that automatically calculate and display a value based on the contents of other cells. A spreadsheet may also refer to one such electronic document.

Those spreadsheet power users who could program formulas, pivot data, and do other tricks were very important persons.  However, there was no way for two or more people to share the same spreadsheet, at least if you wanted to ensure the integrity of the data or the programmed formulas.  No transactions, locking, or anything like that (and to be honest, not many of the databases back then had those features either).  But the computer did the sums, standard deviations, depreciation, net present value, and other vital calculations, not a human.  

Converting a Spreadsheet to a Database

The limitations of a spreadsheet can choke the growth of a small business, and so you can hear pleas for conversion help on a regular basis.  

I am working on a new project to document how to do such a conversion.  Right now I am still looking for a good source of data (please share if you have one) in Excel format (or similar) to show how to convert the data, how to use the Import Wizard in MySQL Workbench (and the utilities in mysqlsh too), common problems in importing, checking the data for completeness, setting up structure, some normalizing, and then some query optimization.  

One example of a typical problem found in a spreadsheet to database conversion is the need to scan the columns for consistency.  If a column is supposed to be yes/no, y/n, true/false, or some binary 0/1 variation, it is almost inevitable to find an asterisk, an 'm' (for 'maybe'), or a 'See John for details on this'.  Cells in a spreadsheet are much more flexible than typed columns in a relational database.
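A hypothetical clean-up pass for such a column might look like the Python below; the mapping table and the 'REVIEW' bucket for oddballs are my own assumptions, not part of any conversion tool:

```python
def normalize_boolean(value):
    """Map the many spellings of yes/no found in spreadsheets to 1/0,
    flagging anything unrecognized for human review."""
    mapping = {
        "yes": 1, "y": 1, "true": 1, "1": 1,
        "no": 0, "n": 0, "false": 0, "0": 0,
    }
    cleaned = value.strip().lower()
    # '*', 'm', and 'See John for details' all land in the REVIEW pile
    return mapping.get(cleaned, "REVIEW")

print([normalize_boolean(v) for v in ["Yes", " N ", "*", "m", "See John for details"]])
# -> [1, 0, 'REVIEW', 'REVIEW', 'REVIEW']
```

The point is that the human-review pile has to exist; silently coercing 'See John' to false is how bad data gets laundered into a database.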

Over the next little while I will be posting the steps I am taking with this project.  Hopefully it can be the first in a series. Please let me know if you have input.

CSV or not to CSV

Comma Separated Values files are a common way of outputting data from a spreadsheet to another spreadsheet or another platform.  If you use Excel, your version may offer three variants of CSV to pick from.  Then you might need to mark the end of each line, the start and end of each cell, and other similar encodings.  
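Python's csv module makes those dialect differences easy to see; the delimiter and quoting choices below are illustrative, not tied to any particular Excel variant:

```python
import csv
import io

row = ["SH-20", 'lane "3" closed', "31.773454"]

# The same row rendered under three different CSV dialects
for name, opts in [("default", {}),
                   ("semicolon", {"delimiter": ";"}),
                   ("quote-all", {"quoting": csv.QUOTE_ALL})]:
    buf = io.StringIO()
    csv.writer(buf, **opts).writerow(row)
    print(name, "->", buf.getvalue().rstrip("\r\n"))
```

Note how the embedded double quote gets doubled; mismatched quoting rules between exporter and importer are one of the classic ways a CSV load goes sideways.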

JSON is another option but not for my version of Excel.  Nor LibreOffice nor Google Sheets.  And XML is not a favorite of mine.   

So CSV becomes the lowest common denominator.  (insert golf clap sound effect here)

CIA World Factbook

The CIA World Factbook is the basis of the famous MySQL world and world_x databases used in MySQL documentation, classes, and other literature.  It is still free, but a MySQL devotee would not recognize the majority of the data.  It has evolved a lot, and I am tinkering with using this data to show how to do the conversion, use the migration wizard, et cetera.

Next Time

Next time we will look at the spreadsheet data and the CSV.  And yes, as a preview of things to come, there are issues with the data.  But that is for the next time. 

Tuesday, March 2, 2021

Invisible MySQL?

 Is MySQL going invisible?  Invisible Indexes were included in MySQL 8.0 and now with version 8.0.23 we have Invisible Columns.

Indexes You Can Not See!

The value of an invisible index is that it lets you make an index disappear from the view of the optimizer.  In the distant days before 8.0, you would delete an index you were pretty much positive no query was using.  And then you would find out that not only was that index you just deleted necessary to everyone in the galaxy (except maybe you), but it was going to take some serious clock time to rebuild it. 

But with Invisible Indexes, you issue a command like ALTER TABLE t1 ALTER INDEX i_idx INVISIBLE; and it is removed from use.  Now you can run EXPLAIN on your queries and compare results.  And if you want that index back among the visible, ALTER TABLE t1 ALTER INDEX i_idx VISIBLE;  returns you to full functionality.

I do recommend making an index invisible as part of the process of decommissioning an index, similar to a soft delete of a column, to avoid a hurried recreation.  And the Sys Schema will show you indexes that have not been used; just make sure you observe a long enough time period to let those queries that only run once a week/month/quarter or longer show themselves.  
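If you want a starting list of candidates, the Sys Schema has a view for exactly this; a quick sketch (the schema name here is just an example):

```sql
-- Indexes with no recorded reads since statistics were last reset
SELECT *
  FROM sys.schema_unused_indexes
 WHERE object_schema = 'world';
```

Remember the caveat above: the statistics only cover activity since the last reset, so that rare quarterly report query may not have shown up yet.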

Columns You Can Not See

MySQL 8.0.23 now allows you to have columns you can sort of not see.  They are not really invisible or obfuscated, but those columns are harder to see.  If we create a table with an invisible column, we have to explicitly call out that column to see its values, as a * wildcard will not return them.

SQL > create table stuff (id serial, c1 int, c2 int invisible);
Query OK, 0 rows affected (0.0393 sec)
SQL > insert into stuff (c1, c2) values row(1,2), row(3,4), row(5,6);
Query OK, 3 rows affected (0.0073 sec)
Records: 3  Duplicates: 0  Warnings: 0
SQL > select * from stuff;  <- the wildcard, no invisible column
+----+----+
| id | c1 |
+----+----+
|  1 |  1 |
|  2 |  3 |
|  3 |  5 |
+----+----+
3 rows in set (0.0005 sec)
SQL > select id,c1,c2 from stuff; <- have to call c2 to see c2
+----+----+----+
| id | c1 | c2 |
+----+----+----+
|  1 |  1 |  2 |
|  2 |  3 |  4 |
|  3 |  5 |  6 |
+----+----+----+
3 rows in set (0.0005 sec)

So you could somewhat hide a column by using this method, but please do not call this secure.  This will allow you to add columns, say a primary key to a table lacking one, without having to worry about modifying existing queries.

And if you are creating a table from a table with an invisible column you need to explicitly reference it or you will not get that data.
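A quick sketch of that, reusing the stuff table from above (the copy's table name is my own):

```sql
-- CREATE TABLE ... SELECT * would skip the invisible c2;
-- name the columns explicitly to carry the data across
CREATE TABLE stuff_copy AS
SELECT id, c1, c2 FROM stuff;
```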

Limitations?


The first limitation is that not all columns can be invisible; at least one column must remain visible.  Invisible columns can be defined as NOT NULL and have DEFAULT values.  

What Else Is Going To Be Invisible?!


Well, in my job as a MySQL Community Manager, I do on rare occasion have access to software pre-launch, and I can tell you there are some really cool things in the product pipeline, but I can not let you see them yet. (sorry, very bad joke)

Friday, February 26, 2021

Searching JSON Data

 A post on the MySQL forums caught my eye today. The distilled question was why an index was not being used to search JSON data.  The poster (in blue) had done most of the needed steps almost correctly but just missed the crucial one. 

Suppose we have this table with an index on a JSON attribute, built with the functional index pattern (a cast of the extracted value):

```
CREATE TABLE applications (
  id BINARY(16) PRIMARY KEY,
  data JSON NOT NULL
) ENGINE=InnoDB;
CREATE INDEX idx ON applications ((CAST(data->>"$.name" AS CHAR(10))));
```
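For a functional index like this, the optimizer only considers the index when the query repeats the indexed expression; a sketch of a query that can use idx (my illustration, not the forum answer):

```sql
-- Repeat the exact indexed expression in the WHERE clause
SELECT id
  FROM applications
 WHERE CAST(data->>'$.name' AS CHAR(10)) = 'example';

-- A bare data->>'$.name' = 'example' comparison does not match
-- the indexed expression, so idx is skipped.
```

A collation mismatch between the utf8mb4 string that ->> returns and the connection's default collation can be another reason the index is passed over, so check EXPLAIN output after any change.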

Friday, February 5, 2021

MySQL & Friends FOSDEM Dev Room

 FOSDEM has become one of the biggest shows on the calendar of the MySQL Community Team.  FOSDEM itself, until this year, was barely contained chaos at a university in Brussels, with no pre-enrollment for any of the dozens of themed rooms and no head count.  This year the event is virtual, so you do not have to get to Belgium.


The MySQL and Friends Developer Room is Sunday and starts at 10:00 am Belgium time with a retrospective of the first twenty-five years of MySQL. This is followed by 15 other talks of about 25 minutes each on everything from ARM to user management. 


Title | Speaker | Start | End
25 years of MySQL – A Retrospective | Dave Stokes | 10:00 | 10:20
Open Source Database Infrastructure with Vitess | Shlomi Noach | 10:30 | 10:55
From single MySQL instance to HA: The journey to InnoDB Cluster | Frédéric Descamps | 11:00 | 11:25
Group Replication: Best Practices for Network Deploy | Aníbal Pinto | 11:30 | 11:55
Review of the Percona Operator from a DBA prospective | Marco Tusa (the Grinch) | 12:00 | 12:25
MySQL Router REST API | Frédéric Descamps | 12:30 | 12:55
Better User Management under MySQL 8.0 | Dave Stokes | 13:00 | 13:25
Running MySQL on ARM | Krunal Bauskar | 13:30 | 13:55
Making MySQL-8.0 XA transaction processing crash safe: The key to use MySQL as storage nodes for distributed databases | Wei Zhao | 14:00 | 14:25
Passing query attributes through the MySQL protocol: Extension to the wire format to allow data/metadata | Joro Kodinov | 14:30 | 14:55
Linux /proc filesystem for MySQL DBAs: Sampling /proc content for troubleshooting | Valerii Kravchuk | 15:00 | 15:20
Rewrite Your Complex MySQL Queries for Better Performance | Øystein Grøvlen | 15:30 | 15:55
Automatic Asynchronous Replication Connection Failover | Hemant Dangi | 16:00 | 16:25
Sure you can run your database in kubernetes: Successfully run your MySQL NDB Cluster in kubernetes | Bernd Ocklin | 16:30 | 16:55
18 Things To Do When You Have a MySQL Bottleneck | Peter Zaitsev | 17:00 | 17:25
Hitchhiker’s Guide to MySQL: Don’t panic. | Liz van Dijk | 17:30 | 17:55