Wednesday, March 24, 2021

From Spreadsheet to Database with MySQL Workbench

In the last post I covered some of the many reasons to migrate data from a spreadsheet to a database and now it is time to show how to do just that.  Moving data from a spreadsheet into MySQL can actually be pretty simple in many cases. Now I will not mislead you by telling you that all cases are easy but there is an easy way to make this migration.  And I highly recommend the following process when possible as it can save lots of time. More painful migrations will be covered in the future but this post is my favorite 'fast and easy' process.

This is a guide to taking data from a spreadsheet and easily moving that data into a MySQL database instance. The trick is the Import Wizard in MySQL Workbench that does a lot of the detail work for you. In a future post we will go into what you have to do when you are not using Workbench and have many more steps to take to get the job done.

You will find the data for this example at https://catalog.data.gov/dataset/texas-dot-txdot-work-zone-data  and this dataset provides lane closure occurrences within the Texas Department of Transportation (TxDOT) highway system in a tabular format.  By the way, Texans love to argue about who has the best BBQ and freeway construction. This is a continuously updatid archive of the TxDOT WZDx feed data can be found at ITS WorkZone Raw Data Sandbox and the ITS WorkZone Semi-Processed Data Sandbox. The live feed is currently compliant with the Work Zone Data Exchange (WZDx) Specification version 2.0.  But most importantly for this blog is that it is a good example dataset.  So please download it and import it into the spreadsheet of your choice.

1. Export Data

There are many ways to covert the data from a spreadsheet to something MySQL can consume.  The easiest is to save it in CSV or Comma Separated Variables were every column of data has a comma between it and the next column.  In Sheets you will use FILE->Download->CSV , Excel is File->Save As->CSV, and other spread sheet software will have something similar.  Please not that for most programs this exports the current sheet and not the entire spreadsheet. 

road_event_feed_info_feed_update_date,road_event_feed_info_version,road_event_id,subidentifier,road_name,road_number,direction,beginning_cross_street,ending_cross_street,beginning_milepost,ending_milepost,beginning_accuracy,ending_accuracy,start_date,end_date,start_date_accuracy,end_date_accuracy,event_status,total_num_lanes,vehicle_impact,workers_present,reduced_speed_limit,restrictions,description,issuing_organization,creation_date,update_date,types_of_work,lanes,geometry_linestring,geometry_multipoint

03/17/2021 04:58:33 PM,2,ELP_2019-08-01_00014+2021-03-17,ELP,SH-20,,eastbound,US-54,,,,Estimated,Estimated,03/17/2021 12:00:00 AM,03/17/2021 11:59:59 PM,Estimated,Estimated,,3,some-lanes-closed,,,[],2019 - 08/01 On-Going Closure - Right lane closure,TxDOT,,,[],"[{""lane_edge_reference"": ""left"", ""lane_number"": 1, ""lane_status"": ""open"", ""lane_type"": ""left-lane""}, {""lane_edge_reference"": ""left"", ""lane_number"": 2, ""lane_status"": ""open"", ""lane_type"": ""middle-lane""}, {""lane_edge_reference"": ""left"", ""lane_number"": 3, ""lane_status"": ""closed"", ""lane_type"": ""right-lane""}]",,MULTIPOINT ((-106.445503 31.773454))

The data above shows only the first two rows of data.  The first row of data has the column headings and the second row is sample data.  Workbench can read the first row and use those columns names to create the column names for the table in the database.  It will read the other lines in the table to make an educated guess on the data types for the columns of the table.

2. Import Data

Start MySQL Workbench and login your to instance. 

Open a new SQL tab by clicking on the icon indicated below.









Issue a CREATE DATABASE texdot; command in that new SQL tab. Do not forget the semicolon at the end of the line that indicates the end of the Structured Query Language (SQL) command.








Next execute that create database command by hitting the second lightening bolt icon.  The second lightening bolt icon executes the SQL under the cursor.






You will see that the command has executed with the 1 row(s) effected message. 










Now we have to tell Workbench that we want to use that new database we just created. No, that does not automatically happen.  You have to select the little 'eyeball' icon on schema list to reload the list of available schemas (schemas and databases are used interchangeably in the MySQL world).


 




Now scroll down in the schemas window until you see your new database.  Click on 'texdot' to tell Workbench that this is the database we want to use. When you click on it you will see underneath the active schema name in green.











Then select the right pointing arrowhead to expand the information on the schema.










Right click on the Tables line and you will see the Table Data Import Wizard option.  











Now comes the neat stuff.









Enter the name of the file where the CSV data resides.


In my case it was in D:\tex\ and I was able to browse for it.

We want to create a new table in the texdot directory 


And I was able to enter the name zone for the new table.


The wizard starts to work it's magic.  It takes the data in the first row of the CSV file and converts them to file names. It then looks at the rest of the datafile to determine data type.


You will note that you can change them if you want and it will automatically make a best for types like JSON. Select Next when you are ready to proceed.

















You are now ready to import.  Select Show Logs from the lower left corner and then select Next.

















The data is now uploaded.

















The time to import is partially a function of the CSV file length but eventually you will get the Import data file finished message. Again select Next.

















There are some statistics on the performance of the import that may be of use. In this case we import just over 1600 records. Select Finished.

















Hit that eyeball icon again to reload the table table.  








And we can now use SQL to query the data. Enter SELECT * FROM texas_dzone LIMIT 5; and again select that second lightening bolt icon.  (please note that in an earlier run is uesed the name 'zone' instead of 'texas_dzone')












And the results of that query.









In just a few moments the data went from being in a spreadsheet to being in a database.  No muss, no fuss.  In the future we will look at datasets that take a little more 'engineering' to get into the database as there are cases where we and not the wizard have to do the 'heavy lifting'.  

More on MySQL Workbench

For on MySQL Workbench please see https://www.mysql.com/products/workbench/  to download and https://dev.mysql.com/doc/workbench/en/  for the documentation.