Tuesday, March 16, 2021

Is a Database Just a Fancy Spreadsheet?!?

 Want to make a data person groan?  Ask them if a database is just a fancy spreadsheet.  Then ask them to convert a spreadsheet to a database.  Add in that there are lots of funny calculations, odd cells, and company 'secret' formulas.

What is a Spreadsheet?

For those of you who did not see the changes brought early personal computer spreadsheet software packages like VisiCalc and Lotus 1-2-3 back in the 1980's, they probably sold more personal computers (who needs more the 640K of memory?) to businesses than any other application.  Accountants used to use real 'sheets' of paper 'spread' across their desks.  Summing a column by hand or with a 10 key calculator and then writing down the results all day long was a common task (or career).  Wikipedia describes a spreadsheet as a computer application for organization, analysis and storage of data in tabular form. Spreadsheets were developed as computerized analogs of paper accounting worksheets. The program operates on data entered in cells of a table. Each cell may contain either numeric or text data, or the results of formulas that automatically calculate and display a value based on the contents of other cells. A spreadsheet may also refer to one such electronic document.

Those spreadsheet power users who could program formulas, pivot data, and other tricks were very important persons.  However there was no way for two or more people to share the same spreadsheet. At least if you wanted to ensure the integrity of the data or programmed formulas.  No transactions, locking, or anything like that (and to honest not many of the databases back then had those features either).  But the computer did sums, standard deviations, calculated depreciation, net present value, and other vital calculations, not a human.  

Converting a Spreadsheet to a Database

The limitations of a spreadsheet can choke the growth of a small business and so there you can hear pleas for conversion help on a regular basis.  

I am working on a new project  to document how to do such a conversion.  Right now I am still looking at a good source of data (please share if you have a good one) in Excel format (or similar) to show how to convert the data, how to use the Import Wizard in MySQL Workbench (and the utilities in mysqlsh too), common problems in import, checking the data for completeness, setting up structure, some normalizing, and then some query optimization.  

One example of a typical problem found in a spreadsheet to database conversion is the need to scan the columns for consistency.  If a column is supposed to yes/no, y/n, true/false, or some binary 0/1 variation it is almost inevitable to find an asterixis, a 'm' (for 'maybe'), or the 'See John for details on this'.  Cells in a spreadsheet are much more flexible than typed column in a relational databases.

I will be posting over the little while the steps I am taking with this project.  Hopefully it can be the first step is a series. Please let me know if you have input.

CSV or not to CSV

Comma Separated Variables are common way of outputting data from a spreadsheet to another spreadsheet or another platform.  If you use Excel you version may offer three versions of CSV to pick from.  Then you might need to demark the end of each line, the start & end of each cell, and other similar encodings.  

JSON is another option but not for my version of Excel.  Nor LibreOffice nor Google Sheets.  And XML is not a favorite of mine.   

SO CSV becomes the least common form factor.  (insert golf clap sound effect here)

CIA World Factbook

The CIA World Factbook is the basis of the famous MySQL world and world_x databases used in MySQL documentation, classes, and other literature.  It is still free but a MySQL devotee would not recognize the majority of the data.  It had evolved a lot and I am tinkering with using this data to show how to do the conversion, use the migration wizard, at etcetera.  And it is free.

Next Time

Next time we will look at the spreadsheet data and the CSV.  And yes, as a preview of things to come, there are issues with the data.  But that is for the next time.