Friday, August 2, 2019

Parallel Table Importer in MySQL Shell

The ability to import data in parallel is now a feature of the new MySQL Shell.  And you can run it from a command line instead of from within an interactive shell.

How Does It Work?

The importTable utility, new on MySQL Shell 8.0.17,  analyzes the input file, divides that file into chunks, and then uploads that data using parallel connections.  This is much faster than the LOAD DATA statement. You get the ability to define how the data is delimited as far a field and lines are defined.  And it works on DOS CSVs, Unix CSVs, TSVs, and JSON if that JSON is in one document per line mode. You also can adjust the number of threads, number of bytes sent per each chunk, and the maximum rate of data transfer per thread so you can balance the load on the network and the speed of data transfer.

The parallel table import utility works only  on the MySQL Classic Protocol and not the newer X Protocol. The X Protocol connections do not support LOAD DATA statements. The parallel table import utility makes use of the  LOAD DATA LOCAL INFILE statements to upload data chunks from the input file. Make sure that the data file you want to import is     in a location that is accessible to the client host as a local disk. And  local_infile system variable must be set to ON on the target server.

What to specify

Besides the name of your data file you can specify the schema, specify the table name, an array of column names to map to the table in cases where you data file does not have all the columns as the table, designate if you have unique keys (or if you desire duplicates), the termination of individual lines, the terminations of columns, what fields are enclosed by, the number of threads for uploading, the bytes per chunk, the maximum I/O rate, and the dialect of your data file (CSV, TSV, JSON). And you can get a status report on the progress.

From the Command Line Too

You can use util.inputTable from within an interactive shell or from a command line.  The following is the import of one million rows from a command line.

$ mysqlsh mysql://root@localhost --ssl-mode=DISABLED -- util import-table foo.csv --schema=test --table=foo
Importing from file 'foo.csv' to table `test`.`foo` in MySQL Server at /var/lib/mysql/mysql.sock using 3 threads
[Worker000] Records: 1496823  Deleted: 0  Skipped: 0  Warnings: 0
[Worker001] Records: 4204841  Deleted: 0  Skipped: 0  Warnings: 0
[Worker002] Records: 4298336  Deleted: 0  Skipped: 0  Warnings: 0
100% (117.80 MB / 117.80 MB), 490.81 KB/s
File 'foo.csv' (117.80 MB) was imported in 2 min 25.9817 sec at 806.94 KB/s
Total rows affected in Records: 10000000  Deleted: 0  Skipped: 0  Warnings: 0

For details and more examples please see the MySQL Shell Manual.