Wednesday, March 27, 2019

MySQL 8.0 Geographic Information Systems (GIS) - How to get the distance between two cities


MySQL before version 5.7 had less than stellar Geographic Information Systems (GIS) support.  In version 5.7 the Boost.Geometry two dimensional or 2D libraries were added.  And with 8.0 came the three dimensional or 3D libraries. But how do you use these features?

I would like to state up front that this is all new to me and this is an attempt for me to document what it takes to go from zero knowledge on GIS to something hopefully better.  What I want to do as an exercise is to get the distance between two places from their longitude and latitude, say two cities near where I live.  So what do we have to do to accomplish that?

It is actually easy with the functions provided if we have the longitude and latitude in an SRID 4326 format.

SELECT ST_Distance(
 (SELECT loc FROM cities WHERE name = 'Trondheim'),
 (SELECT loc FROM cities WHERE name = 'San Francisco')
) AS distance_in_meters;

Okay, but what is an SRID 4326 format, where do I get them, and how do I get data into that format? So let us dig deeper.

I have to thank Norvald Ryeng who is a senior software manager at Oracle for that query roughly transposed by me from his MySQL GIS 8.0 Overview.
Harrison's Clocks were the first successful one to be used by the Royal Navy for navigation and can be seen at the Royal Observatory, Greenwich, United Kingdom. Very interesting story of a man on the cutting edge of his technology for the time who had to overcome so many technical and political problems.

Geometry Class: Background


Geometric data is stored in a geometry data type. Each geometry has a type, a SRID, coordinates, and some other attributes that will be ignored for now. And we will mainly deal with the POINT class.  Think of a point as someplace on a map. We will skip CURVE, LineString, Surface, Polygons, GeometryCollection, MultiPoint, MultiCurve, MultiLineString, MultiSurface, and MultiPolygon.

Columns with a spatial data type have a SRID attribute to indicate the spatial reference system, or SRS, used to store data in that column.  There are SRSs for projecting a globe onto a flat surface (think Mercator Projection maps of the world), non projected representing longitude-latitude on an ellipsoid. In MySQL a SRID value of 0 represents an infinite flat Cartesian plane with no units assigned to the axes and is the default.  Now a group named the European Petroleum Survey Group has their own system with a value of 4326, which you will see later.

Coordinates are represented as double-precision (8-byte) numbers.  These number pairs are either planar (flat) or geodetic (think Earth's surface) and the same pair values from one system can be wildly different than the other.

In the query above we used a column named loc as the point where the cities were located.  You can see the details in Norvald's excellent slides or take my word that these queries were used in a column defined as loc POINT SRID 4326 NOT NULL for now.

INSERT INTO cities (name, loc) VALUES (
 'Trondheim',
 ST_GeomFromText('POINT(64.43048320193547 10.394972698312927)', 4326)
);
INSERT INTO cities (name, loc) VALUES (
 'San Francisco',
 ST_GeomFromText('POINT(37.272615666666667 -122.4425455194445)', 4326)
);

Now Wikipedia says Trondheim is at 63 degrees 25"47'N and 10 degrees 23"36'E which looks pretty close to the numbers above.  And the Nidaros Cathedral is listed as 63.4305 degrees North latitude and 10.3951 degrees East longitude. So saying POINT('latitude' 'longitude') seems to be what I am looking for to designate a location.

To peel back another layer of the onion, lets look at what ST_GeomFromText() does.  The states thusly that ST_GeomFromText 'Constructs a geometry value of any type using its WKT representation and SRID.'

Okay, that cleared up nothing. So what the heck is a WKT representation? The MySQL Manual has a section on Functions That Create Geometry Values from WKT Values.  The Well-Known Text (WKT) representation of geometry values is designed for exchanging geometry data in ASCII form. The OpenGIS specification provides a Backus-Naur grammar that specifies the formal production rules for writing WKT values.  So 'POINT(15 20)' is a WKT.

This is not getting easier, is it?  And we still do not know how to get the distance between two sets of longitude and latitude.

Getting Some Data


I was able to find a very nice list of US Zip Codes (postal codes) at https://gist.github.com/erichurst/7882666/ and used the MySQL Workbench data import agent to create a table. Just three columns ZIP for the Zip Code, LAT for latitude, and LNG for longitude. I stored the ZIP filed as TEXT (leading zeros that are significant disappear if the column in an INT) and the other two fields as REALs.  And there are 33,144 rows of data.

Lets look at the distance between the lovely cities of Justin and Trophy Club, both in the state of Texas. Justin is zip code 76247 and Trophy Club is 76262

 mysql> select * from data where ZIP in (76247, 76262);
+-------+-----------+------------+
| ZIP   | LAT       | LNG        | +-------+-----------+------------+
| 76247 | 33.099993 | -97.340499 |
| 76262 | 33.009335 |  -97.22672 | +-------+-----------+------------+
2 rows in set (0.03 sec)
 
And lets try a query.


mysql> select 
 st_distance(
  st_geomfromtext('point(33.099993 -97.34099)',4326),                     
  st_geomfromtext('point(33.009335 -97.22672)',4326)
  )  as distance;

+--------------------+


| distance           |
+--------------------+
| 14662.554615102901 |
+--------------------+
1 row in set (0.00 sec)



So there is roughly 15,000 meters between the cities of Justin and Trophy Club. And please note this is a flat model not one that takes in the curvature of the earth.  But we can get that easily enough by using ST_Distance_Sphere.

mysql> select st_distance_sphere(
    st_geomfromtext('point(33.099993 -97.34099)',4326),
    st_geomfromtext('point(33.009335 -97.22672)',4326)) 

   as sphere,
    st_distance(

     st_geomfromtext('point(33.099993 -97.34099)',4326),
     st_geomfromtext('point(33.009335 -97.22672)',4326)) 

    as flat;
+--------------------+--------------------+
| sphere             | flat               |
+--------------------+--------------------+
| 14664.184945803418 | 14662.554615102901 |
+--------------------+--------------------+
1 row in set (0.00 sec)







So even in the fairly flat area of North Texas, there is some difference between a spherical and flat world.