Sunday, June 17, 2018

MongoDB versus MySQL Document Store Command Comparisons III

This time we will look at the differences in updating records between MongoDB and the MySQL Document Store.  Syntactically they are pretty different.  I am still following the Getting Started With MongoDB article for example queries.

Updating Records

In Mongo we update thusly:
> db.restaurants.update(
... { "name" : "Juni" },
... {
...  $set: { "cuisine" : "American (new)" },
...  $currentDate: { "lastModified" : true }
... }
... )
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })

The same update in the MySQL Document Store can be a lot different. We could update using SQL or NoSQL. I would like to update the document with the change to the cuisine and set the last modified field to the current date. The first change is pretty simple:

db.restaurants.modify("name = 'Juni'").set("cuisine","American (new)")

But what about that lastModified value? On the SQL side that would be the value of now(), but the MySQL NoSQL side does not have that concept, at least it is not documented. And lastModified is not in the document itself?!? Is it metadata hiding somewhere, perhaps buried in the Mongo documentation??

Well, after some frustration with searching for document metadata and timestamps with Mongo documents, I decided to circle back to this later.
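For what it is worth, the effect of Mongo's $set plus $currentDate can be sketched in plain JavaScript against an in-memory document. This is a sketch of the semantics only, on a made-up record, not a driver or shell call:

```javascript
// A stand-in document, shaped like the restaurant records above.
const doc = { name: "Juni", cuisine: "Italian" };

// $set: overwrite (or create) the named field.
doc.cuisine = "American (new)";

// $currentDate: stamp the field with the current date/time.
// Mongo stores a BSON date; new Date() is the JS equivalent.
doc.lastModified = new Date();

console.log(doc.cuisine);                      // "American (new)"
console.log(doc.lastModified instanceof Date); // true
```

Note that $currentDate simply writes a normal field into the document, which is why you would expect to see lastModified when you read the record back.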

Updating Embedded Fields

Updating embedded fields can be a mess, but both products can handle this operation. The Mongo version is:

> db.restaurants.update(
... { "restaurant_id" : "41156888" },
... { $set: { "address.street": "East 31st Street" }}
... )

And the MySQL Document Store's version:

db.restaurants.modify("restaurant_id ='41156888'").set("address.street","East 31st Street")

So both do what is wanted but have much different syntax.
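Under the hood, both commands walk a dotted path such as "address.street" down into the document. A minimal plain-JavaScript sketch of that behavior follows; the setPath helper is my own name for illustration, not either product's API:

```javascript
// Set a dotted path such as "address.street" inside a document,
// creating intermediate objects as needed (hypothetical helper).
function setPath(doc, path, value) {
  const keys = path.split(".");
  let node = doc;
  for (const key of keys.slice(0, -1)) {
    if (typeof node[key] !== "object" || node[key] === null) node[key] = {};
    node = node[key];
  }
  node[keys[keys.length - 1]] = value;
  return doc;
}

const restaurant = { restaurant_id: "41156888", address: { street: "E 31 St" } };
setPath(restaurant, "address.street", "East 31st Street");
console.log(restaurant.address.street); // "East 31st Street"
```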

Updating Multiple Documents

Changing multiple records can be tricky, especially for novices or those learning new software. Usually it is best for novices to run the SQL version of the old MySQL shell in --i-am-a-dummy mode, where forgetting a WHERE clause can be disastrous. But the new MySQL shell does not have this option, and Mongo forces you to expressly state that you want to change multiple records. Here is a distinction between the two products:

> db.restaurants.update( 
   { "address.zipcode" : "10016", "cuisine" : "Other"}, 
   { $set: { "cuisine" : "Category to be determined"}},
   { multi: true }
   )
WriteResult({ "nMatched" : 20, "nUpserted" : 0, "nModified" : 20 })

Mongo will update only one record unless multi is set to true.

MySQL has no such limitation and will change all matching records.

db.restaurants.modify('address.zipcode = "10016" and cuisine = "Other"').set('cuisine','TBD')
Query OK, 20 items affected (0.2997 sec)
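The difference in defaults can be sketched against a plain array of documents. Again this is just the semantics, on made-up data, not either shell's API:

```javascript
const docs = [
  { name: "A", cuisine: "Other", zipcode: "10016" },
  { name: "B", cuisine: "Other", zipcode: "10016" },
  { name: "C", cuisine: "Thai",  zipcode: "10016" },
];

const matches = (d) => d.zipcode === "10016" && d.cuisine === "Other";

// Mongo's default: stop after the first matching document.
function updateOne(list) {
  const hit = list.find(matches);
  if (hit) hit.cuisine = "TBD";
  return hit ? 1 : 0;
}

// multi: true (and the MySQL Document Store default): update every match.
function updateMulti(list) {
  const hits = list.filter(matches);
  hits.forEach((d) => { d.cuisine = "TBD"; });
  return hits.length;
}

const singly = JSON.parse(JSON.stringify(docs)); // work on copies
const every  = JSON.parse(JSON.stringify(docs));
console.log(updateOne(singly));  // 1 document changed
console.log(updateMulti(every)); // 2 documents changed
```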

I am also a fan of the explicit and in the MySQL query and not much of a fan of the implied and in the Mongo query. Why? When you try to debug things at two in the morning, it is very easy to assume an or or some other logical operator. When you program in assembler you get picky about things like or, xor, and and.

Picking Output Keys

You may not want all the keys and values from a document every time you dive into the data. Specifying specific keys again shows the differences in syntax between the two products.

> db.restaurants.find( { "name" : "Pizza Plus" }, 
   { name : 1 , borough : 1})
{ "_id" : ObjectId("5b2293b4f46382c40db8264f"), "borough" : "Brooklyn", "name" : "Pizza Plus" }
{ "_id" : ObjectId("5b2293b6f46382c40db86fb0"), "borough" : "Manhattan", "name" : "Pizza Plus" }
{ "_id" : ObjectId("5b2293b6f46382c40db8854b"), "borough" : "Brooklyn", "name" : "Pizza Plus" }

You get the _id whether you want it or not. MySQL only gives you the desired values for the specified keys:


db.restaurants.modify('name = "Pizza Plus"').fields('name','borough')
{
        "borough": "Brooklyn",
        "name": "Pizza Plus"
}
{
        "borough": "Manhattan",
        "name": "Pizza Plus"
}
{
        "borough": "Brooklyn",
        "name": "Pizza Plus"
}
3 documents in set (0.0486 sec)
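That projection behavior, including Mongo's always-on _id, can also be sketched in plain JavaScript. The project helper below is hypothetical, and the sample _id is made up:

```javascript
// Keep only the requested keys from a document (hypothetical helper).
// includeId mimics Mongo's default of returning _id unless it is
// explicitly suppressed with { _id: 0 } in the projection.
function project(doc, keys, includeId) {
  const out = {};
  if (includeId && "_id" in doc) out._id = doc._id;
  for (const k of keys) {
    if (k in doc) out[k] = doc[k];
  }
  return out;
}

const doc = { _id: "some-object-id", name: "Pizza Plus",
              borough: "Brooklyn", cuisine: "Pizza" };

// Mongo-style: _id tags along unless suppressed.
console.log(project(doc, ["name", "borough"], true));
// MySQL-style: .fields() returns exactly what you asked for.
console.log(project(doc, ["name", "borough"], false));
```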

So those extra characters are not so bothersome with small data sets, but when you have millions of lines or more, that extra data really adds overhead.