Big Data – NoSQL versus SQL Databases – MongoDB versus Drizzle


(T)  While database engineers of the last two decades were trained to develop with SQL databases, the new generation of engineers is trained to develop Big Data applications with NoSQL databases. NoSQL databases are designed to handle very large amounts of data on distributed architectures, with a resiliency that legacy SQL databases struggle to match. NoSQL databases have been pioneered by many Internet companies, in particular Google, Facebook, and Amazon. At Google I/O 2012, Google presented a deep-dive analysis of the use cases for NoSQL versus SQL databases.

Among the NoSQL databases that are gaining market traction, MongoDB is one of the most popular. MongoDB is developed by 10gen and is written in C++.

MongoDB provides a data model that stores rich objects as hierarchical documents, encoded in a binary JSON-like format called BSON, rather than as rows split across multiple tables as in a traditional SQL database. As the data model evolves, existing documents are expanded instead of new rows, tables, and columns being added. As a result, transactions remain simple even as the data model evolves. If, for example, ten new fields are added to a document, that document is still fetched with a single query, with no additional joins.
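To make the difference concrete, here is a minimal sketch in plain Python of the same blog post modeled both ways. It does not use MongoDB or any driver: the lists stand in for SQL tables, the dictionary stands in for a document collection, and all names and sample values are hypothetical.

```python
# Hypothetical data: the same blog post modeled two ways.

# Relational style: the post is split across three "tables" (lists of rows).
posts = [{"post_id": 1, "title": "NoSQL versus SQL"}]
tags = [{"post_id": 1, "tag": "big-data"}, {"post_id": 1, "tag": "back-end"}]
comments = [{"post_id": 1, "author": "alice", "text": "Nice overview"}]


def fetch_post_relational(post_id):
    """Reassemble one post by joining rows from the three tables."""
    post = next(p for p in posts if p["post_id"] == post_id)
    return {
        "title": post["title"],
        "tags": [t["tag"] for t in tags if t["post_id"] == post_id],
        "comments": [
            {"author": c["author"], "text": c["text"]}
            for c in comments
            if c["post_id"] == post_id
        ],
    }


# Document style: everything about the post lives in one nested document.
collection = {
    1: {
        "title": "NoSQL versus SQL",
        "tags": ["big-data", "back-end"],
        "comments": [{"author": "alice", "text": "Nice overview"}],
    }
}


def fetch_post_document(post_id):
    """Fetch one post with a single lookup by key."""
    return collection[post_id]


# Both models hold the same information at this point.
same = fetch_post_relational(1) == fetch_post_document(1)

# Adding a field only expands the document; the lookup code is unchanged.
collection[1]["language"] = "en"
```

In the relational version, each new kind of nested data typically means another table and another join in the fetch function, whereas the document version keeps a single lookup by key.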

The second major attribute of MongoDB is its horizontal scaling. Instead of growing the database by moving to bigger servers, the database has been designed to operate across multiple servers. This also allows transparent upgrades to the latest server technologies while keeping the existing data models.
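The idea of spreading data over several servers can be sketched with a simple hash-based placement. This is only an illustration of the general principle, not MongoDB's actual sharding mechanism; the server names and the `pick_server` and `distribute` helpers are assumptions made for the example.

```python
import hashlib

# Hypothetical server names; a real cluster would use host addresses.
SERVERS = ["server-a", "server-b", "server-c"]


def pick_server(key, servers=SERVERS):
    """Map a document key to one server using a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return servers[int(digest, 16) % len(servers)]


def distribute(keys, servers=SERVERS):
    """Group document keys by the server that would store them."""
    placement = {server: [] for server in servers}
    for key in keys:
        placement[pick_server(key, servers)].append(key)
    return placement


# Place twelve hypothetical user documents on the three servers.
placement = distribute([f"user-{i}" for i in range(12)])
```

A plain modulo scheme like this one moves many keys when a server is added or removed, which is one reason production systems rely on more elaborate partitioning schemes.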

The third and last attribute of MongoDB is its ability to run in cloud environments on virtualized infrastructure. On-demand virtual machines can be added to increase capacity or to compensate for the limited performance of individual server nodes.

Besides MongoDB and its document-based architecture, there are a number of other emerging NoSQL databases, in particular:
 CouchDB, another open source database with a document-oriented architecture like MongoDB, which uses JSON for documents, JavaScript for MapReduce queries, and HTTP for requests to the data store;
 Cassandra, an open source database system started at Facebook to handle big data workloads across multiple data centers, where servers can be added or removed on an ongoing basis; Cassandra can be considered as having a node-based architecture;
 Neo4j, an open source database developed by Neo Technology that is among the most popular NoSQL graph databases. The data model for graph databases is a graph structure very similar to legacy object-oriented databases’ data models.
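
To give a feel for the graph data model mentioned in the last item, here is a minimal sketch in plain Python. It does not use Neo4j or its query language; the nodes, the `KNOWS` relationship, and the helper functions are all hypothetical names chosen for the illustration.

```python
# Hypothetical people and relationships, stored as a tiny graph.
nodes = {
    "alice": {"label": "Person"},
    "bob": {"label": "Person"},
    "carol": {"label": "Person"},
}

# Each edge is (source node, relationship type, target node).
edges = [
    ("alice", "KNOWS", "bob"),
    ("bob", "KNOWS", "carol"),
]


def neighbors(node, rel_type):
    """Return the nodes reached from `node` over edges of type `rel_type`."""
    return [dst for src, rel, dst in edges if src == node and rel == rel_type]


def friends_of_friends(node):
    """Follow KNOWS edges two hops out, skipping the starting node."""
    result = []
    for friend in neighbors(node, "KNOWS"):
        for fof in neighbors(friend, "KNOWS"):
            if fof != node and fof not in result:
                result.append(fof)
    return result
```

Calling `friends_of_friends("alice")` walks from alice to bob and then to carol. In a relational model the same question would usually be answered by joining a relationship table with itself.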

But if you need to work with a SQL database, you might want to consider Drizzle, which was forked from the popular open source MySQL database. The Drizzle team removed all non-essential code, refactored the remaining MySQL code into a plugin-based architecture, and modernized the code base by moving it to C++.

Copyright © 2005-2012 by Serge-Paul Carrasco. All rights reserved.
Contact Us: asvinsider at gmail dot com.

Categories: Back-End, Big Data