Currently, data is typically stored in the way that best suits reporting. As a result, an RDBMS data store, with indexed keys and neatly laid-out data whose relationships can be easily queried, has become the industry standard.
Polyglot persistence means choosing a data storage technology based on how the data is used by individual applications, or by components of a single application. For example, apart from relational data, data could be stored in a key/value store, in a columnar store, as documents, or as a graph when it captures relationships between multiple items.
So why was polyglot persistence not heard of before? The reason is that the different data persistence technologies have matured only in the last few years and now have some very successful large-scale implementations. Also, data is no longer used only by applications within one's own enterprise; it needs to be shared between services that may or may not be contained within a single application. The licensing cost of the likes of Teradata, Oracle and MS SQL is also a major factor in driving these alternatives to become mainstream.
KV Store: A typical use of a key/value store is saving shopping cart data, where each cart is stored as a single value under its own key.
Some of the main players for this kind of storage are Redis, Azure Table Storage, Riak, Memcached and Azure Cache, among many others. Hadoop's MapReduce model also processes data as key/value pairs, and many other NoSQL stores build on the key/value model as well.
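The shopping cart example can be sketched with a toy in-memory store. This is only an illustration of the key/value access pattern, not the API of Redis or any other product named above; the `KeyValueStore` class, the `add_to_cart` helper and the `cart:<user id>` key format are all assumptions made up for this sketch.

```python
import json


class KeyValueStore:
    """Toy in-memory key/value store (hypothetical, for illustration only).

    The store treats every value as an opaque string and can only look
    it up by its exact key.
    """

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)


def add_to_cart(store, user_id, sku, quantity):
    """Read the whole cart, update it, and write the whole cart back."""
    key = f"cart:{user_id}"  # assumed key naming convention
    cart = json.loads(store.get(key, "{}"))
    cart[sku] = cart.get(sku, 0) + quantity
    store.put(key, json.dumps(cart))
    return cart


store = KeyValueStore()
add_to_cart(store, "user-42", "sku-book", 1)
add_to_cart(store, "user-42", "sku-pen", 3)
cart = add_to_cart(store, "user-42", "sku-book", 1)
```

Because the store cannot look inside the value, the application always fetches and rewrites the cart as one unit. That is what keeps the model simple and easy to scale, and also why it suits data that is always accessed by a single known key.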
Columnar Store:
Apache Cassandra and Apache HBase are the major players in this space. Here data is stored in columnar format, and there are systems that can store more than 2 billion columns of data. Some systems also support sparse columns, as shown in the figure. An example is the BigTable system developed by Google. eBay uses HBase for its searches, firing approximately 2 million queries per second.
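To make the idea of sparse columns concrete, here is a minimal sketch in which each row stores only the columns it actually has. The `SparseColumnTable` class and the sample row keys and values are hypothetical and do not reflect how Cassandra, HBase or BigTable are implemented internally.

```python
from collections import defaultdict


class SparseColumnTable:
    """Toy wide-row table: a row holds only the columns that were written."""

    def __init__(self):
        # row key -> {column name -> value}
        self._rows = defaultdict(dict)

    def put(self, row_key, column, value):
        self._rows[row_key][column] = value

    def get(self, row_key, column, default=None):
        # Using .get() on the outer mapping avoids creating empty rows on read.
        return self._rows.get(row_key, {}).get(column, default)

    def columns(self, row_key):
        return sorted(self._rows.get(row_key, {}))


table = SparseColumnTable()
table.put("user:1", "name", "Ana")
table.put("user:1", "email", "ana@example.com")
table.put("user:2", "name", "Ben")
table.put("user:2", "phone", "555-0100")  # a column that user:1 does not have
```

No space is reserved for the missing `phone` of `user:1` or the missing `email` of `user:2`, which is what allows a table to have a very large number of possible columns.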
This kind of data store can be used for time series data. Data in a 2D row-store or column-store format exists only in theory. In reality, data has to be serialized onto the storage hardware in one form or another. Since the most expensive operations involving hard disks are seeks, related data should be stored in a way that minimizes the number of seeks, which improves performance.
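The difference between the two serializations can be sketched with a small table flattened into a one-dimensional sequence, standing in for the bytes on disk. The sample readings and the function names are made up for illustration; real systems add blocks, compression and indexes on top of this idea.

```python
# A tiny time series table: (timestamp, sensor, temperature). Values are invented.
rows = [
    ("2024-01-01T00:00", "sensor-a", 20.5),
    ("2024-01-01T00:01", "sensor-a", 20.7),
    ("2024-01-01T00:02", "sensor-a", 21.0),
]


def serialize_row_store(rows):
    """Lay out the table row after row."""
    return [value for row in rows for value in row]


def serialize_column_store(rows):
    """Lay out the table column after column."""
    return [value for column in zip(*rows) for value in column]


def positions_of_column(layout, n_rows, n_cols, col):
    """Offsets in the flat sequence that hold the values of one column."""
    if layout == "row":
        return [r * n_cols + col for r in range(n_rows)]
    return [col * n_rows + r for r in range(n_rows)]


row_layout = serialize_row_store(rows)
column_layout = serialize_column_store(rows)

# Where do the temperatures (column index 2) end up in each layout?
row_positions = positions_of_column("row", len(rows), 3, 2)
column_positions = positions_of_column("column", len(rows), 3, 2)
```

In the row layout the temperatures sit at offsets 2, 5 and 8, scattered between other fields, while in the column layout they occupy the contiguous offsets 6, 7 and 8. A query that scans one column over many rows therefore touches one contiguous region in the column layout, which is the seek-minimizing property described above.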
Document Store:
Like a key/value store, this type of store maps a key to a value, but here the value is a document. This type of database does not require a schema. Documents can be heterogeneous and may be organised in collections or databases. MongoDB, Apache CouchDB and RavenDB are the most popular data stores of this type.
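A minimal sketch of a schema-free collection holding heterogeneous documents follows. The `DocumentCollection` class, its `insert`, `get` and `find` methods and the sample documents are assumptions for illustration only; they are not the API of MongoDB, CouchDB or RavenDB.

```python
import uuid


class DocumentCollection:
    """Toy document collection: key -> document, with no fixed schema."""

    def __init__(self):
        self._docs = {}

    def insert(self, document):
        doc_id = str(uuid.uuid4())  # generated key for the document
        self._docs[doc_id] = dict(document)
        return doc_id

    def get(self, doc_id):
        return self._docs.get(doc_id)

    def find(self, **criteria):
        """Return documents whose fields equal all the given criteria."""
        return [
            doc
            for doc in self._docs.values()
            if all(doc.get(field) == value for field, value in criteria.items())
        ]


products = DocumentCollection()
book_id = products.insert({"type": "book", "title": "Graph Basics", "pages": 300})
shirt_id = products.insert({"type": "shirt", "size": "M", "colors": ["red", "blue"]})

books = products.find(type="book")
```

The two documents share only the `type` field, yet both live in the same collection. Unlike the opaque values of a key/value store, the fields of a document are visible to the store, so it can answer queries such as `find(type="book")`.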
Graph Store:
This type of store is applicable where multiple nodes have interconnections (edges), as shown below. One example is the relationship between products purchased and recommendations. Another is a person and their colleagues, friends, likes, books purchased and products purchased.
Neo4j is quite popular as a graph store. OrientDB is another one. Titan is a graph database that runs on top of storage backends such as Apache Cassandra and Apache HBase (which itself runs on Hadoop).
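The purchase and recommendation example can be sketched as a small labelled graph that is traversed from a person to their friends and on to the friends' purchases. The `Graph` class, the edge labels `FRIEND_OF` and `PURCHASED`, and the sample people and products are all hypothetical; a real graph database would express the same traversal in its own query language.

```python
from collections import Counter, defaultdict


class Graph:
    """Toy directed graph with labelled edges."""

    def __init__(self):
        # (source node, edge label) -> set of target nodes
        self._edges = defaultdict(set)

    def add_edge(self, source, label, target):
        self._edges[(source, label)].add(target)

    def neighbors(self, source, label):
        return set(self._edges.get((source, label), set()))


def recommend(graph, person):
    """Rank products bought by friends that the person has not bought yet."""
    owned = graph.neighbors(person, "PURCHASED")
    scores = Counter()
    for friend in graph.neighbors(person, "FRIEND_OF"):
        for product in graph.neighbors(friend, "PURCHASED") - owned:
            scores[product] += 1
    # Most frequently purchased first; ties broken alphabetically.
    return [p for p, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]


g = Graph()
g.add_edge("alice", "FRIEND_OF", "bob")
g.add_edge("alice", "FRIEND_OF", "carol")
g.add_edge("alice", "PURCHASED", "laptop")
g.add_edge("bob", "PURCHASED", "laptop")
g.add_edge("bob", "PURCHASED", "mouse")
g.add_edge("carol", "PURCHASED", "mouse")
g.add_edge("carol", "PURCHASED", "keyboard")

suggestions = recommend(g, "alice")
```

Here `suggestions` is `["mouse", "keyboard"]`: the mouse was bought by two friends and the keyboard by one, while the laptop is excluded because alice already owns it. Following edges like this is cheap in a graph store, whereas a relational schema would need repeated joins for the same question.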
Comparison on scalability and complexity of Data Stores
The following graph gives a comparative analysis of scalability vs. design complexity for each type of data storage, and can help in deciding when each of these data stores should be used.
This does not mean that RDBMS is out of the game. Relational databases will still be relevant wherever transactions need to be stored. Each of these data stores has its own limitations. Based on the application's needs, data size and data design, you can decide which data store to use in your application. The decision can also depend on where the application will be hosted: on premises or in the cloud.