Relational vs. Key-Value Stores

An RDBMS (relational database management system) is a database management system that represents data using the relational model. Data and the relations between them are defined as tables, which consist of rows and columns. A relational database has a defined data model (schema) along with various constraints. SQL is used to retrieve data from the database, and SQL queries can be complex, involving integrity constraints and joins.

Distributed key-value stores hold data as a simple hash table in which each key maps to a value. A search is performed on the key and returns the value. The value is stored either as a binary object or in a semi-structured format such as JSON.

“SQL databases are like automatic transmission and NoSQL databases are like manual transmission. Once you switch to NoSQL, you become responsible for a lot of work that the system takes care of automatically in a relational database system. Similar to what happens when you pick manual over automatic transmission. Secondly, NoSQL allows you to eke more performance out of the system by eliminating a lot of integrity checks done by relational databases from the database tier. Again, this is similar to how you can get more performance out of your car by driving a manual transmission versus an automatic transmission vehicle.”[1]

Key-value stores are part of the NoSQL (Not Only SQL) family of databases.

Comparison

Relational | Key-Value
Retrieving and storing data may require multiple tables | Data about a particular item (which is used as the key) is stored along with that item
Defines the model, and the application is developed to map to the model | Application-driven model, in the sense that the application needs to decide the model for storing and retrieving the data
Data integrity and constraint checks are enforced by the defined model | Data integrity and constraint checks have to be enforced at the application level
Data is accessed by using a query language like SQL | Data is accessed by using the API of the key-value store
As the data model and business logic are independently developed, the same data model can be re-used for a different application | The data in the store is application-specific, and the chances of reuse are lower than with the relational model

Queries

Relational

We will store the information about books in a table. Consider the books table as below:

book | author_name
A Whole New Mind | Daniel Pink
The Black Swan | Nassim Taleb
Power of Now | Eckhart Tolle

We will write the queries in SQL as follows:

Query 1: To get all books from a particular author

select * from books where author_name = 'Daniel Pink';

Query 2: To get all authors

select distinct author_name from books;
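The two queries can be tried out with a minimal sketch using Python's built-in sqlite3 module. The in-memory database, the TEXT column types, and the added DISTINCT and ORDER BY (used to make the output deterministic) are assumptions made for illustration.

```python
import sqlite3

# In-memory database; the column types are an assumption.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (book TEXT, author_name TEXT)")
conn.executemany(
    "INSERT INTO books (book, author_name) VALUES (?, ?)",
    [
        ("A Whole New Mind", "Daniel Pink"),
        ("The Black Swan", "Nassim Taleb"),
        ("Power of Now", "Eckhart Tolle"),
    ],
)

# Query 1: all books from a particular author (parameterised query).
books_by_pink = conn.execute(
    "SELECT book FROM books WHERE author_name = ?", ("Daniel Pink",)
).fetchall()

# Query 2: all authors, without duplicates.
authors = conn.execute(
    "SELECT DISTINCT author_name FROM books ORDER BY author_name"
).fetchall()

print(books_by_pink)
print(authors)
```

Each result row comes back as a tuple, so a single-column query yields one-element tuples.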

Key-Value

Consider that we have to run the same queries for a key-value database. We will have to store the information in 2 key-value pairs as below:

authors => {"Daniel Pink", "Nassim Taleb", "Eckhart Tolle"}

author:Daniel Pink => {"A Whole New Mind", "Drive"}

We will write the queries using the key-value store's API as follows:

Query 1: To get all books from a particular author

get("author:Daniel Pink");

Query 2: To get all authors

get("authors");

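The application has to maintain these keys itself, so any search that is not a direct lookup by key requires extra logic in the application. In exchange, a lookup by key is typically much faster than the equivalent query in a relational database.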
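As a rough sketch, the same two lookups can be mimicked with a plain Python dictionary standing in for the store. The KeyValueStore class, its put/get/delete methods and the add_book helper are hypothetical and do not correspond to any particular product's API.

```python
class KeyValueStore:
    """Toy in-memory stand-in for a key-value store (hypothetical API)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)


def add_book(store, author, book):
    # The application, not the store, keeps both keys in sync.
    authors = store.get("authors", [])
    if author not in authors:
        store.put("authors", authors + [author])
    books = store.get("author:" + author, [])
    store.put("author:" + author, books + [book])


store = KeyValueStore()
add_book(store, "Daniel Pink", "A Whole New Mind")
add_book(store, "Daniel Pink", "Drive")
add_book(store, "Nassim Taleb", "The Black Swan")
add_book(store, "Eckhart Tolle", "Power of Now")

# Query 1: all books from a particular author.
print(store.get("author:Daniel Pink"))
# Query 2: all authors.
print(store.get("authors"))
```

Note that add_book has to update two keys on every insert. This is the kind of bookkeeping that moves from the database to the application.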

Performance

Some relational database operations carry a significant performance cost:

Altering a table consists of:

Updating the schema

Inserting or deleting values for the (new/old) rows and columns

For a table with millions of rows, this operation may take hours to complete. While it runs, the database tends to lock the affected tables, which blocks other queries and hurts performance.

Joins

Joins are relatively slow, yet they are one of the main reasons for using the relational model. Avoiding joins altogether would be almost as good as not using a relational database.

Results of MySQL, Redis and Tokyo Tyrant performance comparison[2]


Scalability

Relational databases tend to scale well when the entire database is located on a single server. The problem starts when the single server can no longer handle the requests and the data has to be partitioned across servers.

One way to do that is to use replication. The problem is that it takes time for a replica to initially sync with the master and catch up with the database. This can be a huge problem if the system is, for example, a social networking site used by a large number of users.

Another approach is to shard the database. One sharding technique is horizontal partitioning, which divides the rows of a table into separate partitions. Each partition is called a shard.

Original table:

id | name
| Daniel Pink
| Nassim Taleb
| Eckhart Tolle
| David Hansson

Shard 1:

id | name
| Daniel Pink
| Nassim Taleb

Shard 2:

id | name
| Eckhart Tolle
| David Hansson
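Horizontal partitioning can be sketched in a few lines of Python. The id values, the range size and the shard_for, find_by_id and find_by_name helpers are assumptions made for illustration. Real systems may use hashing or a lookup directory instead of id ranges.

```python
# Hypothetical rows; the id values are assumed for illustration.
authors = [
    (1, "Daniel Pink"),
    (2, "Nassim Taleb"),
    (3, "Eckhart Tolle"),
    (4, "David Hansson"),
]

NUM_SHARDS = 2
ROWS_PER_SHARD = 2  # assumed range size


def shard_for(row_id):
    """Range-based routing: ids 1-2 go to shard 0, larger ids to shard 1."""
    return min((row_id - 1) // ROWS_PER_SHARD, NUM_SHARDS - 1)


# Distribute the rows of the single table across the shards.
shards = [[] for _ in range(NUM_SHARDS)]
for row_id, name in authors:
    shards[shard_for(row_id)].append((row_id, name))


def find_by_id(row_id):
    # A lookup on the shard key only has to consult one shard.
    for rid, name in shards[shard_for(row_id)]:
        if rid == row_id:
            return name
    return None


def find_by_name(name):
    # A lookup on any other column has to visit every shard.
    return [rid for shard in shards for rid, n in shard if n == name]


print(shards)
print(find_by_id(3), find_by_name("David Hansson"))
```

The difference between find_by_id and find_by_name is why the choice of shard key should follow the access patterns of the application.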

Problems with Database Sharding

You have to shard the database with the access patterns of the end users in mind

Sharding also means that you cannot efficiently run queries that involve joins across shards

Enforcing referential integrity across databases is also a problem when sharding

It is a complex system to implement in practice

So, essentially, sharding the database means that you lose many of the advantages of the relational model.

Key-value databases can scale very well because they are distributed: there is no single point of failure, and the system can tolerate multiple failures. New nodes can also be added easily, which helps make the application scalable.

Application specifics

Applications need to decide whether richer features or scalability matters more to them.

CAP

“The CAP theorem, also known as Brewer’s theorem, states that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees”[3][4]

Consistency means that every client sees the same version or view of the data

Availability means that the service is always available and that reads and writes can always be performed

Partition tolerance means that the system keeps working even if some of the nodes go down or cannot reach each other over the network

Visual guide to NoSQL [5]

Relational Database

Consistency is the core of the relational model, and it is the reason for its massive usage in applications.

If you design your database for consistency and availability, there is no failover if the node fails

If you design for consistency and partition tolerance, then in case of a failure the data is not available until the failover node reaches a consistent state

Key-value stores typically provide availability and partition tolerance at the cost of consistency

You can adopt a model depending on the requirements of the application. If you are building a banking application, consistency is the most important aspect and you should use a relational database.

If you are building a social networking website used by millions of users, it is important that the data is available to the users immediately. In that case you can opt for a key-value database.
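The trade-off can be made concrete with a toy simulation of two replicas during a network partition. This is only a sketch: the Replica and Cluster classes and the "CP"/"AP" mode names are hypothetical, and no real replication protocol is modelled.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}


class Cluster:
    """Two replicas; mode is "CP" or "AP" (toy model, not a real protocol)."""

    def __init__(self, mode):
        self.mode = mode
        self.a = Replica("a")
        self.b = Replica("b")
        self.partitioned = False

    def write(self, replica, key, value):
        other = self.b if replica is self.a else self.a
        if self.partitioned:
            if self.mode == "CP":
                # Consistency first: refuse the write, so the service is unavailable.
                raise RuntimeError("unavailable during partition")
            # Availability first: accept the write locally; replicas now disagree.
            replica.data[key] = value
            return
        # No partition: the write reaches both replicas.
        replica.data[key] = value
        other.data[key] = value

    def read(self, replica, key):
        return replica.data.get(key)


# Consistency over availability: the write is rejected during the partition.
cp = Cluster("CP")
cp.write(cp.a, "balance", 100)
cp.partitioned = True
try:
    cp.write(cp.a, "balance", 50)
    cp_write_ok = True
except RuntimeError:
    cp_write_ok = False

# Availability over consistency: the write succeeds, but replica b is stale.
ap = Cluster("AP")
ap.write(ap.a, "status", "old")
ap.partitioned = True
ap.write(ap.a, "status", "new")

print(cp_write_ok, cp.read(cp.b, "balance"))
print(ap.read(ap.a, "status"), ap.read(ap.b, "status"))
```

In the "CP" cluster both replicas still agree on the balance, but the second write was refused. In the "AP" cluster the write went through, and a client reading from replica b sees the old value until the replicas can sync again.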

Advantages of relational database

Relational databases support transactions and all ACID properties. These are essential for business-critical applications, such as payroll or banking applications, which must always have consistent data

RDBMS tools are mature and have been tested over a long period of time against many kinds of problems

As SQL is a standard language used by all the relational database vendors, with relatively minor differences in implementation, you have the flexibility to move to a different product

It treats storing and searching as the same problem
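The atomicity part of ACID mentioned in the first point can be sketched with Python's built-in sqlite3 module. The accounts table, the names and balances, and the transfer helper are hypothetical. The CHECK constraint is an assumed rule that balances may not go negative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 20)]
)
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates commit or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed; the credit to dst was rolled back too.
        return False


ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

In the failed transfer the credit to bob is applied first and then undone by the rollback, so the data never ends up half-updated. With a key-value store, the application would have to provide this guarantee itself.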

Disadvantages of relational database

The developer has the overhead of making sure that the data model maps to the business logic

A relational database is not distributed by nature: it stores all the data on one server, which acts as a single point of failure

In massive deployments, it also requires specialized hardware such as SANs to scale

Relational databases are hard to scale beyond a single server

Advantages of Key-Value stores

It does not have any schema, so you are not bound by the model

Simple operations like get, put and delete make the system's performance a lot more predictable in most cases

It can tolerate failures, depending upon the configuration

It can run on less expensive hardware

It offers high performance compared to relational databases

It is easy to scale by adding extra nodes

There is no need for a DBA to manage the database

Disadvantages of Key-Value stores

Generating dynamic reports from data is difficult

There is no easy way to export data from the store

Migrating from one product to another is difficult, as the API used to access the data is store-specific
