Relational vs. Key-Value Stores: Information Technology Essay
An RDBMS (relational database management system) is a database management system that organizes data according to the relational model. It defines the data and the relationships between them in the form of tables, which consist of rows and columns. A relational database has a defined data model (schema) along with various constraints. SQL is used to retrieve data from the database, and SQL queries can be complex, involving integrity constraints and joins.
Distributed key-value stores store data as a simple hash table in which each key maps to a value. A search is conducted on the key and returns the corresponding value. The value is stored either as a binary object or in a semi-structured format such as JSON.
“SQL databases are like automatic transmission and NoSQL databases are like manual transmission. Once you switch to NoSQL, you become responsible for a lot of work that the system takes care of automatically in a relational database system. Similar to what happens when you pick manual over automatic transmission. Secondly, NoSQL allows you to eke more performance out of the system by eliminating a lot of integrity checks done by relational databases from the database tier. Again, this is similar to how you can get more performance out of your car by driving a manual transmission versus an automatic transmission vehicle.”[1]
Key-value stores belong to the NoSQL (Not Only SQL) family of databases.
Comparison
Relational | Key-Value
Retrieving and storing data may require multiple tables | They store data about a particular item (which is used as the key) along with that item
Defines the model and application is developed to map to the model | Application-driven model in the sense that the application needs to decide the model for storing and retrieving the data
Data integrity and constraint checks are enforced by the defined model | Data integrity and constraint checks have to be enforced at the application-level
Data is accessed by using a query language like SQL | Data is accessed by using the API of the key-value store
As the data model and business logic are independently developed, the same data model can be re-used for a different application | The data in the store is application-specific and the chances of reuse are less as compared to relational model
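The difference in where integrity checks live can be sketched in Python. This is only an illustration under stated assumptions: SQLite (via the standard `sqlite3` module) stands in for a relational database, a plain dictionary stands in for a key-value store, and the helper names `insert_book_sql` and `put_book_kv` are hypothetical.

```python
import sqlite3

# Relational side: the schema itself rejects invalid rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE books ("
    " id INTEGER PRIMARY KEY,"
    " book TEXT NOT NULL,"
    " author_name TEXT NOT NULL)"
)


def insert_book_sql(book, author_name):
    """Return True if the database accepted the row, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO books (book, author_name) VALUES (?, ?)",
                (book, author_name),
            )
        return True
    except sqlite3.IntegrityError:
        return False


# Key-value side: a dict stands in for the store (an assumption for illustration).
# The store accepts any value, so the application has to validate before writing.
kv_store = {}


def put_book_kv(author_name, book):
    """Validate in application code, then append the book under the author's key."""
    if not author_name or not book:
        raise ValueError("author_name and book are required")
    kv_store.setdefault("author:" + author_name, []).append(book)
```

In the relational case the `NOT NULL` constraints do the checking; in the key-value case, removing the `if` statement would let incomplete data into the store silently.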
Queries
Relational
We will store the information about books in a table. Consider the books table as below:
id | book | author_name
1 | A Whole New Mind | Daniel Pink
2 | The Black Swan | Nassim Taleb
3 | Power of Now | Eckhart Tolle
We will write the queries in SQL as follows:
Query 1: To get all books from a particular author
select * from books where author_name = 'Daniel Pink';
Query 2: To get all authors
select author_name from books;
Key-Value
Consider that we have to run the same queries against a key-value database. We will have to store the information in two key-value pairs as below:
authors => {"Daniel Pink", "Nassim Taleb", "Eckhart Tolle"}
author:Daniel Pink => {"A Whole New Mind", "Drive"}
We will write the query using the key-value store’s API as follows:
Query 1: To get all books from a particular author
get("author:Daniel Pink");
Query 2: To get all authors
get("authors");
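The search logic is more complex with a key-value store, because the application itself has to decide which keys to maintain (such as the list of authors) and keep them up to date. In return, simple key lookups are typically much faster than the equivalent relational database queries.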
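The two approaches can be compared side by side in a short Python sketch. The assumptions: SQLite (through the standard `sqlite3` module) plays the relational database, a dictionary plays the key-value store, and `put_book` and `get` are hypothetical helpers rather than the API of any particular product.

```python
import sqlite3

# --- Relational: load the books table and query it with SQL ---
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, book TEXT, author_name TEXT)")
conn.executemany(
    "INSERT INTO books (id, book, author_name) VALUES (?, ?, ?)",
    [
        (1, "A Whole New Mind", "Daniel Pink"),
        (2, "The Black Swan", "Nassim Taleb"),
        (3, "Power of Now", "Eckhart Tolle"),
    ],
)

# Query 1: all books from a particular author
books_by_pink = [
    row[0]
    for row in conn.execute(
        "SELECT book FROM books WHERE author_name = ? ORDER BY id", ("Daniel Pink",)
    )
]

# Query 2: all authors
all_authors = [row[0] for row in conn.execute("SELECT author_name FROM books ORDER BY id")]

# --- Key-value: a dict stands in for the store ---
store = {}


def put_book(author, book):
    """The application maintains both keys itself on every write."""
    authors = store.setdefault("authors", [])
    if author not in authors:
        authors.append(author)
    store.setdefault("author:" + author, []).append(book)


def get(key):
    """Look up a single key; there is no query language."""
    return store.get(key)


put_book("Daniel Pink", "A Whole New Mind")
put_book("Nassim Taleb", "The Black Swan")
put_book("Eckhart Tolle", "Power of Now")
```

Both queries become single lookups in the key-value version, but only because `put_book` did the extra bookkeeping at write time.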
Performance
Some relational database operations carry a significant performance cost:
Altering a table consists of:
Updating the schema
Inserting or deleting values for the (new/old) rows and columns
This operation may take hours to complete on a table with millions of rows. While performing such operations, the database tends to lock the affected tables, which slows down or blocks other queries.
Joins
Joins are relatively slow, yet they are one of the main reasons for using the relational model. Avoiding joins altogether would give up much of the benefit of using a relational database.
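To make the trade-off concrete, here is a minimal Python sketch of a join and of the pre-joined layout a key-value store would typically use instead. The `authors` and `books` tables, their columns, and the `book:<id>` key format are assumptions made for this example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (
        id INTEGER PRIMARY KEY,
        title TEXT,
        author_id INTEGER REFERENCES authors(id)
    );
    INSERT INTO authors VALUES (1, 'Daniel Pink'), (2, 'Nassim Taleb');
    INSERT INTO books VALUES (1, 'A Whole New Mind', 1), (2, 'The Black Swan', 2);
    """
)

# Relational: the database combines the two tables at query time.
joined = conn.execute(
    "SELECT books.title, authors.name FROM books "
    "JOIN authors ON authors.id = books.author_id ORDER BY books.id"
).fetchall()

# Key-value: the value is stored already combined, so no join is needed on read,
# but the author name is duplicated in every book entry.
kv_store = {
    "book:1": {"title": "A Whole New Mind", "author": "Daniel Pink"},
    "book:2": {"title": "The Black Swan", "author": "Nassim Taleb"},
}
```

The join keeps each author name in one place, while the key-value layout trades that normalization for a single lookup per book.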
Results of MySQL, Redis and Tokyo Tyrant performance comparison[2]
Scalability
Relational databases tend to scale well when the entire database is located on a single server. The problem starts when the single server can no longer handle the requests and the data has to be partitioned across servers.
One way to do that is to use replication. The problem with replication is that it takes time for a replica to initially sync with the master and catch up with the database. This can be a serious problem if the system is, for example, a social networking site used by a large number of users.
Another approach is to shard the database. One sharding technique is horizontal partitioning, which divides the rows of a table into separate partitions. Each partition is called a shard.
Original table:
id | name
| Daniel Pink
| Nassim Taleb
| Eckhart Tolle
| David Hansson
Shard 1:
id | name
| Daniel Pink
| Nassim Taleb
Shard 2:
id | name
| Eckhart Tolle
| David Hansson
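The split illustrated above can be sketched in a few lines of Python. The id values 1 to 4 are an assumption (the ids are not shown in the table above), and range-based partitioning with two rows per shard is just one possible scheme; `shard_for`, `partition` and `find` are hypothetical helper names.

```python
# Rows of the original table; the ids are assumed for illustration.
rows = [
    (1, "Daniel Pink"),
    (2, "Nassim Taleb"),
    (3, "Eckhart Tolle"),
    (4, "David Hansson"),
]


def shard_for(row_id, rows_per_shard=2):
    """Range-based rule: ids 1-2 go to shard 0, ids 3-4 to shard 1, and so on."""
    return (row_id - 1) // rows_per_shard


def partition(all_rows, rows_per_shard=2):
    """Split the rows of one table into shards (horizontal partitioning)."""
    shards = {}
    for row_id, name in all_rows:
        shards.setdefault(shard_for(row_id, rows_per_shard), []).append((row_id, name))
    return shards


def find(shards, row_id, rows_per_shard=2):
    """A lookup by id only needs to touch the one shard that owns that id."""
    for candidate_id, name in shards.get(shard_for(row_id, rows_per_shard), []):
        if candidate_id == row_id:
            return name
    return None


shards = partition(rows)
```

A lookup by id is cheap because the rule tells the application which shard to ask. A query that is not keyed on the id, or one that joins rows from different shards, has to contact every shard.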
Problems with Database Sharding
You will have to shard the database keeping in mind the access patterns of the end-users
Also, sharding means that you cannot run queries that involve join operations across shards as this would be inefficient
Using referential integrity across databases is also a problem in sharding the database
It’s a complex system to implement in practice
So, essentially, sharding the database means that you lose many of the advantages of the relational model, such as joins and referential integrity across the whole data set.
Key-value databases can scale really well because the data is distributed across nodes, so there is no single point of failure and the system can tolerate multiple node failures. New nodes can also be added easily, which helps make the application scalable.
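One common way for a distributed store to make adding nodes cheap is consistent hashing. The essay does not say which technique any given key-value store uses, so the Python sketch below is only an assumed illustration; the `HashRing` class, the node names and the choice of 100 virtual points per node are all hypothetical.

```python
import bisect
import hashlib


def _hash(value):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Minimal consistent-hash ring with virtual points per node."""

    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas  # virtual points placed on the ring per node
        self._points = []         # sorted ring positions
        self._owners = {}         # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self.replicas):
            point = _hash(f"{node}#{i}")
            self._owners[point] = node
            bisect.insort(self._points, point)

    def node_for(self, key):
        """A key belongs to the first ring position after its hash, wrapping around."""
        if not self._points:
            raise ValueError("the ring has no nodes")
        index = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owners[self._points[index]]


ring = HashRing(["node-a", "node-b"])
keys = [f"key{i}" for i in range(1000)]
before = {key: ring.node_for(key) for key in keys}

ring.add_node("node-c")
after = {key: ring.node_for(key) for key in keys}

# Keys either stay where they were or move to the newly added node.
moved = [key for key in keys if before[key] != after[key]]
```

With this scheme, adding `node-c` only relocates the keys that now belong to it; keys never shuffle between the existing nodes, which is what keeps growth inexpensive.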
Application specifics
Each application needs to decide whether supporting richer features or scaling is more important to it.
CAP
“The CAP theorem, also known as Brewer’s theorem, states that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees”[3][4]
Consistency means that every client sees the same version or view of the data
Availability means that the service is always reachable, so that reads and writes can always be performed
Partition tolerance means that the system keeps working even if communication between some of the nodes is lost, for example when nodes go down or the network is split
Visual guide to NoSQL [5]
Relational Database
Consistency is the core of relational model and it is the reason for its massive usage in applications.
If you design your database for consistency and availability, there is no failover when a node fails
If you design for consistency and partition tolerance, then after a failure the data is unavailable until the failover node reaches a consistent state
Key-value stores typically provide availability and partition tolerance at the cost of consistency
You can adopt a model depending on the requirements of the application. If you are building a banking application, consistency is the most important aspect, so a relational database is the appropriate choice.
If you are building a social networking website used by millions of users, it’s important that the data is available to the users immediately. In that case you can opt for a Key-Value database.
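The trade-off can be illustrated with a deliberately tiny Python model of two replicas. This is a toy sketch, not how any real database implements replication: the `TinyCluster` class, its "CP" and "AP" modes and the `partitioned` flag are all invented for this example.

```python
class Replica:
    """One node holding its own copy of the data."""

    def __init__(self):
        self.data = {}


class TinyCluster:
    """Two replicas that are either connected or separated by a network partition."""

    def __init__(self, mode):
        self.mode = mode  # "CP" favors consistency, "AP" favors availability
        self.replicas = [Replica(), Replica()]
        self.partitioned = False

    def write(self, replica_index, key, value):
        if not self.partitioned:
            # Normal operation: every replica receives the write.
            for replica in self.replicas:
                replica.data[key] = value
            return
        if self.mode == "CP":
            # Refuse the write rather than let the replicas disagree.
            raise RuntimeError("unavailable: cannot reach the other replica")
        # AP: accept the write locally; the replicas now diverge.
        self.replicas[replica_index].data[key] = value

    def read(self, replica_index, key):
        return self.replicas[replica_index].data.get(key)


# AP behavior: the write succeeds during the partition, but one replica is stale.
ap = TinyCluster("AP")
ap.write(0, "status", "old")
ap.partitioned = True
ap.write(0, "status", "new")

# CP behavior: the same write is rejected during the partition.
cp = TinyCluster("CP")
cp.write(0, "balance", 100)
cp.partitioned = True
try:
    cp.write(0, "balance", 50)
    cp_write_accepted = True
except RuntimeError:
    cp_write_accepted = False
```

A stale status update is tolerable on a social networking site, whereas a bank would rather reject the write than show two different balances.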
Advantages of relational database
Relational databases support transactions and all ACID properties. These are essential for business-critical applications that must always have consistent data. Example: payroll or banking applications
RDBMS tools are mature and have been tested over a long period of time on many kinds of problems
As SQL is a standard language used by all relational database vendors with relatively minor differences in implementation, you have the flexibility to move to a different product
It treats storing and searching data as parts of the same problem
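The transactional guarantee mentioned in the first point can be demonstrated with a short Python sketch using the standard `sqlite3` module. The `accounts` table, the account names and the `transfer` helper are assumptions made for the example; SQLite simply stands in for a relational database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(sender, receiver, amount):
    """Move money atomically: either both updates are applied or neither is."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, receiver),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, sender),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit above was undone too.
        return False


def balances():
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

In the failing case the receiver is credited first and the sender's debit then violates the constraint, yet the rollback leaves both balances exactly as they were. A key-value store without transactions would leave the application to undo the first write itself.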
Disadvantages of relational database
The developer has the overhead of making sure that the data model maps to the business logic
A relational database is not distributed by nature; it typically stores all the data on one server, which acts as a single point of failure
Also, massive deployments require specialized hardware such as SANs to scale
Relational databases are difficult to scale beyond a single server
Advantages of Key-Value stores
It does not have any schema and so you are not bound by the model
Simple operations like get, put and delete make the system's performance a lot more predictable in most cases
It can tolerate failures depending upon the configuration
It can run on less-expensive hardware
High performance as compared to relational databases
Easy to scale by adding extra nodes
There is no need for a DBA to manage the database
Disadvantages of Key-Value stores
Generating dynamic reports from data is difficult
There is no easy way to export data from the store
Migrating from one product to another is difficult, as the API used to access data is store-specific