Comparison Between MongoDB and CouchDB: Information Technology Essay
For almost 30 years, the relational database management system (RDBMS) has been the dominant model for data management. The cost of managing an RDBMS has increased over time due to factors such as scaling the database, maintenance by DBAs, and the performance impact of handling large volumes of data. To overcome these challenges, NoSQL or non-relational databases came into the picture and are gaining mindshare as an alternative to the RDBMS model.
NoSQL databases such as MongoDB, CouchDB, Cassandra, Redis, etc. belong to a broad class of DBMS (Database Management Systems) that are different from the classic RDBMS. These databases or data stores do not require a fixed schema for storing data. They usually avoid join operations and are typically designed for scaling horizontally.
Fascinating as they are, NoSQL databases in general are not the subject of this article. It focuses on comparing two of them, namely MongoDB and CouchDB.
MongoDB is developed by 10gen and was publicly released in February 2009. MongoDB (from the word “humongous”) is an open source, scalable, high-performance, document-oriented database. It is written in C++ and features JSON-style document-oriented storage, data replication and high availability, auto-sharding, and more.
CouchDB, the common name for Apache CouchDB, is also an open source, document-oriented database. It is written in the programming language Erlang. It is also a schema-free database and is accessible via a RESTful JSON API.
This article offers a qualitative comparison of the two NoSQL databases, focusing on the features most likely to matter to a web developer or backend engineer choosing components for a project.
One big difference between MongoDB and CouchDB is that CouchDB is MVCC based, while MongoDB is a traditional update-in-place store. MVCC stands for Multiversion Concurrency Control, a method of providing concurrent access to a database (it is also used to implement transactional memory in programming languages). In MVCC, an update does not delete and overwrite the old data. Instead, it adds a newer version, which makes the old data obsolete so that it can be removed later.
MVCC suits problems that need intensive versioning, or master-master replication of large amounts of data. Although MVCC sounds like the better option at first, it has drawbacks in the form of additional work. Conflicts between transactions must be handled manually by the programmer, unless locking is implemented, which would likely mean giving up master-master replication. The database also has to be compacted periodically if the number of updates is high.
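The MVCC behavior described above (append a new version, reject writes based on a stale version, compact later) can be illustrated with a small in-memory sketch. This is a toy model, not CouchDB's implementation: the class ToyMvccStore, the ConflictError exception, and the "generation-hash" revision strings are all hypothetical names chosen for illustration.

```python
import hashlib
import json


class ConflictError(Exception):
    """Raised when a write is based on a revision that is no longer current."""


class ToyMvccStore:
    """Append-only store: an update adds a new revision instead of overwriting."""

    def __init__(self):
        # doc_id -> list of (revision, body); the last entry is the current one
        self._history = {}

    def get(self, doc_id):
        """Return (revision, body) for the current version of a document."""
        rev, body = self._history[doc_id][-1]
        return rev, dict(body)

    def put(self, doc_id, body, base_rev=None):
        """Append a new version; the writer must name the revision it read."""
        versions = self._history.get(doc_id, [])
        current_rev = versions[-1][0] if versions else None
        if base_rev != current_rev:
            raise ConflictError(f"{doc_id}: current is {current_rev}, got {base_rev}")
        generation = int(current_rev.split("-")[0]) + 1 if current_rev else 1
        payload = json.dumps(body, sort_keys=True).encode("utf-8")
        new_rev = f"{generation}-{hashlib.sha256(payload).hexdigest()[:8]}"
        versions.append((new_rev, dict(body)))
        self._history[doc_id] = versions
        return new_rev

    def version_count(self, doc_id):
        """Number of versions currently stored for a document."""
        return len(self._history[doc_id])

    def compact(self):
        """Discard obsolete versions, keeping only the latest of each document."""
        removed = 0
        for versions in self._history.values():
            removed += len(versions) - 1
            del versions[:-1]
        return removed


store = ToyMvccStore()
rev1 = store.put("user:1", {"name": "Ada"})
rev2 = store.put("user:1", {"name": "Ada", "city": "London"}, base_rev=rev1)

try:
    # A second writer that still holds rev1 is rejected and must resolve the conflict.
    store.put("user:1", {"name": "Ada", "city": "Paris"}, base_rev=rev1)
except ConflictError as err:
    print("conflict:", err)

print(store.version_count("user:1"))  # old and new versions are both still stored
print(store.compact())                # number of obsolete versions removed
```

The rejected write is the "manual handling" mentioned above: the application has to re-read the document and decide how to merge. The compact() call stands in for the periodic compaction an MVCC store needs under heavy updates.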
MongoDB is based on update-in-place and delivers very high write performance, especially for updates. It fits best where objects are updated at a high rate or where data is loaded in large volumes. MongoDB is very fast, but it follows a master-slave replication pattern rather than the more complex master-master pattern.
CouchDB is designed to be “crash-only”: the database can be terminated, or simply crash, at any time and still remain consistent. This covers both recovery of the database and consistency of the data stored in it.
MongoDB does not provide as much durability support as CouchDB. Earlier versions of MongoDB used a storage engine that could leave the database needing repair after a hard crash, and the repair required a repairDatabase() operation when starting the database up again. Later versions of MongoDB, however, offer better durability than earlier ones.
Couch uses an index building scheme that is a clever approach to generating indexes to support queries. It is an elegant approach because the result is effectively a materialized view. However, these structures must be pre-declared for each query that is to be executed.
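To make the idea of a pre-declared, materialized index concrete, here is a minimal sketch in Python. In CouchDB itself, views are written as JavaScript map functions that emit key/value rows; the Python functions build_view, reduce_count, and by_city below are hypothetical stand-ins for that mechanism, and the sample documents are invented.

```python
from collections import defaultdict


def build_view(docs, map_fn):
    """Run map_fn over every document and keep the emitted rows sorted by key."""
    rows = []
    for doc in docs:
        for key, value in map_fn(doc):
            rows.append((key, value, doc["_id"]))
    rows.sort(key=lambda row: (row[0], row[2]))
    return rows


def reduce_count(rows):
    """A simple reduce step: count the emitted rows per key."""
    counts = defaultdict(int)
    for key, _value, _doc_id in rows:
        counts[key] += 1
    return dict(counts)


def by_city(doc):
    """Hypothetical map function: emit one row for each document with a city."""
    if "city" in doc:
        yield doc["city"], doc["name"]


docs = [
    {"_id": "u1", "name": "Ada", "city": "London"},
    {"_id": "u2", "name": "Linus", "city": "Helsinki"},
    {"_id": "u3", "name": "Alan", "city": "London"},
    {"_id": "u4", "name": "Grace"},
]

view = build_view(docs, by_city)
for row in view:
    print(row)
print(reduce_count(view))
```

The view is computed once from the map function and then read like a sorted index, which is why the query has to be known in advance.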
Mongo does not require such pre-declared structures; it uses traditional dynamic queries instead. A query can run even where no index exists, and a query optimizer decides whether and how an available index is used. This is convenient when we do not want to maintain an index, for example on insert-intensive collections, and it is handy for inspecting data administratively. When an index corresponds perfectly to the query, the Couch and Mongo approaches are conceptually similar. We find expressing queries as JSON-style objects in MongoDB to be quick and painless, though.
Scalability is one fundamental difference between the two databases. A number of Couch users use replication as a way to scale. With Mongo, replication is seen as a way to gain reliability and failover rather than scalability; Mongo uses (auto) sharding as its path to scalability. Sharding is a method of horizontally partitioning the data in a database or search engine. The following figures depict the scaling behavior of MongoDB and CouchDB:
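The following sketch shows what a JSON-style query object looks like and how it could be evaluated without any index, by scanning the collection. It is a toy matcher written for illustration, not MongoDB's query engine: the functions matches and find are hypothetical, and only equality plus the $gt, $lt, and $in operators are handled.

```python
def matches(doc, query):
    """Return True if doc satisfies every field condition in query."""
    for field, condition in query.items():
        value = doc.get(field)
        if isinstance(condition, dict):
            # Operator form, e.g. {"age": {"$gt": 30}}
            for op, operand in condition.items():
                if op == "$gt":
                    ok = value is not None and value > operand
                elif op == "$lt":
                    ok = value is not None and value < operand
                elif op == "$in":
                    ok = value in operand
                else:
                    raise ValueError(f"unsupported operator: {op}")
                if not ok:
                    return False
        elif value != condition:
            # Plain form, e.g. {"city": "London"} means equality
            return False
    return True


def find(collection, query):
    """Full scan: no index is needed, every document is tested at query time."""
    return [doc for doc in collection if matches(doc, query)]


people = [
    {"name": "Ada", "city": "London", "age": 36},
    {"name": "Linus", "city": "Helsinki", "age": 28},
    {"name": "Alan", "city": "London", "age": 41},
]

print(find(people, {"city": "London", "age": {"$gt": 40}}))
```

Because the query is just data, it can be built at runtime; an index would only change how quickly the matching documents are located, not what the query means.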
Figure 1 – MongoDB Sharding
Figure 2 – CouchDB Replication (Master-Master)
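As a rough illustration of horizontal partitioning, the sketch below routes each document to one of several shards by hashing a shard key. It is a simplification under stated assumptions: real MongoDB sharding manages ranges of shard-key values and rebalances them automatically, whereas the hypothetical shard_for function and ToyShardedCollection class here use a fixed hash-modulo scheme with no rebalancing.

```python
import hashlib


def shard_for(shard_key, num_shards):
    """Map a shard key to a shard index with a stable hash."""
    digest = hashlib.sha256(str(shard_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ToyShardedCollection:
    """Each shard holds only the documents whose key hashes to it."""

    def __init__(self, num_shards):
        self.shards = [dict() for _ in range(num_shards)]

    def insert(self, doc, key_field="_id"):
        """Store the document on the shard chosen by its key; return the index."""
        index = shard_for(doc[key_field], len(self.shards))
        self.shards[index][doc[key_field]] = doc
        return index

    def find_by_key(self, key):
        """Only one shard has to be consulted for a lookup by shard key."""
        return self.shards[shard_for(key, len(self.shards))].get(key)


collection = ToyShardedCollection(num_shards=3)
for n in range(100):
    collection.insert({"_id": f"user-{n}", "visits": n})

print([len(shard) for shard in collection.shards])
print(collection.find_by_key("user-42"))
```

Writes and key lookups spread across the shards, which is what lets capacity grow by adding machines. Replication, by contrast, keeps a full copy of the data on every node.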
All stored items, or documents, in CouchDB are treated as resources, and each resource has a unique URI through which it can be accessed. These resources are exposed over HTTP, and the standard HTTP methods POST, GET, PUT, and DELETE perform the basic CRUD operations on them. It’s as if CouchDB was made specifically for a RESTful implementation.
MongoDB focuses mainly on performance and relies on language-specific drivers. These drivers can be downloaded and used to access the database over Mongo’s custom binary protocol. This doesn’t mean that a RESTful interface can’t be built on top of these drivers.
The features described above make both NoSQL databases strong contenders for use in a web application architecture. Whether to use MongoDB or CouchDB, however, depends on the needs of the project. A comparison of the two brings out the differences clearly, and from it we can determine the pros and cons of each.
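The mapping between CRUD operations and HTTP requests can be sketched with Python's standard library. The example only builds the requests and does not send them, so it runs without a server. The base URL (a local CouchDB on port 5984), the database name "users", the document ID, and the helper function names are assumptions made for illustration.

```python
import json
from urllib.request import Request

# Assumption: a CouchDB instance listening locally on its default port.
BASE_URL = "http://localhost:5984"


def doc_url(db, doc_id):
    """Every document is a resource with its own URI: /<database>/<document id>."""
    return f"{BASE_URL}/{db}/{doc_id}"


def create_request(db, doc_id, body):
    """Create: PUT the JSON body to the document's URI."""
    data = json.dumps(body).encode("utf-8")
    return Request(doc_url(db, doc_id), data=data, method="PUT",
                   headers={"Content-Type": "application/json"})


def read_request(db, doc_id):
    """Read: GET the document's URI."""
    return Request(doc_url(db, doc_id), method="GET")


def update_request(db, doc_id, body, rev):
    """Update: PUT again, naming the revision being replaced."""
    data = json.dumps({**body, "_rev": rev}).encode("utf-8")
    return Request(doc_url(db, doc_id), data=data, method="PUT",
                   headers={"Content-Type": "application/json"})


def delete_request(db, doc_id, rev):
    """Delete: DELETE the URI, again naming the current revision."""
    return Request(f"{doc_url(db, doc_id)}?rev={rev}", method="DELETE")


for req in (
    create_request("users", "u1", {"name": "Ada"}),
    read_request("users", "u1"),
    update_request("users", "u1", {"name": "Ada", "city": "London"}, "1-abc"),
    delete_request("users", "u1", "2-def"),
):
    print(req.get_method(), req.full_url)
```

To actually talk to a server, each request could be passed to urllib.request.urlopen. The revision strings "1-abc" and "2-def" are placeholders; a real server returns the revision to use in its responses.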
Let’s summarize each database’s features in a table and see what picture they depict in terms of performance, support, features, etc.
Feature | MongoDB | CouchDB
Data model | Document-oriented, JSON, schema-free | Document-oriented, JSON, schema-free
Documentation | Detailed docs to get started quickly | Good documentation
Query method | Map and Reduce | Map and Reduce
Concurrency | Yes, supports concurrent modification of single documents | Yes, also supports concurrent modification of documents, just like MongoDB; this is because CouchDB is MVCC based
Scalability | Horizontal, through (auto) sharding | Horizontal, through replication
Durability | Crash and recover; might lose consistency | Crash-only; crashes and remains consistent
Indexes | Yes, though their use is secondary, as queries are mostly expressed as JSON-style objects | Yes, used extensively, including for building views
Strengths | Good for high update rates | Fault tolerant, though it requires compaction
Both are open source, document-oriented databases designed for easy querying with JavaScript and REST services, which makes them easy and flexible to work with. Both are almost the same age in terms of their release dates, and both have proper documentation for getting started. Both are easy to work with, but CouchDB requires map-reduce style queries whereas MongoDB relies on SQL-style queries. MongoDB also provides good support for integration with PHP, which could benefit PHP developers.
Understanding these differences, we can identify scenarios or use cases where each of these NoSQL databases fits well.
For building a support service, something like Lotus Notes, or any problem where data will be offline for hours and then come back online, CouchDB is the better choice. Master-master replication can be used to keep the offline data consistent.
If the need is high performance, MongoDB is the one to use. An example is user profile storage for a website, where we need to store objects and also cache data from different sources.
If the problem involves high update rates, MongoDB should be used, as this is its strength.
But if the problem requires concurrent “READ” access while a “WRITE” is in progress, CouchDB should be used, since it allows reads during writes. This is because no global lock is taken on server writes.
There are a few possible pros and cons of replacing CouchDB with MongoDB:
Mongo uses update-in-place, so there is less need for compaction on the file system. We can store our schemas in one document, and it is likely to perform better.
Queries are evaluated dynamically at runtime. Indexes are still helpful, however, if we want to prepare for known queries ahead of time.
MongoDB uses a binary format to pass data. A commonly faced problem is encoding and decoding the JSON data as it moves between the database layer and the API layer that uses the database. This, however, can be handled at the API level rather than the database level.
MongoDB supports drivers for various languages rather than just a RESTful interface. These drivers are language-specific and are available on MongoDB’s website.
CouchDB offers master-master replication, which keeps data in sync even after periods offline.
CouchDB allows concurrent read access during writes.
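The encoding concern raised in the list above, converting values from a binary document format into JSON at the API layer, can be sketched as follows. The example assumes the driver hands back Python datetime and bytes values, which plain JSON cannot represent. The function names and the "$date" and "$binary" wrapper keys are illustrative choices, not a statement of what any particular driver produces.

```python
import base64
import json
from datetime import datetime, timezone


def to_json_safe(value):
    """Convert values that JSON cannot express into tagged, JSON-friendly dicts."""
    if isinstance(value, datetime):
        return {"$date": value.isoformat()}
    if isinstance(value, bytes):
        return {"$binary": base64.b64encode(value).decode("ascii")}
    raise TypeError(f"cannot encode {type(value).__name__} as JSON")


def encode_for_api(doc):
    """Serialize a database document for an HTTP/JSON API response."""
    return json.dumps(doc, default=to_json_safe, sort_keys=True)


# Hypothetical document as it might come back from a database driver.
stored = {
    "name": "Ada",
    "created": datetime(2020, 1, 1, tzinfo=timezone.utc),
    "avatar": b"hi",
}

print(encode_for_api(stored))
```

Keeping this conversion in one place at the API boundary means the database layer can keep its native types, which is the point made above about handling the problem at the API level.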
We identified the features that Mongo and Couch provide and compared them with each other. We also discussed a few scenarios where each of these databases would be a good fit. Both Mongo and Couch support Map/Reduce-style processing, which can be difficult to grasp, but with proper documentation the learning effort is repaid by the performance and efficiency both of them provide.
We found that MongoDB suits traditional web applications in which users perform a lot of updates. The interfaces and features, however, are in the hands of the developers or the team providing the service, in this case the web application.
CouchDB should be used for developing focused web applications. These applications are mostly web hooks or APIs, and their users are generally the ones controlling the interface.
We looked at the features provided by the two NoSQL databases, compared them with each other, and discussed which is the better choice in different problem scenarios. NoSQL databases are becoming a valuable part of the database landscape. An enterprise should take into consideration the factors discussed in this article, along with the limitations of these databases.