NoSQL is a term for an exciting group of products that offer an alternative to the rigid, and perhaps outdated, practice of using relational databases and SQL as data storage for applications and websites. It’s not a standard; the term refers to databases that use their own APIs instead of SQL to store and retrieve content. This is a little misleading, though, since some of the software does in fact support SQL, so many people have come to regard NoSQL as meaning “Not Only SQL”.
When writing software, having to convert the data structures used in your code to a fixed arrangement of tables and columns often seems like an unnecessary hassle. To make matters worse, as the database grows it becomes increasingly complicated and slow.
NoSQL databases offer a far more flexible approach – often allowing you to use the same sorts of structures in the database as you do in your code – and one that has been optimized for storing huge amounts of data and exchanging it with your application at high speed. Instead of the database running queries to filter and order data, your application is generally expected to do that.
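As a rough illustration of what “your application is expected to do that” can mean, here is a minimal Python sketch. The `orders` list and the `orders_for` helper are made up for this example; they stand in for objects fetched whole from some NoSQL store, and no particular product’s API is implied.

```python
# Hypothetical records, as they might come back from a store that
# hands over whole objects instead of running a query for us.
orders = [
    {"id": 1, "customer": "ann", "total": 40.0},
    {"id": 2, "customer": "bob", "total": 15.5},
    {"id": 3, "customer": "ann", "total": 99.9},
]


def orders_for(customer, records):
    """Filter and order in application code rather than in the database."""
    matching = [r for r in records if r["customer"] == customer]
    # Largest order first; a relational database would do this with ORDER BY.
    return sorted(matching, key=lambda r: r["total"], reverse=True)


ann_orders = orders_for("ann", orders)
```

The trade-off is that the filtering logic now lives in (and is paid for by) the application, which is part of why these stores can stay simple and fast.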
NoSQL databases are often grouped, according to fairly loose concepts, by the kind of data that they are best suited to storing. All of the products mentioned here are open-source.
Key-Value Data Stores
A key-value store holds binary data with no enforced structure. It is similar to a hash table (a NameValue collection, dictionary, associative array or whatever you’re used to calling them) in that objects are saved and accessed through unique “key” values. The binary data may be in a form that directly corresponds to how your application deals with the object in memory. This type of database is particularly good when using highly flexible data formats and for persistence of objects in rapidly changing applications.
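The idea can be sketched in a few lines of Python. `TinyKeyValueStore` below is a hypothetical in-memory stand-in, not the client API of any real product, and the use of `pickle` as the binary format is simply an assumption made for the example.

```python
import pickle


class TinyKeyValueStore:
    """In-memory stand-in for a key-value store (hypothetical, for illustration)."""

    def __init__(self):
        self._data = {}  # key -> opaque bytes; the store never looks inside

    def put(self, key, obj):
        # Serialize the object as-is; no schema is checked or enforced.
        self._data[key] = pickle.dumps(obj)

    def get(self, key, default=None):
        raw = self._data.get(key)
        return default if raw is None else pickle.loads(raw)


store = TinyKeyValueStore()
store.put("user:42", {"name": "Ann", "tags": ["admin", "beta"]})
user = store.get("user:42")
```

Because the store only sees bytes, the shape of the `user` object can change between releases without any change on the storage side.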
Redis and Riak are two key-value data stores that implement their own APIs for access to data. Both are available for the Linux platform. Redis is the more popular and feature-packed of the two, and an experimental distribution for Windows has also been developed by Microsoft Open Technologies Inc., available at https://github.com/MSOpenTech/redis. Riak, while lacking many of Redis’ features, focuses more on enterprise-level, distributed systems, and Basho Technologies also offers entirely hosted solutions for people who don’t want to run their own servers.
Document Databases
Document databases are optimized for storing structured and unstructured “documents” – in a general sense of data and metadata, not your Microsoft Word files. They usually offer more searching and indexing features than key-value data stores. The structural requirements for data in a document database are there to define how data is represented so that it can be manipulated, not to specify what data should, or should not, be present.
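To make the difference from a plain key-value store concrete, here is a small Python sketch. `TinyDocumentCollection` and its methods are invented for this example and do not mirror any real product’s API. Documents are ordinary dictionaries, each free to carry different fields, and the collection can index a field for searching.

```python
from collections import defaultdict


class TinyDocumentCollection:
    """Hypothetical in-memory document collection with simple field indexes."""

    def __init__(self):
        self._docs = {}     # document id -> document (a plain dict)
        self._indexes = {}  # field name -> {value -> set of document ids}

    def create_index(self, field):
        index = defaultdict(set)
        for doc_id, doc in self._docs.items():
            if field in doc:
                index[doc[field]].add(doc_id)
        self._indexes[field] = index

    def insert(self, doc_id, doc):
        # Sketch only: re-inserting an existing id would leave stale index entries.
        self._docs[doc_id] = doc
        for field, index in self._indexes.items():
            if field in doc:
                index[doc[field]].add(doc_id)

    def find(self, field, value):
        if field in self._indexes:
            ids = self._indexes[field].get(value, set())
            return [self._docs[i] for i in sorted(ids)]
        # No index on this field: fall back to scanning every document.
        ordered = sorted(self._docs.items(), key=lambda kv: kv[0])
        return [doc for _, doc in ordered if doc.get(field) == value]


posts = TinyDocumentCollection()
posts.create_index("author")
posts.insert("p1", {"author": "ann", "title": "Hello"})
posts.insert("p2", {"author": "bob", "title": "Hi", "tags": ["intro"]})
posts.insert("p3", {"author": "ann", "body": "No title here"})
by_ann = posts.find("author", "ann")
```

Note that `p3` has no `title` at all. The collection understands the representation well enough to index and search it, but it does not dictate which fields must be present.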
RavenDB is unusual among the products presented here, having been developed with the Microsoft .NET Framework and providing integration with Internet Information Services (IIS). Although only a .NET client library is available, RavenDB databases are also accessible through a web service. For .NET developers, being able to work with RavenDB’s source code may be a significant advantage over other products.
Wide-Column Databases
Also known as “extensible record stores”, the wide-column database has some similarities with relational databases. The key point here is that now you have the freedom to include different columns for each record, and the database is optimized to reduce the amount of time and resources spent handling data that is not relevant to the query.
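A very loose Python sketch of the idea follows. `TinyWideColumnTable` is a hypothetical toy, not how Cassandra or HBase are implemented or accessed. Each row holds only the columns it actually has, and a read names the columns it wants so nothing else is returned.

```python
class TinyWideColumnTable:
    """Hypothetical toy table where every row may have different columns."""

    def __init__(self):
        self._rows = {}  # row key -> {column name -> value}

    def put(self, row_key, **columns):
        # Only the supplied columns are stored; there is no fixed column list.
        self._rows.setdefault(row_key, {}).update(columns)

    def get(self, row_key, columns):
        # Return just the requested columns that exist for this row.
        row = self._rows.get(row_key, {})
        return {name: row[name] for name in columns if name in row}


users = TinyWideColumnTable()
users.put("ann", email="ann@example.com", city="Oslo")
users.put("bob", email="bob@example.com", twitter="@bob", last_login="2013-01-05")

ann_contact = users.get("ann", ["email", "twitter"])
```

Here `ann` simply has no `twitter` column, so none is stored and none comes back. A real wide-column database goes much further, arranging data on disk so that unrequested columns are rarely touched.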
Cassandra is from the Apache Software Foundation and is designed for excellent performance of large-scale, distributed applications on Linux, Windows, and Mac OS X. Its main rival, HBase, is another cross-platform Apache product, which was developed in Java and supports a Java API for client access, along with gateways such as a RESTful web service.
Graph Databases
Graph databases are unusual in their approach of applying “graph theory” to data storage concepts – almost like treating pieces of data as shapes on a spider diagram and drawing connecting lines between them. This provides efficient access to highly linked data, and modelling of the relationships between data objects in ways that are extremely difficult with traditional database systems.
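The “spider diagram” can be sketched in Python as nodes joined by labelled connections. `TinyGraph` and its method names are assumptions made for this example and are unrelated to the APIs of Neo4j or Titan. The traversal is a plain breadth-first search limited to one relationship type.

```python
from collections import defaultdict, deque


class TinyGraph:
    """Hypothetical in-memory graph of nodes joined by labelled relationships."""

    def __init__(self):
        self._edges = defaultdict(list)  # node -> [(relationship, other node)]

    def connect(self, source, relationship, target):
        self._edges[source].append((relationship, target))

    def reachable(self, start, relationship, max_hops):
        """Breadth-first walk along one relationship type, up to max_hops away."""
        seen = {start}
        found = []
        queue = deque([(start, 0)])
        while queue:
            node, hops = queue.popleft()
            if hops == max_hops:
                continue  # do not expand beyond the hop limit
            for rel, other in self._edges.get(node, []):
                if rel == relationship and other not in seen:
                    seen.add(other)
                    found.append(other)
                    queue.append((other, hops + 1))
        return found


people = TinyGraph()
people.connect("ann", "knows", "bob")
people.connect("bob", "knows", "cat")
people.connect("cat", "knows", "dan")
people.connect("ann", "likes", "jazz")

friends_of_friends = people.reachable("ann", "knows", max_hops=2)
```

A “friends of friends” question like this one means following links from node to node. In a relational database, each extra hop typically costs another self-join.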
Neo4j describes itself as “the world’s leading graph database” and is written in Java. Its core components are licensed under the GPL, but commercial subscriptions extend the functionality – particularly in the areas of performance and scalability – with non-GPL licensed code. Neo4j is available for Windows, Linux, and Mac OS X.
Titan is an unusual system among those listed here. It relies on another database backend, such as Cassandra or HBase, for its data storage, and packages are available that bundle Titan with a suitable backend.
Ready for NoSQL?
It can be a relatively annoying job to take an existing application and replace its relational database with a NoSQL one. The lack of standard APIs across the various NoSQL databases is another reason you may not be ready to make the switch yet: it will be awkward to change from one to another if you decide you don’t like the one you picked initially.
However, NoSQL technologies are here to stay, and are a perfectly valid choice. They frequently outperform older systems and reduce complexity for software developers (yay!). Your next project might be the ideal time to try one.