NoSQL Comparisons: Cassandra vs HBase

 

After looking at both Cassandra and HBase, there is a natural tendency to wonder which one is better. This is not an easy decision to make.

If you are highly experienced with databases and database server administration, then your priorities and preferences could be very different from those of someone who is just starting out on their first large-scale NoSQL setup. So rather than compare very technical aspects that may not be useful to newer users, this article offers a more general comparison that should help those of you who have less specific requirements.

Installation

Enterprise-level solutions such as Cassandra and HBase are clearly going to be a little more difficult to install and set up than smaller products like RavenDB. Installing HBase is made more complicated by having to install all of its key components separately. And because it usually runs on the Hadoop Distributed File System (HDFS), this adds an extra layer of complexity if you are not already using that software.

Cassandra installs all of its key components in a single, relatively simple installation process.

However, it is worth noting that you can run HBase without HDFS (although HDFS is still required for fully distributed systems), and you can run Cassandra with HDFS. So, as they say, your mileage may vary.

Documentation

HBase’s end-user documentation is not great, and may even be off-putting to some new users. This is an area where DataStax (distributors of their own editions of Cassandra) have focused a lot of effort. The documentation for DataStax Cassandra is far more readable and accessible, and their free, online training programs are a definite plus. These materials are useful even if you’re not using a DataStax version of the database.

Administration Tools

Both databases have pretty much the same tools – command line interfaces, web-based management tools, and monitoring solutions. In terms of functionality, there’s not enough real differentiation between these tools to objectively say that one set is substantially better than the other.

Programming

Both databases are written in Java and have client libraries for most of the same programming languages. Early comparisons of HBase and Cassandra were written before Cassandra had support for triggers, aggregate functions, or any means of running server-side code. Cassandra has supported triggers since version 2.0, and version 3.0 will add user-defined functions, which take care of the other two points. In addition, if you like SQL, then the inclusion of the Cassandra Query Language (CQL) is going to be a key decider for you.

But if you cannot wait for version 3.0 and do not need an SQL-like query language, then HBase is currently a little more capable.
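To give a feel for how closely CQL resembles SQL, here is a minimal sketch in Python. The keyspace, table, and column names are hypothetical and chosen only for illustration, and `run_demo` is an illustrative helper rather than part of any library. It assumes a session object with an `execute(statement, parameters)` method, which is the shape offered by the DataStax Python driver.

```python
import uuid

# Hypothetical keyspace, table, and column names, used only for illustration.
CREATE_KEYSPACE = (
    "CREATE KEYSPACE IF NOT EXISTS demo "
    "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}"
)
CREATE_TABLE = (
    "CREATE TABLE IF NOT EXISTS demo.users ("
    "user_id uuid PRIMARY KEY, name text, email text)"
)
# The %s placeholders are filled in by the driver, not by Python string formatting.
INSERT_USER = "INSERT INTO demo.users (user_id, name, email) VALUES (%s, %s, %s)"
SELECT_USER = "SELECT name, email FROM demo.users WHERE user_id = %s"


def run_demo(session):
    """Create a table, insert one row, and read it back by primary key.

    `session` is assumed to be any object exposing
    execute(statement, parameters=None).
    """
    user_id = uuid.uuid4()
    session.execute(CREATE_KEYSPACE)
    session.execute(CREATE_TABLE)
    session.execute(INSERT_USER, (user_id, "Ada", "ada@example.com"))
    return session.execute(SELECT_USER, (user_id,))
```

With the DataStax Python driver installed and a node reachable on localhost (both assumptions), a suitable session could be obtained with `Cluster(["127.0.0.1"]).connect()` from `cassandra.cluster`. Note that the lookup filters on the primary key; CQL looks like SQL but does not support joins or arbitrary WHERE clauses.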

Performance

In most independent tests, Cassandra is a clear winner in terms of its overall performance. But that doesn’t mean HBase is slow; far from it. While Cassandra is optimized for writing data, HBase is optimized for reading data and has sometimes been shown to be slightly faster in applications that are read-heavy. Overall though, Cassandra is the better-performing product.

Scalability

Both solutions are intended as highly available, scalable, enterprise-level systems, and so it is quite difficult to judge between them in this area. HBase scales very well horizontally (although not without effort on the part of the system administrator), while Cassandra’s row-size limit could be a problem in some rare cases.

However, the key difference between the two comes when you need guaranteed consistent data across all of the nodes in your cluster. Cassandra’s “eventual” consistency model makes this a little more difficult to achieve (stronger guarantees are available by choosing stricter consistency levels per query, at some cost in latency and availability), even though node administration is substantially easier than with HBase.
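The consistency trade-off can be made concrete with a small sketch. A commonly cited rule of thumb for replicated stores such as Cassandra is that a read is guaranteed to overlap the most recent write when the number of replicas acknowledging the write plus the number consulted on the read exceeds the replication factor. The function names below are illustrative only and are not part of any Cassandra API.

```python
def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def read_overlaps_latest_write(
    replication_factor: int, write_replicas: int, read_replicas: int
) -> bool:
    """True when every read set must share at least one replica with the write set."""
    return write_replicas + read_replicas > replication_factor


# With three replicas, quorum writes plus quorum reads always overlap ...
rf = 3
q = quorum(rf)
strong = read_overlaps_latest_write(rf, q, q)

# ... whereas writing to one replica and reading from one replica may not,
# so a read can briefly return stale data until the replicas converge.
weak = read_overlaps_latest_write(rf, 1, 1)
```

Here `strong` is `True` and `weak` is `False`: the stricter setting buys consistency by waiting on more replicas, which is the cost in latency and availability that comes with it.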

Verdict

As with any comparison between two large systems, prior experience and personal preference can make a big difference to which product you decide to use. Both have a lot of loyal users. The requirements of your specific application are a more important factor than general comments in the six areas above.

However, Cassandra’s ease of installation and significantly better documentation and training resources set it apart from HBase. For new users, documentation is extremely important and HBase is found lacking. That the increased “friendliness” of Cassandra also comes with overall performance gains gives Cassandra the “win” at this point.
