NoSQL Data Patterns and Caching Tips

 

Fast, flexible and distributed: these are the promises of many NoSQL databases, which make deliberate trade-offs between consistency, availability and partition tolerance to go where SQL databases cannot. There is a catch, however. An SQL mindset about data patterns and caching often has to be explicitly set aside, or those NoSQL advantages may shrivel and die. As a rule of thumb, data patterns and caching for NoSQL databases often call for the opposite of what you would do with a conventional SQL database.

Denormalization Is the Norm for NoSQL

Instead of pursuing ruthless storage efficiency and banishing data duplication, NoSQL databases often go the other way. They favor denormalization, copying the same data into several tables or documents. This keeps the data a query needs together in one place and avoids the resource-hungry join operations that conventional relational database systems rely on. The trade-off is greater simplicity and speed at the expense of a higher volume of stored data, along with the need to update every copy when that data changes.
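The difference can be sketched in a few lines of Python. This is only an illustration: plain dictionaries stand in for tables and documents, and the names (customers, orders_normalized, orders_denormalized) are hypothetical rather than taken from any particular NoSQL product.

```python
# Normalized layout: the order only references the customer by id,
# so reading it takes a second lookup (the equivalent of a join).
customers = {"c1": {"name": "Ada", "city": "London"}}
orders_normalized = {"o1": {"customer_id": "c1", "total": 42.0}}


def read_order_normalized(order_id):
    order = orders_normalized[order_id]
    customer = customers[order["customer_id"]]  # second lookup
    return {"customer_name": customer["name"], "total": order["total"]}


# Denormalized layout: the customer name is copied into the order document,
# so a single lookup returns everything the query needs.
orders_denormalized = {"o1": {"customer_name": "Ada", "total": 42.0}}


def read_order_denormalized(order_id):
    return dict(orders_denormalized[order_id])  # one lookup, no join
```

Both functions return the same result, but the denormalized version pays for its single lookup with a duplicated customer name that has to be kept in sync.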

First Figure Out What You Want, Then Ask

The conventional (RDBMS) way of getting information out of a database is to ask for a list of tables and browse their records to see what you can find. NoSQL databases, however, typically deal with unstructured data, where assembling lists of tables (or their equivalent) is a distinctly non-trivial task that degrades performance. The better way to extract the specific data you need is to have your database application first determine the corresponding key and then fetch the data from the NoSQL database directly, without browsing.
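As a rough sketch of that idea, the Python below contrasts a direct key lookup with browsing every record. A dictionary stands in for the key-value store, and the key scheme "user:<id>:profile" is an assumption made up for this example.

```python
# A dictionary stands in for a key-value store (hypothetical data).
store = {
    "user:1001:profile": {"user_id": 1001, "name": "Ada"},
    "user:1002:profile": {"user_id": 1002, "name": "Grace"},
    "user:1003:profile": {"user_id": 1003, "name": "Edsger"},
}


def profile_key(user_id):
    """Work out the key first (assumed naming scheme)."""
    return f"user:{user_id}:profile"


def get_profile(user_id):
    """Then ask: a single direct lookup by key."""
    return store.get(profile_key(user_id))


def find_profile_by_browsing(user_id):
    """The approach to avoid: walk through every record until one matches."""
    for record in store.values():
        if record.get("user_id") == user_id:
            return record
    return None
```

The two functions find the same record, but only the first keeps its cost independent of how many records the store holds.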

NoSQL and Server Platform Caching Strategies

How much of the cache is managed by the NoSQL data store and how much by the user varies from one vendor to another. Oracle NoSQL Database uses Berkeley DB Java Edition (JE) as its storage engine. JE uses interior nodes (INs) to navigate data and leaf nodes (LNs) to store it. Oracle suggests sizing the JE cache to hold as many of the INs as possible (ideally all of them), leaving the file system cache that operating systems use to speed up disk reads to provide any extra capacity for INs and LNs. Memcached distributes memory to the parts of the data set that need it most, and can cache both raw data and serialized objects. Users can then decide whether to use the web server's share of RAM as the principal caching resource for memcached, or to give it all the RAM available on the server.

Scale Out Rather than Scale Up

NoSQL data stores usually support linear scalability of cache, which relational databases do not offer, and they also favor scaling out across multiple machines (another sticking point for the RDBMS model). In fact, the NoSQL key-value data store Riak is described by its developers as more of a database coordination platform than a database itself, using a large number (64) of underlying databases for high availability. That means it not only can be, but should be, distributed over several physical servers, where it gains fault tolerance as well as access to multiple caching resources.
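The general idea of spreading a fixed number of partitions over several servers can be sketched as follows. This is a simplified illustration, not Riak's actual algorithm: the hash-and-modulo mapping, the round-robin assignment and the function names are all assumptions made for the example.

```python
import hashlib

NUM_PARTITIONS = 64  # matches the figure quoted above


def partition_for(key):
    """Map a key to one of the partitions using a stable hash."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % NUM_PARTITIONS


def assign_partitions(nodes):
    """Deal the partitions out to the physical servers round-robin."""
    return {p: nodes[p % len(nodes)] for p in range(NUM_PARTITIONS)}


def node_for(key, assignment):
    """Find which server owns the partition that holds this key."""
    return assignment[partition_for(key)]
```

With four servers, each one owns 16 of the 64 partitions, so both the data and the memory available for caching are shared out evenly. Adding servers changes the assignment table, not the way keys map to partitions.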
