Posts tagged as cassandra


Column Expiration (TTL) Support for SimpleCassie

I’m probably the only person who still uses SimpleCassie, an early PHP wrapper for Cassandra. I like its chaining syntax, and I’m too lazy to port our code over to phpcassa (although a CQL migration seems inevitable).

Just in case there are other SimpleCassie users out there, I’ve forked it on GitHub to include TTL support (developed by Zhengjun Chen) and a parse() method that returns friendly responses instead of raw Thrift objects. See the README file for details.


Batch Reporting in Cassandra

I’ve had great success using Cassandra for real-time querying, but have only recently begun exploring more complex reporting queries.

I knew a report I needed recently would hit the database pretty hard, so I isolated one of our nodes by removing it from the balancer serving our front-end users. I used a ConsistencyLevel of One to make my reads as fast as possible. Unfortunately, I had forgotten about Cassandra’s read repair mechanism.

When a query is performed at a ConsistencyLevel of One, the result is returned to the client as soon as the first replica responds, but all replicas are still contacted in the background for read repair. This means that a client connecting to one Cassandra node can still impact performance on another node. With my batch report, I saw CPU and memory usage rise across our entire cluster, to the point where it was impacting real-time queries.
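To make the effect concrete, here is a toy model in Python. It is only a sketch of the behavior described above, not Cassandra's actual code: the node names, the `read_at_one` function, and the `repair_probability` knob are all made up for illustration. It counts how many read requests land on each replica when every query is answered by one node but the other replicas are also read in the background.

```python
import random
from collections import Counter


def read_at_one(replicas, repair_probability, requests, rng):
    """Count the reads each replica serves in a toy ConsistencyLevel One model.

    Assumptions (hypothetical, for illustration only):
    - replicas[0] is the isolated reporting node and always answers first.
    - With probability repair_probability, every other replica is also
      read in the background, as read repair would do.
    """
    load = Counter({name: 0 for name in replicas})
    for _ in range(requests):
        load[replicas[0]] += 1  # the read the client actually waits for
        if rng.random() < repair_probability:
            for other in replicas[1:]:
                load[other] += 1  # background read triggered by repair
    return load


nodes = ["report", "web1", "web2"]  # hypothetical node names
always = read_at_one(nodes, 1.0, 1000, random.Random(0))
never = read_at_one(nodes, 0.0, 1000, random.Random(0))
print(dict(always))
print(dict(never))
```

With `repair_probability` at 1.0, the two "web" nodes each serve as many reads as the reporting node, even though no client ever connected to them. At 0.0 they serve none, which is the isolation I had assumed I was getting.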

Cassandra’s view is that optimizing for both real-time queries and batch reporting on the same server is futile. Instead, the common pattern for supporting both workloads is to use Cassandra’s Rack Aware strategy to create two different clusters (aka “racks”) with the same data but different performance configurations and workloads.

That’s all well and good for those with the luxury of purchasing gobs of hardware, but for the rest of us there aren’t many options. It would be great if Cassandra supported a ConsistencyLevel where only one replica is contacted at all. Unfortunately, because there’s no guarantee that the server the client contacts is the one holding the data, this is probably not possible.