Monitoring Cassandra at Scale

At Yelp we leverage Cassandra to fulfill a diverse workload that seems to combine every consistency and availability tradeoff imaginable. It is a fantastically versatile datastore, and a great complement for our developers to our MySQL and Elasticsearch offerings. However, our infrastructure is not done until it ships and is monitored. When we started deploying Cassandra we immediately started looking for ways to properly monitor the datastore so that we could alert developers and operators of issues with their clusters before cluster issues became site issues. Distributed datastores like Cassandra are built to deal with failure, but our monitoring solution had to be robust enough to differentiate between routine failure and potentially catastrophic failure.

Related Articles:

https://engineeringblog.yelp.com/2016/06/monitoring-cassandra-at-scale.html