Core Couchbase services
LinkedIn runs a mix of Couchbase Community and Enterprise Editions across our infrastructure. Deployments range from 3 nodes per cluster up to 72 nodes in our largest cluster. Currently, we deploy Couchbase as a standard RPM, but in the future we plan to couple Couchbase deployments with our in-house deployment system, LID.
Salt + range
As detailed later in this post, LinkedIn installs and manages Couchbase clusters using an array of SaltStack tooling. We use range as a cluster configuration store to assist with installation and monitoring of each cluster. This data is also used in our fleet management system, Macy’s (see below).
Li-couchbase-client wraps the open source Java couchbase-client with our own modifications. In our version, we’ve added the ability to monitor client statistics and capture metrics like Queries Per Second (QPS), latency, and errors, which gives us detailed insight into the use and performance of Couchbase.
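To make the idea of client-side instrumentation concrete, here is a minimal sketch in Python. It is not the li-couchbase-client code (which is Java and not shown in this post); the `InstrumentedClient` class, its `snapshot` method, and the injected `clock` parameter are all hypothetical names chosen for illustration. It wraps any client object and records call counts, errors, and latency per operation.

```python
import time
from collections import defaultdict


class InstrumentedClient:
    """Hypothetical wrapper: records per-operation calls, errors, and latency."""

    def __init__(self, client, clock=time.monotonic):
        self._client = client
        self._clock = clock          # injectable so the wrapper can be tested
        self._started = clock()
        self.calls = defaultdict(int)
        self.errors = defaultdict(int)
        self.latency_total = defaultdict(float)

    def __getattr__(self, name):
        # Only reached for attributes not defined on the wrapper itself.
        target = getattr(self._client, name)
        if not callable(target):
            return target

        def timed(*args, **kwargs):
            start = self._clock()
            try:
                return target(*args, **kwargs)
            except Exception:
                self.errors[name] += 1
                raise
            finally:
                # Count every call, successful or not, and add its duration.
                self.calls[name] += 1
                self.latency_total[name] += self._clock() - start

        return timed

    def snapshot(self):
        """Return QPS, average latency, and error count for each operation."""
        elapsed = max(self._clock() - self._started, 1e-9)
        return {
            op: {
                "qps": count / elapsed,
                "avg_latency_s": self.latency_total[op] / count,
                "errors": self.errors[op],
            }
            for op, count in self.calls.items()
        }
```

A periodic reporter could call `snapshot()` and forward the result to a metrics system; because the wrapper delegates through `__getattr__`, application code keeps calling the same methods it would call on the underlying client.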
At LinkedIn, a number of our infrastructure tools are written in Python and use Couchbase as a backend. Currently, we use the open source Python couchbase-client. In the future, we plan to write our own wrapper library that will allow us to include the emission of metrics and the automatic discovery of Couchbase servers. This will help improve the operability of Couchbase for Python users at LinkedIn.
While installing Couchbase on servers, we also install a daemon called “amf-cbstats.” Active Monitoring Framework (AMF) is a framework at LinkedIn for sending metrics from applications to our monitoring system. Amf-cbstats polls the standard performance metrics from a Couchbase server every minute and sends them to our metrics collection system. We also have a second daemon, “amf-couchbase-aux,” that collects metrics about the backups we perform on certain clusters. You can see more about Couchbase monitoring in my Couchbase Connect 2016 presentation.
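The polling pattern described above can be sketched as follows. This is an illustration only, not the amf-cbstats source: `flatten`, `poll`, and the `fetch_stats` and `emit` callables are hypothetical, and the way stats are fetched and shipped to AMF is left abstract because the post does not describe it.

```python
import time


def flatten(stats, prefix=""):
    """Flatten nested stat dicts into dotted metric names, keeping numbers only."""
    flat = {}
    for key, value in stats.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            flat[name] = value
    return flat


def poll(fetch_stats, emit, interval_s=60, sleep=time.sleep, max_polls=None):
    """Fetch stats every interval_s seconds and hand them to emit().

    fetch_stats and emit are assumptions: callables supplied by the caller
    that read from the server and write to the metrics system.
    max_polls=None runs forever; a number stops after that many polls.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        try:
            metrics = flatten(fetch_stats())
        except Exception:
            # A failed poll is itself worth reporting as a metric.
            metrics = {"poll_errors": 1}
        emit(metrics)
        sleep(interval_s)
        polls += 1
```

Passing `sleep` and `max_polls` as parameters keeps the loop testable without waiting a real minute between polls.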
After collecting all of these metrics, we need to visualize them and set alerts against them. At LinkedIn, we use our in-house metrics visualization system, inGraphs, to display this data. After installing a Couchbase cluster, engineers use an internal utility called “couchbase-dashboard-generator” to generate a set of dashboards and alerts for the cluster. The dashboards include a set of graphs we consider key to monitoring the health of the cluster, as well as all the other statistics we collect. We automatically include alerts with these dashboards, and they can be tuned by editing the range data mentioned earlier.
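As a rough illustration of tunable, generated alerts, the sketch below merges default thresholds with per-cluster overrides. Everything in it is assumed for the example: the metric names, the threshold values, the `build_alerts` function, and the shape of the overrides (which in our setup would come from range) are hypothetical and do not reflect the actual couchbase-dashboard-generator.

```python
# Hypothetical defaults; metric names and values are illustrative only.
DEFAULT_THRESHOLDS = {
    "ops_per_sec": {"warn": 50_000, "crit": 80_000},
    "mem_used_pct": {"warn": 80, "crit": 90},
}


def build_alerts(cluster, overrides=None):
    """Return one alert definition per metric, with cluster overrides applied."""
    overrides = overrides or {}
    unknown = set(overrides) - set(DEFAULT_THRESHOLDS)
    if unknown:
        raise ValueError(f"unknown metrics in overrides: {sorted(unknown)}")

    alerts = []
    for metric, defaults in sorted(DEFAULT_THRESHOLDS.items()):
        # Override values win over defaults, level by level.
        levels = {**defaults, **overrides.get(metric, {})}
        if levels["warn"] > levels["crit"]:
            raise ValueError(f"{metric}: warn threshold exceeds crit threshold")
        alerts.append({"cluster": cluster, "metric": metric, **levels})
    return alerts
```

For example, `build_alerts("cb-example", {"mem_used_pct": {"crit": 95}})` raises only the critical memory threshold for that cluster and leaves every other default in place.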