Cassandra/CapacityPlanning


General

Establishing an Upper Bound

Preserving the ability to decommission hosts

From an operational perspective, it is important to retain the ability to decommission at least one host in each rack.  As an example, consider a cluster with 4 machines per rack, all of which are treated homogeneously (i.e. they all store the same amount of data).  In order to retain the ability to decommission one host in each rack, you must have enough free space to transfer one host's worth of data to the remaining 3; this establishes a hard upper bound of 75% utilization.[1]

NOTE: It may make more sense to express capacity and utilization in terms of data units, rather than percentages.  However, when considering an upper bound that permits the decommission of at least one host, a percentage will always be required. That percentage is calculated as: ((N-1) / N) * 100 (where N is the number of hosts in a rack).
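This arithmetic is simple enough to script; a minimal sketch in Python (the function name is ours, for illustration):

 def decommission_upper_bound(hosts_per_rack: int) -> float:
     """Maximum utilization (%) that still leaves room to move one
     host's data onto the remaining hosts in the rack."""
     return (hosts_per_rack - 1) / hosts_per_rack * 100

 print(decommission_upper_bound(4))  # 75.0 -- the 4-hosts-per-rack example above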

Runway

Another consideration in establishing an upper bound on utilization is the runway needed to procure and provision additional resources. There must be an upper bound on utilization after which no additional use-cases are added (and no changes made to existing use-cases that would increase utilization).  From this point of “No”, there should be sufficient time, given the rate of growth, to budget for, procure, and provision additional storage well before running critically low.
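As a back-of-the-envelope check on that runway, a minimal sketch assuming linear growth (the figures used are the 2018-07 numbers from the Constraints table below):

 def months_of_runway(critical_limit_tb, current_tb, growth_tb_per_month):
     """Months remaining before utilization reaches the critical
     limit, assuming a linear growth rate."""
     return (critical_limit_tb - current_tb) / growth_tb_per_month

 # 11TB critical limit, 9TB current utilization, 850GB/month growth:
 print(months_of_runway(11.0, 9.0, 0.85))  # ~2.35 months -- less than a 3-month runway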

Working Space

Compaction is an asynchronous process that optimizes the layout of data on disk for read performance. Various compaction strategies exist within Cassandra, each meant to optimize for different storage and/or access patterns, but all work by creating new files to atomically replace existing ones. This requires that we maintain enough free space to store as much as 2x the size of all in-progress compactions (space for the tables being combined, plus the newly combined files, until the compaction succeeds and the old data can be safely removed). As a pathological worst case, imagine an entire dataset stored in one keyspace, with a single configured data file directory, sans incremental repair. A non-split major compaction using SizeTieredCompactionStrategy would produce a single file as output, requiring 100% additional overhead in working space. Again, this worst-case scenario is easily avoided, but it demonstrates the need to understand your configuration and plan accordingly.
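A minimal sketch of that worst-case working-space arithmetic (the dataset size is illustrative):

 def compaction_working_space_tb(compacting_tb, overhead_fraction=1.0):
     """Free space needed while a compaction runs: old SSTables and
     the new output coexist until the compaction completes.  A
     non-split major compaction under SizeTieredCompactionStrategy
     can emit output as large as its input (overhead_fraction = 1.0)."""
     return compacting_tb * overhead_fraction

 # Major-compacting a 9TB dataset into one file can briefly need
 # another ~9TB free, until the old SSTables are deleted.
 print(compaction_working_space_tb(9.0))  # 9.0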

Other examples of where working space is needed include: Java heap dumps, hint storage, saved caches, and commitlogs.

RESTBase

Cluster Disposition

RESTBase Cassandra cluster

For the RESTBase cluster, Cassandra hosts are uniformly distributed over 3 rows in each data-center. Cassandra's NetworkTopologyStrategy is used to create a correlation between a Cassandra rack and each of the data-center rows where nodes are located; three replicas are stored in each data-center, one per row/rack.
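A sketch of what such a keyspace definition could look like through the Python driver; the keyspace name and contact point are illustrative assumptions, and the data-center names simply follow the Wikimedia site names rather than being taken from the production configuration:

 from cassandra.cluster import Cluster

 session = Cluster(['cassandra.example.org']).connect()

 # NetworkTopologyStrategy places replicas per data-center; with
 # racks mapped to rows, 3 replicas per DC means one per row/rack.
 session.execute("""
     CREATE KEYSPACE IF NOT EXISTS example_ks
     WITH replication = {
         'class': 'NetworkTopologyStrategy',
         'eqiad': 3,
         'codfw': 3
     }
 """)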

Multiple data-centers are used in order to support active-active geographic distribution (uncoordinated reads and writes from either or both sites). The use of 3-way replication in each data-center is primarily a matter of meeting consistency and availability requirements vis-à-vis tunable consistency, not of data redundancy (all data in RESTBase is thus far secondary in nature, and can be recovered or regenerated from primary data sources).

NOTE: All hosts in the RESTBase cluster are treated homogeneously, even when the hardware is not. This necessitates that capacity be based on the lowest common denominator of any resource. For example: if a single host in a 24-host cluster has a total storage capacity of N, and the other 23 have N+10, total capacity is N*24.
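A minimal sketch of that lowest-common-denominator calculation (the per-host figures are hypothetical):

 def homogeneous_capacity(per_host_capacities_tb):
     """Usable capacity when every host is treated identically:
     the smallest host bounds all of them."""
     return min(per_host_capacities_tb) * len(per_host_capacities_tb)

 # One host at 2.0TB among 23 hosts at 2.5TB caps the cluster at 2.0 * 24:
 hosts = [2.0] + [2.5] * 23
 print(homogeneous_capacity(hosts))  # 48.0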

Constraints

Storage Capacity
Date    | Total capacity | Critical limit | Soft limit | Current utilization
2018-07 | 16.3TB         | 11TB[2]        | 8.5TB[3]   | 9TB (2TB below critical; 0.5TB over soft)
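The limits above follow from footnotes [2] and [3]; a sketch reproducing the arithmetic:

 total_tb = 16.3               # 2018-07 total capacity
 decommission_reserve = 0.25   # footnote 2: reserved for decommissions
 other_overheads_tb = 1.2      # footnote 2: other overheads (cf. Working Space)
 growth_tb_per_month = 0.85    # footnote 3: 850GB/month
 runway_months = 3             # footnote 3

 critical_tb = total_tb * (1 - decommission_reserve) - other_overheads_tb
 soft_tb = critical_tb - growth_tb_per_month * runway_months

 print(critical_tb)  # ~11.0TB
 print(soft_tb)      # ~8.5TB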

FY18/19

FY19/20

Footnotes

  1. Planning for 100% utilization is dangerous and impractical, so the actual limit needs to be somewhat lower than this in practice.
  2. On the basis that we need to reserve 25% for decommissions, and 1.2TB for other overheads.
  3. Given a growth rate of 850GB/month and the need for a 3-month runway.