SessionStorage/Runbook

From Wikitech

Cassandra Node Down

Sessionstore is backed by a Cassandra cluster with 3 nodes in each of eqiad and codfw. It uses a replication factor of 3 per data center, and performs quorum reads and writes. With a replication factor of 3, a quorum is 2 replicas, so at least two nodes must be up in a data center for sessionstore to field requests there. The loss of one node in a data center is tolerable; the loss of a second is user-impacting. If a single node is down, arrange to get it back up as soon as possible to restore full redundancy. If two or more nodes have failed, depool sessionstore for that data center immediately.
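The quorum arithmetic above can be sketched in shell. This is only an illustration; the function names `quorum` and `tolerable_failures` are made up for this example and are not part of any sessionstore or Cassandra tooling.

```shell
#!/bin/sh
# Hypothetical helpers illustrating quorum arithmetic; not real tooling.

# quorum RF: replicas that must respond for a quorum read or write.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# tolerable_failures RF: replicas that can be down while quorum still succeeds.
tolerable_failures() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

rf=3  # replication factor per data center, as stated in this runbook
echo "quorum=$(quorum "$rf") tolerable_failures=$(tolerable_failures "$rf")"
# prints: quorum=2 tolerable_failures=1
```

With 3 nodes and a replication factor of 3, every node holds a replica of every key, so the number of tolerable replica failures is also the number of tolerable node failures.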

High Storage Utilization

The sessionstore cluster is over-provisioned, and storage is expected to be stable, so high utilization is an indication of an aberrant workload. Pay particular attention to the rate of growth; you may not have much time to act!

If the sessionstore cluster is experiencing high storage utilization, the first order of business is to establish the rate of growth and, by extension, how much time you have. Under normal circumstances, storage volume should be relatively constant (graphs will have a sawtooth pattern, but trend flat), so high utilization is an indication that something is very wrong. Time permitting, you should attempt to figure out what is generating the additional storage. If, however, you are in imminent danger of running out of space, the only option is to truncate the data table.
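As a rough aid for turning a growth rate into a time budget, the following sketch does a linear extrapolation from two utilization readings. The function name `hours_until_full` and the numbers in the example are hypothetical; take real readings from the storage graphs or from `df` on the affected host. It assumes growth stays linear, which an aberrant workload may not respect, so treat the result as an optimistic estimate.

```shell
#!/bin/sh
# Hypothetical helper; a linear extrapolation, not a prediction.
# hours_until_full USED_THEN USED_NOW CAPACITY HOURS_BETWEEN
#   USED_THEN, USED_NOW, CAPACITY: integers in the same unit (e.g. GiB)
#   HOURS_BETWEEN: whole hours elapsed between the two readings
hours_until_full() {
    used_then=$1 used_now=$2 capacity=$3 hours=$4
    growth=$(( used_now - used_then ))
    if [ "$growth" -le 0 ]; then
        # Flat or shrinking usage: no exhaustion at the current trend.
        echo "inf"
        return 0
    fi
    # Remaining space divided by growth per hour (integer division).
    echo $(( (capacity - used_now) * hours / growth ))
}

# Made-up example: 400 GiB six hours ago, 520 GiB now, 1000 GiB volume.
hours_until_full 400 520 1000 6
# prints: 24
```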

Truncating (deleting)

Truncating will delete every session. This impacts users, and should only be used as a last resort to prevent the cluster from running entirely out of storage (which will create greater impact).

From any sessionstore node, start a CQL shell and issue TRUNCATE sessions.values. You will not be prompted for confirmation; the effect is immediate.

eevans@sessionstore1004:~$ cqlsh-a 
Connected to sessionstore at 10.64.0.33:9042
[cqlsh 6.1.0 | Cassandra 4.1.8 | CQL spec 3.4.6 | Native protocol v5]
Use HELP for help.
cassandra@cqlsh> TRUNCATE sessions.values; 
cassandra@cqlsh>
eevans@sessionstore1004:~$

A TRUNCATE will result in a snapshot, so no actual data will be freed until the snapshot is removed. Snapshot removal is a per-node operation; it must be performed on each host, and for each instance. Use cumin for this:

eevans@cumin1002:~$ sudo cumin A:sessionstore 'c-foreach-nt clearsnapshot --all'

High Error Rate

FIXME: Do.