Cassandra
Clusters
There are three production Cassandra clusters and one staging cluster in use at WMF.
RESTBase
Originally dedicated to RESTBase for caching of service responses (and still colloquially referred to as the RESTBase cluster). With the deprecation of RESTBase, this cluster is earmarked as multi-tenant, general-purpose storage. In addition to the traditional RESTBase datasets, this cluster also provides storage for the Echo extension (last-seen timestamps).
See also: Cassandra/Clusters
AQS / Generated Dataset Platform
Originally dedicated to storage of the Analytics Query Service (AQS) datasets (and still frequently referred to as the AQS cluster), this cluster has since been repurposed for multi-tenant storage of generated datasets (which includes the legacy AQS datasets). In addition to AQS, it also provides storage for image suggestions.
See also: Cassandra/Clusters
Sessionstore
Sessionstore uses Kask with Cassandra (a dedicated cluster for security reasons) as a key-value store for session data.
See also: Cassandra/Clusters
Staging
See: Cassandra/Staging
Network Topology
Cassandra's tunable consistency provides considerable flexibility, allowing you to pick and choose among the trade-offs commonly associated with replicated storage (see: CAP theorem). However, quorum-based consistency for both reads and writes (which provides strong consistency guarantees and availability) has proven so compelling that it has come to be used exclusively at the WMF. As a result, every application at the WMF uses a replication factor of 3 per data-center (in support of LOCAL_QUORUM).
Within WMF data-centers, the largest unit of failure we plan for is the row. As a result, we configure replication in Cassandra such that it never places more than a single replica in the same row; we map rows in our data-centers to Cassandra's notion of a "rack".
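In practice, this means every production keyspace uses NetworkTopologyStrategy with a replication factor of 3 in each data-center. A minimal sketch of what that looks like (the keyspace name example_ks is hypothetical), run via the per-instance CQL wrapper:

$ sudo cqlsh-a -e "CREATE KEYSPACE example_ks WITH replication = {'class': 'NetworkTopologyStrategy', 'eqiad': 3, 'codfw': 3};"

With three replicas per data-center, LOCAL_QUORUM reads and writes each require two local replicas, so the loss of a single node (or rack) does not affect availability.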
RESTBase
| Data-center | Cassandra rack ID | Corresponds to... |
|---|---|---|
| eqiad | a | Row A |
| eqiad | b | Row B |
| eqiad | d | Row D |
| codfw | b | Row B |
| codfw | c | Row C |
| codfw | d | Row D |
AQS / Generated Dataset Platform
| Data-center | Cassandra rack ID | Corresponds to... |
|---|---|---|
| eqiad | rack1 | Rows A and D |
| eqiad | rack2 | Rows B and E |
| eqiad | rack3 | Rows C and F |
| codfw | a_c | Rows A and C |
| codfw | b_e | Rows B and E |
| codfw | c_f | Rows C and F |
Sessionstore
| Data-center | Cassandra rack ID | Corresponds to... |
|---|---|---|
| eqiad | a | Row A |
| eqiad | c | Row C |
| eqiad | d | Row D |
| codfw | b | Row B |
| codfw | c | Row C |
| codfw | d | Row D |
Multi-instance Cassandra
Clusters at the WMF utilize a configuration that permits an arbitrary number of Cassandra nodes (instances) to be run from a single host. This creates some noteworthy differences compared to a typical Cassandra install:
- Each configured instance is given a unique identifier, typically from the set [a, z] (hereafter referred to as id)
- Configuration files are located in /etc/cassandra-{id}
- Instances are each assigned a dedicated IPv4 interface (with DNS in the form {hostname}-{id})
- Each instance has its own systemd unit (named in the form cassandra-{id})
- An /etc/cassandra-{id}/cqlshrc file is created (readable only by root), populated with superuser credentials and TLS configuration
- Logs are written to /var/log/cassandra/{system,debug}-{id}.log
- As JMX is bound to 0.0.0.0, each instance is given a unique JMX port number
- As a convenience, wrappers for cqlsh are installed as cqlsh-{id} (see: cqlsh)
- As a convenience, wrappers for nodetool are installed as nodetool-{id} (see: nodetool)
- As a convenience, wrappers for sstableutil are installed as sstableutil-{id} (see: sstableutil)
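For example, on a multi-instance host, instance a is managed and inspected using its per-instance unit, wrapper, and log files (all names follow the conventions above):

$ sudo systemctl restart cassandra-a
$ nodetool-a status
$ sudo tail -f /var/log/cassandra/system-a.log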
See the Tools section below for a number of utilities aimed at easing the administrative burden of a multi-instance configuration.
Common admin tasks
Rolling restart
Cassandra will take a while (minutes) to start up when the dataset per instance is large. During this time, it is not part of the cluster and does not respond to client requests. It is therefore important to wait for each node to come fully up before proceeding with another node.
There is a cookbook for this, sre.cassandra.roll-restart, which should be run from the cumin server. Example:
cookbook -v sre.cassandra.roll-restart --query restbase1.eqiad.wmnet,restbase2.eqiad.wmnet -r "why we are doing this"
This takes a good while to run. Make sure you're in a screen or tmux.
Restart & check status
(Re)start Cassandra with systemctl restart cassandra. The command nodetool status should return information and show your node (and the other nodes) as being up. Example output:
root@xenon:~# nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load      Tokens  Owns   Host ID                               Rack
UN  10.64.16.149  91.4 KB   256     33.4%  c72025f6-8ad8-4ab6-b989-1ce2f4b8f665  rack1
UN  10.64.0.200   30.94 KB  256     32.8%  48821b0f-f378-41a7-90b1-b5cfb358addb  rack1
UN  10.64.16.147  58.75 KB  256     33.8%  a9b2ac1c-c09b-4f46-95f9-4cb639bb9eca  rack1
The nodetool utility provides many more interesting tools, among them:
- Stats about a keyspace (~db): nodetool cfstats org_wikipedia_en_T_parsoid_html | less
- Histograms about a table: nodetool cfhistograms org_wikipedia_en_T_parsoid_html data
- Take an immutable snapshot of a specific table, or all tables: nodetool snapshot [keyspace] [table]
- Clear snapshot(s) to recover disk space: nodetool clearsnapshot [keyspace] [table]
See more options with nodetool help.
Adding a new (empty) node
Background: docs
Stop the node and wipe the default data (if it started up) with rm -r /var/lib/cassandra/*.
Temporarily remove the node from its own seeds list in /etc/cassandra/cassandra.yaml. This makes sure that the node goes through a full bootstrap process before fully joining the cluster and accepting requests (see this ticket for background).
Now, start the node. If everything goes well, the Cassandra node should automatically join the cluster after going through a bootstrap ('joining') phase. Verify with nodetool status as above.
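Putting the steps above together, a minimal sketch of the whole procedure on a single-instance host:

$ sudo systemctl stop cassandra
$ sudo rm -r /var/lib/cassandra/*
# edit cassandra.yaml to temporarily remove this node from its own seeds list
$ sudoedit /etc/cassandra/cassandra.yaml
$ sudo systemctl start cassandra
$ nodetool status    # wait for the node to move from UJ (joining) to UN (normal)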
On Bootstrapping
During the bootstrap, data will stream in such a way as to avoid consistency violations. Simply put, the data will be streamed from the node that is giving up the range (primary, or otherwise), and other replicas are not considered. This can result in a lack of streaming concurrency, and a slower bootstrap. For example, consider the case where replication factor is 3, and there are 5 nodes in the cluster, with 2 nodes each in rack A and B, and one in rack C. If a new node is bootstrapped into rack C, all of the data loaded onto the new node will stream from the lone node in rack C.
This behavior can be overridden (if, say, the single target node were down) by adding the -Dcassandra.consistent.rangemovement=false JVM property during startup of the joining node. Tread carefully though: data loss is possible!
See CASSANDRA-2434.
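If the override is genuinely required, one way to apply it is via the JVM options consumed at startup; a sketch, assuming the upstream cassandra-env.sh mechanism (the exact file location varies, particularly in multi-instance setups):

# in cassandra-env.sh, before starting the joining node (remove once the bootstrap completes):
JVM_OPTS="$JVM_OPTS -Dcassandra.consistent.rangemovement=false"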
Bootstrapping new nodes
A newly provisioned Cassandra host starts with all of its instances (nodes in Cassandra parlance) disabled. To enable them, do the following:
Run all the Puppets
After the first (successful) run of Puppet has taken place, force a run of Puppet on the other hosts in the cluster. This is necessary to update their firewall rules, and permit inter-node messaging with the new host. For example:
$ sudo cumin A:sessionstore run-puppet-agent
Bootstrap all the instances
Bootstrapping nodes in parallel is ill-advised, and, by default, Cassandra will complain loudly and exit if you try. To address this, the systemd units use a guard-file of the form /etc/cassandra-{id}/service-enabled, which must be created manually before Cassandra will start. Therefore, after Puppet has run successfully the first time, and after Puppet has been run on the other nodes of the cluster, create the initial guard-file and begin a bootstrap. For example:
$ sudo touch /etc/cassandra-a/service-enabled
$ sudo systemctl start cassandra-a.service
You can monitor bootstrapping using nodetool-{id} netstats, or interactively with cassandra-streams -nt nodetool-{id}. Once complete, repeat for the next instance, until all have been provisioned (try c-ls for a list of instance IDs).
Getting a CQL shell
Instance-specific wrappers are installed that make obtaining a CQL shell less cumbersome. The wrappers make use of a cqlshrc file pre-populated with superuser credentials and TLS configuration (see: Multi-instance Cassandra). The file is only readable by root (read: you must have root in sudoers):
$ cqlsh-{id} [args...]
See also: the full CQL reference documentation.
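The wrapper accepts the usual cqlsh arguments, so one-off statements can be executed non-interactively; for example (the query shown is illustrative):

$ sudo cqlsh-a -e "SELECT cluster_name, release_version FROM system.local;"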
Installing and generating certificates
Cassandra traffic between datacenters travels encrypted as per https://phabricator.wikimedia.org/T108953.
There are two ways to add TLS support for a Cassandra cluster:
- Using the PKI infrastructure. This is only supported for Cassandra 4+, and it is the easiest way. It is sufficient to add PKI configs to the Cassandra settings in Puppet, and to set the keystore's password in Puppet private. The Cassandra instances will pick up any new certificate that Puppet rolls out, automatically and without the need for a restart.
- Using cassandra-ca-manager. This method is considered deprecated as of May 2024 (see https://phabricator.wikimedia.org/T352647). The per-host keypairs live in private.git on the puppetmaster host (as of Feb 2021, puppetmaster1001) under /srv/private/modules/secret/secrets/cassandra. To (re)generate the keypairs, a script named cassandra-ca-manager is used; the script is driven by YAML configuration files, one per Cassandra cluster. The YAML configuration contains one entry per host, plus information for the CA itself, and it is designed to be idempotent: if a certain host in the configuration file is missing its keypair, it will be (re)generated when invoking the script (on the puppetmaster, from the puppet.git checkout). For example:
cd /srv/private/modules/secret/secrets/cassandra/
/usr/local/bin/cassandra-ca-manager services-test.yaml
where the YAML file should be the one for the service for which you need certs; for example, for RESTBase, the file restbase.yaml.
Once generated and committed the new private material will be deployed at the next puppet run.
Also, when adding a new host, don't forget to update labs/private.git too and (re)generate the fake private material.
Rollover / expiration
Certificates created via cassandra-ca-manager are generated with a two-year expiration; around that time (see https://phabricator.wikimedia.org/T120662 for monitoring) it is sufficient to delete the old keypairs and generate new ones. Once Puppet has deployed the new keypair, it should be sufficient to roll-restart Cassandra.
All keypairs are signed by a CA which is part of a truststore; adding a new CA to the truststore will effectively trust both. Another way to achieve rollover is to issue another certificate for the CA with extended expiration, without changing the private key; in practice this means deleting rootCa.crt and invoking cassandra-ca-manager again.
If the cluster is configured via PKI, there is no action needed: the new certificate will be created by Puppet automatically, and Cassandra will pick it up and reload it (without the need for a restart).
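To check how much validity remains on a node's inter-node certificate, something like the following can be used (the hostname is a placeholder, and the TLS storage port 7001 is an assumption; verify which port carries encrypted traffic on the cluster in question):

$ echo | openssl s_client -connect restbase1001-a.eqiad.wmnet:7001 2>/dev/null | openssl x509 -noout -enddate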
Add a new host to a multi-instance cluster
Provisioning
Once a server is in the right state to be deployed to, first ensure that the correct DNS setup has been completed. RESTBase hosts, for example, have three additional addresses added as aliases for the system's main interface. These are usually created along with the host's base DNS record (restbaseNNNN) during setup via Netbox, using the checkbox for RESTBase hosts, but you should make sure that restbaseNNNN-a, restbaseNNNN-b, and restbaseNNNN-c have all been created in DNS. Ideally these addresses are sequential from the base address, but this is not required. These addresses should be specified in both DNS and Netbox.
Once the host is configured correctly:
- Add host definitions in Hiera: look at the existing hosts in the chosen datacenter to determine which rack the hosts should go into. Generally, distribution between racks should be as even in number as possible. The jbod_devices parameter is dependent on the disk layout of the hosts in question. If you're not changing the layout elsewhere, it's generally okay to reuse the layout of previous hosts; verify this on the new host before reprovisioning it.
- Add the host definitions to the relevant hieradata.
- Add the hosts to the datacenter hieradata; these are used only for configuring the rate limiting service and its firewall in Puppet and in Scap3.
- Add certificates for the host in the main puppetmaster's /srv/private repo. Search the commit history for previous RESTBase host additions for examples. Generate the new certificates from the cassandra certs directory by running cassandra-ca-manager $cluster.yaml.
- Change the hosts' roles; by default, hosts will be using insetup.
Example links for historical changes shown for context.
Generally it's a good idea to do the above one host at a time. Adding new hosts to the large Cassandra clusters can take a long time, so if you have many hosts to add, it's best to proceed slowly, one by one, rather than worry about juggling multiple downtimes at once.
Cassandra setup
Once the host is fully provisioned, it can be added instance by instance (restbaseNNNN-a, then restbaseNNNN-b, then restbaseNNNN-c) to the Cassandra cluster. Topology changes are costly events in Cassandra, and for this reason only ever add one node to the cluster at a time. To start bootstrapping the "a" instance, simply run sudo touch /etc/cassandra-a/service-enabled and sudo run-puppet-agent. This will start the respective Cassandra instance, and in time it will start bootstrapping itself. To monitor the progress, run cassandra-streams -nt nodetool-a from the bootstrapping host, or c-any-nt status -r | grep restbaseNNNN from another host in the cluster. A host can be considered fully bootstrapped when c-any-nt status -r | grep restbaseNNNN shows the node in status "UN". Hosts still in the process of joining will show status "UJ".
The process of adding a single node can take a long time: even beginning the bootstrap process can take upwards of an hour, and the process itself can take around 5-6 hours. For this reason, it is imperative that you manage your downtimes appropriately to prevent disruption (in short, use the sre.hosts.downtime cookbook).
When waiting for the bootstrapping process to start, it is perfectly normal to see the message "Migration task failed to complete" in the system logs for the instance in question. This can be ignored.
Other updates
Other systems that may require updating after adding and/or removing Cassandra nodes:
- scap deployment of operations/software/logstash-logback-encoder jars
- seed lists in client applications
- container egress rules
On-boarding
Schema creation
Schema should be version controlled in the repository of the service it corresponds to. Any subsequent schema changes should be kept in separate files containing ALTER statements, one file per change. All schema files should be named so that a lexical sort by filename matches the order in which they are applied. For example:
$ ls
00_cassandra_schema.cql 10_users_dob.cql 20_users_email.cql
$ cat 00_cassandra_schema.cql
CREATE KEYSPACE IF NOT EXISTS db
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
CREATE TABLE db.users (
id uuid PRIMARY KEY,
given text,
surname text
);
$ cat 10_users_dob.cql
// Add a date-of-birth attribute to users table
ALTER TABLE db.users ADD dob date;
$ cat 20_users_email.cql
// Store email addresses for a user.
ALTER TABLE db.users ADD email set<text>;
$
To apply schema changes on the appropriate cluster, use the CQL shell (from any node in the cluster):
$ c-cqlsh a -f 00_cassandra_schema.cql
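Because the filenames sort in application order, applying a set of files can be scripted; a sketch, assuming the schema files are in the current directory and have not yet been applied (the ALTER statements are not idempotent):

$ for f in *.cql; do c-cqlsh a -f "$f"; done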
Keyspace & table parameters
An unfortunate property of Cassandra's DDL is that it acts not only on schema, but also on the operational parameters of keyspaces and tables. For example:
CREATE KEYSPACE IF NOT EXISTS db
WITH replication = {'class': 'NetworkTopologyStrategy', 'eqiad': 3, 'codfw': 3};
The namespace being created is db; this is part of the schema. However, everything that appears after WITH pertains to the replication of data; these are operational parameters (and the same is true for table DDL statements). The former makes sense to be maintained with the service code it corresponds to: applications implement the schema, and any changes should be version controlled in lock-step. The latter, however, amounts to cluster configuration.
Service owners, where possible, should omit these operational parameters from schema files (to avoid confusion). However, for keyspaces, specifying replication (as shown above) is required and cannot be avoided. Service owners can either default to SimpleStrategy with a replication_factor of 1, or use whatever strategy is convenient for their test suites. In all cases, it is important that cluster operators (read: SRE) edit any local copies of schema to assign keyspace/table parameters as appropriate, or issue any necessary ALTER ... WITH statements afterward.
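For example, a cluster operator bringing the (hypothetical) db keyspace above in line with production replication might issue:

$ sudo cqlsh-a -e "ALTER KEYSPACE db WITH replication = {'class': 'NetworkTopologyStrategy', 'eqiad': 3, 'codfw': 3};"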
Roles & Grants
Once the schema has been created (see Schema creation above), roles and grants can be added to allow access.
New Cassandra roles (users) are added as Puppet templates to operations/puppet/modules/cassandra/templates/users/. Each is formatted as a list of CQL statements; there is one template file per role, named to match the role name, and each contains the GRANT statements corresponding to that role. Adding the role name to the users list of profile::cassandra::settings (see hieradata/role/common/sessionstore.yaml for an example) will cause the file to be copied to the cluster as /etc/cassandra-{id}/user_{role_name}.cql. Adding a password to the private repository on a puppetmaster will cause it to be templated with that password (hieradata/common/profile/cassandra.yaml).
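The exact contents vary per role, but a template generally takes this shape (the role name, keyspace, and statements below are illustrative, not the actual template contents):

$ cat /etc/cassandra-a/user_new_user.cql
CREATE ROLE IF NOT EXISTS new_user WITH PASSWORD = 'REDACTED' AND LOGIN = true;
GRANT SELECT ON KEYSPACE db TO new_user;
GRANT MODIFY ON KEYSPACE db TO new_user;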
Once the template is in place, you can apply the changes using the CQL shell (from any node in the cluster):
$ c-cqlsh a -f /etc/cassandra-a/user_new_user.cql
Using cumin to execute commands on a per-rack basis
To issue commands against the hosts in a single rack using cumin and the profile::cassandra::rack value:
$ sudo cumin 'P{P:cassandra%rack = "a"} and A:sessionstore and A:eqiad' 'c-foreach-nt cleanup'
Cassandra version upgrades
Cassandra is installed from upstream packages, imported into the Wikimedia APT repository, where per-version components are used. For example, Cassandra 4.1.5 would be installed from component/cassandra41. For new/incoming versions of Cassandra, we use component/cassandradev. The basic steps to upgrade:
- Download the new packages (source and binary), and verify the signatures
- Import the new packages to component/cassandradev (see: Reprepro)
- Update Puppet to set the dev target version mapping used to pin the package version (see: modules/cassandra/manifests/init.pp)
- Upgrade a host or cluster by setting the hiera value profile::cassandra::settings:target_version to dev
- Once all clusters have been upgraded:
  - Copy the packages from component/cassandradev to component/cassandraNN
  - Change all instances of profile::cassandra::settings:target_version to Nx (e.g. 4x, 5x, etc.)
- Rejoice
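After each upgrade step, the version actually running on every local instance can be confirmed with the c-foreach-nt wrapper (see Tools below):

$ c-foreach-nt version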
What to do if...
A corrupt commitlog segment prevents node startup
If Cassandra startup fails with log messages that indicate a corrupt commitlog segment (i.e. checksum failures, etc), it is generally acceptable to delete the offending file, and try again.
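A sketch of the recovery for instance a (the commitlog directory and segment name below are hypothetical; delete only the file named in the error message):

$ sudo systemctl stop cassandra-a
$ sudo rm /srv/cassandra-a/commitlog/CommitLog-7-1490456879000.log
$ sudo systemctl start cassandra-a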
Monitoring
- Logs: /var/log/cassandra/system.log has very detailed information on what's going on.
  - Grep for 'GCInspector' to search for slow GCs
- There are a number of Grafana dashboards linked from the main dashboard; ensure you're looking at the right keyspace or cluster.
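For example, to look for slow garbage collections on instance a of a multi-instance host:

$ grep GCInspector /var/log/cassandra/system-a.log | tail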
RESTBase (major Cassandra user)
- Grafana: RESTBase service dashboard (as distinct from the RESTBase Cassandra cluster)
- Logstash: RESTBase dashboard (indirect)
Ongoing Cassandra bootstrap/decommission
While performing a node bootstrap or decommission, it is possible to monitor the transfers using cassandra-streams -nt nodetool-a.
Alternatively, its progress can be monitored with something like the following (replace nodetool-a with the appropriate per-instance wrapper):
watch -d -n300 "nodetool-a netstats -H | grep total"
Tools
cassandra-tools-wmf
c-cqlsh
Synopsis
c-cqlsh <id>
Description
Given an instance ID, executes cqlsh on that instance. Uses the cqlshrc file located in the instance's configuration directory.
Example
$ c-cqlsh a
Connected to services-test at 10.64.0.202:9042.
[cqlsh 5.0.1 | Cassandra 2.2.6 | CQL spec 3.3.1 | Native protocol v4]
Use HELP for help.
cassandra@cqlsh>
cqlsh-instance
Description
A convenience wrapper for cqlsh. This is not meant to be invoked directly; create a symlink (preferably somewhere on PATH) in the form cqlsh-{id}:
$ ln -s /usr/bin/cqlsh-instance /usr/local/bin/cqlsh-a
$ cqlsh-a
Connected to cassandra-dev at 10.192.16.14:9042
[cqlsh 6.1.0 | Cassandra 4.1.1 | CQL spec 3.4.6 | Native protocol v5]
Use HELP for help.
cassandra@cqlsh>
c-any-nt
Synopsis
c-any-nt <arg> [arg ...]
Description
In a WMF multi-instance environment, executes nodetool on a randomly chosen instance (any instance).
Example
$ c-any-nt status -r
...
c-foreach-nt
Synopsis
c-foreach-nt <arg> [arg ...]
Description
In a WMF multi-instance environment, iteratively execute nodetool on all local Cassandra instances.
Example
$ c-foreach-nt version
a: ReleaseVersion: 2.2.6
b: ReleaseVersion: 2.2.6
c-foreach-restart
Synopsis
usage: c-foreach-restart [-h] [-a ATTEMPTS] [-r RETRY]
                         [--execute-post-shutdown CMD] [-d DELAY]
                         [--logmsgbot LOGMSGBOT] [--tcpircbot-host HOST]
                         [--tcpircbot-port PORT] [--phabricator-issue ISSUE]

Cassandra instance restarter

optional arguments:
  -h, --help            show this help message and exit
  -a ATTEMPTS, --attempts ATTEMPTS
                        Maximum number of times to check if service is up
                        after restarting.
  -r RETRY, --retry RETRY
                        Number seconds between connection attempts, in
                        seconds.
  --execute-post-shutdown CMD
                        Command to execute after Cassandra has been shutdown,
                        and before it is started back up.
  -d DELAY, --delay DELAY
                        Delay between instance restarts (defaults to no
                        delay).
  --logmsgbot           Log restarts to SAL (via logmsgbot and
                        #wikimedia-operations).
  --tcpircbot-host HOST
                        tcpircbot hostname. Only valid when --logmsgbot is
                        used. Default: einsteinium.wikimedia.org
  --tcpircbot-port PORT
                        tcpircbot port number. Only valid when --logmsgbot is
                        used. Default: 9200
  --phabricator-issue ISSUE
                        Phabricator issue to associate these restarts with.
                        This currently only makes sense in combination with
                        --logmsgbot where it is included in the formatted log
                        message.
Description
In a WMF multi-instance environment, iteratively restart instances.
Each instance is drained prior to being restarted. After each restart, the process blocks until the CQL port is listening.
Commands specified by --execute-post-shutdown can include the {id} template, which will be substituted with the ID of the instance being restarted.
Example
$ sudo c-foreach-restart --execute-post-shutdown="echo '{id} is a teapot, short and stout'"
...
c-ls
Synopsis
c-ls
Description
List the IDs of the locally configured instances.
Example
$ c-ls
a
b
c
streams
Synopsis
streams [-h] [-nt CMD]

Cassandra streams monitor

optional arguments:
  -h, --help            show this help message and exit
  -nt CMD, --nodetool CMD
                        Nodetool command to use
Description
Realtime console monitoring of streaming operations.
Example
$ streams -nt nodetool-b
uyaml
Synopsis
uyaml [-h] yaml path

YAML parser

positional arguments:
  yaml        YAML file to parse
  path        Path to parse from file

optional arguments:
  -h, --help  show this help message and exit
Description
Query attributes from a YAML file.
Example
$ uyaml /etc/cassandra-instances.d/restbase1007-a.yaml /jmx_port
7189
$ for i in `c-ls`; do uyaml /etc/cassandra-instances.d/restbase1007-$i.yaml /jmx_port; done
7189
7190
7191
cassandra-ca-manager
cassandra-ca-manager: manage Java keystores with a self-signed certificate authority
cdsh
cdsh: a Cassandra cluster wrapper for dsh
Uncommon admin tasks
Monitor compactions
nuria@aqs1004:~$ nodetool-a compactionstats -H
pending tasks: 998
                                   id   compaction type                                           keyspace   table   completed   total       unit    progress
 88386550-6f46-11e6-908d-b320c64aa5c2        Compaction   local_group_default_T_pageviews_per_article_flat    data   333.74 GB   339.6 GB    bytes     98.27%
 0ee1fe70-6f99-11e6-908d-b320c64aa5c2        Compaction   local_group_default_T_pageviews_per_article_flat    data   156.05 GB   336.07 GB   bytes     46.43%
 e9461700-6ecb-11e6-908d-b320c64aa5c2        Compaction   local_group_default_T_pageviews_per_article_flat    data   710.8 GB    896.46 GB   bytes     79.29%
Active compaction remaining time :   0h24m46s
Bootstrap a brand new cluster
When bootstrapping a new cluster, at least one of the seeds will need to have auto_bootstrap: false in /etc/cassandra/cassandra.yaml, to prevent it from asking (and failing) to bootstrap from the other seeds.
Chances are you are bootstrapping the new cluster from freshly-installed machines, in which case Cassandra will have run at least once and its data files need to be removed.
NOTE: this procedure is meant to be run only on a brand new cluster and NOT ON EXISTING CLUSTERS
service cassandra stop
rm -rf /var/lib/cassandra/*
# this is needed only on one of the seeds
grep -q 'auto_bootstrap: false' /etc/cassandra/cassandra.yaml || echo 'auto_bootstrap: false' >> /etc/cassandra/cassandra.yaml
service cassandra start
Once the node has started, you can stop/wipe/start the other seeds.
Switch to multiple cassandra instances per hardware node
As per the work in https://phabricator.wikimedia.org/T95253 and related, we have the capability of running multiple (up to four, at the moment) Cassandra instances per hardware node. A single node can be converted in place from single-instance to multi-instance by first decommissioning the node in Cassandra, then reimaging it and assigning new IPs/names for the instances in DNS and Puppet.
- Allocate IP addresses in DNS for instance(s) (e.g. https://gerrit.wikimedia.org/r/#/c/249152/)
- Generate instance(s) TLS certificates (see palladium:/srv/private/modules/secret/secrets/cassandra)
- Also generate new material in labs/private.git for puppet compiler's benefit
- Stop cassandra from being restarted with systemctl mask cassandra
- Run nodetool decommission on the node to be converted
- (This can be done in parallel.) Add the new instance(s) to the list of known seeds (e.g. https://gerrit.wikimedia.org/r/#/c/250985/) so they are known to Cassandra and allowed in ferm
- Reimage the machine
- Add one instance at a time to the reimaged machine (e.g. https://gerrit.wikimedia.org/r/#/c/250975/); once Puppet runs, it will provision the additional instance
Handling JBOD disk failures
Modern versions of Cassandra (those which include CASSANDRA-6696) are capable of using JBOD. A JBOD configuration is accomplished by simply supplying a list of paths as the data_file_directories directive.
The Wikimedia convention is to mount the respective devices as /srv/{device} (e.g. /srv/sda4, /srv/sdb4, etc.), with data_file_directories in the form of /srv/{device}/cassandra-{id}/data. Common data should be placed somewhere with single-device redundancy (e.g. RAID-10), mounted as /srv/cassandra/instance-data.
Cluster Setup
Authentication
Authentication is enabled in cassandra.yaml by setting the authenticator and authorizer directives to PasswordAuthenticator and CassandraAuthorizer respectively. When so enabled, Cassandra will automatically create a system_auth keyspace at first start, and a default (super-)user named cassandra, with a password of cassandra. Replication should be set accordingly on this new keyspace, and a new super-user with a strong password created.
First, connect to the cluster using the default credentials and alter the keyspace:
$ cqlsh -u cassandra -p cassandra
cassandra@cqlsh> ALTER KEYSPACE system_auth WITH replication = { 'class': 'SimpleStrategy', 'replication_factor': 3 };
Next, create a new super-user:
cassandra@cqlsh> CREATE USER IF NOT EXISTS admin WITH PASSWORD 'sup3rs3kr3t' SUPERUSER;
Users are not permitted to delete themselves, so exit the shell and reconnect as the new super-user:
cassandra@cqlsh> exit
$ cqlsh -u admin -p sup3rs3kr3t
Delete the default super-user:
admin@cqlsh> DROP USER IF EXISTS cassandra;
For authenticated application access, create a normal user:
admin@cqlsh> CREATE USER IF NOT EXISTS restbase WITH PASSWORD 'moars3kr3t' NOSUPERUSER;
Replicating system_auth
Authentication and authorization information is maintained in the system_auth keyspace, which by default uses SimpleStrategy and a replication factor of 1. This is definitely not what you want: a single node failure can prevent you from accessing your database! Best practice is to configure a replication factor of 3-5 per data-center.
Please check Incident documentation/20170223-AQS before proceeding; increasing the replication of system_auth on a live cluster may lead to outages.
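With that caveat in mind, on a WMF production cluster the change would look something like the following, issued as a super-user and followed by a repair of the keyspace on each node (use the per-instance nodetool wrapper in multi-instance environments):

admin@cqlsh> ALTER KEYSPACE system_auth WITH replication = { 'class': 'NetworkTopologyStrategy', 'eqiad': 3, 'codfw': 3 };
$ nodetool repair system_auth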
Time-window compaction strategy
See Cassandra/TimeWindowCompactionStrategy.
Prometheus JMX Exporter
See Cassandra/PrometheusJmxExporter.
See also
- Hardware and instance sizing
- Tuning
- Old Cassandra testing notes -- mostly about avoiding OOM at high write volumes