Service/Etcd
Etcd
Description
etcd (https://etcd.io/) is an open source key-value store focused on reliability, used to store configuration and state data for distributed systems. WMF runs a number of etcd clusters; this document covers the two etcd Main clusters, one installed in each of the primary datacenters, eqiad and codfw. A number of applications, including MediaWiki, read and write configuration and state data in etcd.
Categories
Relevant service categories (wiki categories) for grouping by similar services, owner, etc.
Service Type
Etcd is a foundational service.
Service Dependencies
No hard dependencies beyond hardware and networking. Note that server hardware and networking have their own availability levels, which are in the 99% range, so etcd cannot be more available than they are. As configured, etcd can tolerate certain types of failure within a local datacenter.
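As a rough illustration of what "certain types of failure" can mean: etcd replicates data with the Raft consensus algorithm, so a cluster stays writable only while a majority of its members is reachable. The sketch below works through that arithmetic. The cluster sizes are illustrative assumptions, not the sizes of the WMF clusters, which this page does not state.

```python
def quorum_size(members: int) -> int:
    """Smallest majority of a cluster with `members` voting members."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """Number of members that can fail while a majority is still available."""
    return members - quorum_size(members)


if __name__ == "__main__":
    # Illustrative sizes only; not taken from the WMF configuration.
    for n in (1, 3, 4, 5):
        print(f"{n} members: quorum {quorum_size(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

Note that a four-member cluster tolerates no more failures than a three-member one, which is why odd cluster sizes are generally preferred.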
Related Services
Confd: a lightweight configuration management tool that keeps local configuration files up to date using data stored in etcd.
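The idea behind such a tool can be sketched as follows: render a template from key-value data and rewrite the local file only when its content changes. This is a minimal illustration, not confd's actual implementation (confd is written in Go and uses its own template format). The key name `primary_dc`, the file name `app.conf`, and the dictionary standing in for data fetched from etcd are all hypothetical.

```python
import os
import tempfile
from string import Template


def render(template_text: str, values: dict) -> str:
    """Fill $placeholders in the template with values (e.g. fetched from etcd)."""
    return Template(template_text).substitute(values)


def write_if_changed(path: str, content: str) -> bool:
    """Atomically replace `path` with `content`; return True if the file changed."""
    try:
        with open(path, encoding="utf-8") as fh:
            if fh.read() == content:
                return False  # already up to date, nothing to do
    except FileNotFoundError:
        pass  # first run: the file does not exist yet
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory, then rename over the
    # target so readers never observe a half-written configuration file.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(content)
    os.replace(tmp_path, path)
    return True


if __name__ == "__main__":
    # Hypothetical data; a real tool would read this from etcd.
    values = {"primary_dc": "eqiad"}
    target = os.path.join(tempfile.mkdtemp(), "app.conf")
    changed = write_if_changed(target, render("primary_dc = $primary_dc\n", values))
    print("file updated" if changed else "file unchanged")
```

Returning whether the file changed lets a caller decide whether a dependent service needs to be reloaded.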
Ownership
Etcd is owned by the Service Operations SRE team, which is responsible for all aspects including operation, scalability, backups and software updates.
Technology Department / Site Reliability Engineering / Service Operations
- Escalation points and Key contacts:
- Code: etcd.io - Service Operations
- Data persistence: Service Operations
- Release pipelines: Service Operations
- Metrics, logging, alerting, tracing: Service Operations
- Product: Service Operations
- Security: Service Operations
- Systems & Infrastructure: Service Operations
Supporting documentation and relevant information
- Design documents
- Operational documentation
- Phabricator component query links
- Netbox links
- LibreNMS
- Icinga
- Links to other relevant SRE Tooling™
- Links to Runbooks
- Related service request types
- Any supporting or underpinning services (e.g. dependencies)
- Who is entitled to request/view the service