SRE/Infrastructure naming conventions

From Wikitech

This page documents the naming conventions of servers, routers, and data center sites.

Our servers currently fall into two broad categories:

  • Clustered servers: These use numeral sequences with a descriptive prefix (see #Networking and #Servers). For example: db1001.
  • Miscellaneous servers: These used unique hostnames (see #Miscellaneous servers). For example: helium. This naming convention is deprecated and not used for new hosts, but some older miscellaneous-named hosts still exist.

Name reuse

Historically, we did not reuse names of past servers for new servers. For example, after db1001 is decommissioned, no other server will be named db1001. Ganeti VMs sometimes reuse hostnames, but bare metal typically will not.

The notable exception is networking gear, whose names are determined by rack location. For example, the access switch in Eqiad rack A8 is named asw-a8-eqiad. If it is replaced, the new switch takes the same name.

All hardware in the datacenter space is tracked in Netbox, which can be used to check for existing hostnames for both hardware and ganeti instances.
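The no-reuse rule can be sketched as a small helper. This is a minimal illustration, not an existing tool: `next_hostname` is a hypothetical function, it assumes the caller has already collected every name ever used (including decommissioned hosts, for example from a Netbox export), and picking the lowest free number starting at x001 is an assumption based on the examples on this page.

```python
def next_hostname(prefix, range_start, used_names):
    """Return the lowest never-used hostname for a prefix in a 1000-wide range.

    used_names must include decommissioned hosts, since names are not reused.
    Numbering starts at range_start + 1 (e.g. 1001), which is an assumption.
    """
    used = set(used_names)
    for number in range(range_start + 1, range_start + 1000):
        candidate = f"{prefix}{number}"
        if candidate not in used:
            return candidate
    raise ValueError(f"no free {prefix} hostname in range starting at {range_start}")


# db1001 and db1002 were used in the past, so neither can be handed out again.
print(next_hostname("db", 1000, {"db1001", "db1002"}))
```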

Data centers

Data centers were traditionally named using the vendor's initials (at the time of lease signing) followed by the IATA code of a nearby major airport. For example, for the Eqiad data center the vendor is Equinix and IAD is the large nearby airport. This convention was used from 2003 up to 2023. Because vendors go through acquisitions and the original initials stop applying after some time, starting with Magru in 2023 only the airport code is used, along with a freely chosen prefix.[1]

DC Vendor (originally) Airport Code
codfw CyrusOne DFW
drmrs Digital Realty MRS
eqdfw Equinix DFW
eqiad Equinix IAD
eqord Equinix ORD
eqsin Equinix SIN
esams EvoSwitch AMS
knams Kennisnet AMS
ulsfo United Layer SFO
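As a rough illustration of the convention, a site name can be split into its prefix and airport code. The sketch below assumes the airport code is always the last three letters, which holds for every site in the table above; `split_dc_name` and `DATA_CENTERS` are hypothetical names introduced here.

```python
# Vendor names and airport codes copied from the table above.
DATA_CENTERS = {
    "codfw": ("CyrusOne", "DFW"),
    "drmrs": ("Digital Realty", "MRS"),
    "eqdfw": ("Equinix", "DFW"),
    "eqiad": ("Equinix", "IAD"),
    "eqord": ("Equinix", "ORD"),
    "eqsin": ("Equinix", "SIN"),
    "esams": ("EvoSwitch", "AMS"),
    "knams": ("Kennisnet", "AMS"),
    "ulsfo": ("United Layer", "SFO"),
}


def split_dc_name(name):
    """Split a site name into (prefix, airport code).

    Assumes the IATA code is the final three letters of the name.
    """
    name = name.lower()
    if len(name) <= 3 or not name.isalpha():
        raise ValueError(f"not a recognisable site name: {name!r}")
    return name[:-3], name[-3:].upper()


print(split_dc_name("eqiad"))
```

The same split works for sites named under the newer scheme, where the prefix is freely chosen instead of derived from a vendor.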

Networking

Naming for network equipment is based on role and location.

This also applies to: power distribution units, serial console servers, and other networking infrastructure.

Name prefix Role Example
asw access switch asw-a1-eqiad
cr core router cr1-eqiad
mr management router mr1-eqiad
lsw leaf switch lsw1-e1-eqiad
ssw spine switch ssw1-e1-eqiad
msw management switch msw1-eqiad & msw-b2-eqiad
pfw payments firewall pfw1-eqiad
ps1 / ps2 power strips/distribution units ps1-b3-eqiad
scs serial console server scs-a8-eqiad
fasw Fundraising access switch fasw-c-codfw
cloudsw Cloud L3 switches cloudsw1-c8-eqiad
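The examples in the table above share a common shape: a role prefix, an optional number, an optional rack or row, and the site. A minimal parsing sketch follows, assuming that shape and a five-letter site name; the pattern is inferred from the examples only and `parse_net_device` is a hypothetical helper, not part of any existing tooling.

```python
import re
from typing import NamedTuple, Optional


class NetDeviceName(NamedTuple):
    role: str                 # e.g. "asw", "cr", "cloudsw"
    index: Optional[int]      # e.g. 1 in "cr1-eqiad", None in "asw-a1-eqiad"
    location: Optional[str]   # rack or row, e.g. "a1" or "c"
    site: str                 # e.g. "eqiad"


# Pattern inferred from the examples in the table; real names may deviate.
_PATTERN = re.compile(
    r"(?P<role>[a-z]+)(?P<index>\d*)"
    r"(?:-(?P<location>[a-z]\d*))?"
    r"-(?P<site>[a-z]{5})"
)


def parse_net_device(name):
    """Parse a network device name into its role, index, location and site."""
    match = _PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"unrecognised network device name: {name!r}")
    index = int(match["index"]) if match["index"] else None
    return NetDeviceName(match["role"], index, match["location"], match["site"])


print(parse_net_device("lsw1-e1-eqiad"))
```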

OpenStack deployments

[Datacenter Site][numeric identifier](optional dev suffix to indicate non-external non-customer facing deployments) - [r (if region)][letter for AZ]

  • Current Eqiad/Codfw deployments will not fully meet these standards until rebuilt: [eqiad0 (deployment), eqiad (region), nova (AZ)]
Deployment Region Availability Zone
eqiad0 eqiad0-r eqiad0-rb
eqiad1 eqiad1-r eqiad1-rb
codfw0dev codfw0dev-r codfw0dev-rb
codfw1dev codfw1dev-r codfw1dev-rb
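The pattern above can be expressed as a short helper that derives the three names from a site and a numeric identifier. This is a sketch under the stated pattern; `openstack_names` and its parameters are hypothetical, and the default AZ letter "b" is taken from the table rows only.

```python
def openstack_names(site, number, dev=False, az_letter="b"):
    """Build (deployment, region, availability zone) names for a deployment.

    dev=True adds the optional "dev" suffix for internal deployments.
    """
    deployment = f"{site}{number}{'dev' if dev else ''}"
    region = f"{deployment}-r"
    availability_zone = f"{region}{az_letter}"
    return deployment, region, availability_zone


print(openstack_names("eqiad", 1))
print(openstack_names("codfw", 1, dev=True))
```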

Disks

  • Arrays must use the Storage array device role in Netbox.
  • Naming follows two conventions:
      • Array is attached to a single host:
          • hostname_of_host_system-arrayN
          • Example: ms2001-array1, ms2001-array2
          • All arrays get a number, even if there is only a single array.
          • Example: dataset1001-array1
      • Array is attached to multiple hosts:
          • Labs uses this for labstore, where each shelf connects to two different hosts, so the single-host naming scheme does not work.
          • servicehostgroup-arrayN-site
          • Example: labstore-array1-codfw, labstore-array2-codfw
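The two array naming conventions can be sketched as follows. Both function names are hypothetical and only format strings according to the patterns listed above; they do not talk to Netbox.

```python
def single_host_array_name(hostname, n):
    """Name for an array attached to one host: hostname-arrayN."""
    if n < 1:
        raise ValueError("array numbers start at 1")
    return f"{hostname}-array{n}"


def shared_array_name(service_host_group, n, site):
    """Name for an array attached to several hosts: group-arrayN-site."""
    if n < 1:
        raise ValueError("array numbers start at 1")
    return f"{service_host_group}-array{n}-{site}"


print(single_host_array_name("ms2001", 1))
print(shared_array_name("labstore", 2, "codfw"))
```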

Kubernetes

Any cluster that is not the main wikikube cluster should use a consistent identifier and follow these conventions:

  • Control plane service name: <identifier>-ctrl
  • Ingress service name: <identifier>-ingress [-ro|-rw] for active/active or active/passive
  • Hostnames for control plane: <identifier>-ctrlXXXX.$site.wmnet
  • Hostnames for kubelets: <identifier>-workerXXXX.$site.wmnet
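As an illustration, the conventions above can be generated from a cluster identifier. The helper below is a hypothetical sketch: it assumes XXXX is a zero-padded four-digit host number, and the `mode` argument stands in for the optional -ro / -rw ingress suffix.

```python
def k8s_names(identifier, site, number, mode=None):
    """Build service names and hostnames for a non-wikikube cluster.

    mode may be None, "ro" or "rw" and selects the optional ingress suffix.
    """
    if mode not in (None, "ro", "rw"):
        raise ValueError("mode must be None, 'ro' or 'rw'")
    ingress = f"{identifier}-ingress" + (f"-{mode}" if mode else "")
    return {
        "control_plane_service": f"{identifier}-ctrl",
        "ingress_service": ingress,
        "control_plane_host": f"{identifier}-ctrl{number:04d}.{site}.wmnet",
        "worker_host": f"{identifier}-worker{number:04d}.{site}.wmnet",
    }


# "dse-k8s" is a cluster identifier that appears in the prefix list on this page;
# the host number 1001 is only an example.
for key, value in k8s_names("dse-k8s", "eqiad", 1001).items():
    print(key, value)
```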

Servers

Datacenter numbering

Any system that runs in a dedicated services cluster with other machines is named after its role or service task. As a rule, we name after the service, not just the software package. Servers within a group are also numbered according to the datacenter they are located in.

Data center Numeral range Example
pmtpa / sdtpa (decommissioned) 1-999 cp7
eqiad 1000-1999 db1001
codfw 2000-2999 mw2187
esams / knams 3000-3999 cp3031
ulsfo 4000-4999 bast4001
eqsin 5000-5999 dns5001
drmrs 6000-6999 cp6011
magru 7000-7999 cp7001
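The numeral ranges in the table above make it possible to infer a host's datacenter from its name alone. A minimal sketch, assuming the host number is the last run of digits in the short hostname; `datacenter_for` is a hypothetical helper and the ranges are copied from the table.

```python
import re

# (first, last, datacenter) copied from the table above.
NUMERAL_RANGES = [
    (1, 999, "pmtpa / sdtpa (decommissioned)"),
    (1000, 1999, "eqiad"),
    (2000, 2999, "codfw"),
    (3000, 3999, "esams / knams"),
    (4000, 4999, "ulsfo"),
    (5000, 5999, "eqsin"),
    (6000, 6999, "drmrs"),
    (7000, 7999, "magru"),
]


def datacenter_for(hostname):
    """Return the datacenter implied by the number in a clustered hostname."""
    short = hostname.split(".", 1)[0]
    digit_runs = re.findall(r"\d+", short)
    if not digit_runs:
        raise ValueError(f"no number in hostname: {hostname!r}")
    number = int(digit_runs[-1])  # last run, so "dse-k8s-ctrl1001" yields 1001
    for first, last, datacenter in NUMERAL_RANGES:
        if first <= number <= last:
            return datacenter
    raise ValueError(f"number {number} is outside all known ranges")


print(datacenter_for("db1001"))
print(datacenter_for("mw2187.codfw.wmnet"))
```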

When adding a new datacenter, make sure to update operations/puppet.git's /typos file which checks hostnames.

Hostname prefixes

The full list of hostname prefixes currently in use can be gathered from a cumin host (cumin1002.eqiad.wmnet, cumin2002.codfw.wmnet) with:

sudo cumin --no-color 'A:all' 2>/dev/null | nodeset -S '\n' -e | sed 's/\..*//g' | sed 's/[0-9]\{4\}//g' | sort | uniq

Be aware that hosts with dev in their name can have the dev part either before or after the four-digit number.
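For readers less familiar with sed, the text-processing part of the pipeline above can be sketched in Python. This mirrors only the two sed substitutions and the sort/uniq step; it does not run cumin, and the input hostnames below are illustrative, not a real host list.

```python
import re


def hostname_prefixes(fqdns):
    """Reduce a list of FQDNs to a sorted list of unique hostname prefixes."""
    prefixes = set()
    for fqdn in fqdns:
        short = fqdn.split(".", 1)[0]                   # like sed 's/\..*//g'
        prefixes.add(re.sub(r"[0-9]{4}", "", short))    # like sed 's/[0-9]\{4\}//g'
    return sorted(prefixes)                             # like sort | uniq


# Illustrative input only.
sample = [
    "db1001.eqiad.wmnet",
    "db2001.codfw.wmnet",
    "mw2187.codfw.wmnet",
    "cloudcontrol2001-dev.codfw.wmnet",
]
print(hostname_prefixes(sample))
```

Because only the four digits are removed, a dev part placed after the number stays attached to the prefix, which is why such hosts show up as separate entries.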

Name prefix Description Status Points of contact
acmechief ACME certificate manager In use Traffic
acmechief-test ACME certificate manager staging environment In use Traffic
alert Alerting host (Icinga / Alertmanager) In use Observability
amssq esams caching server No longer used (deprecated)
amslvs esams LVS No longer used (deprecated)
analytics analytics nodes (Hadoop, Hive, Impala, and various other things) Being replaced by an-worker Data Platform SREs
an-conf Analytics Hadoop cluster zookeeper quorum In use Data Platform SREs
an-coord Analytics Hadoop cluster coordination node (Presto and Hive) In use Data Platform SREs
an-db Data Platform Postgresql database cluster In use Data Platform SREs
an-druid Druid Cluster (Analytics) In use Data Platform SREs
an-launcher Analytics job scheduler node In use Data Platform SREs
an-master Analytics Hadoop cluster namenode In use Data Platform SREs
an-mariadb Data Platform mariadb databases (analytics_meta) In use Data Platform SREs
an-presto Analytics Presto cluster workers In use Data Platform SREs
an-redacteddb analytics dedicated mariadb servers with sanitized data, as per the wikireplicas In use Data Platform SREs
an-tool Analytics tools node (YARN UI, Turnilo) In use Data Platform SREs
an-test-client Analytics Hadoop-test client (equivalent to stat servers, but for test cluster) In use Data Platform SREs
an-test-coord Analytics Hadoop-test cluster coordinator (Hive, Presto, MariaDB) In use Data Platform SREs
an-test-master Analytics Hadoop-test cluster namenodes In use Data Platform SREs
an-test-ui Analytics Hadoop-test YARN UI In use Data Platform SREs
an-test-worker Analytics Hadoop-test cluster workers In use Data Platform SREs
an-test-druid Analytics Druid-test worker In use Data Platform SREs
an-test-presto Analytics Presto-test worker In use Data Platform SREs
an-web Analytics webserver (wikistats, published datasets)
an-worker Analytics Hadoop cluster workers In use, replacing analyticsNNNN Data Platform SREs
an-airflow Airflow instances provided to client teams by Data Platform Engineering Being migrated to dse-k8s Data Platform SREs
aphlict notification server for Phabricator In use Service Operations
apt Advanced Package Tool Repository (Debian APT repo) In use Infrastructure Foundations
aqs Cassandra cluster for Analytics Query Service (+others) In use Data Persistence
archiva Archiva Artifact Repository Being decommissioned Data Platform SREs
auth Authentication server In use Infrastructure Foundations
authdns Authoritative DNS (gdnsd) In use Traffic
backup Backup hosts In use Data Persistence
backupmon Backup monitoring hosts In use Data Persistence
bast bastion host In use Infrastructure Foundations
censorship Censorship monitoring databases and scripts No longer used (deprecated)
centrallog Centralized syslog In use Observability
cephosd Ceph servers for use with Data Engineering and similar storage requirements In use Data Platform SREs
certcentral Central certificates service No longer used (deprecated)
chartmuseum Helm Chart repository ChartMuseum In use Service Operations
cloud*-dev Any cloud role + '-dev' = internal deployment (PoC, Staging, etc) In use WMCS
cloudbackup Backup storage system for WMCS In use WMCS
cloudcephmon Ceph monitor and manager daemon for WMCS In use WMCS
cloudcephosd Ceph object storage data nodes for WMCS In use WMCS
cloudceph Converged Ceph object storage and monitor nodes for WMCS (only used for testing) No longer used
cloudcontrol OpenStack deployment controller for WMCS In use WMCS
clouddb Wiki replica servers for WMCS In use WMCS, with support from DBAs and Data Platform SREs
cloudelastic Replication of ElasticSearch for WMCS In use WMCS
cloudgw Cloud gateway server for WMCS In use WMCS
cloudmetrics Monitoring server for WMCS In use WMCS
cloudnet Network gateway for tenants of WMCS (Neutron l3) In use WMCS
cloudservices Misc OpenStack components (Designate) for WMCS In use WMCS
cloudstore Storage system for WMCS In use WMCS
cloudvirt OpenStack Hypervisor (libvirtd + KVM) for WMCS In use WMCS
cloudvirtan OpenStack Hypervisor (libvirtd + KVM) for WMCS (dedicated to Analytics) No longer used
cloudvirt-wdqs OpenStack Hypervisor (libvirtd + KVM) for WMCS (dedicated to WDQS) WMCS
cloudweb WMCS management websites (wikitech, horizon, striker) In use WMCS
conf Configuration system host (etcd, zookeeper...) In use Service Operations
config-master host running the config-master site In use Infrastructure Foundations
contint Continuous Integration In use Service Operations
cp Cache proxy (Varnish) In use Traffic
cumin Cluster management (cumin/spicerack/debdeploy/etc...) In use Infrastructure Foundations
datahubsearch DataHub OpenSearch Cluster - used for DataHub In use Data Platform SREs
dataset dataset dumps storage No longer used (deprecated)
db Database host In use Data Persistence
dbmonitor Database monitoring In use Data Persistence
dborch Database orchestration (MySQL Orchestrator) In use Data Persistence
dbprov Database backup generation and data provisioning In use Data Persistence
dbproxy Database proxy In use Data Persistence
dbstore Analytics private mediawiki database replicas In use Data Platform SREs & Data Persistence
debmonitor Debian packages monitoring In use Infrastructure Foundations
deploy Deployment hosts In use Service Operations
dns DNS recursors In use Infrastructure Foundations
doc Documentation server (CI) In use Service Operations (Supportive Services) & Release Engineering
doh Wikidough Anycasted In use Traffic
druid Druid Cluster (Public) In use Data Platform SREs
dse-k8s-etcd etcd server for the kubernetes cluster of Data Science and Engineering In use Data Platform SREs
dse-k8s-ctrl control plane server for the kubernetes cluster of Data Science and Engineering In use Data Platform SREs
dse-k8s-worker worker node for the kubernetes cluster of Data Science and Engineering In use Data Platform SREs
dumpsdata dataset generation fileset serving to snapshot hosts In use Data Platform SREs
durum Check service for Wikidough In use Traffic
elastic elasticsearch servers In use Data Platform SREs
es Database host for MediaWiki external storage (wiki content, compressed) In use Data Persistence
etcd Etcd server In use Service Operations
etherpad Etherpad server In use Service Operations
eventlog EventLogging host In use Data Platform SREs
flink-zk Dedicated zookeeper cluster for Flink In use Data Platform SREs
flowspec Network controller In use (testing) Infrastructure Foundations
fr* Fundraising servers, e.g. frdb, frlog, frpm (puppetmaster) In use fr-tech SREs
ganeti Ganeti Virtualization Cluster In use Infrastructure Foundations
ganeti-test Ganeti Virtualization Cluster (test setup) In use Infrastructure Foundations
gerrit Gerrit code review (gerrit1001 in eqiad is currently used) In use (deprecated) Service Operations & Release Engineering
gitlab Gitlab servers In use (phab:T274459) Service Operations
grafana Grafana server In use Observability
graphite Graphite server In use Observability
icinga Icinga servers In use Observability
idm Identity manager (Bitu) In use Infrastructure Foundations
idp Identity provider (Apereo CAS) In use Infrastructure Foundations
install Installation server In use Infrastructure Foundations
kafka Kafka brokers No longer used
kafka-main Kafka brokers In use Infrastructure Foundations
kafka-jumbo Large general purpose Kafka cluster In use Data Platform SREs & Infrastructure Foundations
kafka-logging Logging/o11y Kafka cluster In use Observability
kafkamon Kafka monitoring (VMs) In use Data Platform SREs & Infrastructure Foundations
karapace DataHub Schema Registry server (standalone) - Used for DataHub In use Data Platform SREs
knsq knams squid No longer used (deprecated)
krb Kerberos KDC/Kadmin In use Infrastructure Foundations & Data Platform SREs
kubernetes Kubernetes cluster (k8s) In use Service Operations
kubestage Kubernetes staging cluster In use Service Operations
kubestagetcd Etcd cluster for the Kubernetes staging cluster In use Service Operations
kubetcd Etcd cluster for the Kubernetes cluster In use Service Operations
lab labs virtual node No longer used (deprecated)
labcontrol Controller node for WMCS (aka "labs") No longer used (deprecated)
labnet Networking host for WMCS No longer used (deprecated)
labnodepool Dedicated WMCS host for Nodepool (CI) No longer used (deprecated)
labpuppetmaster Puppetmasters for WMCS No longer used (deprecated)
labsdb Replication of production databases for WMCS No longer used (deprecated)
labservices Services for WMCS No longer used (deprecated)
labstore Disk storage for WMCS In use (deprecated) WMCS
labtest* Test hosts for WMCS No longer used (deprecated)
labvirt Virtualization node for WMCS No longer used (deprecated)
labweb Management websites for WMCS No longer used (deprecated)
lists Mailing lists running Mailman In use Legoktm and Ladsgroup
logging-hd Logging Cluster - OpenSearch data node (hdd class) In use Observability
logging-sd Logging Cluster - OpenSearch data node (ssd class) Planned Observability
logging-fe Logging Cluster - OpenSearch/OpenSearch-Dashboards/Logstash node Planned Observability
logstash opensearch/logstash/opensearch-dashboards node In use Observability
lvs lvs load balancer In use Traffic
maps Maps cluster In use Content Transform Team and hnowlan
maps-test maps test cluster No longer used (deprecated)
matomo Matomo analytics server (formerly named Piwik) In use Data Platform SREs
mc memcached server for mediawiki In use Service Operations
mc-gp memcached gutter pool server for mediawiki In use Service Operations
mc-wf memcached servers for wikifunctions In use Service Operations
ml-staging Machine learning staging env etcd and control plane machines In use ML team
ml-serve Machine learning serving cluster (ml-serve-ctrl* are VMs for k8s control plane) In use ML team
ml-cache Machine learning caching nodes In use ML team
mirror public mirror, e.g. Debian mirror, Ubuntu mirror In use Infrastructure Foundations
miscweb miscellaneous web server In use Service Operations
ms media storage No longer used (deprecated) Data Persistence (Media Storage)
ms-backup media storage backup generation (workers) In use Data Persistence (Media Storage)
ms-be media storage backend In use Data Persistence (Media Storage)
ms-fe media storage frontend In use Data Persistence (Media Storage)
mutual-os Mutualized (shared) opensearch cluster planning Data Platform SRE
mw MediaWiki application server (MediaWiki PHP webservers, api, jobrunners, videoscalers) In use Service Operations
mwdebug MediaWiki application server for debugging and deployment staging (Ganeti VMs) In use Service Operations
mwlog MediaWiki logging host In use Service Operations
mwmaint MediaWiki maintenance host (formerly "terbium") In use Service Operations
mx Mail relays In use Infrastructure Foundations
mx-out Outbound mail relays In use Infrastructure Foundations
mx-in Inbound mail relays In use Infrastructure Foundations
nas NAS boxes (NetApp) Unused
netflow Network visibility In use Infrastructure Foundations
netmon Network monitor (librenms, rancid, etc) In use Infrastructure Foundations
netbox Netbox front-end instances In use Infrastructure Foundations
netbox-dev Netbox test instances In use Infrastructure Foundations
netboxdb Netbox back-end database instances In use Infrastructure Foundations
nfs NFS server Unused
peek Security Team workflow and project management tooling In use Security Team
ocg offline content generator (PDF) No longer used (deprecated)
ores ORES cluster In use Machine Learning SREs
orespoolcounter ORES PoolCounter In use Machine Learning SREs
oresrdb ORES Redis systems No longer used (deprecated)
pc Parser cache database In use SRE Data Persistence (DBAs), with support from Platform and Performance
pdf PDF Collections No longer used (deprecated)
people peopleweb (people.wikimedia.org) In use Service Operations & Infrastructure Foundations
parse parsoid Soon to be no longer used (deprecated) Service Operations
parsoidtest parsoid Soon to be used Service Operations
phab Phabricator host (currently iridium is eqiad phab host) In use Service Operations
ping Ping offload server In use Infrastructure Foundations
planet Planet server In use (mistake) Service Operations
pki PKI Server (CFSSL) In use Infrastructure Foundations
pki-root PKI Root CA Server (CFSSL) In use Infrastructure Foundations
poolcounter PoolCounter cluster In use Service Operations
prometheus Prometheus cluster In use Observability
proton Proton cluster No longer used (deprecated)
puppetboard PuppetDB Web UI In use Service Operations
puppetdb PuppetDB cluster In use Service Operations
puppetmaster Puppet masters In use Infrastructure Foundations
puppetserver Puppet Servers In use Infrastructure Foundations
pybal-test PyBal testing and development In use Traffic
rbf Redis Bloom Filter server Unused
rcs RCStream server (recent changes stream, obsolete) No longer used (deprecated)
rdb Redis server In use Service Operations
registry Docker registries In use Service Operations
releases Software Releases In use Service Operations
relforge Discovery's Relevance Forge (see discovery/relevanceForge.git, T131184) In use Search Platform SREs
restbase Cassandra cluster for RESTBase service (+others) In use Data Persistence
rpki RPKI validation In use Infrastructure Foundations
sca Service Cluster A - Includes various services No longer used (deprecated)
scb Service Cluster B - Includes various services. It's effectively the next generation of the sca cluster above No longer used (deprecated)
schema Event Schemas HTTP server In use Data Platform SREs & Service Operations
search-loader Analytics to Elastic Search model data loader In use Search Platform SREs
sessionstore Cassandra cluster for sessionstore In use Data Persistence
snapshot Data dump processing node In use Data Platform SREs
sq squid server No longer used (deprecated)
srv apache server No longer used (deprecated)
stat statistics computation hosts (see Analytics/Data access) In use Data Platform SREs
storage storage host No longer used (deprecated)
stewards special hosts for wiki stewards (see T344164) In use SRE collaboration services
testreduce parsoid visual diff testing In use Service Operations
thanos-be Prometheus long term storage backend (swift storage) In use Observability / Data Persistence
thanos-fe Prometheus long term storage frontend (swift proxy software) In use Observability / Data Persistence
thumbor Thumbor In use Service Operations (& Performance)
titan Thanos frontends In use Observability
tmh MediaWiki videoscaler (TimedMediaHandler). See T105009 and T115950. No longer used (deprecated)
torrelay Tor relay No longer used (deprecated)
urldownloader url-downloader In use (added in T224551) Service Operations
virt labs virtualization nodes No longer used (deprecated)
wcqs wikicommons query service In use Search Platform SREs
wdqs wikidata query service In use Search Platform SREs
webperf webperf metrics (performance team). See T179036. In use Performance & Service Operations
wtp wiki-text processor node (parsoid) In use Service Operations
xhgui A graphical interface for PHP debug profiles. See Performance/Runbook/XHGui service. In use Performance & Service Operations
dragonfly-supernode Supernode for Dragonfly P2P network (distributing docker images) (T286054) In use Service Operations

Miscellaneous servers

Historically, we used per-datacenter naming schemes for any one-off or single host. This included any software that wasn't load balanced across multiple machines, or general task machines that could cluster (to an extent) but required opsen work to do so.

Instead of being named for their purpose, these hosts were named according to a naming convention for their datacenter:

  • Hosts in eqiad were named for chemical elements, in order of increasing atomic number.
  • Hosts in codfw were named for stars. Stars in the Orion constellation were reserved for fundraising (Alnilam, Alnitak, Bellatrix, Betelgeuse, Heka, Meissa, Mintaka, Nair Al Saif, Rigel, Saiph, Tabit, Thabit).
  • Hosts in esams or knams were named for notable Dutch people.

These naming schemes are deprecated in favour of the specialized cluster names above. Even if you're certain that the foobar service will only ever use a single host, you should name that host "foobar1001" (or 2001, 3001, etc. as appropriate to the datacenter).

One-off names were easy to come up with—especially for machines that did more than one kind of thing, where it's hard to identify a single descriptive name—but they were also opaque. Engineers had to know that the eqiad MediaWiki maintenance host was "terbium" and the codfw package-build host was "deneb." Naming these machines "mwmaint1001" and "build2001" is easier for sleepy oncallers to remember in an emergency, and friendlier to new hires who have to learn all the names at once.

Some older hosts in production still use these naming schemes, but new hosts should not use them.

  1. P&T Weekly Status Updates: 2023-12-04, Wikimedia Foundation, Google Docs (restricted)