SRE/Infrastructure naming conventions
This page documents the naming conventions of servers, routers, and data center sites.
Our servers currently fall broadly into two categories:
- Clustered servers: These use numeral sequences with a descriptive prefix (see #Networking and #Servers). For example: db1001.
- Miscellaneous servers: These use unique hostnames (see #Miscellaneous servers). For example: helium. This naming convention is deprecated and not used for new hosts, but some older miscellaneous-named hosts still exist.
Name reuse
Historically, we have not reused the names of past servers for new servers. For example, once db1001 is decommissioned, no other server will be named db1001. Ganeti VMs sometimes reuse hostnames, but bare-metal hosts typically do not.
The notable exception is networking gear, which is deterministically named by rack. For example, the access switch in Eqiad rack A8 is named asw-a8-eqiad; if that switch is replaced, the new one takes the same name.
All hardware in the datacenter space is tracked in Netbox, which can be used to check whether a hostname already exists, for both hardware and Ganeti instances.
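For a quick existence check before picking a name, the Netbox REST API can be queried for both devices and virtual machines. The sketch below is illustrative only; the Netbox URL and API token are placeholders, not production values.

```python
# Minimal sketch: check whether a hostname already exists in Netbox,
# covering both bare-metal devices and Ganeti VMs.
# NETBOX_URL and NETBOX_TOKEN are placeholders, not real values.
import requests

NETBOX_URL = "https://netbox.example.org"   # placeholder
NETBOX_TOKEN = "xxxxxxxx"                   # placeholder

def hostname_exists(name: str) -> bool:
    headers = {"Authorization": f"Token {NETBOX_TOKEN}"}
    for endpoint in ("dcim/devices", "virtualization/virtual-machines"):
        resp = requests.get(
            f"{NETBOX_URL}/api/{endpoint}/",
            params={"name": name},
            headers=headers,
            timeout=10,
        )
        resp.raise_for_status()
        if resp.json()["count"] > 0:
            return True
    return False

print(hostname_exists("db1001"))
```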
Data centers
Data centers are named with the vendor's initials (at the time of lease signing) followed by the IATA code of a nearby major airport.
For example, our Dallas site is named codfw: the vendor is CyrusOne, and DFW is the nearby major airport. (Technically, Love Field is closer but less well-known.) A small sketch of this decomposition follows the table below.
DC | Vendor | Airport Code |
---|---|---|
codfw | CyrusOne | DFW |
drmrs | Digital Realty | MRS |
eqdfw | Equinix | DFW |
eqiad | Equinix | IAD |
eqord | Equinix | ORD |
eqsin | Equinix | SIN |
esams | EvoSwitch | AMS |
knams | Kennisnet | AMS |
ulsfo | United Layer | SFO |
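As a minimal illustration of the convention (not a tool we actually use), a site code is simply the vendor initials concatenated with the lowercased airport code:

```python
# Illustrative only: site code = vendor initials + IATA code, lowercased.
def site_code(vendor_initials: str, iata: str) -> str:
    return (vendor_initials + iata).lower()

assert site_code("eq", "IAD") == "eqiad"   # Equinix + Washington Dulles
assert site_code("co", "DFW") == "codfw"   # CyrusOne + Dallas/Fort Worth
assert site_code("dr", "MRS") == "drmrs"   # Digital Realty + Marseille
```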
Networking
Naming for network equipment is based on role and location.
This also applies to power distribution units, serial console servers, and other networking infrastructure. A regex sketch of the resulting name shapes follows the table.
Name prefix | Role | Example |
---|---|---|
asw | access switch | asw-a1-eqiad |
cr | core router | cr1-eqiad |
mr | management router | mr1-eqiad |
lsw | leaf switch | lsw1-e1-eqiad |
ssw | spine switch | ssw1-e1-eqiad |
msw | management switch | msw1-eqiad & msw-b2-eqiad |
pfw | payments firewall | pfw1-eqiad
ps1 / ps2 | power strips/distribution units | ps1-b3-eqiad |
scs | serial console server | scs-a8-eqiad |
fasw | Fundraising access switch | fasw-c-codfw |
cloudsw | Cloud L3 switches | cloudsw1-c8-eqiad |
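The examples in the table follow a few recognizable shapes (role-rack-site, roleN-site, roleN-rack-site). The regex below is a non-authoritative sketch of those shapes, useful for eyeballing a name, not an official validator.

```python
# Illustrative regex for the network-gear name shapes shown above:
#   <role>-<rack>-<site>      e.g. asw-a8-eqiad, scs-a8-eqiad
#   <role><n>-<site>          e.g. cr1-eqiad, msw1-eqiad
#   <role><n>-<rack>-<site>   e.g. lsw1-e1-eqiad, cloudsw1-c8-eqiad
import re

NETDEV_RE = re.compile(
    r"^(?P<role>[a-z]+)(?P<unit>\d+)?"     # role prefix, optional unit number
    r"(?:-(?P<rack>[a-z]\d*))?"            # optional rack (row letter + number)
    r"-(?P<site>[a-z]+)$"                  # site code
)

for name in ("asw-a8-eqiad", "cr1-eqiad", "lsw1-e1-eqiad", "ps1-b3-eqiad"):
    m = NETDEV_RE.match(name)
    print(name, m.groupdict() if m else None)
```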
OpenStack deployments
Deployment names follow the pattern [datacenter site][numeric identifier][optional "dev" suffix for internal, non-customer-facing deployments]; region names append "-r" to the deployment name, and availability zone names append a letter to the region name. A small sketch follows the table below.
- The current Eqiad/Codfw deployments will not fully meet this standard until they are rebuilt: [eqiad0 (deployment), eqiad (region), nova (AZ)]
Deployment | Region | Availability Zone |
---|---|---|
eqiad0 | eqiad0-r | eqiad0-rb |
eqiad1 | eqiad1-r | eqiad1-rb |
codfw0dev | codfw0dev-r | codfw0dev-rb |
codfw1dev | codfw1dev-r | codfw1dev-rb |
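A minimal sketch of the pattern, assuming the AZ letter "b" seen in the table (other letters would follow the same shape); the helper is illustrative, not existing tooling:

```python
# Sketch of the OpenStack naming pattern above: a deployment is
# <site><n>[dev], its region appends "-r", and an AZ appends a letter.
def openstack_names(site: str, n: int, dev: bool = False, az_letter: str = "b"):
    deployment = f"{site}{n}{'dev' if dev else ''}"
    region = f"{deployment}-r"
    az = f"{region}{az_letter}"
    return deployment, region, az

print(openstack_names("eqiad", 1))            # ('eqiad1', 'eqiad1-r', 'eqiad1-rb')
print(openstack_names("codfw", 1, dev=True))  # ('codfw1dev', 'codfw1dev-r', 'codfw1dev-rb')
```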
Disks
- Arrays must use the "Storage array" device role in Netbox.
- Naming follows two conventions (a short sketch follows this list):
- Array is attached to a single host:
- hostname_of_host_system-arrayN
- Example: ms2001-array1, ms2001-array2
- all arrays get a number, even if there is only a single array.
- Example: dataset1001-array1
- Array is attached to multiple hosts
- Labs uses this for labstore: each shelf connects to two different hosts, so the single-host naming scheme above does not work.
- servicehostgroup-arrayN-site
- Example: labstore-array1-codfw, labstore-array2-codfw
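A small sketch of the two conventions; the helper names are illustrative, not existing tooling:

```python
# Sketch of the two array-naming conventions listed above.
def single_host_array(hostname: str, n: int) -> str:
    # Array attached to a single host: <hostname>-arrayN (always numbered).
    return f"{hostname}-array{n}"

def multi_host_array(service_group: str, n: int, site: str) -> str:
    # Array shared by multiple hosts: <servicehostgroup>-arrayN-<site>.
    return f"{service_group}-array{n}-{site}"

print(single_host_array("ms2001", 1))            # ms2001-array1
print(multi_host_array("labstore", 2, "codfw"))  # labstore-array2-codfw
```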
Kubernetes
Any cluster that is not the main wikikube cluster should follow these conventions (a naming sketch follows the list):
- Cluster name: <identifier>-k8s (ex: dse-k8s, aux-k8s)
- Control plane service name: <identifier>-k8s-ctrl
- Ingress service name: <identifier>-k8s-ingress [-ro|-rw] for active/active or active/passive
- Hostnames for control plane : <identifier>-k8s-ctrlXXXX.$site.wmnet
- Hostnames for kubelets : <identifier>-k8s-workerXXXX.$site.wmnet
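A minimal sketch expanding an identifier into these names; the function name is illustrative, and the four-digit index follows the per-datacenter numbering described under #Servers:

```python
# Sketch expanding a cluster identifier into the names listed above.
# An optional -ro/-rw suffix may be appended to the ingress service name.
def k8s_names(identifier: str, site: str, index: int) -> dict:
    base = f"{identifier}-k8s"
    return {
        "cluster": base,
        "control_plane_service": f"{base}-ctrl",
        "ingress_service": f"{base}-ingress",
        "control_plane_host": f"{base}-ctrl{index}.{site}.wmnet",
        "kubelet_host": f"{base}-worker{index}.{site}.wmnet",
    }

print(k8s_names("dse", "eqiad", 1001))
```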
Servers
Any system that runs in a dedicated services cluster with other machines is named after its role or service. As a rule, we name after the service, not just the software package. Servers within a group are numbered based on the datacenter they are located in; a small sketch of this numbering follows the table and note below.
Data center | Numeral range | Example |
---|---|---|
pmtpa / sdtpa (decommissioned) | 1-999 | cp7 |
eqiad | 1000-1999 | db1001 |
codfw | 2000-2999 | mw2187 |
esams / knams | 3000-3999 | cp3031 |
ulsfo | 4000-4999 | bast4001 |
eqsin | 5000-5999 | dns5001 |
drmrs | 6000-6999 | cp6011 |
When adding a new datacenter, make sure to update the /typos file in operations/puppet.git, which checks hostnames.
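A small sketch of the per-datacenter numbering above; the ranges come from the table, the helper is illustrative only, and knams shares the esams range:

```python
# Sketch of the per-datacenter numbering: the numeric part of a
# clustered hostname encodes which site it is in.
DC_RANGES = {
    "eqiad": range(1000, 2000),
    "codfw": range(2000, 3000),
    "esams": range(3000, 4000),
    "ulsfo": range(4000, 5000),
    "eqsin": range(5000, 6000),
    "drmrs": range(6000, 7000),
}

def hostname_for(prefix: str, site: str, nth: int) -> str:
    # nth is 1-based within the site, e.g. the first eqiad db host is db1001.
    number = DC_RANGES[site].start + nth
    return f"{prefix}{number}"

print(hostname_for("db", "eqiad", 1))    # db1001
print(hostname_for("mw", "codfw", 187))  # mw2187
```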
Name prefix | Description | Status | Points of contact |
---|---|---|---|
acmechief | ACME certificate manager | In use | Traffic |
acmechief-test | ACME certificate manager staging environment | In use | Traffic |
alert | Alerting host (Icinga / Alertmanager) | In use | Observability |
amssq | esams caching server | No longer used (deprecated) | |
amslvs | esams LVS | No longer used (deprecated) | |
analytics | analytics nodes (Hadoop, Hive, Impala, and various other things) | Being replaced by an-worker | Data Engineering SREs |
analytics-master | analytics master nodes | Being replaced by an-master | Data Engineering SREs |
analytics-tool | virtual machines in production (Ganeti) running analytics tools/websites | Being replaced by an-tool | Data Engineering SREs |
an-coord | analytics coordination node | In use | Data Engineering SREs |
an-db | analytics postgresql database cluster | In use | Data Engineering SREs |
an-master | analytics master node | In use, replacing analytics-master | Data Engineering SREs |
an-mariadb | analytics-meta mariadb databases | In use | Data Engineering SREs |
an-tool | analytics tools node | In use | Data Engineering SREs |
an-test-(coord/master/worker) | analytics hadoop test cluster nodes | In use | Data Engineering SREs |
an-worker | analytics worker node | In use, replacing analyticsNNNN | Data Engineering SREs |
an-scheduler | analytics job scheduler node | In use | Data Engineering SREs |
an-airflow | analytics job scheduler node dedicated to the Discovery team | In use | Data Engineering SREs |
aphlict | notification server for Phabricator | In use | Service Operations |
apt | Advanced Package Tool Repository (Debian APT repo) | In use | Infrastructure Foundations |
aqs | Analytics Query Service | In use | Data Engineering SREs |
archiva | Archiva Artifact Repository | In use | Data Engineering SREs |
auth | Authentication server | In use | Infrastructure Foundations |
authdns | Authoritative DNS (gdnsd) | In use | Traffic
backup | Backup hosts | In use | Data Persistence |
backupmon | Backup monitoring hosts | In use | Data Persistence |
bast | bastion host | In use | Infrastructure Foundations |
censorship | Censorship monitoring databases and scripts | No longer used (deprecated) | |
centrallog | Centralized syslog | In use | Observability |
cephosd | Ceph servers for use with Data Engineering and similar storage requirements | In use | Data Engineering SREs |
certcentral | Central certificates service | No longer used (deprecated) | |
chartmuseum | Helm Chart repository ChartMuseum | In use | Service Operations |
cloud*-dev | Any cloud role + '-dev' = internal deployment (PoC, Staging, etc) | In use | WMCS |
cloudbackup | Backup storage system for WMCS | In use | WMCS |
cloudcephmon | Ceph monitor and manager daemon for WMCS | In use | WMCS |
cloudcephosd | Ceph object storage data nodes for WMCS | In use | WMCS |
cloudceph | Converged Ceph object storage and monitor nodes for WMCS (only used for testing) | No longer used | |
cloudcontrol | OpenStack deployment controller for WMCS | In use | WMCS |
clouddb | Wiki replica servers for WMCS | In use | WMCS, with support from DBAs |
cloudelastic | Replication of ElasticSearch for WMCS | In use | WMCS |
cloudgw | Cloud gateway server for WMCS | In use | WMCS |
cloudmetrics | Monitoring server for WMCS | In use | WMCS |
cloudnet | Network gateway for tenants of WMCS (Neutron l3) | In use | WMCS |
cloudservices | Misc OpenStack components (Designate) for WMCS | In use | WMCS |
cloudstore | Storage system for WMCS | In use | WMCS |
cloudvirt | OpenStack Hypervisor (libvirtd + KVM) for WMCS | In use | WMCS |
cloudvirtan | OpenStack Hypervisor (libvirtd + KVM) for WMCS (dedicated to Analytics) | No longer used | |
cloudvirt-wdqs | OpenStack Hypervisor (libvirtd + KVM) for WMCS (dedicated to WDQS) | | WMCS
cloudweb | WMCS management websites (wikitech, horizon, striker) | In use | WMCS |
conf | Configuration system host (etcd, zookeeper...) | In use | Service Operations |
config-master | host running the config-master site | In use | Infrastructure Foundations |
contint | Continuous Integration | In use | Service Operations |
cp | Cache proxy (Varnish) | In use | Traffic |
cumin | Cluster management (cumin/spicerack/debdeploy/etc...) | In use | Infrastructure Foundations |
datahubsearch | DataHub OpenSearch Cluster - used for Data Catalog MVP | In use | Data Engineering SREs |
dataset | dataset dumps storage | No longer used (deprecated) | |
db | Database host | In use | Data Persistence |
dbmonitor | Database monitoring | In use | Data Persistence |
dborch | Database orchestration (MySQL Orchestrator) | In use | Data Persistence |
dbprov | Database backup generation and data provisioning | In use | Data Persistence |
dbproxy | Database proxy | In use | Data Persistence |
dbstore | Database analytics | In use | Data Engineering SREs & Data Persistence |
debmonitor | Debian packages monitoring | In use | Infrastructure Foundations |
deploy | Deployment hosts | In use | Service Operations |
dns | DNS recursors | In use | Infrastructure Foundations |
doc | Documentation server (CI) | In use | Service Operations (Supportive Services) & Release Engineering |
doh | Wikidough Anycasted | In use | Traffic |
an-druid | Druid Cluster (Analytics). Due to naming legacy, druid100[1-3] are also in this cluster. | In use | Data Engineering SREs |
druid | Druid Cluster (Public) | In use | Data Engineering SREs |
dse-k8s-etcd | etcd server for the kubernetes cluster of Data Science and Engineering | In use | Data Engineering SREs |
dse-k8s-ctrl | control plane server for the kubernetes cluster of Data Science and Engineering | In use | Data Engineering SREs |
dse-k8s-worker | worker node for the kubernetes cluster of Data Science and Engineering | In use | Data Engineering SREs |
dumpsdata | dataset generation fileset serving to snapshot hosts | In use | Platform Engineering |
durum | Check service for Wikidough | In use | Traffic |
elastic | elasticsearch servers | In use | Search Platform SREs |
es | Database host for MediaWiki external storage (wiki content, compressed) | In use | Data Persistence |
etcd | Etcd server | In use | Service Operations |
etherpad | Etherpad server | In use | Service Operations |
eventlog | EventLogging host | In use | Data Engineering SREs |
flink-zk | Dedicated zookeeper cluster for Flink | In use (testing) | Data Platform SREs
flowspec | Network controller | In use (testing) | Infrastructure Foundations |
fr* | Fundraising servers, e.g. frdb, frlog, frpm (puppetmaster) | In use | fr-tech SREs |
ganeti | Ganeti Virtualization Cluster | In use | Infrastructure Foundations |
ganeti-test | Ganeti Virtualization Cluster (test setup) | In use | Infrastructure Foundations
gerrit | Gerrit code review (gerrit1001 in eqiad is currently used) | In use (deprecated) | Service Operations & Release Engineering |
gitlab | Gitlab servers | In use (phab:T274459) | Service Operations |
grafana | Grafana server | In use | Observability |
graphite | Graphite server | In use | Observability |
icinga | Icinga servers | In use | Observability |
idp | Identity provider (Apereo CAS) | In use | Infrastructure Foundations |
install | Installation server | In use | Infrastructure Foundations |
kafka | Kafka brokers | No longer used | Data Engineering SREs & Infrastructure Foundations |
kafka-main | Kafka brokers | In use | Data Engineering SREs & Infrastructure Foundations |
kafka-jumbo | Large general purpose Kafka cluster | In use | Data Engineering SREs & Infrastructure Foundations |
kafka-logging | Logging/o11y Kafka cluster | In use | Observability |
kafkamon | Kafka monitoring (VMs) | In use | Data Engineering SREs & Infrastructure Foundations |
karapace | DataHub Schema Registry server (standalone) - Used for the Data Catalog MVP | In use | Data Engineering SREs |
knsq | knams squid | No longer used (deprecated) | |
krb | Kerberos KDC/Kadmin | In use | Infrastructure Foundations & Data Engineering SREs |
kubernetes | Kubernetes cluster (k8s) | In use | Service Operations |
kubestage | Kubernetes staging cluster | In use | Service Operations |
kubestagetcd | Etcd cluster for the Kubernetes staging cluster | In use | Service Operations |
kubetcd | Etcd cluster for the Kubernetes cluster | In use | Service Operations |
lab | labs virtual node | No longer used (deprecated) | |
labcontrol | Controller node for WMCS (aka "labs") | No longer used (deprecated) | |
labnet | Networking host for WMCS | No longer used (deprecated) | |
labnodepool | Dedicated WMCS host for Nodepool (CI) | No longer used (deprecated) | |
labpuppetmaster | Puppetmasters for WMCS | No longer used (deprecated) | |
labsdb | Replication of production databases for WMCS | No longer used (deprecated) | |
labservices | Services for WMCS | No longer used (deprecated) | |
labstore | Disk storage for WMCS | In use (deprecated) | WMCS |
labtest* | Test hosts for WMCS | No longer used (deprecated) | |
labvirt | Virtualization node for WMCS | No longer used (deprecated) | |
labweb | Management websites for WMCS | No longer used (deprecated) | |
lists | Mailing lists running Mailman | In use | Legoktm and Ladsgroup |
logging-hd | Logging Cluster - OpenSearch data node (hdd class) | Planned | Observability |
logging-sd | Logging Cluster - OpenSearch data node (ssd class) | Planned | Observability |
logging-fe | Logging Cluster - OpenSearch/OpenSearch-Dashboards/Logstash node | Planned | Observability |
logstash | opensearch/logstash/opensearch-dashboards node | In use | Observability |
lvs | lvs load balancer | In use | Traffic |
maps | Maps cluster | In use | Content Transform Team and hnowlan |
maps-test | maps test cluster | No longer used (deprecated) | |
mc | memcached server for mediawiki | In use | Service Operations |
mc-gp | memcached gutter pool server for mediawiki | In use | Service Operations |
mc-wf | memcached servers for wikifunctions | In use | Service Operations |
ml-staging | Machine learning staging env etcd and control plane machines | In use | ML team
ml-serve | Machine learning serving cluster (ml-serve-ctrl* are VMs for k8s control plane) | In use | ML team |
ml-cache | Machine learning caching nodes | In use | ML team
mirror | public mirror, e.g. Debian mirror, Ubuntu mirror | In use | Infrastructure Foundations |
miscweb | miscellaneous web server | In use | Service Operations |
ms | media storage | No longer used (deprecated) | Data Persistence (Media Storage) |
ms-backup | media storage backup generation (workers) | In use | Data Persistence (Media Storage) |
ms-be | media storage backend | In use | Data Persistence (Media Storage) |
ms-fe | media storage frontend | In use | Data Persistence (Media Storage) |
mw | MediaWiki application server (MediaWiki PHP webservers, api, jobrunners, videoscalers) | In use | Service Operations |
mwdebug | MediaWiki application server for debugging and deployment staging (Ganeti VMs) | In use | Service Operations |
mwlog | MediaWiki logging host | In use | Service Operations |
mwmaint | MediaWiki maintenance host (formerly "terbium") | In use | Service Operations |
mx | Mail relays | In use | Infrastructure Foundations |
nas | NAS boxes (NetApp) | Unused | |
netflow | Network visibility | In use | Infrastructure Foundations |
netmon | Network monitor (librenms, rancid, etc) | In use | Infrastructure Foundations |
netbox | Netbox front-end instances | In use | Infrastructure Foundations |
netbox-dev | Netbox test instances | In use | Infrastructure Foundations |
netboxdb | Netbox back-end database instances | In use | Infrastructure Foundations |
notebook | Jupyterhub experimental server | Unused | |
nfs | NFS server | Unused | |
peek | Security Team workflow and project management tooling | In use | Security Team |
ocg | offline content generator (PDF) | No longer used (deprecated) | |
ores | ORES cluster | In use | Machine Learning SREs |
orespoolcounter | ORES PoolCounter | In use | Machine Learning SREs |
oresrdb | ORES Redis systems | No longer used (deprecated) | |
pc | Parser cache database | In use | SRE Data Persistence (DBAs), with support from Platform and Performance |
pdf | PDF Collections | No longer used (deprecated) | 
people | peopleweb (people.wikimedia.org) | In use | Service Operations & Infrastructure Foundations |
parse | parsoid | Soon in use | Service Operations |
phab | Phabricator host (currently iridium is eqiad phab host) | In use | Service Operations |
ping | Ping offload server | In use | Infrastructure Foundations |
planet | Planet server | In use (mistake) | Service Operations |
pki | PKI Server (CFSSL) | In use | Infrastructure Foundations |
pki-root | PKI Root CA Server (CFSSL) | In use | Infrastructure Foundations |
poolcounter | PoolCounter cluster | In use | Service Operations |
prometheus | Prometheus cluster | In use | Observability |
proton | Proton cluster | No longer used (deprecated) | |
puppetboard | PuppetDB Web UI | In use | Service Operations |
puppetdb | PuppetDB cluster | In use | Service Operations |
puppetmaster | Puppet masters | In use | Infrastructure Foundations |
puppetserver | Puppet Servers | In use | Infrastructure Foundations |
pybal-test | PyBal testing and development | In use | Traffic |
rbf | Redis Bloom Filter server | Unused | |
rcs | RCStream server (recent changes stream) | No longer used (deprecated) | 
rdb | Redis server | In use | Service Operations |
registry | Docker registries | In use | Service Operations |
releases | Software Releases | In use | Service Operations |
relforge | Discovery's Relevance Forge (see discovery/relevanceForge.git, T131184) | In use | Search Platform SREs |
restbase | RESTBase server | In use | Service Operations |
rpki | RPKI#Validation | In use | Infrastructure Foundations |
sca | Service Cluster A - Includes various services | No longer used (deprecated) | |
scb | Service Cluster B - Includes various services. It's effectively the next generation of the sca cluster above | No longer used (deprecated) | |
schema | Event Schemas HTTP server | In use | Data Engineering SREs & Service Operations |
search-loader | Analytics to Elastic Search model data loader | In use | Search Platform SREs |
sessionstore | Cassandra cluster for sessionstore | In use | Data Persistence |
snapshot | Data dump processing node | In use | Platform Engineering |
sq | squid server | No longer used (deprecated) | |
srv | apache server | No longer used (deprecated) | |
stat | statistics computation hosts (see Analytics/Data access) | In use | Data Engineering SREs |
storage | storage host | No longer used (deprecated) | |
stewards | special hosts for wiki stewards (see T344164) | In use | SRE collaboration services |
testreduce | parsoid visual diff testing | In use | Service Operations |
thanos-be | Prometheus long term storage backend | In use | Observability |
thanos-fe | Prometheus long term storage frontend | In use | Observability |
thumbor | Thumbor | In use | Service Operations (& Performance) |
tmh | MediaWiki videoscaler (TimedMediaHandler). See T105009 and T115950. | No longer used (deprecated) | |
torrelay | Tor relay | No longer used (deprecated) | |
urldownloader | url-downloader | In use (added in T224551) | Service Operations |
virt | labs virtualization nodes | No longer used (deprecated) | |
wcqs | wikicommons query service | In use | Search Platform SREs |
wdqs | wikidata query service | In use | Search Platform SREs |
webperf | webperf metrics (performance team). See T179036. | In use | Performance & Service Operations |
wtp | wiki-text processor node (parsoid) | In use | Service Operations |
xhgui | A graphical interface for PHP debug profiles. See Performance/Runbook/XHGui service. | In use | Performance & Service Operations |
dragonfly-supernode | Supernode for Dragonfly P2P network (distributing docker images) (T286054) | In use | Service Operations |
Miscellaneous servers
Historically, we used per-datacenter naming schemes for any one-off or single host. This included any software that wasn't load balanced across multiple machines, or general task machines that could cluster (to an extent) but required opsen work to do so.
Instead of being named for their purpose, these hosts were named according to a naming convention for their datacenter:
- Hosts in eqiad were named for chemical elements, in order of increasing atomic number.
- Hosts in codfw were named for stars. Stars in the Orion constellation were reserved for fundraising (Alnilam, Alnitak, Bellatrix, Betelgeuse, Heka, Meissa, Mintaka, Nair Al Saif, Rigel, Saiph, Tabit, Thabit).
- Hosts in esams or knams were named for notable Dutch people.
These naming schemes are deprecated in favour of specialized cluster names above. Even if you're certain that the foobar service will only ever use a single host, you should name that host "foobar1001" (or 2001, 3001, etc. as appropriate to the datacenter).
One-off names were easy to come up with—especially for machines that did more than one kind of thing, where it's hard to identify a single descriptive name—but they were also opaque. Engineers had to know that the eqiad MediaWiki maintenance host was "terbium" and the codfw package-build host was "deneb." Naming these machines "mwmaint1001" and "build2001" is easier for sleepy oncallers to remember in an emergency, and friendlier to new hires who have to learn all the names at once.
Some older hosts in production still use these naming schemes, but new hosts should not use them.