SRE/Infrastructure naming conventions
This page documents the naming conventions of servers, routers, and data center sites.
Our servers currently fall into two broad categories:
- Clustered servers: These use numeral sequences with a descriptive prefix (see #Networking and #Servers). For example: db1001.
- Miscellaneous servers: These use unique hostnames (see #Miscellaneous servers). For example: helium. This naming convention is deprecated and not used for new hosts, but some older miscellaneous-named hosts still exist.
Name reuse
Historically, we have not reused the names of past servers for new servers. For example, once db1001 is decommissioned, no other server will be named db1001. Ganeti VMs sometimes reuse hostnames, but bare-metal hosts typically do not.
The notable exception is networking gear, which is named deterministically by rack. For example, the access switch in Eqiad rack A8 is named asw-a8-eqiad; if it is replaced, the new switch takes the same name.
All hardware in the datacenter space is tracked in Netbox, which can be used to check for existing hostnames for both hardware and Ganeti instances.
Data centers
Data centers were traditionally named with the vendor's initials (at the time of lease signing) followed by the IATA code of a nearby major airport. For example, for the Eqiad data center the vendor is Equinix and IAD is the nearby major airport. This convention was used from 2003 up to 2023. Because vendors go through acquisitions, the original initials often stop applying after some time; starting with Magru in 2023, only the airport code is kept, combined with a freely chosen prefix (see the sketch after the table below).[1]
DC | Vendor (originally) | Airport Code |
---|---|---|
codfw | CyrusOne | DFW |
drmrs | Digital Realty | MRS |
eqdfw | Equinix | DFW |
eqiad | Equinix | IAD |
eqord | Equinix | ORD |
eqsin | Equinix | SIN |
esams | EvoSwitch | AMS |
knams | Kennisnet | AMS |
magru | Ascenty | GRU |
ulsfo | United Layer | SFO |
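For the pre-2023 names, the decomposition is mechanical. A minimal sketch, assuming the convention above (the helper split_dc_name is hypothetical, not an existing tool):

```python
# Minimal sketch: split a data center name into its prefix and IATA
# airport code, per the pre-2023 convention of vendor initials followed
# by a three-letter airport code. Hypothetical helper, not a real tool.
def split_dc_name(name: str) -> tuple[str, str]:
    prefix, airport = name[:-3], name[-3:]
    return prefix, airport.upper()

assert split_dc_name("eqiad") == ("eq", "IAD")   # Equinix + IAD
assert split_dc_name("codfw") == ("co", "DFW")   # CyrusOne + DFW
assert split_dc_name("magru") == ("ma", "GRU")   # post-2023: free prefix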
Networking
Naming for network equipment is based on role and location. This also applies to power distribution units, serial console servers, and other networking infrastructure (a rough validation sketch follows the table).
Name prefix | Role | Example |
---|---|---|
asw | access switch | asw-a1-eqiad |
cr | core router | cr1-eqiad |
mr | management router | mr1-eqiad |
lsw | leaf switch | lsw1-e1-eqiad |
ssw | spine switch | ssw1-e1-eqiad |
msw | management switch | msw1-eqiad & msw-b2-eqiad |
pfw | payments firewall | pfw1-eqiad |
ps1 / ps2 | power strips/distribution units | ps1-b3-eqiad |
scs | serial console server | scs-a8-eqiad |
fasw | Fundraising access switch | fasw-c-codfw |
cloudsw | Cloud L3 switches | cloudsw1-c8-eqiad |
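As a rough illustration of the role-location scheme, the sketch below validates device names against the prefixes and sites listed on this page. The regex is an assumption drawn from the examples above, not an authoritative validator:

```python
import re

# Rough validation sketch for network device names of the forms
# prefixN-site (cr1-eqiad), prefix-rack-site (asw-a8-eqiad), and
# prefixN-rack-site (lsw1-e1-eqiad). Prefix and site lists mirror
# the tables on this page; treat the pattern as illustrative only.
NETWORK_NAME = re.compile(
    r"^(?P<prefix>asw|cr|mr|lsw|ssw|msw|pfw|ps[12]|scs|fasw|cloudsw)"
    r"(?P<unit>\d+)?"                   # optional device number
    r"(?:-(?P<rack>[a-z]\d*))?"         # optional row/rack (a8, e1, c)
    r"-(?P<site>eqiad|codfw|esams|knams|ulsfo|eqsin|drmrs|magru|eqdfw|eqord)$"
)

for name in ("asw-a8-eqiad", "cr1-eqiad", "lsw1-e1-eqiad",
             "ps1-b3-eqiad", "fasw-c-codfw", "cloudsw1-c8-eqiad"):
    assert NETWORK_NAME.match(name), name
```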
OpenStack deployments
[datacenter site][numeric identifier][optional "dev" suffix, indicating a non-external, non-customer-facing deployment] - [r (if region)][letter for the availability zone] (see the sketch after the table below)
- Current Eqiad/Codfw deployments will not fully meet these standards until rebuilt: [eqiad0 (deployment), eqiad (region), nova (AZ)]
Deployment | Region | Availability Zone |
---|---|---|
eqiad0 | eqiad0-r | eqiad0-rb |
eqiad1 | eqiad1-r | eqiad1-rb |
codfw0dev | codfw0dev-r | codfw0dev-rb |
codfw1dev | codfw1dev-r | codfw1dev-rb |
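A minimal sketch of how deployment, region, and AZ names compose under this pattern; the helper openstack_names is hypothetical:

```python
# Hypothetical sketch of the OpenStack naming pattern above:
# deployment = site + number (+ optional "dev"), region = deployment
# + "-r", AZ = region + a letter ("b" in the table above).
def openstack_names(site: str, number: int, dev: bool = False,
                    az_letter: str = "b") -> dict[str, str]:
    deployment = f"{site}{number}{'dev' if dev else ''}"
    region = f"{deployment}-r"
    return {"deployment": deployment, "region": region,
            "az": f"{region}{az_letter}"}

assert openstack_names("eqiad", 1) == {
    "deployment": "eqiad1", "region": "eqiad1-r", "az": "eqiad1-rb"}
assert openstack_names("codfw", 1, dev=True)["az"] == "codfw1dev-rb"
```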
Disks
- Arrays must use the "Storage array" device role in Netbox.
- Naming follows two conventions (see the sketch after this list):
- Array is attached to a single host:
- hostname_of_host_system-arrayN
- Example: ms2001-array1, ms2001-array2
- all arrays get a number, even if there is only a single array.
- Example: dataset1001-array1
- Array is attached to multiple hosts:
- Labs uses this for labstore: each shelf connects to two different hosts, so the older single-host naming scheme fails.
- servicehostgroup-arrayN-site
- Example: labstore-array1-codfw, labstore-array2-codfw
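A short sketch of the two conventions; both helper names are hypothetical:

```python
# Hypothetical helpers illustrating the two array-naming conventions.
def single_host_array(hostname: str, n: int) -> str:
    # Array attached to a single host: hostname-arrayN (always numbered).
    return f"{hostname}-array{n}"

def multi_host_array(service_group: str, n: int, site: str) -> str:
    # Array shared between hosts: servicehostgroup-arrayN-site.
    return f"{service_group}-array{n}-{site}"

assert single_host_array("ms2001", 1) == "ms2001-array1"
assert multi_host_array("labstore", 2, "codfw") == "labstore-array2-codfw"
```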
Kubernetes
Any cluster that is not the main wikikube cluster should use a consistent identifier and follow these conventions (see the sketch after this list):
- Control plane service name: <identifier>-ctrl
- Ingress service name: <identifier>-ingress, with an optional -ro/-rw suffix for active/active or active/passive setups
- Hostnames for control plane: <identifier>-ctrlXXXX.$site.wmnet
- Hostnames for kubelets: <identifier>-workerXXXX.$site.wmnet
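A sketch of how the pieces combine for a non-wikikube cluster, using the dse-k8s identifier from the prefix table below; the helper k8s_names is hypothetical:

```python
# Hypothetical sketch composing the Kubernetes names described above.
def k8s_names(identifier: str, site: str, n: int) -> dict[str, str]:
    return {
        "control_plane_service": f"{identifier}-ctrl",
        "ingress_service": f"{identifier}-ingress",
        "control_plane_host": f"{identifier}-ctrl{n:04d}.{site}.wmnet",
        "kubelet_host": f"{identifier}-worker{n:04d}.{site}.wmnet",
    }

names = k8s_names("dse-k8s", "eqiad", 1001)
assert names["control_plane_host"] == "dse-k8s-ctrl1001.eqiad.wmnet"
assert names["kubelet_host"] == "dse-k8s-worker1001.eqiad.wmnet"
```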
Servers
Datacenter numbering
Any system that runs in a dedicated services cluster with other machines is named after its role/service task. As a rule, we try to name servers after the service, not just the software package. Servers within a group are numbered based on the datacenter they are located in (see the sketch below).
Data center | Numeral range | Example |
---|---|---|
pmtpa / sdtpa (decommissioned) | 1-999 | cp7 |
eqiad | 1000-1999 | db1001 |
codfw | 2000-2999 | mw2187 |
esams / knams | 3000-3999 | cp3031 |
ulsfo | 4000-4999 | bast4001 |
eqsin | 5000-5999 | dns5001 |
drmrs | 6000-6999 | cp6011 |
magru | 7000-7999 | cp7001 |
When adding a new datacenter, make sure to update the /typos file in operations/puppet.git, which checks hostnames.
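A sketch that maps a clustered hostname back to its site using the ranges above; site_of is a hypothetical helper, not the check performed by the /typos file:

```python
import re

# Numeric ranges from the table above.
RANGES = {
    range(1000, 2000): "eqiad",
    range(2000, 3000): "codfw",
    range(3000, 4000): "esams/knams",
    range(4000, 5000): "ulsfo",
    range(5000, 6000): "eqsin",
    range(6000, 7000): "drmrs",
    range(7000, 8000): "magru",
}

def site_of(hostname: str) -> str:
    m = re.search(r"\d+", hostname)
    if not m:
        raise ValueError(f"no numeric part in {hostname!r}")
    n = int(m.group())
    if n < 1000:
        return "pmtpa/sdtpa (decommissioned)"
    return next(site for r, site in RANGES.items() if n in r)

assert site_of("db1001") == "eqiad"
assert site_of("mw2187") == "codfw"
assert site_of("cp7001") == "magru"
```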
Hostname prefixes
The full list of hostname prefixes currently in use can be gathered from a cumin host (cumin1002.eqiad.wmnet, cumin2002.codfw.wmnet) with:
sudo cumin --no-color 'A:all' 2>/dev/null | nodeset -S '\n' -e | sed 's/\..*//g' | sed 's/[0-9]\{4\}//g' | sort | uniq
Be aware that hosts with "dev" in their name can have the "dev" part before or after the four-digit number.
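The same extraction in Python, for illustration; the hostnames in the asserts are hypothetical examples of the "dev" caveat:

```python
import re

# Mirror of the sed steps above: strip the domain, then the 4-digit
# number, leaving the prefix. "dev" survives on either side.
def prefix_of(hostname: str) -> str:
    short = hostname.split(".", 1)[0]
    return re.sub(r"\d{4}", "", short)

assert prefix_of("cumin1002.eqiad.wmnet") == "cumin"
assert prefix_of("netbox-dev2002.codfw.wmnet") == "netbox-dev"  # dev before the number
assert prefix_of("cloudcontrol2004-dev.wikimedia.org") == "cloudcontrol-dev"  # dev after
```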
Name prefix | Description | Status | Points of contact |
---|---|---|---|
acmechief | ACME certificate manager | In use | Traffic |
acmechief-test | ACME certificate manager staging environment | In use | Traffic |
alert | Alerting host (Icinga / Alertmanager) | In use | Observability |
amssq | esams caching server | No longer used (deprecated) | |
amslvs | esams LVS | No longer used (deprecated) | |
analytics | analytics nodes (Hadoop, Hive, Impala, and various other things) | Being replaced by an-worker | Data Platform SREs |
an-conf | Analytics Hadoop cluster zookeeper quorum | In use | Data Platform SREs |
an-coord | Analytics Hadoop cluster coordination node (Presto and Hive) | In use | Data Platform SREs |
an-db | Data Platform Postgresql database cluster | In use | Data Platform SREs |
an-druid | Druid Cluster (Analytics) | In use | Data Platform SREs |
an-launcher | Analytics job scheduler node | In use | Data Platform SREs |
an-master | Analytics Hadoop cluster namenode | In use | Data Platform SREs |
an-mariadb | Data Platform mariadb databases (analytics_meta) | In use | Data Platform SREs |
an-presto | Analytics Presto cluster workers | In use | Data Platform SREs |
an-redacteddb | analytics dedicated mariadb servers with sanitized data, as per the wikireplicas | In use | Data Platform SREs |
an-tool | Analytics tools node (YARN UI, Turnilo) | In use | Data Platform SREs |
an-test-client | Analytics Hadoop-test client (equivalent to stat servers, but for test cluster) | In use | Data Platform SREs |
an-test-coord | Analytics Hadoop-test cluster coordinator (Hive, Presto, MariaDB) | In use | Data Platform SREs |
an-test-master | Analytics Hadoop-test cluster namenodes | In use | Data Platform SREs |
an-test-ui | Analytics Hadoop-test YARN UI | In use | Data Platform SREs |
an-test-worker | Analytics Hadoop-test cluster workers | In use | Data Platform SREs |
an-test-druid | Analytics Druid-test worker | In use | Data Platform SREs |
an-test-presto | Analytics Presto-test worker | In use | Data Platform SREs |
an-web | Analytics webserver (wikistats, published datasets) | | |
an-worker | Analytics Hadoop cluster workers | In use, replacing analyticsNNNN | Data Platform SREs |
an-airflow | Airflow instances provided to client teams by Data Platform Engineering | Being migrated to dse-k8s | Data Platform SREs |
aphlict | notification server for Phabricator | In use | Collaboration Services |
apt | Advanced Package Tool Repository (Debian APT repo) | In use | Infrastructure Foundations |
aqs | Cassandra cluster for Analytics Query Service (+others) | In use | Data Persistence |
archiva | Archiva Artifact Repository | Being decommissioned | Data Platform SREs |
auth | Authentication server | In use | Infrastructure Foundations |
authdns | Authoritative DNS (gdnsd) | In use | Traffic |
backup | Backup hosts | In use | Data Persistence |
backupmon | Backup monitoring hosts | In use | Data Persistence |
bast | bastion host | In use | Infrastructure Foundations |
censorship | Censorship monitoring databases and scripts | No longer used (deprecated) | |
centrallog | Centralized syslog | In use | Observability |
cephosd | Ceph servers for use with Data Engineering and similar storage requirements | In use | Data Platform SREs |
certcentral | Central certificates service | No longer used (deprecated) | |
chartmuseum | Helm Chart repository ChartMuseum | In use | Service Operations |
civi | Fundraising CiviCRM | In use | FR-Tech SREs |
cloud*-dev | Any cloud role + '-dev' = internal deployment (PoC, Staging, etc) | In use | WMCS |
cloudbackup | Backup storage system for WMCS | In use | WMCS |
cloudcephmon | Ceph monitor and manager daemon for WMCS | In use | WMCS |
cloudcephosd | Ceph object storage data nodes for WMCS | In use | WMCS |
cloudceph | Converged Ceph object storage and monitor nodes for WMCS (only used for testing) | No longer used | |
cloudcontrol | OpenStack deployment controller for WMCS | In use | WMCS |
clouddb | Wiki replica servers for WMCS | In use | WMCS, with support from DBAs and Data Platform SREs |
cloudelastic | Replication of ElasticSearch for WMCS | In use | WMCS |
cloudgw | Cloud gateway server for WMCS | In use | WMCS |
cloudmetrics | Monitoring server for WMCS | In use | WMCS |
cloudnet | Network gateway for tenants of WMCS (Neutron l3) | In use | WMCS |
cloudservices | Misc OpenStack components (Designate) for WMCS | In use | WMCS |
cloudstore | Storage system for WMCS | In use | WMCS |
cloudvirt | OpenStack Hypervisor (libvirtd + KVM) for WMCS | In use | WMCS |
cloudvirtan | OpenStack Hypervisor (libvirtd + KVM) for WMCS (dedicated to Analytics) | No longer used | |
cloudvirt-wdqs | OpenStack Hypervisor (libvirtd + KVM) for WMCS (dedicated to WDQS) | No longer used | WMCS |
cloudweb | WMCS management websites (wikitech, horizon, striker) | In use | WMCS |
collab | Spare hardware for single-host-per-dc Collaboration services (Phabricator, Gerrit, Contint) | Planned | Collaboration Services |
conf | Configuration system host (etcd, zookeeper...) | In use | Service Operations |
config-master | host running the config-master site | In use | Infrastructure Foundations |
contint | Continuous Integration | In use | Collaboration Services |
cp | Cache proxy (Varnish) | In use | Traffic |
cumin | Cluster management (cumin/spicerack/debdeploy/etc...) | In use | Infrastructure Foundations |
datahubsearch | DataHub OpenSearch Cluster - used for DataHub | In use | Data Platform SREs |
dataset | dataset dumps storage | No longer used (deprecated) | |
db | Database host | In use | Data Persistence |
dbmonitor | Database monitoring | In use | Data Persistence |
dborch | Database orchestration (MySQL Orchestrator) | In use | Data Persistence |
dbprov | Database backup generation and data provisioning | In use | Data Persistence |
dbproxy | Database proxy | In use | Data Persistence |
dbstore | Analytics private mediawiki database replicas | In use | Data Platform SREs & Data Persistence |
debmonitor | Debian packages monitoring | In use | Infrastructure Foundations |
deploy | Deployment hosts | In use | Service Operations |
dns | DNS recursors | In use | Infrastructure Foundations |
doc | Documentation server (CI) | In use | Collaboration Services |
doh | Wikidough Anycasted | In use | Traffic |
druid | Druid Cluster (Public) | In use | Data Platform SREs |
dse-k8s-etcd | etcd server for the kubernetes cluster of Data Science and Engineering | In use | Data Platform SREs |
dse-k8s-ctrl | control plane server for the kubernetes cluster of Data Science and Engineering | In use | Data Platform SREs |
dse-k8s-worker | worker node for the kubernetes cluster of Data Science and Engineering | In use | Data Platform SREs |
dumpsdata | dataset generation fileset serving to snapshot hosts | In use | Data Platform SREs |
durum | Check service for Wikidough | In use | Traffic |
elastic | elasticsearch servers | In use | Data Platform SREs |
es | Database host for MediaWiki external storage (wiki content, compressed) | In use | Data Persistence |
etcd | Etcd server | In use | Service Operations |
etherpad | Etherpad server | In use | Collaboration Services |
eventlog | EventLogging host | In use | Data Platform SREs |
flink-zk | Dedicated zookeeper cluster for Flink | In use | Data Platform SREs |
flowspec | Network controller | In use (testing) | Infrastructure Foundations |
fr* | Fundraising servers, e.g. frdb, frlog, frpm (puppetmaster) | In use | FR-Tech SREs |
ganeti | Ganeti Virtualization Cluster | In use | Infrastructure Foundations |
ganeti-test | Ganeti Virtualization Cluster (test setup) | In use | Infrastructure Foundations |
gerrit | Gerrit servers (code review) | In use | Collaboration Services & Release Engineering |
gitlab | Gitlab servers (code review, CI, CD) | In use (phab:T274459) | Service Operations |
grafana | Grafana server | In use | Observability |
graphite | Graphite server | In use | Observability |
icinga | Icinga servers | In use | Observability |
idm | Identity manager (Bitu) | In use | Infrastructure Foundations |
idp | Identity provider (Apereo CAS) | In use | Infrastructure Foundations |
install | Installation server | In use | Infrastructure Foundations |
kafka | Kafka brokers | No longer used | |
kafka-main | Kafka brokers | In use | Infrastructure Foundations |
kafka-jumbo | Large general purpose Kafka cluster | In use | Data Platform SREs & Infrastructure Foundations |
kafka-logging | Logging/o11y Kafka cluster | In use | Observability |
kafkamon | Kafka monitoring (VMs) | In use | Data Platform SREs & Infrastructure Foundations |
karapace | DataHub Schema Registry server (standalone) - Used for DataHub | In use | Data Platform SREs |
knsq | knams squid | No longer used (deprecated) | |
krb | Kerberos KDC/Kadmin | In use | Infrastructure Foundations & Data Platform SREs |
kubernetes | Kubernetes cluster (k8s) | In use | Service Operations |
kubestage | Kubernetes staging cluster | In use | Service Operations |
kubestagetcd | Etcd cluster for the Kubernetes staging cluster | In use | Service Operations |
kubetcd | Etcd cluster for the Kubernetes cluster | In use | Service Operations |
lab | labs virtual node | No longer used (deprecated) | |
labcontrol | Controller node for WMCS (aka "labs") | No longer used (deprecated) | |
labnet | Networking host for WMCS | No longer used (deprecated) | |
labnodepool | Dedicated WMCS host for Nodepool (CI) | No longer used (deprecated) | |
labpuppetmaster | Puppetmasters for WMCS | No longer used (deprecated) | |
labsdb | Replication of production databases for WMCS | No longer used (deprecated) | |
labservices | Services for WMCS | No longer used (deprecated) | |
labstore | Disk storage for WMCS | In use (deprecated) | WMCS |
labtest* | Test hosts for WMCS | No longer used (deprecated) | |
labvirt | Virtualization node for WMCS | No longer used (deprecated) | |
labweb | Management websites for WMCS | No longer used (deprecated) | |
lists | Mailing lists running Mailman | In use | Legoktm and Ladsgroup |
logging-hd | Logging Cluster - OpenSearch data node (hdd class) | In use | Observability |
logging-sd | Logging Cluster - OpenSearch data node (ssd class) | Planned | Observability |
logging-fe | Logging Cluster - OpenSearch/OpenSearch-Dashboards/Logstash node | Planned | Observability |
logstash | opensearch/logstash/opensearch-dashboards node | In use | Observability |
lvs | lvs load balancer | In use | Traffic |
maps | Maps cluster | In use | Content Transform Team and hnowlan |
maps-test | maps test cluster | No longer used (deprecated) | |
matomo | Matomo analytics server (formerly named Piwik) | In use | Data Platform SREs |
mc | memcached server for mediawiki | In use | Service Operations |
mc-gp | memcached gutter pool server for mediawiki | In use | Service Operations |
mc-wf | memcached servers for wikifunctions | In use | Service Operations |
mc-misc | memcached servers for anything else in need of memcached | Planned | Service Operations |
ml-staging | Machine learning staging environment etcd and control plane machines | In use | ML team |
ml-serve | Machine learning serving cluster (ml-serve-ctrl* are VMs for k8s control plane) | In use | ML team |
ml-cache | Machine learning caching nodes | In use | ML team |
ml-lab | Machine learning experimenting/sandbox machines for ML models (similar to statboxes, but owned by ML). | In use | ML team |
mirror | public mirror, e.g. Debian mirror, Ubuntu mirror | In use | Infrastructure Foundations |
miscweb | miscellaneous web server | In use | Collaboration Services |
ms | media storage | No longer used (deprecated) | Data Persistence (Media Storage) |
ms-backup | media storage backup generation (workers) | In use | Data Persistence (Media Storage) |
ms-be | media storage backend | In use | Data Persistence (Media Storage) |
ms-fe | media storage frontend | In use | Data Persistence (Media Storage) |
mutual-os | Mutualized (shared) opensearch cluster | Planned | Data Platform SREs |
mw | MediaWiki application server (MediaWiki PHP webservers, api, jobrunners, videoscalers) | In use | Service Operations |
mwdebug | MediaWiki application server for debugging and deployment staging (Ganeti VMs) | In use | Service Operations |
mwlog | MediaWiki logging host | In use | Service Operations |
mwmaint | MediaWiki maintenance host (formerly "terbium") | In use | Service Operations |
mx | Mail relays | In use | Infrastructure Foundations |
mx-out | Outbound mail relays | In use | Infrastructure Foundations |
mx-in | Inbound mail relays | In use | Infrastructure Foundations |
nas | NAS boxes (NetApp) | Unused | |
netflow | Network visibility | In use | Infrastructure Foundations |
netmon | Network monitor (librenms, rancid, etc) | In use | Infrastructure Foundations |
netbox | Netbox front-end instances | In use | Infrastructure Foundations |
netbox-dev | Netbox test instances | In use | Infrastructure Foundations |
netboxdb | Netbox back-end database instances | In use | Infrastructure Foundations |
nfs | NFS server | Unused | |
peek | Security Team workflow and project management tooling | In use | Security Team |
ocg | offline content generator (PDF) | No longer used (deprecated) | |
ores | ORES cluster | In use | Machine Learning SREs |
orespoolcounter | ORES PoolCounter | In use | Machine Learning SREs |
oresrdb | ORES Redis systems | No longer used (deprecated) | |
pay* | Fundraising servers, e.g. payments, pay-lb, pay-lvs | In use | FR-Tech SREs |
pc | Parser cache database | In use | SRE Data Persistence (DBAs), with support from Platform and Performance |
pdf | PDF Collections | No longer used (deprecated) | |
people | peopleweb (people.wikimedia.org) | In use | Collaboration Services |
parse | parsoid | Soon to be no longer used (deprecated) | Service Operations |
parsoidtest | parsoid | Soon to be used | Service Operations |
phab | Phabricator host (currently iridium is eqiad phab host) | In use | Collaboration Services |
ping | Ping offload server | In use | Infrastructure Foundations |
planet | Planet server | In use | Collaboration Services |
pki | PKI Server (CFSSL) | In use | Infrastructure Foundations |
pki-root | PKI Root CA Server (CFSSL) | In use | Infrastructure Foundations |
poolcounter | PoolCounter cluster | In use | Service Operations |
prometheus | Prometheus cluster | In use | Observability |
proton | Proton cluster | No longer used (deprecated) | |
puppetboard | PuppetDB Web UI | In use | Service Operations |
puppetdb | PuppetDB cluster | In use | Service Operations |
puppetmaster | Puppet masters | In use | Infrastructure Foundations |
puppetserver | Puppet Servers | In use | Infrastructure Foundations |
pybal-test | PyBal testing and development | In use | Traffic |
rbf | Redis Bloom Filter server | Unused | |
rcs | RCStream server (recent changes stream) | No longer used (deprecated) | |
rdb | Redis server | In use | Service Operations |
registry | Docker registries | In use | Service Operations |
releases | Software Releases | In use | Service Operations |
relforge | Discovery's Relevance Forge (see discovery/relevanceForge.git, T131184) | In use | Search Platform SREs |
restbase | Cassandra cluster for RESTBase service (+others) | In use | Data Persistence |
rpki | RPKI#Validation | In use | Infrastructure Foundations |
sca | Service Cluster A - Includes various services | No longer used (deprecated) | |
scb | Service Cluster B - Includes various services. It's effectively the next generation of the sca cluster above | No longer used (deprecated) | |
schema | Event Schemas HTTP server | In use | Data Platform SREs & Service Operations |
search-loader | Analytics to Elastic Search model data loader | In use | Search Platform SREs |
sessionstore | Cassandra cluster for sessionstore | In use | Data Persistence |
snapshot | Data dump processing node | In use | Data Platform SREs |
sq | squid server | No longer used (deprecated) | |
srv | apache server | No longer used (deprecated) | |
stat | statistics computation hosts (see Analytics/Data access) | In use | Data Platform SREs |
storage | storage host | No longer used (deprecated) | |
stewards | special hosts for wiki stewards (see T344164) | In use | SRE Collaboration Services |
testreduce | parsoid visual diff testing | In use | Service Operations |
thanos-be | Prometheus long term storage backend (swift storage) | In use | Observability / Data Persistence |
thanos-fe | Prometheus long term storage frontend (swift proxy software) | In use | Observability / Data Persistence |
thumbor | Thumbor | In use | Service Operations (& Performance) |
titan | Thanos frontends | In use | Observability |
tmh | MediaWiki videoscaler (TimedMediaHandler). See T105009 and T115950. | No longer used (deprecated) | |
torrelay | Tor relay | No longer used (deprecated) | |
urldownloader | url-downloader | In use (added in T224551) | Service Operations |
virt | labs virtualization nodes | No longer used (deprecated) | |
vrts | VRTS ticketing system | In use | Collaboration Services |
wcqs | wikicommons query service | In use | Data Platform SREs |
wdqs | Wikidata Query Service "full" graph (deprecated; see T337013) | In use (deprecated, being replaced by wdqs-main, wdqs-scholarly, and possibly wdqs-categories) | Data Platform SREs |
wdqs-categories | Wikidata Query Service Deepcat search graph | In testing (see T374016) | |
wdqs-main | Wikidata Query Service graph split - main. See T337013 | In use | Data Platform SREs |
wdqs-scholarly | Wikidata Query Service graph split - scholarly. See T337013 | In use | |
webperf | webperf metrics (performance team). See T179036. | In use | Performance & Service Operations |
wikikube-ctrl | Wikikube Kubernetes cluster control plane | In use | Service Operations |
wikikube-worker | Wikikube Kubernetes cluster worker nodes | In use | Service Operations |
wtp | wiki-text processor node (parsoid) | No longer used (deprecated) | Service Operations |
xhgui | A graphical interface for PHP debug profiles. See Performance/Runbook/XHGui service. | In use | Performance & Service Operations |
dragonfly-supernode | Supernode for Dragonfly P2P network (distributing docker images) (T286054) | In use | Service Operations |
Miscellaneous servers
Historically, we used per-datacenter naming schemes for any one-off or single-host service. This included any software that wasn't load balanced across multiple machines, as well as general task machines that could cluster (to an extent) but required opsen work to do so.
Instead of being named for their purpose, these hosts were named according to a naming convention for their datacenter:
- Hosts in eqiad were named for chemical elements, in order of increasing atomic number.
- Hosts in codfw were named for stars. Stars in the Orion constellation were reserved for fundraising (Alnilam, Alnitak, Bellatrix, Betelgeuse, Heka, Meissa, Mintaka, Nair Al Saif, Rigel, Saiph, Tabit, Thabit).
- Hosts in esams or knams were named for notable Dutch people.
These naming schemes are deprecated in favour of specialized cluster names above. Even if you're certain that the foobar service will only ever use a single host, you should name that host "foobar1001" (or 2001, 3001, etc. as appropriate to the datacenter).
One-off names were easy to come up with—especially for machines that did more than one kind of thing, where it's hard to identify a single descriptive name—but they were also opaque. Engineers had to know that the eqiad MediaWiki maintenance host was "terbium" and the codfw package-build host was "deneb." Naming these machines "mwmaint1001" and "build2001" is easier for sleepy oncallers to remember in an emergency, and friendlier to new hires who have to learn all the names at once.
Some older hosts in production still use these naming schemes, but new hosts should not use them.
- ↑ P&T Weekly Status Updates: 2023-12-04, Wikimedia Foundation, Google Docs (restricted)