Infrastructure naming conventions

From Wikitech

This page documents the naming conventions of servers, routers, data center sites, and other infrastructure relevant to Wikimedia Foundation clusters.

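Our servers currently fall broadly into two categories: servers that run in a dedicated service cluster, which are named after their role (see Servers), and one-off or single-service hosts, which are named after a per-site theme (see Miscellaneous servers).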

Name re-use

We never re-use names of past servers for new servers. For example, after db1001 is decommissioned, no other server will be named db1001.

The notable exception is networking gear, whose names are determined by rack location. For example, the access switch in Eqiad rack A8 is named asw-a8-eqiad. If it is replaced, the new switch takes the same name.

All previous servers are kept (even if decommissioned) in Racktables. Please check there for existing server names before deciding on a name for any new servers. (Note: Racktables is restricted by login.)
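The re-use rule can be sketched as a simple check. This is only an illustration, assuming the historical names have already been exported from Racktables into a plain set. The function `is_name_available`, the `NETWORK_PREFIXES` tuple, and the sample data are hypothetical and not part of any existing tooling.

```python
# Assumption: every prefix from the Networking table counts as network gear,
# whose names may be re-used when a device is replaced.
NETWORK_PREFIXES = ("asw", "cr", "mr", "msw", "pfw", "ps", "scs", "fasw", "cloudsw")


def is_name_available(candidate: str, historical_names: set[str]) -> bool:
    """Return True if `candidate` may be given to a new device."""
    name = candidate.lower()
    # Strip trailing digits from the first dash-separated part (cr1 -> cr).
    first_part = name.split("-")[0].rstrip("0123456789")
    if first_part in NETWORK_PREFIXES:
        return True  # replacement network gear keeps the old name
    # Servers: any name ever used, even by a decommissioned host, is taken.
    return name not in historical_names


past_names = {"db1001", "db1002", "asw-a8-eqiad"}
print(is_name_available("db1001", past_names))        # already used
print(is_name_available("db1003", past_names))        # never used
print(is_name_available("asw-a8-eqiad", past_names))  # network gear exception
```

The set lookup stands in for whatever query Racktables actually offers; only the rule itself is taken from this page.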

Server clusters

Clusters are named as vendor initials (at time of lease signing) followed by the IATA code for a nearby major airport.

For example, our Dallas site is named codfw: the vendor is CyrusOne, and DFW is the nearby major airport. (Technically, Love Field airport is closer but less well known.)

Cluster Vendor Airport Code
codfw CyrusOne DFW
eqdfw Equinix DFW
eqiad Equinix IAD
eqord Equinix ORD
eqsin Equinix SIN
esams EvoSwitch AMS
knams Kennisnet AMS
ulsfo United Layer SFO
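The composition rule can be expressed in a few lines. The following is a minimal sketch: the vendor initials are read off the table above, while the function name `site_name` and the dictionary are illustrative assumptions rather than existing tooling.

```python
# Vendor initials as they appear in the cluster names listed above.
VENDOR_INITIALS = {
    "CyrusOne": "co",
    "Equinix": "eq",
    "EvoSwitch": "es",
    "Kennisnet": "kn",
    "United Layer": "ul",
}


def site_name(vendor: str, iata_code: str) -> str:
    """Build a cluster name from vendor initials plus the airport's IATA code."""
    if len(iata_code) != 3 or not iata_code.isalpha():
        raise ValueError(f"not an IATA airport code: {iata_code!r}")
    return VENDOR_INITIALS[vendor] + iata_code.lower()


print(site_name("CyrusOne", "DFW"))
print(site_name("Equinix", "SIN"))
```

Because the vendor is fixed at lease signing, the mapping is keyed by the vendor name at that time, not by the current owner of the facility.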

Networking

Naming for network equipment is based on role and location.

This also applies to: power distribution units, serial console servers, and other networking infrastructure.

Name prefix Role Example
asw access switch asw-a1-eqiad
cr core router cr1-eqiad
mr management router mr1-eqiad
msw management switch msw1-eqiad & msw-b2-eqiad
pfw payments firewall pfw1-eqiad
ps1 / ps2 power strips/distribution units ps1-b3-eqiad
scs serial console server scs-a8-eqiad
fasw Fundraising access switch fasw-c-codfw
cloudsw Cloud L3 switches cloudsw1-c8-eqiad
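The examples in the table share a common shape: a role prefix, an optional number, an optional rack (or row), and the site. The sketch below splits a name into those parts. The regular expression is inferred from the listed examples only and is an assumption, as are the names `DeviceName` and `parse_device_name`; note that it treats the digit in ps1 as a number rather than as part of the prefix.

```python
import re
from typing import NamedTuple, Optional


class DeviceName(NamedTuple):
    prefix: str            # role, e.g. "asw" or "cr"
    index: Optional[int]   # number directly after the prefix, if any
    rack: Optional[str]    # rack such as "a1", or a bare row such as "c"
    site: str              # data center site, e.g. "eqiad"


# prefix, optional digits, optional "-<rack>", then "-<site>"
_DEVICE_RE = re.compile(r"([a-z]+)(\d*)(?:-([a-z]\d*))?-([a-z]+)")


def parse_device_name(name: str) -> DeviceName:
    """Split a network device name into prefix, index, rack, and site."""
    match = _DEVICE_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"unrecognised device name: {name!r}")
    prefix, index, rack, site = match.groups()
    return DeviceName(prefix, int(index) if index else None, rack, site)


for example in ("asw-a1-eqiad", "cr1-eqiad", "cloudsw1-c8-eqiad", "fasw-c-codfw"):
    print(parse_device_name(example))
```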

OpenStack deployments

Names follow the pattern: [datacenter site][numeric identifier][optional "dev" suffix for internal, non-customer-facing deployments]-[r for the region][letter for the availability zone]

  • Current Eqiad/Codfw deployments will not fully meet these standards until rebuilt: [eqiad0 (deployment), eqiad (region), nova (AZ)]
Deployment Region Availability Zone
eqiad0 eqiad0-r eqiad0-rb
eqiad1 eqiad1-r eqiad1-rb
codfw0dev codfw0dev-r codfw0dev-rb
codfw1dev codfw1dev-r codfw1dev-rb
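The rows above can be generated from the pattern. This is a minimal sketch based only on the table: the function name `openstack_names` and its parameters are hypothetical, and the default availability zone letter "b" is simply the one used in every listed example.

```python
def openstack_names(site: str, number: int, dev: bool = False,
                    az_letter: str = "b") -> dict[str, str]:
    """Derive deployment, region, and availability zone names for a site."""
    # Deployment: site + numeric identifier + optional "dev" suffix.
    deployment = f"{site}{number}{'dev' if dev else ''}"
    # Region: deployment + "-r".
    region = f"{deployment}-r"
    # Availability zone: region + a single letter.
    zone = f"{region}{az_letter}"
    return {
        "deployment": deployment,
        "region": region,
        "availability_zone": zone,
    }


print(openstack_names("eqiad", 1))
print(openstack_names("codfw", 1, dev=True))
```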

Disks

  • Arrays must use the Storage array device role in Netbox.
  • Naming follows one of two conventions:
      • Array is attached to a single host:
          • hostname_of_host_system-arrayN
          • Example: ms2001-array1, ms2001-array2
          • All arrays get a number, even if there is only a single array.
          • Example: dataset1001-array1
      • Array is attached to multiple hosts:
          • Labs uses this for labstore, where each shelf connects to two different hosts, so the older single-host naming scheme does not work.
          • servicehostgroup-arrayN-site
          • Example: labstore-array1-codfw, labstore-array2-codfw
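Both array conventions can be covered by one small helper. This is an illustrative sketch only: `array_name` and its parameters are hypothetical names, and passing a site is used here as the signal that the array is shared by multiple hosts.

```python
from typing import Optional


def array_name(owner: str, number: int, site: Optional[str] = None) -> str:
    """Name a storage array.

    owner  -- the host name (single host) or the service host group (shared)
    number -- array number; always present, even for a single array
    site   -- data center site, given only for arrays shared by several hosts
    """
    if number < 1:
        raise ValueError("array numbers start at 1")
    name = f"{owner}-array{number}"
    return f"{name}-{site}" if site else name


print(array_name("ms2001", 1))
print(array_name("labstore", 2, site="codfw"))
```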

Servers

Any system that runs in a dedicated services cluster with other machines is named after its role or service task. As a rule, we name after the service, not just the software package. Servers within a group are numbered according to the datacenter they are located in.

Datacenter Numeral range Example
pmtpa / sdtpa 1-999 cp7
eqiad 1000-1999 db1001
codfw 2000-2999 mw2187
esams / knams 3000-3999 cp3031
ulsfo 4000-4999 bast4001
eqsin 5000-5999 dns5001
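Since the number encodes the location, a host's datacenter can be recovered from its name. The sketch below applies the ranges from the table above; the function name `datacenter_for` and the `DATACENTER_RANGES` list are illustrative assumptions, not the check that lives in the puppet repository.

```python
import re

# Numeral ranges copied from the table above (upper bounds are exclusive here).
DATACENTER_RANGES = [
    (range(1, 1000), "pmtpa / sdtpa"),
    (range(1000, 2000), "eqiad"),
    (range(2000, 3000), "codfw"),
    (range(3000, 4000), "esams / knams"),
    (range(4000, 5000), "ulsfo"),
    (range(5000, 6000), "eqsin"),
]


def datacenter_for(hostname: str) -> str:
    """Return the datacenter implied by the trailing number of a hostname."""
    match = re.search(r"(\d+)$", hostname)
    if match is None:
        raise ValueError(f"hostname has no trailing number: {hostname!r}")
    number = int(match.group(1))
    for numbers, datacenter in DATACENTER_RANGES:
        if number in numbers:
            return datacenter
    raise ValueError(f"number {number} is outside every known range")


for host in ("cp7", "db1001", "mw2187", "bast4001"):
    print(host, "->", datacenter_for(host))
```

A new datacenter would need one more entry in the list, which mirrors the reminder about updating the hostname check.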

When adding a new datacenter, make sure to update the /typos file in operations/puppet.git, which checks hostnames.

Name prefix Description Status Points of contact
acmechief ACME certificate manager In use Traffic
acmechief-test ACME certificate manager staging environment In use Traffic
alert Alerting host (Icinga / Alertmanager) In use Observability
amssq esams caching server No longer used (deprecated)
amslvs esams LVS No longer used (deprecated)
analytics analytics nodes (Hadoop, Hive, Impala, and various other things) Being replaced by an-worker Analytics SREs
analytics-master analytics master nodes Being replaced by an-master Analytics SREs
analytics-tool virtual machines in production (Ganeti) running analytics tools/websites Being replaced by an-tool Analytics SREs
an-coord analytics coordination node In use Analytics SREs
an-master analytics master node In use, replacing analytics-master Analytics SREs
an-tool analytics tools node In use Analytics SREs
an-test-(coord/master/worker) analytics hadoop test cluster nodes In use Analytics SREs
an-worker analytics worker node In use, replacing analyticsNNNN Analytics SREs
an-scheduler analytics job scheduler node In use Analytics SREs
an-airflow analytics job scheduler node dedicated to the Discovery team In use Analytics SREs
aphlict notification server for Phabricator In use Service Operations
apt Advanced Package Tool Repository (Debian APT repo) In use Infrastructure Foundations
aqs Analytics Query Service In use Analytics SREs
archiva Archiva Artifact Repository In use Analytics SREs
auth Authentication server In use Infrastructure Foundations
authdns Authoritative DNS (gdnsd) In use Traffic
backup Backup hosts In use Data Persistence
bast bastion host In use Infrastructure Foundations
censorship Censorship monitoring databases and scripts In use Traffic
centrallog Centralized syslog In use Observability
certcentral Central certificates service No longer used (deprecated)
chartmuseum Helm Chart repository ChartMuseum In use Service Operations
cloud*-dev Any cloud role + '-dev' = internal deployment (PoC, Staging, etc) In use (new) WMCS
cloudcephmon Ceph monitor and manager daemon for WMCS In use (new) WMCS
cloudcephosd Ceph object storage data nodes for WMCS In use (new) WMCS
cloudceph Converged Ceph object storage and monitor nodes for WMCS (currently only slated for testing) In use (new) WMCS
cloudcontrol OpenStack deployment controller for WMCS In use (new) WMCS
clouddb Wiki replica servers for WMCS Soon to be in use WMCS, with support from DBAs
cloudelastic Replication of ElasticSearch for WMCS In use (new) WMCS
cloudgw Cloud gateway server for WMCS Planned WMCS
cloudnet Network gateway for tenants of WMCS (Neutron l3) In use (new) WMCS
cloudservices Misc OpenStack components (Designate) for WMCS In use (new) WMCS
cloudvirt OpenStack Hypervisor (libvirtd + KVM) for WMCS In use (new) WMCS
cloudvirtan OpenStack Hypervisor (libvirtd + KVM) for WMCS (dedicated to Analytics) In use (new) WMCS
cloudstore Storage system for WMCS In use (new) WMCS
cloudbackup Backup storage system for WMCS In use (new) WMCS
conf Configuration system host (etcd, zookeeper...) In use Service Operations
contint Continuous Integration In use Service Operations
cp Cache proxy (Varnish) In use Traffic
cumin Cluster management (cumin/spicerack/debdeploy/etc...) In use Infrastructure Foundations
dataset dataset dumps storage No longer used (deprecated) Service Operations & Platform Engineering
db Database host In use Data Persistence
dbmonitor Database monitoring In use Data Persistence
dborch Database orchestration (MySQL Orchestrator) In use Data Persistence
dbprov Database backup generation and data provisioning In use Data Persistence
dbproxy Database proxy In use Data Persistence
dbstore Database analytics In use Analytics SREs & Data Persistence
debmonitor Debian packages monitoring In use Infrastructure Foundations
deploy Deployment hosts In use Service Operations
dns DNS recursors In use Infrastructure Foundations
doc Documentation server (CI) In use Service Operations (Supportive Services) & Release Engineering
an-druid Druid Cluster (Analytics). Due to naming legacy, druid100[1-3] are also in this cluster. In use Analytics SREs
druid Druid Cluster (Public) In use Analytics SREs
dumpsdata dataset generation fileset serving to snapshot hosts In use Service Operations & Platform Engineering
elastic elasticsearch servers In use Search Platform SREs
es Database host for MediaWiki external storage (wiki content, compressed) In use Data Persistence
etcd Etcd server In use Service Operations
etherpad Etherpad server In use Service Operations
eventlog EventLogging host In use Analytics SREs
flowspec Network controller In use (testing) Infrastructure Foundations
fr* Fundraising servers, e.g. frdb, frlog, frpm (puppetmaster) In use fr-tech SREs
ganeti Ganeti Virtualization Cluster In use Infrastructure Foundations
gerrit Gerrit code review (gerrit1001 in eqiad is currently used) In use (deprecated) Service Operations (Supportive Services) & Release Engineering
gitlab In use (phab:T274459)
grafana Grafana server In use Observability
graphite Graphite server In use Observability
icinga Icinga servers In use Observability
idp Identity provider (Apereo CAS) In use Infrastructure Foundations
install Installation server In use (rare) Infrastructure Foundations
kafka Kafka brokers In use Analytics SREs & Infrastructure Foundations
kafka-jumbo Large general purpose Kafka cluster In use Analytics SREs & Infrastructure Foundations
kafkamon Kafka monitoring (VMs) In use Analytics SREs & Infrastructure Foundations
knsq knams squid No longer used (deprecated)
krb Kerberos KDC/Kadmin In use Infrastructure Foundations & Analytics SREs
kubernetes Kubernetes cluster (k8s) In use Service Operations
kubestage Kubernetes staging cluster In use Service Operations
kubestagetcd Etcd cluster for the Kubernetes staging cluster In use Service Operations
kubetcd Etcd cluster for the Kubernetes cluster In use Service Operations
lab labs virtual node No longer used (deprecated) WMCS
labcontrol Controller node for WMCS (aka "labs") In use (deprecated) WMCS
cloudmetrics Monitoring server for WMCS In use (deprecated) WMCS
labnet Networking host for WMCS In use (deprecated) WMCS
labnodepool Dedicated WMCS host for Nodepool (CI) In use (deprecated) WMCS
labpuppetmaster Puppetmasters for WMCS In use (deprecated) WMCS
labsdb Replication of production databases for WMCS In use (deprecated) WMCS with support from DBAs
labservices Services for WMCS In use (deprecated) WMCS
labstore Disk storage for WMCS In use (deprecated) WMCS
labtest* Test hosts for WMCS In use (deprecated) WMCS
labvirt Virtualization node for WMCS In use (deprecated) WMCS
labweb Management websites for WMCS In use (deprecated) WMCS
logstash elasticsearch/logstash/kibana node In use Observability
lvs lvs load balancer In use Traffic
maps Maps cluster In use
maps-test maps test cluster No longer used (deprecated)
mc memcached server In use Service Operations
mc-gp memcached gutter pool server In use Service Operations
ml-etcd Machine learning etcd cluster In use ML team
ml-serve Machine learning serving cluster (ml-serve-ctrl* are VMs for k8s control plane) In use ML team
miscweb miscellaneous web server planned; to replace krypton Service Operations
ms media storage No longer used (deprecated) Data Persistence (Media Storage)
ms-be media storage backend In use Data Persistence (Media Storage)
ms-fe media storage frontend In use Data Persistence (Media Storage)
mw MediaWiki application server (MediaWiki PHP webservers, api, jobrunners, videoscalers) In use Service Operations
mwdebug MediaWiki application server for debugging and deployment staging (Ganeti VMs) In use Service Operations
mwlog MediaWiki logging host In use Service Operations
mwmaint MediaWiki maintenance host (formerly "terbium") In use Service Operations
mx Mail relays In use Infrastructure Foundations
nas NAS boxes (NetApp) Unused
netflow Network visibility In use Infrastructure Foundations
netmon Network monitor (smokeping, torrus, librenms, rancid, netbox) In use Infrastructure Foundations
netbox Netbox front-end instances In use Infrastructure Foundations
netbox-dev Netbox test instances In use Infrastructure Foundations
netboxdb Netbox back-end database instances In use Infrastructure Foundations
notebook Jupyterhub experimental server In use Analytics SREs
nfs NFS server Unused
peek Security Team workflow and project management tooling In use Security Team
ocg offline content generator (PDF) No longer used (deprecated)
ores ORES cluster In use Machine Learning SREs
orespoolcounter ORES PoolCounter In use Machine Learning SREs
oresrdb ORES Redis systems In use Machine Learning SREs
pc Parser cache database In use SRE Data Persistence (DBAs), with support from Platform and Performance
pdf PDF Collections No longer used (deprecated)
people peopleweb (people.wikimedia.org) In use Service Operations & Infrastructure Foundations
parse parsoid Soon in use Service Operations
phab Phabricator host (currently iridium is eqiad phab host) In use Service Operations
ping Ping offload server In use Infrastructure Foundations
planet Planet server In use (mistake) Service Operations
pki PKI Server (CFSSL) In use Infrastructure Foundations
pki-root PKI Root CA Server (CFSSL) In use Infrastructure Foundations
poolcounter PoolCounter cluster In use Service Operations
prometheus Prometheus cluster In use Observability
proton Proton cluster In use Service Operations
puppetboard PuppetDB Web UI In use Service Operations
puppetdb PuppetDB cluster In use Service Operations
puppetmaster Puppet masters In use Infrastructure Foundations
pybal-test PyBal testing and development In use Traffic
rbf Redis Bloom Filter server Unused Service Operations
rcs RCStream server (recent changes stream) No longer used (deprecated)
rdb Redis server In use Service Operations
registry Docker registries In use Service Operations
releases Software Releases In use Service Operations
relforge Discovery's Relevance Forge (see discovery/relevanceForge.git, T131184) In use Search Platform SREs
restbase RESTBase server In use Service Operations
rpki RPKI validation In use Infrastructure Foundations
sca Service Cluster A - Includes various services In use Service Operations
scb Service Cluster B - Includes various services. It's effectively the next generation of the sca cluster above In use Service Operations
schema Event Schemas HTTP server In use Analytics SREs & Service Operations
search-loader Analytics to Elastic Search model data loader In use Search Platform SREs
sessionstore Service Operations
snapshot Data dump processing node In use Service Operations & Platform Engineering
sq squid server No longer used (deprecated)
srv apache server No longer used (deprecated)
stat statistics computation hosts (see Analytics/Data access) In use Analytics SREs
storage storage host No longer used (deprecated)
testreduce parsoid visual diff testing In use Service Operations
thanos-be Prometheus long term storage backend In use Observability
thanos-fe Prometheus long term storage frontend In use Observability
thumbor Thumbor In use Service Operations (& Performance)
tmh MediaWiki videoscaler (TimedMediaHandler). See T105009 and T115950. No longer used (deprecated)
torrelay Tor relay No longer used (deprecated)
urldownloader url-downloader In use (added in T224551) Service Operations
virt labs virtualization nodes No longer used (deprecated)
wdqs wikidata query service In use Search Platform SREs
webperf webperf metrics (performance team). See T179036. In use Performance & Service Operations
wtp wiki-text processor node (parsoid) In use Service Operations
xhgui A graphical interface for PHP debug profiles built on MongoDB. See Performance/Runbook/XHGui service. In use Performance & Service Operations

Miscellaneous servers

Any one-off or single-service host. This includes nearly all non-MediaWiki software on the cluster that isn't load balanced across multiple machines, as well as general task machines that can cluster (to an extent) but require manual ops work to do so. The naming of these is based on location, since they tend to do more than one kind of thing or provide more than one particular service or site type. The use of these names is deprecated in favour of the specialized cluster names above, when possible.

Datacenter Site Convention Example Notes
codfw Star Names acamar Only use modern proper star names that are a single word long and contain no odd characters. The Orion constellation is reserved for fundraising (Alnilam, Alnitak, Bellatrix, Betelgeuse, Heka, Meissa, Mintaka, Nair Al Saif, Rigel, Saiph, Tabit, Thabit).
eqiad Elements helium Next atomic # assignment (incremental by atomic #): 112
esams / knams Notable Dutch vandale