Data Services comprise services that provide direct access to databases and dumps, as well as web interfaces for querying data stores and accessing them programmatically. The services currently offered are Wiki Replicas, ToolsDB, Wikimedia Dumps, Shared Storage, Quarry, and PAWS.
Wiki Replicas are MySQL/MariaDB databases that replicate the production MediaWiki databases of Wikimedia Foundation wikis in near real time. The database tables are sanitized for public use.
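As a rough illustration of how a client might pick a replica to connect to, the sketch below derives a hostname and database name from a wiki's database code. The host pattern "<wiki>.<cluster>.db.svc.wikimedia.cloud", the "analytics" and "web" cluster names, and the "_p" database suffix are assumptions that this page does not state, and `replica_target` is a hypothetical helper, so check the Wiki Replicas help pages before relying on them.

```python
def replica_target(wiki: str, cluster: str = "analytics") -> tuple[str, str]:
    """Return (hostname, database name) for a wiki code such as "enwiki".

    ASSUMPTION: the host pattern, the cluster names, and the "_p" suffix
    are illustrative conventions that are not stated in this page.
    """
    if cluster not in ("analytics", "web"):
        raise ValueError(f"unknown cluster: {cluster!r}")
    host = f"{wiki}.{cluster}.db.svc.wikimedia.cloud"
    database = f"{wiki}_p"
    return host, database


host, database = replica_target("enwiki")
print(host, database)
```

The returned pair can then be handed to whichever MySQL/MariaDB client library the tool already uses.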
ToolsDB is a service that lets a tool account (the tool's shared user) create and maintain a tool-specific database. See Help:Toolforge/Database#User databases for help on ToolsDB.
It is accessible at the following addresses:
- tools.db.svc.eqiad1.wikimedia.cloud (preferred)
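A minimal sketch of how a tool might assemble its ToolsDB connection settings follows. It assumes an INI-style credentials file with a `[client]` section containing `user` and `password`, and a database naming scheme of "<user>__<suffix>"; neither is described on this page, and `toolsdb_settings` is a hypothetical helper. Only the hostname is taken from the list above.

```python
import configparser
from pathlib import Path

# Address taken from the list above.
TOOLSDB_HOST = "tools.db.svc.eqiad1.wikimedia.cloud"


def toolsdb_settings(credentials_file: Path, db_suffix: str) -> dict:
    """Build connection keyword arguments from a credentials file.

    ASSUMPTIONS: the file is INI-style with a [client] section holding
    `user` and `password`, and databases are named "<user>__<suffix>".
    """
    # Disable interpolation so a "%" in the password is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(credentials_file)
    client = parser["client"]
    # Values may be wrapped in quotes in the file; strip them.
    user = client["user"].strip("'\"")
    password = client["password"].strip("'\"")
    return {
        "host": TOOLSDB_HOST,
        "user": user,
        "password": password,
        "database": f"{user}__{db_suffix}",
    }
```

The resulting dictionary uses the keyword names that common Python MySQL clients accept (for example `pymysql.connect(**settings)`), so no credentials need to be hard-coded in the tool's source.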
The PostgreSQL database used by Wikilabels (which is used by ORES) runs on a replicated VM cluster: clouddb-wikilabels-01 is the primary, with clouddb-wikilabels-02 as the usual replica. Changes that affect the PostgreSQL service, including upgrades and reboots, should be coordinated with Aaron Halfaker.
Wikimedia Dumps offers a range of data downloads, including full-text dumps and other datasets. Toolforge users can access dumps data directly through their tool account; see Help:Toolforge/Dumps. Cloud VPS users can request to have the share made available; see Help:Shared storage#/public/dumps. More documentation about dumps can be found at Data dumps.
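Because the dumps are exposed as ordinary files on a share, a tool can stream them without downloading anything first. The sketch below counts pages in a bz2-compressed XML dump by reading it line by line. It assumes that each `<page>` opening tag sits on its own line, which is a heuristic and not a full XML parse, and the file path in the usage note is hypothetical.

```python
import bz2
from pathlib import Path


def count_pages(dump_path: Path) -> int:
    """Count <page> opening tags in a bz2-compressed XML dump.

    The file is decompressed as a stream, so memory use stays small
    even for very large dumps.
    ASSUMPTION: each <page> tag is on a line of its own.
    """
    pages = 0
    with bz2.open(dump_path, mode="rt", encoding="utf-8") as stream:
        for line in stream:
            if line.strip() == "<page>":
                pages += 1
    return pages
```

A call might look like `count_pages(Path("/public/dumps/some-wiki-pages-articles.xml.bz2"))`, where the file name is a placeholder; the actual directory layout under /public/dumps is described at Help:Shared storage.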
Shared Storage is offered via NFS for Toolforge and Cloud VPS users. The shares currently offered are described at Help:Shared storage. The Toolforge environment is set up for access by default, and other Cloud VPS projects can access some resources on special request.
Wikimedia Dumps are also offered via the Shared Storage services, but are treated as a Data Service because of their wide use.
CirrusSearch elasticsearch replicas
The "Cloud Elastic" servers are a replica of the CirrusSearch Elasticsearch indices, made available to Wikimedia Cloud Services applications (both Cloud VPS and Toolforge). These servers are not generally accessible from the internet at large; they are accessible only to applications running inside Cloud Services. Applications can use the full power of the Elasticsearch search APIs to query the search indices in ways that CirrusSearch does not expose directly on the wikis themselves.
See Help:CirrusSearch elasticsearch replicas for more details on how to access and use the Cloud Elastic service.
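To give a sense of what querying the search API directly involves, here is a minimal sketch that prepares, but does not send, a standard Elasticsearch `_search` request with a `match` query. The base URL, the index name, and the `title` field are assumptions for illustration, and `build_search_request` is a hypothetical helper; the real endpoint and index names are documented on the help page above.

```python
import json
import urllib.request


def build_search_request(
    base_url: str, index: str, text: str, size: int = 5
) -> urllib.request.Request:
    """Prepare (but do not send) an Elasticsearch _search request.

    ASSUMPTION: the index has a `title` field that can be matched on.
    """
    body = {
        "size": size,                         # maximum number of hits
        "query": {"match": {"title": text}},  # full-text match on title
        "_source": ["title"],                 # return only the title field
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

From inside Cloud Services, the prepared request could be sent with `urllib.request.urlopen(request)` and the JSON response decoded with `json.load`. Separating construction from sending keeps the query easy to inspect and test.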
Quarry is a graphical web interface that allows users to query the Wiki Replicas with SQL. It requires only a Wikimedia account to log in, and is used extensively by analysts, researchers, and people of all experience levels to access the databases easily. See m:Research:Quarry for help.
PAWS is a cloud-hosted Jupyter notebook service that provides Python notebooks and a terminal, both accessible through a web browser. It too requires only a Wikimedia account to log in, and it provides access to the Wiki Replicas, ToolsDB, and Dumps. See PAWS for help.
We provide a clone of the OSM (OpenStreetMap) database for use inside Toolforge and Cloud VPS. See Help:Toolforge/Database#Connecting to OSM via the official CLI PostgreSQL and Openstreetmap Databases for more information.