Jump to content

Help talk:Toolforge/Database

Rendered with Parsoid
From Wikitech
Latest comment: 3 months ago by Novem Linguae in topic What's the _p in enwiki_p mean?

remote databases

grep '^192\.168\.99\.' /etc/hosts > labsdb.hosts seems broken. I guess everything is 10.* now. For my usecase I only need enwiki.labsdb, that points to 10.64.4.11

So you can connect via mysql --defaults-file="${HOME}"/replica.my.cnf -h 10.64.4.11 enwiki_p

--physikerwelt (talk) 16:09, 17 January 2015 (UTC)Reply

text table

Is this the right place to mention that many (all?) of the wikis replicated in labsdb instances don't have the text table populated but use a separate storage mechanism for revision text? It would have saved me several hours hacking at database code that was never going to work. GoldenRing (talk) 13:38, 9 April 2015 (UTC)Reply

Someone else asked at the parent talk page so expectations vary but are never fulfilled :-).
I don't know where I would put that information, and I am generally unhappy with the current documentation structure as it feels too wordy to me and the multiple pages/collapsed sections unuseful because they hide information if you search for it with Google Chrome. But I can't unlearn what I've got to know over the past years to make an educated decision about that.
So: Hey, it's a wiki! Be bold, etc. --Tim Landscheidt (talk) 22:29, 9 April 2015 (UTC)Reply

Access to multiple databases

Hi, I need to query multiple lang wikipedias. How can I do this without creating multiple connections?? --Kanzat (talk) 08:35, 2 May 2015 (UTC)Reply

You can access the databases with the corresponding prefixes, for example:
SELECT p1.page_id,
       p2.page_id
FROM enwiki_p.page AS p1
JOIN dewiki_p.page AS p2 USING (page_namespace,
                                page_title)
WHERE p1.page_namespace = 0
  AND p1.page_title = 'Hamburg';
--Tim Landscheidt (talk) 04:51, 3 May 2015 (UTC)Reply

PHP

So, if a tool gets mysql_connect(): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock', what is it doing wrong? Context: phabricator:T107618. --Nemo 20:57, 31 July 2015 (UTC)Reply

Static MediaWiki

I'm thinking of trying to get MediaWiki to work with the Tool Labs replicas (read-only). I basically want to set up a filter that performs textual modifications to the pages before they are served - specifically, I want to add (approximate) Pinyin readings to Chinese text to help language learners. If anyone tries to edit a page, I will provide an explanation and a link to the source wiki. Has anyone done this sort of thing before/can anyone think of any snags I might run up against? Would this be considered a valid use of Tool Labs servers?

I have already noticed, as User:GoldenRing mentions, that the text tables are not present in the replicas. Can anyone tell me where they are?

If necessary, I will make a fork of MediaWiki that supports this sort of use so it will be easy to do in future.

--Spacemartin (talk) 14:16, 18 November 2015 (UTC)Reply

In the WMF cluster, the article contents are not stored in the database, but on separate servers. These are neither replicated to Labs nor can they be directly accessed otherwise. The most efficient way is probably RESTBase.
For your purposes, I would suggest to develop either a JavaScript gadget that users could individually enable on wiki or a MediaWiki extension. The former option is much easier to do and deploy than the latter. --Tim Landscheidt (talk) 04:48, 19 November 2015 (UTC)Reply
Too late! Pinyin Wiki :-) --Spacemartin (talk) 19:50, 20 November 2015 (UTC)Reply

No external access to database by Python

pymysql.err.OperationalError: (2003, "Can't connect to MySQL server on 'enwiki.labsdb:3306' ([Errno 11001] getaddrinfo failed)")
sqlalchemy.exc.OperationalError: (_mysql_exceptions.OperationalError) (2005, "Unknown MySQL server host 'ruwiki.labsdb' (0)")

I get this errors when try run a scripts from my computer (tried modules pymysql and sqlalchemy). On the server it works without problem. Script like:

import pymysql.cursors
connection = pymysql.connect(host='enwiki.labsdb:3306',
  user=u,
  password=pw,
  db='enwiki_p',
  charset='utf8',
  cursorclass=pymysql.cursors.DictCursor)

Also I tried open SSH-connect from the script, but it not help. (Schema: open ssh, test ok, try access to db - get the errors, close ssh.). --Vladis13 (talk) 04:51, 23 September 2016 (UTC)Reply

The MySQL servers are not open to the outside world; they are only accessible from inside the labs network. If you want to connect from your local computer, see Help:Tool_Labs/Database#Steps_to_setup_SSH_tunneling_for_testing_a_tool_labs_application_which_makes_use_of_tool_labs_databases_on_your_own_computer on tunneling over SSH. valhallasw (Merlijn van Deen) (talk) 06:20, 23 September 2016 (UTC)Reply
I found problem. 1) In Windows no the tool "ssh" about talk the Help. Instead can use the PuTTY (I did add this in Help.). 2) In module pymysql need set port like: pymysql.connect(host='127.0.0.1', port=4711, but not pymysql.connect(host='127.0.0.1:4711'. --Vladis13 (talk) 07:15, 23 September 2016 (UTC)Reply
SQLAlchemy.py not works on server. It requires module MySQLdb.py which not installed there. --Vladis13 (talk) 07:58, 23 September 2016 (UTC)Reply

PostGIS

Is it possible to have a tool that needs PostGIS (extension of PostgreSQL). I would like to make a tool that does spatial queries for requested images for Commons. --Tobias47n9e (talk) 12:43, 3 January 2017 (UTC)Reply

Tobias47n9e I would suggest opening a Phabricator task with the request and we can discuss the various options. --BryanDavis (talk) 15:59, 3 January 2017 (UTC)Reply
@BryanDavis: What category would you suggest on Phabricator? I see that the postgis package is in the apt-cache. But I am not sure if it is installed. Can a tool have access to a Postgresql database? service postgresql status says that it is an unrecognized service. --Tobias47n9e (talk) 16:21, 3 January 2017 (UTC)Reply
@Tobias47n9e: Tag the phab task with #tool-labs and also #DBA. I don't know that we do have a postgres db at the moment in Tool Labs. I kind of doubt that we will provision postgres just for this, but maybe we can find another way to help you build your tool. --BryanDavis (talk) 17:01, 3 January 2017 (UTC)Reply
@BryanDavis: Here is the issue: https://phabricator.wikimedia.org/T154497 --Tobias47n9e (talk) 17:29, 3 January 2017 (UTC)Reply

EXPLAIN

The inability to use EXPLAIN seriously sucks. I imagine that's some view-related limitation, but surely there is some way to allow safe execution of explain queries... --tgr (talk) 08:53, 15 October 2017 (UTC)Reply

The documentation on using the database is split a bit between two pages, this one and Help:MySQL_queries. This one focuses more on the Toolforge-specific parts, while the latter is relevant labs-wide. Well, that's the theory anyway. The latter page has a section on EXPLAINing queries, Help:MySQL_queries#Optimizing_queries, with a link to an on-line tool that helps to run the EXPLAIN queries (basically by running the query and using the show-me-the-query-plan-of-a-running-query function). valhallasw (Merlijn van Deen) (talk) 09:34, 15 October 2017 (UTC)Reply
I made some edits to Help:MySQL_queries#Optimizing_queries that will hopefully be helpful. --BryanDavis (talk) 23:06, 15 October 2017 (UTC)Reply
As far as allowing EXPLAIN, it's an upstream bug. MySQL would never fix it because of some secret requirement from their commercial side, see phab:T50875 for details. I don't know if anyone ever asked the MariaDB people if they'd fix it. Anomie (talk) 13:45, 16 October 2017 (UTC)Reply

Identifying lag

right now the example query is

SELECT lag FROM heartbeat_p.heartbeat JOIN meta_p.wiki ON CONCAT(shard, '.analytics.db.svc.eqiad.wmflabs') = slice WHERE dbname = 'fawiki';

shouldn't it be

SELECT lag FROM heartbeat_p.heartbeat JOIN meta_p.wiki ON CONCAT(shard, '.labsdb') = slice WHERE dbname = 'fawiki';

? -- Seth (talk) 17:33, 9 May 2018 (UTC)Reply

The *.labsdb service names are deprecated. They are literally DNS CNAME pointers to the *.analytics.db.svc.eqiad.wmflabs service names which are actively maintained. New code and examples should use *.analytics.db.svc.eqiad.wmflabs and/or *.web.db.svc.eqiad.wmflabs as appropriate rather than the legacy labsdb service names. --BryanDavis (talk) 19:22, 9 May 2018 (UTC)Reply
Hi BryanDavis!
If I login at tools-login.wmflabs.org and start sql de, then the first sql query (with .analytics....) results in an empty set. The second sql query (with labsdb) results in the wanted row as expected.
What am I doing wrong? -- seth (talk) 22:24, 9 May 2018 (UTC)Reply
You're not doing anything wrong -- the situation is a bit confusing. Historically, we used to have 's3.labsdb' hostnames for the database servers. These names then made their way into the `meta_p.wiki` table (that table should contain just 's3', but this is now difficult to change). More recently, the database servers were revamped, including a hostname change from 's3.labsdb' to 's3.analytics.etc'. However, the wiki table still contains the old name, which cannot be used as-is anymore...
So either we have to change the slice column in wiki table (which might break tools that split the shard on '.labsdb', as well as people who use the query which was on the documentation page originally), or we have to add a new column that just contains the 's3' identifier. phab:T186675 might be a good moment to also fix this issue.
I have now rewritten the example query to no longer be dependent on a specific postfix -- instead, the query just splits on the first '.' and only uses the first component. valhallasw (Merlijn van Deen) (talk) 09:05, 10 May 2018 (UTC)Reply

Access to user database from paws or quarry

Magnus Manske has published a database called "s52680__science_source_p", see http://magnusmanske.de/wordpress/?p=559. I can get a hold on the database with

   user@tools-bastion-03:~$ mysql --defaults-file=$HOME/replica.my.cnf -h tools.db.svc.eqiad.wmflabs
   MariaDB [(none)]> USE s52680__science_source_p;
   MariaDB [s52680__science_source_p]> SHOW TABLES;

But how do I get access to that database with Quarry or PAWS? In Quarry, the following command

   USE s52680__science_source_p;

results in "Access denied for user '<user>'@'%' to database 's52680__science_source_p'". In PAWS I tried

   conn = pymysql.connect(
       host="tools.db.svc.eqiad.wmflabs",
       user=os.environ['MYSQL_USERNAME'],
       password=os.environ['MYSQL_PASSWORD'],
       database='s52680__science_source_p',
       charset='utf8'
   )

but get 'OperationalError: (1044, "Access denied for user '<user>'@'%' to database 's52680__science_source_p'")'. --Fnielsen (talk) 21:34, 2 August 2018 (UTC)Reply

The answer for Quarry is to wait for task T151158 to be implemented. Currently there is no mechanism to change database servers from the Wiki Replica server to the ToolsDB server.
For Paws, the feature request task is task T188406. --BryanDavis (talk) 23:11, 2 August 2018 (UTC)Reply

Toolforge dumping sqldump from enwiki_p to toolsDB custom space

Hi! Is this the right place to talk about Toolforge? I am new to toolforge and had some questions regarding accessing the replica databases(especially enwiki) from the toolsDB server. My objective right now is to get the ipblocks and revisions table in enwiki database over to my custom tools database space where i can play with the data and create custom tables as i want. But i am not able to access the enwiki database from my tool's custom toolsDB server(understandably enwiki is on different shard/server). Is there a way i can bring in data from enwiki to my tool's DB space? I assume this must have already been used by other users as well trying to work with Wikipedia metadata.

I am trying playing around with the mysql command as follows to generate a dump-

mysqldump -u s53882 -p --single-transaction --quick --lock-tables=false enwiki_p ipblocks > ipblocks.sql

After running this command i am prompted for password, and after entering the password i get the following message- mysqldump: Got error: 1049: "Unknown database 'enwiki_p'" when selecting the database

Please let me know if I should use something other than enwiki_p here, I am running this command from my tools space on toolforge tools.blocklog@tools-bastion-02:~$ pwd /data/project/blocklog

Thanks in Advance! Arnab777as3uj (talk) 01:19, 16 November 2018 (UTC)Reply

local vs tools?

In one place, an example is given of sql local, and in another, sql tools. As far as I can see, these are actually the same host. Is there some reason they're called out differently in different places? RoySmith (talk) 16:00, 8 December 2019 (UTC)Reply

I can't really say where the various aliases for tools.db.svc.eqiad.wmflabs came from in the sql convenience wrapper for the mysql cli tool. I can confirm that "local", "tools", "tools-db", and "toolsdb" are all equivalent aliases. We should probably pick one and make all of the docs here on wikitech use it. My vote would be for "tools" rather than the more ambiguous "local". --BryanDavis (talk) 17:39, 8 December 2019 (UTC)Reply

Connecting to the database replicas from your own computer

Example of doing this from a script

# Create tunnel 
ssh -N -f -M -S /tmp/file-sock -L 4711:meta.analytics.db.svc.wikimedia.cloud:3306 login.toolforge.org

# Run SQL query
mysql --defaults-file=replica.my.cnf --host=127.0.0.1 --port=4711 < allwikis.sql > allwikis.txt

# Kill tunnel
ssh -S /tmp/file-sock -O exit login.toolforge.org

Contents of allwikis.sql

use meta_p;
SELECT * FROM wiki;

It dumps a list of all wikis. For the above you only need to have passwordless ssh setup at which point one can fully automate SQL queries from scripts and programs running on your own computer.

-- Green Cardamom (talk) 20:42, 16 February 2024 (UTC)Reply

I think this is basically what is documented at Help:Toolforge/Database#SSH_tunneling_for_local_testing_which_makes_use_of_Wiki_Replica_databases -- BryanDavis (talk) 22:50, 16 February 2024 (UTC)Reply

What's the _p in enwiki_p mean?

I think I heard somewhere that it stands for public and means it's been sanitized of private data, but wanted to double check. I plan to add a sentence about it to this article once I find out the answer. Thanks. Novem Linguae (talk) 12:21, 11 August 2024 (UTC)Reply