Help talk:Toolforge/Database
remote databases
grep '^192\.168\.99\.' /etc/hosts > labsdb.hosts
seems broken. I guess everything is 10.* now.
For my usecase I only need enwiki.labsdb, that points to 10.64.4.11
So you can connect via mysql --defaults-file="${HOME}"/replica.my.cnf -h 10.64.4.11 enwiki_p
text table
Is this the right place to mention that many (all?) of the wikis replicated in labsdb instances don't have the text table populated but use a separate storage mechanism for revision text? It would have saved me several hours hacking at database code that was never going to work. GoldenRing (talk) 13:38, 9 April 2015 (UTC)
- Someone else asked at the parent talk page so expectations vary but are never fulfilled :-).
- I don't know where I would put that information, and I am generally unhappy with the current documentation structure as it feels too wordy to me and the multiple pages/collapsed sections unuseful because they hide information if you search for it with Google Chrome. But I can't unlearn what I've got to know over the past years to make an educated decision about that.
- So: Hey, it's a wiki! Be bold, etc. --Tim Landscheidt (talk) 22:29, 9 April 2015 (UTC)
Access to multiple databases
Hi, I need to query multiple lang wikipedias. How can I do this without creating multiple connections?? --Kanzat (talk) 08:35, 2 May 2015 (UTC)
- You can access the databases with the corresponding prefixes, for example:
SELECT p1.page_id,
p2.page_id
FROM enwiki_p.page AS p1
JOIN dewiki_p.page AS p2 USING (page_namespace,
page_title)
WHERE p1.page_namespace = 0
AND p1.page_title = 'Hamburg';
PHP
So, if a tool gets mysql_connect(): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock'
, what is it doing wrong? Context: phabricator:T107618. --Nemo 20:57, 31 July 2015 (UTC)
Static MediaWiki
I'm thinking of trying to get MediaWiki to work with the Tool Labs replicas (read-only). I basically want to set up a filter that performs textual modifications to the pages before they are served - specifically, I want to add (approximate) Pinyin readings to Chinese text to help language learners. If anyone tries to edit a page, I will provide an explanation and a link to the source wiki. Has anyone done this sort of thing before/can anyone think of any snags I might run up against? Would this be considered a valid use of Tool Labs servers?
I have already noticed, as User:GoldenRing mentions, that the text tables are not present in the replicas. Can anyone tell me where they are?
If necessary, I will make a fork of MediaWiki that supports this sort of use so it will be easy to do in future.
--Spacemartin (talk) 14:16, 18 November 2015 (UTC)
- In the WMF cluster, the article contents are not stored in the database, but on separate servers. These are neither replicated to Labs nor can they be directly accessed otherwise. The most efficient way is probably RESTBase.
- For your purposes, I would suggest to develop either a JavaScript gadget that users could individually enable on wiki or a MediaWiki extension. The former option is much easier to do and deploy than the latter. --Tim Landscheidt (talk) 04:48, 19 November 2015 (UTC)
- Too late! Pinyin Wiki :-) --Spacemartin (talk) 19:50, 20 November 2015 (UTC)
No external access to database by Python
pymysql.err.OperationalError: (2003, "Can't connect to MySQL server on 'enwiki.labsdb:3306' ([Errno 11001] getaddrinfo failed)")
sqlalchemy.exc.OperationalError: (_mysql_exceptions.OperationalError) (2005, "Unknown MySQL server host 'ruwiki.labsdb' (0)")
I get this errors when try run a scripts from my computer (tried modules pymysql and sqlalchemy). On the server it works without problem. Script like:
import pymysql.cursors
connection = pymysql.connect(host='enwiki.labsdb:3306',
user=u,
password=pw,
db='enwiki_p',
charset='utf8',
cursorclass=pymysql.cursors.DictCursor)
Also I tried open SSH-connect from the script, but it not help. (Schema: open ssh, test ok, try access to db - get the errors, close ssh.). --Vladis13 (talk) 04:51, 23 September 2016 (UTC)
- The MySQL servers are not open to the outside world; they are only accessible from inside the labs network. If you want to connect from your local computer, see Help:Tool_Labs/Database#Steps_to_setup_SSH_tunneling_for_testing_a_tool_labs_application_which_makes_use_of_tool_labs_databases_on_your_own_computer on tunneling over SSH. valhallasw (Merlijn van Deen) (talk) 06:20, 23 September 2016 (UTC)
- I found problem. 1) In Windows no the tool "ssh" about talk the Help. Instead can use the PuTTY (I did add this in Help.). 2) In module pymysql need set port like:
pymysql.connect(host='127.0.0.1', port=4711
, but notpymysql.connect(host='127.0.0.1:4711'
. --Vladis13 (talk) 07:15, 23 September 2016 (UTC) - SQLAlchemy.py not works on server. It requires module MySQLdb.py which not installed there. --Vladis13 (talk) 07:58, 23 September 2016 (UTC)
- I found problem. 1) In Windows no the tool "ssh" about talk the Help. Instead can use the PuTTY (I did add this in Help.). 2) In module pymysql need set port like:
PostGIS
Is it possible to have a tool that needs PostGIS (extension of PostgreSQL). I would like to make a tool that does spatial queries for requested images for Commons. --Tobias47n9e (talk) 12:43, 3 January 2017 (UTC)
- Tobias47n9e I would suggest opening a Phabricator task with the request and we can discuss the various options. --BryanDavis (talk) 15:59, 3 January 2017 (UTC)
- @BryanDavis: What category would you suggest on Phabricator? I see that the postgis package is in the apt-cache. But I am not sure if it is installed. Can a tool have access to a Postgresql database?
service postgresql status
says that it is an unrecognized service. --Tobias47n9e (talk) 16:21, 3 January 2017 (UTC)- @Tobias47n9e: Tag the phab task with #tool-labs and also #DBA. I don't know that we do have a postgres db at the moment in Tool Labs. I kind of doubt that we will provision postgres just for this, but maybe we can find another way to help you build your tool. --BryanDavis (talk) 17:01, 3 January 2017 (UTC)
- @BryanDavis: Here is the issue: https://phabricator.wikimedia.org/T154497 --Tobias47n9e (talk) 17:29, 3 January 2017 (UTC)
- @Tobias47n9e: Tag the phab task with #tool-labs and also #DBA. I don't know that we do have a postgres db at the moment in Tool Labs. I kind of doubt that we will provision postgres just for this, but maybe we can find another way to help you build your tool. --BryanDavis (talk) 17:01, 3 January 2017 (UTC)
- @BryanDavis: What category would you suggest on Phabricator? I see that the postgis package is in the apt-cache. But I am not sure if it is installed. Can a tool have access to a Postgresql database?
EXPLAIN
The inability to use EXPLAIN seriously sucks. I imagine that's some view-related limitation, but surely there is some way to allow safe execution of explain queries... --tgr (talk) 08:53, 15 October 2017 (UTC)
- The documentation on using the database is split a bit between two pages, this one and Help:MySQL_queries. This one focuses more on the Toolforge-specific parts, while the latter is relevant labs-wide. Well, that's the theory anyway. The latter page has a section on EXPLAINing queries, Help:MySQL_queries#Optimizing_queries, with a link to an on-line tool that helps to run the EXPLAIN queries (basically by running the query and using the show-me-the-query-plan-of-a-running-query function). valhallasw (Merlijn van Deen) (talk) 09:34, 15 October 2017 (UTC)
- I made some edits to Help:MySQL_queries#Optimizing_queries that will hopefully be helpful. --BryanDavis (talk) 23:06, 15 October 2017 (UTC)
- As far as allowing EXPLAIN, it's an upstream bug. MySQL would never fix it because of some secret requirement from their commercial side, see phab:T50875 for details. I don't know if anyone ever asked the MariaDB people if they'd fix it. Anomie (talk) 13:45, 16 October 2017 (UTC)
Identifying lag
right now the example query is
SELECT lag FROM heartbeat_p.heartbeat JOIN meta_p.wiki ON CONCAT(shard, '.analytics.db.svc.eqiad.wmflabs') = slice WHERE dbname = 'fawiki';
shouldn't it be
SELECT lag FROM heartbeat_p.heartbeat JOIN meta_p.wiki ON CONCAT(shard, '.labsdb') = slice WHERE dbname = 'fawiki';
? -- Seth (talk) 17:33, 9 May 2018 (UTC)
- The
*.labsdb
service names are deprecated. They are literally DNS CNAME pointers to the*.analytics.db.svc.eqiad.wmflabs
service names which are actively maintained. New code and examples should use*.analytics.db.svc.eqiad.wmflabs
and/or*.web.db.svc.eqiad.wmflabs
as appropriate rather than the legacy labsdb service names. --BryanDavis (talk) 19:22, 9 May 2018 (UTC)
- Hi BryanDavis!
- If I login at tools-login.wmflabs.org and start sql de, then the first sql query (with .analytics....) results in an empty set. The second sql query (with labsdb) results in the wanted row as expected.
- What am I doing wrong? -- seth (talk) 22:24, 9 May 2018 (UTC)
- You're not doing anything wrong -- the situation is a bit confusing. Historically, we used to have 's3.labsdb' hostnames for the database servers. These names then made their way into the `meta_p.wiki` table (that table should contain just 's3', but this is now difficult to change). More recently, the database servers were revamped, including a hostname change from 's3.labsdb' to 's3.analytics.etc'. However, the wiki table still contains the old name, which cannot be used as-is anymore...
- So either we have to change the slice column in wiki table (which might break tools that split the shard on '.labsdb', as well as people who use the query which was on the documentation page originally), or we have to add a new column that just contains the 's3' identifier. phab:T186675 might be a good moment to also fix this issue.
- I have now rewritten the example query to no longer be dependent on a specific postfix -- instead, the query just splits on the first '.' and only uses the first component. valhallasw (Merlijn van Deen) (talk) 09:05, 10 May 2018 (UTC)
Access to user database from paws or quarry
Magnus Manske has published a database called "s52680__science_source_p", see http://magnusmanske.de/wordpress/?p=559. I can get a hold on the database with
user@tools-bastion-03:~$ mysql --defaults-file=$HOME/replica.my.cnf -h tools.db.svc.eqiad.wmflabs MariaDB [(none)]> USE s52680__science_source_p; MariaDB [s52680__science_source_p]> SHOW TABLES;
But how do I get access to that database with Quarry or PAWS? In Quarry, the following command
USE s52680__science_source_p;
results in "Access denied for user '<user>'@'%' to database 's52680__science_source_p'". In PAWS I tried
conn = pymysql.connect( host="tools.db.svc.eqiad.wmflabs", user=os.environ['MYSQL_USERNAME'], password=os.environ['MYSQL_PASSWORD'], database='s52680__science_source_p', charset='utf8' )
but get 'OperationalError: (1044, "Access denied for user '<user>'@'%' to database 's52680__science_source_p'")'. --Fnielsen (talk) 21:34, 2 August 2018 (UTC)
- The answer for Quarry is to wait for task T151158 to be implemented. Currently there is no mechanism to change database servers from the Wiki Replica server to the ToolsDB server.
- For Paws, the feature request task is task T188406. --BryanDavis (talk) 23:11, 2 August 2018 (UTC)
Toolforge dumping sqldump from enwiki_p to toolsDB custom space
Hi! Is this the right place to talk about Toolforge? I am new to toolforge and had some questions regarding accessing the replica databases(especially enwiki) from the toolsDB server. My objective right now is to get the ipblocks and revisions table in enwiki database over to my custom tools database space where i can play with the data and create custom tables as i want. But i am not able to access the enwiki database from my tool's custom toolsDB server(understandably enwiki is on different shard/server). Is there a way i can bring in data from enwiki to my tool's DB space? I assume this must have already been used by other users as well trying to work with Wikipedia metadata.
I am trying playing around with the mysql command as follows to generate a dump-
mysqldump -u s53882 -p --single-transaction --quick --lock-tables=false enwiki_p ipblocks > ipblocks.sql
After running this command i am prompted for password, and after entering the password i get the following message- mysqldump: Got error: 1049: "Unknown database 'enwiki_p'" when selecting the database
Please let me know if I should use something other than enwiki_p here, I am running this command from my tools space on toolforge tools.blocklog@tools-bastion-02:~$ pwd /data/project/blocklog
Thanks in Advance! Arnab777as3uj (talk) 01:19, 16 November 2018 (UTC)
local vs tools?
In one place, an example is given of sql local, and in another, sql tools. As far as I can see, these are actually the same host. Is there some reason they're called out differently in different places? RoySmith (talk) 16:00, 8 December 2019 (UTC)
- I can't really say where the various aliases for tools.db.svc.eqiad.wmflabs came from in the
sql
convenience wrapper for themysql
cli tool. I can confirm that "local", "tools", "tools-db", and "toolsdb" are all equivalent aliases. We should probably pick one and make all of the docs here on wikitech use it. My vote would be for "tools" rather than the more ambiguous "local". --BryanDavis (talk) 17:39, 8 December 2019 (UTC)
Connecting to the database replicas from your own computer
Example of doing this from a script
# Create tunnel ssh -N -f -M -S /tmp/file-sock -L 4711:meta.analytics.db.svc.wikimedia.cloud:3306 login.toolforge.org # Run SQL query mysql --defaults-file=replica.my.cnf --host=127.0.0.1 --port=4711 < allwikis.sql > allwikis.txt # Kill tunnel ssh -S /tmp/file-sock -O exit login.toolforge.org
Contents of allwikis.sql
use meta_p; SELECT * FROM wiki;
It dumps a list of all wikis. For the above you only need to have passwordless ssh setup at which point one can fully automate SQL queries from scripts and programs running on your own computer.
-- Green Cardamom (talk) 20:42, 16 February 2024 (UTC)
- I think this is basically what is documented at Help:Toolforge/Database#SSH_tunneling_for_local_testing_which_makes_use_of_Wiki_Replica_databases -- BryanDavis (talk) 22:50, 16 February 2024 (UTC)
What's the _p in enwiki_p mean?
I think I heard somewhere that it stands for public and means it's been sanitized of private data, but wanted to double check. I plan to add a sentence about it to this article once I find out the answer. Thanks. Novem Linguae (talk) 12:21, 11 August 2024 (UTC)