Nova Resource:Wikidata-dev/Documentation

From Wikitech

Description

Wikidata aims to create a free knowledge base about the world that can be read and edited by humans and machines alike. It will provide data in all the languages of the Wikimedia projects and allow central access to data, in a similar vein as Wikimedia Commons does for multimedia files. Wikidata is proposed as a new project hosted and maintained by Wikimedia.



Further information about Wikidata can be found here on meta: m:Wikidata. General installation instructions are here on mw:Wikibase/Installation.

This site is a documentation of the Wikidata team's test/dev instances.


List of servers

Instance Name Proxy Purpose Remarks
wikidata-lexeme https://wikidata-lexeme.wmflabs.org former test system for WikibaseLexeme, now closed; instance will be deleted on , see T197583
wikidata-constraints https://wikidata-constraints.wmflabs.org test system for WikibaseQualityConstraints manually administered
wikidata-mobile https://wikidata-termbox.wmflabs.org, https://docker-ui.wmflabs.org Wikidata Mobile Termbox prototype and also Wikibase Docker UI prototype?
federated-wikis https://items-repo.wmflabs.org unknown, presumably to test federation in combination with federated-wikis2
federated-wikis2 https://props-repo.wmflabs.org unknown, presumably to test federation in combination with federated-wikis
wikibase-vue https://wikibase-vue.wmflabs.org unknown, presumably to experiment with using Vue for Wikibase components; wikiba.se website. Apache serves wikiba.se from /srv/se/wikiba/output; an alternative version with Vue seems to be under /srv/wikibase-vue
wikibase none unknown Apache serves wikiba.se from /srv/se/wikiba/output, but no proxy configured?
wikibase-stretch https://wikibase.wmflabs.org unknown, presumably to test MediaWiki+Wikibase; wikiba.se website on Debian Stretch. Apache serves wikiba.se from /srv/se/wikiba/output
mediawiki-mcr https://mw-mcr.wmflabs.org unknown, presumably to test Multi-Content Revisions no /var/www, /srv empty, neither apache2 nor nginx installed?

Use the Server Admin Log! You can log information via IRC: connect to #wikimedia-cloud and post a line in the format !log <project> <message>, e.g. !log wikidata-dev Message. I used to include the instance name as well, like so: !log wikidata-dev wikidata-dev-9 Something went wrong.

Currently installed stack

  • Ubuntu 12.04.2 LTS (GNU/Linux 3.2.0-37-virtual x86_64)
  • latest MediaWiki from gerrit
  • PHP 5.3.10-1ubuntu3.4+wmf1
  • mysql Ver 14.14 Distrib 5.5.29

Installation instructions: Wikidata on Labs

Install Wikidata from Puppet

Our puppet files on Labs are more or less ready to use. They assume two separate Labs instances/servers: one running a Wikidata repo and the other running the Wikidata client. To get a client that is configured to use the repo, you have to set up the repo first and pass some of its details to puppet for the client's config.

How to install a repo

  • Add a new instance on Special:NovaInstance. It will take a while to finish building. You can verify the status here: http://nagios.wmflabs.org/icinga/ (extended search -> hostgroup wikidata-dev -> here are our servers). When everything is nice and green, log in to the instance (ssh -A from bastion). Run puppetd -tv, then aptitude update && aptitude upgrade. The server will probably want to be rebooted afterwards.
  • On Special:NovaInstance in the right column of your new instance click "configure". Here is what puppet has to offer. Scroll down and select "puppetmaster::self". Save the setting.
  • Run puppetd -tv again on the machine. The directory /var/lib/git/operations/puppet will now be populated from gerrit. (/etc/puppet is symlinked here.)
  • In labsconsole, go back to the list of instances and click "configure" again. In the upper part, our project's files are available. Select one of the following setups:
    • role::wikidata-repo-latest::labs not working due to bug 44129
    • role::wikidata-repo::labs

For the repo you just have to fill in two of the input fields underneath the roles:

    • Set wikidata_experimental to true or false (without any quotation marks) and save the page. This setting defaults to true.
    • For the propagation of changes to a client wiki give the IP address of the client. (This can be the internal IP address 10.4.x.x.) If you haven't created the client yet, you can fill this in later and run puppet again to modify the setting.
  • In the shell run puppetd -tv. MediaWiki/the Wikidata repo will now be installed. This takes a while.
  • In many cases, the populateSitesTable script fails to execute on first run. Until this is fixed, you have to run puppetd -tv a second time after the first run has finished.
  • Edit the file /srv/mediawiki/orig/LocalSettings.php (respectively ${install_path}/orig/LocalSettings.php) to give the correct URL in $wgServer!
  • You will need the following information to set up a client:
    • the IP address of the instance
    • the URL of your wiki
  • Create the client (see below).
  • Come back here to your shell and allow the client to connect to the repo's mysql database.
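The first-boot steps above can be sketched as a small helper to run on the fresh instance. This is a reconstruction from the steps above, not an existing script; the PUPPET/APT variables and the -y flag are only there so the sketch can be dry-run non-interactively and are not part of the documented procedure.

```shell
#!/bin/sh
# Sketch of the first-boot steps for a fresh instance.
# PUPPET/APT indirection is only a dry-run hook; on a real instance
# the defaults (puppetd, aptitude) are used.
PUPPET=${PUPPET:-puppetd}
APT=${APT:-aptitude}

first_boot() {
    "$PUPPET" -tv                        # initial puppet run
    "$APT" update && "$APT" upgrade -y   # bring packages up to date (-y added for scripting)
    echo "first boot done - consider rebooting now"
}
```

On a real instance you would simply call first_boot and then reboot if needed.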
Features and limitations
  • It contains our test data.
  • Until now, every repo wiki thinks it's an enwiki.
  • Sometimes the creation of a main page outside the main namespace fails. You can login to the wiki, delete the main page manually and run cd /srv/mediawiki && php maintenance/importDump.php wikidata-repo-mainpage.xml.
  • puppet will create a cronjob for the www-data user that runs the dispatchChanges.php script on the repo every ten minutes for almost ten minutes (to look at it: sudo -u www-data crontab -l). Modifications of the cronjob will be overwritten by puppet.
Manual config
  • For a publicly editable wiki (like "test"), create our account demo – test: On your shell run php /srv/mediawiki/maintenance/createAndPromote.php demo test. For a dev instance for the Wikidata team replace demo/test with our secret login/password. ;)
  • Copy the customized Sidebar from the bottom of this page to Mediawiki:Sidebar
  • For maintenance you can copy Silke's scripts to the instance. They are on github.
  • Put your e-mail address into the crontabs (crontab -e): MAILTO="blah@wikimedia.de", otherwise you might spam others.

How to install a client

  • Add a new instance on Special:NovaInstance. It will take a while to finish building. You can verify the status here: http://nagios.wmflabs.org/cgi-bin/nagios3/status.cgi?hostgroup=wikidata-dev&style=detail. When everything is nice and green, log in to the instance (ssh -A from bastion). Run puppetd -tv, then aptitude update && aptitude upgrade. The server will probably want to be rebooted afterwards.
  • On Special:NovaInstance in the right column of your new instance click "configure". Here is what puppet has to offer. Scroll down and select "puppetmaster::self". Save the setting.
  • Run puppetd -tv again on the machine. The directory /var/lib/git/operations/puppet will now be populated from gerrit. (/etc/puppet is symlinked here.)
  • In labsconsole, go back to the list of instances and click "configure" again. In the upper part, our project's files are available. Select one of the following setups:
    • role::wikidata-client-latest::labs not working due to bug 44129
    • role::wikidata-client::labs
  • Input field "wikidata_experimental": Enter true or false (without quotation marks)
  • Input field "wikidata_repo_ip": Put the repo's IP address here.
  • Input field "wikidata_repo_url": The repo's URL in a form that omits the protocol, e.g. //wikidata-dev-repo.wikimedia.de
  • Save the page.
  • In the shell run puppetd -tv. MediaWiki/the Wikidata client will now be installed. This takes a while.
Features and limitations
  • It's connected to a given repo via puppet.
  • It should contain test data as well.
  • It does not (yet) have all extensions that are installed in the real WP.
  • Until now, every wiki thinks it's an enwiki.
  • Experimental features are enabled by default.
Manual config
  • For maintenance you can copy Silke's scripts to the instance. They are on github.
  • Put your e-mail address into the crontabs (crontab -e): MAILTO="blah@wikimedia.de", otherwise you might spam others.

Additional Manual Setup Information

SiteMatrix extension

If you install a Wikidata instance via puppet, the following is already done.

To install the SiteMatrix extension, you also need to clone the operations/mediawiki-config repository and can put it in /var/www/mediawiki-config.

In the LocalSettings.php file, add the following:

require_once( "$IP/extensions/SiteMatrix/SiteMatrix.php" );
$wgSiteMatrixFile = "$IP/../../mediawiki-config/langlist";
$wgSiteMatrixClosedSites = "$IP/../../mediawiki-config/closed.dblist";
$wgSiteMatrixPrivateSites = "$IP/../../mediawiki-config/private.dblist";
$wgSiteMatrixFishbowlSites = "$IP/../../mediawiki-config/fishbowl.dblist";

Apache rewrite rules for pretty URLs

These rewrite rules make a Wikidata repo have URLs equivalent to Wikipedia URLs. This is not implemented in the puppet installation.

Here is what your /etc/apache2/sites-enabled/foo should contain:

<VirtualHost *:80>
        ServerName wikidata-dev-repo.wikimedia.de
        ServerAdmin webmaster@localhost
        ServerAlias *.wikidata-dev-repo.wikimedia.de

        # for compliance with Wikipedia URLs
        RewriteEngine On
        RewriteCond %{HTTP_HOST} ^(de|en|he|hu)\.wikidata-dev-repo\.wikimedia\.de$
        RewriteRule ^/(wiki|title)/(.*)$ http://wikidata-dev-repo.wikimedia.de/title/%1wiki:$2 [R,QSA]
        RewriteRule ^/title/(de|en|he|hu)wiki:(.*)$ /index.php?title=Special:ItemByTitle/$1wiki/$2 [L,QSA]

        DocumentRoot /var/www/wikidata-dev-repo.wikimedia.de/w/
        <Directory />
                Options FollowSymLinks
                AllowOverride None
        </Directory>
        # MediaWiki!
        Alias /w /var/www/wikidata-dev-repo.wikimedia.de/w
        Alias /wiki /var/www/wikidata-dev-repo.wikimedia.de/w/index.php

        ScriptAlias /cgi-bin/ /usr/lib/cgi-bin/
        <Directory "/usr/lib/cgi-bin">
                AllowOverride None
                Options +ExecCGI -MultiViews +SymLinksIfOwnerMatch
                Order allow,deny
                Allow from all
        </Directory>

        ErrorLog /var/log/apache2/error.log

        # Possible values include: debug, info, notice, warn, error, crit,
        # alert, emerg.
        LogLevel warn

        CustomLog /var/log/apache2/access.log combined

</VirtualHost>

Maintenance

How to prepare demo and review time on internal dev system

On Wednesday before demo time, we make sure the internal demo is reset to our test data. Log in to "dev" (currently wikidata-dev-9).

  • announce a code freeze
  • stop the cronjobs that pull from git
  • run /usr/local/bin/prepare-demo-time.sh (which deletes and reimports test data, pulls MW and all extensions from git, runs update.php and rebuildLocalisationCache.php on repo and client)
  • restart the cronjobs after demo time!

How to update all installed extensions on an instance

In Silke's github, there is a script update-extensions.sh that you can copy to /usr/local/bin. It pulls all installed extensions from git (always from the master branch). Use the path to the extensions directory as an argument, e.g.

update-extensions.sh /srv/devrepo/w/extensions
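The script itself is not reproduced on this page; a minimal sketch of what it plausibly does (pull master for every git checkout under the given directory) might look like this. This is a reconstruction, not the original script, and the GIT variable is only a dry-run hook.

```shell
#!/bin/sh
# Sketch of an update-extensions.sh equivalent: pulls master for
# every extension checkout under the given directory.
GIT=${GIT:-git}   # dry-run hook only; defaults to the real git

update_extensions() {
    ext_dir=$1    # e.g. /srv/devrepo/w/extensions
    for dir in "$ext_dir"/*/; do
        # only touch directories that are actually git checkouts
        [ -d "$dir/.git" ] || continue
        echo "Updating ${dir%/}"
        (cd "$dir" && "$GIT" checkout master && "$GIT" pull) \
            || echo "FAILED: $dir"
    done
}
```

Usage would mirror the documented invocation: update_extensions /srv/devrepo/w/extensions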

How to update the public demo

On Wednesday after demo time or first thing on Thursday morning, we update the public demo with our latest code. You have to do this on the Labs instances wikidata-testrepo and wikidata-testclient. These are instances managed by puppet. Right now, you cannot update MediaWiki core by running git pull (shallow clone problem). This is why you have to delete the wiki and have puppet recreate it like so:

cp /srv/mediawiki/orig/LocalSettings.php /tmp/ # containing the correct $wgServer and permissions settings
mysql -u root -p
drop database repo; (resp. drop database client)
rm -rf /srv/mediawiki
cd /var/lib/git/operations/puppet
git reset --hard # abandon manual changes in git, they should be merged
GIT_SSH=/var/lib/git/ssh git pull # update the puppetmaster::self with what was merged in gerrit during the last week
puppetd -tv # run this command twice
cp /tmp/LocalSettings.php /srv/mediawiki/orig/
php /srv/mediawiki/maintenance/createAndPromote.php demo test # create demo account
php /srv/mediawiki/maintenance/createAndPromote.php --sysop <login> <password> # create a privileged account for the team

Our weekly tags are based on the date (YYYY-MM-DD). See here: https://gerrit.wikimedia.org/r/gitweb?p=mediawiki/extensions/Wikibase.git;a=tags

cd /srv/mediawiki/extensions/Wikibase
git checkout 2012-11-07 (or whatever the tag is)
cd ../Diff
git checkout 2012-11-07 (or whatever the tag is)
cd ../DataValues
git checkout 2012-11-07 (or whatever the tag is)
# go back to the install path:
cd ../..
# check if new sql tables have to be added
php maintenance/update.php --quick
php maintenance/rebuildLocalisationCache.php
service memcached restart
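Since the same tag is checked out in each extension, the repeated checkouts above can be wrapped in a small loop. This is a convenience sketch, not an existing script; the extension list matches the transcript above.

```shell
#!/bin/sh
# Check out the same dated tag in Wikibase, Diff and DataValues.
# Sketch only - the documented procedure is the manual transcript above.
checkout_tag() {
    root=$1   # e.g. /srv/mediawiki
    tag=$2    # e.g. 2012-11-07
    for ext in Wikibase Diff DataValues; do
        (cd "$root/extensions/$ext" && git checkout "$tag") \
            || echo "checkout failed for $ext"
    done
}
```

After checkout_tag /srv/mediawiki 2012-11-07, run update.php, rebuildLocalisationCache.php and the memcached restart as above.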

In puppet, the import of the main page tends to fail. Go to http://wikidata-test-repo.wikimedia.de/wiki/Testwiki:Main_Page, log in as a sysop and delete the main page manually. Run

cd /srv/mediawiki && php maintenance/importDump.php wikidata-repo-mainpage.xml
php maintenance/rebuildrecentchanges.php

How to tag a commit with a date

To mark a given moment, e.g. right after demo time, we use git tags. To tag the last commit, log in to dev-2.

  • Not everyone has got permissions to do this!
cd /var/www/wikidata-test-repo.wm.blabla/w/extensions/Wikibase
git log

Copy the last commit's ID.

Change to your local machine where you have cloned the code. Execute

git tag -a 2012-07-05 -m "Version deployed on July 5th 2012" [commitID]
git push origin 2012-07-05

To push all local tags use:

git push --tags

How to fill a repo with test data

  • If the instance already contained data, execute the delete-all-data script. (${install_path} on dev is /srv/devrepo/w, on test /srv/mediawiki)
cd ${install_path}/extensions/Wikibase/lib/maintenance
php deleteAllData.php
  • If there is no data yet, execute the import scripts:
cd ${install_path}/extensions/Wikibase/repo/maintenance
php importInterlang.php --verbose --ignore-errors simple simple-elements.csv
php importProperties.php --verbose en en-elements-properties.csv

Run the maintenance/update.php script!
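The steps above can be combined into one helper. This is a sketch assuming the ${install_path} layout described above; the PHP variable is only a dry-run hook and is not part of the documented procedure.

```shell
#!/bin/sh
# Re-import test data into a repo: delete old data, import the CSV
# dumps, then run update.php. Sketch based on the steps above.
PHP=${PHP:-php}   # dry-run hook only; defaults to the real php

reimport_test_data() {
    install_path=$1   # e.g. /srv/devrepo/w
    ( cd "$install_path/extensions/Wikibase/lib/maintenance" \
        && "$PHP" deleteAllData.php ) || return 1
    ( cd "$install_path/extensions/Wikibase/repo/maintenance" \
        && "$PHP" importInterlang.php --verbose --ignore-errors simple simple-elements.csv \
        && "$PHP" importProperties.php --verbose en en-elements-properties.csv ) || return 1
    ( cd "$install_path" && "$PHP" maintenance/update.php --quick )
}
```

Usage: reimport_test_data /srv/devrepo/w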

Customized Sidebar

* navigation
** mainpage|mainpage-description
** recentchanges-url|recentchanges
** Special:ItemByTitle/enwiki/Helium|Example data entry
** Special:CreateItem|Create a new item
** Special:NewProperty|Create a new property
* SEARCH
* TOOLBOX
* LANGUAGES

Upcoming modifications: Solr

I started to work on Solr/WikibaseSolr/Solarium. This is not working, yet. Here is the status:

Solr on dev repo

  • is disabled because the import of properties fails. Probably the xml schema has to be changed.
  • On dev, the Solarium library is not installed via the respective MediaWiki extension; I used composer to install it instead.
  • To enable it, this is what you do on wikidata-dev-9:
    • open a screen!
    • cd /opt/apache-solr-3.6.2/example
    • java -jar start.jar > /tmp/solrtest.log
    • detach from the screen session (find a better way once this is working properly)
    • edit /srv/devrepo/w/LocalSettings.php and uncomment the following lines:
// for WikibaseSolr
$wgWBSSolariumAutoloader = "/srv/devrepo/w/extensions/WikibaseSolr/vendor/solarium/solarium/library/Solarium/Autoloader.php";
require_once( "$IP/extensions/WikibaseSolr/WikibaseSolr.php" );
require_once( "$IP/extensions/WikibaseSolr/includes/SpecialSolrTest.php" );
$wgSpecialPages['SolrTest'] = 'SpecialSolrTest';
$wgWBStores['solrstore'] = 'SolrStore';
$wgWBSettings['defaultStore'] = 'solrstore';

Solr in puppet on test repo

I prepared the integration into puppet. The last state is here: https://gerrit.wikimedia.org/r/#/c/52043/

What has to be done:

  • Make WikibaseSolr know about mw:Extension:Solarium. Talk to Max Semenik! Using the Solarium extension is the approach we want to take in puppet.

How to maintain our config in puppet

General intro and FAQ: Help:Self-hosted puppetmaster

Development

  • The current puppet sandbox instances are wikidata-repotest and wikidata-client-test. They are configured to talk to each other on the level of mysql. See Help:Self-hosted puppetmaster for the setup of a fresh instance. It can make sense to test on a fresh one before submitting code to gerrit. ;)
  • These instances don't have public IPs; they can be accessed through the instance proxy like so: <instance-name>.instance-proxy.wmflabs.org/wiki, e.g. wikidata-repotest.instance-proxy.wmflabs.org. Note that you have to explicitly configure $wgServer = 'http://instance-name.instance-proxy.wmflabs.org'; for it to work. You do this in /srv/mediawiki/orig/LocalSettings.php.
  • What we develop here is not used in production, it's just for our test environment.
  • On those servers, our puppet module is in /var/lib/git/operations/puppet/modules/wikidata_singlenode. You can find three folders: manifests, templates and files, with the following content (March 14th, 2013):
.
├── files <- static files
│   ├── notitle.php <- notitle magic word, file is used by notitle extension
│   ├── simple-elements.xml <- test data: chemical elements dump for clients
│   ├── StartProfiler.php <- config file for Profiling
│   ├── wikidata.cnf <- custom mysql my.cnf to allow instances within Labs to access each other's databases, needed for LoadBalancer
│   ├── wikidata-favicon.ico <- our favicon
│   ├── Wikidata-logo-democlient.png <- our logo for the client
│   ├── Wikidata-logo-demo.png <- our logo for the apache document root
│   ├── Wikidata-logo-demorepo.png <- our logo for the repo
│   ├── wikidata-move-mainpage <- config where to move the default main page before importing the custom one
│   ├── wikidata-replication.logrotate <- logrotate config for the dispatcher script on the repo
│   ├── wikidata-repo-mainpage.xml <- the test repo's main page with explanations about Wikidata
│   └── wikidata-runJobs.logrotate <- logrotate config for runJobs on the client
├── manifests
│   └── init.pp <- our central puppet file
└── templates <- templates that are modified from time to time
    ├── wikidata-client-requires.php <- additional LocalSettings.php lines for Wikibase clients
    └── wikidata-repo-requires.php <- additional LocalSettings.php lines for Wikibase repos
  • In general, the files that you have to touch are manifests/init.pp and either templates/wikidata-repo-requires.php or templates/wikidata-client-requires.php.
  • If you want to modify the general mediawiki installation, that's in /var/lib/git/operations/puppet/modules/mediawiki_singlenode.
  • To test your modifications run puppetd -tv on the instance. You might want to remove the database and the /srv/mediawiki folder to have puppet install a new wiki from the beginning.

Submission workflow

  • git clone the puppet repo from gerrit: ssh://username@gerrit.wikimedia.org:29418/operations/puppet.git on your local machine.
  • All files and templates you submit should have a puppet disclaimer in the header saying that they cannot be modified manually or else they will be overwritten by puppet.
  • Copy the modifications you made and tested on the sandbox instances to your computer and commit them from that directory.
  • Invite Andrew Bogott (WMF) as a reviewer.

Wikidata Code Documentation

The Wikidata source code documentation is available at http://wikidata-docs.wikimedia.de . It covers the PHP and JavaScript code.

In order to generate the code documentation, check out MediaWiki and all the Wikibase extensions.

Install Doxygen. The config file used for Wikidata can be found here. Save the config file (wikidata-doxygen) on your system and adjust the following variable definitions to match your setup: OUTPUT_DIRECTORY, INPUT, STRIP_FROM_PATH, STRIP_FROM_INC_PATH, WARN_LOGFILE. Optionally also adjust PROJECT_NAME and PROJECT_BRIEF.

In order to generate the JavaScript code documentation, the JS files need to be handled as Java files due to the lack of JS support in Doxygen. This can be done using this shell script.
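The linked script is not reproduced here; the core of the trick is plausibly just copying every .js file to a .java twin before running Doxygen, roughly like this (a sketch, not the actual script):

```shell
#!/bin/sh
# Copy every .js file to a .java sibling so Doxygen (which has no
# JavaScript support) parses them as Java. Sketch of the linked helper.
js_to_java() {
    src=$1    # root of the checked-out JS sources
    find "$src" -type f -name '*.js' | while read -r f; do
        cp "$f" "${f%.js}.java"
    done
}
```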

Generate the PHP and JS code documentation by executing:

doxygen ${path_doxygen_config_file}/wikidata-doxygen

Due to renaming .js to .java, the documentation will now show them as Java files. To clean up that documentation and also to clean up local paths, run this shell script (adjust the local paths to be replaced beforehand).

Schedule this process in the cron tab on a regular basis.
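A nightly crontab entry for this might look like the following; both paths are placeholders for wherever you saved the Doxygen config and the cleanup script on your system.

```shell
# crontab fragment: regenerate the code documentation nightly at 03:00
# m h dom mon dow  command
0 3 * * *  doxygen /path/to/wikidata-doxygen && /path/to/cleanup-script.sh
```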

The code documentation is now ready to be served via the web server. Here is what your /etc/apache2/sites-enabled/docs should contain. Replace ${OUTPUT_DIRECTORY} with the output directory defined in the Doxygen config file.

<VirtualHost *:80>
        ServerName wikidata-docs.wikimedia.de
        ServerAdmin webmaster@localhost
        ServerAlias *.wikidata-docs.wikimedia.de

        DirectoryIndex index.htm index.html index.php

        DocumentRoot ${OUTPUT_DIRECTORY}
        <Directory />
                Options +Indexes FollowSymLinks
                AllowOverride All
                Order deny,allow
                Allow from all
        </Directory>

        ErrorLog /var/log/apache2/docs-error.log

        LogLevel warn

        CustomLog /var/log/apache2/docs-access.log combined
        ServerSignature Off

</VirtualHost>