etherpad.wikimedia.org

From Wikitech

https://etherpad.wikimedia.org

Note: etherpads are 100% public and open. Anyone can read them. "Obscure names" are never as obscure as you think and are NOT secure. Also the etherpad database is not suitable for any long-term storage — don't expect important data to stay there.

Hardware

Running on etherpad1004, a VM on the ganeti01.svc.eqiad.wmnet cluster.

The failover machine is etherpad2002, also a VM in the codfw ganeti cluster.

About

We built our own package dependent on our own nodejs packages. Everything is puppetized.

The database that it uses is on ... just look this up in the puppet site manifest. Cluster m1 as of this writing.

The app runs on port 9000 and requests are reverse proxied by envoy which also terminates SSL.

The EtherpadLite extension (not currently used) allows embedding it into wiki pages.

Database layout

Etherpad-lite implements a key/value store on top of an RDBMS. This is an abstraction layer, so it can work with other backends as well, but the recommended option seems to be an RDBMS (MySQL).

http://etherpad.org/doc/v1.2.1/#index_database_structure seems to be the official documentation (version dependent obviously)

Deleting pads via site admin

To request a deletion, file a security task on Phabricator.

A variety of ways exist (some are not available or do not work):

  1. Deletion through admin and a plugin. We do not have admin and users on purpose for now, so this is ruled out.
  2. Deletion through the API https://github.com/ether/etherpad-lite/wiki/HTTP-API (suggested method):
    1. Log in to the etherpad host, at the moment etherpad1004.eqiad.wmnet.
    2. Find the API key created on etherpad first start, in /var/lib/etherpad-lite/APIKEY.txt.
    3. Call the deletion API:
 curl 'localhost:9001/api/1/deletePad?apikey=<api key gotten from the previous step>&padID=<pad name as used on the URI>'
    4. If everything is OK, it should respond with {"code":0,"message":"ok","data":null}
  3. Deletion through the CLI https://github.com/ether/etherpad-lite/wiki/Getting-to-know-the-tools-in-bin. Supposedly this should work, but it doesn't.
  4. Deletion through the DB (this seems to be the only viable alternative to the API):

Suppose DELETEME is the pad id of the pad you want to remove (pad id can be taken from the url)

 delete from store where `key` like '%DELETEME%'; 

Deleting pad content with the two statements below has worked well. They toss the pad's revisions and chats, along with its pad2readonly entry (which appears to map the pad to its read-only id). This is a lot faster than the %DELETEME% query above, now that the db is so bloated.

 delete from store where `key` like 'pad:DELETEME%';
 delete from store where `key` like 'pad2readonly:DELETEME%';
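One caveat with the prefix patterns above: 'pad:DELETEME%' also matches every other pad whose name merely starts with DELETEME, and in MySQL both `_` and `%` act as wildcards inside a LIKE pattern. Below is a minimal shell sketch that prints narrower statements for review. It assumes the usual Etherpad key layout (pad:<id>, pad:<id>:revs:<n>, pad:<id>:chat:<n>, pad2readonly:<id>) and MySQL's default backslash escaping. The function name pad_delete_sql is made up for this example, and pad ids with characters outside letters, digits, `_`, `.` and `-` are refused rather than quoted.

```shell
# Hypothetical helper, not part of Etherpad or of our puppet repo.
# It only prints SQL for review; it never connects to the database.
pad_delete_sql() {
    pad_id=$1
    # Refuse empty ids and anything that would need SQL quoting
    # (quotes, backslashes, spaces, %, ...).
    case $pad_id in
        ''|*[!A-Za-z0-9_.-]*)
            echo "refusing unusual pad id: $pad_id" >&2
            return 1
            ;;
    esac
    # "_" matches any single character in a LIKE pattern, so escape it.
    like_id=$(printf '%s' "$pad_id" | sed 's/_/\\_/g')
    # Exact match for the pad itself, prefix match (with the ":" separator)
    # for its revisions and chats, exact match for the read-only mapping.
    cat <<EOF
delete from store where \`key\` = 'pad:$pad_id';
delete from store where \`key\` like 'pad:$like_id:%';
delete from store where \`key\` = 'pad2readonly:$pad_id';
EOF
}
```

Running pad_delete_sql my_pad prints three statements that can be checked by eye and then pasted into the MySQL client.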

Restoring a pad to a previous revision

Since version 1.2.13 or API >=1.2.11 (we are on 1.2.15 as of 20220325), it is possible to use a "restoreRevision(padId, rev)" function. This will create a new revision that is like a previous revision including the author data. If for some reason a pad gets corrupted or vandalized it's possible to go back. Example:

curl 'localhost:9001/api/1.2.15/restoreRevision?apikey=<API KEY>&padID=<pad name>&rev=<old revision id>'

Get the <API KEY> from /var/lib/etherpad-lite/APIKEY.txt, replace <pad name> with the name of the pad and <old revision id> with the rev ID number you want to restore from. You can see those as part of the URL when you use the timeslider to go back in history.

If it worked the API will just respond "ok" and you will have a new revision with data from the old revision you restored from.
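Both API calls on this page (deletePad and restoreRevision) follow the same pattern: read the key, build a URL, check the JSON reply. A rough shell sketch of that pattern follows. The helper names etherpad_api_url and etherpad_api_ok are hypothetical, the port and key path are copied from the examples above, and no URL encoding is done, so it only suits pad names made of plain URL-safe characters.

```shell
# Sketch only; the helper names are made up for this page.
API_BASE=${API_BASE:-http://localhost:9001/api}
APIKEY_FILE=${APIKEY_FILE:-/var/lib/etherpad-lite/APIKEY.txt}

# etherpad_api_url <api version> <function> [name=value ...]
# Prints the request URL, with the API key read from APIKEY_FILE.
etherpad_api_url() {
    version=$1
    func=$2
    shift 2
    url="$API_BASE/$version/$func?apikey=$(cat "$APIKEY_FILE")"
    for param in "$@"; do
        url="$url&$param"
    done
    printf '%s\n' "$url"
}

# etherpad_api_ok <json response>
# Succeeds only if the response reports "code":0.
etherpad_api_ok() {
    printf '%s' "$1" | grep -q '"code" *: *0[,}]'
}

# Usage (on the etherpad host):
#   response=$(curl -s "$(etherpad_api_url 1.2.15 restoreRevision padID=mypad rev=42)")
#   etherpad_api_ok "$response" || echo "API call failed: $response" >&2
```

Checking the code field rather than eyeballing the output makes it harder to miss an error reply such as a wrong API key or a non-existent pad.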

How to list all pads

Two different plugins existed at the time of investigation: one did not install correctly, and the other was of poor quality.

MediaWiki extension

Yes, don't we want to use that and embed in a wiki?

Extension:EtherpadLite

Converting etherpad content into wikitext

  • Small Python script to convert Etherpads into wiki pages - please help turn this into a Toolforge tool!

Maintenance work

Building new debs whenever there are new releases or security patches is the main task here. However, since this uses MariaDB misc, also have a look at MariaDB/misc.

Upgrading Etherpad version

Etherpad is installed as a Debian package called "etherpad-lite". The puppet role for Etherpad simply installs this package from our own APT repository.

First, identify which server is the current Etherpad server by looking at manifests/site.pp in the operations/puppet git repository. (as of 2024-03 this is etherpad1004.eqiad.wmnet).

On that server run sudo apt-cache policy etherpad-lite or dpkg -l | grep etherpad to identify the currently installed version.

Normally, Debian packages are built on the "build" servers and require no access to the internet (which is why our build servers have internal IPs). In fact, our tool of choice (pbuilder) even goes to some lengths to ensure that the build environment doesn't have internet access. See https://github.com/wikimedia/puppet/tree/production/modules/package_builder#networking for an explanation of the mechanism behind this.

Build node

Of course, with etherpad fetching npm modules at build time, the above won't work. Instead of the standard build host, we have been using a WMCS machine for this, so ssh into packager-etherpad01.packaging.eqiad1.wikimedia.cloud.

Step #1, fetch stuff

The next step is to clone the operations/debs/etherpad-lite repository, download the upstream release tarball and import it.

  git clone https://gerrit.wikimedia.org/r/operations/debs/etherpad-lite
  curl https://codeload.github.com/ether/etherpad-lite/tar.gz/1.8.6 -o 1.8.6.tar.gz
  cd etherpad-lite
  git checkout upstream
  git checkout master
  gbp import-orig ../1.8.6.tar.gz
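Since the version number appears several times in the commands above, it can help to keep it in one place. This is a sketch of the same commands with the version as a parameter. The function names upstream_tarball_url and fetch_upstream are hypothetical, and the curl flags -f and -L are an addition so that an HTTP error is not silently saved as the tarball.

```shell
# Sketch only; the function names are made up for this page.

# Prints the codeload URL for an upstream release, as used above.
upstream_tarball_url() {
    printf 'https://codeload.github.com/ether/etherpad-lite/tar.gz/%s\n' "$1"
}

# fetch_upstream <upstream version>
# Clones the packaging repo, downloads the tarball and imports it.
fetch_upstream() {
    version=$1
    git clone https://gerrit.wikimedia.org/r/operations/debs/etherpad-lite || return 1
    # -f: fail on HTTP errors, -L: follow redirects.
    curl -fL "$(upstream_tarball_url "$version")" -o "$version.tar.gz" || return 1
    (
        cd etherpad-lite || exit 1
        # Check out both branches once so that they exist locally for gbp.
        git checkout upstream &&
        git checkout master &&
        gbp import-orig "../$version.tar.gz"
    )
}

# Usage: fetch_upstream 1.8.6
```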

Step #2, refresh patches

First, push all quilt patches:

  QUILT_PATCHES=debian/patches quilt push -a

If you are lucky, they will all apply cleanly; in that case, proceed to the step "bump debian changelog".

If they did not apply cleanly, you will need to refresh them. The push command above will probably have pushed some patches but stopped right before the problematic one. At that point what you need is:

  • To force push the patch first to get as many hunks applied as possible
 QUILT_PATCHES=debian/patches quilt push -f
  • To then figure out which hunks of the patch did not get applied (quilt will tell you). Then apply manually with an editor and save
  • Then refresh the patch
  QUILT_PATCHES=debian/patches quilt refresh
  • Repeat the above until all patches have been applied.

Step #4, bump debian changelog

Next, run dch -i to edit debian/changelog. Your name and email address should have been added automatically. Edit the version string in the first line; for example, if this is the first package for version 1.8.5, set it to "(1.8.5-1)". Replace UNRELEASED with the actual release name, for example 'buster'. Edit the changelog entry below it to say something like "Bump to version 1.8.5". Write and quit the spawned editor to save the changes, then git commit the change locally.
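A quick way to double-check the result before building is to read back the first line of the changelog. This is a small sketch using sed. The function names changelog_version and changelog_dist are hypothetical, and the parsing assumes the standard first-line format "package (version) distribution; urgency=...".

```shell
# Sketch only; the function names are made up for this page.

# Prints the version from the first changelog line,
# e.g. "etherpad-lite (1.8.5-1) buster; urgency=medium" gives 1.8.5-1
changelog_version() {
    sed -n '1s/^[^(]*(\([^)]*\)).*/\1/p' "${1:-debian/changelog}"
}

# Prints the distribution from the same line (the word after the version).
changelog_dist() {
    sed -n '1s/^[^)]*) *\([^;]*\);.*/\1/p' "${1:-debian/changelog}"
}

# Usage, from the top of the packaging repo:
#   changelog_version     # should print the version you just set
#   [ "$(changelog_dist)" != UNRELEASED ] || echo "still UNRELEASED" >&2
```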

Step #5, build

First, make sure you have popped all quilt patches:

  QUILT_PATCHES=debian/patches quilt pop -a

Create a file .pbuilderrc in your home dir. Add the following content to it (SECURITY_UPDATES=no is needed to make sure webproxy is not used on WMCS, see T316421#9518250) :

  USENETWORK=yes
  BUILD_HOME=$BUILDDIR
  SECURITY_UPDATES=no
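If you prefer to create the file from the shell, a sketch along these lines should do. The function name write_pbuilderrc is made up, and refusing to overwrite an existing file is a precaution added here, not something the procedure requires.

```shell
# Sketch only; the function name is made up for this page.
# write_pbuilderrc [target file]   (default: ~/.pbuilderrc)
write_pbuilderrc() {
    target=${1:-$HOME/.pbuilderrc}
    if [ -e "$target" ]; then
        echo "$target already exists, not touching it" >&2
        return 1
    fi
    # The quoted 'EOF' keeps $BUILDDIR literal in the file, so it is
    # not expanded (to an empty string) by the current shell.
    cat > "$target" <<'EOF'
USENETWORK=yes
BUILD_HOME=$BUILDDIR
SECURITY_UPDATES=no
EOF
}

# Usage: write_pbuilderrc
```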

Then issue the gbp buildpackage command

  gbp buildpackage --git-pbuilder --git-no-pbuilder-autoconf --git-dist=bullseye --git-arch=amd64

You might get an error about uncommitted changes in the source tree. In that case, make sure you committed the debian/changelog change from the previous step; you might also have to "git add .pc", a dot directory created by quilt.

In case of "aborting due to unexpected upstream changes" during the build, check the difference between the upstream files and the local files. If the diff looks harmless (newlines, for example), delete ../etherpad-lite_1.8.6.orig.tar.gz and let gbp recreate the .tar.gz on the next gbp buildpackage run. If there were any actual local changes, they need to be fixed in the quilt step by updating the 2 patches under debian/patches or introducing a new patch.

Step #6, copy files to APT server

Assuming everything went according to plan, you should now have a package in /var/cache/pbuilder/result/bullseye-amd64/. Bundle all the resulting files (source tarball, .deb, .dsc, .changes) into a single .tar.gz file, then copy it (via scp -3 perhaps) to the current APT repo server. Identify the correct server by looking for aptrepo_server in ./hieradata/common.yaml in the operations/puppet repo. Example: scp -3 packager02.packaging.eqiad1.wikimedia.cloud:etherpad.tar.gz apt1001.wikimedia.org:etherpad.tar.gz
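The bundling step can be sketched as a small shell function. The name bundle_results is hypothetical, and the etherpad-lite* glob assumes that all build results in the directory start with the package name.

```shell
# Sketch only; the function name is made up for this page.
# bundle_results <result dir> <output .tar.gz>
bundle_results() {
    result_dir=$1
    out=$2
    # Fail early if the build left no .changes file behind.
    ls "$result_dir"/etherpad-lite_*.changes >/dev/null 2>&1 || {
        echo "no etherpad-lite .changes file in $result_dir" >&2
        return 1
    }
    # Archive from inside the directory so the tarball contains bare
    # file names; reprepro later expects them next to the .changes file.
    ( cd "$result_dir" && tar -czf - etherpad-lite* ) > "$out"
}

# Usage:
#   bundle_results /var/cache/pbuilder/result/bullseye-amd64 ~/etherpad.tar.gz
```

Note that scp treats an argument without a colon as a local file name, so both the source and the destination of the scp -3 copy need the host:path form.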

Step #7, import package into APT repo

On the APT repo host, use reprepro to import the package by pointing at the file ending in .changes.

    export REPREPRO_BASE_DIR=/srv/wikimedia
    export GNUPGHOME=/root/.gnupg
    sudo -E reprepro -C main include buster-wikimedia etherpad-lite_1.8.6-1.1_amd64.changes

See the reprepro page for more details on how to setup a basedir and GNUPG home to make that work.

Run sudo -E reprepro ls etherpad-lite to confirm the new version has been imported.

Step #8, install new package version on Etherpad host

Switch to the etherpad host itself and run sudo apt-get update and sudo apt-get install etherpad-lite (optionally you can first add -s to simulate an install without actually doing it).

Confirm things are still working. Done.
