etherpad.wikimedia.org
https://etherpad.wikimedia.org
- Note: etherpads are 100% public and open. Anyone can read them. "Obscure names" are never as obscure as you think and are NOT secure. Also the etherpad database is not suitable for any long-term storage — don't expect important data to stay there.
Hardware
Running on etherpad1004, a VM on the ganeti01.svc.eqiad.wmnet cluster.
The failover machine is etherpad2002, also a VM in the codfw ganeti cluster.
About
We built our own package dependent on our own nodejs packages. Everything is puppetized.
The database that it uses is on ... just look this up in the puppet site manifest. Cluster m1 as of this writing.
The app runs on port 9000 and requests are reverse proxied by envoy which also terminates SSL.
The EtherpadLite extension (not currently used) allows embedding it into wiki pages.
Database layout
Etherpad-lite implements a key/value store on top of an RDBMS. This is an abstraction layer, so it can work with other backends as well, but the recommended option seems to be an RDBMS (MySQL).
http://etherpad.org/doc/v1.2.1/#index_database_structure seems to be the official documentation (version dependent obviously)
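Because everything lives in a single key/value table, inspecting one pad's data comes down to LIKE queries on the key column. The following is a minimal sketch, not an official tool: it assumes the `store` table, the `key` column and the `pad:` / `pad2readonly:` prefixes that appear in the deletion queries further down this page, and the function names are hypothetical. It escapes LIKE wildcards so that a pad name containing `_` or `%` is matched literally.

```shell
# Escape LIKE wildcards (\, %, _) and single quotes in a pad ID so that it is
# matched literally inside a MySQL string literal.
like_escape() {
    printf '%s' "$1" | sed -e 's/[\\%_]/\\&/g' -e "s/'/''/g"
}

# Print a read-only query that previews the keys stored for one pad.
# Assumption: the key prefixes are the ones used by the deletion queries on
# this page; other prefixes may exist.
preview_pad_keys_sql() {
    id=$(like_escape "$1")
    printf "select \`key\` from store where \`key\` like 'pad:%s%%' or \`key\` like 'pad2readonly:%s%%';\n" "$id" "$id"
}
```

Running `preview_pad_keys_sql my_pad` prints a SELECT statement that can be pasted into a MySQL session. Note that a prefix match also returns keys of pads whose names merely start with the same string.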
Deleting pads via site admin
- To request a deletion, file a security task on Phabricator.
A variety of ways exist (some are not available/do not work):
- Deletion through admin and a plugin (we deliberately do not have admin and users for now), so this is ruled out
- Deletion through the API https://github.com/ether/etherpad-lite/wiki/HTTP-API (suggested method):
- Log in to the etherpad host, at the moment
etherpad1004.eqiad.wmnet
- Find the API key created on etherpad's first start, in
/var/lib/etherpad-lite/APIKEY.txt
- Call the deletion API:
curl 'localhost:9001/api/1/deletePad?apikey=<api key gotten from the previous step>&padID=<pad name as used on the URI>'
- If everything is ok, it should respond with
{"code":0,"message":"ok","data":null}
- Deletion through the CLI https://github.com/ether/etherpad-lite/wiki/Getting-to-know-the-tools-in-bin. Supposedly this should work but it doesn't
- Deletion through the DB (this seems to be the only alternative viable option to the API)
Suppose DELETEME is the pad id of the pad you want to remove (pad id can be taken from the url)
delete from store where `key` like '%DELETEME%';
Note that I had good luck deleting pad content via the below, which tosses revisions, chats, and I don't know exactly what the pad2readonly bit is. This is a lot faster than the %DELETEME% query above, now that the db is so bloated.
delete from store where `key` like 'pad:DELETEME%'; delete from store where `key` like 'pad2readonly:DELETEME%';
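The API-based deletion described above can be wrapped in a small helper. This is a sketch under stated assumptions: the host, port and API version are copied from the curl example on this page, the key path is the one named in the steps, and the function names `api_ok` and `delete_pad` are hypothetical. It lets curl URL-encode the pad name and checks the response code instead of relying on reading the JSON by eye.

```shell
# Assumptions: endpoint and key location are taken from the steps above and
# can be overridden through the environment.
ETHERPAD_API="${ETHERPAD_API:-http://localhost:9001/api/1}"
APIKEY_FILE="${APIKEY_FILE:-/var/lib/etherpad-lite/APIKEY.txt}"

# Succeed only if the JSON response reports code 0.
api_ok() {
    printf '%s' "$1" | grep -q '"code":[[:space:]]*0[,}]'
}

# Delete one pad. -G turns the --data-urlencode pairs into a query string,
# so pad names with spaces or special characters are encoded correctly.
delete_pad() {
    response=$(curl -sS -G "$ETHERPAD_API/deletePad" \
        --data-urlencode "apikey=$(cat "$APIKEY_FILE")" \
        --data-urlencode "padID=$1") || return 1
    printf '%s\n' "$response"
    api_ok "$response"
}
```

Usage would be `delete_pad DELETEME && echo deleted`, run on the etherpad host by a user who can read the API key file.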
Restoring a pad to a previous revision
Since version 1.2.13 or API >=1.2.11 (we are on 1.2.15 as of 20220325), it is possible to use a "restoreRevision(padId, rev)" function. This will create a new revision that is like a previous revision including the author data. If for some reason a pad gets corrupted or vandalized it's possible to go back. Example:
curl 'localhost:9001/api/1.2.15/restoreRevision?apikey=<API KEY>&padID=<pad name>&rev=<old revision id>'
Get the <API KEY> from /var/lib/etherpad-lite/APIKEY.txt, replace <pad name> with the name of the pad and <old revision id> with the rev ID number you want to restore from. You can see those as part of the URL when you use the timeslider to go back in history.
If it worked the API will just respond "ok" and you will have a new revision with data from the old revision you restored from.
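As a sketch of how the pieces fit together, the helper below pulls the revision number out of a timeslider URL and passes it to restoreRevision. The URL shape (ending in `/timeslider#<rev>`) is an assumption about how the timeslider exposes the revision, and both function names are hypothetical; the endpoint and key path are the ones from the example above.

```shell
# Extract the revision number from a timeslider URL.
# Assumption: the URL ends in "/timeslider#<rev>".
rev_from_timeslider_url() {
    case "$1" in
        *'/timeslider#'*) rev=${1##*\#} ;;
        *) return 1 ;;
    esac
    # Reject anything that is not a plain non-negative integer.
    case "$rev" in
        ''|*[!0-9]*) return 1 ;;
    esac
    printf '%s\n' "$rev"
}

# usage: restore_revision <pad name> <old revision id>
restore_revision() {
    curl -sS -G "http://localhost:9001/api/1.2.15/restoreRevision" \
        --data-urlencode "apikey=$(cat /var/lib/etherpad-lite/APIKEY.txt)" \
        --data-urlencode "padID=$1" \
        --data-urlencode "rev=$2"
}
```

For example, `restore_revision example "$(rev_from_timeslider_url "$url")"` would restore the pad named example to the revision shown in `$url`.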
How to list all pads
Two different plugins existed at the time of investigation: one did not install correctly, and the other was not of any decent quality.
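A possible alternative to plugins is the upstream HTTP API's listAllPads call. The sketch below assumes that the deployed version exposes it, that the API is reachable as in the other curl examples on this page, and that the response carries the IDs in a `padIDs` array; it has not been verified against this installation, and the function names are hypothetical. The parser is deliberately naive and assumes pad IDs contain no commas or double quotes.

```shell
# Fetch the raw JSON listing (assumption: listAllPads is available here).
list_all_pads_raw() {
    curl -sS -G "http://localhost:9001/api/1.2.15/listAllPads" \
        --data-urlencode "apikey=$(cat /var/lib/etherpad-lite/APIKEY.txt)"
}

# Print one pad ID per line from a listAllPads response passed as $1.
pad_ids_from_response() {
    printf '%s' "$1" |
        sed -n 's/.*"padIDs":\[\(.*\)].*/\1/p' |
        tr ',' '\n' |
        sed -e 's/^"//' -e 's/"$//'
}
```

Something like `pad_ids_from_response "$(list_all_pads_raw)" | wc -l` would then give a rough pad count. On a large database the call may be slow, so treat it with care.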
MediaWiki extension
Yes, don't we want to use that and embed in a wiki?
Converting etherpad content into wikitext
- Small Python script to convert Etherpads into wiki pages - please help turn this into a Toolforge tool!
Maintenance work
Building new debs whenever there are new releases/security patches is the main one here. However since this uses MariaDB misc, also have a look at MariaDB/misc
Upgrading Etherpad version
Etherpad is installed as a Debian package called "etherpad-lite". The puppet role for Etherpad simply installs this package from our own APT repository.
First, identify which server is the current Etherpad server by looking at manifests/site.pp in the operations/puppet git repository. (as of 2024-03 this is etherpad1004.eqiad.wmnet).
On that server run sudo apt-cache policy etherpad-lite or dpkg -l | grep etherpad to identify the currently installed version.
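To compare the installed version with what the APT repository currently offers, the `apt-cache policy` output can be parsed. A minimal sketch with hypothetical function names, assuming the usual "Installed:" and "Candidate:" lines in that output:

```shell
# Print the value of one field ("Installed" or "Candidate") from
# `apt-cache policy <package>` output read on stdin.
policy_field() {
    awk -v field="$1:" '$1 == field { print $2; exit }'
}

# Succeed if the installed version differs from the candidate version,
# i.e. an upgrade (or downgrade) would happen on the next install.
upgrade_pending() {
    policy=$(apt-cache policy etherpad-lite)
    installed=$(printf '%s\n' "$policy" | policy_field Installed)
    candidate=$(printf '%s\n' "$policy" | policy_field Candidate)
    echo "installed=$installed candidate=$candidate"
    [ "$installed" != "$candidate" ]
}
```

Running `upgrade_pending && echo "new package available"` after `sudo apt-get update` shows whether a newly imported package is visible to the host.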
Normally, Debian packages are built on the "build" servers and require no access to the internet (which is why our build servers sport internal IPs). In fact our tool of choice (pbuilder) even goes to great lengths to ensure that the build environment doesn't have internet access. See https://github.com/wikimedia/puppet/tree/production/modules/package_builder#networking for an explanation of the mechanism behind this.
Build node
Of course, with etherpad fetching npm modules at build time, the above won't work. Instead of using the standard build host, we've been using a WMCS machine for this, so ssh into packager-etherpad01.packaging.eqiad1.wikimedia.cloud.
Step #1, fetch stuff
The next step is to clone the packaging repository and import the upstream tarball (1.8.6 in this example):
git clone https://gerrit.wikimedia.org/r/operations/debs/etherpad-lite
curl https://codeload.github.com/ether/etherpad-lite/tar.gz/1.8.6 -o 1.8.6.tar.gz
cd etherpad-lite
git checkout upstream
git checkout master
gbp import-orig ../1.8.6.tar.gz
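Since the version number appears in several places, the fetch step can be parameterized. This is only a sketch of the same commands with the version pulled out into an argument; the function names are hypothetical, and the curl flags `-fSL` (fail on HTTP errors, show errors, follow redirects) are an addition not present in the original commands.

```shell
# Build the upstream tarball URL for a given Etherpad release.
upstream_tarball_url() {
    printf 'https://codeload.github.com/ether/etherpad-lite/tar.gz/%s\n' "$1"
}

# usage: import_upstream <version>
# Clones the packaging repo, downloads the release and imports it with gbp.
# Stops at the first failing command; leaves the shell inside the clone.
import_upstream() {
    version="$1"
    git clone https://gerrit.wikimedia.org/r/operations/debs/etherpad-lite &&
    curl -fSL "$(upstream_tarball_url "$version")" -o "$version.tar.gz" &&
    cd etherpad-lite &&
    git checkout upstream &&
    git checkout master &&
    gbp import-orig "../$version.tar.gz"
}
```

For example, `import_upstream 1.8.6` reproduces the commands listed above.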
Step #2, refresh patches
First, push all quilt patches
QUILT_PATCHES=debian/patches quilt push -a
If you are lucky, they will all apply cleanly; in that case proceed to the step "Bump debian changelog". Otherwise, you'll need to refresh them.
The push command above will probably have pushed some patches but stopped right before the problematic one. At that point what you need is:
- To force push the patch first to get as many hunks applied as possible
QUILT_PATCHES=debian/patches quilt push -f
- To then figure out which hunks of the patch did not get applied (quilt will tell you). Then apply manually with an editor and save
- Then refresh the patch
QUILT_PATCHES=debian/patches quilt refresh
- Repeat the above until all patches have been applied.
Step #4, bump debian changelog
Next, run dch -i to edit debian/changelog. Your name and email address should have been added automatically. Edit the version string in the first line. For example, if this is the first package for version 1.8.5, set it to "(1.8.5-1)". Replace UNRELEASED with the actual release name, for example 'buster'. Edit the entry text below with something like "Bump to version 1.8.5". Write/quit the spawned editor to save changes, then git commit the change locally.
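A forgotten UNRELEASED or a wrong version string is easy to miss, so a quick check of the top changelog line before building can help. The sketch below relies only on the standard first-line layout of debian/changelog (`package (version) distribution; urgency=...`); the function names are hypothetical.

```shell
# Print "<version> <distribution>" from the first changelog line on stdin.
changelog_head() {
    sed -n '1s/^[^ ]* (\([^)]*\)) \([^;]*\);.*/\1 \2/p'
}

# usage: changelog_ready <expected version> < debian/changelog
# Succeeds only if the top entry has the expected version and its
# distribution is no longer UNRELEASED.
changelog_ready() {
    entry=$(changelog_head)
    [ "${entry%% *}" = "$1" ] && [ "${entry#* }" != UNRELEASED ]
}
```

For example, `changelog_ready 1.8.5-1 < debian/changelog || echo "fix the changelog first"`.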
Step #5, build
First, make sure you've popped all quilt patches:
QUILT_PATCHES=debian/patches quilt pop -a
Create a file .pbuilderrc in your home dir and add the following content to it (SECURITY_UPDATES=no is needed to make sure webproxy is not used on WMCS, see T316421#9518250):
USENETWORK=yes
BUILD_HOME=$BUILDDIR
SECURITY_UPDATES=no
Then issue the gbp buildpackage command
gbp buildpackage --git-pbuilder --git-no-pbuilder-autoconf --git-dist=bullseye --git-arch=amd64
You might get an error about uncommitted changes in the source tree. In that case make sure you have git committed your change to the changelog file above. You might also have to "git add .pc", a dot directory created by the quilt commands.
In case of "aborting due to unexpected upstream changes" during the build, check the difference between the upstream files and the local files. If the diff looks fine (like newlines), delete ../etherpad-lite_1.8.6.orig.tar.gz and let gbp recreate the .tar.gz on the next gbp buildpackage run. If there were any actual local changes, they would need to be fixed in the quilt step by updating the 2 patches under debian/patches or introducing a new patch.
Step #6, copy files to APT server
Assuming everything went according to plan, you should now have a package in /var/cache/pbuilder/result/bullseye-amd64/. Tar all the files (source, deb, .dsc, .changes) into a single .tar.gz file. Now that the package has built successfully, copy (via scp -3 perhaps) the resulting files over to the current APT repo server. Identify the correct server by looking for aptrepo_server in ./hieradata/common.yaml in the operations/puppet repo. Example:
scp -3 packager02.packaging.eqiad1.wikimedia.cloud:etherpad.tar.gz apt1001.wikimedia.org:
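The bundling step can be sketched as a small function. It assumes that all build artifacts in the result directory start with `etherpad-lite_`, which is an assumption about the file naming rather than something stated on this page, and the function name is hypothetical.

```shell
# usage: bundle_results <pbuilder result dir> <output tarball>
# Bundles every etherpad-lite_* artifact (deb, dsc, changes, source tarballs)
# into one gzip-compressed tarball with paths relative to the result dir.
bundle_results() {
    result_dir="$1"
    out="$2"
    if ! ls "$result_dir"/etherpad-lite_* >/dev/null 2>&1; then
        echo "no etherpad-lite_* files in $result_dir" >&2
        return 1
    fi
    ( cd "$result_dir" && tar -czf - etherpad-lite_* ) > "$out"
}
```

For example, `bundle_results /var/cache/pbuilder/result/bullseye-amd64 ~/etherpad.tar.gz` produces the file used in the scp example. Listing it with `tar -tzf ~/etherpad.tar.gz` before copying confirms that the .changes file is included.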
Step #7, import package into APT repo
On the APT repo host, use reprepro to import the package by pointing at the file ending in .changes.
export REPREPRO_BASE_DIR=/srv/wikimedia
export GNUPGHOME=/root/.gnupg
sudo -E reprepro -C main include buster-wikimedia etherpad-lite_1.8.6-1.1_amd64.changes
See the reprepro page for more details on how to setup a basedir and GNUPG home to make that work.
Run sudo -E reprepro ls etherpad-lite to confirm the new version has been imported.
Step #8, install new package version on Etherpad host
Switch to the etherpad host itself and run sudo apt-get update and then sudo apt-get install etherpad-lite (optionally you can first add -s to simulate an install without actually doing it).
Confirm things are still working. Done.
See also
- http://etherpad.wikimedia.org
- m:Etherpad
- mw:Etherpad Lite
- D'Angelo, Gabriele; Iorio, Angelo Di; Zacchiroli, Stefano (2018-11-03). "Spacetime Characterization of Real-Time Collaborative Editing". Proceedings of the ACM on Human-Computer Interaction. 2. ("We [...] studied the full editing histories of about 14 000 textual documents (or pads, in EtherPad terminology) from http://etherpad.wikimedia.org/, which is one of the most popular public instances of Etherpad, hosted by the Wikimedia Foundation.")