Parsoid
Revision as of 22:15, 4 November 2013
Parsoid is a service that converts between wikitext and HTML. The HTML contains additional metadata that allows it to be converted back ("round-tripped") to wikitext. VisualEditor fetches the HTML for a given page from Parsoid, edits it, then sends the modified HTML back to Parsoid, which converts it to wikitext. Parsoid is a stateless HTTP server running on port 8000.
Monitoring
- The Parsoid eqiad cluster in Ganglia lists only the worker machines. The Varnish hosts are cp1045 and cp1058.
- Nagios has service checks for HTTP on port 8000 on both the individual backends and on the LVS service IP, and on port 80 on cp1045 and cp1058 and their service IP.
- pybal does health checks on all backends every second, and depools boxes that are down as long as the percentage of depooled boxes does not exceed 50%. To see these health checks and depools/repools happen in real time, run
ssh parsoid.svc.eqiad.wmnet
(this will drop you into either lvs1003 or lvs1006, depending on which is active), then
tail -f /var/log/pybal.log | grep parsoid
- pybal also manages the Varnish hosts in the same way; they're at
parsoidcache.svc.eqiad.wmnet
- There is very rudimentary logging in /var/lib/parsoid/nohup.out on each Parsoid node. This log is truncated on each restart.
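The 50% depool limit described above can be expressed as a small check. This is a sketch, not pybal's code: it assumes the limit is evaluated on the number of depooled backends after one more is depooled, and that "does not exceed 50%" allows exactly half of the pool to be out. The function name may_depool is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the pybal depool rule (not pybal's implementation).
# Assumption: a backend may be depooled only if, afterwards, at most 50% of
# the pool is depooled.
# Usage: may_depool TOTAL_BACKENDS CURRENTLY_DEPOOLED
# Returns 0 if one more backend may be depooled, 1 otherwise.
may_depool() {
  local total="$1" depooled="$2"
  # (depooled + 1) / total <= 1/2, rewritten in integer arithmetic
  (( (depooled + 1) * 2 <= total ))
}
```

With the 24 backends wtp1001-1024, may_depool 24 11 succeeds (12 of 24 would be depooled, exactly 50%), while may_depool 24 12 fails.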
When something goes wrong
Roan and Gabriel know the most about the Parsoid infrastructure. If there are issues you can't solve, send them an email or, if urgent, call them.
Reverting a Parsoid deployment
Code
ssh tin
cd /srv/deployment/parsoid/Parsoid
git deploy revert  # pick the last good deployed version
Config and modules
ssh tin
cd /srv/deployment/parsoid/config
git deploy revert  # pick the last good deployed version
If git deploy revert fails:
git deploy start
git reset --hard <desired changeset>
git deploy --force sync
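The manual fallback for a failed revert can be wrapped in a function that validates the changeset before a deployment is started. This is a hypothetical helper (revert_to_changeset is not a git-deploy command); besides git deploy start/abort/--force sync, it relies only on git rev-parse --verify and git reset --hard. Run it from inside the checkout, e.g. /srv/deployment/parsoid/Parsoid.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the "git deploy revert failed" fallback.
# Run from inside the deployment checkout. The body is a subshell, so
# "exit" only leaves the function, not the calling shell.
revert_to_changeset() (
  # Resolve the argument to a commit first, so a mistyped hash fails
  # before "git deploy start" is run.
  commit=$(git rev-parse --verify --quiet "$1^{commit}") || {
    echo "unknown changeset: $1" >&2
    exit 1
  }
  git deploy start || exit 1
  if ! git reset --hard "$commit"; then
    # Do not leave a half-started deployment behind.
    git deploy abort
    exit 1
  fi
  git deploy --force sync
)
```

Usage: revert_to_changeset <desired changeset>, with the same placeholder as in the manual steps.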
Deploying changes
Parsoid and its configuration are deployed (separately) using git-deploy. Doing deployments with git-deploy is very easy. You run git deploy start, make whichever changes you need to make to the git clone (such as pulling, changing branches, committing live hacks, etc.), then run git deploy sync. The sync command pushes the new state to all backends and restarts them.
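The start, change, sync cycle can be wrapped in a shell function so that a failed change step does not leave a half-started deployment behind. This is a hypothetical helper (deploy_changes is not part of git-deploy): it runs whatever command follows the directory argument as the change step, and if that command fails it runs git deploy abort, the same command used to abort a deployment between start and sync.

```shell
#!/usr/bin/env bash
# Hypothetical helper: one git-deploy cycle in the given checkout.
# Usage: deploy_changes DIR CHANGE_COMMAND [ARGS...]
# The body is a subshell, so "cd" and "exit" do not affect the caller.
deploy_changes() (
  cd "$1" || exit 1
  shift
  git deploy start || exit 1
  # Run the change step (e.g. "git pull"); abort the deployment on failure.
  if ! "$@"; then
    echo "change step failed; running git deploy abort" >&2
    git deploy abort
    exit 1
  fi
  git deploy sync
)
```

For example, deploy_changes /srv/deployment/parsoid/Parsoid git pull performs the same steps as the "Deploying the latest version of Parsoid" example.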
Pre-deploy checks
- Perform manual VisualEditor editing tests, including with non-ASCII content, to catch encoding issues
Deploying the latest version of Parsoid
catrope@fenari$ ssh tin
catrope@tin$ cd /srv/deployment/parsoid/Parsoid
catrope@tin$ git deploy start
catrope@tin$ git pull
catrope@tin$ git deploy sync
Changing the Parsoid configuration
catrope@fenari$ ssh tin
catrope@tin$ cd /srv/deployment/parsoid/config
catrope@tin$ git deploy start
catrope@tin$ vim localsettings.js
[make your changes]
catrope@tin$ git commit -a
catrope@tin$ git deploy sync
Post-deploy checks
- Test VisualEditor editing on enwiki and on non-Latin-script wikis
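Alongside the manual tests, a quick automated check can catch one class of encoding breakage: Parsoid output that is not valid UTF-8 at all. This is a hypothetical helper (is_valid_utf8 is an illustrative name) built on iconv, which exits with a non-zero status when it meets an invalid input sequence. It will not detect mojibake that is still valid UTF-8, so it complements rather than replaces editing tests on non-Latin wikis.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the file is valid UTF-8.
# iconv stops with a non-zero exit status on an invalid byte sequence.
is_valid_utf8() {
  iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Example (assumption: run from a host that can reach Parsoid; the prefix,
# page and oldid are placeholders):
#   curl -s -o page.html "http://parsoid.svc.eqiad.wmnet:8000/<prefix>/<page>?oldid=<oldid>"
#   is_valid_utf8 page.html && echo "page.html is valid UTF-8"
```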
Misc stuff
- Restart parsoid hosts via salt, in batches of 5:
salt -b 5 -G 'deployment_target:parsoid' parsoid.restart_parsoid parsoid
- To abort a deployment after running git deploy start but before git deploy sync, run git deploy abort.
- There is a lock file preventing multiple deployments on the same code base from being active at the same time. If git deploy start complains about this lock, you can run git deploy abort to make it go away (only if you know this isn't a legitimate warning due to someone else actively deploying).
- If the sync step complains you didn't change anything, you can run git deploy --force sync (note the order of arguments!) to make it sync anyway.
- To change which hosts are pooled or change their weights, edit /home/wikipedia/common/docroot/noc/pybal/eqiad/parsoid as root on fenari.
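For the salt restart above, -b 5 runs the command on at most five minions at a time, so no more than five of the 24 backends (wtp1001-1024) are restarting at once. If restarting hosts fail pybal's health checks, that is at most 5 of 24 depooled, well under the 50% limit described under Monitoring. The function below only illustrates how the hosts divide into groups of five (print_batches is a made-up name); it does not call salt, and depending on the salt version, batches may be refilled as hosts finish rather than run in fixed groups.

```shell
#!/usr/bin/env bash
# Illustration only: print hosts in groups of SIZE, one group per line.
# Usage: print_batches SIZE HOST...
print_batches() {
  local size="$1"; shift
  local count=0 line="" host
  for host in "$@"; do
    line+="${line:+ }$host"   # add a separating space except before the first host
    count=$((count + 1))
    if (( count % size == 0 )); then
      echo "$line"
      line=""
    fi
  done
  # Print the final, partial group, if any.
  if [ -n "$line" ]; then
    echo "$line"
  fi
}

print_batches 5 wtp{1001..1024}
```

This prints five lines: four groups of five hosts and a final group of four (wtp1021 to wtp1024).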
Data flow
Parsoid runs entirely on an internal subnet, so requests to it are proxied through the ve-parsoid API module. This module is implemented in extensions/VisualEditor/ApiVisualEditor.php and is invoked with a POST request to /w/api.php?action=ve-parsoid. The API module then sends a request to Parsoid, either GET /$prefix/$pagename to get the HTML for a page, or POST /$prefix/$pagename to submit HTML and get wikitext back. Parsoid itself also issues requests to /w/api.php to get the wikitext of the requested page and to do template expansion.
Once the ve-parsoid API module receives a response from Parsoid, it either relays it back to the client (when requesting HTML), or saves the returned wikitext to the page (when submitting HTML).
Requesting HTML:

Client browser -----[1]-----> API -----[2]-----> Parsoid -----[3]-----> API
       ^                      | ^                 |   ^                  |
       +------(response)------+ +-----(HTML)------+   +---(responses)----+

[1] POST /w/api.php?action=ve-parsoid
[2] GET /en/Barack_Obama?oldid=1234
[3] requests for page content and template expansions

Submitting HTML:

Client browser -----[1]-----> API -----[2]-----> Parsoid
                              | ^                 |
                  (save page) | +---(wikitext)----+
                              v
                           Database

[1] POST /w/api.php?action=ve-parsoid
[2] POST /en/Barack_Obama; oldid=1234
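When debugging, the GET request that the API module sends to Parsoid can be reproduced by hand. This is a minimal sketch, assuming direct access to the parsoid.svc.eqiad.wmnet LVS on port 8000 from the internal network and a page title that is already in URL form (no escaping is done); parsoid_html_url and the PARSOID_HOST override are hypothetical names. Only the GET case is shown, since the format of the POST body is not described here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the Parsoid URL for a page's HTML, following
# the GET /$prefix/$pagename?oldid=N pattern.
# Usage: parsoid_html_url PREFIX PAGE OLDID
# PARSOID_HOST (hypothetical) overrides the default host:port.
parsoid_html_url() {
  local prefix="$1" page="$2" oldid="$3"
  local host="${PARSOID_HOST:-parsoid.svc.eqiad.wmnet:8000}"
  echo "http://${host}/${prefix}/${page}?oldid=${oldid}"
}

# From a host on the internal network:
#   curl "$(parsoid_html_url en Barack_Obama 1234)"
```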
Caching and load balancing
Parsoid is load balanced using LVS. The assigned service IPs are:
- parsoidcache.svc.eqiad.wmnet = 10.2.2.29 served by lvs1003/lvs1006, backends are cp1045 and cp1058
- parsoid.svc.eqiad.wmnet = 10.2.2.28 served by lvs1003/lvs1006, backends are wtp1001-1024
The parsoidcache LVS balances two front-end Varnishes running on cp1045 and cp1058 (see parsoid-frontend.inc.vcl.erb). The front ends only hash requests to the back-end Varnishes (see parsoid-backend.inc.vcl.erb). Cache misses are then forwarded to the LVS in front of the Parsoid backends.
          10.2.2.29:80  {cp1045,cp1058}:80   10.2.2.28:8000   wtp10NN:8000
MW API -> LVS --------> Varnish -----------> LVS -----------> Parsoid
All request URLs include the oldid as a query parameter. The Parsoid PHP extension sends update requests to the front-end LVS IP on edits, template updates and visibility changes. The Parsoid backends perform additional requests with 'Cache-Control: only-if-cached' to the caches and reuse cached HTML to speed up serialization and re-rendering of pages. As an example, expansions of templates, extensions and images are reused after an edit without performing API requests for these. See this document for more detail.
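To check by hand whether a given revision's HTML is already cached, one could send the same only-if-cached request the Parsoid backends use. This is a hypothetical helper (cache_status is an illustrative name), and it assumes the Varnish layer answers a miss with 504 Gateway Timeout, which is what the HTTP/1.1 specification recommends for only-if-cached requests that cannot be served from cache; the VCL files named above determine what actually happens.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether URL is served from cache.
# Assumption: a cache miss is answered with 504 for only-if-cached requests.
cache_status() {
  local code
  # -s: quiet, -o /dev/null: discard body, -w: print only the status code
  code=$(curl -s -o /dev/null -w '%{http_code}' \
    -H 'Cache-Control: only-if-cached' "$1")
  case "$code" in
    200) echo hit ;;
    504) echo miss ;;
    *)   echo "unexpected HTTP status $code" ;;
  esac
}

# Example (from a host on the internal network):
#   cache_status "http://parsoidcache.svc.eqiad.wmnet/en/Barack_Obama?oldid=1234"
```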