XmlRcs
XmlRcs is a transforming proxy for Event Platform/EventStreams (Wikimedia's recent changes feed) that exposes data as XML instead of JSON, using a lightweight TCP connection with a few simple commands. It runs as a volunteer-maintained service in the Wm-bot Cloud VPS project.
Rationale
Wikimedia has had the IRC feed for a long time. While there are numerous problems with it (e.g. the complex data format with IRC color codes, wikitext notation, and embedding of localised interface messages to encode data), the underlying communication protocol (IRC) is relatively easy to implement in any programming language.
This IRC feed, however, has been deprecated and replaced with RCStream, which was in turn deprecated and replaced with EventStreams, which is supposed to be a more stable platform that makes it easy for programmers to retrieve events from Wikimedia sites in real time. While it may be a better platform in many ways, it adds complexity to the stack: it introduces a dependency on third-party technologies such as WebSockets and JSON. While JSON is an easy data format to decode, WebSockets is quite new and lacks good implementations for popular programming languages and frameworks (such as .NET or Qt). In JavaScript or Python, RCStream's WebSocket can be used directly, but it is hard for developers working in lower-level languages like C or C++.
XmlRcs intends to solve this problem. It introduces a simple and lightweight TCP protocol, using XML packets to encode the event data.
How it works
This service is another layer behind the WebSockets server. It is implemented as a Python daemon that converts the WebSockets and JSON stream into raw data and puts it in Redis, from which it is retrieved by a C++ daemon that acts as a server to which clients can connect and subscribe to various feeds.
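To make the conversion step concrete, here is a minimal sketch of how a recent-change event in JSON could be turned into a single-line XML node. It is not the actual es2r code: the function name event_to_xml is hypothetical, the JSON keys (revision.new, length.old, comment and so on) are assumed from the public EventStreams recentchange schema, and the Redis hand-off is left out entirely.

```python
import json
import xml.etree.ElementTree as ET


def event_to_xml(event_json):
    """Convert one recentchange JSON event into a one-line <edit> node.

    Hypothetical helper for illustration; the real daemon may differ.
    """
    event = json.loads(event_json)
    revision = event.get("revision") or {}
    length = event.get("length") or {}
    # Attribute names follow the <edit> examples in this document.
    # The JSON keys read here are assumptions about the upstream schema.
    attributes = {
        "wiki": event.get("wiki", ""),
        "server_name": event.get("server_name", ""),
        "revid": revision.get("new", 0),
        "oldid": revision.get("old", 0),
        "summary": event.get("comment", ""),
        "title": event.get("title", ""),
        "namespace": event.get("namespace", 0),
        "user": event.get("user", ""),
        "bot": bool(event.get("bot", False)),
        "patrolled": bool(event.get("patrolled", False)),
        "minor": bool(event.get("minor", False)),
        "type": event.get("type", ""),
        "length_new": length.get("new", 0),
        "length_old": length.get("old", 0),
        "timestamp": event.get("timestamp", 0),
    }
    node = ET.Element("edit", {key: str(value) for key, value in attributes.items()})
    # short_empty_elements=False yields <edit ...></edit> instead of <edit ... />.
    return ET.tostring(node, encoding="unicode", short_empty_elements=False)
```

Building the node with an XML library rather than string concatenation means quotes, angle brackets and line breaks in edit summaries are escaped, so each event still fits on one line.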
The daemon listens by default on TCP port 8822 and runs on the server wm-bot.wm-bot.wmcloud.org. Example usage:
telnet wm-bot.wm-bot.wmcloud.org 8822
Trying 208.80.155.196...
Connected to wm-bot.wm-bot.wmcloud.org.
Escape character is '^]'.
S en.wikipedia.org
<ok></ok>
<edit wiki="enwiki" server_name="en.wikipedia.org" revid="642587049" oldid="625934858" summary="cat" title="Dunbar Douglas, 4th Earl of Selkirk" namespace="0" user="Brendandh" bot="False" patrolled="False" minor="False" type="edit" length_new="4485" length_old="4446" timestamp="1421317382"></edit>
<edit wiki="enwiki" server_name="en.wikipedia.org" revid="642587048" oldid="638351579" summary="Added source and Explanation of how JMB past papers were used to examine present grade inlfation in the British education system." title="Joint Matriculation Board" namespace="0" user="85.3.139.236" bot="False" patrolled="False" minor="False" type="edit" length_new="4990" length_old="4735" timestamp="1421317382"></edit>
<edit wiki="enwiki" server_name="en.wikipedia.org" revid="642587050" oldid="631962647" summary="Added charts section." title="Pacifica (The Presets album)" namespace="0" user="Ss112" bot="False" patrolled="False" minor="False" type="edit" length_new="7697" length_old="6946" timestamp="1421317382"></edit>
exit
Connection closed by foreign host.
As you can see, you only need to connect to port 8822 over TCP and subscribe using simple commands; the output is a stream of XML nodes that contain information about edits.
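The same session can be scripted. The following is a minimal Python sketch, not an official client: the function names subscribe_and_read and run_demo are made up for illustration, and the host and port are the ones quoted in this document.

```python
import socket

HOST = "wm-bot.wm-bot.wmcloud.org"  # server named in this document
PORT = 8822                          # default XmlRcs port


def subscribe_and_read(stream, wiki, limit):
    """Send 'S <wiki>' on a binary file-like stream and return up to `limit` reply lines."""
    stream.write(f"S {wiki}\n".encode("utf-8"))
    stream.flush()
    lines = []
    while len(lines) < limit:
        raw = stream.readline()
        if not raw:  # empty read: the server closed the connection
            break
        lines.append(raw.decode("utf-8").rstrip("\r\n"))
    return lines


def run_demo():
    """Connect to the public instance, print a few lines, then exit politely."""
    with socket.create_connection((HOST, PORT), timeout=60) as sock:
        with sock.makefile("rwb") as stream:
            for line in subscribe_and_read(stream, "en.wikipedia.org", limit=5):
                print(line)
            stream.write(b"exit\n")
            stream.flush()
```

Calling run_demo() should print the subscription response followed by the first few nodes the server sends. A long-running client would also need to answer the server's ping nodes.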
Commands
Every command is plain text terminated with a newline.
S
Subscribe to a feed. Syntax: S <hostname of wiki>
Example: S en.wikipedia.org
You can use the magic word "all" to subscribe to all wikis.
Response: "<ok></ok>" on success, "<error>reason</error>" on error
D
Remove a subscription. Syntax: D <hostname of wiki>
Example: D en.wikipedia.org
Using the magic word "all" removes the subscription to "all wikis", but if you were also subscribed to individual wikis, those subscriptions remain.
Response: "<ok></ok>" on success, "<error>reason</error>" on error
clear
Removes all subscriptions
Response: "<ok></ok>" on success, "<error>reason</error>" on error
stat
Display various system information
ping
Check if the connection is alive
Response: "<pong></pong>"
exit
Close the connection
Important: you are supposed to send the raw text "pong" whenever you receive the XML node "ping". If you fail to do that, you may be randomly disconnected.
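As a sketch of how a client might assemble these commands before writing them to the socket, the helper below checks the command name and appends the terminating newline. The function build_command and its validation rules are illustrative assumptions; the server itself decides what is valid.

```python
# Commands listed above that take a wiki hostname (or "all") as argument.
COMMANDS_WITH_ARG = {"S", "D"}
# Commands without arguments, plus the raw "pong" keepalive reply.
COMMANDS_WITHOUT_ARG = {"clear", "stat", "ping", "exit", "pong"}


def build_command(name, argument=None):
    """Return the bytes for one command: plain text ending in a newline."""
    if name in COMMANDS_WITH_ARG:
        if not argument or any(ch.isspace() for ch in argument):
            raise ValueError(f"{name} needs a wiki hostname or 'all'")
        text = f"{name} {argument}"
    elif name in COMMANDS_WITHOUT_ARG:
        if argument is not None:
            raise ValueError(f"{name} takes no argument")
        text = name
    else:
        raise ValueError(f"unknown command: {name}")
    return (text + "\n").encode("utf-8")
```

For example, build_command("S", "all") produces the bytes for subscribing to every wiki, and build_command("exit") produces the bytes that close the session.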
Output
At the moment the daemon always responds in XML. Each XML node occupies exactly one line, terminated by a newline.
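Because every node sits on its own newline-terminated line, a client can split the incoming byte stream on newlines and hand each complete line to an XML parser. The sketch below shows one way to do that; LineBuffer and node_kind are hypothetical names, and it assumes each line the server sends is well-formed XML.

```python
import xml.etree.ElementTree as ET


class LineBuffer:
    """Collects raw TCP chunks and returns the complete lines seen so far."""

    def __init__(self):
        self._pending = b""

    def feed(self, chunk):
        self._pending += chunk
        # Everything before the last newline is complete; the rest waits
        # for the next chunk.
        *complete, self._pending = self._pending.split(b"\n")
        return [line.decode("utf-8").rstrip("\r") for line in complete if line.strip()]


def node_kind(line):
    """Return (tag, text) for one XML line, e.g. ('error', 'Unknown: meh')."""
    node = ET.fromstring(line)
    return node.tag, node.text or ""
```

A receive loop would call feed() with each chunk read from the socket and dispatch on the tag returned by node_kind(), treating error, fatal, warning, ok, ping and edit differently.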
error
Example:
meh
<error>Unknown: meh</error>
Non-critical error message
fatal
Example:
<fatal>Redis server is down</fatal>
Critical error which implies that the XmlRcs daemon has become defunct. This error should be very rare.
warning
Example:
<warning>restarting daemon</warning>
Warning message informing clients about a server event
ok
Example:
S this.is.a.test
<ok>S this.is.a.test</ok>
ping
Example:
<ping></ping>
The daemon sends these messages at random intervals to verify that the client is still online. If you fail to reply with pong, you may get disconnected within a minute. (Note: the reply doesn't need to be "pong"; the last_response time is reset on any input.)
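A keepalive handler can be very small. The sketch below assumes the client already has each received line as a string and a send callable that writes bytes to the socket; handle_keepalive is an illustrative name, not part of any existing library.

```python
def handle_keepalive(line, send):
    """Reply to a server ping; return True if the line was a keepalive node."""
    if line.strip() == "<ping></ping>":
        # Any input resets the server-side timer, but "pong" is the documented reply.
        send(b"pong\n")
        return True
    return False
```

In a receive loop, lines for which this returns True can be skipped, and all other lines passed on to the normal node handling.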
edit
Information about a wiki edit. Example:
<edit wiki="wikidatawiki" server_name="www.wikidata.org" revid="188428371" oldid="188099357" summary="/* wbcreateclaim-create:1| */ Property:P361: Q18770801" title="Q17467648" namespace="0" user="RobotMichiel1972" bot="True" patrolled="True" minor="False" type="edit" length_new="5168" length_old="4758" timestamp="1421402947"></edit>
- wiki: short name of the wiki (enwiki)
- server_name: FQDN of the server (en.wikipedia.org)
- revid: revision id (54635262)
- oldid: previous revision id (5635323)
- summary: summary of edit
- title: name of the page
- namespace: namespace number of the page (0)
- user: name of user
- bot
- patrolled
- minor
- type: type of edit (edit is a regular edit, new is a new page)
- length_new: size of the new revision
- length_old: size of the previous revision
- timestamp
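As an illustration of consuming these attributes, the following sketch parses one edit line into a typed record. The Edit dataclass and parse_edit function are hypothetical. It assumes every attribute shown in the example is present, that booleans are spelled "True"/"False" as in the examples, and that timestamp is a Unix epoch value in seconds.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Edit:
    wiki: str
    server_name: str
    revid: int
    oldid: int
    summary: str
    title: str
    namespace: int
    user: str
    bot: bool
    patrolled: bool
    minor: bool
    type: str
    length_new: int
    length_old: int
    timestamp: datetime


def parse_edit(line):
    """Parse one <edit .../> line into an Edit record."""
    node = ET.fromstring(line)
    if node.tag != "edit":
        raise ValueError(f"not an edit node: {node.tag}")
    a = node.attrib
    return Edit(
        wiki=a["wiki"],
        server_name=a["server_name"],
        revid=int(a["revid"]),
        oldid=int(a["oldid"]),
        summary=a["summary"],
        title=a["title"],
        namespace=int(a["namespace"]),
        user=a["user"],
        bot=a["bot"] == "True",
        patrolled=a["patrolled"] == "True",
        minor=a["minor"] == "True",
        type=a["type"],
        length_new=int(a["length_new"]),
        length_old=int(a["length_old"]),
        # Assumption: seconds since the Unix epoch, interpreted as UTC.
        timestamp=datetime.fromtimestamp(int(a["timestamp"]), tz=timezone.utc),
    )
```

With a record like this, a size delta is simply edit.length_new - edit.length_old. A production client would probably want to tolerate missing or non-numeric attributes instead of raising.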
Maintainer info
The whole thing lives on the instance xmlrcs2.huggle.eqiad1.wikimedia.cloud. It consists of 3 components, which always need to be started in this order:
- Redis server (started by init.d)
- xmlrcsd (server daemon for XmlRcs, systemd service: xmlrcsd)
- EventStreams to Redis daemon (systemd service: es2r)
$ ssh wm-bot.wm-bot.wmcloud.org
$ sudo service redis restart
$ sudo service xmlrcs restart
$ sudo service es2r restart
C# Library
There is a C# library: https://github.com/huggle/XMLRCS/tree/master/clients/c%23/XmlRcs
You can download a precompiled .dll from the "releases" page.
Launching an instance
To launch a new instance of XmlRcs, you first need to compile xmlrcsd to get an executable. To do this, cd into <xmlrcs source code directory>/src/xmlrcsd and build with CMake: run cmake . and then make. (Note: do not use GCC, G++ or clang; these compilers are not supported.) This produces an xmlrcsd executable. You then need to have Redis running, start the new executable, and start <xmlrcs source code dir>/src/es2r/es2r.py.