RCStream

From Wikitech

This service has been deprecated in favor of EventStreams, available at https://stream.wikimedia.org/v2/stream/recentchange

RCStream is a simple server daemon that broadcasts activity ("recent changes") from MediaWiki wikis using the Socket.IO 0.9 protocol. An instance runs at stream.wikimedia.org, broadcasting changes from all public wikis in the Wikimedia production cluster.

API

RCStream provides a simple API for subscribing to the RCFeeds of MediaWiki wikis. After connecting, you emit a 'subscribe' event specifying the wikis you wish to subscribe to, in any of the following forms:

  • a single hostname, such as nl.wikipedia.org.
  • an array of hostnames.
  • hostnames matching a wildcard pattern such as *.wikivoyage.org or nl.*.
  • all wikis, by subscribing to the special topic name *.
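The accepted forms can be illustrated with a small client-side sketch. It assumes that wildcard patterns follow shell-style glob semantics as implemented by Python's fnmatch module; this page does not specify the server's matching rules, and the helper name is_subscribed is hypothetical.

```python
from fnmatch import fnmatchcase


def is_subscribed(subscription, wiki):
    """Return True if `wiki` is covered by `subscription`.

    `subscription` may be a single hostname or pattern (str) or a list
    of them, mirroring the forms accepted by the 'subscribe' event.
    Glob-style matching is an assumption, not documented behaviour.
    """
    patterns = [subscription] if isinstance(subscription, str) else subscription
    return any(fnmatchcase(wiki, pattern) for pattern in patterns)


# A single hostname matches only itself.
print(is_subscribed('nl.wikipedia.org', 'nl.wikipedia.org'))
# A list matches if any of its entries match.
print(is_subscribed(['nl.*', '*.wikivoyage.org'], 'en.wikivoyage.org'))
# A wildcard does not match hostnames outside the pattern.
print(is_subscribed('*.wikivoyage.org', 'en.wikipedia.org'))
# The special topic '*' matches every wiki.
print(is_subscribed('*', 'commons.wikimedia.org'))
```

A check like this can be useful on the client to decide which of several handlers should process an incoming change when one connection subscribes to several patterns.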

You then receive 'change' events whose data is an RCFeed structure containing the type of change, the title of the page, the new revision number, etc.
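As a rough illustration of handling such an event, the sketch below formats one change as a log line. The sample dictionary is invented for the example, and the field names (type, title, user, and a revision mapping with a new key) are assumptions about the RCFeed structure rather than something this page specifies; describe_change is a hypothetical helper.

```python
def describe_change(change):
    """Format one 'change' event payload as a one-line summary."""
    # Use .get() so that events lacking a field do not raise KeyError.
    kind = change.get('type', 'unknown')
    title = change.get('title', '(untitled)')
    user = change.get('user', '(unknown user)')
    # Assumed shape: {'revision': {'old': ..., 'new': ...}}; may be absent.
    revision = change.get('revision') or {}
    new_rev = revision.get('new')
    suffix = ' (revision %s)' % new_rev if new_rev is not None else ''
    return '[%s] %s by %s%s' % (kind, title, user, suffix)


# Invented sample payload; real events are expected to carry more fields.
sample = {
    'type': 'edit',
    'title': 'File:Example.jpg',
    'user': 'ExampleUser',
    'revision': {'old': 100, 'new': 101},
}
print(describe_change(sample))
```

Reading fields defensively keeps a long-running listener alive when an event type omits a field, such as a log entry with no new revision.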

The Socket.IO server uses the /rc namespace. It also implements an /rcstream_status endpoint that exposes internal state about connected clients and queue size that may help when debugging.

Client

As of January 2015, RCStream implements version 0.9 of the Socket.IO protocol, not 1.0 (phab:T68232). See also socket.io 0.9 and socket.io-client 0.9 on GitHub for more information.

JavaScript

// Requires socket.io-client 0.9.x:
// browser code can load a minified Socket.IO JavaScript library;
// standalone code can install via 'npm install socket.io-client@0.9.1'.

var io = require( 'socket.io-client' );
var socket = io.connect( 'https://stream.wikimedia.org/rc' );

socket.on( 'connect', function () {
    socket.emit( 'subscribe', 'commons.wikimedia.org' );
} );

socket.on( 'change', function ( data ) {
    console.log( data.title );
} );

Python

Install dependencies:
pip install socketIO_client==0.5.6
Get stream of events:
#!/usr/bin/python
# -*- coding: utf-8 -*-
import socketIO_client

class WikiNamespace(socketIO_client.BaseNamespace):
    def on_change(self, change):
        print('%(user)s edited %(title)s' % change)

    def on_connect(self):
        self.emit('subscribe', 'commons.wikimedia.org')


socketIO = socketIO_client.SocketIO('https://stream.wikimedia.org')
socketIO.define(WikiNamespace, '/rc')

socketIO.wait()

If you need to convert the structure to XML, you can import dicttoxml, import parseString from xml.dom.minidom, and then in on_change() do something like:

        xml = dicttoxml.dicttoxml(change)
        dom = parseString(xml)
        print(dom.toprettyxml())
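If adding a third-party dependency is undesirable, a comparable conversion can be sketched with the standard library alone. The element layout below (one element per key, nested elements for nested dictionaries, a root element named change) is an arbitrary choice for illustration and does not reproduce dicttoxml's output format; change_to_xml is a hypothetical helper, and it assumes every key is a valid XML element name.

```python
import xml.etree.ElementTree as ET


def _append(parent, key, value):
    """Append `value` under `parent` as a child element named `key`."""
    child = ET.SubElement(parent, key)
    if isinstance(value, dict):
        # Nested dictionaries become nested elements.
        for sub_key, sub_value in value.items():
            _append(child, sub_key, sub_value)
    elif value is not None:
        # Everything else is stored as text; None yields an empty element.
        child.text = str(value)


def change_to_xml(change, root_name='change'):
    """Serialise a flat or nested dict to an XML string."""
    root = ET.Element(root_name)
    for key, value in change.items():
        _append(root, key, value)
    return ET.tostring(root, encoding='unicode')


print(change_to_xml({'title': 'Main Page', 'revision': {'new': 2}}))
```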

Other client libraries

Service

RCStream runs on a set of backend servers, currently rcs1001 and rcs1002 (see the puppet node and puppet role).

The backend servers run instances of RCStream, with an nginx reverse proxy on each server.

An LVS load balancer (stream-lb) is situated in front of the backend servers, exposed as stream.wikimedia.org.

The Beta cluster has a simplified setup on a single VM instance running the rcstream role, exposed as http://stream.wmflabs.org.

See also

Source code: rcstream