Machine Learning/LiftWing/Streams

If you want to call a specific Lift Wing model server every time an event is posted to an event stream in Kafka, we suggest using our ChangeProp rules defined for Lift Wing. ChangeProp can be configured to listen to a Kafka topic and call Lift Wing to generate a score. In turn, Lift Wing can be configured to post an event containing the score to EventGate, which will finally enqueue it to a Kafka topic.
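
In practice, for every matching event ChangeProp issues an HTTP request to the model server on Lift Wing. As a rough sketch, that call is equivalent to a manual request like the one below; the model name, Host header and payload are illustrative assumptions (based on the goodfaith model and the revscoring-editquality-goodfaith namespace used later on this page), and the exact request ChangeProp builds may differ:

    # Illustrative direct call to a Lift Wing model server from a production host.
    # Hostnames, model name and payload are examples, not the exact request ChangeProp sends.
    curl -s "https://inference.discovery.wmnet:30443/v1/models/enwiki-goodfaith:predict" \
      -H "Host: enwiki-goodfaith.revscoring-editquality-goodfaith.wikimedia.org" \
      -H "Content-Type: application/json" \
      -d '{"rev_id": 12345}'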

As of December 2023, we have two model servers configured using ChangeProp. These servers publish events from Lift Wing to EventGate:

model server            source event stream                                              output event stream
revscoring-drafttopic   mediawiki.revision-create -> mediawiki.page_change.v1 (schema)   mediawiki.revision_score_drafttopic (schema)
outlink-topic-model     mediawiki.page_change.v1                                         mediawiki.page_outlink_topic_prediction_change.v1 (schema)

The requirements for you are the following:

  • A model server needs to be deployed to Lift Wing, and it must have passed basic sanity checks from the ML team (namely, it needs to be able to sustain a decent traffic level without crashing, etc.).
  • Decide what the source event stream is. For example, ORES has always been configured to score every rev-id registered in mediawiki.revision-create, but you may need a different source.
  • Decide whether you need to filter the traffic in the stream. For example, let's say that your model on Lift Wing supports only enwiki and itwiki. You can specify this in the task for the ML team (more on that later).
  • Decide the schema of the event that will be generated by Lift Wing and posted to EventGate. For example, all the ORES scores use the mediawiki.revision-score schema. We also have the mediawiki.page_prediction_classification_change schema to represent a classification model output (topic, revert, quality, etc.). If you need a different one, you'll have to work with Data Engineering to create and deploy it. Please also inform the ML team in that case, since we'll need to add the necessary code to your model server to support the use case.
  • Your new event stream will contain the events generated by a specific model server, enqueued in a Kafka topic. We have some conventions about stream naming, and a mediawiki-config deployment is needed to declare the stream in the stream configuration (when using eventgate-main, it will also need a deployment); one way to inspect how existing streams are declared is shown right after this list. We'll follow up with you in the task about this, don't worry!
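
When deciding on the output schema and stream name, it can help to look at how existing Lift Wing streams are declared. A quick way to do that is to query the EventStreamConfig API (a sketch; the stream name below is just an example taken from the table above, replace it with the one you care about):

    # Query the declared configuration (including schema_title) of an existing stream
    # via the EventStreamConfig API on meta.wikimedia.org.
    curl -s "https://meta.wikimedia.org/w/api.php?action=streamconfigs&format=json&streams=mediawiki.revision_score_drafttopic"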

After reading the above, you can create a task for the ML team with what you have decided; we'll take it from there and work with you to implement the new stream!

Streams (Admins only, Machine Learning team)

Once a task has been created with the above information, we need to go through the following steps:

Staging configuration and testing

    liftwing:
      uri: 'https://inference-staging.svc.codfw.wmnet:30443'  # Lift Wing staging endpoint that Change-Prop calls
      models:
        goodfaith:
          concurrency: 2  # limit on concurrent requests for this rule
          match_config:
            database: '/^(en|zh)wiki$/'  # only events whose "database" field matches this regex are scored
          namespace: revscoring-editquality-goodfaith  # Lift Wing namespace hosting the model server
          kafka_topic: 'liftwing.test-events'  # Kafka topic that Change-Prop listens to

The Change-Prop staging config is a little different from production, since we don't want a continuous stream of events to evaluate, but just a few to check that the whole pipeline works. In this case:

  • The Kafka topic to listen to for events is liftwing.test-events in the Kafka Main eqiad cluster (Change-Prop's staging config is configured only with the eqiad cluster as of now, Feb 2023). This topic should mimic, in this case, mediawiki.revision-create, so we can send any revision-create events to it and use them to test Change-Prop. The main benefit of this setting is that we don't cause any other Change-Prop config/rule to be triggered (since many of the workflows use mediawiki.revision-create as well) but only the Lift Wing ones.
  • Find the Kafka topic that represents your source of data. For example, in our case we have mediawiki.revision-create, and the corresponding topic is eqiad.mediawiki.revision-create (ask an SRE to help you).
  • From a stat100x node, collect an event from mediawiki.revision-create to a file called test.json using kafkacat:
    • kafkacat -t eqiad.mediawiki.revision-create -b kafka-main1001.eqiad.wmnet:9093 -X security.protocol=ssl -X ssl.ca.location=/etc/ssl/certs/wmf-ca-certificates.crt -o latest -c 1 > test.json
  • Verify that the match_config rule specified in the configuration highlighted above works. In the example above we are matching a field in the source event called "database" against the regex that follows.
  • Then send the event to staging.liftwing.test-events (from a stat100x node): cat test.json | kafkacat -P -t staging.liftwing.test-events -b kafka-main1001.eqiad.wmnet:9093 -X security.protocol=ssl -X ssl.ca.location=/etc/ssl/certs/wmf-ca-certificates.crt
  • You should now see an access log entry for the new request in the model server's logs on Lift Wing staging.
  • Last but not least, verify on the target Kafka topic that the event has been posted correctly (you can use kafkacat -C as described above from a stat100x node; a sketch is shown after this list).
  • If you get a validation error from EventGate, check logstash for more info.
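
As a concrete sketch of the last verification step, assuming the target topic for the drafttopic test is eqiad.mediawiki.revision-score-test (the topic name and datacenter prefix are assumptions; adjust them to your stream, following the same convention as the source topics):

    # Read the most recent event from the target topic to confirm it was produced.
    # -o -1 starts one message before the end of the topic, -c 1 exits after one message.
    kafkacat -C -t eqiad.mediawiki.revision-score-test -b kafka-main1001.eqiad.wmnet:9093 \
      -X security.protocol=ssl -X ssl.ca.location=/etc/ssl/certs/wmf-ca-certificates.crt \
      -o -1 -c 1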

Current settings to publish events from Lift Wing staging to EventGate:

model server            source Kafka topic             target Kafka topic
revscoring-drafttopic   liftwing.test-events           mediawiki.revision-score-test
outlink-topic-model     liftwing.test-outlink-events   mediawiki.page_prediction_change.rc0 (T349919)