Changeprop
changeprop (or Change Propagation) is the name given to a service that processes change events generated by MediaWiki and stored in Kafka. Various actions are taken based on the messages read from Kafka. Common actions take the form of HTTP requests or CDN purges.
What it does
- Changeprop uses Kafka to ensure guaranteed delivery. We use the Apache Kafka message broker to attain at-least-once delivery semantics: once an event is in Kafka, we can be sure that it will be processed and that any follow-up event will be emitted. This allows us to build very long and complex sequences of dependencies without fear of losing events.
- Automatic retries with exponential delays, large job deduplication, and persistent error tracking via a dedicated error topic in Kafka
- The config system allows us to add simple update rules with only a few lines of YAML and without code changes or deploys
- A fine-grained monitoring dashboard allows us to track rates and delays for individual topics, rates of event production, and much more. Changeprop graphs can occasionally be used to discover bugs in other parts of the surrounding infrastructure.
How it works
Changeprop reads events from Kafka. The topics changeprop reads from are defined in config.yaml; the dc_name variable is a prefix to the topic defined on a per-rule basis. For example, in eqiad the mw_purge rule uses the resource_change topic, so the full topic is eqiad.resource_change. Each rule specifies the topic to which it subscribes.
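The topic-prefixing convention described above can be sketched in a few lines; this is an illustrative helper, not changeprop's actual implementation:

```python
def full_topic(dc_name: str, topic: str) -> str:
    """Derive the full Kafka topic name: the per-rule topic is
    prefixed with the datacenter name (dc_name)."""
    return f"{dc_name}.{topic}"

# The mw_purge rule's resource_change topic, as seen in eqiad:
print(full_topic("eqiad", "resource_change"))  # → eqiad.resource_change
```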
Rules
Rules define a list of cases to which a rule is to respond. General rule properties allow the definition of things like retries, delays and other features.
The "match" section of a rule dictates a pattern to match, which can include URL matching and tag matching (for example, mw_purge events also contain "tags":["purge"]
and will only match if the URL pattern and the URL matches the pattern specified). URL match patterns are frequently used to target specific sites (for example have a rule only apply to Wiktionary) or classes of article. Matches can also be fine tuned to not match using not_match. If the match it satisfied, the exec section is executed. The exec will generally be a HTTP request of a defined method to the specified URI. A rule can have multiple match and corresponding exec sections in its cases list - if a pattern is created where matches are mutually exclusive, a rule can act as a switch statement using the same topic and the same semantics but different matches.
Headers and other parameters can be defined for an exec section - see the existing rules for details.
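As a hedged sketch, a rule along these lines might look as follows. The structure (topic, match, exec) follows the description above, but the rule name, URL pattern, and target URI are illustrative assumptions, not production configuration; consult the existing rules in config.yaml for the authoritative schema.

```yaml
# Hypothetical rule: react to purge events for Wiktionary pages.
example_purge_rule:
  topic: resource_change            # prefixed with dc_name at runtime, e.g. eqiad.resource_change
  cases:
    - match:
        meta:
          uri: '//en\.wiktionary\.org/wiki/(?<title>.+)'   # URL pattern match
        tags:
          - purge                                          # tag match
      exec:
        method: post
        uri: 'https://example.internal/purge/{{match.meta.uri.title}}'  # hypothetical target
```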
Service interactions
Changeprop talks to Redis to manage rate limiting and exclusion lists for problematic or high-traffic articles. All communication is done via Nutcracker. In Kubernetes, a local Nutcracker sidecar container runs within the changeprop pod, proxying access to a list of redis servers.
Many of changeprop's operations are accomplished by sending HTTP requests to RESTBase.
Where it runs
Changeprop currently runs in Kubernetes in codfw and eqiad. There is also an instance in the staging cluster that does not process prod traffic. In labs, changeprop runs in regular Docker on deployment-changeprop-1.deployment-prep.eqiad1.wikimedia.cloud.
Adding features
Adding a new rule
- Add the rule to deployment-charts/charts/changeprop/templates/_config.yaml
- Bump the Chart.yaml version
- Commit, get review, and merge
- Deploy changeprop and changeprop-jobqueue from the deployment host using Kubernetes/Deployments#Code_deployment/configuration_changes
Deploying
To Kubernetes
Changeprop uses the Kubernetes/Deployments workflow to deploy changes.
To deployment-prep
In the Beta Cluster, Changeprop runs in Docker on deployment-changeprop-1.deployment-prep.eqiad1.wikimedia.cloud. The configuration passed to changeprop is generated by scripts in the deployment-charts repository, in order to use the same templates and avoid deviation. This means that if you want to change the configuration in beta/deployment-prep, you will first need to edit the configuration in deployment-charts. The values for deployment-prep are stored in the values-beta.yaml file.
Generating the configuration
In deployment-charts, cd to charts/changeprop and run ./make_beta_config.py. The output from this command will be the configuration to be deployed.
For example, to generate the changeprop configuration from your localhost:
cd /home/somepath/deployment-charts/charts/changeprop && ./make_beta_config.py . changeprop
To generate the jobqueue configuration:
cd /home/somepath/deployment-charts/charts/changeprop && ./make_beta_config.py . jobqueue
Deploying the configuration
The configuration is in config.yaml in a Docker volume on deployment-changeprop-1.deployment-prep.eqiad1.wikimedia.cloud and deployment-docker-cpjobqueue01.deployment-prep.eqiad.wmflabs, named changeprop and cpjobqueue respectively. Configuration needs to be edited within this volume. The host directory can be discovered using `docker volume inspect`.
Ensure that the config is world-readable when copying in a new file. Then run service changeprop restart to load the configuration. Files other than config.yaml in this volume will be ignored.
For example, to generate and deploy the changeprop configuration from your localhost:
cd /home/somepath/deployment-charts/charts/changeprop && ./make_beta_config.py . changeprop | \
  ssh deployment-changeprop-1.deployment-prep.eqiad1.wikimedia.cloud \
  sudo sh -xc \''cat > $(docker volume inspect changeprop -f {{.Mountpoint}})/config.yaml && systemctl restart changeprop'\'
To generate and deploy the cpjobqueue configuration:
cd /home/somepath/deployment-charts/charts/changeprop && ./make_beta_config.py . jobqueue | \
  ssh deployment-docker-cpjobqueue01.deployment-prep.eqiad.wmflabs \
  sudo sh -xc \''cat > $(docker volume inspect cpjobqueue -f {{.Mountpoint}})/config.yaml && systemctl restart cpjobqueue'\'
Ideally the docker volume would have been pre-created with a fixed host path.
Testing
changeprop can be tested by issuing events to Kafka that changeprop will consume. An example test command against the resource_change topic for the k8s staging cluster is: cat mw_purge_example.json | kafkacat -b localhost:9092 -p 0 -t 'staging.resource_change'
All IDs in these examples are random UUIDs. Reusing a UUID between test runs risks the event being seen as a duplicate and skipped. The "dt" field should also be changed to be close to the current date and time, as changeprop will not take action on older events.
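Since every test run needs a fresh UUID and a current "dt", it can be handy to generate the event rather than hand-edit JSON. This is an illustrative helper, not a maintained tool; the fields mirror the mw_purge example below:

```python
import json
import uuid
from datetime import datetime, timezone

def make_purge_event(page_url: str, domain: str) -> str:
    """Build a resource_change purge event with a fresh UUID and current dt,
    so repeated test runs are not deduplicated or dropped as stale."""
    event = {
        "$schema": "/resource_change/1.0.0",
        "meta": {
            "dt": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
            "uri": page_url,
            "id": str(uuid.uuid4()),   # new random UUID per run
            "domain": domain,
            "stream": "resource_change",
        },
        "tags": ["purge"],
    }
    return json.dumps(event)

# Pipe the output into kafkacat as shown above, e.g.:
#   python make_event.py | kafkacat -b localhost:9092 -p 0 -t 'staging.resource_change'
print(make_purge_event("https://en.wikipedia.org/wiki/Example", "en.wikipedia.org"))
```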
mw_purge
{"$schema":"/resource_change/1.0.0","meta":{"dt": "2020-04-02T17:16:25Z", "uri":"https://en.wikipedia.org/wiki/Draft:Editta_Braun","id":"22350141-bbe2-488d-9f73-a1aa6094ac5c","domain":"en.wikipedia.org","stream":"resource_change"},"tags":["purge"]}
null_edit
{"$schema":"/resource_change/1.0.0","meta":{"uri":"https://fr.wikipedia.org/wiki/Oribiky","id":"b92d40b0-3206-469d9615-2fbf61a04418","dt":"2020-04-02T17:16:28Z","domain":"fr.wikipedia.org","stream":"resource_change"},"tags":["null_edit"]}
How to monitor it
There is a Grafana dashboard for Changeprop. The various graphs provide information about things such as rule execution rate and rule backlogs for each rule for various streams.
Rule backlog is the time between the creation of an event and the beginning of its processing. If the backlog grows over time, change propagation can't keep up with the event rate, and either concurrency should be increased or some other action taken. Backlogs can have occasional spikes, but steady backlog growth is a clear indication of a problem.
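To make the metric concrete: for a single event, the backlog is just the gap between its meta.dt and the moment processing starts. This is a hypothetical illustration, not changeprop's actual measurement code:

```python
from datetime import datetime, timezone

def backlog_seconds(event_dt: str, processing_start: datetime) -> float:
    """Backlog for one event: seconds between event creation (the meta.dt
    field) and the start of processing. Steady growth of this value across
    a topic means changeprop cannot keep up with the event rate."""
    created = datetime.strptime(event_dt, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return (processing_start - created).total_seconds()

# An event created at 17:16:25 and picked up one minute later:
start = datetime(2020, 4, 2, 17, 17, 25, tzinfo=timezone.utc)
print(backlog_seconds("2020-04-02T17:16:25Z", start))  # → 60.0
```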
Debugging
Querying configuration
Changeprop's configuration can be queried if you have access to deploy1001:
- ssh to the deploy server for the datacenter
- cd to the appropriate directory (for example /srv/deployment-charts/helmfile.d/services/staging/changeprop)
- run kube_env changeprop $CLUSTER to set up your Kubernetes environment
- show the configuration via kubectl describe configmap changeprop-staging-base-config
The suffixes nutcracker-config and metrics-config are also available as configmaps.
Non-issues
Periodically Changeprop will log a message along the lines of the following:
{"name":"change-propagation","hostname":"changeprop-staging-684b9ddbd-4wdkn","pid":141,"level":"ERROR","err":{"message":"Local: Broker transport failure","name":"changeprop-staging","stack":"Error: Local: Broker transport failure\n at Function.createLibrdkafkaError [as create] (/srv/service/node_modules/node-rdkafka/lib/error.js:334:10)\n at /srv/service/node_modules/node-rdkafka/lib/kafka-consumer.js:448:29","code":-195,"errno":-195,"origin":"kafka","rule_name":"page_create","executor":"RuleExecutor","levelPath":"error/consumer"},"msg":"Local: Broker transport failure","time":"2020-04-29T13:10:17.443Z","v":0}
This can be ignored as long as the occurrences aren't too close together (currently they happen roughly once every hour in staging); they will not interrupt normal operation of changeprop.
Where it lives
- Changeprop's code can be cloned from Gerrit at ssh://gerrit.wikimedia.org:29418/mediawiki/services/change-propagation. It can be browsed in Phabricator.
- Changeprop is deployed to Kubernetes as a Helm chart. It lives in the deployment-charts repo.
- The example config.yaml file contains many illuminating examples of how rules are matched and processed.
- There is a per-environment Helmfile values file which overrides the defaults configured in the Helm chart's values file. This is the file for staging values; there are corresponding production files in the per-DC directories.
See also
- Changeprop emerged out of the older and now-decommissioned EventBus system. That page is largely out of date and does not represent the current system; a more modern overview of the event systems currently in use can be found on Event*
- mw:Requests for comment/Requirements for change propagation (T102476) - RFC that describes the different approaches being explored in the development of Changeprop