Data Platform/Systems/AQS

From Wikitech

The Analytics Query Service (AQS) is a set of services that provide public-facing APIs that serve analytics data from both a Cassandra and a Druid backend. It is how Wikistats (stats.wikimedia.org) gets data to visualize.

Services

  • Device Analytics
    • Repository: generated-data-platform/aqs/device-analytics
    • API specification: swagger.json
    • Data source: Cassandra
  • Geo Analytics
    • Repository: generated-data-platform/aqs/geo-analytics
    • API specification: swagger.json
    • Data source: Cassandra
  • Media Analytics
    • Repository: generated-data-platform/aqs/media-analytics
    • API specification: swagger.json
    • Data source: Cassandra
  • Page Analytics
    • Repository: generated-data-platform/aqs/page-analytics
    • API specification: swagger.json
    • Data source: Cassandra
  • Edit Analytics
    • Repository: generated-data-platform/aqs/edit-analytics
    • API specification: swagger.json
    • Data source: Druid
  • Editor Analytics
    • Repository: generated-data-platform/aqs/editor-analytics
    • API specification: swagger.json
    • Data source: Druid
  • Commons Impact Metrics
    • Repository: generated-data-platform/aqs/commons-impact-analytics (GitLab)
    • API specification: swagger.json
    • Data source: Cassandra
  • Common functionality
    • AQS Assist: generated-data-platform/aqs/aqsassist
    • service-lib-golang: https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/services/servicelib-golang
  • Test environments
    • Cassandra test env: generated-data-platform/aqs/aqs-docker-cassandra-test-env
    • Druid test env: generated-data-platform/aqs/aqs-docker-druid-test-env
  • QA
    • QA test suite: generated-data-platform/aqs/aqs_tests
  • User documentation site
    • Repository: generated-data-platform/aqs/analytics-api
    • Live link: doc.wikimedia.org/analytics-api

Monitoring

Grafana dashboards:

Project overview

Epic task in Phabricator | API Platform Team workboard | AQS2.0 workboard

  1. Done: Implement the new, stand-alone AQS service(s)
  2. Done: Deploy to k8s
  3. Done: Switch RESTBase from proxying requests to the old AQS service to proxying them to the new k8s-based one
    1. Phase 1: Unique devices endpoint (Device Analytics service)
    2. Phase 2: Pageviews and legacy endpoints (Page Analytics service), editors/by-country endpoint (Geo Analytics service), and media requests endpoints (Media Analytics service)
    3. Phase 3: Edited pages, edits, and bytes difference endpoints (Edit Analytics service) and editors and registered users endpoints (Editor Analytics service)
  4. In progress: Deprecate the http://{project}/api/rest_v1/metrics resources
  5. Eventually phase out the RESTBase /metrics hierarchy

Communicating user-facing changes

When making a user-facing change to an AQS API:

  1. Consult the stability policy for versioning guidance.
  2. Update the service's API spec and the documentation site.
  3. Update the combined changelog (source).
  4. Send an announcement to the analytics mailing list.

Running a service

AQS 2.0 consists of several repositories. Some correspond to individual services that expose APIs. Others correspond to cross-service common functionality or test environments.

Setup

You will need:

Go (aka "golang") is an opinionated language in various ways. One consequence is that you're probably much better off keeping your Go code under your GOPATH rather than wherever you usually keep code. (There are, of course, always ways for savvy developers to cheat the system. If you choose to do that, any consequences are on you.) On my Mac, I cloned all the AQS 2.0 repositories under ~/go/src/.

Start a service

The README file of each service contains details about running that particular service. In summary, you'll need to open several command line (aka "terminal") windows or tabs and run commands in each. The following describes how to run the "pageviews" service. Other services operate similarly.

  • In one terminal, navigate to <GOPATH>/aqs-docker-test-env
  • Run "make startup", wait for it to say "Startup complete", then leave it running
  • In another terminal, also in <GOPATH>/aqs-docker-test-env, run "make bootstrap" and wait for it to complete
  • Navigate (either in that terminal or a different one) to <GOPATH>/pageviews
  • Run "make"
  • Run "./pageviews" (and leave it running)
  • In another terminal, navigate to <GOPATH>/pageviews and run "make test"
  • In your browser, visit http://localhost:8080/metrics/pageviews/per-article/en.wikipedia/all-access/all-agents/Banana/daily/20190101/20190102
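
As a minimal sketch, the URL visited in the last step can also be built and requested from the command line. The function name pageviews_url and the AQS_BASE_URL variable are hypothetical and not part of any AQS repository; the sketch assumes the service is listening on localhost:8080 as in the steps above, and fixes the access, agent and granularity segments to the values used in that URL.

```shell
# Hypothetical helper, not part of any AQS repository.
# Builds the per-article pageviews URL for a locally running service.
# AQS_BASE_URL is an assumed override; the default matches the steps above.
pageviews_url() {
  local project="$1" article="$2" start="$3" end="$4"
  local base="${AQS_BASE_URL:-http://localhost:8080}"
  printf '%s/metrics/pageviews/per-article/%s/all-access/all-agents/%s/daily/%s/%s\n' \
    "$base" "$project" "$article" "$start" "$end"
}
```

For example, curl -s "$(pageviews_url en.wikipedia Banana 20190101 20190102)" requests the same resource as the browser step.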

We haven't started the Druid-based endpoint(s) yet, but the process will likely be similar, with perhaps some differences in how to launch the test environment.

Tips and troubleshooting

Because Go is an opinionated language, it may refuse to run over seemingly small things, such as whitespace. If you see something like this:

   goimports: format errors detected

You can execute this to see what Go is unhappy about:

   goimports -d *.go

And this to automatically fix it:

   goimports -w *.go

Our services depend on several packages, including our own “aqsassist”, which is in active development. This means you may sometimes need to update dependencies for your local service to run. You can update all dependencies via:

   go get .

or update specific dependencies via something like:

   go get gitlab.wikimedia.org/repos/generated-data-platform/aqs/aqsassist

Local Testing

This section explains how to prepare and run both local test environments (Cassandra and Druid) in order to test all the AQS services with the AQS QA test suite. The result is two docker compose projects, one per test environment, each running the AQS services that use that backend. The instructions below cover all Cassandra-based and Druid-based services.

The steps explained here are ready for almost all the services and both test environments.

Currently, two test environments can be built and run, ready for QA engineers to run the AQS test suite:

  • aqs-docker-cassandra-test-env-qa: A Cassandra test environment with the following services/containers (available here):
    • cassandra-qa: A docker container with a Cassandra database already populated with some sample data
    • device-analytics-qa: A docker container that runs the service listening on port 8090
    • geo-analytics-qa: A docker container that runs the service listening on port 8091
    • media-analytics-qa: A docker container that runs the service listening on port 8092
    • page-analytics-qa: A docker container that runs the service listening on port 8093
  • aqs-docker-druid-test-env-qa: A Druid test environment with the following services/containers (available here):
    • druid-qa: A docker container with a Druid database already populated with some sample data
    • editor-analytics-qa: A docker container that runs the service listening on port 8094
    • edit-analytics-qa: A docker container that runs the service listening on port 8095

Quick start (for QA engineers)

This quick start guide shows how to build, start and populate the docker-compose project for both testing environments: Cassandra and Druid:

  • Clone all service repositories that belong to the test env you want to run
  • Inside every service project, run make docker_qa to create the service docker image (the image is named after the service: geo-analytics, editor-analytics, and so on)
  • For Cassandra test env:
    • Clone the aqs-docker-cassandra-test-env repository
    • Inside the test-env project, run make startup-qa to create the new docker-compose project
    • Before creating the project, you can change the host port that each service listens on (for example, the geo-analytics service maps the service container's default port, 8080, to port 8091 on the host). This is useful for running all the services at the same time to test them together.
    • Inside the test-env project, run make bootstrap-qa to populate Cassandra in the new docker-compose project (it takes around 15 minutes to fully populate the database)
    • Take a look at your Docker Dashboard to be sure that a docker compose project called aqs-docker-test-env-qa has been created with all its services running
  • For Druid test env:
    • Clone the aqs-docker-druid-test-env repository
    • Go to the aqs-docker-druid-test-env-build folder and run docker build -t aqs-docker-druid-test-env . (this creates a Druid image already populated with the sample data)
    • Go to the aqs-docker-druid-test-env-run folder and run make startup-qa
    • Take a look at your Docker Dashboard to be sure that a docker compose project called aqs-docker-druid-test-env-qa has been created with all its services running
  • Try making a request (for instance to http://localhost:8091/metrics/editors/by-country/en.wikipedia/100..-edits/2018/11) to check everything is working fine (change the port and the request according to the service you want to try)
  • The following are the ports where each service will be listening:
    • aqs-docker-test-env-qa:
      • device-analytics: 8090
      • geo-analytics: 8091
      • media-analytics: 8092
      • page-analytics: 8093
    • aqs-docker-druid-test-env-qa:
      • editor-analytics: 8094
      • edit-analytics: 8095
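
The port list above can be turned into a quick connectivity check. This is only a sketch: qa_port and check_qa_ports are hypothetical names, not part of the test-env repositories, and the check only confirms that something accepts HTTP connections on each port, not that the service returns valid data.

```shell
# Hypothetical helper: maps a QA service name to the host port listed above.
qa_port() {
  case "$1" in
    device-analytics) echo 8090 ;;
    geo-analytics)    echo 8091 ;;
    media-analytics)  echo 8092 ;;
    page-analytics)   echo 8093 ;;
    editor-analytics) echo 8094 ;;
    edit-analytics)   echo 8095 ;;
    *) echo "unknown service: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: reports, for each named service, whether a connection
# to its port on localhost can be established. curl exits non-zero when the
# connection fails; any HTTP response (even a 404) counts as reachable here.
check_qa_ports() {
  local svc port
  for svc in "$@"; do
    port="$(qa_port "$svc")" || continue
    if curl -s -o /dev/null --max-time 5 "http://localhost:${port}/"; then
      echo "$svc: reachable on port $port"
    else
      echo "$svc: not reachable on port $port"
    fi
  done
}
```

For example, check_qa_ports device-analytics geo-analytics media-analytics page-analytics covers the Cassandra-based project.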

Full guide (in case you want to customize these environments)

This full guide describes all the steps needed to create a docker compose project, using the geo-analytics service and the Cassandra test env as an example. The service-related steps can be applied to any other AQS service to run it using docker. Both test environments (Cassandra and Druid) can be customized following the same pattern to add more services to the final docker compose project.

Service config changes

These steps modify the config.yaml file in geo-analytics

Needed changes:

  • config.yaml file: Change the Cassandra hostname and the service listen_address so that the new AQS test env runs properly via docker.

Change the Cassandra host to "cassandra". This is the name of the service set in the docker compose config file used to run the QA test environment (cassandra + geo), and it is needed so that the service container (geo-analytics in this case) can connect to the Cassandra one. Also change the listen_address property to 0.0.0.0 to accept connections from outside the service container (so that you can run the QA test suite from your host).

 . . .
 listen_address: 0.0.0.0
 . . .
 cassandra:
   hosts: cassandra
 . . .

Create the service image

These steps describe how it was done for the geo-analytics service. The same steps apply to any other AQS service.

Needed changes:

 docker_qa: ## create a docker container to run QA test via Docker
  curl -s "https://blubberoid.wikimedia.org/v1/production" -H 'content-type: application/yaml' --data-binary @".pipeline/blubber.yaml" > Dockerfile
  docker build -t geo-analytics .

Needed steps

  • If not available, put the blubber file into the .pipeline folder (change the entry_point according to the service where you are putting this file)
  • If not available, put the config file into the .pipeline folder (this file is the same for all the services)
  • Add the new target to the Makefile
  • Now you can build the service image with the following command:
make docker_qa

Build the QA test environment (for cassandra-based services)

These steps must be run in the aqs-docker-test-env project folder.

Needed changes:

All changes described here are available in the aqs-docker-cassandra-test-env

The docker-compose-qa.yml file defines a new docker-compose project composed of a Cassandra engine and a sample service (geo-analytics in this case). If you take a look at this file, you will see that the service-specific part can be customized to add any additional Cassandra-based service to this dockerized env. At the moment, some of these services are already included:

A service has to be added to define the Cassandra container. For this service we also add a healthcheck property that defines how to tell when Cassandra is available. That way, service containers start only once the database is available (avoiding failures from trying to connect before it is ready):

 cassandra:
    image: cassandra:3.11
    container_name: cassandra-qa
    ports:
      - "9042:9042"
    volumes:
      - .:/env
    networks:
      - network1
    healthcheck:
      test: ["CMD-SHELL", "[ $$(nodetool statusgossip) = running ]"]
      interval: 20s
      timeout: 10s
      retries: 5

And another one for each service you want to add to this test environment (in this case we are adding geo-analytics):

 geo-analytics:
    image: geo-analytics
    container_name: geo-analytics-qa
    ports:
      - "8091:8080"
    networks:
      - network1
    depends_on:
      cassandra:
        condition: service_healthy
  • New targets in the Makefile:
bootstrap-qa: schema-qa load-qa
schema-qa:
	docker exec -it cassandra-qa cqlsh -f /env/schema.cql
load-qa:
	docker exec -it cassandra-qa cqlsh -f /env/test_data.cql --cqlshrc=/env/cqlshrc
startup-qa:
	docker-compose -f docker-compose-qa.yml up -d

Needed steps:

Once you have customized both files (docker-compose-qa.yml and Makefile) according to the changes above, you can build and run the new dockerized test environment:

  • Create the docker-compose project:
 make startup-qa
  • Populate the cassandra container
make bootstrap-qa

The QA test env and the included services are now running as docker containers within a docker compose project named aqs-docker-test-env-qa.

Each service will be listening on a different port according to the service configuration. For example, in this case geo-analytics will be listening on the port 8091.
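
Because the Cassandra container defines a healthcheck, a script can wait for it to report healthy before running make bootstrap-qa. The sketch below is an illustration only: wait_healthy is a hypothetical name, and the default of 30 attempts at 10-second intervals is an assumption, not a value taken from the test-env repository. It relies on docker inspect exposing the health status of a container that has a healthcheck.

```shell
# Hypothetical helper: polls a container's Docker health status until it
# reports "healthy" or the attempts run out.
# Arguments: container name, number of attempts (assumed default 30),
# delay in seconds between attempts (assumed default 10).
wait_healthy() {
  local container="$1" attempts="${2:-30}" delay="${3:-10}"
  local status="" i=1
  while [ "$i" -le "$attempts" ]; do
    # An inspect failure (e.g. container not created yet) counts as "not healthy".
    status="$(docker inspect --format '{{.State.Health.Status}}' "$container" 2>/dev/null || true)"
    if [ "$status" = "healthy" ]; then
      echo "$container is healthy"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "$container not healthy after $attempts checks (last status: ${status:-unknown})" >&2
  return 1
}
```

For example, wait_healthy cassandra-qa && make bootstrap-qa populates the database only once Cassandra is ready.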

Build and run the QA test environment (for druid-based services)

These steps must be run in the aqs-docker-druid-test-env project folder.

Needed changes:

All changes described here are available in the aqs-docker-druid-test-env

The docker-compose-qa.yml file defines a new docker-compose project composed of a Druid engine and a sample service (editor-analytics in this case). If you take a look at this file, you will see that the service-specific part can be customized to add any additional Druid-based service to this dockerized env. At the moment, some of these services are already included:

A service has to be added to define the Druid container. For this service we also add a healthcheck property that defines how to tell when Druid is available. That way, service containers start only once the database is available (avoiding failures from trying to connect before it is ready):

druid:
 image: bpirkle/aqs-docker-druid-test-env:latest
 container_name: druid-qa
 ports:
  - "8888:8888"
  - "8082:8082"
 networks:
  - network1
 healthcheck:
  interval: 10s
  retries: 9
  timeout: 90s
  test:
   - CMD-SHELL
   - nc -z 127.0.0.1 8888

And another one for each service you want to add to this test environment (in this case we are adding editor-analytics):

 editor-analytics:
  image: editor-analytics
  container_name: editor-analytics-qa
  ports:
   - "8094:8080"
  networks:
   - network1
  depends_on:
   druid:
    condition: service_healthy
  • New targets in the Makefile (inside the aqs-docker-druid-test-env-build folder):
startup-qa:
	docker-compose -f docker-compose-qa.yml up -d
shutdown-qa:
	docker-compose -f docker-compose-qa.yml down

Needed steps:

  • Once you have customized both files (docker-compose-qa.yml and Makefile) according to the changes above, you can build the new test environment docker image (from the aqs-docker-druid-test-env-build folder):
docker build -t aqs-docker-druid-test-env .
  • After creating the image, you can start the docker-compose project (from the aqs-docker-druid-test-env-run folder):
 make startup-qa

The QA test env and the included services are now running as docker containers within a docker compose project named aqs-docker-druid-test-env-qa.

Each service will be listening on a different port according to the service configuration. For example, in this case editor-analytics will be listening on the port 8094.

Demos

Notes for developers

  • Developers will need to keep an additional config-devel.yaml that differs from config.yaml only in the Cassandra host (“localhost”) and in listen_address (“localhost”), so that the service can connect to the test env during development (we’ll have to do something similar for Druid-based services). To run the service using an alternative config file:
 make clean build && ./geo-analytics --config config-devel.yaml
    The two files differ as follows:
 config.yaml                  config-devel.yaml
 listen_address: 0.0.0.0      listen_address: localhost
 cassandra.hosts:             cassandra.hosts:
 - cassandra                  - localhost
  • Dockerfile (created by blubber) and this config-devel.yaml should be added to .gitignore
  • A .dockerignore file should be added to the repo so that an already built service binary is not used when creating the service docker image (the binary must not be copied into the dockerized environment because it has to be built inside the right docker container)

To keep in mind

  • We need to change the host of the database (to the name of the service, cassandra) to be able to connect from the service container to the cassandra one
    • This doesn’t matter for production because the config file is replaced automatically on deployment. We can use cassandra in the default config file and then create config-devel.yaml to use “localhost” when developing
  • We need to allow remote connections (listen on 0.0.0.0:8080 instead of localhost:8080) so that our host can connect to the service (Postman, curl, . . .)
    • 0.0.0.0 should be the default, so we can use this value in the default config file for all the services. That value is automatically replaced by the right one when deploying to production
  • In the end we need to keep two cassandra containers: the one for developing and the one for QA (a docker compose with two services)
    • It’s not really a problem because developers usually don’t start the QA one and QA engineers don't use the development one
    • Anyway, we can keep both test envs installed side by side (aqs-docker-test-env and aqs-docker-test-env-qa). The only thing to keep in mind is that we cannot run both at the same time because they listen on the same port.

Adding a Wiki

Data generated by a wiki goes through two main data pipelines before it ends up in AQS. Readership data flows from our Varnish caches, through Kafka, into private and public datasets. The public pageviews data ends up in the Pageview API, served by AQS from Cassandra. Editing data flows more slowly, through monthly whole-history loads from MediaWiki database replicas. This ends up in Druid as mediawiki_history_reduced and is also served by AQS. (TODO: linkify all this)

To add a new wiki, you need to edit the include lists for these two pipelines:

Developing a New Endpoint

This is roughly how writing a new AQS endpoint goes. Note that some endpoints may use Druid rather than Cassandra, so that process may be different:

  • Development
    • Write the Oozie job to move data from Hadoop to Cassandra; Verify the output is correct by outputting to plain JSON/test Hive table; The Oozie job will be unable to load into Wikimedia Cloud Cassandra instances (You just have to hope that the loading works)
    • Write the AQS endpoint, which includes the table schema spec and unit tests
  • Testing
    • Tunnel to any of the aqs-test Cassandra instances (i.e., aqs-test1001.analytics.eqiad1.wikimedia.cloud, ..., aqs-test1010.analytics.eqiad1.wikimedia.cloud)
    • Create a keyspace in this cloud Cassandra instance of the same name as is detailed in the Oozie job properties file; Insert some data into the table
    • From any of the aqs-test machines, run your version of AQS with the new endpoint, pointing to one of the Cassandra instances (e.g., 172.16.4.205); Test by manually running queries against your local instance (i.e., localhost:7231)
  • Productionize
    • Deploy the Oozie job, but don't run it
    • Deploy AQS to point to the new data; Once it is running, it will automatically create the relevant keyspace in the production Cassandra instance
    • Manually run a few queries against the new AQS endpoint (i.e., aqs1010; localhost:7232), and ensure that they all respond with 404 (because no data is loaded into Cassandra yet)
    • Run the Oozie job to load the data into Cassandra
    • Manually run a few queries against the new AQS endpoint, and ensure that they all respond with the proper responses, as the data should now be loaded into Cassandra
    • Submit a pull request for the restbase repository on GitHub with the schema of the new endpoint added; Now the endpoint should be publicly accessible!
    • Update all relevant documentation
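
The manual checks in the Productionize steps (404 before the data is loaded, a proper response afterwards) can be scripted. This is a minimal sketch, not an existing AQS tool: expect_status is a hypothetical name, and the endpoint path in the usage note is a placeholder for whatever the new endpoint is.

```shell
# Hypothetical helper: succeeds only if a GET on the URL returns the
# expected HTTP status code. Uses curl's %{http_code} write-out variable.
expect_status() {
  local expected="$1" url="$2" actual
  actual="$(curl -s -o /dev/null -w '%{http_code}' "$url")"
  if [ "$actual" = "$expected" ]; then
    echo "OK   $actual $url"
  else
    echo "FAIL $actual (expected $expected) $url" >&2
    return 1
  fi
}
```

Before running the Oozie job, expect_status 404 "http://localhost:7232/<new-endpoint-path>" should succeed; after the load, the same call with 200 should succeed instead.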

Deployment

This step-by-step guide covers deploying to both staging (beta) and production. Watch out for specific differences between beta and prod in each step of this section.

Step 0: Testing AQS locally

With cassandra

Testing your change in our staging environment (beta) requires either having a stable patch merged in AQS and deployed through scap, or a lot of git black magic and messing around that you shouldn't do. A good solution for quick testing is setting up your own mini AQS on your local machine, where you can make changes to the APIs instantly, update dependencies, and load data without switching between machines or sending gerrit patches.

  1. Install Zookeeper (brew install zookeeper on mac).
  2. Install Cassandra (brew install cassandra@2.2 on mac. Be aware that without the @2.2, brew will install version 3, which we don't have yet in production).
  3. Make sure you're using the right Java version (8). Cassandra will complain a lot about Java 9 and 10, so make sure that your JAVA_HOME environment variable points to your Java 8 installation (/usr/libexec/java_home -V will show the versions currently installed). To do that, set export JAVA_HOME=`/usr/libexec/java_home -v 1.8`
  4. Once the Cassandra service is running, start AQS by running the server with the default config provided on the repo: ./server.js -c config.example.wikimedia.yaml
  5. To load data or make changes in Cassandra, run cqlsh

With druid

The easiest way to test the AQS Druid integration is to use the production Druid cluster (AQS can only query Druid, so no data loss is possible). To do so, follow these steps:

  1. Start an SSH tunnel between your machine and the druid-public broker: ssh -N druid1010.eqiad.wmnet -L 8082:druid-public-broker.svc.eqiad.wmnet:8082
  2. Start AQS locally with the appropriate configuration, as suggested in this gerrit patch (WARNING: Update the datasources: mediawiki_history to the correct value for testing).
  3. You should be able to query your local AQS for druid-oriented queries (for instance: http://localhost:7231/analytics.wikimedia.org/v1/edits/aggregate/all-projects/all-editor-types/all-page-types/monthly/20180101/20190101)

Step 1: Update the AQS deploy repository

Note: This process requires having Docker installed, because the build starts a Docker instance.

Note: Even if you're deploying to staging (beta), the code you want to deploy should be merged to master. Otherwise, the whole deployment process won't work.

  • If it's the first time you deploy:
    • Get the deploy repository: git clone ssh://$USER@gerrit.wikimedia.org:29418/analytics/aqs/deploy .
    • Make sure AQS source git repo has the deploy.dir config variable set (see Services/FirstDeployment#Local Git).
  • Run npm install in the source repository and make sure that no error is returned. Do the same with npm test.
  • Are you deploying a new endpoint? You need to add a bit of code to the fake data script that matches the x-amples definition in AQS's v1 yaml. Otherwise endpoint checks will fail on deployment. An alternative is to set x-monitor to false, in which case your new endpoints won't get checked (tip: while this fixes the deploy, not testing the endpoint is not advised).
  • Then (regardless if first time or not):
    • Make sure both aqs-deploy and aqs repositories are on master, have latest, are clean, including submodules updated
    • Follow Services/Deployment#Preparing_the_Deploy_Repository (basically, run ./server.js build --deploy-repo --force --review -c config.test.yaml in the source folder).
    • Check that src's sha1 in the review corresponds to the code you want to deploy.
    • Merge the newly created change to aqs deploy repo to master.

Issues with "src" path

Remove src path from deploy repo. (We're not sure why this was added to the docs, we should discuss and explain or remove.)

Issues with git review

The build uses git review only if you pass it the --review param. If you omit it, the build commits the change but does not try to submit or push a patch. Sometimes the build hangs. In this case, check the sync-repo branch of the deploy repository: it should contain the commit, which can then be pushed to gerrit. It's OK to kill the build if it has been hanging for a while.

NPM vulnerabilities

Whenever possible, run npm audit and make sure that no dependencies pose a threat to the service. Most vulnerabilities are solved by upgrading packages, but in some cases they correspond to a second- or third-level dependency that can only be upgraded by forcing versions in package-lock.json. Forcing versions can be avoided if you are certain that the code carrying the vulnerability will not be run by AQS (task T207945 is an example of this). If this is not the case, you can enforce the new version by editing package-lock.json and making sure that the version change doesn't break tests.

See note about hoek npm vulnerability here: https://phabricator.wikimedia.org/T206474

NPM has more information about dealing with vulnerabilities.

Step 2: Deploy using scap

  • Tell the #wikimedia-analytics and #wikimedia-operations IRC channels that you are deploying (use !log for instance)
  • Ssh into the deployment machine that suits your needs:
    • For staging (beta) use: deployment-deploy01.deployment-prep.eqiad1.wikimedia.cloud.
    • For production use: deployment.eqiad.wmnet .
  • Execute scap:
    • cd /srv/deployment/analytics/aqs/deploy
    • git pull
    • git submodule update --init
    • scap deploy -e aqs "YOUR DEPLOYMENT MESSAGE"
    • [optional] To see more detailed error logs during deployment, run scap deploy-log from /srv/deployment/analytics/aqs/deploy while you deploy.

Note: after T156049, scap deploys only to aqs1010 (or deployment-aqs01 in the case of beta) as a first step (canary) and asks for confirmation before proceeding to the rest of the cluster. After that, it deploys to one host at a time, serially. You can choose whether scap asks for confirmation after each host, but telling it to proceed to all the other hosts (after the canary) does not deploy to all of them at the same time, since the serial constraint still holds. Each host is de-pooled from the load balancer before the aqs restart and re-pooled afterwards.

Step 3: Test

Staging (beta)

Beta thus far has only a modest dataset, with pageviews to the Barack Obama page in 2016 from es.wikipedia, en.wikipedia and de.wikipedia.

You can run some queries like the following to see that aqs is running well:

 wget http://localhost:7232/analytics.wikimedia.org/v1/pageviews/ 
 curl  http://localhost:7232/analytics.wikimedia.org/v1/pageviews/per-article/de.wikipedia/all-access/all-agents/Barack_Obama/daily/2016010100/2016020200

Should return daily records

curl  http://localhost:7232/analytics.wikimedia.org/v1/pageviews/per-article/de.wikipedia/all-access/all-agents/Barack_Obama/monthly/2016010100/2016020200

Should return monthly records

curl  http://localhost:7232/analytics.wikimedia.org/v1/pageviews/aggregate/en.wikipedia/all-access/all-agents/daily/2015100100/2016103100

Should return aggregate data for en.wikipedia, if any

curl http://localhost:7232/analytics.wikimedia.org/v1/pageviews/aggregate/es.wikipedia/all-access/all-agents/monthly/2015100100/2016103100

Should return monthly aggregate data for es.wikipedia

Production

From one of the deployed machines, run /srv/deployment/analytics/aqs/deploy/test/test_local_aqs_urls.sh.

Troubleshooting Deployment

Issues with deployment to labs deploy

We had to run:

SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh -l deploy-service deployment-aqs01.deployment-prep.eqiad1.wikimedia.cloud

Issues with scap

  • Depool machine
  • Delete deployment directory
  • Run puppet
  • Try to deploy again.

Check deploy logs:

scap deploy-log -v

Check AQS logs:

sudo journalctl -u aqs 

Journalctl might not have a lot of information since by default Restbase is configured to push logs to logstash. So in order to disable this behavior, remove the following from the AQS configuration file under /etc:

logging:
  name: aqs
  level: warn
  streams:
-  # XXX: Use gelf-stream -> logstash
-  - type: gelf
-    host: localhost
-    port: 12201

Manual AQS restart:

sudo systemctl restart aqs

Data filter before Cassandra load

Some data used by AQS comes from Cassandra. We use Airflow+Spark+HQL to feed the tables on Cassandra.

On HDFS, we have implemented a disallow table, `wmf.disallowed_cassandra_articles`, to filter out sensitive pages that we don't want to appear in the top-viewed articles of a wiki.

In fact, some attacks aim at manipulating the number of views per article. For example, the goal could be pushing traffic to a 3rd party site or adding an offensive word to the top list, which millions of users view.

The table is applied to the following Cassandra tables:

  • pageview_per_article_daily
  • pageview_top_articles_daily
  • pageview_top_percountry_daily
  • pageview_top_articles_monthly

To update this list of disallowed articles:

  • update the TSV in analytics/refinery: `static_data/cassandra/disallowed_cassandra_articles.tsv`
  • prepare a patch and deploy it

For emergency procedures, you could run the following (Note that you still need a Gerrit patch as the next deployment of analytics/refinery will override your change):

# Fetch the disallowed list
ssh an-launcher1002.eqiad.wmnet
export TSV_FILENAME=disallowed_cassandra_articles.tsv
export TSV_HDFS_PATH="/wmf/refinery/current/static_data/cassandra/${TSV_FILENAME}"
hdfs dfs -cat $TSV_HDFS_PATH > $TSV_FILENAME
# Add or remove some entries (beware, tabs are expected between columns, not spaces)
vim $TSV_FILENAME
# Push the file back to HDFS
sudo -u hdfs kerberos-run-command hdfs hdfs dfs -put -f $TSV_FILENAME $TSV_HDFS_PATH
sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chmod +r $TSV_HDFS_PATH
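
Since the file expects tabs between columns, it can help to check the edited TSV before pushing it back to HDFS. The sketch below is an illustration only: check_tsv_columns is a hypothetical name, and the expected number of columns is left as a parameter because it is not stated here; use the column count of the existing file.

```shell
# Hypothetical helper: prints every line whose number of tab-separated
# fields differs from the expected count, and fails if any such line exists.
# Arguments: path to the TSV file, expected number of columns.
check_tsv_columns() {
  local file="$1" expected="$2"
  awk -F '\t' -v n="$expected" '
    NF != n { printf "line %d: %d field(s), expected %d\n", NR, NF, n; bad = 1 }
    END { exit bad + 0 }
  ' "$file"
}
```

For example, run check_tsv_columns "$TSV_FILENAME" N after editing, with N replaced by the real column count. A line where a space was typed instead of a tab shows up with too few fields.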

API documentation

To create API reference docs that are reliable and easy to update, AQS 2.0 services use:

Documentation monitoring

The AQS documentation site uses Matomo to collect analytics data. To view the data, follow the instructions on the wiki to log in to Matomo, and select "AQS documentation" from the "websites" dropdown.

Updating the docs

When making a change to an AQS service, you must run these commands locally to update the API spec before submitting the change. As of 2023, there is no integration to update the spec automatically, so each patch must include the corresponding changes to the API spec when needed. Remember that the docs rely on code annotations, so make sure to keep the annotations up to date with any code changes.

1. Install swag:

go install github.com/swaggo/swag/cmd/swag@latest

2. Generate the spec:

make docs

Swag outputs the spec in YAML and JSON formats to a /docs directory.

Reading the specification

You can view the spec using the API spec reader.

Setting up docs for a new service

To set up API docs for a new AQS service:

  1. Annotate main.go (example, style guide): Anywhere in main.go, add annotations to document general information about the API.
  2. Annotate handler (example, style guide): Add annotations to any code file to document an endpoint. Endpoint annotations should be stored as close as possible to the code they describe. The block of endpoint annotations must end on a line immediately preceding a function.
  3. Annotate entity (example, style guide): Swag automatically gets information about the response format from the struct. To complete the schema in the docs, add these elements to the struct definition:
    1. an example value within the JSON encoding definition using the syntax example:"example value". Note that these examples will be used in the sandbox and should return valid data.
    2. a description of the attribute as an inline comment
  4. Add a make docs command: Add these lines to the service's Makefile:
    docs:  ## creates openapi spec (requires swag)
    	swag init || (echo "Hint: If you haven't installed swag, run 'go install github.com/swaggo/swag/cmd/swag@latest', then re-run 'make docs'."; exit 1)
    
  5. Generate the spec: Run make docs to generate the spec.
  6. Commit the spec files: After generating the spec, commit the /docs directory to the source code repository. Since spec generation does not currently run in CI, the API spec must be stored in the repository in order to be served by the API spec endpoint.
  7. Add an endpoint to serve the API spec (example: main.go, handler, test): To make the docs publicly available, add an endpoint that serves the docs/swagger.json file via service-name/api-spec.json, for example device-analytics/api-spec.json
  8. Route the spec endpoint: As part of setting up routing for the new service in the REST Gateway, ensure that the spec endpoint is served at service-name/api-spec.json, for example device-analytics/api-spec.json.
  9. Add reference documentation to the AQS docs site: Submit a merge request to analytics-api that:
    1. Adds a new Markdown file under /reference using the API reference template.
    2. Adds the new page to the navigation in the config file under API reference.
    3. Adds an entry to the changelog announcing that the new endpoints are available.
    4. Follow the instructions in the README to preview the site locally. Note that the API reference docs won't appear until the API spec endpoint has been routed and deployed.

API reference template

To use this template, replace everything in square brackets, and remove the square brackets. For an example, see the page analytics reference page.

---
title: [Service name]
description: Reference docs and sandbox for [service name]
---

# [Service name]

[Description of what data the service provides and any limitations users should be aware of, such as the date the data is available from. Use h2 headings to separate limitation sections if necessary.]

## [Endpoint name]

<SpecView specUrl="[spec endpoint URL]" path="get [endpoint path, not including https://wikimedia.org/api/rest_v1/metrics/]"/>

[Repeat the h2 section above for each endpoint. If necessary, group the endpoints into groups, using an h2 heading for the group name and h3 headings for the endpoints.]

Spec validation

API specs are validated against the schemas in aqs tests.

Decision records

Historical

AQS was originally developed as a single service with an API proxied through RESTBase. As a part of the goal to deprecate RESTBase, the /metrics endpoints served by AQS 1.0 were migrated to a set of services that do not depend on RESTBase. This project was referred to as AQS 2.0. See phab:T263489 for the original proposal.

Scaling for AQS 1.0