AQS 2.0

From Wikitech

Analytics Query Service (AQS) is the software behind the /metrics family of endpoints in RESTBase. It is a read-only HTTP proxy to results served from Cassandra and Druid. It is currently based on an outdated fork of RESTBase and has received few updates over the years. As part of the goal to deprecate RESTBase, AQS 2.0 is a project to migrate the /metrics endpoints to a set of services that do not depend on RESTBase.

Services

For a complete mapping of /metrics endpoints to new services, see the spreadsheet (public).

Device Analytics
Repository: generated-data-platform/aqs/device-analytics
API specification: swagger.json
Data source: Cassandra
Geo Analytics
Repository: generated-data-platform/aqs/geo-analytics
API specification: swagger.json
Data source: Cassandra
Media Analytics
Repository: generated-data-platform/aqs/media-analytics
API specification: swagger.json
Data source: Cassandra
Page Analytics
Repository: generated-data-platform/aqs/page-analytics
API specification: swagger.json
Data source: Cassandra
Edit Analytics
Repository: generated-data-platform/aqs/edit-analytics
API specification: swagger.json
Data source: Druid
Editor Analytics
Repository: generated-data-platform/aqs/editor-analytics
API specification: swagger.json
Data source: Druid
Common functionality
AQS Assist: generated-data-platform/aqs/aqsassist
service-lib-golang: https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/services/servicelib-golang
Test environments
Cassandra test env: generated-data-platform/aqs/aqs-docker-cassandra-test-env
Druid test env: generated-data-platform/aqs/aqs-docker-druid-test-env
QA
QA test suite: generated-data-platform/aqs/aqs_tests

Project overview

Epic task in Phabricator | API Platform Team workboard | AQS2.0 workboard

  1. Done: Implement the new, stand-alone AQS service(s)
  2. Done: Deploy to k8s
  3. Done: Switch RESTBase from proxying requests to the old AQS service to proxying them to the new k8s-based one
    1. Phase 1: Unique devices endpoint (Device Analytics service)
    2. Phase 2: Pageviews and legacy endpoints (Page Analytics service), editors/by-country endpoint (Geo Analytics service), and media requests endpoints (Media Analytics service)
    3. Phase 3: Edited pages, edits, and bytes difference endpoints (Edit Analytics service) and editors and registered users endpoints (Editor Analytics service)
  4. In progress: Deprecate the http://{project}/api/rest_v1/metrics resources
  5. Eventually phase out the RESTBase /metrics hierarchy

Proposal

From phab:T263489

We propose to break down the rewrite along dataset boundaries — similar to the module structure in RESTBase — with a separate project used to implement each.

  • Device Analytics
  • Page Analytics
  • Media Analytics
  • Geo Analytics
  • Edit Analytics
  • Editor Analytics

The resulting services will be proxied by RESTBase and/or the API Gateway (the former to eventually be deprecated in favor of the latter) in order to maintain complete compatibility with the existing API.

The target language for these implementations is Go. While a complete comparison of JavaScript/Node.js and Go is out of scope for this issue, the (simplified) rationale is:

  • Strong, static typing; Statically typed languages eliminate entire classes of bugs common to dynamic languages, improve security, and make code easier to reason about
  • Ease of use; Go is more obvious, more explicit, and easier to understand. Complicated concepts like concurrency are easier to get right
  • Performance; Service latency can be expected to be both lower and, more importantly, more predictable with Go

Running a service

AQS 2.0 consists of several repositories. Some correspond to individual services that expose APIs. Others correspond to cross-service common functionality or test environments.

Setup

You will need:

Go (aka "golang") is an opinionated language in various ways. Among these is that you're probably much better off keeping your Go code under your "GOPATH" rather than wherever you may be used to keeping code. (There are, of course, always ways for savvy developers to cheat the system. If you choose to do that, any consequences are on you.) On my Mac, I cloned all the AQS 2.0 repositories under ~/go/src/.

Start a service

Each service's README file contains details about running that particular service. In short, you'll need to open several command line (aka "terminal") windows/tabs and run commands in each. The following describes how to run the "pageviews" service. Other services operate similarly.

  • In one terminal, navigate to <GOPATH>/aqs-docker-test-env
  • Run "make startup", wait for it to say "Startup complete", then leave it running
  • In another terminal, also in <GOPATH>/aqs-docker-test-env, run "make bootstrap" and wait for it to complete
  • Navigate (either in that terminal or a different one) to <GOPATH>/pageviews
  • Run "make"
  • Run "./pageviews" (and leave it running)
  • In another terminal, navigate to <GOPATH>/pageviews and run "make test"
  • In your browser, visit http://localhost:8080/metrics/pageviews/per-article/en.wikipedia/all-access/all-agents/Banana/daily/20190101/20190102
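
As an alternative to the browser check in the last step, the same request can be made from a small Go program. This is a minimal sketch, assuming the service is running locally on port 8080 as described above. The helper names perArticleURL and fetch are made up for illustration and are not part of any AQS repository.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
)

// perArticleURL builds the per-article pageviews URL used in the steps above.
// The fixed "all-access/all-agents/daily" segments match the sample request;
// this helper is hypothetical and only meant for local smoke checks.
func perArticleURL(base, project, article, start, end string) string {
	return fmt.Sprintf("%s/metrics/pageviews/per-article/%s/all-access/all-agents/%s/daily/%s/%s",
		base, project, url.PathEscape(article), start, end)
}

// fetch performs a GET request and returns the status code and response body.
func fetch(u string) (int, string, error) {
	resp, err := http.Get(u)
	if err != nil {
		return 0, "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return resp.StatusCode, "", err
	}
	return resp.StatusCode, string(body), nil
}

func main() {
	u := perArticleURL("http://localhost:8080", "en.wikipedia", "Banana", "20190101", "20190102")
	status, body, err := fetch(u)
	if err != nil {
		// Most likely the service is not running yet (see "./pageviews" above).
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println("status:", status)
	fmt.Println(body)
}
```

If the test environment has been bootstrapped and the service is running, the program should print the HTTP status followed by the JSON response body.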

We haven't started the Druid-based endpoint(s) yet, but the process will likely be similar, with perhaps some differences in how to launch the test environment.

Tips and troubleshooting

Because Go is an opinionated language, it may refuse to run over seemingly small things, such as whitespace. If you see something like this:

   goimports: format errors detected

You can execute this to see what Go is unhappy about:

   goimports -d *.go

And this to automatically fix it:

   goimports -w *.go

Our services depend on several packages, including our own “aqsassist”, which is in active development. This means you may sometimes need to update dependencies for your local service to run. You can update all dependencies via:

   go get .

or update specific dependencies via something like:

   go get gitlab.wikimedia.org/repos/generated-data-platform/aqs/aqsassist

Monitoring

Local Testing

This section explains how to prepare and run both local test environments (Cassandra and Druid) so that all the AQS services can be tested with the AQS QA test suite. Each test environment runs as its own docker compose project containing the database and the services that depend on it, so the two projects together cover all the Cassandra-based and Druid-based services.

The steps explained here are ready for almost all the services and both test environments.

At this moment we can build and run two test environments that QA engineers can use to run the AQS test suite:

  • aqs-docker-cassandra-test-env-qa: A Cassandra test environment with the following services/containers: (available here)
    • cassandra-qa: A docker container with a Cassandra database already populated with some sample data
    • device-analytics-qa: A docker container that runs the service listening on port 8090
    • geo-analytics-qa: A docker container that runs the service listening on port 8091
    • media-analytics-qa: A docker container that runs the service listening on port 8092
    • page-analytics-qa: A docker container that runs the service listening on port 8093
  • aqs-docker-druid-test-env-qa: A Druid test environment with the following services/containers: (available here)
    • druid-qa: A docker container with a Druid database already populated with some sample data
    • editor-analytics-qa: A docker container that runs the service listening on port 8094
    • edit-analytics-qa: A docker container that runs the service listening on port 8095

Quick start (for QA engineers)

This quick start guide shows how to build, start and populate the docker-compose project for both testing environments: Cassandra and Druid:

  • Clone all service repositories that belong to the test env you want to run
  • Inside every service project, run make docker_qa to create the service docker image (it will be named after the service: geo-analytics, editor-analytics and so on)
  • For Cassandra test env:
    • Clone the aqs-docker-cassandra-test-env repository
    • Before creating the project, you can change the host port each service listens on. For example, the geo-analytics entry in the docker compose file maps the default port of the service container, 8080, to port 8091 on the host. This is really useful if you want to run and test all the services at the same time
    • Inside the test-env project, run make startup-qa to create the new docker-compose project
    • Inside the test-env project, run make bootstrap-qa to populate cassandra in the new docker-compose project (it takes around 15 minutes to fully populate the database)
    • Take a look at your Docker Dashboard to be sure that a docker compose project called aqs-docker-test-env-qa has been created with all its services running
  • For Druid test env:
    • Clone the aqs-docker-druid-test-env repository
    • Go to the aqs-docker-druid-test-env-build folder and run docker build -t aqs-docker-druid-test-env . (this will create a Druid image already populated with the sample data)
    • Go to the aqs-docker-druid-test-env-run folder and run make startup-qa
    • Take a look at your Docker Dashboard to be sure that a docker compose project called aqs-docker-druid-test-env-qa has been created with all its services running
  • Try making a request (for instance to http://localhost:8091/metrics/editors/by-country/en.wikipedia/100..-edits/2018/11) to check that everything is working (change the port and the request according to the service you want to try)
  • The following are the ports where each service will be listening:
    • aqs-docker-test-env-qa:
      • device-analytics: 8090
      • geo-analytics: 8091
      • media-analytics: 8092
      • page-analytics: 8093
    • aqs-docker-druid-test-env-qa:
      • editor-analytics: 8094
      • edit-analytics: 8095
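
The port assignments above can be kept in one place when scripting requests against the QA environments. The following is a minimal sketch rather than part of the AQS test suite: the names qaPorts, baseURL and serviceNames are made up for illustration, and the ports are the host ports listed above.

```go
package main

import (
	"fmt"
	"sort"
)

// qaPorts maps each service to the host port listed in the quick start guide.
var qaPorts = map[string]int{
	"device-analytics": 8090,
	"geo-analytics":    8091,
	"media-analytics":  8092,
	"page-analytics":   8093,
	"editor-analytics": 8094,
	"edit-analytics":   8095,
}

// baseURL returns the local base URL for a service, or an error if the
// service name is not in the table above.
func baseURL(service string) (string, error) {
	port, ok := qaPorts[service]
	if !ok {
		return "", fmt.Errorf("unknown service %q", service)
	}
	return fmt.Sprintf("http://localhost:%d", port), nil
}

// serviceNames returns the known service names in alphabetical order.
func serviceNames() []string {
	names := make([]string, 0, len(qaPorts))
	for name := range qaPorts {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	for _, name := range serviceNames() {
		u, _ := baseURL(name) // names come from qaPorts, so no error is expected
		fmt.Printf("%-17s %s\n", name, u)
	}
	// The sample request from the quick start, built from the table.
	geo, _ := baseURL("geo-analytics")
	fmt.Println(geo + "/metrics/editors/by-country/en.wikipedia/100..-edits/2018/11")
}
```

If you remap a port in the docker compose file, update the table accordingly.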

Full guide (in case you want to customize these environments)

This full guide describes all the steps needed to create a docker compose project made up of the geo-analytics service and the Cassandra test env, as a sample. All the service-related steps can be applied to any other AQS service to run it using docker. The test environments (Cassandra and Druid) can be extended following the same pattern we use here to add more services to the final docker compose project.

Service config changes

These steps modify the config.yaml file in geo-analytics

Needed changes:

  • config.yaml file: We need to change the cassandra hostname and the service listen_address so that the new AQS test env runs properly via docker.

You must change the cassandra host to "cassandra". That is the name of the service we will set in the docker compose config file used to run the QA test environment (cassandra + geo). It is needed so that the service container (geo-analytics in this case) can connect to the cassandra one. At the same time, change the listen_address property to 0.0.0.0 to accept connections from outside the service container (so that you can run the QA test suite from your host).

 . . .
 listen_address: 0.0.0.0
 . . .
 cassandra:
   hosts: cassandra
 . . .

Create the service image

These steps describe how it was done for the geo-analytics service. The same steps apply to any other AQS service.

Needed changes:

 docker_qa: ## create a docker container to run QA test via Docker
  curl -s "https://blubberoid.wikimedia.org/v1/production" -H 'content-type: application/yaml' --data-binary @".pipeline/blubber.yaml" > Dockerfile
  docker build -t geo-analytics .

Needed steps

  • If not available, put the blubber file into the .pipeline folder (change the entry_point according to the service where you are putting this file)
  • If not available, put the config file into the .pipeline folder (this file is the same for all the services)
  • Add the new target to the Makefile
  • Now you can build the service image with the following command:
make docker_qa

Build the QA test environment (for cassandra-based services)

These steps must be run in the aqs-docker-test-env project folder.

Needed changes:

All changes described here are available in the aqs-docker-cassandra-test-env

The docker-compose-qa.yml file defines a new docker-compose project composed of a cassandra engine and a sample service (geo-analytics in this case). If you take a look at this file, you will see that the service-specific part can be customized to add any other cassandra-based service you want to include in this dockerized env. At the moment some of these services are already included:

A service has to be added to define the cassandra container. For this specific service we also add a healthcheck property that defines how to tell when cassandra is available. That way, service containers start only once the database is available (avoiding failures caused by trying to connect to it too early):

 cassandra:
    image: cassandra:3.11
    container_name: cassandra-qa
    ports:
      - "9042:9042"
    volumes:
      - .:/env
    networks:
      - network1
    healthcheck:
      test: ["CMD-SHELL", "[ $$(nodetool statusgossip) = running ]"]
      interval: 20s
      timeout: 10s
      retries: 5

Add another one for each service you want to include in this test environment (in this case we are adding geo-analytics):

 geo-analytics:
  image: geo-analytics
  container_name: geo-analytics-qa
  ports:
   - "8091:8080"
  networks:
   - network1
  depends_on:
      cassandra:
        condition: service_healthy
  • New targets in the Makefile:
 bootstrap-qa: schema-qa load-qa
 schema-qa:
    docker exec -it cassandra-qa cqlsh -f /env/schema.cql
 load-qa:
    docker exec -it cassandra-qa cqlsh -f /env/test_data.cql --cqlshrc=/env/cqlshrc
 startup-qa:
    docker-compose -f docker-compose-qa.yml up -d

Needed steps:

Once you have customized both files according to the specified changes (docker-compose-qa.yml and Makefile), you can build and run the new dockerized test environment:

  • Create the docker-compose project:
 make startup-qa
  • Populate the cassandra container
make bootstrap-qa

The QA test env and the included services are now running as docker containers within a docker compose project. The compose project is named aqs-docker-test-env-qa.

Each service listens on a different port according to the service configuration. For example, in this case geo-analytics listens on port 8091.

Build and run the QA test environment (for druid-based services)

These steps must be run in the aqs-docker-druid-test-env project folder.

Needed changes:

All changes described here are available in the aqs-docker-druid-test-env

The docker-compose-qa.yml file defines a new docker-compose project composed of a Druid engine and a sample service (editor-analytics in this case). If you take a look at this file, you will see that the service-specific part can be customized to add any other Druid-based service you want to include in this dockerized env. At the moment some of these services are already included:

A service has to be added to define the Druid container. For this specific service we also add a healthcheck property that defines how to tell when Druid is available. That way, service containers start only once the database is available (avoiding failures caused by trying to connect to it too early):

druid:
 image: bpirkle/aqs-docker-druid-test-env:latest
 container_name: druid-qa
 ports:
  - "8888:8888"
  - "8082:8082"
 networks:
  - network1
 healthcheck:
  interval: 10s
  retries: 9
  timeout: 90s
  test:
   - CMD-SHELL
   - nc -z 127.0.0.1 8888

Add another one for each service you want to include in this test environment (in this case we are adding editor-analytics):

 editor-analytics:
  image: editor-analytics
  container_name: editor-analytics-qa
  ports:
   - "8094:8080"
  networks:
   - network1
  depends_on:
      druid:
        condition: service_healthy
  • New targets in the Makefile (inside the aqs-docker-druid-test-env-build folder):
startup-qa:
	docker-compose -f docker-compose-qa.yml up -d
shutdown-qa:
	docker-compose -f docker-compose-qa.yml down

Needed steps:

  • Once you have customized both files according to the specified changes (docker-compose-qa.yml and Makefile), you can build the new test environment docker image (from the aqs-docker-druid-test-env-build folder):
docker build -t aqs-docker-druid-test-env .
  • After creating the image, you can start the docker-compose project (from the aqs-docker-druid-test-env-run folder):
 make startup-qa

The QA test env and the included services are now running as docker containers within a docker compose project. The compose project is named aqs-docker-druid-test-env-qa.

Each service listens on a different port according to the service configuration. For example, in this case editor-analytics listens on port 8094.

Demos

Notes for developers

  • Developers will need to keep an additional config-devel.yaml that differs from config.yaml only in the cassandra host (“localhost”), so that the service can connect to the test env while developing, and in listen_address (“localhost”). We’ll have to do something similar with Druid-based services. This is the way to run the service using an alternative config file:
 make clean build && ./geo-analytics --config config-devel.yaml
 config.yaml                  config-devel.yaml
 listen_address: 0.0.0.0      listen_address: localhost
 cassandra.hosts:             cassandra.hosts:
 - cassandra                  - localhost
  • Dockerfile (created by blubber) and this config-devel.yaml should be added to .gitignore
  • A .dockerignore file should be added to the repo to avoid using an already built service binary when creating the service docker image (the already built binary mustn't be included in the dockerized environment because it has to be built inside the right docker container)

To keep in mind

  • We need to change the host of the database (to the name of the service, cassandra) to be able to connect from the service container to the cassandra one
    • This is not a problem because the config file is replaced automatically when deploying to production. We can use cassandra in the default config file and then create config-devel.yaml to use “localhost” when developing
  • We need to allow remote connections (listen on 0.0.0.0:8080 instead of localhost:8080) so that our host can connect to the service (Postman, curl, . . .)
    • 0.0.0.0 should be the default, so we can use this value in the default config file for all the services. That value will be automatically replaced by the right one when deploying to production
  • In the end we need to keep two cassandra containers: the one for developing and the one for QA (a docker compose with two services)
    • It’s not really a problem because developers usually don’t start the QA one and QA engineers don't use the development one
    • Anyway, we can keep both test envs side by side (aqs-docker-test-env and aqs-docker-test-env-qa). The only thing to keep in mind is that we cannot run both at the same time because they listen on the same port.

API documentation

To create API reference docs that are reliable and easy to update, AQS 2.0 services use swag to generate an OpenAPI specification based on a mix of code annotations and the code itself. For more information about swag, see the API documentation page on mediawiki.org.

Updating the docs

When making a change to an AQS service, you must run these commands locally to update the API spec before submitting the change. As of 2023, there is no integration to update the spec automatically, so each patch must include the corresponding changes to the API spec when needed. Remember that the docs rely on code annotations, so make sure to keep the annotations up to date with any code changes.

1. Install swag:

go install github.com/swaggo/swag/cmd/swag@latest

2. Format the annotations

You can use this command to format the annotations automatically:

swag fmt

3. Generate the spec:

make docs

Swag outputs the spec in YAML and JSON formats to a /docs directory.

Reading the specification

You can view the spec using the API spec reader.

Setting up docs for a new service

To set up API docs for a new AQS service:

  1. Annotate main.go (example): Anywhere in main.go, add annotations to document general information about the API.
  2. Create api.md (example): Write the API description in a separate markdown file indicated in the @description.markdown annotation in main.go.
  3. Annotate handler (example): Add annotations to any code file to document an endpoint. Endpoint annotations should be stored as close as possible to the code they describe. The block of endpoint annotations must end on a line immediately preceding a function.
  4. Annotate entity (example): Swag automatically gets information about the response format from the struct. To complete the schema in the docs, add these elements to the struct definition:
    1. an example value within the JSON encoding definition using the syntax example:"example value"
    2. a description of the attribute as an inline comment
  5. Generate the spec: Run swag to generate the spec.
  6. Add a make docs command: Add these lines to the service's Makefile:
    docs:  ## creates openapi spec (requires swag)
    	swag init --markdownFiles . || (echo "Hint: If you haven't installed swag, run 'go install github.com/swaggo/swag/cmd/swag@latest', then re-run 'make docs'."; exit 1)
    
  7. Add an endpoint to serve the API spec (example: main.go, handler, test): To make the docs publicly available, add an endpoint that serves the docs/swagger.json file via service-name/api-spec.json, for example device-analytics/api-spec.json

See also