Kubernetes/Kubernetes Workshop/Load Testing

From Wikitech

The basic idea is to use a program like apachebench (in package apache2-utils) or hey (https://github.com/rakyll/hey) to generate load on the service. Both programs retrieve the same URL many times with a configurable concurrency factor, say 10,000 requests total at a concurrency of 10, and then report requests per second, errors, latencies, and latency histograms. We will use the calculator-service demo service for this walkthrough.
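As a sketch, the equivalent invocations of the two tools look like this; both take the total request count with -n and the concurrency with -c. The URL and NodePort are placeholders taken from the minikube examples later on this page; the commands are printed rather than executed here, so they can be run by hand against a live service.

```shell
# Placeholder target; substitute your own service IP and NodePort.
URL='http://192.168.49.2:32459/api?2+3'

# 10,000 requests total at a concurrency of 10, with ab and with hey:
echo "ab  -q -n 10000 -c 10 $URL"
echo "hey -n 10000 -c 10 $URL"

# At concurrency 10, each of the 10 workers accounts for about 10000/10 requests:
per_worker=$((10000 / 10))
echo "requests per worker: $per_worker"
```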

Minikube normally pulls Docker images from Docker Hub or another registry, but it can also use a local image if the pod's imagePullPolicy is set to Never. We can build a local Docker image for the calculator-service for easier manipulation and faster updates:

  • check out the calculator-service repo from Gerrit or GitHub
  • eval $(minikube docker-env) # point the docker CLI at minikube's internal docker daemon
  • docker images # to check that we now see minikube's images
  • docker build --tag wmfcalc-mk3 . # run from the repo checkout

The calculator-service is simple and should not need much CPU. Let's start with a low CPU allocation in Kubernetes: CPU = 100m. calculator-service reports its memory footprint on its /metrics page, and it is fairly low (<14 MB), so we can set memory = 16 MiB. Both values are set in the deployment file:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: calc
  labels:
    app: calc
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
  selector:
    matchLabels:
      app: calc
  template:
    metadata:
      labels:
        app: calc
    spec:
      containers:
       - name: calc
         image: docker-registry.wikimedia.org/wikimedia/blubber-doc-example-calculator-service:stable
         imagePullPolicy: Always
         resources:
           requests:
             memory: "16Mi"
             cpu: "100m"
           limits:
             memory: "16Mi"
             cpu: "100m"

We start with 1 replica to establish a baseline, note the requests per second, then increase the number of replicas and see whether and how the service scales.
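The scale-then-measure loop can be sketched as follows. It only prints the commands to run (the deployment name calc comes from the manifest above, the URL from the examples below), so the ab numbers can be collected once per replica count on a machine with kubectl and ab available.

```shell
# Print one scale command plus one measurement command per replica count.
# Deployment name and URL match the examples on this page; adjust as needed.
for replicas in 1 2 4 8 16; do
  echo "kubectl scale deployment calc --replicas=$replicas"
  echo "ab -q -n 1000 -c 8 'http://192.168.49.2:32459/api?2+3' | grep 'Requests per second'"
done
```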

Start the deployment and expose it as a service (the examples below assume a NodePort on 32459).

Then run tests with ab: first a simple math expression, then a more complicated one, one with a parsing error, and one that induces the length error.

Useful URLs:

  • curl 'http://192.168.49.2:32459/api?2+3' # to test
  • curl 'http://192.168.49.2:32459/metrics' # to get metrics
  • curl 'http://192.168.49.2:32459/api?2+38+(44)(64)((66555/2-3))' # more complex
  • curl 'http://192.168.49.2:32459/api?2+38+(44)(64)((66^555/2-3))' # syntax error
  • curl 'http://192.168.49.2:32459/api?2+38+(44)(64)((66555/2-3))aaaaaaaaaaaa' # length error (input is limited to 60 characters)
  • ab -q -n 1000 -c 8 'http://192.168.49.2:32459/api?2+3' | grep Requests
    Requests per second:    168.51 [#/sec] (mean)
  • curl 'http://192.168.49.2:32459/metrics' # metrics again, after the load test

Summary: about 80 rps; memory is at 13 MB. See the Appendix for more data.

Now increase the number of replicas to 2, 4, 8, then 16. With ab this plateaus at about 1,000 rps, but hey measures more, up to around 1,900 rps.

Overall, a configuration with 2 replicas using 100m CPU and 16 MiB memory will be able to serve 150+ rps with some high availability, and is a good baseline for our first release.

Testing with k6.io

k6.io is an external testing service. One can record test sequences with a browser or code them in JavaScript. k6.io has test execution machines worldwide, mostly hosted at AWS.

The service to test needs to be externally available. In this test we use our k8s installation at WMCS from chapter 6, but in addition we need a floating IP to provide external access.

  1. Install k8s on WMCS (see chapter 6)
  2. Deploy calc/1.2
  3. Define a service, using a nodeport. Find the port mapped (30000+)
  4. Map the floating IP to one of the nodes
  5. Open the firewall for the port from Step 3
  6. Test access from the Internet: curl http://{floating ip}:{port}/api?2+6 and http://{floating ip}:{port}/metrics
  7. Log into k6.io - a trial account should work
  8. Run a script to test calc. Here is an example created from their library: a 5-minute test, 1 minute ramping up to 20 virtual users, then 3 minutes at 20 virtual users, then ramping down. The test itself is a simple GET, followed by a 1-second sleep call.
    import { sleep } from 'k6'
    import http from 'k6/http'
    
    // See https://k6.io/docs/using-k6/options
    export const options = {
      stages: [
        { duration: '1m', target: 20 },
        { duration: '3m', target: 20 },
        { duration: '1m', target: 0 },
      ],
      thresholds: {
        http_req_failed: ['rate<0.02'], // http errors should be less than 2%
        http_req_duration: ['p(95)<2000'], // 95% requests should be below 2s
      },
      ext: {
        loadimpact: {
          distribution: {
            'amazon:us:ashburn': { loadZone: 'amazon:us:ashburn', percent: 100 },
          },
        },
      },
    }
    
    export default function main() {
      let response = http.get('http://185.15.56.95:30162/api?2+3')
      sleep(1)
    }
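Steps 3 to 6 above can be sketched as shell commands, collected in variables and printed so they can be run by hand on the WMCS control node. The service/deployment name calc and container port 8080 are assumptions; the floating-IP placeholders are left as-is.

```shell
# Commands for exposing calc externally (steps 3-6 above); printed, not run.
expose_cmd="kubectl expose deployment calc --type=NodePort --port=8080"
# Find the 30000+ NodePort that was assigned:
port_cmd="kubectl get service calc -o jsonpath='{.spec.ports[0].nodePort}'"
# Test from outside once the firewall is open:
test_cmd="curl 'http://{floating ip}:{port}/api?2+6'"
printf '%s\n' "$expose_cmd" "$port_cmd" "$test_cmd"
```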
    

(Screenshot: Run 1 on k6.io)

Output from calc's metrics collection

$ curl http://185.15.56.95:30162/metrics
start_time 1618522447.9971263
wellformed_total{method="get"} 23669
wellformed_total{method="post"} 0
nonwellformed_total 0
memory 18292736
duration_bucket{le="1"} 6
duration_bucket{le="2"} 562
duration_bucket{le="4"} 11376
duration_bucket{le="8"} 11488
duration_bucket{le="16"} 200
duration_bucket{le="32"} 27
duration_bucket{le="32+"} 13
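As a sanity check on the histogram above: the duration_bucket counts are not monotonically increasing, so in this metrics format they appear to be per-bucket rather than cumulative (an assumption about calc's metrics format). Summing them should then land close to wellformed_total, with any small difference being requests still in flight when /metrics was read:

```shell
# Sum the duration_bucket counts from the /metrics output above.
buckets="6 562 11376 11488 200 27 13"
total=0
for count in $buckets; do
  total=$((total + count))
done
echo "histogram total: $total requests"   # close to wellformed_total (23669)
```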

More test runs:

  • 50 users from São Paulo, Brazil (screenshot: Run from São Paulo)
  • 50 users from Mumbai, India (screenshot: Run from Mumbai)

Appendix:

One Replica

ab -q -n 1000 -c 1 http://192.168.49.2:32459/api?2+3 | grep Requests
Requests per second:    129.23 [#/sec] (mean)

ab -q -n 1000 -c 2 http://192.168.49.2:32459/api?2+3 | grep Requests
Requests per second:    159.30 [#/sec] (mean)

ab -q -n 1000 -c 4 http://192.168.49.2:32459/api?2+3 | grep Requests
Requests per second:    42.55 [#/sec] (mean)

ab -q -n 1000 -c 4 http://192.168.49.2:32459/api?2+3 | grep Requests
Requests per second:    164.13 [#/sec] (mean)

ab -q -n 1000 -c 8 http://192.168.49.2:32459/api?2+3 | grep Requests
Requests per second:    138.44 [#/sec] (mean)

ab -q -n 1000 -c 8 http://192.168.49.2:32459/api?2+3 | grep Requests
Requests per second:    168.51 [#/sec] (mean)

curl 'http://192.168.49.2:32459/api?2+38+(44)(64)((66555/2-3))'
{"operation":"2+38+(44)(64)((66555/2-3))","result":"7031834.0"}

ab -q -n 1000 -c 1 'http://192.168.49.2:32459/api?2+38+(44)(64)((66555/2-3))' | grep Requests
Requests per second:    93.77 [#/sec] (mean)

ab -q -n 1000 -c 4 'http://192.168.49.2:32459/api?2+38+(44)(64)((66555/2-3))' | grep Requests
Requests per second:    106.45 [#/sec] (mean)

ab -q -n 1000 -c 8 'http://192.168.49.2:32459/api?2+38+(44)(64)((66555/2-3))' | grep Requests
Requests per second:    104.89 [#/sec] (mean)

## Parse error (^ is not supported)

curl 'http://192.168.49.2:32459/api?2+38+(44)(64)((66^555/2-3))'
{"operation":"2+38+(44)(64)((66555/2-3))","result":"None"}

ab -q -n 1000 -c 1 'http://192.168.49.2:32459/api?2+38+(44)(64)*((66555/2-3))' | grep Requests
Requests per second:    86.60 [#/sec] (mean)

ab -q -n 1000 -c 4 'http://192.168.49.2:32459/api?2+38+(44)(64)((66^555/2-3))' | grep Requests
Requests per second:    95.75 [#/sec] (mean)

ab -q -n 1000 -c 8 'http://192.168.49.2:32459/api?2+38+(44)(64)((66^555/2-3))' | grep Requests
Requests per second:    95.08 [#/sec] (mean)

## Length error (input is limited to 60 characters)

curl 'http://192.168.49.2:32459/api?2+38+(44)(64)((66555/2-3))aaaaaaaaaaaa'

ab -q -n 1000 -c 1 'http://192.168.49.2:32459/api?2+38+(44)(64)((66555/2-3))aaaaaaaaaaaa' | grep Requests
Requests per second:    195.40 [#/sec] (mean)

curl http://192.168.49.2:32459/metrics
start_time 1613141183.1434238
wellformed_total{method="get"} 1000
wellformed_total{method="post"} 0
nonwellformed_total 2000
memory 13492224
duration_bucket{le="1"} 0
duration_bucket{le="2"} 0
duration_bucket{le="4"} 994
duration_bucket{le="8"} 209
duration_bucket{le="16"} 628
duration_bucket{le="32"} 40
duration_bucket{le="32+"} 129

## 4 replicas

ab -q -n 1000 -c 4 'http://192.168.49.2:32459/api?2+38+(44)(64)((66555/2-3))' | grep Requests
Requests per second:    410.19 [#/sec] (mean)

## 8 replicas

ab -q -n 1000 -c 4 'http://192.168.49.2:32459/api?2+38+(44)(64)((66555/2-3))' | grep Requests
Requests per second:    954.36 [#/sec] (mean)

## 16 replicas - the first run at 2000+ rps seemed suspiciously fast, so increase the number of requests

ab -q -n 1000 -c 4 'http://192.168.49.2:32459/api?2+38+(44)(64)((66555/2-3))' | grep Requests
Requests per second:    2173.27 [#/sec] (mean)

ab -q -n 10000 -c 4 'http://192.168.49.2:32459/api?2+38+(44)(64)((66555/2-3))' | grep Requests
Requests per second:    1132.95 [#/sec] (mean)

ab -q -n 20000 -c 4 'http://192.168.49.2:32459/api?2+38+(44)(64)((66555/2-3))' | grep Requests
Requests per second:    1011.18 [#/sec] (mean)

## 24 replicas

ab -q -n 20000 -c 16 'http://192.168.49.2:32459/api?2+38+(44)(64)((66555/2-3))' | grep Requests
Requests per second:    1272.75 [#/sec] (mean)

hey -n 20000 -c 16 'http://192.168.49.2:32459/api?2+38+(44)(64)((66555/2-3))' | grep Requests
Requests/sec: 1941.3902