Kubernetes/Ingress

Introduction

Kubernetes Ingress uses the Istio Ingresscontroller (ultimately the Ingressgateway), running as a DaemonSet on each worker node, to route traffic to workload Services and finally to Pods.

The Istio Ingressgateway is implemented as an Envoy instance that is configured via xDS by the Istiod control plane. All configuration is derived from Kubernetes API objects such as Service objects, as well as Istio-specific custom resources: Gateway, VirtualService and DestinationRule.

The process of configuring the Ingressgateway is abstracted away from service owners/deployers by the Helm template common_templates/0.4/_ingress_helpers.tpl, so for a very basic setup it is enough to specify .Values.ingress.enabled: true.

 * The ingressgateway terminates TLS connections from clients.
 * The ingressgateway establishes TLS connections to upstream (Pods), which are terminated at Pod level by the service-proxy.

Istio setup and configuration

Istio (the control plane as well as components like the Ingressgateway) is installed and initially configured using istioctl (from a deployment host) together with an environment-specific config.

For initial installation or deploying updates, run (on a deployment host, after selecting the correct kube_env admin environment/cluster):

istioctl-1.15.7 apply -f <environment>/config.yaml

For details on how to configure Istio, please see:

Istio docs (1.9 version)

Some parts of the configuration have to be (or can be) set via Helm chart values. istioctl comes with embedded Helm charts that it renders and applies directly. You can find those embedded charts (to look up possible configuration options etc.) at:

Istio embedded helm charts

TLS certificates are generated and maintained by cert-manager and deployed into the istio-system namespace for the ingressgateway to pick them up. This is configured and deployed by SRE via helmfile.d/admin_ng/helmfile_namespace_certs.yaml.

Troubleshooting

The Ingressgateway emits access logs, which can be viewed in Logstash.

Generic grafana dashboards can be found via the istio tag: Grafana search

Status of the ingressgateway

Use proxy-status to check whether the Istio control plane is successfully sending updates to the ingressgateway instances:

istioctl-1.9.5 proxy-status
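With many gateway instances, it can help to filter that output for proxies that have not acknowledged the latest configuration. The following is a minimal sketch, assuming the table reports each xDS column as SYNCED, NOT SENT or STALE; the function name stale_proxies is made up for illustration and is not part of istioctl.

```shell
# Hypothetical helper: print only proxy-status rows with a STALE column.
# Reads the proxy-status table from stdin and repeats the header for context.
stale_proxies() {
  awk '
    NR == 1 { header = $0; next }          # remember the header row
    {
      for (i = 1; i <= NF; i++) {
        if ($i == "STALE") {               # any xDS column reporting STALE
          if (!printed) { print header; printed = 1 }
          print
          break
        }
      }
    }'
}

# Usage (assumption: same istioctl binary as above):
#   istioctl-1.9.5 proxy-status | stale_proxies
```

Empty output means no row reported STALE. The match is done per field, so it does not depend on the column order.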

Retrieve clusters, listeners, routes, endpoints, secrets

istioctl can be used to fetch specific parts of the (envoy) configuration.

Be aware that ingressgateway will show a cluster and endpoint for every kubernetes service in the cluster, regardless of whether you have configured ingress for it or not (it's just discovering everything in the cluster).

# <COMPONENT> can be one of: clusters, endpoints, listeners, routes, secrets
istioctl-1.9.5 -n istio-system proxy-config <COMPONENT> daemonset/istio-ingressgateway

# You can output as JSON (-o json) for a huge amount of detail.

When troubleshooting the TLS certificates in use, the JSON output can be parsed with jq and piped into openssl, e.g.:

# <RESOURCE NAME> is the name of the certificate as returned by:
istioctl-1.9.5 -n istio-system proxy-config secrets daemonset/istio-ingressgateway

# Dump a certificate chain
istioctl-1.9.5 -n istio-system proxy-config secrets daemonset/istio-ingressgateway -o json | \
  jq '[.dynamicActiveSecrets[] | select(.name == "<RESOURCE NAME>")][0].secret.tlsCertificate.certificateChain.inlineBytes' -r | \
  base64 -d | \
  openssl crl2pkcs7 -nocrl -certfile /dev/stdin | openssl pkcs7 -print_certs -text -noout

# Dump a CA
istioctl-1.9.5 -n istio-system proxy-config secrets daemonset/istio-ingressgateway -o json | \
  jq '[.dynamicActiveSecrets[] | select(.name == "<RESOURCE NAME>")][0].secret.validationContext.trustedCa.inlineBytes' -r | \
  base64 -d | \
  openssl crl2pkcs7 -nocrl -certfile /dev/stdin | openssl pkcs7 -print_certs -text -noout
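Before reading the full openssl output, it can be useful to check how many certificates a decoded chain contains (for example, whether intermediates are included). This is a small sketch; count_pem_certs is a hypothetical helper name, and it only counts PEM headers without validating anything.

```shell
# Hypothetical helper: count the PEM certificates read from stdin.
# grep -c exits non-zero when the count is 0, so "|| true" keeps the
# function usable under "set -e" while still printing 0.
count_pem_certs() {
  grep -c -e '-----BEGIN CERTIFICATE-----' || true
}

# Usage (assumption: appended to the certificate chain pipeline above,
# in place of the two openssl commands):
#   ... | base64 -d | count_pem_certs
```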

Dump complete envoy config

If needed, the complete (currently active) envoy config can be dumped from the ingressgateway:

kubectl -n istio-system exec -it daemonset/istio-ingressgateway -- \
  /bin/bash -c 'exec 5<>/dev/tcp/127.0.0.1/15000; echo -ne "GET /config_dump HTTP/1.1\r\nHost: localhost:15000\r\nConnection: close\r\n\r\n" >&5; cat <&5'

See https://www.envoyproxy.io/docs/envoy/latest/operations/admin.html for details on interacting with the envoy admin interface.

For more extensive work with the admin interface, you can forward its port to your machine with kubectl -n istio-system port-forward daemonset/istio-ingressgateway 15000:15000.

Enable debug logging

Envoy component logging can be changed via istioctl as well:

# Print all components and the currently configured log level:
istioctl-1.9.5 -n istio-system proxy-config log daemonset/istio-ingressgateway

# Change the level of a specific component (admin in this case) to info
istioctl-1.9.5 -n istio-system proxy-config log daemonset/istio-ingressgateway --level admin:info

# Change the levels of all components to info
istioctl-1.9.5 -n istio-system proxy-config log daemonset/istio-ingressgateway --level info

Administration

Add a new service under Ingress

The following steps assume you want to add service-foo, running in the main (wikikube) cluster, under Ingress.

Configure certificates (optional)

By default Ingress will be configured with a certificate to terminate TLS for the hostnames:

 * $NAMESPACE_NAME.discovery.wmnet
 * $NAMESPACE_NAME.svc.codfw.wmnet
 * $NAMESPACE_NAME.svc.eqiad.wmnet

Special Cases:

If you have multiple user-facing services in your namespace, or need different hostnames, this can be configured using the tlsHostnames parameter for the namespace in helmfile.d/admin_ng/values/main.yaml. See the docs at the start of helmfile.d/admin_ng/values/common.yaml for more details.

If additional SANs are needed that do not match the above schema (like services in the .wikimedia.org domain), add those via the tlsExtraSANs parameter for your namespace in helmfile.d/admin_ng/values/main.yaml.

These changes need to be deployed to all main (wikikube) clusters as described in Kubernetes/Add a new service#Deploy changes to helmfile.d/admin ng.

DNS changes

Your service will be made accessible by pointing dedicated CNAME records to the pre-existing LVS k8s-ingress-wikikube.

Add two datacenter specific CNAME records:

; in $ORIGIN svc.eqiad.wmnet.
service-foo       1H  IN CNAME    k8s-ingress-wikikube.svc.eqiad.wmnet.

; in $ORIGIN svc.codfw.wmnet.
service-foo       1H  IN CNAME    k8s-ingress-wikikube.svc.codfw.wmnet.

If your service runs active/active:

; in $ORIGIN discovery.wmnet.
service-foo       300 IN CNAME k8s-ingress-wikikube-ro.discovery.wmnet.

If your service runs active/passive:

; in $ORIGIN discovery.wmnet.
service-foo       300 IN CNAME k8s-ingress-wikikube-rw.discovery.wmnet.

 * Follow DNS#Changing records in a zonefile to create and deploy the zone.
 * Run the sre.dns.netbox cookbook.
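To avoid typos when preparing the zone file change, the records described above can be generated from the service name. This is only a sketch: ingress_cname_records and its mode argument are hypothetical, and the output still has to be placed into the correct zone files by hand.

```shell
# Hypothetical helper: print the CNAME records described above for one service.
# $1 = service name, $2 = "active-active" or "active-passive".
ingress_cname_records() {
  service=${1:-}
  mode=${2:-}
  case "$mode" in
    active-active)  discovery_target=k8s-ingress-wikikube-ro ;;
    active-passive) discovery_target=k8s-ingress-wikikube-rw ;;
    *) echo "mode must be active-active or active-passive" >&2; return 1 ;;
  esac
  # One datacenter-specific record per site
  for dc in eqiad codfw; do
    printf '; in $ORIGIN svc.%s.wmnet.\n' "$dc"
    printf '%s 1H IN CNAME k8s-ingress-wikikube.svc.%s.wmnet.\n' "$service" "$dc"
  done
  # Discovery record, read-only or read-write depending on the mode
  printf '; in $ORIGIN discovery.wmnet.\n'
  printf '%s 300 IN CNAME %s.discovery.wmnet.\n' "$service" "$discovery_target"
}

# Usage:
#   ingress_cname_records service-foo active-active
```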

Create an entry in the service::catalog

Add a stripped-down entry for your service in hieradata/common/service.yaml, like:

  service-foo:
    description: Pretty fooish service, service-foo.svc.%{::site}.wmnet
    encryption: true
    ip: *k8s-ingress-wikikube_ips
    page: false
    probes: # monitoring for this service
      - type: http
    port: *k8s-ingress-wikikube_port
    sites:
      - eqiad
      - codfw
    state: service_setup

With that service::catalog entry, setup diverges from the standard LVS#Create an entry in the service::catalog process - switching to lvs_setup as well as puppet runs on icinga or auth-dns hosts can be skipped, as there is no lvs, discovery or monitoring stanza. This means that we can go directly from service_setup to production. The lack of an lvs stanza also means there is no need to restart pybal on lvs hosts.

Note: whenever the standard process calls for a puppet run on A:icinga, it very likely needs to run on P{O:prometheus} instead.

Configuration (for service owners)

To enable Ingress for your chart you need to undertake the following steps:

 * Make sure your chart uses the latest (at least 0.4) version of common_templates.
 * Make sure common_templates/0.4/_ingress_helpers.tpl is linked to the templates directory of your chart.

For the absolute basic setup, all you need to do is enable ingress via your values.yaml:

ingress:
  enabled: true
  staging: true # If you are doing this for a staging service

This will make your service available as https://SERVICE_NAME.discovery.wmnet (https://SERVICE_NAME.k8s-staging.discovery.wmnet:30443 for staging). Traffic will be routed as is to all pods of your service in a round-robin fashion. To access your service in staging, make sure to use the correct SNI (for example curl --resolve SERVICE_FQDN:30443:$(dig +short k8s-ingress-staging.discovery.wmnet) https://SERVICE_FQDN:30443).
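The naming scheme and the staging curl call can be wrapped in two small shell functions. This is a sketch under the assumptions stated in the paragraph above; ingress_url and curl_staging are hypothetical names, and curl_staging needs dig, curl and network access to the staging ingress.

```shell
# Hypothetical helper: print the URL a service gets from the basic setup.
# $1 = service name, $2 = "staging" for the staging cluster (optional).
ingress_url() {
  if [ "${2:-}" = "staging" ]; then
    printf 'https://%s.k8s-staging.discovery.wmnet:30443\n' "$1"
  else
    printf 'https://%s.discovery.wmnet\n' "$1"
  fi
}

# Hypothetical helper: request the staging URL while pinning the hostname
# to the staging ingress address, so that the correct SNI is sent.
curl_staging() {
  fqdn="$1.k8s-staging.discovery.wmnet"
  curl --resolve "$fqdn:30443:$(dig +short k8s-ingress-staging.discovery.wmnet)" \
    "https://$fqdn:30443"
}

# Usage:
#   ingress_url service-foo staging
#   curl_staging service-foo
```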

You may configure more complex routing logic, listen to different or more than one hostname etc. via the ingress configuration stanza. Please keep in mind that for different or additional hostnames you may need SRE assistance to set up certificates etc.

More complex setups

The routing behavior may be modified via the ingress.httproutes stanza which supports all options described in https://istio.io/v1.9/docs/reference/config/networking/virtual-service/#HTTPRoute.

If you want to make several services available as subpaths of a hostname (https://SERVICE_NAME.discovery.wmnet/one, https://SERVICE_NAME.discovery.wmnet/two, ...) you need to make sure to configure only one Istio Gateway (in one of your helm chart releases) for this hostname and attach multiple HTTPRoute objects to it. Multiple Istio Gateway objects claiming the same hostname will simply be ignored.

Assuming you have two releases of your chart (one and two), you may configure ingress like in the following example to achieve that:

---
# release "one" values.yaml:
# made available as https://SERVICE_NAME.discovery.wmnet/one
# via default options + httproute
ingress:
  enabled: true
  httproutes:
  - match:
    - uri:
        prefix: /one
    route:
    - destination:
        host: one-tls-service.SERVICE_NAMESPACE.svc.cluster.local # The cluster-internal DNS name for this release's service. Check prefixing of the service name!
        port:
          number: SERVICE_TLS_PUBLIC_PORT # Port you defined in .Values.tls.public_port
---
# release "two" values.yaml:
# made available as https://SERVICE_NAME.discovery.wmnet/two
# via the Gateway deployed by release "one"
ingress:
  enabled: true
  existingGatewayName: "SERVICE_NAMESPACE/one" # referencing the Gateway deployed by the release "one". Check prefixing of the gateway name!
  routeHosts:
  - SERVICE_NAME.discovery.wmnet # Attach the following routes to this hostname in the referenced Gateway
  httproutes:
  - match:
    - uri:
        prefix: /two
    route:
    - destination:
        host: two-tls-service.SERVICE_NAMESPACE.svc.cluster.local # The cluster-internal DNS name for this release's service. Check prefixing of the service name!
        port:
          number: SERVICE_TLS_PUBLIC_PORT # Port you defined in .Values.tls.public_port

With the standard helm chart created by create_new_service.sh the values of ingress.existingGatewayName and the cluster local service name are prefixed with the chart service name. So make sure to use the correct prefixed names. The gateway could be existingGatewayName: "SERVICE_NAMESPACE/SERVICE_NAME-one" and the cluster local service name: SERVICE_NAME-one-tls-service.SERVICE_NAMESPACE.svc.cluster.local.

You can double check the prefixing and name of the gateway created by "one" with: kubectl -n NAMESPACE get gateways | grep one and for the cluster local service name use: kubectl -n NAMESPACE get service | grep one.
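The prefixing rule described above can be written down as two small functions, which makes it easy to compare the expected names with the kubectl output. This is a sketch that assumes the default naming of charts created by create_new_service.sh; both function names are hypothetical.

```shell
# Hypothetical helpers for the default prefixing described above.
# $1 = namespace, $2 = chart service name, $3 = release name.

# Value to use for ingress.existingGatewayName
prefixed_gateway_name() {
  printf '%s/%s-%s\n' "$1" "$2" "$3"
}

# Cluster-local DNS name of the release's TLS service
prefixed_service_host() {
  printf '%s-%s-tls-service.%s.svc.cluster.local\n' "$2" "$3" "$1"
}

# Usage:
#   prefixed_gateway_name SERVICE_NAMESPACE SERVICE_NAME one
#   prefixed_service_host SERVICE_NAMESPACE SERVICE_NAME one
```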