This page holds notes and ideas regarding the ~~Stretch~~ Buster migration of Toolforge, specially related to k8s. See News/2020 Kubernetes cluster migration for how this all ended up!

2018-10-04

Meeting attendats: Andrew, Chase, Arturo.

Timeline

refactor toolforge puppet code (Brooke already started)
get puppet compiler for VMs so we can actually test puppet code
k8s: allocate a couple of weeks to play with callbacks and kubeadmin and evaluate if they are the way to go.

Thinks to take into account

probably going directly to Stretch is the way to go.
- By the time we end, Buster may be stable already
- (and Jessie old-old-stable)

k8s: jump versions when moving to eqiad1?
- 1.4 --> 1.12
- does the new version works for custom ingress controllers?
- does kubdeadm works for us?
- integration with nova-proxy?
- try in a cloudvps project, in a pre-defined time slot

lots of Yuvi hacks. What do we do with them
- https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Kubernetes#Custom_admission_controllers
- try to do the same admission controller but without the custom Yuvi code. Mind k8s versioning.
- native UidEnforcer probably already in place
- native RegisterEnforcer ??
- HostPathEnforcer: probably use new callback JSON mechanisms
- same for HostAutoMounter
- kubeadmin is probably the way to go. Already using for PAWS.

ingress controller for k8s
- currently using kubeproxy, nginx
- it worth migrate to some native k8s ingress controller

co-existence of k8s clusters

gridengine
- write puppte code from scratch, a new deployment side-by-side with the old
- both grids co-exists and users start launching things in the new one
- a script or something to make sure nothing from the same tool is runnin gin the old grid

Networking

currently using vxlan overlay. We can probably carry over the same model.

2019-06-01

Involved people: mostly Brooke and Arturo.

All puppet code is being developed for Debian Stretch. No kubernetes support for Debian Buster by this time (not even in production).

Related phabricator tasks used to track development:

Deploy upgraded Kubernetes to toolsbeta https://phabricator.wikimedia.org/T215531

etcd

The puppet code for k8s etcd was refactored/reworked into role::wmcs::toolforge::k8s::etcd. It uses the base etcd classes, shared with production.

The setup is intended to be a 3 nodes-cluster, which requires this hiera config (example):

profile::etcd::cluster_bootstrap: true
profile::toolforge::k8s::etcd_hosts:
- toolsbeta-arturo-k8s-etcd-1.toolsbeta.eqiad.wmflabs
- toolsbeta-arturo-k8s-etcd-2.toolsbeta.eqiad.wmflabs
- toolsbeta-arturo-k8s-etcd-3.toolsbeta.eqiad.wmflabs
profile::toolforge::k8s_masters_hosts:
- toolsbeta-arturo-k8s-master-1.toolsbeta.eqiad.wmflabs
- toolsbeta-arturo-k8s-master-2.toolsbeta.eqiad.wmflabs
- toolsbeta-arturo-k8s-master-3.toolsbeta.eqiad.wmflabs
profile::ldap::client::labs::client_stack: sssd
sudo_flavor: sudo

This setup uses TLS with puppet certs.

k8s master

The puppet code for k8s masters was refactored/reworked into role::wmcs::toolforge::k8s::master. It uses the base kubernetes puppet modules, shared with production.

The setup is intented to be a 3 nodes-cluster, which requires this hiera config (example):

profile::toolforge::k8s::etcd_hosts:
- toolsbeta-arturo-k8s-etcd-1.toolsbeta.eqiad.wmflabs
- toolsbeta-arturo-k8s-etcd-2.toolsbeta.eqiad.wmflabs
- toolsbeta-arturo-k8s-etcd-3.toolsbeta.eqiad.wmflabs
profile::ldap::client::labs::client_stack: sssd
sudo_flavor: sudo

Each master node run 3 important systemd services:

kube-apiserver
kube-controller-manager
kube-scheduler

They use TLS by means of puppet certs.

k8s worker nodes

Ongoing work, the puppet role is role::wmcs::toolforge::k8s::node.

k8s API proxy

We plan to use a 3 nodes-cluster to provide an HA proxy for the kubernetes API itself.

This is an ongoing work, the puppet role is role::wmcs::toolforge::k8s::apilb.

2019-06-27

participants:

Brooke
Arturo
Jason

agenda

time investment?
- yuvi wanted kubeadm
current situation with node authentication against the api-server
kubeadm vs raw puppet systemd-based services
PKI stuff

next steps

RBAC database missing in etcd? Which daemon should create this?
Kubernetes api-server bootstrap

https://github.com/kelseyhightower/kubernetes-the-hard-way/blob/master/docs/08-bootstrapping-kubernetes-controllers.md

Let's try kubeadm for the next couple of week! yaml configuration file in puppet.git
- kubeadm package: a component in our repository https://phabricator.wikimedia.org/T215975
- we may want to use kubeadm 1.15 directly
- puppet tree:
  - use yaml configuration file for kubeadm, stored in puppet.git
  - use modules/toolforge/ for when it makes sense (kubeadm repo, etc?)
  - use profile::toolforge::k8s::kubeadm::{master,node,etc} for the other components
- use external etcd bootstrapped by kubeadm

2019-07-04

Involved people: mostly Brooke and Arturo.

We started trying with kubeadm for the cluster deployment. New puppet code was introduce for the basics, such as package installation and initial configuration file distributions.
We were able to deploy a basic cluster, which we can now as the starting point to actually beging building the Toolforge k8s service on top of it.

Steps:

Done Phabricator T215529 - Puppetize/stand up a load balancer for K8s API servers
Done Phabricator T215975 - Package/copy kubeadm, kubelet, docker-ce and kubectl to Toolforge Aptly or Reprepro
Done Phabricator T215531 - Deploy upgraded Kubernetes to toolsbeta
pending Design and document the concrete workflow/steps for bootstrapping the k8s cluster
pending Design and document how to integrate the cluster with PodSecurityPolicy (https://kubernetes.io/docs/concepts/policy/pod-security-policy/)
pending Design the integration with other custom admission controllers (specific for Toolforge)
pending Do initial tests on how Toolforge users could interact with the cluster

By the time of this writting, we have 3 kinds of servers involved
Each one has a diferent puppet role and a different hiera config, which is left here as example. You should refer to the puppet tree as the source of truth:

API LB: role::wmcs::toolforge::k8s::apilb
Requires a DNS name: toolsbeta-k8s-master.toolsbeta.wmflabs.org pointing to this VM instance.
Hiera config:

profile::toolforge::k8s::api_servers:
  toolsbeta-test-k8s-master-1:
    fqdn: toolsbeta-test-k8s-master-1.toolsbeta.eqiad.wmflabs
    port: 6443
  toolsbeta-test-k8s-master-2:
    fqdn: toolsbeta-test-k8s-master-2.toolsbeta.eqiad.wmflabs
    port: 6443
  toolsbeta-test-k8s-master-3:
    fqdn: toolsbeta-test-k8s-master-3.toolsbeta.eqiad.wmflabs
    port: 6443

Master: role::wmcs::toolforge::k8s::kubeadm::master
Hiera config:

profile::toolforge::k8s::apiserver: toolsbeta-k8s-master.toolsbeta.wmflabs.org
profile::toolforge::k8s::dns_domain: toolsbeta.eqiad.wmflabs
profile::toolforge::k8s::node_token: m7uakr.ern5lmlpv7gnkacw
sudo_flavor: sudo
swap_partition: false
profile::ldap::client::labs::client_stack: sssd

Worker: role::wmcs::toolforge::k8s::kubeadm::node
Hiera config:

profile::ldap::client::labs::client_stack: sssd
profile::toolforge::k8s::apiserver: toolsbeta-k8s-master.toolsbeta.wmflabs.org
profile::toolforge::k8s::node_token: m7uakr.ern5lmlpv7gnkacw
sudo_flavor: sudo
swap_partition: false

Random snippit found: To quickly obtain the --discover-token-ca-cert-hash argument, on an existing control plane node, run

openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'

The output of that should be used like,

kubeadm join --token $bootstrap_token --control-plane --discovery-token-ca-cert-hash sha256:$output_from_above_command --certificate-key $key

2019-07-16

We reach a point in which we are confident with the cluster lifecycle.

The following components are used:

an external load balancer for the k8s API (role::wmcs::toolforge::k8s::apilb) (no additional setup but hiera config)
an external etcd server for k8s (3 nodes) (role::wmcs::toolforge::k8s::etcd) (no additional setup but hiera config. Buster already)
control plane nodes (role::wmcs::toolforge::k8s::kubeadm::master) (requires hiera config)
worker nodes (role::wmcs::toolforge::k8s::kubeadm::node)

In the first control plane node:

root@toolsbeta-test-k8s-master-1:~# kubeadm init --config /etc/kubernetes/kubeadm-init.yaml --upload-certs
[...]
root@toolsbeta-test-k8s-master-1:~# cp /etc/kubernetes/admin.conf $HOME/.kube/config
root@toolsbeta-test-k8s-master-1:~# kubectl apply -f /etc/kubernetes/calico.yaml
[...]

For additional control plane nodes:

root@toolsbeta-test-k8s-master-1:~# kubeadm --config /etc/kubernetes/kubeadm-init.yaml init phase upload-certs --upload-certs
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[upload-certs] Using certificate key:
0e323a45a4212c78994e30f8f3b9a6f77a1b475e696e12e7bf5f7cbd72ea5871
root@toolsbeta-test-k8s-master-1:~# openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
3637ded9d0ac4e45952214e43b3107055d090ea0c13a176c4607f907662034f1

root@toolsbeta-test-k8s-master-2:~# kubeadm join toolsbeta-k8s-master.toolsbeta.wmflabs.org:6443 --token m7uakr.ern5lmlpv7gnkacw --discovery-token-ca-cert-hash sha256:<openssl_output> --control-plane --certificate-key <upload_certs_output>
[...]

For worker nodes:

aborrero@toolsbeta-test-k8s-worker-1:~ $ sudo kubeadm join toolsbeta-k8s-master.toolsbeta.wmflabs.org:6443 --token m7uakr.ern5lmlpv7gnkacw --discovery-token-ca-cert-hash sha256:<openssl_output>

Note that:

deleting a node requires kubectl delete node <nodename (case of VM deletion), adding a node requires the steps outlined above.
we use puppet certs for the etcd client connection
we enforce client certs on etcd server side

Interesting commands for etcd:

aborrero@toolsbeta-test-k8s-etcd-1:~ $ sudo ETCDCTL_API=3 etcdctl --endpoints https://toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs:2379 --key=/var/lib/puppet/ssl/private_keys/toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs.pem --cert=/var/lib/puppet/ssl/certs/toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs.pem del "" --from-key=true
145

aborrero@toolsbeta-test-k8s-etcd-1:~ $ sudo ETCDCTL_API=3 etcdctl --endpoints https://toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs:2379 --key=/var/lib/puppet/ssl/private_keys/toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs.pem --cert=/var/lib/puppet/ssl/certs/toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs.pem  get / --prefix --keys-only | wc -l
290

aborrero@toolsbeta-test-k8s-etcd-1:~ $ sudo ETCDCTL_API=3 etcdctl --endpoints https://toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs:2379 --key=/var/lib/puppet/ssl/private_keys/toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs.pem --cert=/var/lib/puppet/ssl/certs/toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs.pem member add toolsbeta-test-k8s-etcd-2.toolsbeta.eqiad.wmflabs --peer-urls="https://toolsbeta-test-k8s-etcd-2.toolsbeta.eqiad.wmflabs:2380"
Member bf6c18ddf5414879 added to cluster a883bf14478abd33

ETCD_NAME="toolsbeta-test-k8s-etcd-2.toolsbeta.eqiad.wmflabs"
ETCD_INITIAL_CLUSTER="toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs=https://toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs:2380,toolsbeta-test-k8s-etcd-2.toolsbeta.eqiad.wmflabs=https://toolsbeta-test-k8s-etcd-2.toolsbeta.eqiad.wmflabs:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"

aborrero@toolsbeta-test-k8s-etcd-1:~ $ sudo etcdctl --endpoints https://toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs:2379 --key-file /var/lib/puppet/ssl/private_keys/toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs.pem --cert-file /var/lib/puppet/ssl/certs/toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs.pem cluster-health
member 67a7255628c1f89f is healthy: got healthy result from https://toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs:2379
member bf6c18ddf5414879 is healthy: got healthy result from https://toolsbeta-test-k8s-etcd-2.toolsbeta.eqiad.wmflabs:2379
cluster is healthy

2019-07-30

To load nginx-ingress:

root@toolsbeta-test-k8s-master-1:~# kubectl apply -f /etc/kubernetes/kubeadm-nginx-ingress-psp.yaml 
clusterrolebinding.rbac.authorization.k8s.io/nginx-ingress-psp created

root@toolsbeta-test-k8s-master-1:~# kubectl apply -f /etc/kubernetes/kubeadm-nginx-ingress.yaml 
namespace/nginx-ingress unchanged
serviceaccount/nginx-ingress unchanged
configmap/nginx-config configured
clusterrole.rbac.authorization.k8s.io/nginx-ingress unchanged
clusterrolebinding.rbac.authorization.k8s.io/nginx-ingress unchanged
secret/default-server-secret created
deployment.apps/nginx-ingress created

See phab:T228500 for more details.

2019-08-14

Working prototype of maintain-kubeusers is here: phab:T228499 The general design of certs looks like the following.

The x.509 certs only allow authn. Authz is managed via RBAC and PSPs (design for which is in progress).

2019-09-23

Docs for PSP and RBAC notions: Portal:Toolforge/Admin/Kubernetes/RBAC and PSP Going to add a bit more to that.

2019-10-10

k8s discussion on several open questions.

Folks:

Brooke
Jason
Arturo
Hieu
Bryan

Topics:

Toolforge ingress: decide on how ingress configuration objects will be managed https://phabricator.wikimedia.org/T234231
- admission control for ingress objects
- some options:
  - a daemon detecting which tools are online and generating the ingress config automagically. The webservice command does not generate the ingress config. Management of ingress objects in the API is forbidden for end users.
  - a custom admission controller to enforce correct ingress config, and have the webservice command generate it. The API allows users to manage ingress objects, because we are enforcing a valid config in the API.
  - some mixed thing using a CRD.
- Prototype idea by Brooke: https://phabricator.wikimedia.org/T228500#5548074
- for reference: https://gerrit.wikimedia.org/g/labs/tools/registry-admission-webhook/+/refs/heads/master
- We agree in having the custom admission controller webhook!

Toolforge ingress: decide on final layout of north-south proxy setup https://phabricator.wikimedia.org/T234037
- diagrams: https://phab.wmfusercontent.org/file/data/4a7jhuvypd4bgoxstmc7/PHID-FILE-v6g63ro3sb7ts2htm6yl/image.png
- Bryan: having toolforge.org only for k8s is not very good (i.e, we would rather provide it for the webgrid too)
- Bryan: the new domain name should not be available only for the new k8s
- Arturo: how a given proxy knows if a $tool.toolforge.org tool is running in the grid, the legacy k8s or the new k8s?
- option 4 is discarded: it would be difficult to introduce toolforge.org to the old grid. Complex SSL handling.
- Hieu: option 3 does not require ingress at all. Brooke: but we want it so our cluster supports more use cases
- Bryan: what about rate-limiting, etc? Hieu: rate-limiting using annotations: https://github.com/kubernetes/ingress-nginx/blob/master/docs/user-guide/nginx-configuration/annotations.md
- option 1 is discarded: we feel options 2 and 3 supersedded it
- Arturo proposal: let's follow approach of option 3. If we find a blocker, then follow option 2.
  - Add a fallthrough route in dynamicproxy to redirect to the new k8s cluster. In the first iteration, dynamicproxy knows nothing of toolforge.org
  - SSL: add SAN for the tools.wmflabs.org certificate that includes toolforge.org?
  - first iteration: don't introduce the new domain yet?

Proposal: use option #3 (dynamicproxy -> { legacy things || new k8s ingress })
- Try to introduce the new domain just for a cuple of weeks. If after a couple of weeks we aren't able, then move on. In such case, introducing the new domain will be a future Q goal or whatever.

Toolforge. introduce new domain toolforge.org https://phabricator.wikimedia.org/T234617
- how, when, etc
- Bryan: the new domain name should not be available only for the new k8s
- Jason: What about not using *.toolforge.org wildcard certifiacte. Using let's encrypt / acme-chief we could afford having a certificate per tool.
  - Jason: single domain certs per container could potentially offer better security
  - Bryan: ~600 single domains could be hard to manage (+1 Jason)

Toolforge: refresh puppet code for proxy (dynamicproxy) to support Debian Buster https://phabricator.wikimedia.org/T235059
- not much to discuss. This is ongoing work by arturo.

Toolforge ingress: create a default landing page for unknown/default URLs https://phabricator.wikimedia.org/T234032
- we need a "default route" in the new ingress setup. I'm sure Bryan has some ideas about what to do with this.

There is a new upstream k8s release 1.16. We are developing in 1.15. Shall we upgrade before moving forward?
- API changes may slow us down.
- Brooke thinks strongly no and favors 15.2+ and that series for the first deploy. They changed a number of important API objects.

Deciding on a deadline for firsts testing tools (openstack-browser?)
- rebuilding the current toolsbeta-test cluster just for sanity when all the moving parts are decided+1

last minute Brooke add: quotas https://phabricator.wikimedia.org/T234702#5561572
- similar workflow to what we have in CloudVPS: a default quota that can be tunned later on.