Portal:Toolforge/Admin/Kubernetes/2020 Kubernetes cluster rebuild plan notes
This page holds notes and ideas regarding the Stretch/Buster migration of Toolforge, especially related to k8s. See News/2020 Kubernetes cluster migration for how this all ended up!
2018-10-04
Meeting attendees: Andrew, Chase, Arturo.
Timeline
- refactor toolforge puppet code (Brooke already started)
- get puppet compiler for VMs so we can actually test puppet code
- k8s: allocate a couple of weeks to play with callbacks and kubeadm and evaluate whether they are the way to go.
Things to take into account
- probably going directly to Stretch is the way to go.
- By the time we end, Buster may be stable already
- (and Jessie old-old-stable)
- k8s: jump versions when moving to eqiad1?
- 1.4 --> 1.12
- does the new version work with custom ingress controllers?
- does kubeadm work for us?
- integration with nova-proxy?
- try in a cloudvps project, in a pre-defined time slot
- lots of Yuvi hacks. What do we do with them?
- https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Kubernetes#Custom_admission_controllers
- try to do the same admission controller but without the custom Yuvi code. Mind k8s versioning.
- native UidEnforcer probably already in place
- native RegistryEnforcer ??
- HostPathEnforcer: probably use new callback JSON mechanisms
- same for HostAutoMounter
- kubeadm is probably the way to go. Already in use for PAWS.
- ingress controller for k8s
- currently using kubeproxy, nginx
- it's worth migrating to a native k8s ingress controller
- co-existence of k8s clusters
- gridengine
- write puppet code from scratch, a new deployment side-by-side with the old one
- both grids co-exist and users start launching things in the new one
- a script or something to make sure nothing from the same tool is running in the old grid
Networking
- currently using vxlan overlay. We can probably carry over the same model.
2019-06-01
Involved people: mostly Brooke and Arturo.
All puppet code is being developed for Debian Stretch. There is no kubernetes support for Debian Buster at this time (not even in production).
Related phabricator tasks used to track development:
- Deploy upgraded Kubernetes to toolsbeta https://phabricator.wikimedia.org/T215531
etcd
The puppet code for k8s etcd was refactored/reworked into role::wmcs::toolforge::k8s::etcd.
It uses the base etcd classes, shared with production.
The setup is intended to be a 3-node cluster, which requires this hiera config (example):
profile::etcd::cluster_bootstrap: true
profile::toolforge::k8s::etcd_hosts:
- toolsbeta-arturo-k8s-etcd-1.toolsbeta.eqiad.wmflabs
- toolsbeta-arturo-k8s-etcd-2.toolsbeta.eqiad.wmflabs
- toolsbeta-arturo-k8s-etcd-3.toolsbeta.eqiad.wmflabs
profile::toolforge::k8s_masters_hosts:
- toolsbeta-arturo-k8s-master-1.toolsbeta.eqiad.wmflabs
- toolsbeta-arturo-k8s-master-2.toolsbeta.eqiad.wmflabs
- toolsbeta-arturo-k8s-master-3.toolsbeta.eqiad.wmflabs
profile::ldap::client::labs::client_stack: sssd
sudo_flavor: sudo
This setup uses TLS with puppet certs.
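As a quick sanity check (a minimal sketch, assuming the example hostnames above and the standard puppet cert locations), each member should answer the etcd health endpoint over TLS:
# query the etcd health endpoint with the node's puppet cert/key (hostname is the example one above)
curl --cacert /var/lib/puppet/ssl/certs/ca.pem \
  --cert /var/lib/puppet/ssl/certs/toolsbeta-arturo-k8s-etcd-1.toolsbeta.eqiad.wmflabs.pem \
  --key /var/lib/puppet/ssl/private_keys/toolsbeta-arturo-k8s-etcd-1.toolsbeta.eqiad.wmflabs.pem \
  https://toolsbeta-arturo-k8s-etcd-1.toolsbeta.eqiad.wmflabs:2379/health
# a healthy member should report {"health": "true"}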
k8s master
The puppet code for k8s masters was refactored/reworked into role::wmcs::toolforge::k8s::master.
It uses the base kubernetes puppet modules, shared with production.
The setup is intended to be a 3-node cluster, which requires this hiera config (example):
profile::toolforge::k8s::etcd_hosts:
- toolsbeta-arturo-k8s-etcd-1.toolsbeta.eqiad.wmflabs
- toolsbeta-arturo-k8s-etcd-2.toolsbeta.eqiad.wmflabs
- toolsbeta-arturo-k8s-etcd-3.toolsbeta.eqiad.wmflabs
profile::ldap::client::labs::client_stack: sssd
sudo_flavor: sudo
Each master node runs 3 important systemd services:
- kube-apiserver
- kube-controller-manager
- kube-scheduler
They use TLS by means of puppet certs.
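A quick way to confirm the control plane daemons are actually up on a master (a minimal sketch using the service names listed above):
# all three services should report "active"
systemctl is-active kube-apiserver kube-controller-manager kube-scheduler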
k8s worker nodes
Ongoing work; the puppet role is role::wmcs::toolforge::k8s::node.
k8s API proxy
We plan to use a 3-node cluster to provide an HA proxy for the Kubernetes API itself.
This is ongoing work; the puppet role is role::wmcs::toolforge::k8s::apilb.
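Once the proxy is up, a rough sanity check could be to hit the API server health endpoint through the load-balanced name (a sketch, assuming the toolsbeta-k8s-master DNS name used elsewhere in these notes and that /healthz is reachable without client credentials):
# -k because the API serves a cert from the cluster CA, not a publicly trusted one
curl -k https://toolsbeta-k8s-master.toolsbeta.wmflabs.org:6443/healthz
# a healthy apiserver behind the proxy simply answers "ok"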
2019-06-27
participants:
- Brooke
- Arturo
- Jason
agenda
- time investment?
- yuvi wanted kubeadm
- current situation with node authentication against the api-server
- kubeadm vs raw puppet systemd-based services
- PKI stuff
next steps
- RBAC database missing in etcd? Which daemon should create this?
- Kubernetes api-server bootstrap
- Let's try kubeadm for the next couple of weeks! YAML configuration file in puppet.git
- kubeadm package: a component in our repository https://phabricator.wikimedia.org/T215975
- we may want to use kubeadm 1.15 directly
- puppet tree:
- use yaml configuration file for kubeadm, stored in puppet.git (see the config sketch after this list)
- use modules/toolforge/ for when it makes sense (kubeadm repo, etc?)
- use profile::toolforge::k8s::kubeadm::{master,node,etc} for the other components
- use external etcd bootstrapped by kubeadm
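For reference, a rough sketch of what that kubeadm configuration file could look like for this setup. In practice it would be templated in puppet.git; the version, etcd cert paths and pod subnet below are assumptions, while the control plane endpoint and dns domain match the hiera examples elsewhere on this page:
# illustrative only: write a kubeadm ClusterConfiguration pointing at the external etcd
cat <<EOF > /etc/kubernetes/kubeadm-init.yaml
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: v1.15.1                         # example version
controlPlaneEndpoint: toolsbeta-k8s-master.toolsbeta.wmflabs.org:6443
etcd:
  external:                                        # the puppet-managed etcd cluster
    endpoints:
    - https://toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs:2379
    caFile: /etc/kubernetes/pki/etcd/ca.crt        # hypothetical client cert paths
    certFile: /etc/kubernetes/pki/etcd/client.crt
    keyFile: /etc/kubernetes/pki/etcd/client.key
networking:
  dnsDomain: toolsbeta.eqiad.wmflabs
  podSubnet: 192.168.0.0/16                        # Calico default, assumption
EOF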
2019-07-04
Involved people: mostly Brooke and Arturo.
We started trying out kubeadm for the cluster deployment. New puppet code was introduced for the basics, such as package installation and initial configuration file distribution.
We were able to deploy a basic cluster, which we can now use as the starting point to actually begin building the Toolforge k8s service on top of it.
Steps:
- Done Phabricator T215529 - Puppetize/stand up a load balancer for K8s API servers
- Done Phabricator T215975 - Package/copy kubeadm, kubelet, docker-ce and kubectl to Toolforge Aptly or Reprepro
- Done Phabricator T215531 - Deploy upgraded Kubernetes to toolsbeta
- pending Design and document the concrete workflow/steps for bootstrapping the k8s cluster
- pending Design and document how to integrate the cluster with PodSecurityPolicy (https://kubernetes.io/docs/concepts/policy/pod-security-policy/)
- pending Design the integration with other custom admission controllers (specific for Toolforge)
- pending Do initial tests on how Toolforge users could interact with the cluster
At the time of this writing, we have 3 kinds of servers involved.
Each one has a different puppet role and a different hiera config, which is left here as an example. You should refer to the puppet tree as the source of truth:
- API LB: role::wmcs::toolforge::k8s::apilb
  - Requires a DNS name toolsbeta-k8s-master.toolsbeta.wmflabs.org pointing to this VM instance.
  - Hiera config:
profile::toolforge::k8s::api_servers:
  toolsbeta-test-k8s-master-1:
    fqdn: toolsbeta-test-k8s-master-1.toolsbeta.eqiad.wmflabs
    port: 6443
  toolsbeta-test-k8s-master-2:
    fqdn: toolsbeta-test-k8s-master-2.toolsbeta.eqiad.wmflabs
    port: 6443
  toolsbeta-test-k8s-master-3:
    fqdn: toolsbeta-test-k8s-master-3.toolsbeta.eqiad.wmflabs
    port: 6443
- Master: role::wmcs::toolforge::k8s::kubeadm::master
  - Hiera config:
profile::toolforge::k8s::apiserver: toolsbeta-k8s-master.toolsbeta.wmflabs.org
profile::toolforge::k8s::dns_domain: toolsbeta.eqiad.wmflabs
profile::toolforge::k8s::node_token: m7uakr.ern5lmlpv7gnkacw
sudo_flavor: sudo
swap_partition: false
profile::ldap::client::labs::client_stack: sssd
- Worker: role::wmcs::toolforge::k8s::kubeadm::node
  - Hiera config:
profile::ldap::client::labs::client_stack: sssd
profile::toolforge::k8s::apiserver: toolsbeta-k8s-master.toolsbeta.wmflabs.org
profile::toolforge::k8s::node_token: m7uakr.ern5lmlpv7gnkacw
sudo_flavor: sudo
swap_partition: false
Random snippet found: to quickly obtain the --discovery-token-ca-cert-hash argument, on an existing control plane node, run
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
The output of that should be used like this:
kubeadm join --token $bootstrap_token --control-plane --discovery-token-ca-cert-hash sha256:$output_from_above_command --certificate-key $key
2019-07-16
We reached a point where we are confident with the cluster lifecycle.
The following components are used:
- an external load balancer for the k8s API (role::wmcs::toolforge::k8s::apilb) (no additional setup but hiera config)
- an external etcd server for k8s (3 nodes) (role::wmcs::toolforge::k8s::etcd) (no additional setup but hiera config; Buster already)
- control plane nodes (role::wmcs::toolforge::k8s::kubeadm::master) (requires hiera config)
- worker nodes (role::wmcs::toolforge::k8s::kubeadm::node)
In the first control plane node:
root@toolsbeta-test-k8s-master-1:~# kubeadm init --config /etc/kubernetes/kubeadm-init.yaml --upload-certs
[...]
root@toolsbeta-test-k8s-master-1:~# cp /etc/kubernetes/admin.conf $HOME/.kube/config
root@toolsbeta-test-k8s-master-1:~# kubectl apply -f /etc/kubernetes/calico.yaml
[...]
For additional control plane nodes:
root@toolsbeta-test-k8s-master-1:~# kubeadm --config /etc/kubernetes/kubeadm-init.yaml init phase upload-certs --upload-certs
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[upload-certs] Using certificate key:
0e323a45a4212c78994e30f8f3b9a6f77a1b475e696e12e7bf5f7cbd72ea5871
root@toolsbeta-test-k8s-master-1:~# openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
3637ded9d0ac4e45952214e43b3107055d090ea0c13a176c4607f907662034f1
root@toolsbeta-test-k8s-master-2:~# kubeadm join toolsbeta-k8s-master.toolsbeta.wmflabs.org:6443 --token m7uakr.ern5lmlpv7gnkacw --discovery-token-ca-cert-hash sha256:<openssl_output> --control-plane --certificate-key <upload_certs_output>
[...]
For worker nodes:
aborrero@toolsbeta-test-k8s-worker-1:~ $ sudo kubeadm join toolsbeta-k8s-master.toolsbeta.wmflabs.org:6443 --token m7uakr.ern5lmlpv7gnkacw --discovery-token-ca-cert-hash sha256:<openssl_output>
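After joining, the new node should register with the control plane (a minimal check, run from any control plane node):
# the worker should appear in the list and eventually reach the Ready state
kubectl get nodes -o wide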
Note that:
- deleting a node (in the case of VM deletion) requires kubectl delete node <nodename>, while adding a node requires the steps outlined above (see the removal sketch after this list)
- we use puppet certs for the etcd client connection
- we enforce client certs on etcd server side
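A hedged sketch of the node removal path (names are examples; if the VM has already been deleted, only the kubectl delete node step applies):
# drain first so workloads get rescheduled, then remove the node object from the API
kubectl drain toolsbeta-test-k8s-worker-1 --ignore-daemonsets --delete-local-data
kubectl delete node toolsbeta-test-k8s-worker-1
# on the node itself (if it still exists), kubeadm reset cleans up the local kubelet state
sudo kubeadm reset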
Interesting commands for etcd:
aborrero@toolsbeta-test-k8s-etcd-1:~ $ sudo ETCDCTL_API=3 etcdctl --endpoints https://toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs:2379 --key=/var/lib/puppet/ssl/private_keys/toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs.pem --cert=/var/lib/puppet/ssl/certs/toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs.pem del "" --from-key=true
145
aborrero@toolsbeta-test-k8s-etcd-1:~ $ sudo ETCDCTL_API=3 etcdctl --endpoints https://toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs:2379 --key=/var/lib/puppet/ssl/private_keys/toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs.pem --cert=/var/lib/puppet/ssl/certs/toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs.pem get / --prefix --keys-only | wc -l
290
aborrero@toolsbeta-test-k8s-etcd-1:~ $ sudo ETCDCTL_API=3 etcdctl --endpoints https://toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs:2379 --key=/var/lib/puppet/ssl/private_keys/toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs.pem --cert=/var/lib/puppet/ssl/certs/toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs.pem member add toolsbeta-test-k8s-etcd-2.toolsbeta.eqiad.wmflabs --peer-urls="https://toolsbeta-test-k8s-etcd-2.toolsbeta.eqiad.wmflabs:2380"
Member bf6c18ddf5414879 added to cluster a883bf14478abd33
ETCD_NAME="toolsbeta-test-k8s-etcd-2.toolsbeta.eqiad.wmflabs"
ETCD_INITIAL_CLUSTER="toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs=https://toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs:2380,toolsbeta-test-k8s-etcd-2.toolsbeta.eqiad.wmflabs=https://toolsbeta-test-k8s-etcd-2.toolsbeta.eqiad.wmflabs:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"
aborrero@toolsbeta-test-k8s-etcd-1:~ $ sudo etcdctl --endpoints https://toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs:2379 --key-file /var/lib/puppet/ssl/private_keys/toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs.pem --cert-file /var/lib/puppet/ssl/certs/toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs.pem cluster-health
member 67a7255628c1f89f is healthy: got healthy result from https://toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs:2379
member bf6c18ddf5414879 is healthy: got healthy result from https://toolsbeta-test-k8s-etcd-2.toolsbeta.eqiad.wmflabs:2379
cluster is healthy
2019-07-30
To load nginx-ingress:
root@toolsbeta-test-k8s-master-1:~# kubectl apply -f /etc/kubernetes/kubeadm-nginx-ingress-psp.yaml
clusterrolebinding.rbac.authorization.k8s.io/nginx-ingress-psp created
root@toolsbeta-test-k8s-master-1:~# kubectl apply -f /etc/kubernetes/kubeadm-nginx-ingress.yaml
namespace/nginx-ingress unchanged
serviceaccount/nginx-ingress unchanged
configmap/nginx-config configured
clusterrole.rbac.authorization.k8s.io/nginx-ingress unchanged
clusterrolebinding.rbac.authorization.k8s.io/nginx-ingress unchanged
secret/default-server-secret created
deployment.apps/nginx-ingress created
See phab:T228500 for more details.
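To check that the controller actually came up (a minimal check; the namespace is the one created by the manifest above):
# the ingress controller pod(s) should reach the Running state
kubectl -n nginx-ingress get pods -o wide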
2019-08-14
Working prototype of maintain-kubeusers is here: phab:T228499. The general design of certs looks like the following.
The x.509 certs only allow authn. Authz is managed via RBAC and PSPs (design for which is in progress).
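For illustration, the split works like this: the certificate subject provides the identity (authn) and RBAC decides what that identity may do (authz). A hedged sketch of inspecting both, with a made-up tool name, cert path and group:
# authn: the CN/O fields of the client cert become the k8s username and group
openssl x509 -noout -subject -in /data/project/sometool/.toolskube/client.crt
# authz: ask the API what RBAC allows for that user, impersonating it as an admin
kubectl auth can-i create pods --as=sometool --as-group=toolforge --namespace=tool-sometool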
2019-09-23
Docs for PSP and RBAC notions: Portal:Toolforge/Admin/Kubernetes/RBAC and PSP. Going to add a bit more to that.
2019-10-10
k8s discussion on several open questions.
Folks:
- Brooke
- Jason
- Arturo
- Hieu
- Bryan
Topics:
- Toolforge ingress: decide on how ingress configuration objects will be managed https://phabricator.wikimedia.org/T234231
- admission control for ingress objects
- some options:
- a daemon detecting which tools are online and generating the ingress config automagically. The webservice command does not generate the ingress config. Management of ingress objects in the API is forbidden for end users.
- a custom admission controller to enforce correct ingress config, and have the webservice command generate it. The API allows users to manage ingress objects, because we are enforcing a valid config in the API.
- some mixed thing using a CRD.
- Prototype idea by Brooke: https://phabricator.wikimedia.org/T228500#5548074
- for reference: https://gerrit.wikimedia.org/g/labs/tools/registry-admission-webhook/+/refs/heads/master
- We agree on having the custom admission controller webhook!
- Toolforge ingress: decide on final layout of north-south proxy setup https://phabricator.wikimedia.org/T234037
- diagrams: https://phab.wmfusercontent.org/file/data/4a7jhuvypd4bgoxstmc7/PHID-FILE-v6g63ro3sb7ts2htm6yl/image.png
- Bryan: having toolforge.org only for k8s is not very good (i.e, we would rather provide it for the webgrid too)
- Bryan: the new domain name should not be available only for the new k8s
- Arturo: how does a given proxy know whether a $tool.toolforge.org tool is running in the grid, the legacy k8s, or the new k8s?
- option 4 is discarded: it would be difficult to introduce toolforge.org to the old grid. Complex SSL handling.
- Hieu: option 3 does not require ingress at all. Brooke: but we want it so our cluster supports more use cases
- Bryan: what about rate-limiting, etc? Hieu: rate-limiting using annotations (see the annotation sketch at the end of these notes): https://github.com/kubernetes/ingress-nginx/blob/master/docs/user-guide/nginx-configuration/annotations.md
- option 1 is discarded: we feel options 2 and 3 superseded it
- Arturo proposal: let's follow the approach of option 3. If we find a blocker, then follow option 2.
- Add a fallthrough route in dynamicproxy to redirect to the new k8s cluster. In the first iteration, dynamicproxy knows nothing of toolforge.org
- SSL: add SAN for the tools.wmflabs.org certificate that includes toolforge.org?
- first iteration: don't introduce the new domain yet?
- Proposal: use option #3 (dynamicproxy -> { legacy things || new k8s ingress })
- Try to introduce the new domain over just a couple of weeks. If after a couple of weeks we aren't able to, then move on; in that case, introducing the new domain becomes a future quarter goal or whatever.
- Toolforge: introduce new domain toolforge.org https://phabricator.wikimedia.org/T234617
- how, when, etc
- Bryan: the new domain name should not be available only for the new k8s
- Jason: what about not using a *.toolforge.org wildcard certificate? Using Let's Encrypt / acme-chief we could afford having a certificate per tool.
- Jason: single domain certs per container could potentially offer better security
- Bryan: ~600 single domains could be hard to manage (+1 Jason)
- Toolforge: refresh puppet code for proxy (dynamicproxy) to support Debian Buster https://phabricator.wikimedia.org/T235059
- not much to discuss. This is ongoing work by arturo.
- Toolforge ingress: create a default landing page for unknown/default URLs https://phabricator.wikimedia.org/T234032
- we need a "default route" in the new ingress setup. I'm sure Bryan has some ideas about what to do with this.
- There is a new upstream k8s release 1.16. We are developing in 1.15. Shall we upgrade before moving forward?
- API changes may slow us down.
- Brooke thinks strongly no and favors 1.15.2+ and that series for the first deploy. Upstream changed a number of important API objects in 1.16.
- Deciding on a deadline for firsts testing tools (openstack-browser?)
- rebuilding the current toolsbeta-test cluster just for sanity when all the moving parts are decided (+1)
- last minute Brooke add: quotas https://phabricator.wikimedia.org/T234702#5561572
- similar workflow to what we have in CloudVPS: a default quota that can be tuned later on.
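On the rate-limiting point from the 2019-10-10 discussion above: with the community ingress-nginx controller that the linked documentation describes, throttling would just be annotations on the tool's Ingress object. A hedged sketch (tool name, limits and backend values are made up, assuming a 1.15-era API):
# illustrative only: per-tool rate limiting through ingress-nginx annotations
cat <<EOF | kubectl apply -f -
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: sometool
  namespace: tool-sometool
  annotations:
    nginx.ingress.kubernetes.io/limit-rps: "10"         # requests per second per client IP
    nginx.ingress.kubernetes.io/limit-connections: "5"  # concurrent connections per client IP
spec:
  rules:
  - host: sometool.toolforge.org
    http:
      paths:
      - backend:
          serviceName: sometool
          servicePort: 8000
EOF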