Portal:Toolforge/Admin/Kubernetes/New cluster
This page contains information on how to deploy kubernetes for our Toolforge setup. This refers to the basic building blocks of the cluster itself (etcd, control nodes, worker nodes, etc.) and not to end-user apps running inside kubernetes.
general considerations
Please take this into account when trying to build a cluster following these instructions.
- This was only tested on Debian Buster.
- You will need a set of packages in reprepro, in thirdparty/kubeadm-k8s; see modules/aptrepo/files/updates in the operations puppet tree.
- You need to upload several docker images to our internal docker registry once the registry admission controller is deployed (a sketch of this follows the list). See docker registry: uploading custom images.
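As a rough sketch of that image upload step (the builder hostname, the image name and the registry hostname are only examples; see the linked page for the authoritative list and procedure):
user@tools-docker-imagebuilder-01:~$ sudo docker pull docker.io/calico/node:v3.8.0    # example upstream image
user@tools-docker-imagebuilder-01:~$ sudo docker tag docker.io/calico/node:v3.8.0 docker-registry.tools.wmflabs.org/calico/node:v3.8.0
user@tools-docker-imagebuilder-01:~$ sudo docker push docker-registry.tools.wmflabs.org/calico/node:v3.8.0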
etcd nodes
A working etcd cluster is the starting point for a working k8s deployment. All other k8s components require it.
The role for the VM should be role::wmcs::toolforge::k8s::etcd.
Typical hiera configuration looks like:
profile::etcd::cluster_bootstrap: false
profile::toolforge::k8s::etcd_nodes:
- tools-k8s-etcd-1.tools.eqiad1.wikimedia.cloud
- tools-k8s-etcd-2.tools.eqiad1.wikimedia.cloud
- tools-k8s-etcd-3.tools.eqiad1.wikimedia.cloud
profile::puppet::agent::dns_alt_names:
- tools-k8s-etcd-1.tools.eqiad1.wikimedia.cloud
- tools-k8s-etcd-2.tools.eqiad1.wikimedia.cloud
- tools-k8s-etcd-3.tools.eqiad1.wikimedia.cloud
Because of the DNS alt names, the Puppet certs will need to be signed by the puppetmaster with the following command:
aborrero@tools-puppetmaster-02:~ $ sudo puppet cert --allow-dns-alt-names sign tools-k8s-etcd-1.tools.eqiad1.wikimedia.cloud
In the case of a brand-new etcd cluster, profile::etcd::cluster_bootstrap should be set to true.
A basic cluster health-check command:
user@tools-k8s-etcd-1:~$ sudo etcdctl --endpoints https://$(hostname -f):2379 --key-file /var/lib/puppet/ssl/private_keys/$(hostname -f).pem --cert-file /var/lib/puppet/ssl/certs/$(hostname -f).pem cluster-health
member 67a7255628c1f89f is healthy: got healthy result from https://tools-k8s-etcd-4.tools.eqiad1.wikimedia.cloud:2379
member 822c4bd670e96cb1 is healthy: got healthy result from https://tools-k8s-etcd-5.tools.eqiad1.wikimedia.cloud:2379
member cacc7abd354d7bbf is healthy: got healthy result from https://tools-k8s-etcd-6.tools.eqiad1.wikimedia.cloud:2379
cluster is healthy
See if etcd is actually storing data:
user@tools-k8s-etcd-1:~$ sudo ETCDCTL_API=3 etcdctl --endpoints https://tools-k8s-etcd-4.tools.eqiad1.wikimedia.cloud:2379 --key=/var/lib/puppet/ssl/private_keys/tools-k8s-etcd-4.tools.eqiad1.wikimedia.cloud.pem --cert=/var/lib/puppet/ssl/certs/tools-k8s-etcd-4.tools.eqiad1.wikimedia.cloud.pem get / --prefix --keys-only | wc -l
290
Delete all data in etcd (warning!), for a fresh k8s start:
user@tools-k8s-etcd-1:~$ sudo ETCDCTL_API=3 etcdctl --endpoints https://tools-k8s-etcd-1.tools.eqiad1.wikimedia.cloud:2379 --key=/var/lib/puppet/ssl/private_keys/tools-k8s-etcd-1.tools.eqiad1.wikimedia.cloud.pem --cert=/var/lib/puppet/ssl/certs/tools-k8s-etcd-1.tools.eqiad1.wikimedia.cloud.pem del "" --from-key=true
145
Add a new member to the etcd cluster:
We currently have a spicerack cookbook (setup) that simplifies the task, so to add a new etcd node to the tools project you can just run:
> cookbook wmcs.toolforge.add_etcd_node --project tools
Note that for toolsbeta you'll have to provide the option --etcd-prefix, as the VM names there don't adhere to the general prefix template (see the example below).
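For example, for toolsbeta it might look like this (the prefix value here is hypothetical; use whatever the actual etcd VM names start with):
> cookbook wmcs.toolforge.add_etcd_node --project toolsbeta --etcd-prefix toolsbeta-test-k8s-etcd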
To do the same manually:
user@tools-k8s-etcd-1:~$ sudo ETCDCTL_API=3 etcdctl --endpoints https://$(hostname -f):2379 --key /var/lib/puppet/ssl/private_keys/$(hostname -f).pem --cert /var/lib/puppet/ssl/certs/$(hostname -f).pem member add tools-k8s-etcd-2.tools.eqiad1.wikimedia.cloud --peer-urls="https://tools-k8s-etcd-2.tools.eqiad1.wikimedia.cloud:2380"
Member bf6c18ddf5414879 added to cluster a883bf14478abd33
ETCD_NAME="tools-k8s-etcd-2.tools.eqiad1.wikimedia.cloud"
ETCD_INITIAL_CLUSTER="tools-k8s-etcd-1.tools.eqiad1.wikimedia.cloud=https://tools-k8s-etcd-1.tools.eqiad1.wikimedia.cloud:2380,tools-k8s-etcd-2.tools.eqiad1.wikimedia.cloud=https://tools-k8s-etcd-2.tools.eqiad1.wikimedia.cloud:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"
NOTE: joining the new node (the member add command above) should be done on a pre-existing node before trying to start the etcd service on the new node.
NOTE: the etcd service uses puppet certs.
NOTE: these VMs use internal firewalling via ferm. Rules won't change with DNS changes. After creating or destroying VMs that reuse DNS names, you might want to force a restart of the firewall with something like:
user@cloud-cumin-01:~$ sudo cumin --force -x 'O{project:toolsbeta name:tools-k8s-etcd-.*}' 'systemctl restart ferm'
List current members of a cluster
user@tools-k8s-etcd-13:~$ sudo ETCDCTL_API=3 etcdctl --endpoints https://$(hostname -f):2379 --key /var/lib/puppet/ssl/private_keys/$(hostname -f).pem --cert /var/lib/puppet/ssl/certs/$(hostname -f).pem member list
214aee26edbb7483, started, tools-k8s-etcd-13.tools.eqiad1.wikimedia.cloud, https://tools-k8s-etcd-13.tools.eqiad1.wikimedia.cloud:2380, https://tools-k8s-etcd-13.tools.eqiad1.wikimedia.cloud:2379
25ad48dc38f5b822, started, tools-k8s-etcd-17.tools.eqiad1.wikimedia.cloud, https://tools-k8s-etcd-17.tools.eqiad1.wikimedia.cloud:2380, https://tools-k8s-etcd-17.tools.eqiad1.wikimedia.cloud:2379
3cc7fd0010b673e8, started, tools-k8s-etcd-18.tools.eqiad1.wikimedia.cloud, https://tools-k8s-etcd-18.tools.eqiad1.wikimedia.cloud:2380, https://tools-k8s-etcd-18.tools.eqiad1.wikimedia.cloud:2379
front proxy (haproxy)
The kubernetes front proxy serves both the k8s API (tcp/6443) and the ingress (tcp/30000). It is one of the key components of kubernetes networking and ingress. We use haproxy for this, in a hot-standby setup with keepalived and a virtual IP address. There should be a couple of VMs, but only one is actually serving traffic at any given moment.
There is a DNS name k8s.svc.tools.eqiad1.wikimedia.cloud that should point to the virtual IP address. No public floating IP is involved. Kubernetes itself talks to k8s.tools.eqiad1.wikimedia.cloud (no svc.) for now (due to certificate names), which is a CNAME to the svc. name.
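A quick sanity check of this setup (a sketch; the haproxy hostname is an example, and the virtual IP should only be present on the active node):
user@laptop:~$ dig +short k8s.svc.tools.eqiad1.wikimedia.cloud    # should resolve to the virtual IP
user@laptop:~$ dig +short k8s.tools.eqiad1.wikimedia.cloud        # CNAME to the svc. name, then the same IP
user@tools-k8s-haproxy-1:~$ ip -brief address                     # the VIP shows up only on the active haproxy node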
The puppet role for the VMs is role::wmcs::toolforge::k8s::haproxy and a typical hiera configuration looks like:
profile::toolforge::k8s::apiserver_port: 6443
profile::toolforge::k8s::control_nodes:
- tools-k8s-control-1.tools.eqiad1.wikimedia.cloud
- tools-k8s-control-2.tools.eqiad1.wikimedia.cloud
- tools-k8s-control-3.tools.eqiad1.wikimedia.cloud
profile::toolforge::k8s::ingress_port: 30000
profile::toolforge::k8s::worker_nodes:
- tools-k8s-worker-1.tools.eqiad1.wikimedia.cloud
- tools-k8s-worker-2.tools.eqiad1.wikimedia.cloud
prometheus::haproxy_exporter::endpoint: http://localhost:8404/stats;csv
# TODO: add keepalived config
NOTE: in the case of toolsbeta, the VMs need a security group that allows connectivity between the front proxy (in tools) and haproxy (in toolsbeta). This security group is called k8s-dynamicproxy-to-haproxy and its TCP ports should match those in hiera.
NOTE: in the case of the initial bootstrap of the k8s cluster, the FQDN k8s.tools.eqiad1.wikimedia.cloud needs to point to the first control node, since otherwise haproxy won't see any active backend and kubeadm will fail.
NOTE: all HAProxy VMs need to be allowed to use the virtual IP address in Neutron.
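To see which backends haproxy currently considers up, you can query the local stats endpoint configured above (a sketch, assuming the http://localhost:8404/stats;csv endpoint from the hiera config is enabled; field 18 of the CSV is the server status):
user@tools-k8s-haproxy-1:~$ curl -s 'http://localhost:8404/stats;csv' | cut -d, -f1,2,18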
control nodes
The control nodes are the servers on which the key internal components of kubernetes run, such as the api-server, scheduler, controller-manager, etc.
There should be 3 control nodes: VMs with at least 2 CPUs and no swap.
The puppet role for the VMs is role::wmcs::toolforge::k8s::control.
Our puppetization requires two values in the labs/private hiera config: one for node_token and one for encryption_key (used for encrypting secrets at rest in etcd). If using the Toolforge versions of the keys, they are profile::toolforge::k8s::node_token and profile::toolforge::k8s::encryption_key, while in the generic kubeadm version they are profile::wmcs::kubeadm::k8s::encryption_key and profile::wmcs::kubeadm::k8s::node_token. The node_token value is a random string matching the regex [a-z0-9]{6}\.[a-z0-9]{16} and is used for joining nodes to the cluster, so it should be regarded as a secret; once the token expires, it isn't secret anymore. Tokens can be created and deleted with kubeadm at any time, but having one in the config for bootstrap is useful if you don't want to generate new ones every time. The encryption key is an AES-CBC key, per the upstream docs; you can create one with the command head -c 32 /dev/urandom | base64. It is simpler to have that configuration in place from the start rather than go back and re-encrypt everything later, as was done on the initial build for Toolforge.
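A minimal sketch for generating both values (kubeadm token generate only prints a random token in the required format, it does not contact any cluster):
user@tools-k8s-control-1:~$ kubeadm token generate                # value for the node_token hiera key
user@tools-k8s-control-1:~$ head -c 32 /dev/urandom | base64      # value for the encryption_key hiera key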
Typical hiera configuration:
profile::toolforge::k8s::apiserver_fqdn: k8s.tools.eqiad1.wikimedia.cloud
profile::toolforge::k8s::etcd_nodes:
- tools-k8s-etcd-1.tools.eqiad1.wikimedia.cloud
- tools-k8s-etcd-2.tools.eqiad1.wikimedia.cloud
- tools-k8s-etcd-3.tools.eqiad1.wikimedia.cloud
swap_partition: false
NOTE: if creating or deleting control nodes, you might want to restart the firewall in etcd nodes. (18/11/2020 dcaro: this was not needed when adding a control node to toolsbeta)
NOTE: you should reboot the control node VM after the initial puppet run, to make sure iptables alternatives are taken into account by docker and kube-proxy.
NOTE: control and worker nodes require the tools-new-k8s-full-connectivity neutron security group (this might not be needed, see T268140).
bootstrap
With bootstrap we refer to the process of creating the k8s cluster from scratch. In this particular case, there are no control nodes yet. You are installing the first one.
In this initial situation, the FQDN k8s.tools.eqiad1.wikimedia.cloud should point to the initial control node, since haproxy won't proxy anything to the yet-to-be-ready api-server.
Also, make sure the etcd cluster is totally fresh and clean, i.e. it doesn't store any data from previous clusters (a quick check follows).
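You can verify that with the same etcdctl query used earlier; the key count should be 0 (a sketch, run from one of the etcd nodes):
user@tools-k8s-etcd-1:~$ sudo ETCDCTL_API=3 etcdctl --endpoints https://$(hostname -f):2379 --key /var/lib/puppet/ssl/private_keys/$(hostname -f).pem --cert /var/lib/puppet/ssl/certs/$(hostname -f).pem get / --prefix --keys-only | wc -l
0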
On the first control node, run the following commands:
root@tools-k8s-control-1:~# kubeadm init --config /etc/kubernetes/kubeadm-init.yaml --upload-certs
[...]
root@tools-k8s-control-1:~# mkdir -p $HOME/.kube
root@tools-k8s-control-1:~# cp /etc/kubernetes/admin.conf $HOME/.kube/config
root@tools-k8s-control-1:~# kubectl apply -f /etc/kubernetes/psp/base-pod-security-policies.yaml
podsecuritypolicy.policy/privileged-psp created
clusterrole.rbac.authorization.k8s.io/privileged-psp created
rolebinding.rbac.authorization.k8s.io/kube-system-psp created
podsecuritypolicy.policy/default created
root@tools-k8s-control-1:~# kubectl apply -f /etc/kubernetes/calico.yaml
[...]
root@tools-k8s-control-1:~# kubectl apply -f /etc/kubernetes/toolforge-tool-roles.yaml
[...]
root@tools-k8s-control-1:~# kubectl apply -k /srv/git/maintain-kubeusers/deployments/toolforge
[...]
After this, the cluster has been bootstrapped and has a single control node. This should work:
root@tools-k8s-control-1:~# kubectl get nodes
NAME STATUS ROLES AGE VERSION
tools-k8s-control-1 Ready master 3m26s v1.15.1
root@tools-k8s-control-1:~# kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-kube-controllers-59f54d6bbc-9cjml 1/1 Running 0 2m12s
kube-system calico-node-g4hr7 1/1 Running 0 2m12s
kube-system coredns-5c98db65d4-5wgmh 1/1 Running 0 2m16s
kube-system coredns-5c98db65d4-5xmnt 1/1 Running 0 2m16s
kube-system kube-apiserver-tools-k8s-control-1 1/1 Running 0 96s
kube-system kube-controller-manager-tools-k8s-control-1 1/1 Running 0 114s
kube-system kube-proxy-7d48c 1/1 Running 0 2m15s
kube-system kube-scheduler-tools-k8s-control-1 1/1 Running 0 106s
existing cluster
Once the first control node is bootstrapped, we consider the cluster to be an existing one. But the cluster is designed to have 3 control nodes, so additional ones need to be added as described below.
NOTE: pay special attention to FQDNs (k8s.<project>.eqiad1.wikimedia.cloud, ...) and connectivity. You may need to restart ferm on the etcd nodes after updating their hiera keys before you can add more control nodes to an existing cluster.
NOTE: control and worker nodes require the tools-new-k8s-full-connectivity neutron security group; this can be added after the instance is spun up.
First you need to obtain some data from a pre-existing control node:
root@tools-k8s-control-1:~# kubeadm token create
bs2psl.wcxkn5la28xrxoa1
root@tools-k8s-control-1:~# kubeadm --config /etc/kubernetes/kubeadm-init.yaml init phase upload-certs --upload-certs
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[upload-certs] Using certificate key:
2a673bbc603c0135b9ada19b862d92c46338e90798b74b04e7e7968078c78de9
root@tools-k8s-control-1:~# openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
44550243d244837e17ae866e318e5d49e7db978c3a68b71216f541ca6dd18704
Then, in the new control node:
root@tools-k8s-control-2:~# kubeadm join k8s.tools.eqiad1.wikimedia.cloud:6443 --token ${TOKEN_OUTPUT} --discovery-token-ca-cert-hash sha256:${OPENSSL_OUTPUT} --control-plane --certificate-key ${UPLOADCERTS_OUTPUT}
root@tools-k8s-control-2:~# mkdir -p $HOME/.kube
root@tools-k8s-control-2:~# cp /etc/kubernetes/admin.conf $HOME/.kube/config
The complete cluster should show 3 control nodes and the corresponding pods in the kube-system namespace:
root@tools-k8s-control-2:~# kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-kube-controllers-59f54d6bbc-9cjml 1/1 Running 0 117m
kube-system calico-node-dfbqd 1/1 Running 0 109m
kube-system calico-node-g4hr7 1/1 Running 0 117m
kube-system calico-node-q5phv 1/1 Running 0 108m
kube-system coredns-5c98db65d4-5wgmh 1/1 Running 0 117m
kube-system coredns-5c98db65d4-5xmnt 1/1 Running 0 117m
kube-system kube-apiserver-tools-k8s-control-1 1/1 Running 0 116m
kube-system kube-apiserver-tools-k8s-control-2 1/1 Running 0 109m
kube-system kube-apiserver-tools-k8s-control-3 1/1 Running 0 108m
kube-system kube-controller-manager-tools-k8s-control-1 1/1 Running 0 117m
kube-system kube-controller-manager-tools-k8s-control-2 1/1 Running 0 109m
kube-system kube-controller-manager-tools-k8s-control-3 1/1 Running 0 108m
kube-system kube-proxy-7d48c 1/1 Running 0 117m
kube-system kube-proxy-ft8zw 1/1 Running 0 109m
kube-system kube-proxy-fx9sp 1/1 Running 0 108m
kube-system kube-scheduler-tools-k8s-control-1 1/1 Running 0 117m
kube-system kube-scheduler-tools-k8s-control-2 1/1 Running 0 109m
kube-system kube-scheduler-tools-k8s-control-3 1/1 Running 0 108m
root@tools-k8s-control-2:~# kubectl get nodes
NAME STATUS ROLES AGE VERSION
tools-k8s-control-1 Ready master 123m v1.15.1
tools-k8s-control-2 Ready master 112m v1.15.1
tools-k8s-control-3 Ready master 111m v1.15.1
NOTE: you might want to make sure the FQDN k8s.tools.eqiad1.wikimedia.cloud is pointing to the active haproxy node, since you now have api-servers responding in the haproxy backends.
If this is the second control node in the cluster, delete one of the two coredns pods (for example, kubectl -n kube-system delete pods coredns-5c98db65d4-5xmnt) so that the deployment spins up a new pod on another control plane server rather than running both coredns pods on the same server.
reconfiguring control plane elements after deployment
Kubeadm doesn't directly reconfigure standing nodes except, potentially, during upgrades. Therefore a change to the init file won't do much for a cluster that is already built. To make a change to some element of the control plane, such as kube-apiserver command-line arguments, you will want to change:
- The kubeadm-config ConfigMap in the kube-system namespace. It can be altered with a command like kubectl edit cm -n kube-system kubeadm-config.
- The manifest for the control plane element you are altering, e.g. adding a command-line argument for kube-apiserver by editing /etc/kubernetes/manifests/kube-apiserver.yaml, which will automatically restart the service.
This should prevent kubeadm from overwriting changes you made by hand later.
NOTE: Remember to change the manifest files on all control plane nodes.
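For example, to add or change a kube-apiserver command-line flag (the concrete flag is up to you, nothing here is specific to our setup), the rough flow on each control node is:
root@tools-k8s-control-1:~# kubectl edit cm -n kube-system kubeadm-config            # update apiServer extraArgs in the ClusterConfiguration
root@tools-k8s-control-1:~# vim /etc/kubernetes/manifests/kube-apiserver.yaml        # add the same flag to the static pod; kubelet restarts it automatically
root@tools-k8s-control-1:~# kubectl -n kube-system get pods -o wide | grep kube-apiserver   # verify the api-server pods come back up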
worker nodes
Worker nodes should be created as VM instances with a minimum of 2 CPUs and Debian Buster as the operating system. Worker nodes should have at least 40 GB in a separate docker-reserved ephemeral disk, which is currently provided by the flavor g3.cores8.ram16.disk20.ephem140. A quick way to check the disk layout is sketched below.
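On a provisioned worker you can check that the separate docker disk is in place (a sketch; it assumes the ephemeral disk ends up mounted under /var/lib/docker, which is what the puppetization is expected to do):
user@tools-k8s-worker-1:~$ lsblk
user@tools-k8s-worker-1:~$ findmnt /var/lib/docker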
Using cookbooks
We have a spicerack cookbook that simplifies adding a new worker node to an existing Toolforge installation; just run:
Adding a node
dcaro@vulcanus$ cookbook --config ~/.config/spicerack/cookbook.yaml wmcs.toolforge.add_k8s_worker_node --help
usage: cookbooks.wmcs.toolforge.add_k8s_worker_node [-h] --project PROJECT [--task-id TASK_ID] [--k8s-worker-prefix K8S_WORKER_PREFIX] [--k8s-control-prefix K8S_CONTROL_PREFIX] [--flavor FLAVOR] [--image IMAGE]
WMCS Toolforge cookbook to add a new worker node
optional arguments:
-h, --help show this help message and exit
--project PROJECT Openstack project where the toolforge installation resides. (default: None)
--task-id TASK_ID Id of the task related to this operation (ex. T123456) (default: None)
--k8s-worker-prefix K8S_WORKER_PREFIX
Prefix for the k8s worker nodes, default is <project>-k8s-worker. (default: None)
--k8s-control-prefix K8S_CONTROL_PREFIX
Prefix for the k8s control nodes, default is the k8s_worker_prefix replacing 'worker' by 'control'. (default: None)
--flavor FLAVOR Flavor for the new instance (will use the same as the latest existing one by default, ex. g2.cores4.ram8.disk80, ex. 06c3e0a1-f684-4a0c-8f00-551b59a518c8). (default: None)
--image IMAGE Image for the new instance (will use the same as the latest existing one by default, ex. debian-10.0-buster, ex. 64351116-a53e-4a62-8866-5f0058d89c2b) (default: None)
Example (adding a new worker with same image/flavor):
dcaro@vulcanus$ cookbook --config ~/.config/spicerack/cookbook.yaml wmcs.toolforge.add_k8s_worker_node --project toolforge --task-id T674384
It will take care of everything (partitions, puppet master swap, puppet runs, kubeadm join, ...).
Removing a node
This will remove a node from the worker pool and do all the config changes (not many for workers):
dcaro@vulcanus$ cookbook --config ~/.config/spicerack/cookbook.yaml wmcs.toolforge.remove_k8s_worker_node --help
usage: cookbooks.wmcs.toolforge.worker.depool_and_remove_node [-h] --project PROJECT [--fqdn-to-remove FQDN_TO_REMOVE] [--control-node-fqdn CONTROL_NODE_FQDN] [--k8s-worker-prefix K8S_WORKER_PREFIX] [--task-id TASK_ID]
WMCS Toolforge cookbook to remove and delete an existing k8s worker node
optional arguments:
-h, --help show this help message and exit
--project PROJECT Openstack project to manage. (default: None)
--fqdn-to-remove FQDN_TO_REMOVE
FQDN of the node to remove, if none passed will remove the intance with the lower index. (default: None)
--control-node-fqdn CONTROL_NODE_FQDN
FQDN of the k8s control node, if none passed will try to get one from openstack. (default: None)
--k8s-worker-prefix K8S_WORKER_PREFIX
Prefix for the k8s worker nodes, default is <project>-k8s-worker (default: None)
--task-id TASK_ID Id of the task related to this operation (ex. T123456) (default: None)
Example (removing the oldest worker):
dcaro@vulcanus$ cookbook --config ~/.config/spicerack/cookbook.yaml wmcs.toolforge.remove_k8s_worker_node --project toolforge --task-id T674384
Manually
NOTE: prefer using the cookbooks described above; the manual steps below are kept for reference.
The puppet role for them is the k8s worker role, with hiera configuration:
swap_partition: false
NOTE: you should reboot the worker node VM after the initial puppet run, to make sure the iptables alternatives are taken into account by docker, kube-proxy and calico.
Because Toolforge uses a local Puppetmaster, each instance will need manual intervention before it is ready to be put into service. See Help:Standalone_puppetmaster#Step 2: Setup a puppet client for details. A minimal sketch of that flow follows.
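This is only a sketch (the worker hostname is an example; the linked help page is authoritative):
user@tools-k8s-worker-3:~$ sudo rm -rf /var/lib/puppet/ssl          # after pointing the instance at the project puppetmaster
user@tools-k8s-worker-3:~$ sudo puppet agent -tv
aborrero@tools-puppetmaster-02:~ $ sudo puppet cert sign tools-k8s-worker-3.tools.eqiad1.wikimedia.cloud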
Once Puppet is running cleanly on the new instance and it has been rebooted to ensure that all iptables rules are applied properly, you are ready to join it to the cluster. First get a couple of values from the control nodes that will be used to create the cluster join command:
root@tools-k8s-control-1:~# kubeadm token create
bs2psl.wcxkn5la28xrxoa1
root@tools-k8s-control-1:~# openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
44550243d244837e17ae866e318e5d49e7db978c3a68b71216f541ca6dd18704
And then run kubeadm:
root@tools-k8s-worker-1:~# kubeadm join k8s.tools.eqiad1.wikimedia.cloud:6443 --token ${TOKEN_OUTPUT} --discovery-token-ca-cert-hash sha256:${OPENSSL_OUTPUT}
After this, you should see the new worker node being reported:
root@tools-k8s-control-2:~# kubectl get nodes
NAME STATUS ROLES AGE VERSION
tools-k8s-control-1 Ready master 162m v1.15.1
tools-k8s-control-2 Ready master 151m v1.15.1
tools-k8s-control-3 Ready master 150m v1.15.1
tools-k8s-worker-1 Ready <none> 53s v1.15.1
tools-k8s-worker-2 NotReady <none> 20s v1.15.1
Once you have joined all of the new nodes you are adding to the cluster, revoke the token that you used to bootstrap kubeadm:
$ kubeadm token delete ${TOKEN_OUTPUT}
bootstrap token "..." deleted
Bryan's bulk instance process
ingress nodes
Ingress nodes are just dedicated worker nodes that don't need as much disk. Currently many have a dedicated /var/lib/docker LVM volume, but that can be disabled as unnecessary with the hiera value profile::wmcs::kubeadm::docker_vol: false, so you don't have to use a very large flavor. Follow the steps for worker nodes, and pass --role ingress to the cookbook (see the example below).
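For example (assuming the current version of the cookbook accepts the --role flag mentioned above):
> cookbook wmcs.toolforge.add_k8s_worker_node --project tools --role ingress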
Other components
Once the basic components are deployed (etcd, haproxy, control and worker nodes), other components should be deployed as well.
Kubernetes components
All the kubernetes level components that we deploy can be found in the toolforge-deploy repository. See also: Portal:Toolforge/Admin/Kubernetes/Custom_components.
Non-kubernetes components
first tool: fourohfour
This should be one of the first tools deployed, since it handles 404 situations for webservices. The kubernetes service provided by this tool is set as the default backend for nginx-ingress.
TODO: describe how to deploy it.
Metrics
We have an external prometheus server (i.e. prometheus is not running inside the k8s cluster). This server has the name pattern tools-prometheus-*.tools.eqiad1.wikimedia.cloud.
The access control rules that prometheus needs to access the cluster are deployed with the cookbook. You will then need to generate the x509 certs that prometheus will use to authenticate.
root@tools-k8s-control-2:~# wmcs-k8s-get-cert prometheus
/tmp/tmp.7JaiWyso9m/server-cert.pem
/tmp/tmp.7JaiWyso9m/server-key.pem
Then scp the certs to your laptop (a sketch follows the list below) and place the files in their final destinations:
- public key in the operations/puppet.git repository, in files/ssl/toolforge-k8s-prometheus.crt.
- private key in the labs/private.git repository of the project puppetmaster in modules/secret/secrets/ssl/toolforge-k8s-prometheus.key.
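A sketch of that copy step (the temporary directory is the one printed by wmcs-k8s-get-cert above; adjust hostnames and paths as needed):
user@laptop:~$ scp tools-k8s-control-2.tools.eqiad1.wikimedia.cloud:/tmp/tmp.7JaiWyso9m/server-cert.pem toolforge-k8s-prometheus.crt
user@laptop:~$ scp tools-k8s-control-2.tools.eqiad1.wikimedia.cloud:/tmp/tmp.7JaiWyso9m/server-key.pem toolforge-k8s-prometheus.key
Then commit the .crt to operations/puppet under files/ssl/ and the .key to the project's labs/private under modules/secret/secrets/ssl/.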
The cert expires after 1 year, so this operation will need to be repeated. See Portal:Toolforge/Admin/Kubernetes/Certificates#External API access for more details.
See also
- Toolforge Kubernetes networking and ingress
- Toolforge k8s RBAC and PodSecurityPolicy -- documentation page
- Upgrading Kubernetes in Toolforge
- phabricator T215531 - Deploy upgraded Kubernetes to toolsbeta (epic task)
- phabricator T237643 - toolforge: new k8s: figure out metrics / observability (about prometheus)