12:19 Rook: Reduce memory request for single user container T345467
2023-09-01
18:45 Rook: increased worker node count to as all memory was requested T345462
2023-08-31
16:51 taavi: restart uwsgi-toolsdb-replica-cnf-web.service on paws-nfs-2, a required dependency was installed only after last restart which made the service fail
11:37 andrewbogott: rebooting paws-k8s-haproxy-2, paws-acme-chief-01, paws-k8s-haproxy-1 -- their load numbers are through the roof for no obvious reason
08:15 taavi: empty profile::wmcs::paws::control_nodes hiera key to bring PAWS back up (T329581), it contained the hostnames of the old kubeadm backed cluster which should be cleaned up properly in T327674
10:40 dcaro: extended the /srv volumes for both prometheus nodes to 15G to give some space for prometheus to shuffle data (the tsdb size is set to 10G, but it needs a bit more space than that)
21:50 bstorm: deploying latest PRs to add a note on the wikireplicas changes
2020-12-21
20:27 bstorm: applied tuning for timeouts and elections on the k8s etcd pods of 300 for heartbeat and 3000 for elections T267966
2020-12-17
02:22 bstorm: Set PAWS hub back to using mariadb T266587
2020-12-16
18:21 chicocvenancio: move paws to sqlite while toolsdb is down.
2020-12-10
17:00 arturo: fixing /etc/kubernetes/kubelet.conf and restarting kubelet in paws-k8s-control-1 (T269865)
2020-12-05
00:42 bd808: `kubectl delete po renderer-794886b9cd-9nc6c -n prod` after seeing lots of listen queue full errors in the pod logs.
2020-11-30
18:22 bstorm: 1.17 upgrade for kubernetes complete T268669
17:25 bstorm: upgrading the worker nodes (this will likely kill services briefly when some pods are rescheduled) T268669
17:14 bstorm: updated the calico-kube-controllers deployment to use our internal registry to deal with docker-hub rate-limiting T268669 T269016
17:09 chicocvenancio: delete orphaned jupyter server pod `kubectl -n prod delete pod jupyter--45volutionoftheuniverse`. Respective server not running in jupyter admin UI.
16:31 bstorm: upgrading pods on paws-k8s-control-3 T268669
17:05 bstorm: running the final rsync to the new cluster's nfs T211096
16:08 bstorm: changing paws.wmflabs.org to point at the new cluster ip 185.15.56.57 T211096
16:02 bstorm: LAST MESSAGE WRONG: switching NEW cluster to toolsdb T211096
16:02 bstorm: switching old cluster to toolsdb T211096
15:58 bstorm: switching old cluster to sqlite T211096
15:53 bstorm: downtiming alerts in case they need changes (seems likely) T211096
2020-07-30
20:40 bstorm: upgrading the singleuser image to test shuffling around some of the pip installs
16:38 bstorm: removing the *.paws.wmflabs.org SNI name because it won't be used and it might trigger a re-issue of certs T255249
15:39 bstorm: upgrading acme-chief to 0.27-1
2020-07-29
18:03 bstorm: powering on paws-k8s-haproxy-1 because that worked fine
18:00 bstorm: powering off paws-k8s-haproxy-1 to test failover
2020-07-24
17:25 bstorm: to force repulling of every image everywhere, uninstalling paws in the new cluster and reinstalling it T258812
09:39 arturo: dropped the DNS wildcard record `*.paws.wmcloud.org IN A 185.15.56.57` and created concrete CNAME records for the FQDNs we actually use (T211096)
2020-07-23
22:51 bstorm: deploying via the default 'latest' tag in the new cluster T211096
22:48 bstorm: tagged the newbuild tags with "latest" to set sane defaults for all images in the helm chart T211096
21:14 bstorm: pushing quay.io/wikimedia-paws-prod/nbserve:newbuild to main repo T211096
21:11 bstorm: pushing quay.io/wikimedia-paws-prod/deploy-hook:newbuild to main repo T211096
21:09 bstorm: pushing quay.io/wikimedia-paws-prod/singleuser:newbuild to the main repo T211096
21:08 bstorm: pushing quay.io/wikimedia-paws-prod/paws-hub:newbuild to the main repo T211096
21:06 bstorm: pushing dbproxy docker image for new cluster into main quay.io repo T211096
2020-07-22
23:32 bstorm: setting the default NFS version to 4.2 while excepting the two stretch servers T257945
21:41 bstorm: deployed ingress to redirect paws.wmcloud.org to the wikitech doc page T195217
2020-06-30
23:00 bstorm: added paws-public.wmflabs.org to the alt-names for acme-chief, which broke it until we hand off the zone to the paws project <sorry!> T195217 T255997
2020-06-26
21:57 bstorm: applied the metrics manifests to kubernetes to enable metrics-server, cadvisor, etc. T256361
2020-06-25
22:52 bstorm: created paws-k8s-worker-5/6/7 as x-large nodes to bring the cluster up to roughly the same capacity as the existing one using soft anti-affinity T211096 T253267
22:43 bstorm: bumped quota up to 24 instances, 128 GB RAM and 56 cores T211096
16:39 bstorm: deleted the deployhook from the in-progress new cluster for now just in case T211096
15:44 bstorm: deployed a proof-of-concept paws-public setup in the new cluster T255997
2020-06-24
23:18 bstorm: added A record for *.paws.wmcloud.org to public and hub to use T211096 T255997 T195217
21:45 bstorm: doing an initial rsync of the paws userhomes to the new project T160113
2020-06-19
10:01 arturo: enabled `paws.wmflabs.org` and `*.paws.wmflabs.org` as valid ingress domains (acme-chief TLS cert, haproxy, etc) (T195217)
2020-06-17
21:51 bstorm_: upgraded chart in the new cluster to include resource limits T251298
2020-06-16
15:48 arturo: change DNS record k8s.svc.paws.eqiad1.wikimedia.cloud to point to the haproxy VIP port address 172.16.1.171 (T195217)
15:47 arturo: associate floating IP 185.15.56.57 with haproxy VIP port (T195217)
15:43 arturo: allow traffic to haproxy VM ports from the VIP port: `sudo wmcs-openstack port set --allowed-address ip-address=172.16.1.171 1b40be58-7182-41aa-95ce-797f94f83d66` (T195217)
15:43 arturo: allow traffic to haproxy VM ports from the VIP port: `sudo wmcs-openstack port set --allowed-address ip-address=172.16.1.171 9ccc43d9-1a8a-4287-afda-67e8bab27a9f` (T195217)
15:59 arturo: created DNS record `deploy-hook.paws.wmcloud.org IN CNAME paws.wmcloud.org` (T195217)
12:28 arturo: manually created an Ingress object to test routing to the hub (T195217)
12:20 arturo: created DNS record `paws.wmcloud.org IN A 185.15.56.57` (T195217)
12:19 arturo: associate floating IP 185.15.56.57 with VM paws-k8s-haproxy-1 (T195217)
12:18 arturo: release floating IP not in use: 185.15.56.42
12:18 arturo: release floating IP not in use: 185.15.56.43
11:45 arturo: reset wikitech user password for the service account `paws-dns-manager` to what is in labs/private.git/hieradata/common.yaml `profile::acme_chief::cloud::designate_sync_password` (T195217)
2020-06-12
18:49 bstorm_: deployed a test of paws chart in the new cluster T211096
13:23 arturo: assigned the DNS zone `paws.wmcloud.org` (T195217)
13:13 arturo: live-hacking session in the puppetmaster ended
13:05 arturo: live-hacking puppet tree in paws-puppetmaster-01 for T195217
12:18 arturo: bootstrapped paws-k8s-ingress nodes, added them to the k8s cluster (T195217)
12:04 arturo: created `paws-k8s-ingress` puppet prefix and add the `role::wmcs::paws::k8s::worker` role (T195217)
12:02 arturo: created 2 medium VM instances: paws-k8s-ingress-1 and paws-k8s-ingress-2 with haproxy anti-affinity (T195217)
2020-05-26
22:34 bstorm_: restored the deployment for maintain-kubeusers so anyone added to the paws.admin group will have admin on the cluster now that the bug is fixed T211096 T246059
22:05 bstorm_: temporarily deleted the deployment for maintain-kubeusers pending patch to fix context creation for new admin accounts T211096 T246059
22:04 bstorm_: created paws-focused PodSecurityPolicies and the prod namespace in the new cluster T211096
22:03 bstorm_: created paws.admin group and kubernetes admin accounts on the new k8s cluster T211096 T246059
18:29 bstorm_: bootstrapped the new control plane nodes T211096
15:27 bstorm_: updated profile::wmcs::kubeadm::kubernetes_version to 1.16.10 for cluster init T211096
2020-05-21
23:04 bstorm_: added profile::wmcs::kubeadm::k8s::encryption_key and profile::wmcs::kubeadm::k8s::node_token to labs/private T211096
14:53 bstorm_: adding the hiera values to horizon for bootstrapping k8s T211096
14:39 arturo: point record `k8s.svc.paws.eqiad1.wikimedia.cloud` to `172.16.1.186` (which is paws-k8s-control-1, for the initial bootstrap) (T211096)
12:48 arturo: created record `k8s.svc.paws.eqiad1.wikimedia.cloud` pointing to `172.16.0.191` (which is paws-k8s-haproxy-1) (T211096)
12:34 arturo: created and transferred DNS zone `svc.paws.eqiad1.wikimedia.cloud` (T211096)
2020-05-20
22:35 bstorm_: created paws-k8s-worker-1/2/3/4 T211096
22:12 bstorm_: created paws-k8s-haproxy-1/2 with antiaffinity group T211096
21:36 bstorm_: created paws-k8s-control-1/2/3 with appropriate sec group and server group T211096
18:59 bstorm_: created anti-affinity group "controlplane" T211096
16:38 bstorm_: deleting the old shut-down VMs from the last effort to rebuild paws T211096
16:36 bstorm_: cleaned up the old DNS entries for the external LBs that have been off for a year
2020-03-20
14:03 jeh: upgrade paws-puppetmaster-01 to v5 T241719
2020-02-14
21:31 andrewbogott: restarting paws-puppetmaster-01 so its clients can connect
00:27 bstorm_: rebooting the paws master since it is in a bad state after the openstack maintenance as well.
2019-11-01
21:15 Krenair: Updated paws-apiserver.wmflabs.org A record list to remove 172.16.2.151 which is not allocated to any instance. The other two A records point to valid instances in the paws project.
2019-10-23
09:03 arturo: paws-master-01/03 and a couple of other servers are down because hypervisor is rebooting
2019-10-14
22:32 bd808: Removed project member "Afrodric". Looks like someone added them accidentally when trying to add aborrero as a project member
22:31 bd808: Added Krenair as project member
2019-05-18
11:13 chicocvenancio: point paws-proxy-02 to tools-paws-worker-1006 on paws-deploy-hook hostname (T218380)