Nova Resource:Paws/SAL

From Wikitech

2024-06-20

  • 11:47 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_project_to_ovs (exit_code=0)
  • 11:44 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_project_to_ovs

2024-06-18

  • 20:46 andrewbogott: rebuilt magnum cluster using g4/ovs flavors

2024-05-06

  • 17:15 Rook: upgrade pywikibot T364188

2024-04-29

  • 17:54 Rook: upgrade pywikibot T363131
  • 17:54 Rook: jupyterlab to 4.1.8 T363596
  • 16:50 Rook: k8s to 1.26 T326985

2024-03-25

  • 15:59 Rook: upgrade jupyter chart. New cluster T360643
  • 13:01 Rook: use upstream jupyter-rsession-proxy T360800

2024-03-15

  • 16:16 Rook: upgrade jupyterlab T360193

2024-03-09

  • 16:06 Rook: increase number of worker nodes T359747

2024-03-08

  • 12:33 Rook: upgrade pywikibot T359616
  • 11:49 Rook: upgrade jupyterlab version T359588

2024-03-07

  • 21:01 Rook: increase worker count to manage outreachy load T359591

2024-03-06

  • 12:07 Rook: increase capacity for outreachy T359316

2024-03-05

  • 13:25 Rook: remove jupyter-dash T358621

2024-03-04

  • 13:57 Rook: add wikibase-cli T358649

2024-02-26

  • 16:09 Rook: remove nbextension, moving away from the notebook interface T312234

2024-02-21

  • 16:07 Rook: increase prometheus retention T357786

2024-02-20

  • 16:17 Rook: upgrade jupyterlab T357990

2024-02-15

  • 18:33 Rook: update openresty T357698
  • 13:59 Rook: upgrade jupyterlab to 4.1.1 T357027

2024-02-12

  • 15:56 Rook: upgrade OpenRefine T356448

2024-02-09

  • 17:48 Rook: jupyterlab upgraded to 4.1.0 T357027

2024-02-01

  • 20:17 Rook: prometheus and kube-state-metrics internal to cluster T355179

2024-01-29

  • 12:46 Rook: update jupyterlab T355890

2024-01-26

  • 15:25 Rook: update to allow for s3 tofu state storage in codfw1dev T355543

2024-01-24

  • 12:56 Rook: Remove 123-11 cluster T355785

2024-01-18

  • 08:05 Rook: upgrade rstudio-server T355288

2024-01-12

  • 14:00 Rook: removed paws-123-10 cluster T354946

2023-12-08

  • 14:41 Rook: upgrade OpenRefine T353021

2023-12-06

  • 14:28 Rook: pywikibot to 8.6 T352794

2023-11-27

  • 12:33 Rook: jupyterlab to 4.0.9 T351726

2023-11-15

  • 19:46 Rook: move to opentofu T351249

2023-11-13

  • 18:17 Rook: remove old cluster T350875
  • 13:21 Rook: pwb version bump T351015

2023-11-09

  • 14:34 Rook: updated ingress-nginx T347506

2023-11-06

  • 17:10 Rook: pywikibot to 8.5.0 T350552

2023-11-03

  • 09:27 Rook: bump jupyterlab version T350459

2023-10-24

  • 13:03 Rook: removed old cluster T349551

2023-10-23

  • 18:35 Rook: deploy new cluster/jupyterhub chart T349545

2023-10-19

  • 15:37 Rook: Bump jupyterlab version T349203
  • 14:53 Rook: Bump urllib3 from 1.26.17 to 1.26.18
  • 12:59 Rook: Bump urllib3 from 1.26.16 to 1.26.17

2023-10-18

  • 12:31 Rook: Jupyterlab to 3.0.6 T347108
  • 11:01 Rook: update OpenRefine T348464

2023-10-09

  • 15:50 Rook: bump pwb version T348372

2023-09-29

  • 09:58 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0)
  • 09:56 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.openstack.cloudvirt.vm_console
  • 09:56 wm-bot2: dcaro@urcuchillay END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=255)
  • 09:56 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.openstack.cloudvirt.vm_console

2023-09-21

  • 14:24 Rook: bump pwb version T346912

2023-09-05

  • 12:19 Rook: Reduce memory request for single user container T345467

2023-09-01

  • 18:45 Rook: increased worker node count, as all memory was requested T345462

2023-08-31

  • 16:51 taavi: restart uwsgi-toolsdb-replica-cnf-web.service on paws-nfs-2, a required dependency was installed only after last restart which made the service fail
  • 13:52 Rook: pwb to version 8.3.2 T345192

2023-08-30

  • 12:39 Rook: Upgrade notebook allowing i18n support T121478

2023-08-22

  • 18:06 Rook: bump jupyterlab version T344265

2023-08-21

  • 20:03 Rook: pywikibot version bump T344493

2023-08-18

  • 14:07 Rook: unrar added

2023-08-15

  • 14:08 Rook: T343116 update helm chart, jupyterlab, jupyterhub, notebook. Dropping sparql to allow for update.

2023-07-28

  • 12:26 Rook: pwb version bump T342852

2023-07-17

  • 18:33 Rook: update OpenRefine version T341985

2023-07-12

  • 15:22 Rook: paws deploying from tf in codfw1dev T340657

2023-07-10

  • 11:18 Rook: shut down systems associated with T341457, which should now only be a redirect page.
  • 11:00 Rook: removed paws-db-backup-1 T340054

2023-06-29

  • 13:52 Rook: T340698 OpenRefine version bump

2023-06-13

  • 18:16 Rook: Rolling jupyterlab back to last known working version T338981

2023-06-12

  • 16:02 Rook: updated jupyterlab T338337

2023-06-07

  • 10:30 Rook: upgrade jupyterlab T324002

2023-05-15

  • 12:00 Rook: implemented storage limits. Cleared about half of the storage in use T327936

2023-04-24

  • 13:28 Rook: pywikibot version bump 5466962 T335237
  • 11:37 andrewbogott: rebooting paws-k8s-haproxy-2, paws-acme-chief-01, paws-k8s-haproxy-1 -- their load numbers are through the roof for no obvious reason

2023-04-17

  • 11:13 Rook: robots.txt to nbserve 1fbc786

2023-04-13

  • 21:19 Rook: revert jupyterlab upgrade, which was causing notebooks not to load T324002

2023-03-27

  • 19:41 Rook: Upgrade Wikimedia Commons extension for OpenRefine on PAWS to version 0.1.1 4b10d3f T332721
  • 17:50 Rook: bump pwb version 7c49518 T333067

2023-03-16

  • 13:13 Rook: pywikibot version bump 6de17db

2023-03-15

  • 16:46 Rook: Install Wikimedia Commons extension for OpenRefine aaae2fd
  • 16:08 Rook: openrefine version bump 9047213 T331747

2023-03-13

  • 10:40 Rook: Restructure paws away from special networking (Change paws domain name) df16f35 T328842

2023-02-16

  • 12:58 Rook: moving to new cluster. Old one was restarting hub and couldn't find all of its nodes

2023-02-14

  • 13:27 Rook: Bump oauthlib 97f241b
  • 08:15 taavi: empty profile::wmcs::paws::control_nodes hiera key to bring PAWS back up (T329581), it contained the hostnames of the old kubeadm backed cluster which should be cleaned up properly in T327674

2023-01-30

  • 13:51 Rook: Set 1 to = 1 for nfs mounts and variables 09b036e T326675
  • 11:53 Rook: updated ingress-nginx to allow larger file uploads (more than 800K) T328168

2023-01-12

  • 14:09 Rook: Variables for nbserve and renderer resource requests 4eada3f T326723

2022-12-16

  • 16:08 dcaro: removing coe cluster rook3 (T325373)

2022-12-01

  • 08:30 taavi: root@paws-k8s-control-1:~# for cert in etcd-server etcd-peer etcd-healthcheck-client; do kubeadm certs renew $cert; done # T324178

2022-11-02

  • 13:17 Rook: T322063 Adding commentary regarding nginx rewrites b11757b

2022-10-05

  • 10:40 dcaro: extended the /srv volumes for both prometheus nodes to 15G to give some space for prometheus to shuffle data (the tsdb size is set to 10G, but it needs a bit more space than that)
  • 10:18 arturo: aborrero@paws-k8s-control-1:~$ sudo -i kubectl -n prod rollout restart deployment/proxy (T319366)
  • 10:02 dcaro: checking the prometheus instance for mounts from labstore
  • 10:01 dcaro: checking db-backup for labstore mounts (it should have according to hiera)
  • 10:00 dcaro: note that ingress were not mounting labstore already
  • 10:00 dcaro: removing labstore mounts on ingress-4
  • 09:58 dcaro: removing labstore mounts on ingress-3
  • 09:57 dcaro: removing labstore mounts on acme-chief
  • 09:56 dcaro: removing labstore mounts on control-3
  • 09:54 dcaro: removing labstore mounts on control-2

2022-09-26

  • 13:56 taavi: restart the 6 singleuser pods that don't have the new dumps mount points attached yet T317144
  • 12:42 Rook: Upgrade julia to 1.8.1 #210 51307a7 T318276
  • 11:15 Rook: bump pywikibot version #211 T318519 b540c81
  • 08:42 Rook: Remove unused file #209 T318277 9f5034e

2022-09-20

  • 18:43 Rook: Bump oauthlib from 3.2.0 to 3.2.1 in /images/minesweeper #207 dcfcfaa

2022-08-24

  • 13:44 Rook: Upgrade ingress-nginx to 1.3.0 8e6b577

2022-08-02

  • 12:22 Rook: deleting worker-1 from cluster and shutting down T313287

2022-07-25

  • 20:25 Rook: T313728 #188 revert nbclassic to restore openrefine
  • 11:18 Rook: 9336306 update pywikibot

2022-07-22

  • 15:43 Rook: updating to give hub pod a cpu request so that it cannot be resource starved

2022-07-19

  • 08:12 taavi: drain paws-k8s-worker-1 T313287

2022-07-13

  • 18:25 Rook: 002d1a8 update dockerfile to give julia and openrefine sections
  • 11:59 Rook: cd8ec39 node from its own repo #182

2022-06-15

  • 18:03 Rook: removing unnecessary yaml #167 f473302
  • 17:04 Rook: Move renderer to ubuntu container #168 0788089

2022-06-14

  • 13:11 Rook: Removing leftover db-proxy bits #170 8795bf7

2022-06-06

  • 13:54 Rook: T308975 Move away from git to pip for nbconvert #162 54f93d0
  • 12:21 Rook: T308926 Move nbserve to upstream container #165 1767870

2022-06-02

  • 17:02 Rook: scaling db-proxy to zero T309794

2022-05-16

  • 09:36 dcaro: restarted reload-acme-chief-backend.service to ensure certs are refreshed

2022-05-14

  • 16:16 andrewbogott: restarting acme-chief.service on paws-acme-chief-01 for T308383

2022-05-11

  • 13:44 Rook: update pwb version and pin jupyterlab version ef3e38c

2022-05-10

  • 13:56 Rook: upgrade pywikibot on container start 437f46a

2022-04-27

  • 17:02 Rook: pywikibot version bump 3c42a62

2022-04-18

  • 17:29 Rook: updating links to phab with prefilled ticket links aef7c67
  • 12:30 Rook: update pywikibot 6db74b6

2022-04-04

  • 12:31 taavi: moving all VMs from paws-puppetmaster-01 -> paws-puppetmaster-2

2022-03-29

  • 10:36 Rook: upgrading pywikibot 702f21d

2022-03-21

  • 11:11 Rook: deploying jupyterlab cd6ee19

2022-03-10

  • 12:23 Rook: updating banner to note ui will update soon 462ab18

2022-03-08

  • 13:26 Rook: upgrading OpenRefine c116d64

2022-03-07

  • 11:12 Rook: deploying paws realtime collaboration 246e2af

2022-03-02

  • 14:20 Rook: deploying fixed version of jupyter-rsession-proxy abe89f6

2022-03-01

  • 13:38 Rook: deploying pyaudio fix 978fb64

2022-02-23

  • 13:13 Rook: deploying e6eedbc cleanup

2022-02-15

  • 18:20 chicocvenancio: added psp for minesweeper
  • 16:04 mdipietro: updating pywikibot 2fc27c9
  • 14:21 chicocvenancio: Deploying minesweeper

2022-01-25

  • 14:30 mdipietro: deployed 93d33c4 PR122

2021-12-20

  • 18:20 majavah: deploying calico v3.21.0 (T292698)

2021-12-02

  • 19:12 chicocvenancio: deploy PR 111 T295257
  • 12:52 mdipietro: upgrading pywikibot 0f5d28d

2021-12-01

  • 11:36 mdipietro: deploying lsof pr-76 a378845

2021-11-29

  • 13:02 chicocvenancio: deploy PR 113 T295761
  • 12:29 chicocvenancio: deploy PR 112 T295761

2021-11-25

  • 21:37 chicocvenancio: rollback singleuser to PR #96 T295257
  • 21:15 chicocvenancio: deploy PR #110 changing singleuser to bump openrefine version T295257

2021-11-23

  • 14:19 mdipietro: increased cull timeout with deploy of 3e57264

2021-11-22

  • 12:57 mdipietro: added julia to paws with 7b58fb0
  • 11:04 mdipietro: added julia to paws with 12bfdad

2021-11-11

  • 08:35 majavah: disabling pod preset controller in preparation for T291913

2021-11-09

  • 16:24 mdipietro: deployed PR97 (85c085f) Update Pywikibot to 6.6.2

2021-11-01

  • 12:31 majavah: upgrade ingress-nginx T292771

2021-10-28

  • 14:35 chicocvenancio: set team toolforge/wmcsadmins as maintainers for github repo

2021-10-26

  • 15:06 chicocvenancio: delete orphan pods for 2 users

2021-10-21

  • 12:58 mdipietro: upgraded to 923250f, which was not really an upgrade since the diff was empty, though it now makes clear what is deployed.

2021-09-07

  • 22:14 bstorm: upgraded k8s to 1.19.13 T287399

2021-08-18

  • 19:09 bstorm: redeployed hub with trove database backend instead of toolsdb

2021-07-29

  • 14:09 majavah: add mdipietro as projectadmin T287287

2021-07-25

  • 16:09 majavah: deleting ingress pod running on worker-6 to get it to re-appear in ingress-4

2021-07-21

  • 19:53 bstorm: deployed new maintain-kubeusers T285011
  • 19:53 bstorm: deployed new rbac for maintain-kubeusers changes T285011
  • 16:59 majavah: deploying calico v3.18.4 T280342
  • 15:52 majavah: add my key to passwords::root::extra_keys
  • 15:00 majavah: starting kubernetes upgrades T280302

2021-07-14

  • 10:38 majavah: correction: undeploy old ingress T264221
  • 10:35 majavah: undeploy old ingress T266050

2021-07-13

  • 07:51 majavah: renewing tools-prometheus certificates

2021-07-12

  • 13:18 majavah: ingress upgrade completed
  • 13:05 majavah: moving user traffic to updated ingress-nginx T264221

2021-07-01

  • 12:04 majavah: deploy ingress-nginx 0.46 via the helm chart to paws T264221

2021-06-30

  • 20:05 bstorm: tried force delete on the ingress-nginx-gen2 namespace, which doesn't appear to be working either until metrics-server is fixed T285905
  • 20:00 bstorm: renewed k8s metrics-server certs and the deployment
  • 18:04 majavah: renew kubernetes metrics-server certificate
  • 17:26 majavah: creating paws-k8s-ingress-[3-4] and joining them to the k8s cluster T264221
  • 17:16 bstorm: temporarily increased quota to 60 cores to enable T264221

2021-06-03

  • 20:43 chicocvenancio: tagged new singleuser image, fixes T283969

2021-05-27

  • 21:53 bstorm: added paws-k8s-control-2.paws.eqiad.wmflabs back to the list of control nodes at the proxy
  • 21:50 bstorm: renewed the certs for paws-k8s-control-2
  • 20:37 bstorm: removed paws-k8s-control-2.paws.eqiad.wmflabs from the proxy because it is somewhat broken (certs expired)
  • 19:41 bstorm: forced removal of openrefine in paws for now and deleted all current user server pods to force use of the new image

2021-05-21

  • 00:06 bstorm: creating trove mysql instance pawsdb-1 T267683

2021-05-12

  • 19:33 bstorm: added taavi to paws.admin

2021-05-11

  • 09:17 Majavah: set `profile::wmcs::kubeadm::docker_vol: false` on ingress nodes T282087
  • 09:15 arturo: added user `taavi` (Majavah) as projectadmin

2021-04-02

  • 21:50 bstorm: deploying latest PRs to add a note on the wikireplicas changes

2020-12-21

  • 20:27 bstorm: applied tuning for timeouts and elections on the k8s etcd pods of 300 for heartbeat and 3000 for elections T267966

2020-12-17

  • 02:22 bstorm: Set PAWS hub back to using mariadb T266587

2020-12-16

  • 18:21 chicocvenancio: move paws to sqlite while toolsdb is down.

2020-12-10

  • 17:00 arturo: fixing /etc/kubernetes/kubelet.conf and restarting kubelet in paws-k8s-control-1 (T269865)

2020-12-05

  • 00:42 bd808: `kubectl delete po renderer-794886b9cd-9nc6c -n prod` after seeing lots of listen queue full errors in the pod logs.

2020-11-30

  • 18:22 bstorm: 1.17 upgrade for kubernetes complete T268669
  • 17:25 bstorm: upgrading the worker nodes (this will likely kill services briefly when some pods are rescheduled) T268669
  • 17:14 bstorm: updated the calico-kube-controllers deployment to use our internal registry to deal with docker-hub rate-limiting T268669 T269016
  • 17:09 chicocvenancio: delete orphaned jupyter server pod `kubectl -n prod delete pod jupyter--45volutionoftheuniverse`. Respective server not running in jupyter admin UI.
  • 16:31 bstorm: upgrading pods on paws-k8s-control-3 T268669
  • 16:17 bstorm: starting upgrade on paws-k8s-control-2 T268669 (first kubectl drain paws-k8s-control-2 --ignore-daemonsets)
  • 15:53 bstorm: proceeding with upgrade to 1.17 on paws-k8s-control-1 T268669
  • 15:49 bstorm: draining paws-k8s-control-1 for upgrade T268669
  • 12:49 arturo: disable puppet in all k8s nodes to prepare for the upgrade (T268669)
  • 12:49 arturo: set hiera `profile::wmcs::kubeadm::component: 'thirdparty/kubeadm-k8s-1-17'` at project level (T268669)

2020-11-16

  • 22:13 bstorm: deploying new paws changes for multiinstance readiness

2020-11-10

  • 20:16 chicocvenancio: restart hub to apply move to sqlite. T267667
  • 16:41 arturo: set paws in sqlite mode because T266587 (kubectl --namespace prod edit configmap hub-config)

2020-10-15

  • 19:12 andrewbogott: uncordoned paws-k8s-worker-1 and -2
  • 18:48 andrewbogott: draining paws-k8s-worker-2 for move to ceph
  • 18:36 andrewbogott: draining paws-k8s-worker-1 for move to ceph

2020-09-29

  • 10:59 arturo: last 2 commands should help puppet agent in the paws project; previously it had issues fetching acme-chief certs because of an API update
  • 10:58 arturo: aborrero@paws-acme-chief-01:~$ sudo systemctl restart uwsgi-acme-chief.service
  • 10:56 arturo: aborrero@paws-acme-chief-01:~$ sudo systemctl restart acme-chief.service

2020-08-14

  • 17:09 bstorm: backing up the old proxy config to NFS and deleting paws-proxy-02 T211096

2020-08-07

  • 22:30 bstorm: removing downtime for paws and front page monitor T211096
  • 18:01 bstorm: shutting down paws-proxy-02 T211096
  • 17:05 bstorm: running the final rsync to the new cluster's nfs T211096
  • 16:08 bstorm: changing paws.wmflabs.org to point at the new cluster ip 185.15.56.57 T211096
  • 16:02 bstorm: LAST MESSAGE WRONG: switching NEW cluster to toolsdb T211096
  • 16:02 bstorm: switching old cluster to toolsdb T211096
  • 15:58 bstorm: switching old cluster to sqlite T211096
  • 15:53 bstorm: downtiming alerts in case they need changes (seems likely) T211096

2020-07-30

  • 20:40 bstorm: upgrading the singleuser image to test shuffling around some of the pip installs
  • 16:38 bstorm: removing the *.paws.wmflabs.org SNI name because it won't be used and it might trigger a re-issue of certs T255249
  • 15:39 bstorm: upgrading acme-chief to 0.27-1

2020-07-29

  • 18:03 bstorm: powering on paws-k8s-haproxy-1 because that worked fine
  • 18:00 bstorm: powering off paws-k8s-haproxy-1 to test failover

2020-07-24

  • 17:25 bstorm: to force repulling of every image everywhere, uninstalling paws in the new cluster and reinstalling it T258812
  • 09:39 arturo: dropped the DNS wildcard record `*.paws.wmcloud.org IN A 185.15.56.57` and created concrete CNAME records for the FQDNs we actually use (T211096)

2020-07-23

  • 22:51 bstorm: deploying via the default 'latest' tag in the new cluster T211096
  • 22:48 bstorm: tagged the newbuild tags with "latest" to set sane defaults for all images in the helm chart T211096
  • 21:14 bstorm: pushing quay.io/wikimedia-paws-prod/nbserve:newbuild to main repo T211096
  • 21:11 bstorm: pushing quay.io/wikimedia-paws-prod/deploy-hook:newbuild to main repo T211096
  • 21:09 bstorm: pushing quay.io/wikimedia-paws-prod/singleuser:newbuild to the main repo T211096
  • 21:08 bstorm: pushing quay.io/wikimedia-paws-prod/paws-hub:newbuild to the main repo T211096
  • 21:06 bstorm: pushing dbproxy docker image for new cluster into main quay.io repo T211096

2020-07-22

  • 23:32 bstorm: setting the default NFS version to 4.2 while excepting the two stretch servers T257945

2020-07-21

  • 15:13 chicocvenancio: merge pr #50 to fix T258142

2020-07-06

  • 21:41 bstorm: deployed ingress to redirect paws.wmcloud.org to the wikitech doc page T195217

2020-06-30

  • 23:00 bstorm: added paws-public.wmflabs.org to the alt-names for acme-chief, which broke it until we hand off the zone to the paws project <sorry!> T195217 T255997

2020-06-26

  • 21:57 bstorm: applied the metrics manifests to kubernetes to enable metrics-server, cadvisor, etc. T256361

2020-06-25

  • 22:52 bstorm: created paws-k8s-worker-5/6/7 as x-large nodes to bring the cluster up to roughly the same capacity as the existing one using soft anti-affinity T211096 T253267
  • 22:43 bstorm: bumped quota up to 24 instances, 128 GB RAM and 56 cores T211096
  • 16:39 bstorm: deleted the deployhook from the in-progress new cluster for now just in case T211096
  • 15:44 bstorm: deployed a proof-of-concept paws-public setup in the new cluster T255997

2020-06-24

  • 23:18 bstorm: added A record for *.paws.wmcloud.org to public and hub to use T211096 T255997 T195217
  • 21:45 bstorm: doing an initial rsync of the paws userhomes to the new project T160113

2020-06-19

  • 10:01 arturo: enabled `paws.wmflabs.org` and `*.paws.wmflabs.org` as valid ingress domains (acme-chief TLS cert, haproxy, etc) (T195217)

2020-06-17

  • 21:51 bstorm_: upgraded chart in the new cluster to include resource limits T251298

2020-06-16

  • 15:48 arturo: change DNS record k8s.svc.paws.eqiad1.wikimedia.cloud to point to the haproxy VIP port address 172.16.1.171 (T195217)
  • 15:47 arturo: associate floating IP 185.15.56.57 with haproxy VIP port (T295217)
  • 15:43 arturo: allow traffic to haproxy VM ports from the VIP port: `sudo wmcs-openstack port set --allowed-address ip-address=172.16.1.171 1b40be58-7182-41aa-95ce-797f94f83d66` (T295217)
  • 15:43 arturo: allow traffic to haproxy VM ports from the VIP port: `sudo wmcs-openstack port set --allowed-address ip-address=172.16.1.171 9ccc43d9-1a8a-4287-afda-67e8bab27a9f` (T295217)
  • 15:37 arturo: `aborrero@cloudcontrol1004:~ 1 $ sudo wmcs-openstack --os-project-id=paws port create --network 7425e328-560c-4f00-8e99-706f3fb90bb4 paws-haproxy-vip` (T295217)
  • 15:23 arturo: live-hacking paws-puppetmaster-01 with https://gerrit.wikimedia.org/r/c/operations/puppet/+/605944 for T195217

2020-06-15

  • 15:59 arturo: created DNS record `deploy-hook.paws.wmcloud.org IN CNAME paws.wmcloud.org` (T195217)
  • 12:28 arturo: manually created an Ingress object to test routing to the hub (T195217)
  • 12:20 arturo: created DNS record `paws.wmcloud.org IN A 185.15.56.57` (T195217)
  • 12:19 arturo: associate floating IP 185.15.56.57 with VM paws-k8s-haproxy-1 (T195217)
  • 12:18 arturo: release floating IP not in use: 185.15.56.42
  • 12:18 arturo: release floating IP not in use: 185.15.56.43
  • 11:45 arturo: reset wikitech user password for the service account `paws-dns-manager` to what is in labs/private.git/hieradata/common.yaml `profile::acme_chief::cloud::designate_sync_password` (T195217)

2020-06-12

  • 18:49 bstorm_: deployed a test of paws chart in the new cluster T211096
  • 13:23 arturo: assigned the DNS zone `paws.wmcloud.org` (T195217)
  • 13:13 arturo: live-hacking session in the puppetmaster ended
  • 13:05 arturo: live-hacking puppet tree in paws-puppetmaster-01 for T195217
  • 11:55 arturo: `aborrero@cloudcontrol1004:~ $ sudo wmcs-openstack role add --user paws-dns-manager --project paws observer` (T255252)
  • 11:55 arturo: `aborrero@cloudcontrol1004:~ $ sudo wmcs-openstack role add --user paws-dns-manager --project paws designateadmin` (T255252)
  • 11:51 arturo: created service account `paws-dns-manager` in wikitech (T255252)
  • 11:31 arturo: introduced acme-chief private data into labs/private in paws-puppetmaster-01 (T255252)
  • 11:02 arturo: created puppet prefix 'paws-acme-chief' (T255252)
  • 11:01 arturo: created VM paws-acme-chief-01 (T255252)

2020-06-04

  • 14:16 arturo: added node taints to ingress nodes: `kubectl taint nodes paws-k8s-ingress-1 ingress=true:NoSchedule` (T195217)
  • 12:18 arturo: bootstrapped paws-k8s-ingress nodes, added them to the k8s cluster (T195217)
  • 12:04 arturo: created `paws-k8s-ingress` puppet prefix and add the `role::wmcs::paws::k8s::worker` role (T195217)
  • 12:02 arturo: created 2 medium VM instances: paws-k8s-ingress-1 and paws-k8s-ingress-2 with haproxy anti-affinity (T195217)

2020-05-26

  • 22:34 bstorm_: restored the deployment for maintain-kubeusers so anyone added to the paws.admin group will have admin on the cluster now that the bug is fixed T211096 T246059
  • 22:05 bstorm_: temporarily deleted the deployment for maintain-kubeusers pending patch to fix context creation for new admin accounts T211096 T246059
  • 22:04 bstorm_: created paws-focused PodSecurityPolicies and the prod namespace in the new cluster T211096
  • 22:03 bstorm_: created paws.admin group and kubernetes admin accounts on the new k8s cluster T211096 T246059
  • 18:29 bstorm_: bootstrapped the new control plane nodes T211096
  • 15:27 bstorm_: updated profile::wmcs::kubeadm::kubernetes_version to 1.16.10 for cluster init T211096

2020-05-21

  • 23:04 bstorm_: added profile::wmcs::kubeadm::k8s::encryption_key and profile::wmcs::kubeadm::k8s::node_token to labs/private T211096
  • 14:53 bstorm_: adding the hiera values to horizon for bootstrapping k8s T211096
  • 14:39 arturo: point record `k8s.svc.paws.eqiad1.wikimedia.cloud` to `172.16.1.186` (which is paws-k8s-control-1, for the initial bootstrap) (T211096)
  • 12:48 arturo: created record `k8s.svc.paws.eqiad1.wikimedia.cloud` pointing to `172.16.0.191` (which is paws-k8s-haproxy-1) (T211096)
  • 12:34 arturo: created and transferred DNS zone `svc.paws.eqiad1.wikimedia.cloud` (T211096)

2020-05-20

  • 22:35 bstorm_: created paws-k8s-worker-1/2/3/4 T211096
  • 22:12 bstorm_: created paws-k8s-haproxy-1/2 with antiaffinity group T211096
  • 21:36 bstorm_: created paws-k8s-control-1/2/3 with appropriate sec group and server group T211096
  • 18:59 bstorm_: created anti-affinity group "controlplane" T211096
  • 16:38 bstorm_: deleting the old shut-down VMs from the last effort to rebuild paws T211096
  • 16:36 bstorm_: cleaned up the old DNS entries for the external LBs that have been off for a year

2020-03-20

  • 14:03 jeh: upgrade paws-puppetmaster-01 to v5 T241719

2020-02-14

  • 21:31 andrewbogott: restarting paws-puppetmaster-01 so its clients can connect

2020-01-09

  • 18:06 bstorm_: rebooting tools-paws-master-01 T242353
  • 14:28 chicocvenancio: shutdown unused instances

2019-12-13

  • 00:27 bstorm_: rebooting the paws master since it is in a bad state after the openstack maintenance as well.

2019-11-01

  • 21:15 Krenair: Updated paws-apiserver.wmflabs.org A record list to remove 172.16.2.151 which is not allocated to any instance. The other two A records point to valid instances in the paws project.

2019-10-23

  • 09:03 arturo: paws-master-01/03 and a couple of other servers are down because hypervisor is rebooting

2019-10-14

  • 22:32 bd808: Removed project member "Afrodric". They appear to have been added accidentally when someone was trying to add aborrero as a project member
  • 22:31 bd808: Added Krenair as project member

2019-05-18

  • 11:13 chicocvenancio: point paws-proxy-02 to tools-paws-worker-1006 on paws-deploy-hook hostname (T218380)

2019-04-16

  • 17:15 chicocvenancio: move paws-proxy-02, reload nginx
  • 17:07 chicocvenancio: move paws-proxy-02 to point to tools-paws-worker-1006 for upcoming master move

2019-03-27

  • 23:46 chicocvenancio: moving paws host in `paws-proxy-02` back to `tools-paws-master-01` T219460
  • 22:10 chicocvenancio: moving paws host in `paws-proxy-02` to `tools-paws-worker-1005` T219460

2019-03-25

  • 14:12 gtirloni: created `paws.wmflabs.org` subdomain under `paws` project (T211096)
  • 14:07 gtirloni: created `paws.wmflabs.org` subdomain under `paws` project T211096
  • 13:54 gtirloni: created `paws.wmflabs.org` subdomain under `paws` project (T211096)

2019-03-15

  • 02:25 gtirloni: activated TLS termination using Let's Encrypt on paws-proxy-02
  • 02:25 gtirloni: removed webproxies and created new A records pointing directly to paws-proxy-02

2019-02-21

  • 09:22 gtirloni: upgraded and rebooted paws-proxy-02

2019-02-20

  • 15:00 andrewbogott: deleting the long-shut-down paws-proxy-01

2019-02-15

  • 01:28 bd808: Re-enabled PAWS vhost on paws-proxy-02

2019-02-14

  • 22:25 gtirloni: downtimed PAWS in Icinga
  • 22:16 gtirloni: Activated maintenance page on paws-proxy-02 nginx config

2019-02-13

  • 08:32 arturo: switch paws-proxy-02 puppetmaster to labs-puppetmaster.wikimedia.org

2019-01-24

  • 19:20 andrewbogott: shutting down paws-proxy-01
  • 19:11 chicocvenancio: moved config, ready to receive traffic on paws-proxy-02 T214613
  • 18:34 chicocvenancio: firing up paws-proxy-02 for T214613

2018-10-25

  • 23:58 gtirloni: Started tools-paws-worker-1010 (T208006)

2018-08-03

  • 20:19 andrewbogott: deleting paws-master-01 and paws-node-1002; unused

2018-07-03

  • 22:49 bstorm_: added stricter image space reclaiming arguments to kubelet

2018-06-20

  • 17:39 chicocvenancio: edited paws-proxy-01 to pass http_x_forwarded_proto as it receives it T197248

2018-05-04

  • 02:48 chicocvenancio: killed 25 pods with more than one hour inactivity through admin interface

2018-03-14

  • 21:49 chicocvenancio: updated k8s control plane, updating nodes to v1.9.4 for T189680

2018-02-23

  • 18:33 chicocvenancio: redirected tools.wmflabs.org/paws to paws.wmflabs.org and deleted old k8s ReplicationControllers (T188068)

2018-02-22

  • 22:11 chicocvenancio: (T175202) culler is running and killing pods as designed!
  • 21:13 chicocvenancio: jupyterhub updated to fix culler (T175202) culler already ran without 404
  • 17:43 chicocvenancio: manually ran culler inside hub container

2018-02-21

  • 17:03 chicocvenancio: deleted query-killer k8s deployment T187818

2018-02-16

  • 20:18 chicocvenancio: changed userhomes group for T185434 workaround

2018-02-15

  • 01:10 chicocvenancio: changed group of all userhome folders to tools.paws

2018-02-04

  • 12:21 chicocvenancio: changed group of all userhome folders to tools.paws

2017-12-19

  • 22:11 bd808: Killed tiller pod that was in crashloopbackoff

2017-09-28

  • 21:25 andrewbogott: service docker restart on paws-node-1002; disk is full and docker is holding open a lot of deleted files

2017-03-20

  • 21:25 andrewbogott: migrating paws-base-01 to labvirt1013

2016-05-10