Nova Resource:Paws/SAL

From Wikitech

2020-07-30

  • 20:40 bstorm: upgrading the singleuser image to test shuffling around some of the pip installs
  • 16:38 bstorm: removing the *.paws.wmflabs.org SNI name because it won't be used and it might trigger a re-issue of certs T255249
  • 15:39 bstorm: upgrading acme-chief to 0.27-1

2020-07-29

  • 18:03 bstorm: powering on paws-k8s-haproxy-1 because that worked fine
  • 18:00 bstorm: powering off paws-k8s-haproxy-1 to test failover

2020-07-24

  • 17:25 bstorm: to force repulling of every image everywhere, uninstalling paws in the new cluster and reinstalling it T258812
  • 09:39 arturo: dropped the DNS wildcard record `*.paws.wmcloud.org IN A 185.15.56.57` and created concrete CNAME records for the FQDNs we actually use (T211096)

2020-07-23

  • 22:51 bstorm: deploying via the default 'latest' tag in the new cluster T211096
  • 22:48 bstorm: tagged the newbuild tags with "latest" to set sane defaults for all images in the helm chart T211096
  • 21:14 bstorm: pushing quay.io/wikimedia-paws-prod/nbserve:newbuild to main repo T211096
  • 21:11 bstorm: pushing quay.io/wikimedia-paws-prod/deploy-hook:newbuild to main repo T211096
  • 21:09 bstorm: pushing quay.io/wikimedia-paws-prod/singleuser:newbuild to the main repo T211096
  • 21:08 bstorm: pushing quay.io/wikimedia-paws-prod/paws-hub:newbuild to the main repo T211096
  • 21:06 bstorm: pushing dbproxy docker image for new cluster into main quay.io repo T211096

2020-07-22

  • 23:32 bstorm: setting the default NFS version to 4.2 while excepting the two stretch servers T257945

2020-07-21

  • 15:13 chicocvenancio: merge pr #50 to fix T258142

2020-07-06

  • 21:41 bstorm: deployed ingress to redirect paws.wmcloud.org to the wikitech doc page T195217

2020-06-30

  • 23:00 bstorm: added paws-public.wmflabs.org to the alt-names for acme-chief, which broke it until we hand off the zone to the paws project <sorry!> T195217 T255997

2020-06-26

  • 21:57 bstorm: applied the metrics manifests to kubernetes to enable metrics-server, cadvisor, etc. T256361

2020-06-25

  • 22:52 bstorm: created paws-k8s-worker-5/6/7 as x-large nodes to bring the cluster up to roughly the same capacity as the existing one using soft anti-affinity T211096 T253267
  • 22:43 bstorm: bumped quota up to 24 instances, 128 GB RAM and 56 cores T211096
  • 16:39 bstorm: deleted the deployhook from the in-progress new cluster for now just in case T211096
  • 15:44 bstorm: deployed a proof-of-concept paws-public setup in the new cluster T255997

2020-06-24

  • 23:18 bstorm: added A record for *.paws.wmcloud.org to public and hub to use T211096 T255997 T195217
  • 21:45 bstorm: doing an initial rsync of the paws userhomes to the new project T160113

2020-06-19

  • 10:01 arturo: enabled `paws.wmflabs.org` and `*.paws.wmflabs.org` as valid ingress domains (acme-chief TLS cert, haproxy, etc) (T195217)

2020-06-17

  • 21:51 bstorm_: upgraded chart in the new cluster to include resource limits T251298

2020-06-16

  • 15:48 arturo: change DNS record k8s.svc.paws.eqiad1.wikimedia.cloud to point to the haproxy VIP port address 172.16.1.171 (T195217)
  • 15:47 arturo: associate floating IP 185.15.56.57 with haproxy VIP port (T195217)
  • 15:43 arturo: allow traffic to haproxy VM ports from the VIP port: `sudo wmcs-openstack port set --allowed-address ip-address=172.16.1.171 1b40be58-7182-41aa-95ce-797f94f83d66` (T195217)
  • 15:43 arturo: allow traffic to haproxy VM ports from the VIP port: `sudo wmcs-openstack port set --allowed-address ip-address=172.16.1.171 9ccc43d9-1a8a-4287-afda-67e8bab27a9f` (T195217)
  • 15:37 arturo: `aborrero@cloudcontrol1004:~ 1 $ sudo wmcs-openstack --os-project-id=paws port create --network 7425e328-560c-4f00-8e99-706f3fb90bb4 paws-haproxy-vip` (T195217)
  • 15:23 arturo: live-hacking paws-puppetmaster-01 with https://gerrit.wikimedia.org/r/c/operations/puppet/+/605944 for T195217

2020-06-15

  • 15:59 arturo: created DNS record `deploy-hook.paws.wmcloud.org IN CNAME paws.wmcloud.org` (T195217)
  • 12:28 arturo: manually created an Ingress object to test routing to the hub (T195217)
  • 12:20 arturo: created DNS record `paws.wmcloud.org IN A 185.15.56.57` (T195217)
  • 12:19 arturo: associate floating IP 185.15.56.57 with VM paws-k8s-haproxy-1 (T195217)
  • 12:18 arturo: release floating IP not in use: 185.15.56.42
  • 12:18 arturo: release floating IP not in use: 185.15.56.43
  • 11:45 arturo: reset wikitech user password for the service account `paws-dns-manager` to what is in labs/private.git/hieradata/common.yaml `profile::acme_chief::cloud::designate_sync_password` (T195217)

2020-06-12

  • 18:49 bstorm_: deployed a test of paws chart in the new cluster T211096
  • 13:23 arturo: assigned the DNS zone `paws.wmcloud.org` (T195217)
  • 13:13 arturo: live-hacking session in the puppetmaster ended
  • 13:05 arturo: live-hacking puppet tree in paws-puppetmaster-01 for T195217
  • 11:55 arturo: `aborrero@cloudcontrol1004:~ $ sudo wmcs-openstack role add --user paws-dns-manager --project paws observer` (T255252)
  • 11:55 arturo: `aborrero@cloudcontrol1004:~ $ sudo wmcs-openstack role add --user paws-dns-manager --project paws designateadmin` (T255252)
  • 11:51 arturo: created service account `paws-dns-manager` in wikitech (T255252)
  • 11:31 arturo: introduced acme-chief private data into labs/private in paws-puppetmaster-01 (T255252)
  • 11:02 arturo: created puppet prefix 'paws-acme-chief' (T255252)
  • 11:01 arturo: created VM paws-acme-chief-01 (T255252)

2020-06-11

2020-06-04

  • 14:16 arturo: added node taints to ingress nodes: `kubectl taint nodes paws-k8s-ingress-1 ingress=true:NoSchedule` (T195217)
  • 12:18 arturo: bootstrapped paws-k8s-ingress nodes, added them to the k8s cluster (T195217)
  • 12:04 arturo: created `paws-k8s-ingress` puppet prefix and added the `role::wmcs::paws::k8s::worker` role (T195217)
  • 12:02 arturo: created 2 medium VM instances: paws-k8s-ingress-1 and paws-k8s-ingress-2 with haproxy anti-affinity (T195217)

2020-05-26

  • 22:34 bstorm_: restored the deployment for maintain-kubeusers so anyone added to the paws.admin group will have admin on the cluster now that the bug is fixed T211096 T246059
  • 22:05 bstorm_: temporarily deleted the deployment for maintain-kubeusers pending patch to fix context creation for new admin accounts T211096 T246059
  • 22:04 bstorm_: created paws-focused PodSecurityPolicies and the prod namespace in the new cluster T211096
  • 22:03 bstorm_: created paws.admin group and kubernetes admin accounts on the new k8s cluster T211096 T246059
  • 18:29 bstorm_: bootstrapped the new control plane nodes T211096
  • 15:27 bstorm_: updated profile::wmcs::kubeadm::kubernetes_version to 1.16.10 for cluster init T211096

2020-05-21

  • 23:04 bstorm_: added profile::wmcs::kubeadm::k8s::encryption_key and profile::wmcs::kubeadm::k8s::node_token to labs/private T211096
  • 14:53 bstorm_: adding the hiera values to horizon for bootstrapping k8s T211096
  • 14:39 arturo: point record `k8s.svc.paws.eqiad1.wikimedia.cloud` to `172.16.1.186` (which is paws-k8s-control-1, for the initial bootstrap) (T211096)
  • 12:48 arturo: created record `k8s.svc.paws.eqiad1.wikimedia.cloud` pointing to `172.16.0.191` (which is paws-k8s-haproxy-1) (T211096)
  • 12:34 arturo: created and transferred DNS zone `svc.paws.eqiad1.wikimedia.cloud` (T211096)

2020-05-20

  • 22:35 bstorm_: created paws-k8s-worker-1/2/3/4 T211096
  • 22:12 bstorm_: created paws-k8s-haproxy-1/2 with antiaffinity group T211096
  • 21:36 bstorm_: created paws-k8s-control-1/2/3 with appropriate sec group and server group T211096
  • 18:59 bstorm_: created anti-affinity group "controlplane" T211096
  • 16:38 bstorm_: deleting the old shut-down VMs from the last effort to rebuild paws T211096
  • 16:36 bstorm_: cleaned up the old DNS entries for the external LBs that have been off for a year

2020-03-20

  • 14:03 jeh: upgrade paws-puppetmaster-01 to v5 T241719

2020-02-14

  • 21:31 andrewbogott: restarting paws-puppetmaster-01 so its clients can connect

2020-01-09

  • 18:06 bstorm_: rebooting tools-paws-master-01 T242353
  • 14:28 chicocvenancio: shutdown unused instances

2019-12-13

  • 00:27 bstorm_: rebooting the paws master since it is in a bad state after the openstack maintenance as well.

2019-11-01

  • 21:15 Krenair: Updated paws-apiserver.wmflabs.org A record list to remove 172.16.2.151 which is not allocated to any instance. The other two A records point to valid instances in the paws project.

2019-10-23

  • 09:03 arturo: paws-master-01/03 and a couple of other servers are down because hypervisor is rebooting

2019-10-14

  • 22:32 bd808: Removed project member "Afrodric". Looks like they were added accidentally by someone trying to add aborrero as a project member
  • 22:31 bd808: Added Krenair as project member

2019-05-18

  • 11:13 chicocvenancio: point paws-proxy-02 to tools-paws-worker-1006 on paws-deploy-hook hostname (T218380)

2019-04-26

2019-04-16

  • 17:15 chicocvenancio: move paws-proxy-02 reload nginx
  • 17:07 chicocvenancio: move paws-proxy-02 to point to tools-paws-worker-1006 for upcoming master move

2019-03-27

  • 23:46 chicocvenancio: moving paws host in `paws-proxy-02` back to `tools-paws-master-01` T219460
  • 22:10 chicocvenancio: moving paws host in `paws-proxy-02` to `tools-paws-worker-1005` T219460

2019-03-25

  • 14:12 gtirloni: created `paws.wmflabs.org` subdomain under `paws` project (T211096)
  • 14:07 gtirloni: created `paws.wmflabs.org` subdomain under `paws` project T211096
  • 13:54 gtirloni: created `paws.wmflabs.org` subdomain under `paws` project (T211096)

2019-03-15

  • 02:25 gtirloni: activated TLS termination using Let's Encrypt on paws-proxy-02
  • 02:25 gtirloni: removed webproxies and created new A records pointing directly to paws-proxy-02

2019-02-21

  • 09:22 gtirloni: upgraded and rebooted paws-proxy-02

2019-02-20

  • 15:00 andrewbogott: deleting the long-shut-down paws-proxy-01

2019-02-15

  • 01:28 bd808: Re-enabled PAWS vhost on paws-proxy-02

2019-02-14

  • 22:25 gtirloni: downtimed PAWS in Icinga
  • 22:16 gtirloni: Activated maintenance page on paws-proxy-02 nginx config

2019-02-13

  • 08:32 arturo: switch paws-proxy-02 puppetmaster to labs-puppetmaster.wikimedia.org

2019-01-24

  • 19:20 andrewbogott: shutting down paws-proxy-01
  • 19:11 chicocvenancio: moved config, ready to receive traffic on paws-proxy-02 T214613
  • 18:34 chicocvenancio: firing up paws-proxy-02 for T214613

2019-01-23

2018-10-25

  • 23:58 gtirloni: Started tools-paws-worker-1010 (T208006)

2018-08-03

  • 20:19 andrewbogott: deleting paws-master-01 and paws-node-1002; unused

2018-07-03

  • 22:49 bstorm_: added stricter image space reclaiming arguments to kubelet

2018-06-20

  • 17:39 chicocvenancio: edited paws-proxy-01 to pass http_x_forwarded_proto as it receives T197248

2018-05-04

  • 02:48 chicocvenancio: killed 25 pods with more than one hour inactivity through admin interface

2018-03-14

  • 21:49 chicocvenancio: updated k8s control plane, updating nodes to v1.9.4 for T189680

2018-02-23

  • 18:33 chicocvenancio: redirected tools.wmflabs.org/paws to paws.wmflabs.org and deleted old k8s ReplicationControllers (T188068)

2018-02-22

  • 22:11 chicocvenancio: (T175202) culler is running and killing pods as designed!
  • 21:13 chicocvenancio: jupyterhub updated to fix culler (T175202); culler already ran without 404
  • 17:43 chicocvenancio: manually ran culler inside hub container

2018-02-21

  • 17:03 chicocvenancio: deleted query-killer k8s deployment T187818

2018-02-16

  • 20:18 chicocvenancio: changed userhomes group for T185434 workaround

2018-02-15

  • 01:10 chicocvenancio: changed group of all userhome folders to tools.paws

2018-02-04

  • 12:21 chicocvenancio: changed group of all userhome folders to tools.paws

2017-12-19

  • 22:11 bd808: Killed tiller pod that was in crashloopbackoff

2017-09-28

  • 21:25 andrewbogott: service docker restart on paws-node-1002; disk is full and docker is holding open a lot of deleted files

2017-03-20

  • 21:25 andrewbogott: migrating paws-base-01 to labvirt1013

2016-05-10