Nova Resource:Admin/SAL

From Wikitech

2019-08-18

  • 10:39 arturo: rebooting cloudvirt1023 for new interface names configuration
  • 10:34 arturo: downtimed cloudvirt1023 for 2 days

2019-08-05

  • 17:17 bd808: Set downtime on gridengine and kubernetes webservice checks in icinga until 2019-09-02 (flaky tests)

2019-07-29

  • 20:14 bd808: Restarted maintain-kubeusers on tools-k8s-master-01 (T194859)

2019-07-25

  • 12:32 arturo: eqiad1/glance: debian-9.9-stretch image deprecates debian-9.8-stretch (T228983)
  • 09:59 arturo: (codfw1dev) drop missing glance images (T228972)
  • 09:32 arturo: (codfw1dev) deleting a bunch of VMs that were running on now-missing hypervisors
  • 09:31 arturo: (codfw1dev) deleting a bunch of VMs in ERROR and SHUTDOWN state
  • 09:27 arturo: last log entry refers to the codfw1dev deployment
  • 09:27 arturo: cleanup `nova service-list` from old hypervisors (labtest*)
  • 09:23 arturo: refreshed nova DB grants in clouddb2001-dev for the codfw1dev deployment
  • 08:47 arturo: cleanup the cloud-announce pending emails (spam)

2019-07-23

  • 19:43 andrewbogott: restarting rabbitmq-server on cloudcontrol1003 and 1004

2019-07-22

  • 23:44 bd808: Restarted maintain-kubeusers on tools-k8s-master-01 (T228529)

2019-07-11

  • 22:07 bd808: Ran `sudo systemctl stop designate_floating_ip_ptr_records_updater.service` on cloudcontrol1003
  • 22:01 bd808: `sudo apt-get install python2.7-dbg` on cloudcontrol1003 to debug hung python process
  • 21:48 bd808: Ran `sudo systemctl stop designate_floating_ip_ptr_records_updater.service` on cloudcontrol1004

2019-06-25

  • 16:05 bstorm_: updated python3.4 to update4 wherever it was installed on Jessie VMs to prevent issues with broken update3.
  • 14:56 bstorm_: Updated python 3.4 on the labs-puppetmaster server

2019-06-03

  • 15:55 arturo: T221769 rebooting cloudservices1003 after bootstrapping is apparently completed

2019-05-28

  • 21:42 bstorm_: unmounting labstore1003-scratch on all cloud clients
  • 18:14 bstorm_: T209527 switched mounts from labstore1003 to cloudstore1008 for scratch

2019-05-20

  • 17:25 arturo: T223923 dropped compat-network config from /etc/network/interfaces in eqiad1/codfw1dev neutron nodes
  • 17:22 arturo: T223923 dropped br-compat bridges and vlan interfaces (1102 and 2102) in eqiad1/codfw1dev neutron nodes
  • 17:07 arturo: T223923 dropped compat-network configuration from the neutron database in eqiad1
  • 16:55 arturo: T223923 dropped compat-network configuration from the neutron database in codfw1dev

2019-05-15

  • 17:00 andrewbogott: touching /root/firstboot_done on all VMs that cumin can reach. This will prevent firstboot.sh from running a second time if/when any of these are rebooted. T223370

2019-04-26

  • 15:51 arturo: andrew updated dns servers for the cloud-instances2-b-eqiad subnet in neutron: 208.80.154.143 and 208.80.154.24

2019-04-25

  • 11:14 arturo: T221760 increased size of conntrack table

2019-04-24

  • 12:54 arturo: T220051 puppet broken in every VM in Cloud VPS, fixing right now

2019-04-22

  • 11:14 arturo: create by hand /var/cache/labsaliaser/labs-ip-aliases.json in cloudservices2002-dev (T218575)

2019-04-16

  • 22:55 bd808: cloudcontrol2003-dev: added `exit 0` to /etc/cron.hourly/keystone to stop cron spam on partially configured cluster
  • 12:08 arturo: rebooting cloudvirt200[123]-dev because of deep changes in config
  • 11:27 arturo: T219626 add DB grants for neutron and glance to clouddb2001-dev (codfw1dev)
  • 10:37 arturo: T219626 replace 208.80.153.75 with 208.80.153.59 in the clouddb2001-dev database (codfw1dev deployment)
  • 10:30 arturo: T219626 replace labtestcontrol2003 with cloudcontrol2001-dev in the clouddb2001-dev database (codfw1dev deployment)

2019-04-15

  • 13:08 arturo: T219626 add DB grants for keystone/nova/nova_api to clouddb2001-dev (codfw1dev)

2019-04-13

  • 18:25 bd808: Restarted nova-compute service on cloudvirt1015 (T220853)

2019-04-11

  • 12:00 arturo: T151704 deploying oidentd to cloudnet1xxx servers

2019-04-02

  • 19:52 andrewbogott: installed new base Stretch image. Updated packages, and runs apt-get dist-upgrade on first boot.

2019-03-29

  • 14:34 andrewbogott: moving tools-static.wmflabs.org to point to tools-static-13 in eqiad1-r
  • 00:00 bstorm_: T193264 Added osm.db.svc.eqiad.wmflabs to cloud DNS

2019-03-25

  • 00:40 bd808: Restarted maintain-dbusers on labstore1004. Process hung up on failed LDAP connection.

2019-03-21

  • 19:32 andrewbogott: restarting keystone on cloudcontrol1003

2019-03-15

  • 16:00 gtirloni: increased nscd cache size (T217280)

2019-03-14

  • 19:04 gtirloni: bstorm started nfsd on labstore1006 (T218341)
  • 16:42 gtirloni: published new debian-9.8 image (T218314)

2019-03-04

  • 19:37 bstorm_: umounted /mnt/nfs/dumps-labstore1006.wikimedia.org across all VPS projects for T217473

2019-02-26

  • 12:46 gtirloni: shutdown toolsbeta-sgegrid-master (cronspam)

2019-02-25

  • 10:32 gtirloni: restarted nfsd on labstore1004

2019-02-21

  • 09:09 gtirloni: restarted uwsgi-labspuppetbackend.service on labpuppetmaster1001
  • 07:42 gtirloni: created project cloudstore
  • 07:36 gtirloni: deleted wmcs-nfs project

2019-02-20

  • 21:58 andrewbogott: silencing shinken and disabling puppet on shinken-02 for now

2019-02-19

  • 12:00 gtirloni: added nagios@icinga2001.wikimedia.org to cloud-admin-feed@ allowed senders

2019-02-18

  • 20:21 gtirloni: downtimed cloudvirt1020
  • 20:12 gtirloni: ran `labs-ip-alias-dump.py` on cloudservices/labservices servers

2019-02-15

  • 13:10 arturo: T216239 labvirt1019 has been drained
  • 12:22 arturo: T216239 draining labvirt1009 with a command like this: `root@cloudcontrol1004:~# wmcs-cold-migrate --region eqiad --nova-db nova 2c0cf363-c7c3-42ad-94bd-e586f2492321 labvirt1001`
  • 12:02 arturo: more nova service cleanups in the database (labvirts that were reallocated to eqiad1)
  • 11:34 arturo: T216190 cleanup from nova database `nova service-delete 35`
  • 03:50 andrewbogott: updated VPS base images for Jessie and Stretch, now featuring Stretch 9.7

2019-02-11

  • 18:13 gtirloni: cleaned old metrics data in labmon1001 T215417
  • 15:28 gtirloni: running `maintain-views --all-databases --replace-all` on labsdb1011
  • 14:18 gtirloni: running `maintain-views --all-databases --replace-all` on labsdb1010

2019-02-08

  • 14:56 gtirloni: running `maintain-views --all-databases --replace-all` on labsdb1009

2019-02-06

  • 11:47 gtirloni: downtimed labmon100{1,2} T215399
  • 00:17 bstorm_: T214106 deleted bstorm-test2 project to clean up

2019-02-05

  • 10:48 arturo: labmon1001 is now part of the 'eqiad1-r' region

2019-02-01

  • 09:54 arturo: moving canary1015-01 VM instance from cloudvirt1024 back to cloudvirt1015

2019-01-31

  • 12:44 arturo: T215012 depooling cloudvirt1015 and migrating all VMs to cloudvirt1024

2019-01-25

  • 20:11 gtirloni: deleted project yandex-proxy T212306
  • 20:11 gtirloni: deleted project T212306

2019-01-24

  • 11:50 arturo: T213925 modify subnet cloud-instances-transport1-b-eqiad1 to avoid floating IP allocations from here
  • 11:07 arturo: T214299 failover cloudnet1003 to cloudnet1004
  • 10:03 arturo: T214299 reimage cloudnet1004 to debian stretch
  • 09:51 arturo: T214299 failover cloudnet1004 to cloudnet1003

2019-01-22

  • 19:19 arturo: T214299 stretch cloudnet1003 is apparently all set
  • 18:40 arturo: T214299 manually delete the cloudnet1003 agents from neutron (must be added again after reimage, with new uuids)
  • 18:37 arturo: T214299 reimaging cloudnet1003 as debian stretch
  • 17:35 jbond42: starting roll out of apt package updates to
  • 14:41 gtirloni: T214369 deployed new jessie and stretch VM images

2019-01-21

  • 18:29 gtirloni: installed libguestfs-tools on cloudvirt1021

2019-01-16

  • 14:21 andrewbogott: stopping old VPS proxies in eqiad — T213540

2019-01-15

  • 14:20 andrewbogott: changing tools.wmflabs.org to point to tools-proxy-03 in eqiad1

2019-01-13

  • 20:00 andrewbogott: VPS proxies are now running in eqiad1 on proxy-01. Old VMs will wait a bit for deletion. T213540
  • 19:12 andrewbogott: moving the VPS proxy API backend to proxy-01.project-proxy.eqiad.wmflabs, as per T213540
  • 17:11 andrewbogott: moving all VPS dynamic proxies to proxy-eqiad1.wmflabs.org aka proxy-01.project-proxy.eqiad.wmflabs, as per T213540

2019-01-09

  • 22:21 bd808: neutron quota-update --tenant-id tools --port 256

2019-01-08

  • 18:59 bd808: Definitely did NOT delete uid=novaadmin,ou=people,dc=wikimedia,dc=org
  • 18:59 bd808: Deleted LDAP user uid=neutron,ou=people,dc=wikimedia,dc=org
  • 18:58 bd808: Deleted LDAP user uid=novaadmin,ou=people,dc=wikimedia,dc=org

2019-01-06

  • 22:03 bd808: Set floatingip quota of 60 for tools project in eqiad1-r region (T212360)

2018-12-20

  • 17:10 arturo: T207663 renumbered transport network in eqiad1

2018-12-05

  • 17:59 arturo: T207663 changed labtestn transport network addressing from private to public

2018-12-03

  • 13:25 arturo: T202886 create again PTR records after dnsleak.py fix

2018-11-30

  • 14:08 arturo: running dns leaks cleanup `root@cloudcontrol1003:~# /root/novastats/dnsleaks.py --delete`

2018-11-28

  • 17:33 gtirloni: deleted contintcloud project (T209644)

2018-11-27

  • 13:32 gtirloni: enabled DRBD stats collection on labstore100[4-5] T208446

2018-11-22

  • 07:12 gtirloni: deployed new debian-9.6-stretch image

2018-11-21

  • 10:48 arturo: re-created compat-net as not shared in labtestn to test stuff related to T209954

2018-11-16

  • 12:43 gtirloni: armed keyholder on labpuppetmaster1001/1002 after reboots
  • 12:08 gtirloni: rebooted labpuppetmaster1001 (T207377)
  • 11:57 gtirloni: rebooted labpuppetmaster1002 (T207377)

2018-11-14

  • 17:19 gtirloni: added cloudvirt1016 to scheduler pool (T209426)
  • 15:41 gtirloni: reimaging labvirt1016 as cloudvirt1016
  • 15:14 gtirloni: reset-failed systemd unit nova-scheduler on cloudcontrol1004
  • 13:52 gtirloni: rebooted labservices1002 after package upgrades (T207377)
  • 13:23 gtirloni: rebooted labstore2004 after package upgrades (T207377)
  • 13:20 gtirloni: rebooted labstore2003 after package upgrades (T207377)
  • 13:20 gtirloni: rebooted labstore2001/labstore2003 after package upgrades (T207377)
  • 12:08 gtirloni: rebooted labnet1002 after package upgrades
  • 12:01 gtirloni: rebooted labmon1002 after package upgrades
  • 11:41 gtirloni: rebooted labcontrol1002 after package upgrades
  • 11:15 gtirloni: rebooted cloudcontrol1004 after package upgrades

2018-11-09

  • 18:17 gtirloni: restarted neutron-linuxbridge-agent on cloudvirt1018/1023

2018-11-08

  • 11:00 gtirloni: Added novaproxy-02 to $CACHES
  • 10:50 gtirloni: Added cloudvirt1017 to eqiad1 region

2018-11-07

  • 13:49 arturo: T208733 moving labvirt1017 from main deployment to eqiad1 and renaming it to cloudvirt1017

2018-10-22

  • 16:24 arturo: T206261 another update to dmz_cidr in eqiad1
  • 10:26 arturo: changed dmz_cidr in eqiad1 again: VMs will connect to each other without NAT even when using floating IPs (T206261)

2018-10-19

  • 12:02 arturo: revert change in dmz_cidr in eqiad1 for now (T206261)
  • 11:16 arturo: change in dmz_cidr in eqiad1: VMs will connect to each other without NAT even when using floating IPs (T206261)
  • 10:14 arturo: new virt servers were added to the eqiad1 deployment last week and this week: cloudvirt1018, cloudvirt1023, cloudvirt1024

2018-09-26

  • 10:40 arturo: T205524 all sorts of restarts in all neutron daemons
  • 10:20 arturo: T205524 stop/start all neutron agents in cloudnet1003.eqiad.wmnet
  • 10:13 arturo: T205524 restart all agents in cloudnet1004.eqiad.wmnet
  • 10:10 arturo: restart neutron-server in cloudcontrol1003, investigating T205524

2018-09-24

  • 10:57 arturo: try to increase floating ip allocation pool in eqiad1. Of 185.15.56.0/25 we are using only 185.15.56.10-185.15.56.31, I don't know why. Let's use 185.15.56.2-185.15.56.126
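
The pool arithmetic in the entry above can be checked with a short sketch. It uses only Python's standard `ipaddress` module and the addresses quoted in the entry; the helper name `pool_size` is hypothetical and is not part of any neutron tooling.

```python
import ipaddress

# Network and ranges are copied from the log entry; nothing here talks to neutron.
network = ipaddress.ip_network("185.15.56.0/25")


def pool_size(start: str, end: str) -> int:
    """Count the addresses in an inclusive start-end range inside `network`.

    Hypothetical helper, for illustration only.
    """
    first = ipaddress.ip_address(start)
    last = ipaddress.ip_address(end)
    if first not in network or last not in network or last < first:
        raise ValueError("range must lie inside the network with start <= end")
    return int(last) - int(first) + 1


old_pool = pool_size("185.15.56.10", "185.15.56.31")   # range in use before the change
new_pool = pool_size("185.15.56.2", "185.15.56.126")   # range proposed in the entry
print(network.num_addresses, old_pool, new_pool)
```

Under these assumptions the old range covers 22 of the 128 addresses in the /25, and the proposed range covers 125.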

2018-09-21

  • 17:18 bd808: Running `sudo maintain-meta_p --all-databases --purge` across labsdb10(09|10|11) for T201890

2018-09-17

  • 22:08 bd808: Granted gtirloni project roles of admin, projectadmin, and user

2018-09-12

  • 11:20 arturo: T202636 distributing default routes using classless-static-route for all VMs in main/labtest (dnsmasq/nova-network)

2018-09-11

  • 16:52 arturo: again, restarted nova-network after killing all dnsmasq procs in labnet1001 for T202636
  • 16:08 arturo: restarted nova-network after killing all dnsmasq procs in labnet1001 for T202636
  • 10:53 arturo: T202636 creating all the compat-network configuration in neutron
  • 10:36 arturo: T202636 creating br-compat bridge in eqiad1 for the compat network
  • 10:33 arturo: T202636 manually reserve 10.68.23.253 (in nova-network)

2018-09-10

  • 22:46 andrewbogott: deleting all VMs on labvirt1019 and 1020 as prep for T204003

2018-08-30

  • 15:46 andrewbogott: restarting rabbitmq-server on cloudcontrol1003
  • 13:07 arturo: T202636 internal network routing now exists in labtest/labtestn for VMs to communicate with each other

2018-08-28

  • 11:04 arturo: T202549 eqiad1 databases are all now running in m5-master. MySQL has been cleaned from cloudcontrol100[3,4]

2018-08-23

  • 16:17 arturo: T188589 bstorm_ merged patch to reduce nova DB connection usage
  • 13:15 arturo: T202115 `root@cloudcontrol1003:~# neutron subnet-update --allocation-pool start=10.64.22.4,end=10.64.22.4 e4fb2771-a361-4add-ac4e-280cc300c59f`
  • 13:10 arturo: T202115 (was `{"start": "10.64.22.2", "end": "10.64.22.254"}` )
  • 13:08 arturo: T202115 `root@cloudcontrol1003:~# neutron subnet-update --allocation-pool start=10.64.22.254,end=10.64.22.254 e4fb2771-a361-4add-ac4e-280cc300c59f`

2018-08-22

  • 15:28 arturo: cleanup local glance,keystone databases in cloudcontrol1003.wikimedia.org (already in m5-master)
  • 15:27 arturo: cleanup local keystone database in cloudcontrol1003.wikimedia.org (already in m5-master)

2018-08-21

  • 15:39 andrewbogott: initial test message
  • 10:31 arturo: eqiad1 remove leftover port for HA on labnet1004
  • 10:15 arturo: test

2018-05-07

  • 18:07 bstorm_: stopped the toolhistory job because it is totally broken and fills /tmp.

2018-02-09

  • 00:55 bd808: Added Arturo Borrero Gonzalez and Bstorm as project members
  • 00:54 bd808: Removed Yuvipanda at user request (T186289)