Nova Resource:Integration/SAL

From Wikitech

2023-06-05

  • 18:42 mutante: - access to old gerrit service IP (gerrit-old.wikimedia.org) for cloud IPs was removed with gerrit:927246 (homer deploy), T336427

2022-05-03

  • 18:02 bd808: `sudo wmcs-openstack role remove --user dzahn --project integration user` per request

2022-04-14

  • 20:22 mutante: - revoking project-admin for myself

2022-01-26

  • 14:17 arturo: created flavor g3.cores8.ram24.disk20.ephemeral60.4xiops T299704
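The flavor name above appears to pack its sizing into dot-separated fields. A minimal sketch of pulling those fields out, assuming each field is a label followed by a number (the naming scheme and the units are not stated in this log, so treat both as assumptions):

```shell
# Sketch only: split a flavor name on dots and read the numeric part of each field.
# The field labels (cores, ram, disk) are taken from the name itself; units are unknown.
flavor='g3.cores8.ram24.disk20.ephemeral60.4xiops'

# Print the digits that follow a given label, e.g. "cores8" -> "8".
field() {
  printf '%s\n' "$flavor" | tr '.' '\n' | sed -n "s/^$1\([0-9][0-9]*\)\$/\1/p"
}

cores=$(field cores)
ram=$(field ram)
disk=$(field disk)

echo "cores=$cores ram=$ram disk=$disk"
```

Note that other flavor names in this log (for example g2.cores4.ram24576.disk300) use a different magnitude for the ram field, so the number should not be compared across flavors without checking the flavor definition.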

2021-09-05

  • 15:15 andrewbogott: changing the puppetmaster for integration-puppetmaster-02 to the default puppetmaster. It's a lot easier to have a project-local puppetmaster if it isn't its own master; otherwise there's epic cert confusion. I've confirmed that there aren't any custom features applied to this master.

2021-08-23

  • 19:58 bstorm: acked the alert for puppet on the integration pkgbuilder hosts using the new alertmanager thingy T288237

2021-06-11

  • 14:39 balloons: add 16 vcpu, to 178 total T284507

2021-03-02

  • 12:26 arturo: shutdown integration-agent-docker-1009 because it was stuck in nova MIGRATING status, trying to fix that by hand

2020-12-14

  • 22:07 andrewbogott: resizing integration-docker-registry-1003 to a g2 flavor: g2.cores4.ram24576.disk300

2020-08-16

  • 20:16 andrewbogott: moving agent-qemu-1001 to a new host
  • 19:50 andrewbogott: moving integration-agent-docker-1002 to a new host

2020-07-14

  • 15:28 bd808: Silenced prometheus alerts for 7d

2019-10-01

  • 12:19 arturo: migrating integration-castor03 to cloudvirt1021 (T232646)

2019-09-25

  • 16:19 andrewbogott: moving integration-agent-docker-1009 and integration-agent-docker-1010 to cloudvirt1021
  • 15:55 andrewbogott: moving integration-slave-jessie-1002 to cloudvirt1021

2019-08-09

  • 12:15 arturo: rebalance load, reallocating integration-slave-docker-1040 from cloudvirt1018 to cloudvirt1026
  • 11:32 arturo: depool integration-slave-docker-1040 in preparation for reallocating

2019-07-19

  • 22:26 andrewbogott: moving integration-cumin to cloudvirt1015
  • 17:12 andrewbogott: depooling and moving integration-slave-docker-1041 and integration-slave-jessie-1002

2019-06-06

  • 14:57 andrewbogott: moving 'webperformance' VM to cloudvirt1030

2019-06-05

  • 08:56 arturo: move integration-slave-docker-1059 and integration-slave-docker-1058 to cloudvirt1028 (T223971)

2019-06-04

  • 08:56 arturo: reallocating integration-slave-docker-1059 and integration-slave-docker-1058 to cloudvirt1012 (T223971)

2019-05-15

  • 10:36 arturo: T223148 reallocation of integration-castor03 is now done
  • 09:31 arturo: updating DNS servers in integration-puppetmaster01 and reboot
  • 09:25 arturo: add myself as projectadmin
  • 09:20 arturo: T223148 reallocating saucelabs-01 to cloudvirt1007
  • 09:13 arturo: T223148 reallocating integration-castor03 to cloudvirt1002
  • 08:56 arturo: T223148 reallocating integration-puppetmaster01 to cloudvirt1001

2019-03-29

2019-01-31

  • 12:05 arturo: VM instances integration-slave-docker-1044, integration-slave-docker-1046 and integration-slave-docker-1047 were stopped briefly due to an issue in the hypervisor (T215012)

2018-12-10

  • 18:15 andrewbogott: moving 'jenkinstest' to eqiad1-r

2018-12-07

  • 20:36 andrewbogott: moving integration-r-lang-01 to eqiad1-r
  • 20:33 andrewbogott: moving webperformance to eqiad1-r
  • 19:45 andrewbogott: moving saucelabs-01 and saucelabs-02 to eqiad1-r
  • 19:33 andrewbogott: moving integration-puppetmaster01 to eqiad1-r
  • 18:07 andrewbogott: moving integration-slave-docker-1037 to eqiad1-r
  • 18:05 andrewbogott: moving integration-slave-docker-1040 to eqiad1-r

2018-11-16

  • 16:03 andrewbogott: moving integration-cumin to eqiad1-r

2018-11-14

  • 15:58 andrewbogott: migrating integration-slave-jessie-1001 to eqiad1-r
  • 15:14 andrewbogott: moving integration-slave-docker-1021 to eqiad1-r

2018-11-13

  • 18:14 andrewbogott: moving integration-slave-docker-1038 to eqiad1-r
  • 18:12 andrewbogott: moving integration-slave-docker-1034 to eqiad1-r

2018-10-22

  • 23:22 andrewbogott: migrating integration-slave-jessie-1003 and integration-slave-jessie-1004 to eqiad1-r
  • 20:55 andrewbogott: migrating integration-slave-docker-1033 to eqiad1-r
  • 20:55 andrewbogott: migrated integration-slave-docker-1017 to eqiad1-r

2018-09-20

  • 09:28 arturo: T204373 increasing quotas

2018-03-23

  • 19:59 andrewbogott: upgraded python-conftool on integration-slave-jessie-1001 and integration-slave-jessie-1004 to resolve puppet warnings

2018-02-14

  • 15:21 andrewbogott: migrating integration-slave-jessie-1001 to labvirt1015
  • 15:21 andrewbogott: migrating integration-slave-jessie-1002 to labvirt1014

2018-02-09

  • 01:02 bd808: Removed Yuvipanda at user request (T186289)

2017-10-01

  • 22:10 bd808: Cold migrating integration-slave-jessie-1004 from labvirt1015 to labvirt1017
  • 22:09 bd808: Cold migrating integration-slave-jessie-1003 from labvirt1015 to labvirt1017

2016-07-04

  • 09:41 paladox: <hashar> !log CI is out of Nodepool instances, the pool has drained because instances can no more be deleted over the OpenStack API

2016-04-27

  • 19:58 mutante: IP address 10.68.16.66 has 26 names, 25 are in contintcloud, one is sm-puppetmaster-trusty2.servermon.eqiad.wmflabs.
  • 19:32 mutante: integration-raita "Could not find class role::ci::raita" puppet error. manually stopping ganglia-monitor

2015-09-08

  • 20:49 andrewbogott: re-enabled integration-slave-trusty-1011
  • 20:15 andrewbogott: disconnecting integration-slave-trusty-1011 and migrating to a new virt host

June 4

  • 14:54 YuviPanda: ran sudo sed -i 's/GlobalSign_CA.pem/ca-certificates.crt/' /etc/ldap/ldap.conf on integration-saltmaster
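The substitution in the entry above can be tried safely on a scratch file first. A minimal sketch, assuming GNU sed and a made-up config line (the real contents of /etc/ldap/ldap.conf are not shown in this log):

```shell
# Sketch only: run the same sed expression against a temporary file
# instead of the real /etc/ldap/ldap.conf.
tmp=$(mktemp)

# Hypothetical config line; the actual file contents are an assumption.
printf 'TLS_CACERT /etc/ssl/certs/GlobalSign_CA.pem\n' > "$tmp"

# Same expression as in the log entry (in-place edit, GNU sed syntax).
sed -i 's/GlobalSign_CA.pem/ca-certificates.crt/' "$tmp"

result=$(cat "$tmp")
rm -f "$tmp"

echo "$result"
```

The unescaped dot in the pattern matches any character, which is harmless here but worth escaping if the file could contain similar names.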

April 30

  • 19:28 andrewbogott: moved integration-puppetmaster and -slave-trusty-1013 to labvirt hardware. This involved a reboot and possible IP change.

December 16

  • 14:12 YuviPanda: manually cleaned and re-requested puppet cert for i-0000078a.eqiad.wmflabs

December 13

December 2

  • 23:54 bd808: Tricked Jenkins into using english UI strings by setting default language to en-us and applying the change

November 26

  • 19:02 bd808: Changed worker count on wikidata-jenkins[1-3] from 5 to 3 as requested by hoo

November 6

  • 23:28 bd808: deleted corrupt mediawiki/core clone in workspace/mwext-MobileFrontend-qunit-mobile on gallium

October 22

  • 00:26 bd808: Greg Grossmeier added and made project admin

September 29

  • 08:57 hashar: rebased puppetmaster

September 26

  • 14:19 hashar: disabled puppet on all instances
  • 13:53 hashar: rebasing puppetmaster

September 25

  • 17:44 bd808: Deleted 1G of /tmp/mw-ocg-latexer*/ files on integration-slave1006
  • 17:36 bd808: Disk usage for / on integration-slave1006 at 90% vs 54% on integration-slave1001. Not sure where the difference is.
  • 17:25 bd808: Restarted nslcd on integration-slave1006. Lots of "error writing to client: Broken pipe" in syslog
  • 17:22 bd808: Forced puppet run on integration-slave1006. No changes applied which doesn't bode well for fixing the Jenkins failures.
  • 17:19 bd808: Added BryanDavis and Ori.livneh to default sudo policy
  • 17:16 bd808: Added BryanDavis (self) as project member and admin
  • 17:07 bd808: Disabled Jenkins slave integration-slave1006.eqiad.wmflabs to see if it is causing false failures bug 71314

September 23

  • 23:07 bd808: Jenkins and deployment-bastion talking to each other again after six (6!) disconnect, cancel jobs, reconnect cycles
  • 22:55 bd808: Took deployment-bastion offline to try and reset master's view of its workers
  • 22:53 bd808: Killed all pending jobs for deployment-bastion in Jenkins queue ("waiting for executor" bug/issue)

September 15

  • 22:40 andrewbogott: migrating integration-slave1002 and integration-slave1007 to virt1002
  • 18:34 Krinkle: Create and set up pool of Jenkins slaves with Trusty (integration-slave1006, integration-slave1007, integration-slave1008); bug 68256
  • 18:30 Krinkle: Delete the experimental integration-slave1005 instance

September 10

  • 02:14 mutante: package nodejs-legacy not found - puppet fail on integration slaves
  • 02:11 mutante: - package php5-parsekit not found on trusty slave

September 9

  • 23:38 bd808: Enabled "Throttle Concurrent Builds" for beta-update-databases-eqiad in an attempt to keep it from hanging all executors on deployment-bastion. Change only made via Jenkins interface, not JJB.
  • 23:24 bd808: Restarted jenkins slave on deployment-bastion multiple times to fix "waiting for executor" problem
  • 03:01 bd808: Restarted agent on deployment-bastion (twice)

September 3

  • 16:54 bd808: Restarted jenkins worker on deployment-bastion a second time. Jenkins seems to see its executors again now.
  • 16:50 bd808: Restarted jenkins worker on deployment-bastion.

August 27

  • 20:54 ^d: upgraded elasticsearch to 1.3.2 on integration-slave100[1-3]

August 21

  • 14:31 hashar_: rebased puppetmaster

August 9

  • 18:36 hashar: rebasing puppet repo on integration-puppetmaster

August 6

  • 09:50 hashar: upgraded package on trusty node hhvm-dev bumped (3.3-dev+20140728+wmf3) over (3.3-dev+20140728+wmf1)

August 4

  • 16:14 hashar: rebased puppet repo on puppetmaster

July 31

  • 13:48 hashar: updating puppet master

July 23

  • 16:47 hashar: apt-get upgrade on integration-slave1004-trusty (it is not pooled yet)
  • 16:37 hashar: upgraded hhvm / elasticsearch on jenkins slaves

July 22

  • 00:56 hashar: restarting diamond on integration-slave1001 - 1003 Related to bug 68254
  • 00:54 hashar: puppet somehow stalled on integration-slave instances. Had to delete /var/lib/puppet/state/agent_catalog_run.lock

July 21

  • 21:08 hashar: Switching integration-slave1004-trusty to its own puppetmaster
  • 20:44 hashar: created integration-slave1004-trusty a trusty instance unsurprisingly

July 19

July 18

  • 19:59 Krinkle: Setting up integration-slave1004 to be the first Trusty-based (w/ nodejs 0.10) Jenkins slave
  • 00:26 bd808: Manually triggered beta-update-databases-eqiad and watched it succeed
  • 00:20 bd808: Killed stuck beta-update-databases-eqiad job

July 15

  • 16:10 bd808: beta-update-databases-eqiad job was stuck for ~24 hours :(
  • 16:01 bd808: Killed stuck beta-update-databases-eqiad job

July 9

  • 20:40 hashar: created experimental instances integration-zuul-merger and integration-zuul-server . Moved them to use local puppetmaster

July 2

June 23

  • 18:50 hashar: rebased puppetmaster . Among others: that migrates us to puppet 3

June 6

  • 13:13 hashar: migrating instances to puppet 3 137898
  • 12:43 hashar: unbroke puppet on the jenkins slaves. We had some dupe definitions. Patches uploaded in Gerrit and cherry picked on local puppetmaster

June 4

  • 08:50 hashar: rebased operations/puppet.git on puppetmaster

May 30

  • 12:29 hashar: created integration-slave1003 (Ubuntu Precise) and switched it to use local puppet master.
  • 09:32 hashar_: deleting trusty instance slave1003, no real time to play with Trusty right now. Will recreate it as a Precise instance to add a new slave

May 29

  • 13:00 hashar: Creating integration-slave1003 instance with an Ubuntu Trusty image.

May 27

  • 08:44 hashar: rebased puppetmaster

May 22

  • 10:01 hashar: rebase operations/puppet on puppetmaster. A bunch of contint related changes have been merged yesterday and this morning.

May 21

  • 14:48 hashar: applied role::ci::publisher::labs to integration-publisher to setup rsync 134608 
  • 14:22 hashar: migrated integration-publisher to use puppetmaster::self
  • 11:04 hashar: deleted integration-composer instance. Archived /mnt/ in /data/project/

May 16

  • 20:53 bd808: Enabled beta-code-update-eqiad job
  • 20:50 bd808: Disabled beta-code-update-eqiad job to test a fix for TimedMediaHandler

May 7

  • 16:49 manybubbles: installed elasticsearch highlighter plugin that cirrus needs for integration tests

May 5

  • 07:43 hashar: rebase operations/puppet repo on puppetmaster

April 28

  • 09:29 hashar: deploying phantomjs from integration/phantomjs.git 130049
  • 09:29 hashar: rebased operations/puppet

April 23

  • 20:06 hashar: Updated puppetmaster local operations/puppet.git clone
  • 20:04 hashar: switching integration-dev to use the project puppetmaster instance

April 17

  • 12:26 manybubbles: upgrading elasticsearch on integration-slave1002

April 15

  • 12:24 hashar: apt-get upgrading instances
  • 12:21 hashar: rebased puppetmaster

April 9

  • 22:56 Krinkle: Restarting integration-slave1002

April 7

  • 10:45 hashar: Getting PHP Composer installed on labs slaves. 124305

April 3

  • 10:22 hashar: attempting to reinstall hhvm on Jenkins slaves (cherry pick of 123573 )
  • 10:21 hashar: rebased puppet repository on puppetmaster

March 31

  • 21:46 hashar: updated puppet repo

March 28

  • 15:39 hashar: integration project fully migrated to eqiad \O/
  • 15:39 hashar: deleting instance integration-selenium-driver, no longer needed. browsertests jobs should now be runnable on integration-slave1001 and integration-slave1002 (in eqiad)
  • 11:36 hashar: converted integration-puppetmaster as a self puppet master \O/
  • 11:23 hashar: creating integration-puppetself intended to be a puppet master for the integration project

March 27

  • 15:55 hashar: deleting integration-apache1. It was used as a proxy for other instances and as a dev box for integration.wikimedia.org/. Freeing the public IP address while at it

March 21

  • 21:02 hashar: pmtpa /data/project is now emptied up!!! \O/
  • 13:32 hashar: created web proxy entry ci.wmflabs.org pointing to integration-dev.eqiad.wmflabs:80
  • 13:31 hashar: deleting integration-jenkins2 which had gerrit/zuul/jenkins. Will rebuild it in integration-dev.eqiad.wmflabs

March 18

  • 14:26 hashar: creating new slaves integration-slave1001 and integration-slave1002 using role::ci::slave::labs

March 12

  • 00:13 ^d: scratch that about integration-slave1001, will finish later

March 11

  • 23:33 ^d: slave hhvm-build added to eqiad earlier today, running hhvm nightly builds for testing
  • 23:33 ^d: spinning up new general purpose slave in eqiad, integration-slave1001. will replace slave02 (and maybe 03) from pmtpa

February 18

  • 20:27 Krinkle: Installed grunt-cli on slave02 and slave03 to fix broken jenkins jobs for mwext-VisualEditor, oojs-ui, oojs-core
  • 20:27 Krinkle: Upgraded npm to v1.4.3 on slave02 and slave03 to fix ssl certificate errors

February 17

  • 15:07 hashar: adding in 4CPU instance integration-slave03
  • 15:05 hashar: adding in 4CPU instance integration-slave02

January 21

  • 22:26 hashar: adding jeremyb as a project member

December 18

  • 14:46 hashar: granted Amire80 and Zfilipin root access on integration-sikuli
  • 14:44 hashar: added Amire80 and Zfilipin to the project. Created integration-sikuli.pmtpa.wmflabs instance.

December 17

  • 14:12 hashar: installing libsikuli-script-java on integration-selenium-driver for bug 54393
  • 01:21 Krinkle: Ensured npm/grunt-cli (0.1.11) is globally available on integration-slave01
  • 01:20 Krinkle: Upgraded npm from v1.1.39 to v1.3.18 on integration-slave01

December 4

  • 10:18 hashar: refreshed slave-scripts on integration-selenium-driver , doing a git pull in /srv/deployment/integration/slave-scripts

November 27

  • 15:18 hashar: created User:jenkins-slave for the integration project. Credentials in fenari.wikimedia.org/home/wikipedia/doc/labs-jenkins

November 22

  • 20:12 hashar: added Anomie as user + sudoer

September 24

  • 13:31 hashar: integration-selenium-driver : installing iptables and running ` iptables -t nat -I OUTPUT --dest 208.80.153.219 -j DNAT --to-dest 10.4.1.133` to work around NAT issue (bug 45868)
  • 13:05 hashar: hashar@integration-selenium-driver:~$ sudo dpkg -i phantomjs_1.9.0-1_amd64.deb
  • 10:04 hashar: created integration-pbuilder , a 4GB RAM instance to replace integration-jobbuilder which dies with out of memory issues with some big packages.

September 20

  • 19:20 hashar: on integration-selenium-driver installing openjdk-7-jre-headless and qa/browsertests.git package dependencies (see also bug 54385 and 85264)

September 19

  • 21:56 hashar: installing jenkins-debian-glue-buildenv-piuparts on integration-debian-builder and updating sudo policy to let jenkins run piuparts_wrapper

August 1

  • 15:09 hashar: unbroke puppet on integration-jenkins2

July 3

  • 15:11 hashar: upgrading jenkins and gerrit on integration-jenkins2

June 13

  • 09:36 hashar: upgraded Gerrit on integration-jenkins2 to 2.6-rc0-144-gb1dadd2 which comes from apt.wm.o
  • 09:29 hashar: integration-apache1 updated the proxy rules in /etc/apache2/sites-enabled to use the integration-jenkins2.pmtpa.wmflabs hostname instead of an IP.
  • 09:16 hashar: added Yuvipanda to the project with su rights.

April 17

  • 10:18 hashar: rebooting -jobbuilder : can't ssh to it

April 11

  • 21:23 hashar: -jobbuilder : updated local puppet and running puppetd -tv

April 10

  • 11:02 hashar: jenkins2 instance: upgrading Zuul / Jenkins (maybe Gerrit too)

March 4

  • 22:15 Krinkle: Upgrading npm from 1.1.39 to 1.2.13 on integration-apache1 (sudo npm install -g npm)

February 19

  • 08:31 hashar: dist-upgrade on integration-jenkins2

February 11

  • 15:44 hashar: applying misc::package-builder to integration-jobbuilder

February 1

  • 20:45 hashar: Upgraded Gerrit on integration-jenkins2 to 336eb70b51fe2328d4dd21fef3c78ba11e32758d

January 15

  • 15:55 hashar: -jenkins2 manually updated /etc/zuul/wikimedia repo
  • 15:54 hashar: -jenkins2 : reset hard /usr/local/src/zuul . It had a failed merge. That should make puppet bring up the latest Zuul version.
  • 15:52 hashar: running puppet on integration-jenkins2 to find out how bad it is right now.

January 11

December 14

  • 14:27 hashar: deleted -jenkins instance, replaced by -jenkins2

October 31

  • 13:02 hashar: trying out role::zuul::labs on integration-jenkins2

October 16

  • 11:42 hashar: rebooting integration-apache1 in hope the security rule will be applied :(
  • 11:03 hashar: manually updating operations/puppet on integration-apache1

October 15

  • 13:24 hashar: redid PHPUnit install with: `pear install --alldeps pear.phpunit.de/PHPUnit` to bring in PHP_Invoker like in production
  • 13:22 hashar: manually installing PHPUnit on -jenkins : pear config-set auto_discover 1 && pear install pear.phpunit.de/PHPUnit

September 28

  • 10:34 hashar: applying 25236/2 to integration-jenkins (switch zuul from 'dev' branch to 'master' one)

September 26

  • 09:14 hashar: Starting Gerrit Code Review: OK !!!!!
  • 08:59 hashar: manually installing gerrit on -jenkins

September 24

  • 19:10 hashar: added dsc (David Schoonover) to the project so he can play with jenkins

September 21

  • 20:16 hashar: converting integration-jenkins to puppetmaster::self

July 30

  • 13:12 hashar: dist-upgrading psm-precise

July 3

  • 07:05 Krinkle: Rebooting integration-apache1. CPU and load have been rising linearly for the past 2 hours, up to 100% just now. Cause unknown; the instance was not in use for the last 24 hours.

July 1

  • 09:59 hashar: rebooted psm-precise and integration-apache1 due to leap second bug

June 29

  • 12:31 Krinkle: testswarm/config: client.requireRunToken = true;

June 28

  • 14:50 hashar: created geoip-on-labs Lucid instance to find out if geoip puppet class apply cleanly on labs

June 27

  • 15:37 hashar: applying puppetmaster::self to psm-precise
  • 15:21 hashar: created psm-precise to test puppetmaster::self on a Precise box
  • 15:18 hashar: deleting psm-lucid instance. puppetmaster::self does run without errors on a fresh instance!
  • 14:55 hashar: created psm-lucid instance to test out bootstrapping of puppetmaster::self from scratch
  • 14:54 hashar: integration-apache1 now uses puppetmaster::self (see /var/lib/git/ )
  • 13:56 hashar: Running `puppetd -tv` on integration-apache1. puppetmaster::self has been fixed by ops
  • 11:09 Krinkle: Installing bunch of stuff on integration-apache1 - experimentation right now, documenting steps on Nova_Resource:Integration/Setup (subject to retroactively change at any time)
  • 11:01 Krinkle: Added rule for port 80 (from outside world) to integration/default security group
  • 09:41 mutante: - Allocated new public IP address: 208.80.153.222
  • 09:21 Krinkle: Enabling group puppet "puppetmaster::self" on integration-apache1
  • 09:11 mutante: - raised floating_ips quota from 0 to 1
  • 08:56 hashar: Created ALL/ALL sudo policy for Krinkle and me
  • 00:50 Krinkle: created instance integration-apache1 to use as sandbox for setting up TestSwarm+BrowserStack