Release Engineering/SAL/Archive 4

From Wikitech

2018-12-31

2018-12-22

2018-12-21

2018-12-20

2018-12-19

  • 23:27 thcipriani: integration-slave-jessie-1003:sudo rm -rf /srv/jenkins-workspace/workspace/* and bring back online
  • 18:05 Hauskatze: Created mediawiki/extensions/ExternalGuidance.git repos on Gerrit, set-up mirrors on Diffusion and GitHub; per mediawiki.org request.
  • 15:20 hashar: Testing spicerack postmerge job for T205894: contint1001$ zuul enqueue --trigger gerrit --pipeline postmerge --project operations/software/spicerack --change 480724,2

2018-12-18

2018-12-17

2018-12-14

  • 16:19 Krenair: T204500 Powering off the remaining trusty and main deployment instance, deployment-zotero01, which is no longer in use anyway
  • 13:08 Lucas_WMDE: lucaswerkmeister-wmde@deployment-mwmaint01:~$ mwscript extensions/WikibaseQualityConstraints/maintenance/ImportConstraintEntities.php --wiki=wikidatawiki --config-format=wgConf | tee WikibaseQualityConstraints-config.php # T209957
  • 06:23 kart_: Beta: Updated cxserver to de618f3

2018-12-13

2018-12-12

  • 21:50 Hauskatze: github: deleting https://github.com/wikimedia/mediawiki-extensions-ConditionalShowSection | refs. T211821
  • 21:20 bearND: (beta): Update mobileapps to 55981a8
  • 19:45 hashar: contint1001: sudo chown -R zuul:zuul /etc/zuul/wikimedia/.git
  • 19:37 greg-g: enabled 2fa on the wmfgerrit github user account, recovery codes in releng's pwstore
  • 17:24 Lucas_WMDE: lucaswerkmeister-wmde@deployment-mwmaint01:~$ mwscript extensions/WikibaseQualityConstraints/maintenance/ImportConstraintEntities.php --wiki=wikidatawiki --config-format=wgConf | tee WikibaseQualityConstraints-config.php # T209957

2018-12-11

2018-12-09

  • 23:38 Krenair: restarted apache on deployment-mediawiki-07 for T211524

2018-12-05

  • 14:58 thcipriani: bring integration-slave-jessie-1004 back online (disks look fine)
  • 02:57 kart_: Beta: Updated cxserver to c4240e6

2018-12-04

2018-12-03

  • 22:05 Krenair: T210214 deleting deployment-cache-text04
  • 15:01 Amir1: ores:03b9c98 is going beta

2018-11-30

  • 22:32 thcipriani: removing deployment-redis0{5,6} since I can't ssh in, and they don't seem to have any puppet roles applied

2018-11-29

  • 14:26 godog: switch mediawiki logging to use localhost syslog -> kafka -> logstash
  • 13:54 hashar: Building docker-registry.discovery.wmnet/releng/java8-sonar-scanner:0.1.0 for https://gerrit.wikimedia.org/r/475496 | T209849
  • 08:35 hashar: Deleted instances castor02 and integration-publishing, replaced by new instances in the new WMCS region | T208803

2018-11-28

  • 08:45 hashar: Switching Jenkins job cache to integration-castor03 with AN EMPTY CACHE | T208803
  • 08:18 godog: test mediawiki kafka logging on deployment-mediawiki-07

2018-11-27

2018-11-26

  • 22:22 Krenair: shutoff deployment-cache-text04, now replaced with deployment-cache-text05 in the new region - T210214
  • 18:02 Amir1: deploy ores:3cdaaa6 to beta

2018-11-23

  • 13:55 dcausse: restarted elasticsearch on all deployment-elastic0X nodes (search broken on the beta cluster)
  • 11:09 hashar: Jenkins: removing plugins "Single Use Slave" and "Event Publisher (via ZMQ PUB SUB)". Were used for Nodepool | T209361
  • 10:29 hashar: Building releng/java8-wikidata-query-rdf:0.2.1 container for https://gerrit.wikimedia.org/r/#/c/integration/config/+/475428/ T209776

2018-11-22

2018-11-21

  • 21:44 Krenair: Changed beta.wmflabs.org and *.beta.wmflabs.org A records from 208.80.155.135 (deployment-cache-text04) to 185.15.56.36 (deployment-cache-text05)
  • 20:49 hashar: gerrit: added WMDE-leszek to group "Gerrit Managers" https://gerrit.wikimedia.org/r/#/admin/groups/119,members | T200311
  • 03:25 bd808: Forced puppet run on deployment-mediawiki-0[79] to pick up new redis::shards settings T210030
  • 03:02 bd808: Repeated application of role::mediawiki::memcached on deployment-memc04, deployment-memc06, and deployment-memc07 for T210030
  • 01:22 bd808: Applied role::mediawiki::memcached on deployment-memc05.deployment-prep.eqiad.wmflabs to provision redis T210030

2018-11-20

  • 23:45 mutante: deployment-deploy01 edited /srv/deployment/iegreview/iegreview/.git/DEPLOY_HEAD - - replaced deployment-tin with deployment-deploy1 to fix scap cloning / puppet
  • 23:01 mutante: deployment-deploy01 edited /srv/deployment/scholarships/scholarships/.git - replaced deployment-tin with deployment-deploy1 to fix scap / cloning of scholarships app
  • 22:42 thcipriani: repooling integration-slave-jessie-1003 after cleaning mvn and gradle cache
  • 20:23 Krenair: Changed deployment-prep mysql repl password in attempt to get replication working again, have stored it at deployment-puppetmaster03:/var/lib/git/labs/private/modules/secret/secrets/mysql/repl_password
  • 18:18 mdholloway: deployment-mcs01 deployed [mobileapps/deploy@7553087]
  • 14:59 hashar: created integration-castor03.integration.eqiad.wmflabs intended as a replacement for castor02 | T208803
  • 14:34 hashar: beta-scap job fixed by deleting a file on deployment-mediawiki-07
  • 14:27 hashar: deployment-mediawiki-07 : sudo rm -fR /srv/mediawiki/.~tmp~/
  • 13:55 andrewbogott: deleting deployment-redis05 and deployment-redis06 as per Giuseppe, "we're not using the old jobqueue, we should remove those vms"
  • 13:08 twentyafterfour: removed local unix user mwdeploy from deployment-mediawiki-07 because it was shadowing the real mwdeploy user in ldap
  • 13:01 twentyafterfour: PHP Startup: Unable to load dynamic library '/usr/lib/php/20151012/luasandbox.so' - /usr/lib/php/20151012/luasandbox.so: cannot open shared object file: No such file or directory T208101
  • 12:11 twentyafterfour: scap failures on deployment-mediawiki-07 are related to uid/gid mismatch of the mwdeploy user, specifically the owner of that user's home dir is uid 603 but /etc/passwd|group have a different uid/gid for the same username. T208101
  • 11:47 hashar: Armed keyholder on deployment-deploy01. It got shut down while being migrated to a new cloud region # T208101
  • 11:00 hashar: deployment-deploy01 got migrated to a new region but the Jenkins configuration had not been updated. Adjusting IP address from 10.68.23.38 to 172.16.4.18 | T208101
  • 10:58 hashar: Clearing out deployment-deploy01 disk space. Went offline due to disk space consumption
  • 10:04 Krenair: manually fixed deployment-mediawiki-09:/srv/mediawiki/wmf-config/db-labs.php to match deployment copy, not sure why it didn't deploy properly yet
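
The 12:11 entry above describes a home directory owned by one uid while the account database reports another. A minimal sketch of how such a mismatch could be checked follows. The helper name owner_uid_matches is hypothetical, and `stat -c %u` assumes GNU coreutils; this is not the procedure that was actually run.

```shell
#!/bin/sh
# Hypothetical helper: succeed when directory $2 is owned by numeric uid $1.
# Uses `stat -c %u`, which assumes GNU coreutils.
owner_uid_matches() {
    expected_uid=$1
    dir=$2
    actual_uid=$(stat -c %u "$dir") || return 2
    if [ "$actual_uid" = "$expected_uid" ]; then
        echo "ok: $dir is owned by uid $actual_uid"
    else
        echo "mismatch: $dir is owned by uid $actual_uid, expected $expected_uid"
        return 1
    fi
}

# Self-contained demo on a scratch directory owned by the current user.
demo_dir=$(mktemp -d)
owner_uid_matches "$(id -u)" "$demo_dir"
rmdir "$demo_dir"
```

On a real host the first argument would come from the account lookup, for example `"$(id -u mwdeploy)"`, and the second would be that user's home directory, whose path is not given in the log.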

2018-11-19

  • 22:40 Krenair: manually sorted certs for deployment-puppetdb02
  • 22:23 Krenair: duplicated wiki(user|admin) mysql users on deployment-db0[34] - previous hosts 10.%, new hosts 172.16.%
  • 18:32 Krenair: creating deployment-cache-text05 to replace deployment-cache-text04
  • 15:53 Amir1: cherry-picking gerrit:474694/1 in beta puppetmaster
  • 15:06 Amir1: ores:e957b24 is going beta
  • 13:47 hashar: Shutdown integration-publishing, replaced by integration-publishing02 # T208803
  • 13:21 hashar: Created integration-publishing02 172.16.4.5 for WMCS region migration # T208803
  • 11:13 hashar: updating jobs wikidata-query-rdf-maven-java8-docker wikidata-query-rdf-maven-java8-docker-site-publish for https://gerrit.wikimedia.org/r/#/c/integration/config/+/474660/
  • 10:23 gehel: Updating docker-pkg files on contint1001 for wikidata-query-rdf image

2018-11-17

2018-11-16

2018-11-15

  • 21:09 hashar: Last Nodepool instance had id 1099516 (yeah more than a million)
  • 17:15 Amir1: ores:51cdf6b is going beta

2018-11-14

  • 18:48 andrewbogott: moving deployment-mediawiki-07 to labvirt1008
  • 18:31 andrewbogott: moving deployment-chromium01 to labvirt1009
  • 18:06 andrewbogott: moving deployment-mx02 to labvirt1003
  • 18:05 andrewbogott: migrating deployment-snapshot01 to labvirt1001
  • 10:39 gtirloni: added gtirloni as projectadmin in deployment-prep project

2018-11-13

  • 22:39 thcipriani: reenabling and running puppet on deployment-deploy01
  • 22:19 andrewbogott: moving deployment-urldownloader02 to labvirt1012
  • 22:02 thcipriani: disable puppet on deployment-deploy01 temporarily while deployment-deploy02 is migrating to preserve dsh files
  • 21:59 andrewbogott: moving deployment-deploy02 to another labvirt
  • 21:55 andrewbogott: moving deployment-webperf12 to a new labvirt
  • 21:50 andrewbogott: moving deployment-dumps-puppetmaster02 to a new labvirt
  • 21:43 andrewbogott: moving deployment-elastic05 to a new labvirt to clear out labvirt1016
  • 13:01 arturo: a puppet refactor for the aptly module may have caused some puppet issues. Should be solved now
  • 09:31 addshore: manually brought integration-slave-docker-1021 back online
  • 09:29 addshore: integration-slave-docker-1021:/# docker rmi $(docker images | grep " months " |grep -v " [1-2] months " | awk '{print $3}')
  • 09:26 addshore: integration-slave-docker-1021:/# docker rmi $(docker images | grep " months " |grep -v " [1-5] months " | awk '{print $3}')
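
The two `docker rmi` commands above pick image IDs with a grep/awk pipeline: keep rows whose age is given in months, drop those within the allowed range, and print the third column. The sketch below runs the same pipeline against made-up sample rows so the effect of each threshold can be seen before anything is removed. The function name old_image_ids and the sample data are illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical helper (not part of the log): read `docker images`-style text
# on stdin and print the IMAGE ID of every row whose age is given in months
# and exceeds $1 months. $1 must be a single digit from 1 to 9.
old_image_ids() {
    grep " months " | grep -v " [1-$1] months " | awk '{print $3}'
}

# Made-up sample rows in the default `docker images` column order.
sample_images='example/a  1.0  aaaaaaaaaaaa  2 weeks ago   500MB
example/b  1.0  bbbbbbbbbbbb  2 months ago  500MB
example/c  1.0  cccccccccccc  4 months ago  500MB
example/d  1.0  dddddddddddd  7 months ago  500MB'

# Dry run: list what each threshold would select, without removing anything.
printf '%s\n' "$sample_images" | old_image_ids 5
printf '%s\n' "$sample_images" | old_image_ids 2
```

With the sample rows, a threshold of 5 selects only the 7-month-old image, while a threshold of 2 also selects the 4-month-old one, which matches the order in which the two commands were run (the wider cleanup came second).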

2018-11-12

2018-11-11

  • 23:47 addshore: manually set quibble-vendor-mysql-hhvm-docker timeout to be 60 (I should really get my jjb stuff working again)

2018-11-09

  • 13:43 Amir1: ores:0728805 is going beta
  • 12:13 kart_: Update cxserver to 01686f6

2018-11-08

2018-11-07

  • 23:00 thcipriani: repooling integration-slave-docker-1017 after cleaning up docker images
  • 19:09 Amir1: ores:25dfa4f is going to beta cluster
  • 09:06 hashar: building container releng/operations-puppet:0.5.0 for python3 | T208873

2018-11-06

2018-11-04

2018-11-02

2018-11-01

2018-10-31

  • 21:16 Krenair: remove horizon hiera config for deployment-redis0[56] to unbreak puppet and remove old redis0[12] instance IPs T208040
  • 19:55 andrewbogott: moving deployment-elastic06 to labvirt1012
  • 19:40 andrewbogott: moving deployment-cpjobqueue to labvirt1012 to help clear out labvirt1017
  • 19:11 andrewbogott: moving deployment-kafka-jumbo-1 to labvirt1012 to help clear out labvirt1017
  • 18:54 andrewbogott: moving deployment-kafka-main-2 to labvirt1012 to help clear out labvirt1017
  • 17:21 Amir1: ores:70ba14b is going to beta
  • 13:23 godog: enable statsd reporting for swift

2018-10-30

2018-10-29

2018-10-26

2018-10-25

  • 17:21 bearND: (beta): Update mobileapps to 58cbdff
  • 07:50 hashar: enabling puppet again on deployment-deploy01 . Was disabled by _joe_ for apache-fast-test hacking

2018-10-24

  • 20:44 hasharDinner: Rebuilding CI containers for Quibble 0.0.28
  • 20:27 hasharDinner: tagged Quibble 0.0.28 at 1ac8fe3
  • 08:57 hashar: gerrit: added Lars "liw" Wirzenius to the Administrators group | T207830
  • 07:27 Krenair: T207825 reapplied role::jobqueue_redis::master to deployment-redis prefix
  • 07:01 Krenair: T207825 replacing deployment-redis3-changeprop with deployment-redis3-changeprop02 (jessie m1.small)
  • 06:59 Krenair: T207825 moved role::jobqueue_redis::master role from deployment-redis prefix to deployment-redis0[56]
  • 05:22 kart_: Beta: Updated cxserver to 9ad60d9

2018-10-23

2018-10-22

2018-10-21

2018-10-19

2018-10-18

  • 13:27 Amir1: deploying ores:d724d20

2018-10-16

2018-10-15

2018-10-13

2018-10-12

2018-10-11

2018-10-10

  • 19:38 awight: restarted ORES celery workers on ores2003 (~17:00), ores200* (17:05)
  • 17:15 twentyafterfour: cowboy-coding on deployment-deploy01 to solve scap fatal error dcheck failing to catch fatals

2018-10-09

  • 19:00 Krinkle: Re-enable beta-scap-eqiad job
  • 18:19 Krinkle: Messing with scap in beta to test T121597 / D1114
  • 15:32 thcipriani: deployment-deploy01:sudo rm -rf /tmp/scap_l10n_*

2018-10-08

  • 03:31 kart_: Update(d) cxserver to 47a864b
  • 03:02 legoktm: onlined integration-slave-jessie-1002
  • 02:20 legoktm: legoktm@integration-slave-jessie-1002:/srv/jenkins-workspace/workspace$ sudo rm -rf *

2018-10-06

  • 13:59 Reedy: cleared some large folders out of /tmp on deployment-deploy01

2018-10-05

  • 19:47 marxarelli: bringing integration-slave-docker-1040 back online
  • 19:05 marxarelli: taking integration-slave-docker-1040 offline for docker daemon restart
  • 19:04 marxarelli: bringing integration-slave-docker-1038/1041/1043 back online
  • 19:02 marxarelli: taking integration-slave-docker-1038/1041/1043 offline for docker daemon restart
  • 19:01 marxarelli: bringing integration-slave-docker-1033/1037 back online
  • 18:58 marxarelli: taking integration-slave-docker-1033/1037 offline for docker daemon restart
  • 18:56 marxarelli: bringing integration-slave-docker-1034 back online
  • 18:56 marxarelli: integration-puppetmaster01:/var/lib/git/operations/puppet is up-to-date again after manually updating submodules and subsequent automated git-sync-upstream
  • 18:33 marxarelli: taking integration-slave-docker-1034 offline for docker daemon restart
  • 11:18 Krenair: rm -rf /tmp/scap_l10n_* on deployment-deploy01
  • 06:00 legoktm: deployed https://gerrit.wikimedia.org/r/464757
  • 04:35 legoktm: deploying https://gerrit.wikimedia.org/r/464747
  • 01:17 legoktm: triggering php71 quibble jobs manually via contint1001

2018-10-04

  • 22:23 thcipriani: updated jenkins jobs: https://phabricator.wikimedia.org/P7634
  • 22:19 thcipriani: updating 103 jenkins jobs that use ci-src-setup-simple
  • 21:45 thcipriani: Updating docker-pkg files on contint1001 for https://gerrit.wikimedia.org/r/#/c/integration/config/+/464719/
  • 21:13 marxarelli: bringing integration-slave-jessie-1003 back online
  • 21:12 marxarelli: deleted /mnt/home/jenkins-deploy/{.m2,.gradle} on integration-slave-jessie-1003
  • 21:10 marxarelli: deleting cache directories in /mnt/home/jenkins-deploy on integration-slave-jessie-1003 to free up disk space
  • 20:31 marxarelli: terminating long-running (> 3 hours) CI docker containers (T198517)
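
The 20:31 entry above mentions terminating CI containers that had been running for more than 3 hours. One way to find such containers is to filter the output of `docker ps --format '{{.ID}} {{.RunningFor}}'`. The sketch below is an assumption about how that could be done, not the command that was used. The helper name long_running_ids and the sample rows are made up, and the filter treats any age reported in days or longer as over the limit, so it only makes sense for limits of a day or less.

```shell
#!/bin/sh
# Hypothetical helper: read lines of "<container id> <running for>" on stdin
# and print the ids of containers that have run for more than $1 hours.
long_running_ids() {
    awk -v limit="$1" '
        $3 ~ /^hours?$/ && $2 + 0 > limit { print $1; next }
        $3 ~ /^(days?|weeks?|months?|years?)$/ { print $1 }
    '
}

# Made-up rows in the shape of `docker ps --format "{{.ID}} {{.RunningFor}}"`.
sample_ps='aaaa00000001 5 hours ago
aaaa00000002 45 minutes ago
aaaa00000003 About an hour ago
aaaa00000004 3 hours ago
aaaa00000005 2 days ago'

# Dry run: print the candidates only; nothing is killed here.
printf '%s\n' "$sample_ps" | long_running_ids 3
```

The printed ids could then be reviewed by hand before being passed to `docker kill`. The age text is coarse, so a container shown as "3 hours ago" is not selected with a limit of 3.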

2018-10-03

  • 18:45 marxarelli: deploying I0fcd95 for all 313 affected jobs
  • 18:16 marxarelli: deploying I0fcd95 for 12 mediawiki-quibble-vendor-* jobs
  • 17:46 marxarelli: removing old workspaces on integration-slave-docker-1034 (`rm -rf /srv/jenkins-workspace/workspace/*`) and bringing back online
  • 16:52 thcipriani: reloading zuul to deploy Change gate-and-submit-l10n to low precedence
  • 15:46 marxarelli: bringing integration-slave-docker-1040/1043 nodes back online after killing long running docker job and freeing up /var/lib/docker space (T206134)
  • 15:40 marxarelli: killing long running docker jobs on integration-slave-docker-1040/1043 which are filling up /var/lib/docker with log output
  • 15:12 thcipriani: reloading zuul to deploy l10n pipeline
  • 14:53 thcipriani: integration-slave-docker-1038: removed workspaces, brought back online; integration-slave-docker-104{0,3} need more investigation since docker is full(?)
  • 07:07 mdholloway: deployment-maps04 deployed [kartotherian/deploy@27062b4]: Specify WDQS endpoint in the service config (T205607)
  • 06:22 _joe_: cherry-picking 455154 (php-fpm installation) on deployment-prep

2018-10-02

2018-10-01

  • 20:09 thcipriani: deployment-deploy01:sudo rm -rf /tmp/scap_l10n_* to remove stale l10n json and free up space
  • 17:13 marxarelli: bringing integration-slave-docker-1041 back online following source directory clean up (T205902)
  • 16:52 marxarelli: removing old workspace src directories left by non-quibble docker jobs on integration-slave-docker-1041
  • 07:10 mdholloway: deployment-maps04 updated kartotherian and tilerator to latest
  • 05:24 mdholloway: deployment-mcs01 deployed [mobileapps/deploy@07cbfb4]: Update mobileapps to a1fa41b

2018-09-29

2018-09-28

  • 15:31 Amir1: ladsgroup@deployment-deploy01:~$ mwscript extensions/CentralAuth/maintenance/deleteLocalPasswords.php --wiki=fawiki --prefix (T201009)
  • 14:50 thcipriani: investigating integration-slave-docker-1041
  • 07:35 mdholloway: deployment-mcs01 deployed [mobileapps/deploy@bf09080]: Update mobileapps to 7878ffc

2018-09-27

  • 17:53 mdholloway: deployment-mcs01 deployed [mobileapps/deploy@a0054ba]: Update mobileapps to 0d6c2b7
  • 07:38 mdholloway: deployment-maps04 updated tilerator and kartotherian node modules (T195513, T200594)

2018-09-26

  • 15:13 thcipriani: integration-slave-docker-1034:sudo rm -rf /srv/jenkins-workspace/workspace/* and bring back online -- https://phabricator.wikimedia.org/P7592
  • 15:05 thcipriani: integration-slave-docker-1033:sudo rm -rf /srv/jenkins-workspace/workspace/* and bring back online
  • 14:47 thcipriani: investigating integration-slave-docker-103{3,4}
  • 11:57 Amir1: gerrit:462927 (ores) is going to beta
  • 08:24 hashar: Restarting CI Jenkins on contint1001 [#2]
  • 08:14 hashar: Restarting CI Jenkins on contint1001

2018-09-25

  • 23:01 marxarelli: configured new jenkins node integration-slave-docker-1043 with 6 executors
  • 23:01 marxarelli: replaced integration-slave-docker-1042 with new integration-slave-docker-1043 instance
  • 22:39 marxarelli: launching new integration-slave-docker-1042 bigram instance
  • 22:33 marxarelli: deleting remaining m1.medium instances used as m4executors (T205362)
  • 22:15 marxarelli: taking remaining m1.medium m4executor jenkins nodes offline (T205362)
  • 18:16 marxarelli: reconfiguring bigram jenkins nodes to use 6 executors. 7 were configured by mistake (T205362)
  • 18:00 marxarelli: configuring new integration-slave-docker-1041 jenkins node with 7 executors (T205362)
  • 17:42 marxarelli: configuring new jenkins node integration-slave-docker-1040 with 7 executors (T205362)
  • 17:38 marxarelli: launching integration-slave-docker-1041 bigram instance (T205362)
  • 17:30 marxarelli: the puppet parameter for docker_lvm_volume specified in horizon was not applied correctly on the first puppet run for some reason. tearing down integration-slave-docker-1039...
  • 17:25 marxarelli: launching integration-slave-docker-1040 bigram instance (T205362)
  • 17:24 marxarelli: deleting instances integration-slave-docker-1007/1008 (T205362)
  • 17:13 marxarelli: launching new integration-slave-docker-1039 bigram instance
  • 17:12 marxarelli: taking integration-slave-docker-1007/1008 offline for replacement (T205362)
  • 17:09 marxarelli: deleting integration-slave-docker-1030/1031 instances (T205362)
  • 17:05 marxarelli: taking integration-slave-docker-1030/1031 offline for replacement
  • 16:47 marxarelli: increasing executors to 7 for jenkins nodes integration-slave-docker-1033/1034
  • 16:46 marxarelli: new instance creation delayed due to quota
  • 16:45 marxarelli: launching new integration-slave-docker-1039/1040 bigram instances
  • 01:21 legoktm: deployed https://gerrit.wikimedia.org/r/450508
  • 00:36 legoktm: deploying https://gerrit.wikimedia.org/r/462609
  • 00:22 legoktm: deploying https://gerrit.wikimedia.org/r/453447

2018-09-24

  • 20:21 bearND: (beta): Update mobileapps to badb463
  • 10:55 hashar: gerrit: granting labs/tools/* project owners the ability to submit changes | https://gerrit.wikimedia.org/r/#/c/labs/tools/+/462420/
  • 09:51 hashar: deployment-deploy01: backed up /srv/mediawiki-staging/php-master/cache/gitinfo and created a new one. Its size of 69632 bytes might cause slow writes?? | T204762
  • 09:24 hashar: Live hacked scap code on deployment-deploy01 for T204762 and reverted hack changes
  • 08:32 hashar: deployment-deploy01 rm -fR /tmp/scap_l10n_*
  • 06:41 legoktm: deploying https://gerrit.wikimedia.org/r/462341
  • 03:45 kart_: Update cxserver to d913793

2018-09-23

  • 14:03 Krenair: rm stuff in deployment-deploy01:/tmp to try to clear space and stop shinken whining
  • 01:05 andrewbogott: rebooted deployment-maps03; OOM and also T205195

2018-09-22

  • 20:51 Hauskatze: github: deleting several wikimedia/mediawiki-extensions-Collection-.* mirror repos for T183891
  • 20:05 Hauskatze: github: deleted mirror wikimedia/mediawiki-extensions-Collection-OfflineContentGenerator-zim_renderer | T183891; moving to the next one
  • 18:21 Krenair: went to do the same with deployment-maps03 and accidentally broke SSH access to the server
  • 18:21 Krenair: removed ferm package from deployment-snapshot01 as it appeared unmanaged by puppet and was causing problems with SSH access from the current deployment hosts (previous logs referenced T153468, this just explains why puppet hadn't purged stuff)
  • 18:01 Krenair: rm deployment-maps03:/etc/ferm/conf.d/10_redis_exporter_6379 as it was breaking ferm from starting (T153468), puppet has not re-created it so I assume it was historical (shouldn't puppet be purging such files?)
  • 18:00 Krenair: rm deployment-snapshot01:/etc/ferm/conf.d/10_prometheus-nutcracker-exporter as it was breaking ferm from starting (T153468), puppet has not re-created it so I assume it was historical (shouldn't puppet be purging such files?)

2018-09-21

  • 17:26 marxarelli: adding jenkins node integration-slave-docker-1038 with 7 executors
  • 16:47 marxarelli: added new jenkins node integration-slave-docker-1037 with 7 executors
  • 15:49 marxarelli: replacing integration-slave-docker-1036 with new bigram instance
  • 15:48 marxarelli: taking node integration-slave-docker-1035 offline due to unusually high steal cpu time and long build durations
  • 15:17 marxarelli: integration-slave-docker-1035/1036 showing unusually high cpu steal and unusually long mean build durations
  • 15:15 marxarelli: taking integration-slave-docker-1036 offline due to unusually high cpu steal % trend
  • 15:13 marxarelli: launching integration-slave-docker-1037 bigram instance
  • 13:03 Amir1: ores:7b987a7 is going beta
  • 05:32 legoktm: deployed https://gerrit.wikimedia.org/r/461510

2018-09-20

  • 23:48 marxarelli: adding new integration-slave-docker-1035/1036 jenkins nodes, each with 7 executors
  • 23:23 marxarelli: launching integration-slave-docker-1035/1036 bigram instances
  • 23:20 marxarelli: taking integration-slave-docker-1004/1005 offline for replacement (T202160)
  • 16:52 Amir1: deploy ores:ee2d28b
  • 11:21 hashar: Refreshing jenkins jobs to get rid of docker run option "--tmp /tmpfs" . It is mounted with 'noexec' which causes various jobs to fail. | T203181 and T204919
  • 11:17 hashar: deployment-deploy01: removed /srv/deployment/analytics/refinery-cache (8GBytes)
  • 11:07 hashar: deployment-deploy01 is out of disk space (again)

2018-09-19

2018-09-18

  • 10:04 hashar: Updating Quibble Jenkins jobs to 0.0.26
  • 09:46 hashar: updating mwselenium-quibble-docker to Quibble 0.0.26
  • 07:48 hashar: Updating jenkins jobs to use Quibble 0.0.25
  • 07:34 hashar: deployment-sca01: rm -fR /srv/ores /srv/deployment/cxserver.jenkins # untouched since 2016
  • 07:31 hashar: cleaning disk on deployment-sca01
  • 00:30 marxarelli: configuring integration-slave-docker-1034 jenkins node to use 6 executors

2018-09-17

  • 22:06 Hauskatze: added missing log entries (actor, etc.) on AbuseFilter (addMissingLoggingEntries.php) for beta
  • 22:00 Reedy: update.php in screen is done
  • 21:53 Hauskatze: maurelio@deployment-deploy01:~$ mwscript extensions/TorBlock/maintenance/loadExitNodes.php --wiki=deploymentwiki --force (998 nodes loaded)
  • 21:12 Reedy: running `foreachwiki update.php --quick` in screen on deployment-deploy01
  • 20:17 bearND: (beta): Update mobileapps to d56e4cf
  • 20:14 marxarelli: running `jenkins-jobs update config/ 'service-pipeline*'` to deploy I5df2b7
  • 19:37 hasharAway: jenkins: remove compiler02.puppet3-diffs.eqiad.wmflabs and compiler03.puppet3-diffs.eqiad.wmflabs from jenkins config. Instances deleted
  • 17:25 mdholloway: deployment-maps04 updated kartotherian and tilerator to latest (T109776)
  • 13:24 hashar: Building docker images for Quibble 0.0.25 | https://gerrit.wikimedia.org/r/460536
  • 12:58 hashar: mwselenium-quibble-docker to Chromium 69 | https://gerrit.wikimedia.org/r/#/c/integration/config/+/460512/ | T204214
  • 12:27 phuedx: phuedx@deployment-deploy01:~$ foreachwikiindblist gettingstarted-with-category-suggestions.dblist extensions/GettingStarted/maintenance/populate_categories.php
  • 09:15 Amir1: ladsgroup@deployment-deploy01:~$ foreachwikiindblist all-labs populateChangeTagDef.php --set-user-tags-only --force
  • 07:04 legoktm: building npm6 docker images: https://gerrit.wikimedia.org/r/453441 https://gerrit.wikimedia.org/r/451690 https://gerrit.wikimedia.org/r/453445
  • 00:10 legoktm: deployed https://gerrit.wikimedia.org/r/460775

2018-09-16

2018-09-14

  • 17:47 marxarelli: adding instance-type-* labels to m4executor nodes in jenkins
  • 14:27 hashar: deployment-deploy01 /srv/mediawiki synced
  • 14:16 hashar: deployment-deploy01: sudo rm -fR /srv/mediawiki && mkdir /srv/mediawiki && chown mwdeploy:mwdeploy /srv/mediawiki && scap pull
  • 13:57 hashar: deployment-deploy01: mv /srv/mediawiki/php-master/cache/l10n/*.php /srv/mediawiki/php-master/cache/l10n/backup-php/
  • 13:55 hashar: deployment-deploy01 13:53:58 rsync: write failed on "/srv/mediawiki/php-master/cache/l10n/upstream/l10n_cache-ab.cdb.json": No space left on device (28)
  • 13:48 hashar: Tagged Quibble 0.0.25 01663f5
  • 11:49 hashar: Cleaned /tmp on deployment-deploy01
  • 11:31 hashar: Manually running https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/28313/ | T204340
  • 05:50 legoktm: deployed https://gerrit.wikimedia.org/r/460475
  • 05:40 legoktm: deployed https://gerrit.wikimedia.org/r/460473

2018-09-13

2018-09-12

  • 21:17 marxarelli: adding new jenkins node integration-slave-docker-1034 with 4 executors
  • 21:01 marxarelli: launching integration-slave-docker-1034 bigram instance
  • 20:59 marxarelli: deleting integration-slave-docker-1003/-1004 instances
  • 20:57 marxarelli: taking integration-slave-docker-1003/-1004 offline for replacement
  • 20:51 marxarelli: deleting integration-slave-docker-1002 instance
  • 20:50 marxarelli: taking integration-slave-docker-1002 offline for replacement
  • 20:16 marxarelli: adding newly provisioned integration-slave-docker-1033 jenkins node with 4 executors
  • 20:01 marxarelli: launching new integration-slave-docker-1033 bigram instance
  • 19:58 marxarelli: replacing integration-slave-docker-1032, now offline: the 85/15% split for docker/workspace left too little space for workspace. The puppet change has been updated to use a 70/30% volume space ratio
  • 18:44 marxarelli: adding jenkins node integration-slave-docker-1032 with 4 executors
  • 18:25 marxarelli: launching new bigram instance integration-slave-docker-1032
  • 18:21 marxarelli: deleting integration-slave-docker-1027 instance
  • 18:20 marxarelli: deleting jenkins node integration-slave-docker-1027 due to insufficient /var/lib/docker space (replaced with 1031 which has dedicated /var/lib/docker volume)
  • 18:18 marxarelli: added new jenkins node integration-slave-docker-1031 with 4 executors
  • 18:00 marxarelli: launching new xlarge instance integration-slave-docker-1031
  • 17:59 marxarelli: deleting integration-slave-docker-1001 instance
  • 17:58 marxarelli: deleting now idle node integration-slave-docker-1001
  • 17:52 marxarelli: removing integration-slave-docker-1001 jenkins node for replacement with a xlarge instance
  • 17:40 marxarelli: adding Jenkins node integration-slave-docker-1030
  • 17:36 twentyafterfour: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/#/c/integration/config/+/460067/
  • 15:49 marxarelli: provisioning new xlarge integration-slave-docker-1030
  • 15:47 marxarelli: cherry-pick https://gerrit.wikimedia.org/r/c/operations/puppet/+/459875 on integration-puppetmaster01 for testing
  • 09:38 hashar: Updating jobs oojs-ui-docker-publish oojs-ui-npm-run-jenkins-node-6-docker for Chromium 69 and Firefox 60 - T203902
  • 00:14 marxarelli: deleting instance integration-slave-docker-1029

2018-09-11

  • 22:43 marxarelli: launching m1.xlarge integration-slave-docker-1029 using stretch image
  • 22:40 marxarelli: deleting integration-slave-docker-1028 in favor of trying a stretch instance
  • 22:05 marxarelli: launching replacement instance integration-slave-docker-1028
  • 21:49 marxarelli: removing unresponsive jenkins node integration-slave-docker-1025
  • 21:39 marxarelli: cherry-pick https://gerrit.wikimedia.org/r/c/operations/puppet/+/459850 on integration-puppetmaster01
  • 09:38 hashar: sudo cumin --force 'name:docker' 'rm -fR /srv/jenkins-workspace/workspace/selenium-daily-beta-Wikibase*' # T188742

2018-09-10

2018-09-09

2018-09-08

2018-09-07

  • 20:01 marxarelli: bringing integration-slave-docker-1006 online again since disk space has been reclaimed
  • 19:15 Krinkle: marked integration-slave-docker-1025 offline (no space), aborted builds manually
  • 18:30 legoktm: started gear_client on contint1001
  • 18:19 marxarelli: setting integration-slave-docker-1026 executors to 4 to avoid disk space exhaustion due to concurrent builds
  • 18:12 Krinkle: Marking integration-slave-docker-1026 offline (ENOSPC)
  • 09:30 hashar: integration-slave-docker-1025: lowered number of executors from 5 to 4. 8 CPUs cannot sustain 5 concurrent Quibble builds | T201972

2018-09-06

  • 18:54 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/458553
  • 18:48 legoktm: reverted tmpfs change for search-mjolnir-tox-docker
  • 17:37 thcipriani: integration-slave-docker-1026 sudo docker kill eae9ba3a1459 -> stopped a container that had been running for 5 hours
  • 16:33 thcipriani: mark integration-slave-docker-1026 back online after diskspace recovery
  • 12:08 hashar: cleaned integration-slave-docker-1012 and integration-slave-docker-1026

2018-09-05

  • 18:31 thcipriani: bring integration-slave-docker-1026 back online since disk space is normal again
  • 07:47 elukey: tested and removed a patch to operations/puppet on puppetmaster03. Solved a git rebase conflict between two changes (hope I did it well) and updated the nginx submodule
  • 04:08 legoktm: deployed https://gerrit.wikimedia.org/r/c/integration/config/+/457070 (tmpfs for /tmp) to all *tox*docker and *composer*docker jobs
  • 03:35 legoktm: deploying https://gerrit.wikimedia.org/r/458095

2018-09-04

2018-09-03

2018-09-02

  • 22:25 Hauskatze: maurelio@deployment-sca01:~$ sudo puppet agent -tv | attempting to fix "<shinken-wm> PROBLEM - Puppet errors on deployment-sca01 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]"
  • 13:26 Hauskatze: maurelio@deployment-deploy01:/srv/mediawiki-staging/php-master/cache$ sudo -u jenkins-deploy chmod -R 777 l10n/ | more permissions fixes for beta-scap-eqiad
  • 12:48 Krenair: deployment-deploy01: `sudo rm /srv/mediawiki/.git/gc.log` to clear error about permissions problems accessing nonexistent file.
  • 12:28 Hauskatze: root@deployment-deploy01:/srv/mediawiki-staging# chown -R jenkins-deploy:wikidev portals scap | fixing beta-scap-update-eqiad failures

2018-09-01

2018-08-31

2018-08-30

2018-08-29

2018-08-28

2018-08-27

2018-08-25

2018-08-24

  • 19:47 Krenair: set profile::elasticsearch::cirrus::tls_port: 9243 to appease puppet on deployment-elastic* hosts following https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/447568/
  • 12:01 addshore: manually install nmon on integration-slave-docker-1025 to inspect IO
  • 11:55 addshore: manually install iotop on integration-slave-docker-1025 to inspect IO

2018-08-23

2018-08-22

2018-08-21

  • 18:39 mateusbs17: deployment-maps04 kartotherian/deploy@2047778 "Updating snapshot package on kartotherian"
  • 10:37 Amir1: ladsgroup@deployment-deploy01:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildTermSqlIndex.php --wiki=wikidatawiki --rebuild-all-terms (T202260)
  • 08:09 hashar: castor: removing cache for mediawiki-extensions-* jobs (no more used) rm -fR /srv/jenkins-workspace/caches/*/*/mediawiki-extensions* | T202341
  • 07:46 legoktm: deployed https://gerrit.wikimedia.org/r/454201
  • 04:20 legoktm: deployed https://gerrit.wikimedia.org/r/454192
  • 04:05 legoktm: deploying https://gerrit.wikimedia.org/r/453975
  • 01:13 legoktm: triggering seccheck jobs manually on contint1001

2018-08-20

2018-08-18

  • 00:45 mdholloway: moved maps-beta.wmflabs.org proxy from deployment-maps03 to deployment-maps04

2018-08-17

2018-08-16

  • 14:37 mdholloway: deployed [mobileapps/deploy@166eafa]: Update mobileapps to a808c9d (T201979)

2018-08-15

  • 21:21 Krinkle: krinkle@deployment-deploy01: Removing php-master/StartProfiler.php for T201782.
  • 16:24 Reedy: beta-update-databases-eqiad broken due to ooui patch not being merged by jenkins

2018-08-14

  • 20:23 Krenair: deactivated and cleaned node for deployment-elastic09 which vanished mysteriously
  • 17:10 Reedy: Updating docker-pkg files on contint1001 for https://gerrit.wikimedia.org/r/452714
  • 16:58 mateusbs17: deployment-maps04 kartotherian/deploy@a0f7111 Update node mapnik to 3.7.2 - Fixing submodule deploy
  • 16:58 mateusbs17: deployment-maps04 tilerator/deploy@f4c4359 Update node mapnik to 3.7.2 - Fixing submodule deploy
  • 16:00 Reedy: Updating docker-pkg files on contint1001 for https://gerrit.wikimedia.org/r/#/c/integration/config/+/452634/
  • 14:33 mateusbs17: deployment-maps04 kartotherian/deploy@fed001c Update node mapnik to 3.7.2
  • 14:33 mateusbs17: deployment-maps04 tilerator/deploy@31e8585 Update node mapnik to 3.7.2

2018-08-13

2018-08-12

2018-08-10

  • 17:58 marxarelli: Reloading zuul to deploy I32007b
  • 16:47 marxarelli: Adding "Blubber" label to all integration-slave-docker-* nodes in jenkins

2018-08-09

  • 22:19 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/451798
  • 20:09 Krenair: deleted deployment-urldownloader (the oldest host still running, on trusty, from feb 2015) and replaced with deployment-urldownloader02
  • 17:13 bearND: (beta): Update mobileapps to 616ffef
  • 01:42 awight: T201518: ORES, fawiki wp10, misc updates

2018-08-08

2018-08-07

2018-08-06

2018-08-04

2018-08-02

  • 20:59 legoktm: deployed https://gerrit.wikimedia.org/r/450149
  • 20:12 legoktm: deployed https://gerrit.wikimedia.org/r/450085
  • 19:01 legoktm: removed all docker images from integration-slave-docker-1004 to free up root partition, cleaned up all workspaces while I was at it (T201077)
  • 16:27 gehel: re-imaging deployment-elastic* to stretch completed
  • 14:07 gehel: re-imaging deployment-elastic* to stretch
  • 07:47 legoktm: triggering jobs directly on contint1001 w/ gear_client.py

2018-08-01

2018-07-31

  • 22:04 thcipriani: moving active deployment-prep deployment server to deployment-deploy01
  • 19:50 marxarelli: Configuring Jenkins to include integration/pipelinelib as an available pipeline library
  • 17:50 Krinkle: Apply role::webperf::profiling_tools to deployment-webperf12; T195312 / T180761
  • 10:15 hashar: gerrit: deleting branch wmf/es6 on mediawiki/vendor . We use 'es6' branch instead. (made for dcausse )
  • 06:05 legoktm: deployed https://gerrit.wikimedia.org/r/449402
  • 05:56 legoktm: deploying https://gerrit.wikimedia.org/r/449399

2018-07-30

2018-07-29

2018-07-28

2018-07-27

2018-07-26

2018-07-24

  • 05:22 kart_: Updated cxserver to d3c9d15

2018-07-23

  • 20:06 mdholloway: deployment-mcs01 deployed [mobileapps/deploy@565b41a]: Update mobileapps to 254cef5
  • 19:16 mdholloway: deployment-maps04 deployed dependency updates for agreement on mapnik v3.5.14
  • 16:25 Krinkle: Applying role::webperf::profiling_tools class to deployment-webperf13, T195312
  • 16:04 Krinkle: Set up puppet cert stuff on deployment-webperf13 T195312
  • 15:57 Krinkle: Set 'puppetmaster' Hiera for deployment-webperf13 / T195312
  • 15:34 Krinkle: Creating deployment-webperf13 - T195312
  • 15:34 Krinkle: Deleting deployment-webperf12 - T195312
  • 10:26 hashar: Regenerate all debian-glue jobs from JJB. Just to be sure.
  • 10:24 hashar: Regenerate https://integration.wikimedia.org/ci/job/debian-glue/ from JJB. The timeout is forced to 3 when it should rely on the BUILD_TIMEOUT environment variable

2018-07-19

2018-07-17

  • 01:36 Krinkle: Applying role::webperf::profiling_tools class to webperf12 in Beta Cluster - T195312, T180761.

2018-07-16

  • 21:10 awight: ran namespaceDupes.php on beta enwiki
  • 19:56 bearND: (beta): Update mobileapps to bed7b29

2018-07-14

  • 21:20 Krenair: redirected security_audit traffic (see T72181) from deployment-mediawiki06 to deployment-mediawiki-09 to fix puppet on varnish (06 was deleted in T192996)
  • 03:27 Krinkle: Clearing various workspaces on integration-slave-jessie-1001 to fix operations-mw-config-php55lint Jenkins builds - T179963

2018-07-13

  • 17:31 Reedy: Reloading Zuul to deploy stuff

2018-07-12

  • 22:45 Krenair: deployment-maps04 groupadd -g 1000 maps-admins
  • 19:04 mdholloway: deployment-prep: deleting new instance deployment-maps04 (initial puppet run failed) and creating deployment-maps05

2018-07-11

  • 20:07 bearND: (beta): Update mobileapps to b5e152d
  • 19:53 Reedy: next set of db updates for beta might be a bit slow. Expected!
  • 13:35 mdholloway: deployment-prep: launched new instance deployment-maps04 for maps testing on stretch

2018-07-10

2018-07-09

2018-07-08

  • 20:17 Krinkle: Shutting off deployment-apertium02 (T142152)
  • 18:31 Krenair: deleted deployment-mx T184244
  • 16:55 Krenair: deleted deployment-redis01 T179371
  • 16:54 Krenair: deleted deployment-redis02 T179371
  • 16:42 Krenair: deleted deployment-puppetmaster02. root and home files are in archives under deployment-puppetmaster03:/root/

2018-07-07

  • 03:38 thcipriani: deployment-tin:sudo rm -rf /srv/mediawiki/.git

2018-07-06

  • 03:53 kart_: Updated cxserver to bfc9c84

2018-07-05

  • 06:27 kart_: Updated cxserver to f8c71a1

2018-07-04

2018-07-03

  • 22:11 Krenair: rebased stuff on deployment-puppetmaster and ran puppet everywhere through cumin
  • 21:48 Krenair: cleaned up cherry-pick conflict to try to fix puppet
  • 08:44 hashar: Building docker container releng/tox-labs-striker to add libssl-dev | T198076
  • 04:31 Krinkle: Setting up puppetmaster/cert for deployment-webperf12 (T195312)
  • 03:49 Krinkle: Create deployment-webperf12 as equivalent of webperf1002/webperf2002 in prod (T195312, T194390)
  • 03:41 Krinkle: Shut off deployment-mediawiki06 from Horizon (but not yet deleted) - T192996

2018-07-02

2018-06-30

  • 20:40 Krenair: ran git gc on deployment-tin:/srv/mediawiki to free up space

2018-06-28

  • 21:17 hasharAway: castor02: nuking cache of npm/node jobs via rm -fR /srv/jenkins-workspace/caches/*/*/*node* (note: other jobs might still have a npm cache) | T198348
  • 21:09 hasharAway: castor: nuking caches castor-mw-ext-and-skins/master/wmf-quibble-vendor-mysql-hhvm-docker/npm and castor-mw-ext-and-skins/master/wmf-quibble-vendor-mysql-php70-docker/npm | T198348
  • 21:07 hasharAway: castor: nuking caches mediawiki-core/master/mediawiki-quibble-vendor-mysql-php70-docker and mediawiki-core/master/mediawiki-quibble-vendor-mysql-hhvm-docker | T198348
  • 11:18 Hauskatze: Ran namespaceDupes for eswiki and eswikibooks following namespace changes on JADE.

2018-06-27

  • 16:27 hashar: Building Docker containers releng/quibble-jessie-php55:0.0.19-1 and releng/quibble-stretch:0.0.19-1 | T196346 T198336
  • 04:45 mdholloway: deployed to beta: [mobileapps/deploy@2207b66]: Update mobileapps to d7221ba

2018-06-26

  • 22:17 Krenair: arming keyholder on deployment-deploy01
  • 20:16 Krenair: done the same on -sca02 and -apertium02
  • 20:09 Krenair: package upgrades on -sca01 to try to fix apertium stuff
  • 15:32 hashar: repooling integration-slave-docker-1013 integration-slave-docker-1014 and integration-slave-docker-1015 (converted to m1.medium instances)
  • 15:11 hashar: Deleting integration-slave-docker-1013 integration-slave-docker-1014 and integration-slave-docker-1015 . Recreating them as m1.medium instances
  • 15:11 hashar: Changing integration-slave-docker-1012 in jenkins m1executor -> m4executor. It is a m1.medium instance and can thus run mediawiki jobs
  • 13:18 hashar: cleaned containers on integration-slave-docker-1006

2018-06-25

  • 21:19 mdholloway: deployed to beta: [mobileapps/deploy@770cdb0]: Update mobileapps to 8c76d52
  • 17:45 Krenair: fixed puppet repo rebasing
  • 16:20 hashar: deployment-prep: git gc on a few repositories under /srv/mediawiki-staging/php-master
  • 12:19 kart_: "Beta: Update cxserver to ece5e7a"
  • 07:49 hashar: github: deleting archived repository https://github.com/wikimedia/mediawiki-extensions-CommunityVoice | T196618

2018-06-23

  • 19:19 Krenair: restarted hhvm on -mediawiki-07 then apache2 to bring beta back up

2018-06-22

  • 21:37 hashar: deployment-prep: sudo cumin --force '*' 'apt-get clean' # some instances are causing disk warnings
  • 14:46 zeljkof: Reloading Zuul to deploy 94c38c1
  • 12:34 hashar: Building docker images for Quibble 0.0.19 | T197687
  • 12:10 hashar: Tagging quibble 0.0.19 which passes "--autoplay-policy=no-user-gesture-required" to Chromium | T197687

2018-06-21

2018-06-20

  • 19:47 hasharAway: Removing old Quibble images from the CI Docker slaves
  • 17:33 hashar: Refreshing Quibble Jenkins jobs to use 0.0.18-2 docker image. That adds JSDuck | T197806
  • 16:32 hashar: Rebuilding Quibble images to get jsduck included in the image | T197806

2018-06-19

  • 15:01 hashar: CI docker slaves: docker container prune --force ; docker image prune --force
  • 05:12 Amir1: ladsgroup@deployment-tin:~$ foreachwikiindblist all-labs deleteAutoPatrolLogs.php

2018-06-18

  • 17:25 Amir1: ladsgroup@deployment-tin:~$ mwscript populateChangeTagDef.php --wiki=enwiki
  • 16:15 Amir1: ladsgroup@deployment-tin:~$ mwscript populateChangeTagDef.php --wiki=enwiki
  • 16:07 Amir1: change_tag_def is back now
  • 16:05 Amir1: making enwiki in beta cluster readonly

2018-06-15

  • 12:19 zeljkof: Reloading Zuul to deploy 529965b
  • 08:30 hashar: github: deleted a lot of archived/read-only extensions and skins | T180864
  • 07:54 hashar: apt-get upgrade on integration-slave-jessie-android
  • 07:54 hashar: Fixing puppet on integration-slave-jessie-android . Has been stalled for a month due to jenkins-debian-glue-buildenv.deb which could not be upgraded
  • 07:51 hashar: cleaning disk space on integration-slave-jessie-1001

2018-06-14

  • 21:47 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/#/c/integration/config/+/439572/
  • 14:26 zeljkof: Reloading Zuul to deploy 04b1c86
  • 08:45 Hauskatze: maurelio@deployment-tin:~$ mwscript namespaceDupes.php --wiki=eswiki --fix (64 broken pagelinks resolved, no remaining conflicts)
  • 07:40 hashar: Cleaned up a bunch of Docker containers and images from the CI slave-docker-* instances
  • 07:40 hashar: Armed keyholder on integration-cumin using passphrase from integration-puppetmaster01| T197207

2018-06-13

2018-06-12

2018-06-11

2018-06-10

  • 17:03 Krenair: also fixed scap install on -zotero01
  • 04:09 Krenair: added SPF and DMARC records to beta.wmflabs.org
  • 00:42 Krenair: cumin 'P{R:Package = scap} and P{F:lsbdistcodename = jessie}' 'apt-get install scap=3.8.2-1+0~20180607230422.353~1.gbp2bb4cc -y --force-yes'

2018-06-09

  • 21:00 Krenair: Temporarily substituting certificates on deployment-cache-text04 for certs generated from T182927 to test
  • 02:20 Krenair: stopping deployment-puppetmaster02 again, looks like it was automatically booted by novaadmin after security patches a couple days ago
  • 02:17 Krenair: shut down old deployment-dumps-puppetmaster instance (replaced with a newer stretch instance), emailed ariel
  • 02:13 Krenair: shut down old deployment-redis01 and deployment-redis02 instances T179371

2018-06-08

2018-06-07

  • 23:13 Krenair: armed keyholder on deployment-cumin after reboot, found passphrase at deployment-puppetmaster03:/var/lib/git/labs/private/files/ssh/tin/cumin_rsa.passphrase
  • 18:51 Krenair: deployment-deploy-01 rejecting all connections, rebooting
  • 17:24 Reedy: unstuck jenkins deployments to beta
  • 14:27 _joe_: force-run puppet on deployment-prep appservers, restarted nutcracker after reconfiguration
  • 14:16 _joe_: restarted apache2 in deployment-puppetmaster02 in deployment-prep, which is the correct way to run the puppetmaster there
  • 14:16 _joe_: killed "puppet master" process in deployment-prep
  • 14:03 hashar: github: deleting https://github.com/wikimedia/mediawiki-extensions-WikivoteMapsYandex | T193844
  • 11:22 hashar: Switching quibble jenkins jobs to 0.0.18 images
  • 10:41 hashar: Building Docker images for Quibble 0.0.18
  • 09:15 addshore: added wmde-leszek to deployment-prep

2018-06-06

2018-06-05

  • 19:59 mdholloway: deployed to BC: [mobileapps/deploy@7ecc3b6]: Update mobileapps to 66727b7
  • 12:07 kart_: Update cxserver to 7fb7671

2018-06-04

  • 20:37 mdholloway: Deployed to BC: [mobileapps/deploy@276ea43]: Update mobileapps to f579f0d
  • 19:37 Reedy: running foreachwiki maintenance/deduplicateArchiveRevId.php --force T196401
  • 15:44 awight: ORES: Fix T194322

2018-06-02

  • 21:20 legoktm: legoktm@integration-slave-docker-1003:~$ sudo docker rmi $(sudo docker images -q)
  • 19:57 greg-g: gjg@integration-slave-docker-1003:/srv/jenkins-workspace/workspace$ sudo rm -rf *
  • 19:31 Krenair: restarted parsoid on deployment-parsoid09 to try to fix stuff
  • 18:07 Krinkle: Beta Cluster's RESTBase or Parsoid is broken. Saving VE times out, logstash-beta contain restbase: "internal_http_error" / "Error: ESOCKETTIMEDOUT"
  • 01:16 legoktm: running docker-pkg in a screen because my connection is super flaky

2018-06-01

  • 22:17 Reedy: https://gerrit.wikimedia.org/r/#/c/436902/ finished deploying
  • 21:37 Krinkle: Re-create performance-beta.wmflabs.org webproxy (wired to webperf01) - T195314
  • 21:29 Krinkle: Re-creating webperf01 in deployment-prep, T195314
  • 20:57 legoktm: deploying docker-pkg with https://gerrit.wikimedia.org/r/436859 for reals this time (again)
  • 20:51 legoktm: deleting old versions of docker images
  • 20:48 hashar: contint1001: deleting some old wikimedia/mediawiki-services-mathoid docker images
  • 20:40 mutante: contint1001 - mkdir /srv/zuul-debug-logs ; mv debug.log.2018-05-* from /var/log/zuul/ over there to free up disk space on / VG
  • 20:26 mutante: contint1001 - apt-get clean got a little bit more disk space
  • 20:07 legoktm: really Updating docker-pkg files on contint1001 for https://gerrit.wikimedia.org/r/436852
  • 20:02 Reedy: Updating docker-pkg files on contint1001 for https://gerrit.wikimedia.org/r/436852
  • 13:52 hashar: Tagged Quibble 0.0.17, rebuilding Docker images and bumping jenkins jobs
  • 08:45 hashar: Bumping Quibble jobs to 0.0.16
  • 02:39 legoktm: deployed https://gerrit.wikimedia.org/r/436716

2018-05-31

2018-05-30

  • 21:07 legoktm: deployed https://gerrit.wikimedia.org/r/436404, reverted quibble upgrade
  • 20:37 legoktm: deployed https://gerrit.wikimedia.org/r/436351
  • 20:28 hasharAway: contint1001: triggered a few quibble runs from contint1001. Running in a screen
  • 20:27 hasharAway: contint1001: (for ext in BlueSpiceAbout BlueSpiceArticleInfo BlueSpiceAuthors BlueSpiceAvatars BlueSpiceBlog BlueSpiceCategoryManager BlueSpiceChecklist BlueSpiceConfigManager BlueSpiceContextMenu BlueSpiceCountThings BlueSpiceEditNotifyConnector BlueSpiceEmoticons BlueSpiceExtendedFilelist BlueSpiceExtendedSearch BlueSpiceExtendedStatistics BlueSpiceExtensions BlueSpiceFoundation BlueSpiceGroupManager BlueSpiceHideTitle
  • 18:32 mutante: created instance deployment-deploy-01 with stretch and flavor x-large (T192561)

2018-05-29

2018-05-28

  • 11:09 hashar: Building Quibble Docker images 0.0.13 | T195634
  • 11:01 hashar: tagging quibble 0.0.13 hhvm server should set .svg Content-Type | T195634
  • 04:47 legoktm: running modified version of hashar's gear_client.py on contint1001, feel free to kill if it causes problems

2018-05-27

2018-05-26

  • 23:09 Krinkle: Killed a bunch of stuck beta-mediawiki-config-update-eqiad jobs in Jenkins
  • 23:06 Krinkle: beta-mediawiki-config-update-eqiad jobs have been stuck in Zuul for 17 hours

2018-05-25

2018-05-24

2018-05-23

2018-05-22

  • 20:42 eddiegp: eddie@eddie-thinkpad:~$ for host in deployment-elastic07 deployment-ircd deployment-logstash2 deployment-mathoid deployment-ms-fe02 deployment-prometheus01 deployment-tin; do ssh $host.eqiad.wmflabs sudo puppet agent -tv; done (for T194926, finished without errors on any of the hosts)
  • 20:38 eddiegp: eddie@eddie-thinkpad:~$ for host in deployment-elastic07 deployment-ircd deployment-logstash2 deployment-mathoid deployment-ms-fe02 deployment-prometheus01 deployment-tin; do ssh $host.eqiad.wmflabs sudo apt-get -q -y --force-yes -o DPkg::Options::=--force-confold install ldap-utils; done T194926
  • 20:15 eddiegp: eddie@deployment-db03:~$ sudo apt-get -q -y --force-yes -o DPkg::Options::=--force-confold install ldap-utils
  • 16:53 Krinkle: Created deployment-webperf01 instance (m1.small) - ref T195312

2018-05-21

2018-05-20

  • 16:55 addshore: reload zuul for (Merged) jenkins-bot: Add quibble for Wikibase experimental [integration/config] - https://gerrit.wikimedia.org/r/434198 (owner: Addshore)
  • 09:18 greg-g: gjg@integration-slave-jessie-1001:/srv/jenkins-workspace/workspace$ sudo rm -rf *

2018-05-19

2018-05-18

2018-05-17

  • 07:18 legoktm: deploying mediawiki-phan-seccheck:0.2.1 image

2018-05-16

  • 17:51 twentyafterfour: deployed mobileapps-periodic-test to jenkins with jenkins-job-builder. refs T177896
  • 07:09 thcipriani: removed shadow mwdeploy users on deployment-mediawiki-07
  • 02:32 legoktm: deployed https://gerrit.wikimedia.org/r/433308

2018-05-15

2018-05-14

  • 19:41 bearND: BC: Update mobileapps to 39c16e4
  • 17:16 legoktm: deployed https://gerrit.wikimedia.org/r/432764
  • 03:40 legoktm: Building image docker-registry.discovery.wmnet/releng/wikimedia-audit-resources:0.1.1
  • 03:17 legoktm: Building image docker-registry.discovery.wmnet/releng/wikimedia-audit-resources:0.1.0

2018-05-11

  • 00:08 Krinkle: Add Console Section patterns to Jenkins Config for "Selenium"
  • 00:08 Krinkle: Update Console Sections in Jenkins Config to collapse setup-* sections by default

2018-05-10

2018-05-09

  • 22:53 awight: ORES: wheels fixups
  • 21:31 awight: Bump ORES wheels
  • 21:21 twentyafterfour: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/#/c/432150/
  • 21:04 awight: ORES: drafttopic in beta
  • 16:17 thcipriani: deployment-tin:apt-get clean && rm -rf /srv/mediawiki/.git to free space

2018-05-08

  • 19:49 legoktm: aborted mwext-phpunit-coverage-patch #2819
  • 12:19 moritzm: upgrading app servers in beta to wikidiff 1.6.0 (T190717)

2018-05-07

2018-05-06

2018-05-05

2018-05-04

2018-05-03

2018-05-02

2018-05-01

  • 16:51 Hauskatze: maurelio@deployment-tin:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=deploymentwiki --logwiki=deploymentwiki 'Spambot~BobFeliciano' 'BobFeliciano'

2018-04-30

2018-04-27

  • 22:07 hashar: Running quibble-vendor-mysql-php70-docker against ~ 900 MediaWiki extensions. Triggered with a custom gear_client.py script from contint1001. PID 29710
  • 21:31 hashar: Running quibble-vendor-mysql-php70-docker against 43 mediawiki skins. Triggered with a custom gear_client.py script from contint1001 . PID 13134

2018-04-26

  • 20:19 hashar: bumped quibble Jenkins jobs to 0.0.11
  • 19:00 hashar: Building releng/quibble 0.0.11 docker images
  • 17:41 mobrovac: stopped cpjobqueue and purging logs
  • 16:51 thcipriani: mark integration-slave-jessie-1001 online
  • 16:51 thcipriani: thcipriani@integration-slave-jessie-1001:/srv/jenkins-workspace/workspace$ sudo rm -rf *
  • 16:49 greg-g: tyler marked integration-slave-jessie-1001 as offline

2018-04-25

  • 21:27 hasharAway: building releng/quibble docker images @0.0.10
  • 21:13 hasharAway: tagged quibble 0.0.10 a4e88ae
  • 20:13 bearND: deployed mobileapps to BC: Config: Start up to 4 workers in parallel during start-up
  • 18:55 awight: ORES: Revscoring 2.2.2
  • 10:08 _joe_: deleted deployment-mediawiki0[45] and deployment-jobrunner02
  • 00:14 legoktm: deployed https://gerrit.wikimedia.org/r/415769

2018-04-24

2018-04-23

  • 23:27 ejegg: cleared out /srv/jenkins-workspace/workspace on jenkins-1003
  • 21:20 thcipriani: cleaned up 1.5G of space on deployment-tin:/tmp hopefully fixes beta-scap-eqiad
  • 20:29 mdholloway: deployed to BC: [mobileapps/deploy@5650605]: Update mobileapps to b011b2a
  • 19:09 ottomata: replacing deployment-kafka0[45] with deployment-kafka-main-[12]
  • 07:27 hashar: Upgrading Blue Ocean on CI Jenkins 1.4.2 -> 1.5.0

2018-04-20

  • 21:32 eddiegp: deployment-kafka04: full disk, deleted /var/log/syslog.1 (3G) to make interactive sessions work again
  • 17:30 thcipriani: rebase integration-puppetmaster01:/var/lib/puppet/git conflicting on https://gerrit.wikimedia.org/r/#/c/348236/
  • 08:38 hashar: Cleaned integration-slave-docker-1005 disk (deleting workspace and all docker images)
  • 08:27 hashar: Created integration-slave-docker-1016 and integration-slave-docker-1017 (2G RAM / 2 executors )
  • 08:26 hashar: replaced integration-slave-docker-1010 and integration-slave-docker-1011 with bigger instances (4GB RAM)
  • 07:03 hashar: Replaced integration-slave-docker-1001 to get 4GB of RAM
  • 00:46 awight: roll back ORES beta to master
  • 00:08 awight: Push ORES git-lfs to look at stuff

2018-04-19

  • 21:23 hashar: Pooling in the new integration-slave-docker-1002 and integration-slave-docker-1003
  • 21:08 hashar: rebuilding integration-slave-docker-1002 and integration-slave-docker-1003 ci1.medium > m1.medium (+2G RAM)
  • 21:01 hashar: Bringing back quibble (0.0.9), this time with COMPOSER_PROCESS_TIMEOUT=600 | T192576
  • 19:22 hashar: integration-slave-docker-1004 : wiping all jenkins workspace and all docker images
  • 18:00 thcipriani: cleared a bunch of old docker images from integration-slave-docker-1004, freed 4.4GB of space
  • 17:43 thcipriani: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/#/c/427717/
  • 10:34 Hauskatze: fixStuckGlobalRename.php ran to unblock rename jobs never arriving to jobqueue
  • 09:41 hashar: building releng/quibble-stretch:0.0.8-2
  • 08:19 eddiegp: eddie@deployment-tin:~$ for wiki in deploymentwiki enwiki enwikinews loginwiki metawiki simplewiki; do mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=$wiki --logwiki=loginwiki 'Samtar' "There'sNoTime"; done T192476

2018-04-18

2018-04-17

  • 23:40 Krinkle: Rebuilding Nodepool snapshot for https://gerrit.wikimedia.org/r/415312
  • 21:10 hasharAway: Running https://integration.wikimedia.org/ci/job/quibble-integration/6/ overnight
  • 20:28 hasharAway: building releng/quibble.*:0.0.8 images
  • 18:16 Krinkle: Deleted performance-webpagetest-wmf job (removed from config.git)
  • 14:32 hashar: Added integration-slave-docker-1008 and integration-slave-docker-1009 (m4executor label)
  • 13:33 moritzm: removed role::mediawiki::imagescaler from deployment-mediawiki05, per watroles the only use of that role in WMCS
  • 13:32 moritzm: removed role::mediawiki::imagescaler from deployment-prep, per watroles the only use of that role in WMCS
  • 02:41 Krinkle: Fix Jenkins config for console section "Composer" end pattern (rm mandatory .+ match from end)

2018-04-16

  • 23:13 legoktm: removing obsolete coverage report legoktm@contint1001:/srv/org/wikimedia/doc/cover$ sudo -u jenkins-slave rm -rf mediawiki-core-php7
  • 22:32 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/427027
  • 22:25 legoktm: kick stuck beta-update jenkins jobs
  • 20:35 mdholloway: restarted restbase and mobileapps services for testing (T192287)
  • 18:58 awight: Update ORES editquality; T185903
  • 13:13 _joe_: also shutting down deployment-jobrunner02 T192071
  • 13:12 _joe_: manually fixing ssh hostkey for mediawiki-jobrunner03 in scap on deployment-prep
  • 12:48 _joe_: turning off deployment-mediawiki{04,05} T192071, will be deleted by EOW
  • 02:51 legoktm: deployed https://gerrit.wikimedia.org/r/426837

2018-04-13

  • 17:23 marxarelli: Enabling Pipeline Utility Steps plugin in jenkins for changes to service pipeline in https://gerrit.wikimedia.org/r/#/c/425936/
  • 16:55 eddiegp: cherry-pick https://gerrit.wikimedia.org/r/#/c/426104/ (test 3)
  • 10:02 eddiegp: root@deployment-tin:/mnt/home/jenkins-deploy# sudo -u jenkins-deploy -- sh -c 'ssh-keyscan "deployment-mediawiki-09.deployment-prep.eqiad.wmflabs" >> .ssh/known_hosts' to fix beta-scap-eqiad
  • 08:48 eddiegp: debugging T189493 on beta
  • 08:20 _joe_: creating deployment-mediawiki-09 with stretch, eliminating -08 which was left in an unusable state T192071
  • 07:22 legoktm: restarting jenkins
  • 07:21 legoktm: uploading xunit 1.103-wmf.1 to jenkins
  • 07:01 hashar: rebuilding quibble containers to use 0.0.7
  • 00:53 awight: ORES: Test large file in LFS

2018-04-12

  • 23:01 awight: Try gerrit-based submodules for ORES, T180627
  • 17:25 thcipriani: running scap pull on deployment-mediawiki07 to catch up from missed beta-scap-eqiad deploys
  • 17:23 thcipriani: add ssh host key for deployment-mediawiki07 to /mnt/home/jenkins-deploy/.ssh/known_hosts so that beta-scap-eqiad will work again
  • 15:57 thcipriani: integration-slave-docker-1003 had to reinstall python-pbr to fix puppet complaining about tzdata update
  • 14:24 _joe_: installing deployment-mediawiki{08,09} for the beta upgrade to stretch of deployment-prep (T192071)
  • 14:24 _joe_: installing deployment-mediawiki{08,09} for the beta upgrade to stretch
  • 12:51 hashar: building releng/quibble-jessie and releng/quibble-jessie-php55
  • 11:52 _joe_: creating deployment-mediawiki-07, first stretch appserver T192071
  • 00:14 twentyafterfour: preparing to deploy phabricator rPHDEP/release/2018-04-12/1 https://phabricator.wikimedia.org/project/view/3335/

2018-04-11

2018-04-10

  • 13:41 moritzm: upgraded HHVM on mediawiki-deployment04/05/06 to a build with a patch for the MEMC_VAL_COMPRESSION_ZLIB flag in the memcached module (T184854)
  • 12:38 hashar: gerrit: created repo operations/debs/tidy-0.99 , a fork of the tidy Jessie package | T191771

2018-04-09

  • 19:50 awight: Redundant virtualenv for ORES
  • 18:06 awight: Restore to ORES master branch
  • 17:17 awight: Test git-lfs in ORES
  • 11:46 Hauskatze: maurelio@deployment-mira:~$ sudo puppet agent -tv to fix T191786 (success: Notice: Applied catalog in 27.11 seconds)

2018-04-06

2018-04-05

  • 21:34 eddiegp: updated deployment-prep cherry-pick of https://gerrit.wikimedia.org/r/c/392221/ to PS38
  • 18:08 eddiegp: ran 'cat portals/urls-to-purge.txt | mwscript purgeList.php' on deployment-tin:/srv/mediawiki-staging to clear portals cache
  • 17:50 eddiegp: Cherry-picking https://gerrit.wikimedia.org/r/c/424361/ on deployment-puppetmaster02 to try to unbreak T173887
  • 17:08 bearND: Update mobileapps to dbc0687
  • 17:05 hashar: docker: building releng/quibble:0.0.5
  • 17:00 bearND: Update mobileapps to 2d5ab5b on beta
  • 11:32 addshore: reloaded zuul for https://gerrit.wikimedia.org/r/424271
  • 07:52 moritzm: removed unused/defunct deployment-videoscaler01 from deployment-prep (T191293)
  • 07:52 moritzm: removed unused/defunct deployment-tmh01 from deployment-prep (T191293)
  • 07:47 hashar: deployment-prep: made Muehlenhoff an admin

2018-04-04

  • 22:17 thcipriani: deployment-mediawiki0{4,5} clear apt-cache, restart clear hhvm cache, restart hhvm
  • 21:24 awight: Roll back beta ORES
  • 20:19 mdholloway: deployed to BC: [mobileapps/deploy@0460519]: Update mobileapps to 2d5ab5b
  • 20:12 awight: Try dsh scap config for ORES
  • 20:07 addshore: reload zuul for https://gerrit.wikimedia.org/r/424045 (and some quibble experimental stuff)
  • 16:09 addshore: reloaded zuul for https://gerrit.wikimedia.org/r/423958
  • 15:30 addshore: reload zuul for https://gerrit.wikimedia.org/r/423652
  • 12:17 hashar: added experimental quibble job to mediawiki core / vendor / skins/Vector
  • 11:58 hashar: Building releng/quibble-stretch:0.0.4
  • 11:58 hashar: Building releng/quibble:0.0.4
  • 07:11 hashar: deployment-prep: adding EddieGP as a member

2018-04-03

  • 09:49 hashar: building releng/quibble:0.0.3
  • 00:09 paladox: created new repo "All-Avatars" which will be used to host avatars used by gerrit. Setting owner as Gerrit Managers will allow merging in the repo for all users soon :)

2018-04-02

  • 20:19 mdholloway: deployed to BC: [mobileapps/deploy@940bd48]: Update mobileapps to 58a0a88

2018-03-31

  • 21:42 Hauskatze: Ran sudo puppet agent --enable and sudo puppet agent -tv on deployment-maps03 to fix puppet staleness

2018-03-30

  • 11:38 dcausse: deployment-prep reindexing with forceSearchIndex all beta wikis (T189694)
  • 09:56 hashar: Nuking /srv/zuul/git/labs/tools/stewardbots on zuul-merger hosts (contint1001 and contint2001). Fetch fails with org.eclipse.jgit.transport.UploadPackInternalServerErrorException | T191077

2018-03-29

2018-03-28

  • 23:42 legoktm: deployed https://gerrit.wikimedia.org/r/422590
  • 23:00 legoktm: deployed https://gerrit.wikimedia.org/r/422566 https://gerrit.wikimedia.org/r/422418
  • 20:45 bearND: Update mobileapps to a5833a0 on BC
  • 19:13 legoktm: killed stuck docker container on 1003 to free up root partition, and then deleted old/all images to free up the rest of the space
  • 19:07 legoktm: legoktm@integration-slave-docker-1003:/srv/jenkins-workspace/workspace$ sudo rm -rf * # full disk
  • 16:04 hasharAway: nodepool: deleting 4 instances that are no longer used but that Nodepool failed to detect as no longer used (due to some reboots in the openstack infra)

2018-03-27

  • 15:23 hashar: building docker-registry.wikimedia.org/releng/operations-puppet:0.3.1 | https://gerrit.wikimedia.org/r/#/c/422168/
  • 08:30 kart_: Update cxserver to 9e8ebda
  • 03:10 legoktm: deleted mediawiki-core-code-coverage* workspaces to work around git/gerrit issue and retriggered jobs

2018-03-26

  • 20:43 mdholloway: deployed to BC: [mobileapps/deploy@e223f51]: Update mobileapps to 534f95d
  • 13:55 hashar: restarting CI Jenkins . Upgrades Mail plugin from 1.20 to 1.21 | T190393
  • 03:35 bd808: Marked integration-slave-docker-1003 offline because / is full. Needs cleanup. (cc hashar, thcipriani)

2018-03-23

  • 21:19 greg-g: gjg@deployment-tin:/srv/mediawiki-staging/php-master$ git remote prune origin

2018-03-22

2018-03-21

  • 23:25 RoanKattouw: Created maps security group for port 6533; removed port 6533 from sca security group
  • 23:22 bd808: Raised security group quota from 20 to 40
  • 22:40 hashar: cancel beta-code-update build in Jenkins (deadlock)
  • 22:40 hashar: cancel beta-update-database-eqiad build in Jenkins (deadlock)
  • 20:19 mdholloway: deployed to BC: [mobileapps/deploy@675837f]: Update mobileapps to e6b50a0
  • 18:40 bd808: Re-added ArielGlenn as deployment-prep project admin
  • 18:18 elukey: fixing operations/puppet on deployment-puppetmaster02, git-sync stuck due to an old patch still there
  • 17:52 hashar: deployment-prep: scap sync-file docroot/wwwportal/portal 'T173887'
  • 00:43 Reedy: enabled compiler03.puppet3-diffs.eqiad.wmflabs and disabled compiler02.puppet3-diffs.eqiad.wmflabs in jenkins

2018-03-20

  • 19:04 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/420777
  • 17:27 mdholloway: deployed to BC: [mobileapps/deploy@fad1009]: Update mobileapps to 634a15f
  • 12:25 hashar: deployment-prep: added Muehlenhoff (moritzm) as a member
  • 10:35 elukey: stop eventlogging + mysql on deployment-eventlog05 for maintenance
  • 09:58 hashar: deployment-mira: deleted iegreview/scholarships files scap/checks.yaml (invalid yaml somehow), ran puppet and restored the files
  • 09:38 hashar: deployment-tin: deleted iegreview/scholarships files scap/checks.yaml (invalid yaml somehow), ran puppet and restored the files

2018-03-19

2018-03-18

2018-03-16

  • 12:56 hashar: deployment-tin: ran git remote prune origin / git gc on all /srv/deployment git repositories
  • 12:56 hashar: deployment-tin: setting /srv/deployment files/dir to be group owned by wikidev and group writable
  • 12:50 hashar: deployment-tin: deleting /srv/grafana (no more in Gerrit)
  • 12:46 hashar: deployment-tin: sudo chown mwdeploy:mwdeploy /srv/mediawiki/.git/objects/pack/* # some pack of 6GB belonged to root
  • 11:11 hashar: zuul: reenqueue all coverage jobs lost when restarting Zuul
  • 10:53 hashar: Upgrading zuul to zuul_2.5.1-wmf4 to resolve a mutex deadlock T189859
  • 02:44 thcipriani: beta-scap-eqiad stuck 14 hours, had to do disconnect reconnect dance

2018-03-15

  • 17:15 bearND: Update mobileapps to c5e1522
  • 16:43 Amir1: created deployment-ores01 to use stretch
  • 16:36 Amir1: deleting deployment-ores01 to reimage
  • 15:47 awight: ORES with git-lfs, scap config
  • 09:24 hashar: deployment-prep: added gehel and dcausse as members/admins of the labs project
  • 00:21 awight: ORES with git-lfs

2018-03-14

2018-03-13

2018-03-12

  • 20:13 mdholloway: deployed to BC: [mobileapps/deploy@c764714]: Update mobileapps to 5c90db7
  • 20:09 bd808: Forced puppet run on deployment-logstash2 for https://gerrit.wikimedia.org/r/418986
  • 20:09 bd808: Removed role::logstash::eventlogging from deployment-logstash2 because the hiera config is failing (undefined method `[]' for nil:NilClass at /etc/puppet/modules/role/manifests/logstash/eventlogging.pp:11)
  • 19:55 bd808: Forced puppet run on deployment-logstash2. Failed due to bad logstash::eventlogging hiera data
  • 19:54 bd808: Cherry picked https://gerrit.wikimedia.org/r/#/c/418986/ to deployment-puppetmaster02
  • 12:33 hashar: Jenkins: installed Blue Ocean plugin. Eg: https://integration.wikimedia.org/ci/blue/ | T155840
  • 12:24 hashar: Jenkins: uninstalled the "cvs" plugin

2018-03-10

  • 07:03 greg-g: gjg@integration-slave-jessie-1004:/srv/jenkins-workspace/workspace$ sudo rm -rf * - T189365

2018-03-09

  • 22:01 legoktm: legoktm@integration-slave-jessie-1001:/srv/jenkins-workspace/workspace$ sudo rm -rf *
  • 20:10 legoktm: deployed https://gerrit.wikimedia.org/r/418024
  • 16:06 hashar: Deploying docker releng/npm-test:0.5.0 https://gerrit.wikimedia.org/r/#/c/417960/
  • 10:58 hashar: Pooling 6 new Docker instances to jenkins: integration-slave-docker 1010 to 1015. They are ci.medium (2G RAM / 2vcpu) each with 2 executors and labels DebianJessieDocker, m1executor
  • 10:37 hashar: Update *-maven-java8-docker-site-publish jobs which were not mounting /src into the container | T188686
  • 02:31 legoktm: deployed https://gerrit.wikimedia.org/r/417343

2018-03-08

  • 22:19 hasharDinner: cleaned up /srv on integration-slave-jessie-1001. Upgrade packages and reboot.
  • 21:59 legoktm: legoktm@integration-slave-jessie-1003:/srv/jenkins-workspace/workspace$ sudo rm -rf * # out of disk space
  • 18:36 bearND: Update mobileapps to afb0167
  • 17:13 hashar: deleting a few nodepool instances that are no longer registered in Jenkins
  • 14:12 hashar: deployment-tin: rm -fR /srv/ocg
  • 14:03 hashar: deployment-tin: rm /srv/jenkins/home/jenkins-deploy/workspace/beta-scap-eqiad/central.hhbc # 1.4GBytes
  • 14:02 hashar: deployment-tin is out of disk space on /srv
  • 10:27 hashar: Deploy docker images for /deploy repositories | https://gerrit.wikimedia.org/r/#/c/417217/

2018-03-07

2018-03-06

  • 20:59 hashar: gerrit: changed scoring/ores/assets parent permission group to scoring/ores
  • 20:59 hashar: gerrit: created scoring/ores/draftquality, scoring/ores/drafttopic, scoring/ores/articlequality, scoring/ores/editquality and scoring/ores/deploy; all inherit permissions from scoring/ores, which itself inherits from scoring/
  • 20:56 hashar: gerrit: created scoring/ parent project with owner being research-ores ( https://gerrit.wikimedia.org/r/#/admin/projects/scoring,access )
  • 20:52 MaxSem: refreshing spoofuser on beta
  • 19:34 Hauskatze: maurelio@deployment-tin:~$ foreachwiki extensions/TorBlock/maintenance/loadExitNodes.php --force
  • 19:31 Hauskatze: maurelio@deployment-tin:~$ foreachwiki extensions/AbuseFilter/maintenance/purgeOldLogIPData.php
  • 18:34 mdholloway: deployed to beta: [mobileapps/deploy@5986ab7]: Update mobileapps to afbe9af
  • 03:43 Krinkle: Jenkins postmerge queue has 'beta-scap-eqiad' and 'beta-update-databases-eqiad' stuck "Waiting for execute" for over 3h

2018-03-05

  • 16:38 Reedy: deleted the stack traces too
  • 16:37 Reedy: removed pre 2018 hhvm error logs from deployment-mediawiki04
  • 16:37 Reedy: that was from deployment-mediawiki04
  • 16:35 Reedy: removed 2G temp folder from /srv/mediawiki/php-master/cache/l10n/upstream
  • 06:14 legoktm: legoktm@integration-slave-jessie-1001:/srv/jenkins-workspace/workspace$ sudo rm -rf *

2018-03-04

  • 19:00 thcipriani: cleared /tmp, apt-cache deployment-mediawiki04
  • 06:02 Krinkle: Re-create php-master/StartProfiler.php on deployment-tin in Beta Cluster, similar to the one scap auto-creates each week in prod, except to include StartProfiler-labs.php instead.
  • 02:12 Krenair: Regenerated captcha images for T164047

2018-03-02

2018-03-01

  • 20:48 Hauskatze: maurelio@deployment-tin:~$ foreachwiki extensions/AbuseFilter/maintenance/purgeOldLogIPData.php
  • 14:55 elukey: delete deployment-eventlog02 ubuntu instance in favor of the brand new deployment-eventlog05 (stretch)
  • 02:11 legoktm: manually queued jenkins jobs

2018-02-28

  • 16:31 legoktm: manually queued jenkins jobs
  • 10:51 hashar: integration-slave-jessie-android killed stalled qemu-system-i386 process
  • 10:42 hashar: build docker-registry.discovery.wmnet/releng/npm-browser-test:0.1.2 and docker-registry.discovery.wmnet/releng/npm-test-oojsui:0.1.1
  • 08:42 legoktm: queued more jenkins jobs (last for tonight)
  • 07:29 legoktm: mass queuing jenkins jobs again

2018-02-27

  • 21:17 hashar: Building docker image releng/npm-test-oojsui:0.1.0 - https://gerrit.wikimedia.org/r/#/c/415102/
  • 10:15 zeljkof: Reloading Zuul to deploy d9ed9d4
  • 09:35 hashar: deployment-mediawiki05: out of disk space. Ran apt-get clean, cleaned old kernels/packages and dropped hhvm bytecode cache
  • 08:44 hashar: deployment-mediawiki06: out of disk space. Ran apt-get clean
  • 07:40 legoktm: deployed https://gerrit.wikimedia.org/r/414957
  • 06:11 legoktm: manually triggering a bunch of jenkins jobs
  • 03:29 legoktm: deployed https://gerrit.wikimedia.org/r/414896
  • 03:02 Krinkle: Deleted beta-* related job builds in Jenkins that were stuck >1hr
  • 03:01 Krinkle: Jenkins slave connection to deployment-tin is broken again. No error. Script console works. Disconnect/Relaunch doesn't resolve. 6 idle executors but jobs are not starting for some reason.

2018-02-26

  • 22:40 Hauskatze: updating list of Tor nodes for TorBlock on Beta Cluster wikis
  • 22:39 Hauskatze: purging old abusefilter IP data from Beta Cluster wikis while we wait for a cron job to do this automatically
  • 18:39 mutante: deployment-cache-text04 - manually creating Letsencrypt SSL cert for fr.wikipedia.beta.wmflabs.org (acme-setup -i "fr_wikipedia_beta_wmflabs_org" -s "fr.wikipedia.beta.wmflabs.org" --key-user root --key-group root), restarted nginx (T188288)

2018-02-24

2018-02-23

  • 16:52 elukey: created deployment-eventlogging05 to test eventlogging on Debian in deployment-prep
  • 11:50 hashar: deployment-mediawiki04 : apt-get clean && apt-get autoremove --purge
  • 11:50 hashar: deployment-mediawiki04 : rm /var/cache/hhvm/*.sq3 and restarting hhvm
  • 11:48 hashar: deployment-mediawiki04 is out of disk space on / causing beta-scap-eqiad to fail
  • 08:19 hashar: gerrit: marked apps/android/java-mwapi.git read-only | T187995

2018-02-22

2018-02-21

  • 22:31 hashar: Building docker image releng/npm-test-3d2png:0.1.0 and reloading Zuul | https://gerrit.wikimedia.org/r/413227
  • 20:51 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/413060
  • 16:15 zeljkof: Reloading Zuul to deploy d90d617
  • 11:26 Hauskatze: Refreshed TOR exit nodes & cleaned old abusefilter log data for all Beta Cluster wikis.
  • 02:39 no_justification: beta: ran initSiteStats.php --update on all wikis

2018-02-20

  • 20:57 Hauskatze: Refreshed tor exit node lists for all Beta Cluster wikis
  • 20:50 legoktm: refreshing nodepool image: nodepool@labnodepool1001:~$ nodepool image-update wmflabs-eqiad snapshot-ci-jessie
  • 20:47 HausAFKatze: mwscript extensions/TorBlock/maintenance/loadExitNodes.php --wiki=deploymentwiki --force
  • 19:41 Hauskatze: maurelio@deployment-tin:~$ foreachwikiindblist all-labs.dblist extensions/AbuseFilter/maintenance/purgeOldLogIPData.php

2018-02-19

  • 23:40 Krinkle: Running `nodepool image-update wmflabs-eqiad snapshot-ci-jessie` to deploy https://gerrit.wikimedia.org/r/412825
  • 18:52 legoktm: deleted all current nodepool instances for ci-jessie
  • 18:36 legoktm: manually refreshing nodepool images (nodepool image-update wmflabs-eqiad snapshot-ci-jessie)
  • 10:03 hashar_: deployment-tin: git gc in /srv/mediawiki-staging/php-master and /srv/mediawiki-staging/php-master/extensions

2018-02-18

  • 20:37 Hauskatze: Ran foreachwikiindblist all-labs.dblist extensions/AbuseFilter/maintenance/purgeOldLogIPData.php on Beta

2018-02-16

2018-02-15

2018-02-14

2018-02-13

2018-02-12

  • 23:06 legoktm: deployed https://gerrit.wikimedia.org/r/410075 https://gerrit.wikimedia.org/r/410076
  • 20:53 mdholloway: deployed mobileapps@f14bdd5 to beta cluster
  • 18:55 Hauskatze: Running maurelio@deployment-tin:~$ foreachwikiindblist all-labs.dblist extensions/AbuseFilter/maintenance/purgeOldIPLogData.php for T186870
  • 18:48 Hauskatze: Restoring first missing log entries on Beta refs. T54919
  • 18:46 Hauskatze: maurelio@deployment-tin:~$ mwscript extensions/AbuseFilter/maintenance/purgeOldLogIPData.php --wiki=arwiki (37 rows purged - T186870)
  • 18:45 Hauskatze: maurelio@deployment-tin:~$ mwscript extensions/AbuseFilter/maintenance/purgeOldLogIPData.php --wiki=aawiki (0 rows purged - T186870)
  • 18:44 Hauskatze: maurelio@deployment-tin:~$ mwscript extensions/AbuseFilter/maintenance/purgeOldLogIPData.php --wiki=aawiki (0 rows purged)
  • 18:43 Hauskatze: Starting to purge old afl_ip data from abuse_filter_log on Beta Cluster - T186870
  • 15:59 hashar: Deploying java8 docker image https://gerrit.wikimedia.org/r/#/c/409881/

2018-02-10

  • 18:58 Hauskatze: maurelio@deployment-tin:~$ mwscript extensions/AbuseFilter/maintenance/purgeOldLogIPData.php --wiki=eswiki (1695 rows purged - T186870)
  • 18:49 Hauskatze: maurelio@deployment-tin:~$ mwscript extensions/AbuseFilter/maintenance/addMissingLoggingEntries.php --wiki=zhwiki (22 missing rows inserted)
  • 18:46 Hauskatze: Ran mwscript extensions/AbuseFilter/maintenance/addMissingLoggingEntries.php --wiki=deploymentwiki (17 rows inserted)

2018-02-09

2018-02-08

  • 13:58 Hauskatze: maurelio@deployment-tin:~$ mwscript initSiteStats.php --wiki=deploymentwiki --update --active --use-master
  • 10:02 hashar: Rebuilding docker-pkg images on contint1001. Would get chromium 64 into npm-browser-test | T179552

2018-02-07

2018-02-06

  • 21:41 hashar: Rebuilding Zuul package to hotfix T186381
  • 21:14 legoktm: restarted zuul due to patch being stuck (T186381)
  • 19:25 hashar: Restarted Zuul due to T186381
  • 18:14 thcipriani: removing /srv/mediawiki/.git on deployment-tin to clear space
  • 02:33 legoktm: deploying https://gerrit.wikimedia.org/r/408480

2018-02-05

  • 22:09 mdholloway: deployed mobileapps@3140b1a to BC
  • 16:09 mdholloway: mobileapps deployment to BC failed with error (T186532)
  • 12:37 Hauskatze: deployment-prep maurelio@deployment-tin:~$ mwscript cleanupSpam.php --wiki=deploymentwiki *.doxawatches.com --delete
  • 11:02 hashar: Upgrading jenkins-debian-glue to 0.18.4-wmf1 | T186494
  • 09:48 hashar: operations/debs/jenkins-debian-glue create branches debian/jessie-wikimedia and patch-queue/debian/jessie-wikimedia based on v0.17.0 | T186494

2018-02-04

2018-02-03

  • 21:05 legoktm: manually deleted /srv/zuul/git/mediawiki/tools/phan on contint1001 so zuul could clone the new repo
  • 21:02 legoktm: deployed https://gerrit.wikimedia.org/r/407991
  • 04:01 legoktm: disabled/enabled gearman in jenkins
  • 03:55 legoktm: restarting zuul to drop 407165,3 from the queue
  • 03:48 legoktm: disabled/enabled gearman in jenkins

2018-02-01

  • 16:14 Amir1: deleting deployment-sca03 (T184501)
  • 07:09 legoktm: legoktm@integration-slave-jessie-1001:/srv/jenkins-workspace/workspace$ sudo rm -rf *

2018-01-31

  • 22:01 mdholloway: updated mobileapps to 3d717fa on beta cluster
  • 05:29 legoktm: brought integration-slave-jessie-1003 back online after clearing disk space
  • 05:28 legoktm: legoktm@integration-slave-jessie-1003:/srv/jenkins-workspace/workspace$ sudo rm -rf *

2018-01-30

2018-01-29

  • 23:24 awight: Experiment with versioned ORES venv, T181071

2018-01-24

  • 23:14 Krenair: armed keyholder on deployment-cumin using deployment-puppetmaster02:/var/lib/git/labs/private/files/ssh/tin/cumin_rsa.passphrase - this seems to have fixed cumin

2018-01-23

2018-01-22

  • afk: restarting jenkins

2018-01-20

2018-01-19

  • 17:23 zeljkof: Reloading Zuul to deploy 25cab6f
  • 17:10 zeljkof: Reloading Zuul to deploy 7d6b4ee
  • 15:34 elukey: added deployment-eventlog02.deployment-prep.eqiad.wmflabs to /etc/ssh/ssh_known_hosts on deployment-tin (following https://phabricator.wikimedia.org/T116206#2251441) to unblock "Host key verification failed" for Analytics

2018-01-18

  • 18:29 bearND: (beta): Update mobileapps to 2690899
  • 17:00 ottomata: stashing local changes to deployment-puppetmaster02 in /var/lib/git/operations/puppet (mail/mx.pp and exim/exim4.conf.mx.erb)
  • afk: cleared some space on deployment-mediawiki05 (apt-cache and old logs) so scap had room to work again, although space is still tight.
  • 11:49 legoktm: mediawiki-core-doxygen-publish jobs are stuck
  • 00:21 bd808: Deleted 6-7 nodepool instances that were alive but offline for running jobs

2018-01-17

  • 16:24 zeljkof: Reloading Zuul to deploy 5f75731
  • 13:11 hashar: nodepool: updating snapshot to get hhvm +wmf4 for T185024: nodepool image-update wmflabs-eqiad snapshot-ci-jessie

2018-01-16

2018-01-15

  • 21:59 hashar: deployment-mx echo -n > /var/log/mtail/mtail.log
  • 21:59 hashar: deployment-mx rm /var/log/git-sync-upstream.log*
  • 17:25 zeljkof: Reloading Zuul to deploy ff0a02d
  • 17:14 zeljkof: Reloading Zuul to deploy fff6431
  • 13:59 kart_: Ran: "mwscript extensions/Translate/scripts/createMessageIndex.php --wiki=metawiki" for T180841
  • 13:37 kart_: Ran update.php on metawiki betacluster (T180841)
  • 11:23 hashar: Mirroring git://anonscm.debian.org/pkg-php/php-ast.git to operations/debs/pkg-php/php-ast.git | T174338
  • 09:50 hashar: integration/zuul pushed upstream git tags to our repo

2018-01-12

  • 14:11 zeljkof: Reloading Zuul to deploy 3816837

2018-01-11

2018-01-10

2018-01-09

  • 14:53 hashar: Change integration/zuul.git HEAD from 'master' to 'patch-queue/debian/jessie-wikimedia' | T158243 T162191
  • 13:20 addshore: reloaded zuul to deploy https://gerrit.wikimedia.org/r/402862
  • 10:54 hashar: gerrit: created operations/debs/node-tunnel-agent a fork of git://anonscm.debian.org/collab-maint/node-tunnel-agent.git | T183569
  • 08:51 Amir1: ladsgroup@deployment-tin:~$ mwscript extensions/ORES/maintenance/PopulateDatabase.php --wiki=enwiki (T184276)
  • 08:49 Amir1: ladsgroup@deployment-sca03:/srv/deployment$ sudo rm -rf ores (T184282)
  • 08:48 Amir1: stopping ores services in deployment-sca03 (T184282)
  • 08:43 Amir1: changed DNS so that ores-beta.wmflabs.org points to deployment-ores01 instead of deployment-sca03
  • 08:42 Amir1: deleted deployment-ores-redis-01 in favor of deployment-ores01 (T184282)

2018-01-08

2018-01-05

  • 21:02 legoktm: legoktm@contint1001:/srv/org/wikimedia/doc/cover$ sudo -u jenkins-slave rm -rf extensions
  • 14:26 halfak: restarted celery-ores-worker on deployment-sca03

2018-01-03

  • 19:55 hashar: manually upgrading puppet to 4.8 on deployment-mx / deployment-redis01 / deployment-redis02 | T184114
  • 19:52 hashar: purging old kernels on deployment-mx / deployment-redis01 / deployment-redis02 | T184114
  • 19:24 hashar: deployment-prep: fix puppet run broken by a duplicate definition due to profile::base::firewall vs base::firewall
  • 19:20 hashar: deployment-tin "upgrade" scap to 3.7.4-3 the version in apt.wm.o
  • 19:09 hashar: apt-get upgrade on deployment-tin . "downgrade" scap from 3.7.4-3 (apt.wm.o) to 3.7.4-1~20180103034049.266 (from CI)
  • 19:07 hashar: deployment-prep: restored all the cherry picks on the puppet master
  • 06:01 legoktm: manually installing php-xdebug on integration-slave-jessie-1004 to make sure this works (temporary)

2018-01-01

  • 23:46 Krenair: ran `mwscript extensions/ORES/maintenance/CheckModelVersions.php --wiki=sqwiki` on deployment-tin for T183862
  • 19:56 Amir1: ladsgroup@deployment-tin:~$ mwscript extensions/ORES/maintenance/CheckModelVersions.php --wiki=nlwiki (T183862)
  • 19:56 Amir1: restarting ores services in deployment-sca03 (T183862)

Archives