Release Engineering/SAL

From Wikitech

2019-02-19

2019-02-18

2019-02-17

2019-02-16

2019-02-15

  • 17:28 thcipriani: integration-slave-jessie-1002:/srv/jenkins-workspace/workspace$ `sudo rm -rf *` due to full disk

2019-02-14

2019-02-13

  • 21:32 marxarelli: dduvall@integration-slave-jessie-1001:/mnt/home/jenkins-deploy$ `rm -rf .gradle/ .m2/` due to full disk
  • 21:21 marxarelli: bringing integration-slave-docker-1046 and integration-slave-jessie-1001 back online
  • 21:20 marxarelli: dduvall@integration-slave-jessie-1001:/srv/jenkins-workspace/workspace$ `sudo rm -rf *` due to full disk
  • 21:15 marxarelli: removing old docker images on integration-slave-docker-1046
  • 21:10 marxarelli: starting migrated integration-slave-docker-1046 instance
  • 21:01 marxarelli: pooling new jenkins node for integration-slave-docker-1052
  • 20:46 marxarelli: pooling jenkins node for integration-slave-docker-1051
  • 20:45 marxarelli: launching replacement instance integration-slave-docker-1052
  • 20:35 marxarelli: launching replacement instance integration-slave-docker-1051
  • 20:32 marxarelli: pooling jenkins node for integration-slave-docker-1050
  • 20:15 marxarelli: integration-slave-docker-{1044,1046,1047} unresponsive due to cloudvirt failure. 1046 is already being moved by CS; deleting 1044 and 1047
  • 19:57 marxarelli: seeing jenkins agent connection failures for integration-slave-docker-{1044,1046,1047}
  • 19:48 marxarelli: pooling replacement jenkins node integration-slave-docker-1049
  • 19:34 marxarelli: deleting integration-slave-jessie-android jenkins node and instance
  • 19:33 marxarelli: deleting integration-slave-jessie-1003 jenkins node and instance
  • 19:32 marxarelli: deleting integration-slave-docker-1033 jenkins node and instance
  • 19:25 marxarelli: deleting integration-slave-docker-1017 jenkins node and instance
  • 18:45 Krinkle: integration-slave-jessie-1003 seems to be consistently unable to start jobs, marking as offline manually
  • 18:32 thcipriani: bringing up new integration-castor03, re-enabling castor-save* jobs
  • 18:15 marxarelli: adding new jenkins node integration-slave-docker-1048
  • 18:02 marxarelli: launching new integration-slave-docker-1048 instance
  • 17:59 marxarelli: deleting integration-slave-docker-1038 jenkins node and instance
  • 17:50 marxarelli: bringing integration-slave-docker-1033 back online after clearing out old docker images
  • 17:33 thcipriani: rebuilding integration-castor03
  • 17:21 thcipriani: stopping rsync server on castor03
  • 17:21 twentyafterfour: stopped rsync on castor03
  • 17:16 twentyafterfour: disconnected castor03 from jenkins
  • 16:48 thcipriani: reloading zuul to deploy https://gerrit.wikimedia.org/r/#/c/integration/config/+/487880/
  • 14:34 thcipriani: modified castor-save-workspace-cache to exit 0 and run on blubber nodes while integration-castor03 is down
  • 14:26 dcausse: deployment-prep: upgrading to elastic 5.6.14

2019-02-12

2019-02-11

2019-02-10

2019-02-08

  • 20:20 Krinkle: Delete various jobs on Jenkins that no longer exist in JJB config, ref T91410
  • 15:59 addshore: this reload also included "Switch npm-audit job to node10"? T211784, which did touch the zuul file
  • 15:58 addshore: reloaded zuul for https://gerrit.wikimedia.org/r/#/c/integration/config/+/489241/
  • 03:10 Krinkle: Delete various jobs on Jenkins that no longer exist in JJB config
  • 00:28 Krinkle: krinkle@doc1001: sudo -u doc-uploader chmod 775 /srv/docroot/org/wikimedia/doc/
  • 00:12 marxarelli: removed old docker images on contint1001 to free up space

2019-02-07

  • 23:17 thcipriani: integration-slave-jessie-1003:sudo rm -rf /srv/jenkins-workspace/workspace/*
  • 23:15 thcipriani: integration-slave-docker-1033:sudo docker image prune and bring back online
  • 22:28 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/467550 (
  • 19:09 paladox: created integration/zuul/build gerrit repo for T215458
  • 19:05 paladox: created integration/zuul/wheels gerrit repo for T215458
  • 15:48 addshore: brought integration-slave-docker-1043 back online
  • 15:48 addshore: addshore@integration-slave-docker-1043:~$ sudo docker image prune -a --force --filter "until=2191h" // (3 months?) Total reclaimed space: 14.86GB
  • 08:49 hashar: cleaning docker images on integration-slave-docker-1021

2019-02-06

  • 22:34 shdubsh: Deploy node-exporter 0.17 T213708
  • 14:12 godog: shut off deployment-prometheus01 - T215272
  • 14:00 godog: switch beta-prometheus to deployment-prometheus02 - T215272

2019-02-05

  • 20:07 ebernhardson: jobrunner port 9006 is firewalled; reverted to 9005 and created T215339 to fix the job queue in the beta cluster
  • 19:36 ebernhardson: Update profile::cpjobqueue::{jobrunner,videoscaler}_host in horizon hiera from port 9005 to 9006 to match new restrictions in gerrit.wikimedia.org/r/481866
  • 16:29 addshore: T215288 added mirrys to deployment-prep as a user
  • 15:32 addshore: T215278 addshore@integration-slave-docker-1037:~$ sudo docker image prune -a --force --filter "until=2191h" // (3 months?) Total reclaimed space: 16.59GB

2019-02-04

  • 23:13 thcipriani: integration-slave-docker-1040:sudo docker image prune and bring back online
  • 23:12 thcipriani: integration-slave-docker-1038:sudo docker image prune and bring back online
  • 21:48 ebernhardson: restart logstash on deployment-logstash2
  • 15:25 hashar: removed Jenkins user "nodepoolmanager" as well as related authorizations | T209361

2019-02-03

2019-02-02

  • 22:17 legoktm: legoktm@integration-slave-jessie-1004:/srv/jenkins-workspace/workspace$ sudo rm -rf *

2019-01-31

  • 15:03 thcipriani: rearm keyholder on deployment-deploy01
  • 12:05 arturo: VM instances deployment-deploy01, deployment-deploy02, deployment-fluorine02, deployment-kafka-jumbo-2, deployment-kafka-main-1, deployment-maps04, deployment-mcs01, deployment-mediawiki-09, deployment-memc04, deployment-ms-be03, deployment-ms-fe02, deployment-parsoid09, deployment-sca04 and deployment-webperf12 were stopped briefly due to an issue in the hypervisor (T215012)

2019-01-30

2019-01-29

  • 07:41 legoktm: legoktm@integration-slave-jessie-1001:/srv/jenkins-workspace/workspace$ sudo rm -rf * b/c full disk

2019-01-28

  • 16:33 hashar: contint1001: cleaning up disk space on /
  • 13:07 addshore: bringing integration-slave-docker-1041 back online
  • 13:07 addshore: addshore@integration-slave-docker-1041:~$ sudo docker image prune -a --force --filter "until=2191h" // (3 months?) Total reclaimed space: 16.12GB
  • 09:37 Amir1: ores:ad160b0 is going beta

2019-01-27

  • 19:57 addshore: bringing integration-slave-docker-1034 back online
  • 19:50 addshore: addshore@integration-slave-docker-1034:~$ sudo docker image prune -a --force --filter "until=2191h" // (3 months?) Total reclaimed space: 17.12GB

2019-01-26

2019-01-25

2019-01-23

2019-01-22

2019-01-21

  • 19:49 hashar: integration: update sudo rule for debian-glue to keep env variable EXTRAPACKAGES. Would let us get eatmydata included | T214328
  • 15:40 hashar: contint1001: removing all generated doc/cover from /srv/org/wikimedia/doc | T137890

2019-01-18

  • 23:22 hashar: contint1001: sudo docker image prune # Total reclaimed space: 3.592GB
  • 23:00 Krinkle: Some docker builds on integration-slave-docker-1021 failing with ENOMEM
  • 23:00 mutante: contint1001 - gzipping more files in /var/log/zuul/
  • 22:57 mutante: contint1001 - moved zuul logs from 2018 and gzipped zuul logs from /var/log/zuul to /srv/logs/zuul to free disk space on /
  • 22:39 mutante: contint1001 - apt-get clean - disk space low
  • 22:31 Krinkle: Updating docker-pkg files on contint1001 for https://gerrit.wikimedia.org/r/482527 / T212602

2019-01-17

  • 19:34 thcipriani: integration-slave-jessie-1002:sudo rm -rf /srv/jenkins-workspace/workspace/* and bring back online
  • 08:39 legoktm: deploying composer docker image - https://gerrit.wikimedia.org/r/484853

2019-01-16

  • 21:11 bearND: (beta): Update mobileapps to 258d76b page summary changes

2019-01-15

  • 09:00 hashar: Deleting Docker images on integration-slave-docker-1021

2019-01-14

  • 22:02 bearND: (beta): Update mobileapps to f2658de
  • 21:47 mutante: deployment-mcs01 - sudo su deploy-service; cd /srv/deployment/mobileapps/deploy-cache/revs/1182b3b8f288df0221257b929ca43fb86862c2f8/scap ; touch log (for debugging permission problem reported by bearND)
  • 14:31 hashar: Nuked Castor cache for all *tox* jobs. Some might have cached binary wheels compiled against a library that no longer exists (e.g. libmysqlclient.so.18 for mysql-python). Follow-up to the jessie -> stretch upgrade # T191764
  • 14:28 hashar: Deleted Castor cache for wikimedia-cz/tracker: mysql-python got cached as a wheel compiled against libmysqlclient.so.18, which fails with the new tox...:0.3.0 containers that use the mariadb / libmysqlclient.so compat symlink

2019-01-11

2019-01-09

2019-01-08

  • 21:52 thcipriani: reloading zuul to deploy https://gerrit.wikimedia.org/r/#/c/integration/config/+/476600/
  • 21:31 Hauskatze: github: @niedzielski updated @jdlrobson permission on Wikimedia from `read` to `admin`
  • 21:30 Hauskatze: github:
  • 20:46 thcipriani: reloading zuul to deploy https://gerrit.wikimedia.org/r/#/c/integration/config/+/482855/
  • 19:53 mutante: deployment-prep adjusting puppet config on deployment-mwmaint01. remove "mediawiki_maintenance" role from "other classes" section and apply "mediawiki::maintenance" instead after role rename in gerrit:479131 for consistency with other mediawiki:: roles
  • 19:53 mutante: adjusting puppet config on deployment-mwmaint01. remove "mediawiki_maintenance" role from "other classes" section and apply "mediawiki::maintenance" instead after role rename in gerrit:479131 for consistency with other mediawiki:: roles
  • 14:25 hashar: Upgrading plugins on https://releases-jenkins.wikimedia.org/
  • 09:19 hashar: gerrit: re-saved configuration for All-Projects by changing "Max Reviewers" from 3 to 4. This might enable adding reviewers automatically based on git blame. See task for config diff # T101131
  • 05:37 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/482752
  • 02:45 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/482751

2019-01-07

2019-01-06

2019-01-03

2019-01-02

  • 10:19 hashar: updating all debian-glue jobs and creating new ones with hardcoded distributions (trusty, jessie, stretch, unstable) T210780

2019-01-01

  • 15:33 hashar: contint1001: deleting some extensions documentation for wmf branches: rm -fR /srv/org/wikimedia/doc/{Kartographer,MinervaNeue,MobileFrontend,Wikibase}/wmf # T118599

Archives