Traffic/Cookbooks

From Wikitech

CDN

roll-restart-varnish.py

Purpose

Roll restart Varnish frontend based on parameters

How to Use

Roll restart Varnish frontend based on parameters.

   Example usage:
       cookbook sre.cdn.roll-restart-varnish --alias cp-text_codfw --reason 'Emergency restart' \
           --grace-sleep 30 restart_daemons
       cookbook sre.cdn.roll-restart-varnish --query 'A:cp-eqiad and not P{cp1001*}' --reason 'Emergency restart' \
           --batchsize 2 restart_daemons --threads-limited 100000
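The second example combines a Cumin alias with an exclusion (`A:cp-eqiad and not P{cp1001*}`). As a rough illustration of the set logic only, and not of Cumin's own implementation, the sketch below removes hosts matching a glob from a list. The host names and the `exclude` helper are hypothetical.

```python
from fnmatch import fnmatch
from typing import List, Sequence


def exclude(hosts: Sequence[str], pattern: str) -> List[str]:
    """Return the hosts that do NOT match the shell-style glob pattern."""
    return [host for host in hosts if not fnmatch(host, pattern)]


# Hypothetical hosts standing in for whatever the alias resolves to.
alias_hosts = ["cp1001.eqiad.wmnet", "cp1002.eqiad.wmnet", "cp1003.eqiad.wmnet"]

# Mirrors the shape of "A:cp-eqiad and not P{cp1001*}".
print(exclude(alias_hosts, "cp1001*"))
```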


Flags

--threads-limited
Restart Varnish only if the variation of the varnish_main_threads_limited metric over the last 10 minutes is above the given threshold.
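
A minimal sketch of the kind of check this flag implies. It assumes "variation" means the difference between the last and first samples in the window. How the cookbook actually retrieves the metric is not described here, so the sample values and function names are hypothetical.

```python
from typing import Sequence


def threads_limited_variation(samples: Sequence[float]) -> float:
    """Return how much the counter changed between the first and last sample."""
    if len(samples) < 2:
        return 0.0
    return samples[-1] - samples[0]


def should_restart(samples: Sequence[float], threshold: float) -> bool:
    """Restart only when the variation is strictly above the threshold."""
    return threads_limited_variation(samples) > threshold


# Hypothetical samples of varnish_main_threads_limited over the last 10 minutes.
window = [1_200_000, 1_250_000, 1_330_000]

print(should_restart(window, threshold=100_000))
print(should_restart(window, threshold=200_000))
```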


transfer-purged-positions.py

Purpose

Transfer kafka consumer offsets for purged topics.

This cookbook is useful when moving purged from one DC to another.

These are the steps performed by the cookbook:

- Depool the host (automatic)
- Disable Icinga notifications (automatic)
- Check that puppet-agent is disabled (pre_scripts)
- Transfer Kafka consumer offsets (_custom_action)
- Enable and run Puppet (puppet agent **must** be disabled first with cumin) (_custom_action)
- Repool the host (automatic)


How to Use

Roll apply configuration using puppet-agent for purged

   This cookbook is useful when moving purged consumers from one Kafka DC to another
   and syncing positions across consumer groups is required.
   Example usage(s):
       cookbook sre.cdn.transfer-purged-positions \
           --alias cp-text_codfw \
           --dc-to codfw \
           --reason 'move purged back to codfw' \
           --puppet-reason 'TXXXXXX'
   

Flags

--puppet-reason
Puppet reason. Must match the Puppet disable reason used with Cumin.
--dc-to
Name of the datacenter to switch to. One of %(choices)s.


roll-upgrade-ats.py

Purpose

Upgrade ATS on CDN nodes

  • Depool
  • Install specified trafficserver version
  • Restart trafficserver
  • Repool


How to Use

Roll upgrade Apache Traffic Server based on parameters.

   Example usage:
       cookbook sre.cdn.roll-upgrade-ats \
           --alias cp-text_codfw \
           --reason '9.2.0 upgrade' \
           --version '9.2.0-1wm1'
   

Flags

--version
Specific version to install.


roll-restart-reboot-ncredir.py

Purpose

NCRedir roll operations cookbook.

How to Use

Cookbook to perform a rolling reboot/restart of NCRedir

   Usage example:
       cookbook sre.cdn.roll-restart-reboot-ncredir \
           --reason "Rolling reboot to pick up new kernel" reboot
       cookbook sre.cdn.roll-restart-reboot-ncredir \
           --reason "Rolling restart to pick new OpenSSL" restart_daemons



run-puppet-restart-varnish.py

Purpose

Swap port 80 from Varnish to HAProxy, stopping Varnish first and then running Puppet.

This cookbook is useful when managing both Varnish and HAProxy configuration and an ordered restart/reload is needed. It was created to swap the daemon listening on port 80 from Varnish to HAProxy, but can be used as a template for other tasks too.

These are the steps performed by the cookbook:

- Depool the host (automatic)
- Disable Icinga notifications (automatic)
- Check that puppet-agent is disabled (pre_scripts)
- Stop Varnish service (_custom_action)
- Enable and run Puppet (puppet agent **must** be disabled first with cumin) (_custom_action)
- Start Varnish service (_custom_action)
- Test that ports (80 and 443) are open (post_action)
- Repool the host (automatic)
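
The port test near the end of the steps above can be pictured as a plain TCP connection attempt. This is a sketch under that assumption; the cookbook's real post_action is not shown here, and `port_is_open` and `closed_ports` are hypothetical helper names.

```python
import socket
from typing import Iterable, List


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and resolution failures.
        return False


def closed_ports(host: str, ports: Iterable[int] = (80, 443)) -> List[int]:
    """Return the subset of ports that did not accept a connection."""
    return [port for port in ports if not port_is_open(host, port)]
```

A caller would refuse to repool the host when `closed_ports(host)` returns a non-empty list.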


How to Use

Roll apply configuration using puppet-agent for both HAProxy and Varnish

   This cookbook is useful when managing both Varnish and HAProxy configuration
   and an ordered restart/reload is needed.
   It was created to swap the daemon listening on port 80 from Varnish to HAProxy
   but can be used as a template for other tasks too.
   Example usage(s):
       cookbook sre.cdn.run-puppet-restart-varnish \
           --alias cp-text_codfw \
           --reason 'Let HAProxy manage port 80' \
           --puppet-reason 'TXXXXXX' \
           --grace-sleep 1200
   

Flags

--puppet-reason
Puppet reason. Must match the Puppet disable reason used with Cumin.


roll-reboot.py

Purpose

Depool, unmonitor, and reboot instances one-by-one.

How to Use

Reboot CP nodes in the CDN

   Example usage:
       cookbook sre.cdn.roll-reboot \
           --alias 'cp-text_ulsfo' \
           --reason 'Kernel update' \
           --task-id T123456
       cookbook sre.cdn.roll-reboot \
           --alias 'cp-text_ulsfo' \
           --reason 'Kernel update' \
           --task-id T123456 \
           --grace-sleep 1200
   


roll-upgrade-haproxy.py

Purpose

Upgrade HAProxy on CDN nodes

How to Use

Roll upgrade HAProxy based on parameters.

   Example usage:
       cookbook sre.cdn.roll-upgrade-haproxy --alias cp-text_codfw --reason '2.6.9 upgrade' \
           --grace-sleep 30 restart_daemons
       cookbook sre.cdn.roll-upgrade-haproxy --query 'A:cp-eqiad and not P{cp1001*}' --reason '2.6.9 upgrade' \
           --batchsize 2 restart_daemons



upgrade-varnish.py

Purpose

Upgrade/downgrade Varnish on the given cache host between major releases.

- Set Icinga/Alertmanager downtime
- Depool
- Disable puppet (unless invoked with --hiera-merged)
- Wait for admin to merge hiera puppet change (unless invoked with --hiera-merged)
- Remove packages
- Re-enable puppet and run it to upgrade/downgrade
- Run a test request
- Repool
- Remove Icinga/Alertmanager downtime
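
To make the effect of --hiera-merged on the sequence above explicit, here is a small sketch that only builds the ordered list of step names. It illustrates the ordering and is not the cookbook's code; `upgrade_steps` is a hypothetical helper.

```python
from typing import List


def upgrade_steps(hiera_merged: bool, downgrade: bool = False) -> List[str]:
    """Return the ordered step names, skipping the two steps that
    --hiera-merged makes unnecessary."""
    verb = "downgrade" if downgrade else "upgrade"
    steps = ["set Icinga/Alertmanager downtime", "depool"]
    if not hiera_merged:
        # Only needed when hiera has not been updated beforehand.
        steps += ["disable puppet", "wait for admin to merge hiera puppet change"]
    steps += [
        "remove packages",
        f"re-enable puppet and run it to {verb}",
        "run a test request",
        "repool",
        "remove Icinga/Alertmanager downtime",
    ]
    return steps


for number, step in enumerate(upgrade_steps(hiera_merged=True), start=1):
    print(number, step)
```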

Usage example:

   cookbook sre.hosts.upgrade-varnish --hiera-merged "Upgrading varnish -- TXXXXXX" cp3030.esams.wmnet


How to Use

Flags

host
FQDN of the host to act upon.
--downgrade
Downgrade varnish instead of upgrading it
--hiera-merged
Pass this flag if hiera is already updated and puppet is disabled on the host with this message


DNS

roll-restart-reboot-wikimedia-dns.py

Purpose

Rolling restart of Wikimedia DNS services or full reboot.

Whether restarting the services or rebooting the entire host, typical decommissioning logic is preserved. systemd unit ordering should safeguard these units as well.


How to Use

Rolling restart of Wikimedia DNS services

   Example usage:
       cookbook sre.dns.roll-restart-reboot-wikimedia-dns \
           --alias wikidough-codfw \
           --reason "Scheduled maintenance" \
           restart_daemons
       cookbook sre.dns.roll-restart-reboot-wikimedia-dns \
           --query 'A:wikidough-eqiad and not P{doh1001*}' \
           --reason "Scheduled maintenance" \
           --task-id "T12345" \
           --ignore-restart-errors \
           --batchsize 2 \
           --grace-sleep 90 \
           restart_daemons
       cookbook sre.dns.roll-restart-reboot-wikimedia-dns \
           --query 'A:wikidough-eqiad and not P{doh1001*}' \
           --reason "Scheduled maintenance" \
           --task-id "T12345" \
           --batchsize 2 \
           reboot
   


roll-restart-haproxy.py

Purpose

Rolling service restart of haproxy, part of A:dnsbox and providing internal recursor services.

How to Use

Rolling restart of HAProxy on <TODO>.

   Example usage:
       cookbook sre.dns.roll-restart-haproxy \
           --query '<TODO> and not P{dns1004*}' \
           --reason "Scheduled maintenance" \
           reboot
   


roll-restart-reboot-durum.py

Purpose

Rolling service restart or reboot of durum, the Wikimedia DNS check service.

This is based on the roll-restart-reboot-wikimedia-dns cookbook, with relevant changes to durum.


How to Use

Rolling restart of durum.

   Example usage:
       cookbook sre.dns.roll-restart-reboot-durum \
           --query 'A:durum-eqiad and not P{durum1001*}' \
           --reason "Scheduled maintenance" \
           reboot
       cookbook sre.dns.roll-restart-reboot-durum \
           --query 'A:durum-eqiad and not P{durum1001*}' \
           --reason "Scheduled maintenance" \
           --task-id "T12345" \
           --ignore-restart-errors \
           --batchsize 2 \
           --grace-sleep 90 \
           restart_daemons
   


roll-restart-ntp.py

Purpose

Rolling restart of ntpsec.service on the DNS hosts identified by A:dnsbox.

How to Use

Rolling restart of ntpsec.service on the DNS hosts.

   This cookbook is for the rolling restarts of ntpsec.service on the DNS
   hosts.  Since Puppet no longer manages the restarts for us (intentionally),
   this cookbook helps us do that and sets sane automatic defaults for the
   batches and sleep intervals.
   Note that there is an alert in place for ntp.conf: if the file is modified
   and ntpsec.service is not restarted to pick up the changes, we are alerted
   about that. The fix for that is to restart ntpsec.service and now it should
   be done through this cookbook.
   Example usage:
       cookbook sre.dns.roll-restart-ntp \
           --alias 'A:dnsbox' \
           --task-id T12345 \
           --reason 'Restarting ntp service'
       cookbook sre.dns.roll-restart-ntp \
           --alias 'A:dnsbox' \
           --task-id T12345 \
           --reason 'Restarting ntp host' \
           --grace-sleep 900
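
The batches and sleep intervals mentioned above follow a common rolling pattern: act on a few hosts, wait, then move on. The sketch below illustrates that pattern only. It assumes the grace sleep happens between batches, which this page does not confirm, and the host names and the `roll` helper are hypothetical.

```python
import time
from typing import Callable, Iterator, List, Sequence


def batches(hosts: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of hosts, each at most batch_size long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(hosts), batch_size):
        yield list(hosts[start:start + batch_size])


def roll(
    hosts: Sequence[str],
    action: Callable[[str], None],
    batch_size: int = 1,
    grace_sleep: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Run action on every host, pausing grace_sleep seconds between batches."""
    for index, batch in enumerate(batches(hosts, batch_size)):
        if index > 0:
            sleep(grace_sleep)  # no pause before the first batch
        for host in batch:
            action(host)


# Hypothetical hosts; grace_sleep is 0 so the demo returns immediately.
roll(["dns1004", "dns1005", "dns1006"], print, batch_size=2, grace_sleep=0)
```

Passing `sleep` as a parameter keeps the pause injectable, so the loop can be exercised without waiting for real time to elapse.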
   


admin.py

Purpose

Cookbook for GeoDNS pool/depool of a site.

How to Use

Cookbook for GeoDNS pool/depool of a site.

   Pool or depool a site for GeoDNS. By default, it will act on a given site
   for all services (text-addrs, upload-addrs, etc.) unless a service is
   manually specified via --service.
   Usage examples:
       cookbook sre.dns.admin depool eqiad # [depools eqiad for everything]
       cookbook sre.dns.admin pool magru   # [pools magru for everything]
       cookbook sre.dns.admin --service upload-addrs -- depool codfw      # [depool codfw for upload-addrs]
       cookbook sre.dns.admin depool esams --service text-addrs text-next # [depool esams for text*]
   

Flags

action
The kind of action to perform (pool, depool, or show).
Valid arguments are: pool, depool, show
site
The site/DC on which to perform the action.
-s, --service
The service in the site/DC on which the action should be performed.
-r, --reason
An optional reason for the action.
-t, --task-id
An optional Phabricator task ID to log the action.
-f, --force
If passed, do not prompt for any actions (default: prompt)
--emergency-depool-policy
If passed, override the depool threshold and ignore all depool safety checks
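
As a reading aid, the flags above map naturally onto an argparse definition. This is a sketch reconstructed from the descriptions on this page, not the cookbook's source. In particular, `nargs="+"` for --service is an assumption inferred from the usage example that passes two services.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented sre.dns.admin flags."""
    parser = argparse.ArgumentParser(
        prog="sre.dns.admin",
        description="Cookbook for GeoDNS pool/depool of a site.",
    )
    parser.add_argument("action", choices=("pool", "depool", "show"),
                        help="The kind of action to perform.")
    parser.add_argument("site", help="The site/DC on which to perform the action.")
    # Assumption: one or more services, as in "--service text-addrs text-next".
    parser.add_argument("-s", "--service", nargs="+",
                        help="The service in the site/DC to act on.")
    parser.add_argument("-r", "--reason", help="An optional reason for the action.")
    parser.add_argument("-t", "--task-id", help="An optional Phabricator task ID.")
    parser.add_argument("-f", "--force", action="store_true",
                        help="Do not prompt for any actions.")
    parser.add_argument("--emergency-depool-policy", action="store_true",
                        help="Override the depool threshold and safety checks.")
    return parser


args = build_parser().parse_args(
    ["depool", "esams", "--service", "text-addrs", "text-next"]
)
print(args.action, args.site, args.service)
```

With this layout, the `--` in `--service upload-addrs -- depool codfw` marks where the service list ends and the positional arguments begin.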


roll-reboot.py

Purpose

Rolling reboot of DNS hosts identified by the cumin alias A:dnsbox.

How to Use

Rolling reboot of DNS hosts.

   This cookbook is for the rolling reboots of the DNS hosts, referred to by
   the cumin alias A:dnsbox. This covers both the DNS rec and auth hosts since
   a given DNS box serves both roles.
   Example usage:
       cookbook sre.dns.roll-reboot \
           --alias 'A:dnsbox' \
           --task-id T12345 \
           --reason 'Restarting DNS host'
       cookbook sre.dns.roll-reboot \
           --alias 'A:dnsbox' \
           --task-id T12345 \
           --reason 'Restarting DNS host' \
           --grace-sleep 900
   


roll-restart.py

Purpose

Rolling service restart of pdns-recursor and HAProxy on A:dnsbox

How to Use

Rolling restart of pdns-recursor and HAProxy on A:dnsbox.

   Example usage:
       cookbook sre.dns.roll-restart \
           --query 'A:dnsbox and not P{dns1004*}' \
           --reason "Scheduled maintenance" \
           restart_daemons
   

Flags

service
The service to restart
Valid arguments are: authdns-.*