User:Bhartshorne/ops meeting notes 2011-08-24

switching masters
  • update db.php to mark the cluster read only (near the bottom of the file)
  • deploy the new db.php
  • read Switch_master
  • script in /home/w/src/mediawiki/tools/switch-master (rough flow sketched below)
    • only works when the master is up
    • assumes ~/.my.cnf contains the root mysql password
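
the flow, roughly: mark the cluster read-only in db.php, let the candidate catch up, promote it, repoint the slaves. a minimal sketch in python (NOT the real switch-master script; hostnames are placeholders, and the mysql client picks up credentials from ~/.my.cnf as the real script expects):

    """Rough sketch of a master rotation; not the real switch-master script."""
    import subprocess
    import time

    OLD_MASTER = "db1001.example"   # hypothetical hostnames
    NEW_MASTER = "db1002.example"

    def mysql(host, sql):
        """Run a statement via the mysql client; credentials come from ~/.my.cnf."""
        out = subprocess.run(["mysql", "-h", host, "-e", sql],
                             capture_output=True, text=True, check=True)
        return out.stdout

    # 1. stop writes on the old master (the cluster is already read-only in db.php)
    mysql(OLD_MASTER, "SET GLOBAL read_only = 1;")

    # 2. wait for the candidate to replay everything it has from the old master
    while "Seconds_Behind_Master: 0" not in mysql(NEW_MASTER, "SHOW SLAVE STATUS\\G"):
        time.sleep(1)

    # 3. promote the candidate: stop replication and allow writes
    mysql(NEW_MASTER, "STOP SLAVE; RESET SLAVE; SET GLOBAL read_only = 0;")

    # remaining slaves would then be re-pointed with CHANGE MASTER TO, and db.php
    # updated to name the new master and lift the read-only flag.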


when a master crashes, we never wait for crash recovery. we always rotate in a new master.

dns

see PowerDNS and DNS

pdns has 3 backends - order is pipe, geo, bind

  • bind
    • in svn, yadda yadda.
  • geo
    • ip-map - pulled from an external source on the net. maps IP ranges to countries; each range resolves to a 127.x.x.x address whose last octet is the ISO country code.
    • geo-maps - maps that ISO code (from ip-map) to the localized name to hand back.
    • resolution is based on the source IP of the DNS resolver querying our authoritative server (see the sketch after this list)
    • powerdns/scenarios/*
      • there are three scenario files: one for normal operation and one for each datacenter being down. the active one is selected by symlinking to the right file.
  • pipe
    • if the query's source address is in a select list of participants, the pipe backend returns an IPv6 (AAAA) response for upload.esams.wikimedia.org in addition to the IPv4 response.
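
the geo lookup chain (resolver source IP -> ip-map entry -> country code -> geo-maps -> localized name) can be modeled like this. the file formats, codes, and hostnames below are made up for illustration; only the flow comes from the notes:

    """Toy model of the pdns geo backend lookup path; formats and values are invented."""
    import ipaddress

    # ip-map: IP range -> 127.x.x.x marker whose last octet encodes the country
    IP_MAP = [
        (ipaddress.ip_network("192.0.2.0/24"), "127.0.0.31"),     # 31: made-up code
        (ipaddress.ip_network("198.51.100.0/24"), "127.0.0.76"),  # 76: made-up code
    ]

    # geo-maps: country code -> the localized name to hand back
    GEO_MAPS = {
        31: "text.esams.wikimedia.org",
        76: "text.pmtpa.wikimedia.org",
    }
    DEFAULT = "text.pmtpa.wikimedia.org"

    def resolve(resolver_ip):
        """Pick a target based on the resolver's source address - all we ever see."""
        addr = ipaddress.ip_address(resolver_ip)
        for network, marker in IP_MAP:
            if addr in network:
                country_code = int(marker.rsplit(".", 1)[1])  # last octet
                return GEO_MAPS.get(country_code, DEFAULT)
        return DEFAULT

    print(resolve("192.0.2.53"))   # -> text.esams.wikimedia.org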

caching

pmtpa

  • no ICP - squids aren't peers. cache affinity is by URL hashing.
  • frontend squid and backend squid (coresident on the same host)
  • frontend
    • has a 100MB in-memory cache. this is duplicated across all the frontend squids (i.e. the same hot objects end up in all of them)
    • uses CARP to hash the URL and pass the request to a specific backend squid (see the sketch after this list)
    • serves about 50% of our page load
  • backend
    • disk-backed cache
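
the CARP idea: every frontend computes the same deterministic score for each (URL, backend) pair and picks the winner, so a given URL always lands on the same backend cache. a simplified sketch (real squid CARP uses a specific 32-bit hash plus per-member load factors; the hostnames are made up):

    """Simplified CARP-style URL -> backend selection."""
    import hashlib

    BACKENDS = ["sq31.pmtpa", "sq32.pmtpa", "sq33.pmtpa"]   # hypothetical backends

    def carp_backend(url, backends=BACKENDS):
        """Same score everywhere, so every frontend agrees on the backend for a URL."""
        def score(backend):
            return int(hashlib.md5((backend + url).encode()).hexdigest(), 16)
        return max(backends, key=score)

    print(carp_backend("http://en.wikipedia.org/wiki/Main_Page"))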

esams

  • same frontend setup as pmtpa
  • backend misses go to a specific pmtpa backend (chosen with the same hashing algorithm) rather than going through pmtpa's frontends.

different services

  • text: actual wiki text
  • upload: static images
  • bits: javascript, css - things that don't change
    • uses varnish, not squid
    • entire dataset fits in memory
    • also hosts geoiplookup.wikimedia.org

API

  • api requests use the text hostname, so they share the text frontends
  • the frontends hash all API URLs to a different set of squid backends
    • the backend squids in the text cluster are split: some serve text, some serve API (see the sketch after this list)
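
a toy model of that split; the pool names and the path test are illustrative, and within the chosen pool the URL would then be CARP-hashed to one backend as in the earlier sketch:

    """Sketch of routing API vs. regular page requests to different backend pools."""
    TEXT_POOL = ["sq31.pmtpa", "sq32.pmtpa"]   # hypothetical backend names
    API_POOL = ["sq59.pmtpa", "sq60.pmtpa"]    # hypothetical backend names

    def pick_pool(path):
        # api.php requests go to the API backends, everything else to text
        return API_POOL if path.startswith("/w/api.php") else TEXT_POOL

    print(pick_pool("/w/api.php?action=query&titles=Main_Page"))  # -> API pool
    print(pick_pool("/wiki/Main_Page"))                           # -> text pool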

mobile

  • separate cluster, all running varnish 3

cache expiration

  • we don't rely on expiration times. we want to expire the page when it changes, not after a timeout
  • htcp is similar to icp (squid's peering protocol) - udp-based cache purging.
    • works with multicast
  • varnish - a daemon also listens for the same packets and obeys the same purge messages
  • thumbnail purging is different
    • the nginx servers run a daemon that listens for htcp (see the listener sketch after this list)
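
the transport side is just multicast UDP. a minimal listener sketch, not an HTCP implementation - a real purge daemon (and varnish's listener) decodes the HTCP CLR message to get the URL to purge; the group and port below are placeholders:

    """Minimal multicast UDP listener; real daemons parse HTCP CLR packets here."""
    import socket
    import struct

    GROUP = "239.128.0.112"   # placeholder multicast group
    PORT = 4827               # placeholder port

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", PORT))

    # join the multicast group on all interfaces
    mreq = struct.pack("4sl", socket.inet_aton(GROUP), socket.INADDR_ANY)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

    while True:
        data, sender = sock.recvfrom(65535)
        # a real listener would decode the HTCP header and CLR payload, then
        # purge the contained URL from its local cache
        print("%s: %d byte purge packet" % (sender[0], len(data)))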

Purging

  • htcp, etc.
  • you can also add a URL parameter to a page request to convince mediawiki to purge the page (?action=purge - see the example below)
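
a one-off purge of a single page by URL looks something like this (standard library only; whether an anonymous purge needs a POST or shows a confirmation form depends on the mediawiki version and config):

    """Purge a single page via the purge action."""
    import urllib.request

    url = "https://en.wikipedia.org/wiki/Main_Page?action=purge"
    req = urllib.request.Request(url, method="POST")   # POST rather than GET for anonymous purges
    print(urllib.request.urlopen(req).status)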

Mobile

  • m.wiki is a collection of ruby (not on rails) servers (fronted by LVS).
  • ruby forwards to the frontend squids
  • mediawiki has a 'MobileFrontend' extension
  • the normal squids have an ACL that does some device detection and redirects you to m.wiki if you match (modeled in the sketch below).
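
the redirect logic, modeled in python (the real check is squid configuration, not application code; the UA pattern and hostnames are illustrative):

    """Toy version of the squid ACL: redirect mobile user agents to m.wiki."""
    import re

    MOBILE_UA = re.compile(r"iPhone|Android|BlackBerry|Opera Mini", re.I)

    def maybe_redirect(host, path, user_agent):
        """Return a redirect target for mobile clients, or None to serve normally."""
        if MOBILE_UA.search(user_agent) and ".m." not in host:
            return "http://" + host.replace(".", ".m.", 1) + path
        return None

    print(maybe_redirect("en.wikipedia.org", "/wiki/Main_Page",
                         "Mozilla/5.0 (iPhone; ...)"))  # -> http://en.m.wikipedia.org/wiki/Main_Page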

New system:

  • mobile starts at varnish boxes
    • similar frontend / backend setup to the squids
    • the daemon processing htcp only purges the backend; the frontend has a 300s cache timeout
  • the varnish boxes detect what mobile device you're running and set an X-Mobile header (in VCL; modeled in the sketch after this list)
  • goes straight to the app servers (does not go through the squids)
  • the URL coming into mobile is en.m.wiki.../normal/path. that's translated to the normal URL in varnish (not sure in which layer) so the app servers get the same URL for both mobile and non-mobile requests
    • can differentiate by the x-mobile header
  • the app servers set no-cache if the request is coming from squid (legacy behavior)
  • css/javascript/etc all comes from bits, same as the regular site.
    • the MobileFrontend extension uses ResourceLoader to pull in the appropriate js/css files for mobile as part of the page
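
the real logic lives in VCL on the mobile varnish boxes; this models the two transformations described above - flag mobile devices with an X-Mobile header and strip the .m. from the hostname so the app servers see one URL for both mobile and desktop (the UA pattern is illustrative):

    """Model of the mobile varnish request normalization."""
    import re

    MOBILE_UA = re.compile(r"iPhone|Android|BlackBerry|Opera Mini", re.I)

    def normalize(host, path, user_agent):
        headers = {}
        # device detection -> X-Mobile header (done in VCL in production)
        headers["X-Mobile"] = "true" if MOBILE_UA.search(user_agent) else "false"
        # en.m.wikipedia.org/wiki/Foo -> en.wikipedia.org/wiki/Foo, so the app
        # servers get an identical URL for mobile and desktop requests
        return host.replace(".m.", ".", 1), path, headers

    print(normalize("en.m.wikipedia.org", "/wiki/Main_Page",
                    "Mozilla/5.0 (iPhone; ...)"))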

swift

questions for russ:

  • does the SwiftMedia extension support chunked uploads?
  • can we split out thumbs and originals into separate containers? it's been useful in the past
    • we already split out per project...
  • stashed media?
  • archived / deleted files?
  • when migrating files into swift (for the initial deploy) can we keep the current timestamp?
    • this is important, but we can live without it if it's a real PITA