User:Bhartshorne/ops meeting notes 2011-08-24
mysql failover
- update db.php to mark the cluster read only (near the bottom of the file)
- deploy the new db.php
- read Switch_master
- script in /home/w/src/mediawiki/tools/switch-master
- only works when the master is up
- assumes ~/.my.cnf contains the root mysql password
when a master crashes, we never wait for crash recovery. we always rotate in a new master.
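For concreteness, here's a rough sketch (Python, not the real tool) of the sequence the switch-master script automates: freeze the old master, let the replacement catch up, promote it, repoint the other replicas. Hostnames are placeholders; credentials come from ~/.my.cnf as noted above, and the script in /home/w/src/mediawiki/tools/switch-master is the authoritative version.

 #!/usr/bin/env python
 """Rough sketch of the master switch, NOT the real switch-master script.
 The db.php read-only step (and deploy) happens before any of this."""
 import subprocess, time

 OLD_MASTER = "db-old.example"                  # placeholder
 NEW_MASTER = "db-new.example"                  # placeholder: a caught-up replica
 REPLICAS = ["db-r1.example", "db-r2.example"]  # placeholders

 def mysql(host, sql):
     """Run a statement via the mysql CLI; ~/.my.cnf supplies the root password."""
     return subprocess.check_output(["mysql", "-h", host, "-BNe", sql]).decode().strip()

 def slave_status(host):
     raw = mysql(host, "SHOW SLAVE STATUS\\G")
     return dict(l.strip().split(": ", 1) for l in raw.splitlines() if ": " in l)

 # 1. freeze writes on the old master (this is why it only works when the master is up)
 mysql(OLD_MASTER, "SET GLOBAL read_only = 1")

 # 2. wait for the new master to replay everything it has received
 while slave_status(NEW_MASTER).get("Seconds_Behind_Master") != "0":
     time.sleep(1)

 # 3. promote it: stop replication and open it for writes
 mysql(NEW_MASTER, "STOP SLAVE; RESET SLAVE")
 mysql(NEW_MASTER, "SET GLOBAL read_only = 0")

 # 4. repoint the remaining replicas at the new master's current coordinates
 #    (replication user/password carry over from the existing replica config)
 log_file, log_pos = mysql(NEW_MASTER, "SHOW MASTER STATUS").split("\t")[:2]
 for replica in REPLICAS:
     mysql(replica, "STOP SLAVE")
     mysql(replica, "CHANGE MASTER TO MASTER_HOST='%s', MASTER_LOG_FILE='%s', "
                    "MASTER_LOG_POS=%s" % (NEW_MASTER, log_file, log_pos))
     mysql(replica, "START SLAVE")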
dns
pdns has 3 backends - order is pipe, geo, bind
- bind
- zone files live in svn; nothing unusual there.
- geo
- ip-map - sourced from an external dataset on the net. maps IP ranges to countries; the last octet of the 127.x.x.x address encodes the ISO country code.
- geo-maps - maps the ISO code (from ip-map) to a localized name.
- resolution is based on the source IP of the DNS server querying our authoritative server
- powerdns/scenarios/*
- there are three scenarios covering when a given datacenter is down or everything is normal; the active one is switched by pointing symlinks to the right place.
- pipe
- if the query's source address is in a select list of participants, the pipe backend returns an IPv6 (AAAA) response for upload.esams.wikimedia.org in addition to the IPv4 response.
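Putting the geo pieces together, a toy version of the lookup (Python; the ranges, country codes and addresses below are invented for illustration, and the real formats are whatever ip-map / geo-maps actually use):

 #!/usr/bin/env python
 """Toy illustration of the geo-DNS flow: resolver source IP -> country
 (ip-map) -> cluster (geo-maps + active scenario) -> answer. All data
 below is made up; documentation IPs only."""
 import ipaddress

 # ip-map equivalent: IP range -> country. (The real file encodes the
 # country in the last octet of a 127.x.x.x address.)
 IP_MAP = {
     ipaddress.ip_network("192.0.2.0/24"): "NL",
     ipaddress.ip_network("198.51.100.0/24"): "US",
 }

 # geo-maps plus the scenario currently selected by the symlink under
 # powerdns/scenarios/: country -> cluster -> address to hand out.
 SCENARIO_NORMAL = {"NL": "esams", "US": "pmtpa", "default": "pmtpa"}
 SCENARIO_ESAMS_DOWN = {"NL": "pmtpa", "US": "pmtpa", "default": "pmtpa"}
 CLUSTER_ADDR = {"esams": "203.0.113.1", "pmtpa": "203.0.113.2"}  # fake addresses

 def resolve(resolver_source_ip, scenario=SCENARIO_NORMAL):
     """Answer based on the source IP of the querying resolver (not the
     end user), as noted above."""
     addr = ipaddress.ip_address(resolver_source_ip)
     country = next((cc for net, cc in IP_MAP.items() if addr in net), None)
     return CLUSTER_ADDR[scenario.get(country, scenario["default"])]

 print(resolve("192.0.2.10"))                         # normal: esams answer
 print(resolve("192.0.2.10", SCENARIO_ESAMS_DOWN))    # esams down: pmtpa answer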
caching
pmtpa
- no ICP - squids aren't peers. cache affinity is by URL hashing.
- frontend squid and backend squid (coresident on the same host)
- frontend
- has 100MB in-memory cache. this is duplicated over all the squids (i.e. same stuff in all of them)
- uses CARP to hash the URL to pass to a specific backend squid
- serves about 50% of our page load
- backend
- disk-backed cache
esams
- same front end
- backend misses go to a specific pmtpa backend using the same hashing algorithm rather than going through pmtpa's front ends.
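The URL-to-backend mapping works roughly like the sketch below: a highest-hash-wins pick, which is the idea behind CARP, though real squid CARP uses its own 32-bit hash and per-member weights. Backend names are placeholders.

 #!/usr/bin/env python
 """Simplified CARP-style URL hashing: every cache, given the same URL and
 the same backend list, picks the same backend, so each object lives in
 exactly one backend's disk cache."""
 import hashlib

 PMTPA_BACKENDS = ["sq31.pmtpa", "sq32.pmtpa", "sq33.pmtpa"]   # placeholders

 def pick_backend(url, backends=PMTPA_BACKENDS):
     """Highest combined hash wins; deterministic everywhere, which is how
     the esams caches land on a specific pmtpa backend on a miss."""
     return max(backends,
                key=lambda b: hashlib.md5((b + url).encode()).hexdigest())

 print(pick_backend("/wiki/Main_Page"))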
different services
- text: actual wiki text
- upload: static images
- bits: javascript, css - things that don't change
- uses varnish, not squid
- entire dataset fits in memory
- also hosts geoiplookup.wikimedia.org
API
- api requests come in on the text hostname, so they share the text frontends
- frontends hash all API urls to a different set of squid backends
- the squid backends in text are split: some serve text, some serve api (see the sketch below)
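So the routing decision looks roughly like this; the pool membership and the api.php path check are assumptions for illustration, and the hashing within a pool is the same CARP-style pick sketched in the caching section.

 """Sketch of the text-vs-api backend split; pool names and the api.php
 path test are assumed for illustration."""
 TEXT_BACKENDS = ["sq31.pmtpa", "sq32.pmtpa"]   # placeholders
 API_BACKENDS = ["sq41.pmtpa", "sq42.pmtpa"]    # placeholders

 def backend_pool(url_path):
     # api traffic shares the text hostname, so the split happens on the path
     return API_BACKENDS if url_path.startswith("/w/api.php") else TEXT_BACKENDS

 # then hash within the chosen pool, e.g.:
 #   pick_backend(url_path, backend_pool(url_path))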
mobile
- separate cluster, all varnish3
cache expiration
- we don't rely on expiration times. we want to expire the page when it changes, not after a timeout
- htcp is like icp (squid's peering protocol); we use it for udp cache purging.
- works with multicast
- varnish - a separate daemon also listens for the same packets and obeys the same purge messages
- thumbnail purging is different
- the nginx servers run a daemon that listens for htcp
Purging
- htcp purge messages, as described above (see the sketch after this list)
- you can also add a URL parameter to a mediawiki page URL to make it purge the page (?action=purge)
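For reference, an HTCP CLR purge is just a small UDP packet sent to a multicast group that the squids, the varnish purge daemon, and the nginx thumbnail daemon all listen on. The sketch below follows the RFC 2756 CLR layout the way MediaWiki's purge code builds it; the multicast group, TTL and URL are placeholders.

 #!/usr/bin/env python
 """Sketch of sending an HTCP CLR (purge) over multicast. Group address,
 TTL and URL are placeholders; packet layout per RFC 2756 as used by
 MediaWiki's purger."""
 import random
 import socket
 import struct

 HTCP_GROUP = ("239.128.0.112", 4827)   # placeholder group, standard HTCP port

 def htcp_clr(url):
     """HEADER + DATA (opcode CLR, specifier) + empty AUTH."""
     u = url.encode()
     specifier = (struct.pack("!H", 4) + b"HEAD" +       # METHOD
                  struct.pack("!H", len(u)) + u +        # URI to purge
                  struct.pack("!H", 8) + b"HTTP/1.0" +   # VERSION
                  struct.pack("!H", 0))                  # no REQ-HDRS
     data_len = 8 + 2 + len(specifier)   # DATA header + reason + specifier
     total_len = 4 + data_len + 2        # HTCP header + DATA + empty AUTH
     trans_id = random.getrandbits(32)
     return (struct.pack("!H2xHBxI2x", total_len, data_len, 4, trans_id) +
             specifier + struct.pack("!H", 2))

 def purge(url):
     s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
     s.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 8)  # placeholder TTL
     s.sendto(htcp_clr(url), HTCP_GROUP)

 purge("http://en.wikipedia.org/wiki/Main_Page")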
Mobile
- m.wiki is a collection of ruby (not on rails) servers (fronted by LVS).
- ruby forwards to the frontend squids
- mediawiki has a 'MobileFrontend' extension
- the normal squids have an ACL that does some device detection and forwards you to m.wiki if you match.
New system:
- mobile starts at varnish boxes
- similar frontend / backend setup to the squids
- daemon processing htcp only purges backend, frontend has a 300s cache timeout
- the varnish boxes detect what mobile device you're running and set an X-mobile header (in VCL); see the sketch after this list
- goes straight to the app servers (does not go through the squids)
- the URL coming into mobile is en.m.wiki.../normal/path. that's translated to the normal URL in varnish (not sure which layer) so the app servers get the same URL for both mobile and non-mobile requests
- can differentiate by the x-mobile header
- the app servers set no-cache if the request is coming from a squid (legacy behavior)
- css/javascript/etc all comes from bits, same as the regular site.
- the MobileFrontend extension uses ResourceLoader to pull in the appropriate js/css files for mobile as part of the page
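A Python stand-in for the varnish-side logic described above (the real thing is VCL): detect the device from the User-Agent, set the X-mobile header, and fold the m. hostname back to the canonical one so the app servers see the same URL. The UA list, header spelling, and host pattern are simplifications, not the production rules.

 """Python stand-in for the mobile varnish VCL described above."""
 import re

 MOBILE_UA = re.compile(r"iPhone|Android|BlackBerry|Opera Mini", re.I)  # assumed list

 def rewrite_request(headers):
     """Set X-Mobile from the User-Agent and rewrite en.m.* back to the
     canonical host; only the header tells the app servers it was mobile."""
     ua = headers.get("User-Agent", "")
     headers["X-Mobile"] = "true" if MOBILE_UA.search(ua) else "false"
     # en.m.wikipedia.org -> en.wikipedia.org (done in one of the varnish
     # layers; the notes aren't sure which)
     headers["Host"] = re.sub(r"^([a-z0-9-]+)\.m\.", r"\1.", headers.get("Host", ""))
     return headers

 print(rewrite_request({"Host": "en.m.wikipedia.org",
                        "User-Agent": "Mozilla/5.0 (iPhone)"}))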
swift
questions for russ:
- does the SwiftMedia extension support chunked uploads?
- can we split out thumbs and originals into separate containers? it's been useful in the past
- we already split out per project...
- stashed media?
- archived / deleted files?
- when migrating files into swift (for the initial deploy) can we keep the current timestamp?
- this is important, but we can live without it if it's a real PITA