mailing list notes

dump process... again... *sigh*

  • [live slave db]:
    1) internally consistent sql db dumps for our own backup purposes (gzip compression)
    2) gzipped xml stub archives
  • [secondary db]:
    3) gzipped xml full archives
  • [xml-based]: -> can run simultaneously
    4) gzip -> bzip2
    5) gzip -> 7zip
    6) search index update
    7) yahoo xml [fixme!]
  • [cleanup]:
    archive or remove older dumps
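
The phases above could be sketched roughly as below. This is a hypothetical outline only; the function and wiki names are placeholders, not the real dump scripts. The point is the ordering: the db phases run one at a time, while the xml-based steps only read the finished gzipped xml and so can run simultaneously.

```python
# Hypothetical sketch of the per-wiki run order; step() stands in for a
# real dump job (e.g. mysqldump piped through gzip).
from concurrent.futures import ThreadPoolExecutor

def run_dump(wiki):
    done = []

    def step(name):
        # Placeholder for actually running a dump step.
        done.append(name)
        return name

    # [live slave db]: one step at a time.
    step(f"{wiki}: sql backup dumps (gzip)")
    step(f"{wiki}: xml stubs (gzip)")
    # [secondary db]:
    step(f"{wiki}: full xml archives (gzip)")
    # [xml-based]: these only read the finished xml, so run simultaneously.
    xml_steps = [
        f"{wiki}: recompress gzip -> bzip2",
        f"{wiki}: recompress gzip -> 7zip",
        f"{wiki}: search index update",
        f"{wiki}: yahoo xml",
    ]
    with ThreadPoolExecutor() as pool:
        list(pool.map(step, xml_steps))
    # [cleanup]:
    step(f"{wiki}: archive or remove older dumps")
    return done
```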

  1. one master process
  2. multiple slave processes
  3. be sensitive to status changes -- wikis added, removed, or made private during runs
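
One way the master/slave split could look, as a sketch with hypothetical names (read_wiki_list and dump_one are placeholders for the real wiki list and per-wiki dump run): the master re-reads the wiki list before dispatching each batch, so wikis added, removed, or made private mid-run are picked up or skipped.

```python
# Hypothetical master loop; not the real dump scripts.
from concurrent.futures import ThreadPoolExecutor

def read_wiki_list():
    # Placeholder: the real master would re-read the configured list
    # of public wikis (e.g. a dblist file) on every call.
    return {"enwiki", "dewiki", "frwiki"}

def dump_one(wiki):
    # Placeholder for a full per-wiki dump run on one slave process.
    return f"dumped {wiki}"

def master(num_slaves=2):
    finished = set()
    results = []
    while True:
        # Re-read the list each round so status changes during the
        # run are respected: new wikis join the queue, removed or
        # private wikis drop out.
        pending = sorted(read_wiki_list() - finished)
        if not pending:
            break
        batch = pending[:num_slaves]
        with ThreadPoolExecutor(max_workers=num_slaves) as pool:
            results.extend(pool.map(dump_one, batch))
        finished.update(batch)
    return results
```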


  • Do we have enough space on benet for two complete, clean versions?
    • If not, we need to pull a bigger box to run on.
  • What about historical archives? How much should we preserve?
    • Recommend at *least* keeping the current-articles and full-history .7z versions of each dump around. Online if possible, offline if not.
    • Where do we have space for historical archives?