Dumps/Known issues and wish list

From Wikitech

(This needs cleanup.)

Missing features

Image tarballs *are* currently being made, though they are hosted off-site.

There's an extension that can produce static HTML dumps separately, but a run would take a long time to complete on today's en wiki, so it's languishing. Parsoid -> HTML may be the solution for this.

We really need better import tools; work on that is in progress. Something that took advantage of multiple cores to write multiple SQL files would be good.
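The multi-core idea above could be sketched roughly as follows. This is only an illustration, not an existing import tool: the table name page_import, the file naming pattern, the simplified quoting, and the functions sql_quote, write_shard and write_sql_shards are all assumptions made for the example. Pages are split round-robin into shards, and each shard is written to its own SQL file, optionally by a separate process.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def sql_quote(value):
    """Quote a string as a MySQL-style literal (simplified: backslash, quote, newline only)."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'").replace("\n", "\\n")
    return "'" + escaped + "'"


def write_shard(job):
    """Write one shard of (page_id, title, text) rows to its own SQL file."""
    path, rows = job
    with open(path, "w", encoding="utf-8") as out:
        for page_id, title, text in rows:
            # page_import is a hypothetical table, not the real MediaWiki schema.
            out.write(
                "INSERT INTO page_import (page_id, title, text) VALUES (%d, %s, %s);\n"
                % (page_id, sql_quote(title), sql_quote(text))
            )
    return path, len(rows)


def write_sql_shards(pages, out_dir, shards=4, workers=1):
    """Split pages round-robin into `shards` files; write them with `workers` processes."""
    os.makedirs(out_dir, exist_ok=True)
    buckets = [[] for _ in range(shards)]
    for i, page in enumerate(pages):
        buckets[i % shards].append(page)
    jobs = [
        (os.path.join(out_dir, "pages-%03d.sql" % n), rows)
        for n, rows in enumerate(buckets)
    ]
    if workers <= 1:
        # Serial fallback: same output, no extra processes.
        return [write_shard(job) for job in jobs]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(write_shard, jobs))
```

In a real tool the expensive step (parsing the XML) would also have to move into the workers, otherwise the single reader stays the bottleneck. When calling this with workers greater than 1 from a script, the call should sit under an `if __name__ == "__main__":` guard.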

"Incremental" dumps? We have the adds-changes dumps, which are a starting point.

A wish list for the dumps is available here.

The list of outstanding bugs is here.

Notes

Failures of dumpPages.php should be detected, but only indirectly, through the failure of mwdumper to parse its XML output. (Is this still current?)
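The indirect check described above (a broken dump shows up as XML that fails to parse) can be illustrated with a small stand-in. This sketch does not use mwdumper; it uses Python's standard xml.etree.ElementTree parser instead, and the function name check_dump_xml is an assumption made for the example. It only tests well-formedness and counts page elements; it says nothing about whether the content is complete or correct.

```python
import xml.etree.ElementTree as ET


def check_dump_xml(path):
    """Return (True, page_count) if the file is well-formed XML, else (False, error message)."""
    pages = 0
    try:
        for _event, elem in ET.iterparse(path, events=("end",)):
            # Tags may carry a namespace prefix like "{uri}page"; compare the local name only.
            if elem.tag.rsplit("}", 1)[-1] == "page":
                pages += 1
            elem.clear()  # drop element contents to limit memory use on large files
    except ET.ParseError as err:
        # A truncated or otherwise malformed file ends up here.
        return False, str(err)
    return True, pages
```

A dump that was cut off partway through a page would come back as (False, ...) with the parser's error message, which a wrapper script could treat as a failed run.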

  • The page XML dumps should be consistent: all three outputs draw from one input, which is drawn from one long SQL transaction plus supplementary data loads that should be independent of changes. However, we don't lock the tables, so don't count on this either.
  • The other SQL dumps are not going to be 100% time-consistent. The only way to deal with this would be to have a set of slaves dedicated to dump runs, which seems a bit overkill.