External storage/Text storage data
This page contains historical information. It may be outdated or unreliable. (2011)
See also: External storage
Raw Data
Text row types as of 2010-02-18. All databases.
    Count  Type
------------------------------------------------
        9  0,external/simple pointer
      435  0/[none]
  1482941  [none]/[none]
    74069  external,object/simple pointer
 56103275  external,utf-8/CGZ pointer
329027392  external,utf-8/DHB pointer
   472766  external,utf8/CGZ pointer
  7409721  external,utf8/DHB pointer
  2890300  external/CGZ pointer
    12218  external/simple pointer
  4113780  gzip,external/simple pointer
   968957  gzip/[none]
   178234  object,external/simple pointer
      387  object,utf-8/ConcatenatedGzipHistoryBlob
     1413  object,utf-8/HistoryBlobStub
   216694  object/concatenatedgziphistoryblob
  5842435  object/historyblobcurstub
  1121994  object/historyblobstub
        1  utf-8,external/simple pointer
464549188  utf-8,gzip,external/simple pointer
 17076928  utf-8,gzip/[none]
     1269  utf-8/[none]
Text row types as of 2011-08-31, all wikis (see RT:1300 for details on how these numbers were generated).
    Count  Type
------------------------------------------
        9  0,external/simple pointer
      437  0/[none]
  1473028  [none]/[none]
 64751783  external,utf-8/CGZ pointer
363080206  external,utf-8/DHB pointer
   484000  external,utf8/CGZ pointer
  7579959  external,utf8/DHB pointer
  1180404  external/CGZ pointer
    12218  external/simple pointer
  9905103  gzip,external/simple pointer
   968337  gzip/[none]
   181328  object,external/simple pointer
      387  object,utf-8/ConcatenatedGzipHistoryBlob
     1413  object,utf-8/HistoryBlobStub
   219570  object/concatenatedgziphistoryblob
  5866400  object/historyblobcurstub
  1046202  object/historyblobstub
        1  utf-8,external/simple pointer
797612169  utf-8,gzip,external/simple pointer
 17173393  utf-8,gzip/[none]
     1269  utf-8/[none]
Change from 2010-02-18 to 2011-08-31:
2010-02-18  2011-08-31        Diff  Type
---------------------------------------------------------------------
         9           9           0  0,external/simple pointer
       435         437           2  0/[none]
   1482941     1473028       -9913  [none]/[none]
     74069         n/a         n/a  external,object/simple pointer
  56103275    64751783     8648508  external,utf-8/CGZ pointer
 329027392   363080206    34052814  external,utf-8/DHB pointer
    472766      484000       11234  external,utf8/CGZ pointer
   7409721     7579959      170238  external,utf8/DHB pointer
   2890300     1180404    -1709896  external/CGZ pointer
     12218       12218           0  external/simple pointer
   4113780     9905103     5791323  gzip,external/simple pointer
    968957      968337        -620  gzip/[none]
    178234      181328        3094  object,external/simple pointer
       n/a         387         n/a  object,utf-8/ConcatenatedGzipHistoryBlob
       n/a        1413         n/a  object,utf-8/HistoryBlobStub
      1800         n/a         n/a  object,utf-8/[none]
    216694      219570        2876  object/concatenatedgziphistoryblob
   5842435     5866400       23965  object/historyblobcurstub
   1121994     1046202      -75792  object/historyblobstub
         1           1           0  utf-8,external/simple pointer
 464549188   797612169   333062981  utf-8,gzip,external/simple pointer
  17076928    17173393       96465  utf-8,gzip/[none]
      1269        1269           0  utf-8/[none]
Analysis
On the changes from 2010 to 2011
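The rise in "object/historyblobcurstub" (5842435 to 5866400, +23965) doesn't really make sense, since these rows are created by the MW 1.5 upgrade script. The rise in "gzip,external/simple pointer" (4113780 to 9905103, +5791323, more than double) is concerning.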
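The size of these two changes can be double-checked with a few lines of Python. The counts are copied from the tables above; nothing else is assumed, and the names are just for illustration:

```python
# Counts copied from the 2010-02-18 and 2011-08-31 tables on this page.
COUNTS = {
    "object/historyblobcurstub": (5842435, 5866400),
    "gzip,external/simple pointer": (4113780, 9905103),
}


def change(old: int, new: int) -> tuple[int, float]:
    """Return the absolute change and the relative change in percent."""
    return new - old, (new - old) / old * 100


for row_type, (old, new) in COUNTS.items():
    delta, percent = change(old, new)
    print(f"{row_type}: {delta:+d} rows ({percent:+.1f}%)")
```

This works out to roughly +0.4% for object/historyblobcurstub and roughly +141% for gzip,external/simple pointer.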
Description of fields and values
- [none]/[none]
- Uncompressed text, legacy encoding
- 0/[none]
- Uncompressed text with wrong flags due to a short-lived bug; never cleaned up
- 0,external/simple pointer
- As above plus MTE
- gzip/[none]
- Compressed text with legacy encoding. Possibly created with Brion's original CO.
- gzip,external/simple pointer
- As above plus MTE
- utf-8/[none]
- Uncompressed text, MW 1.5+
- utf-8,external/simple pointer
- As above plus MTE
- utf-8,gzip/[none]
- Compressed text, MW 1.5+, probably generated directly by MW
- utf-8,gzip,external/simple pointer
- Either as above plus MTE, or directly generated by MW (predominant non-recompressed type)
- object,utf-8/ConcatenatedGzipHistoryBlob
- Presumably created by a brief enwiki-only run of CO, in MW 1.5+.
- object,utf-8/HistoryBlobStub
- Stubs for the above CO run
- object/concatenatedgziphistoryblob
- Object created by CO, MW<1.5
- object,external/simple pointer
- As above plus MTE
- object/historyblobcurstub
- A reference to the cur table, created by the MW 1.5 upgrade script.
- object/historyblobstub
- Pointer to a CGZ object, created by CO, MW<1.5
- external,object/simple pointer
- Possibly JOMTE
- external/simple pointer
- JOMTE?
- external,utf-8/CGZ pointer
- Late CO, RS or RCT
- external,utf-8/DHB pointer
- RCT
- external,utf8/CGZ pointer
- RCT before r45205, with the buggy encoding name "utf8"
- external,utf8/DHB pointer
- RCT before r45205 (same buggy encoding name)
- external/CGZ pointer
- RS. Perhaps CO in MW<1.5 also created these.
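As a rough illustration of what the flags mean when a row is read back, here is a minimal Python sketch for rows whose text is stored in the row itself (no external or object flag). It assumes that the part of each type before the slash is the row's comma-separated flag list and the part after it is the object or pointer class, that "gzip" means a raw DEFLATE stream (what PHP's gzdeflate() produces), and that rows without a "utf-8" flag are in a legacy 8-bit encoding. windows-1252 is only a placeholder default, not necessarily what any given wiki used, and both function names are hypothetical.

```python
import zlib


def split_type(row_type: str) -> tuple[list[str], str]:
    """Split e.g. "utf-8,gzip,external/simple pointer" into
    (["utf-8", "gzip", "external"], "simple pointer")."""
    flags, _, kind = row_type.partition("/")
    return flags.split(","), kind


def decode_local_text(blob: bytes, flags: list[str],
                      legacy_encoding: str = "windows-1252") -> str:
    """Decode a text row whose content is stored in the row itself.

    Assumptions of this sketch: "gzip" means raw DEFLATE data, and rows
    without a "utf-8" flag use a legacy 8-bit encoding (placeholder default).
    """
    if "external" in flags or "object" in flags:
        # Pointers and serialized PHP objects need extra lookups; not handled here.
        raise NotImplementedError("external and object rows are out of scope")
    if "gzip" in flags:
        # Negative wbits: raw DEFLATE stream with no zlib or gzip header.
        blob = zlib.decompress(blob, -zlib.MAX_WBITS)
    if "utf-8" in flags:
        return blob.decode("utf-8")
    return blob.decode(legacy_encoding)
```

For example, a "utf-8,gzip/[none]" row would go through split_type() and then decode_local_text(blob, ["utf-8", "gzip"]).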
Legend
- CO
- compressOld.php.
- MTE
- moveToExternal.php.
- MW
- MediaWiki
- JOMTE
- JeLuF's original move to external. I think there was an SQL script or something similar that he used to move some text when external storage was first set up; I can't find it now.
- RCT
- recompressTracked.php. The latest and greatest recompression script.
- RS
- resolveStubs.php
How these stats were generated
storageTypeStatsSum.py and storageTypeStatsDiff.py are in SVN, under extensions/WikimediaMaintenance/storage/.
To collect the stats, gather info for every wiki db (this step takes about 24 hours):
ben@hume:~$ cd /home/w/bin/
ben@hume:bin$ ./foreachwiki maintenance/storage/storageTypeStats.php > /tmp/storageTypeStats.log
ben@hume:bin$ scp /tmp/storageTypeStats.log fenari:
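The per-row classification is done by storageTypeStats.php, whose code isn't reproduced on this page. The following hypothetical Python sketch only shows the kind of labelling that produces types like "utf-8,gzip,external/simple pointer". The URL shapes are assumptions: DB://cluster/id for a simple pointer, and DB://cluster/id/item for a pointer into a blob, where the item key is taken to be a 32-character MD5 hex string for CGZ and an integer index for DHB.

```python
import re
from collections import Counter

# Assumed external-storage URL shapes (illustrative, not copied from storageTypeStats.php):
#   DB://<cluster>/<blob id>                  simple pointer
#   DB://<cluster>/<blob id>/<32 hex digits>  CGZ pointer (item keyed by MD5 hash)
#   DB://<cluster>/<blob id>/<integer>        DHB pointer (item keyed by index)
SIMPLE = re.compile(r"^DB://[^/]+/\d+$")
CGZ = re.compile(r"^DB://[^/]+/\d+/[0-9a-f]{32}$")
DHB = re.compile(r"^DB://[^/]+/\d+/\d+$")
# A serialized PHP object starts with O:<name length>:"<class name>"
PHP_OBJECT = re.compile(r'^O:\d+:"([^"]+)"')


def classify(flags: str, text: str) -> str:
    """Label one text row as "<flags>/<class>", like the tables on this page."""
    flag_list = flags.split(",")
    if "external" in flag_list:
        if SIMPLE.match(text):
            kind = "simple pointer"
        elif CGZ.match(text):
            kind = "CGZ pointer"
        elif DHB.match(text):
            kind = "DHB pointer"
        else:
            kind = "unknown pointer"
    elif "object" in flag_list:
        match = PHP_OBJECT.match(text)
        kind = match.group(1) if match else "unknown object"
    else:
        kind = "[none]"
    # Rows with no flags at all are shown as "[none]" on this page.
    return f"{flags or '[none]'}/{kind}"


def tally(rows) -> Counter:
    """Count (flags, text) pairs by their label."""
    return Counter(classify(flags, text) for flags, text in rows)
```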
To sum the stats across all wikis, pass this output through storageTypeStatsSum.py:
ben@fenari:~$ cd svn/extensions/WikimediaMaintenance/storage/
ben@fenari:storage$ ./storageTypeStatsSum.py ~/storageTypeStats.log > current-YYYY-MM-DD
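storageTypeStatsSum.py itself isn't shown on this page. A minimal Python sketch of the summing step might look like the following, assuming each useful log line is a count followed by a type (as in the tables above) and that every other line can be skipped; the function names are hypothetical.

```python
import re
from collections import Counter

# Assumed line shape: "<count> <type>", e.g. "12218 external/simple pointer".
# Headers, dashed rules and anything else that doesn't match are skipped.
LINE = re.compile(r"^\s*(\d+)\s+(\S.*?)\s*$")


def sum_counts(lines) -> Counter:
    """Add up the per-wiki counts into one total per type."""
    totals = Counter()
    for line in lines:
        match = LINE.match(line)
        if match:
            totals[match.group(2)] += int(match.group(1))
    return totals


def format_table(totals: Counter) -> str:
    """Render the totals as a "Count Type" table, sorted by type."""
    out = ["Count Type", "-" * 48]
    out += [f"{count} {row_type}" for row_type, count in sorted(totals.items())]
    return "\n".join(out)
```

Sorting by plain type string happens to reproduce the order used in the tables above. Usage would be something like print(format_table(sum_counts(open("storageTypeStats.log")))).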
To calculate the differences, grab the previous stats from this page, store them in a date-named file and compare them:
ben@fenari:storage$ cat <<EOOLDSTATS > <old-date>
<paste in content from this wiki page>
EOOLDSTATS
ben@fenari:storage$ ./storageTypeStatsDiff.py <old-date> <current-date> > /tmp/storageDiffs.log
ben@fenari:storage$ rm <old-date> <current-date>
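The logic of storageTypeStatsDiff.py isn't documented here either. As a hypothetical sketch, the diff step can take two type-to-count mappings (however they were parsed) and use "n/a" for a type missing from either snapshot, as the change table above does:

```python
def fmt(value) -> str:
    """Render a count, using "n/a" when the type is missing from a snapshot."""
    return "n/a" if value is None else str(value)


def diff_counts(old: dict[str, int], new: dict[str, int]) -> list[tuple[str, str, str, str]]:
    """Return (old, new, diff, type) rows for every type seen in either snapshot."""
    rows = []
    for row_type in sorted(old.keys() | new.keys()):
        before, after = old.get(row_type), new.get(row_type)
        # Only compute a difference when both snapshots have the type.
        delta = after - before if before is not None and after is not None else None
        rows.append((fmt(before), fmt(after), fmt(delta), row_type))
    return rows
```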
Finally, paste the new values and the diff into this wiki page.
Bugs
- Bug 950: botched conversion from latin1 to UTF-8 on es.wiktionary.org. See the historical worksheet on compression corruption.
- Bug 22624: compressOld.inc with CGZ may have been run as early as October 2004, but r6640, which prevented CGZ blobs from being moved to the archive table, was not committed until December 2004. The English Wikipedia archive table now has 892 CGZ blobs, 1541 HistoryBlobStub objects, and 510 "external,object" rows. These all need to be fixed urgently, since RCT will destroy them.