Mailman/Mailman2/Mailman encoding bug


Two messages in the mediawiki-l archives from February 2006 triggered this exciting error when rebuilding the archives with 'bin/arch --wipe':

Updating HTML for article 5494
Pickling archive state into /usr/local/mailman/archives/private/mediawiki-l/pipermail.pck
Traceback (most recent call last):
  File "bin/arch", line 200, in ?
    main()
  File "bin/arch", line 188, in main
    archiver.processUnixMailbox(fp, start, end)
  File "/usr/local/mailman/Mailman/Archiver/pipermail.py", line 585, in processUnixMailbox
    self.add_article(a)
  File "/usr/local/mailman/Mailman/Archiver/pipermail.py", line 626, in add_article
    filename))
  File "/usr/local/mailman/Mailman/Archiver/HyperArch.py", line 1116, in write_article
    f.write(article.as_text())
  File "/usr/local/mailman/Mailman/Archiver/HyperArch.py", line 579, in as_text
    '\g<1>' + _(' at ') + '\g<2>', body)
  File "/usr/local/python2.4/lib/python2.4/sre.py", line 142, in sub
    return _compile(pattern, 0).sub(repl, string, count)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 68: ordinal not in range(128)
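
The likely cause, stated here as an assumption rather than a confirmed diagnosis, is that the message body was an 8-bit byte string while the translated ' at ' text returned by _() was a unicode string. Python 2 then tried to coerce the body to unicode with the default 'ascii' codec and failed on the first non-ASCII byte. The sketch below is written for Python 3, which has no implicit coercion, so it performs the ASCII decode explicitly to show the same error. The helper name and the sample body are hypothetical.

```python
# Python 3 sketch: make explicit the ASCII decode that Python 2 did implicitly
# when a byte string and a unicode string were combined.
def ascii_decode_error(raw):
    """Return the UnicodeDecodeError message for raw, or None if raw is pure ASCII."""
    try:
        raw.decode("ascii")
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


# Hypothetical body: 68 ASCII bytes followed by 0xfc (u-umlaut in Latin-1),
# chosen to match the byte and position reported in the traceback.
sample_body = b"a" * 68 + b"\xfc"
print(ascii_decode_error(sample_body))
```

A body containing only ASCII bytes returns None, which would explain why only a couple of messages in the archive were affected.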

As a workaround, I added this hack, which replaces the body of each affected message with a placeholder so that the rebuild can continue:

--- /usr/local/mailman/Mailman/Archiver/HyperArch.py.orig       2006-03-31 18:11:26.000000000 +0000
+++ /usr/local/mailman/Mailman/Archiver/HyperArch.py    2006-03-31 18:24:56.000000000 +0000
@@ -575,8 +575,11 @@
             otrans = i18n.get_translation()
             try:
                 i18n.set_language(self._lang)
-                body = re.sub(r'([-+,.\w]+)@([-+.\w]+)',
-                              '\g<1>' + _(' at ') + '\g<2>', body)
+                try:
+                    body = re.sub(r'([-+,.\w]+)@([-+.\w]+)',
+                                  '\g<1>' + _(' at ') + '\g<2>', body)
+                except UnicodeDecodeError:
+                    body = 'Mailman has a bug and Unicode decoding broke here.'
             finally:
                 i18n.set_translation(otrans)
         return NL.join(headers) % d + '\n\n' + body + '\n'
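
The hack above throws away the whole body of an affected message. A less lossy approach would be to decode the body before obscuring the addresses, falling back to replacement characters for bytes that cannot be decoded. The following is a minimal standalone sketch in Python 3, not Mailman code and not the upstream fix. The function name, the charset parameter, and the at_text parameter are assumptions made for illustration; only the address pattern is taken from HyperArch.py.

```python
import re

# Same address pattern as the one used in HyperArch.py above.
ADDRESS_RE = re.compile(r'([-+,.\w]+)@([-+.\w]+)')


def obscure_addresses(raw_body, charset="ascii", at_text=" at "):
    """Decode raw_body (bytes) and rewrite user@host as user<at_text>host.

    If the declared charset is unknown or does not fit the data, fall back
    to ASCII with replacement characters instead of raising.
    """
    try:
        text = raw_body.decode(charset)
    except (UnicodeDecodeError, LookupError):
        text = raw_body.decode("ascii", "replace")
    # A function replacement avoids any backslash interpretation in at_text.
    return ADDRESS_RE.sub(lambda m: m.group(1) + at_text + m.group(2), text)


print(obscure_addresses(b"J\xfcrgen wrote to user@example.org", "iso-8859-1"))
```

With this approach an undecodable byte costs one character of the message rather than the entire body, at the price of having to know or guess the message charset.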