VRT System/Troubleshooting

From Wikitech

Troubleshooting

Mail delivery

  • When user/group for the exim pipe were incorrect, otrs.Postmaster.pl logged permissions to /var/log/mail.log about errors on attempts to write in /opt/otrs/var/tmp/CacheFileStorable. We fixed this by configuring the exim pipe to use group=www-data.
  • Mail hosts need mysql access to the otrs database. If mx IP addresses change or the database is inaccessible, mail defers on whichever mx is trying to do an address lookup. When we saw this happen, exim wasn't very informative about why. When in doubt double-check mysql access from the mx's command line.
  • Spamassassin runs locally, and logs in /var/log/mail.log.

Apache permissions errors

  • as user otrs run:
/opt/otrs/bin/otrs.SetPermissions.pl --otrs-user=otrs --otrs-group=otrs --web-user=www-data --web-group=www-data /opt/otrs

SpamAssassin stops reporting Bayes results

  • This happened 2014-04-24, 2016-08-06, 2016-12-21 and we discovered it was unhappy about the bayes database.
  • /var/log/syslog was full of this:
bayes db version 0 is not able to be used, aborting! at /usr/share/perl5/Mail/SpamAssassin/BayesStore/DBM.pm line 203, <GEN88>
  • We tried backup/restore the database (the verify failed), and the database shrank from ~24M to ~14M and SA stopped complaining. But SA continued to pass through mail with no Bayes results (the BAYES_XX where XX in [00,90] Header added to the message was missing)
  • So we moved aside the old database, modifed otrs.TicketExport2Mbox.pl not to skip previously-seen messages, and created one-time GenericAgent [within OTRS] jobs to re-export a couple of days worth of ham/spam. Then we ran train_spamassaassin manually to train on all this data. Note otrs.TicketExport2Mbox now has --rebuild mode to support this process.
  • On the 2016-08-06 incident the log statement below was found in the logs:
  Aug  6 09:56:59.752 [1619] dbg: bayes: not available for scanning, only 126 ham(s) in bayes DB < 200

after running a:

 sudo -u debian-spamd spamassassin -D bayes < /tmp/sample_email.eml

and a

 sudo -u debian-spamd sa-learn --dump magic

confirmed it.

The fix was the re-exporting and training of spamassassin as mentioned above. Extra care should be take to make sure the spam/ham messages exported are above 200 in every case.

  • On the 2016-12-21 incident both the hams and the spams in the database were below 200. That was not logged however as the message about the hams above, leading the investigation off track for a while. Exporting quite a few messages and training spamassasin on them as above fixed the issue. The database in this case was NOT being marked as corrupted by db_verify but it was truncated in the end just for good measure manually