Mail

This page may be outdated or contain incorrect details. Please update it if you can.

Overview of Wikimedia Mail

This section provides a high-level overview of how mail at Wikimedia is treated. More detail on each of these topics exists in the rest of this page.

Product mail

Mail that is produced by MediaWiki. Please fill in this section.

Lists

All mail to @lists.wikimedia.org is handled by Mailman running on https://lists.wikimedia.org/. Public archived lists, private archived lists, and private unarchived lists are located there. There's some sort of synchronization to gmane.org.

Foundation mail

Mail for employees etc. All mail to @wikimedia.org domains is delivered to mx1001.wikimedia.org/mx2001.wikimedia.org (the MX record for wikimedia.org). From there, a few things happen:

  • The recipient is checked against aliases in /etc/exim4/aliases/wikimedia.org. These are mostly ops-related (e.g., noc@) or older aliases. New aliases for ops should go here; new aliases for the rest of the organization should be created in Google.
  • The recipient is checked against LDAP. LDAP contains a few different address types:
    • most employees' mail is forwarded to user accounts at Google;
    • newer aliases are forwarded to Google Groups;
    • some legacy mailing lists are forwarded to @lists.wikimedia.org;
    • some employees are identified as IMAP accounts and their mail is forwarded to sanger.
  • There's a SQLite database that contains filters and addresses to forward to sanger.
    • Use wmfmailadmin on sanger to modify these filters and such – it's rsynced back to mchenry.
  • Mail for Request Tracker is forwarded to streber.

HowTo

This section lists some commonly needed actions and how to perform them.

Modify aliases

Right now mx1001 is the mail relay, which also handles all ops-related aliases and some older aliases for the rest of the foundation. Each domain has its own alias file in /etc/exim4/aliases/, maintained in the puppet private repo.

New ops-oriented aliases (e.g., noc@, etc.) should be created in alias files, maintained in the puppet private repo. New requests for general foundation-related aliases should be redirected to OIT and be created as a Google Group.

To add/modify/remove an alias, simply edit the corresponding text file on polonium, e.g., /etc/exim4/aliases/wikimedia.org. No additional steps are necessary.

To use the same aliases for multiple domains you can use symbolic links. Be careful, however: unqualified targets (i.e., mail addresses without a domain, like noc) that are not listed in the same alias file (for example, OTRS queues) may not work, because they do not exist in the symbolically linked domain. Use fully qualified addresses in that case.

IMAP account management

Adding / removing a VRT System queue and mail addresses

Just add the queue in the VRT System with appropriate mail addresses and be happy. exim will automatically see that the queue exists or has disappeared, and no involvement from Wikimedia admins is necessary.

Under some circumstances it's possible that, due to negative caching at the secondary MXes, a new mail address will only start working after up to two hours.

Adding / removing mail domains

Set up DNS MX records with mchenry.wikimedia.org as the primary MX, and lists.wikimedia.org as secondary, and things should already start to work. You'll probably want to add an alias file on the primary mail relay though, or no mail will be accepted.

If you don't want to rely on DNS MX records alone, you can also add the domain to the file /etc/exim4/local_domains on the primary mail relay, and /etc/exim4/relay_domains on the secondary mail relays, but this is not a requirement.

Searching the logs

Exim 4's main log file is /var/log/exim4/mainlog. Using exigrep instead of grep may be helpful, as it combines (scattered) log lines per mail transaction.
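
As an illustration of what grouping log lines per mail transaction means, here is a minimal Python sketch. It is a simplification and not exigrep itself: the function name group_matching is made up for this example, and it assumes the classic Exim 4 message ID format (6-6-2 alphanumeric characters).

```python
import re
from collections import defaultdict

# Assumption: classic Exim 4 message IDs such as "1HiAbC-0001Xy-Ab".
MSG_ID = re.compile(r"\b([0-9A-Za-z]{6}-[0-9A-Za-z]{6}-[0-9A-Za-z]{2})\b")


def group_matching(lines, pattern):
    """Return {message_id: [log lines]} for every transaction in which
    at least one line matches the regular expression `pattern`."""
    by_id = defaultdict(list)
    for line in lines:
        found = MSG_ID.search(line)
        if found:
            # Collect every line that mentions this message ID.
            by_id[found.group(1)].append(line.rstrip("\n"))
    wanted = re.compile(pattern)
    return {
        msg_id: entries
        for msg_id, entries in by_id.items()
        if any(wanted.search(entry) for entry in entries)
    }
```

To try it against the real log, pass an open file object for /var/log/exim4/mainlog as lines. Log lines that carry no message ID (connection errors, for instance) are ignored by this sketch.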

Design decisions

Reliable mail delivery first
Spam filtering and other tricks are needed, but reliable mail delivery for genuine mail should have a higher priority. False positives in mail rejects and incompatibilities should be kept to a minimum.
Black box mail system, no user shell logins
Few users would make good use of this anyway. Greatly simplifies network and host security, allows the use of some (non-critical) non-standardized extensions between software components for greater performance, interoperability and features because it doesn't have to support whatever shell users might install to access things directly.
IMAP only, no POP3
IMAP has good client support nowadays, and for a large part solves the problem of having multiple clients. Also backups can be done centrally on the server side, and multiple folders with server side mail filtering might be supported.
Support for mail submission
Through SMTP authentication we can allow our users to submit mails through the mail server, without them having to configure an outgoing mail server for whatever network they reside on. Can support multiple ports/protocols to evade firewalls.
SSL/TLS access only, no plain-text
Although client support for this is not 100% yet, especially on mobile devices, the risk of using plain-text protocols is too high, especially with users visiting conferences and other locations with insecure wireless networks.
Quota support
Although we can set quotas generously, especially for those who need the space, quotas should be implemented to protect the system.
Spam and virus filtering
Is unfortunately necessary. Whether this should be global or per-user is to be determined.
Multi-domain support
We have many domains, and the mail setup should be able to distinguish between domains where necessary.
Web access
Some form of web-mail would be nice, although not critical at first and can be implemented at later stages.
Backups
At least daily, with snapshots.
Cold failover
Setting up a completely redundant system is probably a bit overkill at this stage, but we should make it easy and quick to set up a new mail system on other hardware in case of major breakage.
Documentation
Although not all aspects of the involved software can be described of course, the specifics of the Wikimedia setup should be properly documented and HOWTOs for commonly needed tasks should be provided.

Software

MTA
Exim : Great flexibility, very configurable, reliable, secure.
IMAP server
Dovecot : Fast, secure, flexible.

Formats used

Maildir
Safe, convenient format, moderately good performance, good software support.
Password and user databases
sqlite - Indexed file format, powerful SQL queries, no full-blown RDBMS needed. Easy maintenance, good software support, replication support. Also easy to change to MySQL/PostgreSQL should that ever be necessary. Supported by both Exim and Dovecot.
Other data lookups
either flat-file for small lists, or cdb for larger, indexed lookups.

Mailbox storage and mail delivery

Ext3 as file system
ReiserFS may be a bit faster, but Ext3 is more reliable. Make sure directory indexes are enabled.
LVM
For easy resizing, moving of data to other disks, and snapshots for backups.
RAID-1
The new mail servers have hardware RAID controllers; we'll probably use them.
Dovecot's "deliver" as LDA
Though Exim has a good internal Maildir "transport", the use of Dovecot's LDA allows it to use and update the Dovecot specific indexing for greater performance. This actually restricts some Exim flexibility because no appendfile features (quotas) can be used, forcing the use of deliver counterparts. The performance benefits were only marginal anyway, due to Dovecot's use of dnotify, so use Exim's own Maildir delivery.
fcntl() and dot-file locking
The lowest common denominator: the locking methods most widely supported.
Maildir++ quotas
Standard, reasonably fast.

Authentication

PLAIN authentication
Universally supported for both IMAP and SMTP. Encrypted connections are used exclusively, so no elaborate hashing schemes needed.
SMD5 or SSHA password scheme
Salted hashing.
SMTP authentication through either Exim's Dovecot authenticator, or using direct lookups
Exim 4.64 has support for directly authenticating against Dovecot's authenticator processes, though this version is not in Ubuntu Feisty yet, so needs backporting. If direct lookups from Exim's authenticators are easy enough, use that. Also depends on the security model.

Layout

The mail setup consists of 2 general mail servers, plus a mailing lists server and a VRT System server. The two general mail servers are mchenry and sanger.

Wikimedia mail setup

One server (mchenry) acts as relay; it accepts mail connections from outside, checks them for spam, viruses and other policy checks, and then queues and/or forwards to the appropriate internal mail server. It also accepts mail destined for outside domains from internal servers, including the application servers.

The other server, sanger, is the IMAP server. It accepts mail from mchenry and delivers it to local user mailboxes. Outgoing mail from SMTP authenticated accounts is also accepted on this server, and forwarded to mchenry, where it's queued and sent out. Web mail and other supportive applications related to user mail accounts and their administration will also run on sanger.

Lily, the mailing lists server, also acts as a secondary MX and forwards non-mailing list mail to mchenry. In case of downtime of mchenry, it might be able to send partial (IMAP account) mail to sanger directly, depending on the added complexity of the configuration. During major hardware failure of sanger, mchenry (with identical hardware) should be able to be set up as the IMAP server.

Configuration details

Account database

The user & password account database on the IMAP server is stored in a SQLite database. This format is fast and convenient to use, and can easily be moved to MySQL or PostgreSQL should that be necessary later.

Initially, the schema of this database is intentionally kept simple, because simplicity is good. We could extend it with many tables supporting domains, aliases, other data and do a whole lot of joins to make it work, but right now we don't need that. For example, aliases are simply kept in a text file on the primary mail relay, which works well. If we ever need more features, it'll be easy to adapt the schema to the new situation.

Schema

CREATE TABLE account (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    localpart VARCHAR(128) NOT NULL,
    domain VARCHAR(128) NOT NULL,
    password VARCHAR(64) NOT NULL,
    quota INTEGER DEFAULT '0' NOT NULL,
    realname VARCHAR(64) NULL,
    active BOOLEAN DEFAULT '1' NOT NULL,
    filter BLOB NULL,
    UNIQUE (localpart, domain)
);

Mail relay

The current mail relay is mchenry.

As a mail relay needs to do a lot of DNS lookups, it's a good place for a DNS resolver, and therefore mchenry is pmtpa's secondary DNS recursor - although mchenry uses its own resolver as primary.

mchenry uses Exim 4, from the standard Ubuntu Feisty exim4-daemon-heavy package. This package does some stupid things like running under a Debian-exim user, but not enough to warrant running our own modified version. All configuration lives in /etc/exim4, where exim4.conf is Exim's main configuration file.

The following domain and host lists are defined near the top of the configuration file:

# Standard lists
hostlist wikimedia_nets = <; 66.230.200.0/24 ; 145.97.39.128/26 ; 203.212.189.192/26 ; 211.115.107.128/26 ; 2001:610:672::/48
domainlist system_domains = @
domainlist relay_domains =
domainlist legacy_mailman_domains = wikimedia.org : wikipedia.org
domainlist local_domains = +system_domains : +legacy_mailman_domains : lsearch;CONFDIR/local_domains : @mx_primary/ignore=127.0.0.1

system_domains is a list for domains related to the functioning of the local system, e.g. mchenry.wikimedia.org and associated system users. It has little relevance to the rest of the Wikimedia mail setup, but makes sure that mail submitted by local software is handled properly.

relay_domains is a list for domains that are allowed to be relayed through this host.

local_domains is a compound list of all domains that are in some way processed locally. They are not routed using the standard dnslookup router. Besides the domains listed in /etc/exim4/local_domains, mail will also be accepted for any domain which has mchenry (or one of its interface IP addresses) listed in DNS as the primary MX. This could get abused by people having control over some arbitrary DNS zone of course, but since typically no alias file for it will exist, no mail address will be accepted in that case anyway.

For content scanning, temporary mbox files are written to /var/spool/exim4/scan, and deleted after scanning. To improve performance somewhat, this directory is mounted as a tmpfs filesystem, using the following line in /etc/fstab:

tmpfs   /var/spool/exim4/scan   tmpfs   defaults        0       0

Resource limits

To behave gracefully under load, some resource limits are applied in the main configuration section:

# Resource control
check_spool_space = 50M

No mail delivery if there's less than 50MB free.

deliver_queue_load_max = 75.0
queue_only_load = 50.0

No mail delivery if system load is > 75, and queue-only (without immediate delivery) when load is > 50.

smtp_accept_max = 100
smtp_accept_max_per_host = ${if match_ip{$sender_host_address}{+wikimedia_nets}{50}{5}}

Accept at most 100 simultaneous SMTP connections, and at most 5 from the same host, unless the host is on a Wikimedia network, in which case the limit is 50.
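
The per-host limit expression can be read as the following Python sketch. It only mirrors the logic of the ${if match_ip ...} expansion for illustration; the function name max_connections_per_host is made up here, and the networks are copied from the wikimedia_nets host list above.

```python
import ipaddress

# Copied from the wikimedia_nets hostlist in exim4.conf.
WIKIMEDIA_NETS = [
    ipaddress.ip_network(net)
    for net in (
        "66.230.200.0/24",
        "145.97.39.128/26",
        "203.212.189.192/26",
        "211.115.107.128/26",
        "2001:610:672::/48",
    )
]


def max_connections_per_host(sender_host_address):
    """Return 50 for hosts on a Wikimedia network, 5 for everyone else."""
    addr = ipaddress.ip_address(sender_host_address)
    # An IPv4 address is never "in" an IPv6 network and vice versa,
    # so mixed-version comparisons simply evaluate to False.
    if any(addr in net for net in WIKIMEDIA_NETS):
        return 50
    return 5
```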

smtp_reserve_hosts = <; 127.0.0.1 ; ::1 ; +wikimedia_nets

Reserve SMTP connection slots for our own servers.

smtp_accept_queue_per_connection = 500

If more than 500 mails are sent in one connection, queue them without immediate delivery.

smtp_receive_timeout = 1m

Drop the connection if an SMTP line was not received within a 1 minute timeout.

remote_max_parallel = 25

Invoke at most 25 parallel delivery processes.

smtp_connect_backlog = 32

TCP SYN backlog parameter.

Aliases

Each Wikimedia domain (wikimedia.org, wikipedia.org, wiktionary.org, etc...) is now distinct and has its own aliases file, under /etc/exim4/aliases/. Alias files use the standard format. Unqualified address targets in the alias file (local parts without domain) are qualified to the same domain. Special :fail: and :defer: targets and pipe commands are also supported, see http://www.exim.org/exim-html-4.66/doc/html/spec_html/ch22.html#SECTspecitredli.

The following router takes care of this. It's run for all domains in the +local_domains domain list defined near the top of the Exim configuration file. It checks whether the file /etc/exim4/aliases/$domain exists, and then uses it to do an alias lookup.

# Use alias files /etc/exim4/aliases/$domain for domains like
# wikimedia.org, wikipedia.org, wiktionary.org etc.

aliases:
       driver = redirect
       domains = +local_domains
       require_files = CONFDIR/aliases/$domain
       data = ${lookup{$local_part}lsearch*{CONFDIR/aliases/$domain}}
       qualify_preserve_domain
       allow_fail
       allow_defer
       forbid_file
       include_directory = CONFDIR
       pipe_transport = address_pipe

If the exact address is not found in the alias file, it will do another lookup for the key *, so catchalls can be made for specific domains as well.
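
The lookup behaviour described above (exact key first, then the * catchall, with unqualified targets qualified to the domain of the alias file) can be sketched in Python as follows. This is an approximation for illustration only: parse_aliases and resolve are hypothetical names, keys are assumed to be terminated by a colon, and special targets such as :fail:, pipes and continuation lines are not handled.

```python
def parse_aliases(text):
    """Parse 'key: target1, target2' lines into a dict of lists."""
    aliases = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, targets = line.partition(":")
        aliases[key.strip()] = [t.strip() for t in targets.split(",") if t.strip()]
    return aliases


def resolve(local_part, domain, aliases):
    """Mimic lsearch*: try the exact local part, then the '*' catchall.
    Targets without a domain are qualified to `domain`, as
    qualify_preserve_domain does. Returns None if nothing matches."""
    targets = aliases.get(local_part)
    if targets is None:
        targets = aliases.get("*")
    if targets is None:
        return None
    return [t if "@" in t else f"{t}@{domain}" for t in targets]
```

For example, with an alias file containing "noc: alice, bob@example.org", resolve("noc", "wikimedia.org", ...) yields alice@wikimedia.org and bob@example.org; the addresses are invented for the example.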

LDAP accounts and aliases

As the office is now putting staff data in LDAP, the mail relay has been configured to use that as the primary mail account database. Two Exim routers have been added for this, one to look up mail accounts (which are forwarded to Google Apps), and one for aliases in LDAP:

# LDAP accounts
ldap_account:
       driver = manualroute
       domains = wikimedia.org
       condition = ${lookup ldap \
                       {user="cn=eximagent,ou=other,dc=corp,dc=wikimedia,dc=org" pass=LDAPPASSWORD \
                       ldap:///ou=people,dc=corp,dc=wikimedia,dc=org?mail?sub?(&(objectClass=inetOrgPerson)(mail=${quote_ldap:$local_part}@$domain)(x121Address=1))} \
                       {true}fail}
       local_part_suffix = +*
       local_part_suffix_optional
       transport = remote_smtp
       route_list = *  aspmx.l.google.com

For mail addresses in domain wikimedia.org an LDAP query is done on the default LDAP servers, under the given base DN. Only the mail attribute is returned on successful lookup. The scope is set to sub (so a full subtree search is done), and the filter specifies that only inetOrgPerson objects should match, with the mail attribute matching the exact mail address being tested, and only if the x121Address field exists and is set to 1. If all these conditions are met, the mail is forwarded to Google Apps.

Mail addresses with an optional local part suffix of the form +whatever are also accepted the same as without the suffix, and are forwarded unmodified.

ldap_alias:
       driver = redirect
       domains = wikimedia.org
       data = ${lookup ldap \
                       {user="cn=eximagent,ou=other,dc=corp,dc=wikimedia,dc=org" pass=LDAPPASSWORD \
                       ldap:///ou=people,dc=corp,dc=wikimedia,dc=org?mail?sub?(&(objectClass=inetOrgPerson)(initials=${quote_ldap:$local_part}@$domain))} \
                       {$value}fail}

Aliases are looked up in LDAP as well, using the initials attribute, and are rewritten to their canonical form as returned in the mail attribute.

IMAP mail

Mail destined for IMAP accounts on the IMAP server should be recognized and routed specially by the mail relay. Therefore the mail relay has a local copy of the accounts database, and uses a manualroute to route those mail addresses to the IMAP server:

imap:
       driver = manualroute
       domains = +local_domains
       condition = ${lookup sqlite{USERDB \
               SELECT * FROM account WHERE localpart='${quote_sqlite:$local_part}' AND domain='${quote_sqlite:$domain}'}}
       transport = remote_smtp
       route_list = *  sanger.wikimedia.org
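
To make the router's condition concrete, here is a small Python sketch that runs an equivalent query against an in-memory copy of the account schema shown earlier. It is only an illustration: is_imap_account is a hypothetical helper, the sample account is invented, and a parameterized query stands in for Exim's ${quote_sqlite:...} escaping.

```python
import sqlite3

# Schema as documented in the "Account database" section.
SCHEMA = """
CREATE TABLE account (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    localpart VARCHAR(128) NOT NULL,
    domain VARCHAR(128) NOT NULL,
    password VARCHAR(64) NOT NULL,
    quota INTEGER DEFAULT '0' NOT NULL,
    realname VARCHAR(64) NULL,
    active BOOLEAN DEFAULT '1' NOT NULL,
    filter BLOB NULL,
    UNIQUE (localpart, domain)
);
"""


def is_imap_account(conn, local_part, domain):
    """True if a row exists for local_part@domain, which is what makes
    the router's condition succeed. Like the router, this does not
    look at the 'active' column."""
    row = conn.execute(
        "SELECT 1 FROM account WHERE localpart = ? AND domain = ?",
        (local_part, domain),
    ).fetchone()
    return row is not None


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
# Invented sample account, for demonstration only.
conn.execute(
    "INSERT INTO account (localpart, domain, password) VALUES (?, ?, ?)",
    ("example", "wikimedia.org", "{SSHA}placeholder"),
)
```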

RT

RT is implemented on a separate domain (rt.wikimedia.org) and a separate server (streber). From the domain name, Exim knows to forward to it:

domainlist rt_domains = rt.wikimedia.org
# Send RT mails to the RT server
rt:
       driver = manualroute
       domains = +rt_domains
       route_list = * streber.wikimedia.org byname
       transport = remote_smtp


VRT System

For VRT System, the mail relay queries the VRT System MySQL servers directly to check the existence of a mail address. This implies that newly created queues / mail addresses will start to work immediately and no involvement from Wikimedia admins is needed.

The MySQL servers are specified near the top of the Exim configuration file:

# MySQL lookups
hide mysql_servers = srv7.wikimedia.org/otrs/exim/password : \
                     srv8.wikimedia.org/otrs/exim/password

These servers will be queried in turn. If neither of these servers responds, or they respond with an error, the mail will be deferred. A MySQL user account "exim" with (just) SELECT privileges on the system_address table of the otrs database needs to exist, which is accessible from the mail relay (mchenry.wikimedia.org).

The following router does the actual aliasing of the VRT address to otrs@ticket.wikimedia.org, if the queue address exists in the database:

# Query the VRT MySQL server(s) for the existence of the queue address
# $local_part@$domain, and alias to otrs@ticket.wikimedia.org if
# successful.

otrs:
       driver = redirect
       domains = +local_domains
       condition = ${lookup mysql{SELECT value0 FROM system_address WHERE value0='${quote_mysql:$local_part@$domain}'}{true}fail}
       data = otrs@ticket.wikimedia.org

In the new VRT setup, this is not done by rewriting the address, but delivering the message with unmodified recipients directly to the VRT server with a manualroute router:

otrs:
       driver = manualroute
       domains = +local_domains
       condition = ${lookup mysql{SELECT value0 FROM system_address WHERE value0='${quote_mysql:$local_part@$domain}'}{true}fail}
       route_list = *  williams.wikimedia.org  byname
       transport = remote_smtp

SpamAssassin

SpamAssassin is installed using the default Ubuntu spamassassin package. A couple of configuration changes were made.

By default, spamd, if enabled, runs as root. To change this:

# adduser --system --home /var/lock/spamassassin --group --disabled-password --disabled-login spamd

The following settings were modified in /etc/default/spamassassin:

# Change to one to enable spamd
ENABLED=1

User preferences are disabled, spamd listens on the loopback interface only, and runs as user/group spamd:

OPTIONS="--max-children 5 --nouser-config --listen-ip=127.0.0.1 -u spamd -g spamd"

Run spamd with nice level 10:

# Set nice level of spamd
NICE="--nicelevel 10"

In /etc/spamassassin/local.cf, the following settings were changed:

trusted_networks 66.230.200.0/24 145.97.39.128/26 203.212.189.192/26 211.115.107.128/26

...so SpamAssassin knows which hosts it can trust.


We also enable the SARE (SpamAssassin Rules Emporium) repository in order to catch more spam (http://saupdates.openprotect.com/)

gpg --keyserver pgp.mit.edu --recv-keys BDE9DC10
gpg --armor -o pub.gpg --export BDE9DC10 
sa-update --import pub.gpg
sa-update --allowplugins --gpgkey D1C035168C1EBC08464946DA258CDB3ABDE9DC10 --channel saupdates.openprotect.com

in /etc/cron.daily/spamassassin, change the following so that the rules are updated daily:

#sa-update || exit 0
sa-update --allowplugins --gpgkey D1C035168C1EBC08464946DA258CDB3ABDE9DC10 --channel saupdates.openprotect.com --channel updates.spamassassin.org || exit 0 

We also want to speed up the SpamAssassin process as much as we can. To that end, it helps to compile all the rules. This is taken care of in the cron.daily file, but re2c is missing, so it needs to be installed:

apt-get install re2c
sa-compile

One of the more useful features in spam fighting is the Bayesian filter. It allows SpamAssassin to detect spam regardless of its rules. However, it needs to be enabled. In /etc/spamassassin/local.cf, the following settings were changed:

use_bayes 1
bayes_auto_learn 1
bayes_path /etc/spamassassin/bayes/bayes  # <- This is important. In a virtual environment, omitting this line renders the Bayesian filter useless. 
bayes_ignore_header X-Bogosity
bayes_ignore_header X-Spam-Flag
bayes_ignore_header X-Spam-Status

...so Bayes is able to learn and stores its database in a central location and benefits everyone

To make Bayes more useful, we want it to learn from the users what is spam and what is ham, so we need to feed the Bayesian filter with their classifications. To that end, we add two folders to each user's INBOX (on Sanger). Note: you need to use maildirmake.dovecot to create these directories, and chown them to vmail:vmail.

.INBOX.Bayes_Ham/
.INBOX.Bayes_Spam/

We also need to encourage everybody to use these folders, so we add these two lines at the end of everybody's subscriptions file (/var/vmail/wikimedia.org/USERNAME/subscriptions).

INBOX.Bayes_Ham
INBOX.Bayes_Spam 

Of course, once people start filling these new mailboxes, we need to process them. The more interesting part is that the mailboxes are on Sanger and the spam filtering happens on McHenry. We therefore need to move the messages in these folders over to McHenry so that sa-learn can be run against them. Here is a small shell script, meant to be run as user vmail on Sanger and located in /usr/local/bin/GatherBayesData.sh:

#!/bin/bash

cd /var/vmail/wikimedia.org || exit 1
# Compute the archive name once, so tar, scp and rm all refer to the same
# file even if the hour changes while the script runs.
archive="/tmp/BayesLearning_$(date +%Y-%m-%d--%H).tgz"
echo "Archiving users' Bayes mailboxes..."
# Quote the pattern so the shell does not expand it before find sees it.
tar zcvf "$archive" $(find ./ -type d -name '*Bayes_[HS]*') >/dev/null 2>&1
echo "Transmitting Spam/Ham to McHenry... "
scp "$archive" vmail@mchenry:. && rm "$archive" >/dev/null 2>&1
for user in $(ls -d */ | sed 's/.$//') ; do
        echo ""
        echo "=== WORKING WITH $user's MAILBOX ==="
        SpamFolder=$(find "./$user/" -type d -name '*Bayes_Spam*' -exec basename {} \;)
        HamFolder=$(find "./$user/" -type d -name '*Bayes_Ham*' -exec basename {} \;)
        # Require a non-empty folder name: with an empty name the paths
        # below would point at the user's INBOX itself.
        if [ -n "$SpamFolder" ] && [ -d "$user/$SpamFolder/" ]
        then
                echo "Found Bayes mailbox(es): SPAM: $SpamFolder."
                echo "Purging Spam..."
                # Purging SPAM folder:
                find "$user/$SpamFolder"/{new,cur}/ -type f -exec rm -f {} \;
        else
                # Do not skip the HAM handling below just because no SPAM folder exists.
                echo "No SPAM Bayes mailbox(es) found."
        fi

        if [ -n "$HamFolder" ] && [ -d "$user/$HamFolder/" ]
        then
                echo "Found Bayes mailbox(es): HAM: $HamFolder."
                echo "Moving Ham messages back to their original place..."
                find "$user/$HamFolder/cur/" -type f -exec mv {} "$user/cur/" \;
                find "$user/$HamFolder/new/" -type f -exec mv {} "$user/new/" \;
        else
                echo "No HAM Bayes mailbox(es) found."
        fi
done

This script is pretty self-explanatory:

  1. Create an archive of everybody's HAM and SPAM folders.
  2. Send the archive to McHenry for further processing.
  3. Move the HAM messages back into the user's INBOX (while respecting the status of each message as read / unread).
  4. Permanently delete the SPAM.

On the receiving end (McHenry) we also have a little bash script that passes messages to sa-learn: /var/vmail/process_bayes.sh

#!/bin/bash

cd /var/vmail || exit 1
# Exit the script itself (not just a subshell) when there is nothing to do.
ls *.tgz >/dev/null 2>&1 || { echo "Nothing to process..." ; exit 0 ; }
for file in *.tgz ; do

       echo "Processing $file...";
       echo "Creating temp dir: ./tmp_bayes ";
       mkdir ./tmp_bayes;
       echo "Extracting archive...";
       tar -C ./tmp_bayes -zxf "$file";
       echo "Analyzing HAM / SPAM for each user..."
       cd tmp_bayes;
       for user in $(ls -d */ | sed 's/.$//') ; do
               echo ""
               echo "=== WORKING ON $user's MAILBOX ==="
               SpamFolder=$(find "$user" -type d -name '*Bayes_Spam*' -exec basename {} \;)
               HamFolder=$(find "$user" -type d -name '*Bayes_Ham*' -exec basename {} \;)
               echo "Found the following Bayes Mailboxes: SPAM: $user/$SpamFolder  |  HAM: $user/$HamFolder."
               echo "Learning Ham from $user:";
               sa-learn --ham "$user/$HamFolder"/{cur,new}/*;
               echo "Learning Spam from $user:";
               sa-learn --spam "$user/$SpamFolder"/{cur,new}/*;
       done
       echo "Completed analysis from $file.";
       cd ..;
       rm -rf ./tmp_bayes;
       rm -f "$file";

done

Once again the script is pretty self-explanatory...

  1. look for an archive in /var/vmail (where Sanger sends the daily archive)
  2. untar said file in a temp directory
  3. crawl every user's directory for Spam and Ham messages
  4. send those messages to sa-learn
  5. clean up.

Both these scripts are in cron so that they run every night at 10:30 / 11:00PM (PST):

  • Sanger: /etc/cron.d/Bayes-collector:
MAILTO=fvassard@wikimedia.org
SHELL=/bin/bash

30 5 * * * vmail /usr/local/bin/GatherBayesData.sh 
  • McHenry: Root's crontab
MAILTO=fvassard@wikimedia.org
0 6 * * * /var/vmail/process_bayes.sh


The default X-Spam-Report headers are very long because they contain a "content preview", which is rather useless in our setup. This can be modified:

# Do not include the useless content preview
clear_report_template
report Spam detection software, running on the system "_HOSTNAME_", has
report identified this incoming email as possible spam. If you have any
report questions, see _CONTACTADDRESS_ for details.
report
report Content analysis details:   (_SCORE_ points, _REQD_ required)
report
report " pts rule name              description"
report  ---- ---------------------- --------------------------------------------------
report _SUMMARY_

In Exim, SpamAssassin is called from the DATA ACL for domains in domain list spamassassin_domains. exim4.conf:

domainlist spamassassin_domains = *
acl_smtp_data = acl_check_data
acl_check_data:
        # Let's trust local senders to not send out spam
        accept hosts = +wikimedia_nets
               set acl_m0 = trusted relay

        # Run through spamassassin
        accept endpass
               acl = spamassassin

spamassassin:

        # Only run through SpamAssassin if requested for this domain and
        # the message is not too large
        accept condition = ${if >{$message_size}{400K}}

        # Add spam headers if score >= 1
        warn spam = nonexistent:true
             condition = ${if >{$spam_score_int}{10}{1}{0}}
             set acl_m0 = $spam_score ($spam_bar)
             set acl_m1 = $spam_report

        # Reject spam at high scores (> 12)
        deny message = This message scored $spam_score spam points.
             spam = nonexistent/defer_ok
             condition = ${if >{$spam_score_int}{120}{1}{0}}

        accept

First, mail for domains not listed in spamassassin_domains is accepted without scanning, as is mail larger than 400 KB. Then a spam check is done using the local spamd daemons. If that results in a score above 1, two ACL variables are set, which the system filter uses later to add the X-Spam-Score: and X-Spam-Report: headers. If the spam score is above 12, the mail is rejected outright.
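
The decision logic of these ACLs can be summarized in a short Python sketch. It is an illustration, not the ACL itself: spam_action is a made-up name, every domain is treated as scanned (spamassassin_domains is * above), and rounding the score to tenths is an assumption about how $spam_score_int is derived from the score.

```python
def spam_action(score, size_bytes, trusted=False):
    """Return 'accept', 'tag' or 'reject' for a message.

    score      -- SpamAssassin score, e.g. 4.2
    size_bytes -- message size in bytes
    trusted    -- True if the sending host is in wikimedia_nets
    """
    if trusted:
        return "accept"  # local senders are not scanned at all
    if size_bytes > 400 * 1024:
        return "accept"  # messages over 400K skip SpamAssassin
    # Exim compares against the score multiplied by ten, as an integer.
    score_int = round(score * 10)
    if score_int > 120:
        return "reject"  # denied at SMTP time
    if score_int > 10:
        return "tag"  # accepted; X-Spam-* headers added by the system filter
    return "accept"
```

Note that both comparisons are strict, so a score of exactly 1.0 is not tagged and a score of exactly 12.0 is tagged but not rejected.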

System filter

All mail is run through a so-called "system filter" that can do certain checks on the mail, and determine actions. A system filter is run once for a mail, and applies to all recipients.

The system filter is set using the main configuration option:

system_filter = CONFDIR/system_filter

In our setup the system filter is used to remove any untrusted spam checker headers, and to add our spam headers to the message. The file /etc/exim4/system_filter has the following content:

# Exim filter

if first_delivery then
    if $acl_m0 is not "trusted relay" then
        # Remove any SpamAssassin headers and add local ones
        headers remove X-Spam-Score:X-Spam-Report:X-Spam-Checker-Version:X-Spam-Status:X-Spam-Level
    endif
    if $acl_m0 is not "" and $acl_m0 is not "trusted relay" then
        headers add "X-Spam-Score: $acl_m0"
        headers add "X-Spam-Report: $acl_m1"
    endif
endif

Mailing lists

Mailing lists now live on a dedicated mailing lists server (lily) on a dedicated mail domain lists.wikimedia.org. However, mail for the old addresses such as info-en@wikipedia.org still comes in and should be rewritten to the new addresses, and then forwarded to the mailing lists server.

Near the top of the Exim configuration file a domain list is defined, which contains mail domains that can contain these old addresses:

domainlist legacy_mailman_domains = wikimedia.org : wikipedia.org : mail.wikimedia.org : mail.wikipedia.org

The following router, near the end of the routers section, checks if a given local part exists in the file /etc/exim4/legacy_mailing_lists, and rewrites it to the new address if it does, to be routed via the normal DNS MX/SMTP routers/transports. Since Mailman does not distinguish between domains, only a single local parts file for all legacy Mailman domains exists. This file only needs to contain the mailing list names; all suffixes are handled by the router.

# Alias old mailing list addresses to @lists.wikimedia.org on lily

legacy_mailing_lists:
       driver = redirect
       domains = +legacy_mailman_domains
       data = $local_part$local_part_suffix@lists.wikimedia.org
       local_parts = lsearch;CONFDIR/legacy_mailing_lists
       local_part_suffix = -bounces : -bounces+* : \
                               -confirm+* : -join : -leave : \
                               -owner : -request : -admin : \
                               -subscribe : -unsubscribe
       local_part_suffix_optional
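
As a rough illustration of what this router does with an address, here is a Python sketch. It approximates Exim's suffix handling rather than reproducing it exactly: split_suffix and rewrite_legacy are hypothetical names, the suffix patterns are tried in the order listed with the first match winning, and a trailing * is taken to match everything from the first occurrence of the fixed part onward.

```python
SUFFIXES = [
    "-bounces", "-bounces+*", "-confirm+*", "-join", "-leave",
    "-owner", "-request", "-admin", "-subscribe", "-unsubscribe",
]
LEGACY_DOMAINS = {
    "wikimedia.org", "wikipedia.org", "mail.wikimedia.org", "mail.wikipedia.org",
}


def split_suffix(local_part):
    """Split a local part into (list name, Mailman suffix)."""
    for pattern in SUFFIXES:
        if pattern.endswith("*"):
            # Wildcard suffix: fixed part followed by arbitrary text.
            idx = local_part.find(pattern[:-1])
            if idx > 0:
                return local_part[:idx], local_part[idx:]
        elif local_part.endswith(pattern) and len(local_part) > len(pattern):
            return local_part[: -len(pattern)], pattern
    return local_part, ""  # the suffix is optional


def rewrite_legacy(local_part, domain, list_names):
    """Return the new @lists.wikimedia.org address, or None if the
    router would not handle this address."""
    if domain not in LEGACY_DOMAINS:
        return None
    name, suffix = split_suffix(local_part)
    if name not in list_names:  # stands in for the legacy_mailing_lists file
        return None
    return f"{name}{suffix}@lists.wikimedia.org"
```

With info-en present in the list file, info-en-owner@wikipedia.org would become info-en-owner@lists.wikimedia.org, while an address whose list name is not in the file is left for later routers.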

Wiki mail

The application servers send out mail for wiki password reminders/changes, and e-mail notification on changes if enabled. These automated mass mailings are also accepted by the mail relay, mchenry, but are treated somewhat separately. To minimize the chance of external mail servers blocking mchenry's regular mail because of mass emails, these "wiki mails" are sent out using a separate IP.

Near the top of the configuration a macro is defined for the IP address to accept incoming wiki mail, and to use for sending it out to the world:

WIKI_INTERFACE=66.230.200.216

A hostlist is defined for the IP ranges from which relaying is allowed:

hostlist relay_from_hosts = <; @[] ; 66.230.200.0/24 ; 10.0.0.0/16

The rest of the configuration file uses the incoming interface address to distinguish wiki mail from regular mail. Therefore care must be taken that external hosts cannot connect using this interface address. An SMTP connect ACL takes care of this:

# Policy control
acl_smtp_connect = acl_check_connect
acl_check_connect:
       # Deny external connections to the internal bulk mail submission
       # interface

       deny condition = ${if match_ip{$interface_address}{WIKI_INTERFACE}{true}{false}}
            ! hosts = +wikimedia_nets

       accept
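
The decision this ACL makes can be sketched in Python as follows. This is only a model for reasoning about the rule, not part of the configuration. The +wikimedia_nets host list is not shown on this page, so the relay ranges quoted above stand in for it here, which is an assumption; connection_allowed is a hypothetical name.

```python
import ipaddress

WIKI_INTERFACE = ipaddress.ip_address("66.230.200.216")

# Assumption: stand-in for the +wikimedia_nets host list, which is defined
# elsewhere in the real configuration.
WIKIMEDIA_NETS = [ipaddress.ip_network(net)
                  for net in ("66.230.200.0/24", "10.0.0.0/16")]


def connection_allowed(interface_address, client_address):
    """Model of acl_check_connect: deny outsiders on the wiki mail interface."""
    interface = ipaddress.ip_address(interface_address)
    client = ipaddress.ip_address(client_address)
    internal = any(client in net for net in WIKIMEDIA_NETS)
    if interface == WIKI_INTERFACE and not internal:
        return False  # the "deny" statement
    return True       # the final "accept"
```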

Wiki mail gets picked up by the first router, selecting on incoming interface address and a specific header inserted by MediaWiki:

# Route mail generated by MediaWiki differently

wiki_mail:
       driver = dnslookup
       domains = ! +local_domains
       condition = ${if and{{match_ip{$interface_address}{WIKI_INTERFACE}}{eqi{$header_X-Mailer:}{MediaWiki mailer}}}}
       errors_to = wiki@wikimedia.org
       transport = bulk_smtp
       ignore_target_hosts = <; 0.0.0.0 ; 127.0.0.0/8 ; 0::0/0 ; 10/8 ; 172.16/12 ; 192.168/16
       no_verify

The router directs to a separate SMTP transport, bulk_smtp. no_verify is set because mail from the application servers is not verified anyway: the relay should be as liberal as possible with incoming wiki mail, so that the queues on the application servers stay small. Queue handling should be done on the mail relay. The router does not apply to other mail, so it is not needed for verification there either.

The envelope sender is forced to wiki@wikimedia.org (through errors_to), as it may have been set to something else by sSMTP.

The bulk_smtp transport sets a different outgoing interface IP address, and a separate HELO string:

# Transport for sending out automated bulk (wiki) mail

bulk_smtp:
       driver = smtp
       hosts_avoid_tls = <; 0.0.0.0/0 ; 0::0/0
       interface = WIKI_INTERFACE
       helo_data = wiki-mail.wikimedia.org

Wiki mail also has a shorter retry/bounce time than regular mail; only 8 hours:

begin retry

*       *       senders=wiki@wikimedia.org      F,1h,15m; G,8h,1h,1.5
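
To see what this rule means in practice, the following Python sketch lists the approximate retry times it produces. It is a simplified model under the usual reading of Exim retry rules (fixed 15 minute intervals up to 1 hour, then geometric intervals starting at 1 hour and growing by a factor of 1.5 up to 8 hours). It ignores queue runner timing, so real retries happen somewhat later. The function name and the tuple layout are assumptions for the example.

```python
def retry_schedule(rules):
    """Return retry times in minutes after the first failure.

    rules is a list of (kind, cutoff_minutes, interval_minutes, factor),
    where kind is "F" (fixed) or "G" (geometric).
    """
    elapsed = 0.0
    times = []
    for kind, cutoff, interval, factor in rules:
        step = float(interval)
        while elapsed + step <= cutoff:
            elapsed += step
            times.append(elapsed)
            if kind == "G":
                step *= factor  # geometric rules grow the interval each time
    return times


# F,1h,15m; G,8h,1h,1.5
WIKI_MAIL_RULES = [("F", 60, 15, 1.0), ("G", 480, 60, 1.5)]
print(retry_schedule(WIKI_MAIL_RULES))
# [15.0, 30.0, 45.0, 60.0, 120.0, 210.0, 345.0]
```

In this model the last retry falls a little under 6 hours after the first failure; the next geometric step would pass the 8 hour cutoff, after which the message is bounced.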

Postmaster

For any local domain, postmaster@ should be accepted even if it's forgotten in alias files. A special redirect router takes care of this:

# Redirect postmaster@$domain if it hasn't been accepted before

postmaster:
       driver = redirect
       domains = +local_domains
       local_parts = postmaster
       data = postmaster@$primary_hostname
       cannot_route_message = Address $local_part@$domain does not exist

Internal address rewriting

Internal servers in the .pmtpa.wmnet domain sometimes send out mail, which gets rejected by mail servers in the outside world. Sender domain address verification cannot resolve the domain .pmtpa.wmnet, and the mail gets rejected. To solve this, mchenry rewrites the Envelope From to root@wikimedia.org for any mail that has a .pmtpa.wmnet sender address:

#################
# Rewrite rules #
################# 

begin rewrite

# Rewrite the envelope From for mails from internal servers in *.pmtpa.wmnet,
# as they are usually rejected by sender domain address verification.
*@*.pmtpa.wmnet root@wikimedia.org      F

Secondary mail relay

lily is Wikimedia's secondary mail relay. It should do the same policy checks on incoming mail as the primary mail relay, so make sure its ACLs are equivalent for the relevant domains.

Lily does not have a copy/cache of the local parts which are accepted by the primary relay, as that is a dynamic process. Instead, it uses recipient address verification callouts, i.e. it asks the primary mail relay whether a recipient address would be accepted or not. In case the primary mail relay is unreachable, or does not respond within 5-30s, the address is assumed to exist and the mail is accepted - it is, after all, a backup MX. Callouts are cached, so resources are saved for frequently appearing destination addresses.
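
The callout behaviour described above can be summarised with a small Python sketch. It is not how Exim implements callouts; ask_primary is a hypothetical callable standing in for the SMTP conversation with the primary relay, and the cache here is a plain dictionary without the expiry that a real callout cache has.

```python
def accept_recipient(address, ask_primary, cache):
    """Decide whether the secondary relay should accept mail for address.

    ask_primary(address) returns True or False, or raises TimeoutError or
    ConnectionError when the primary relay cannot be reached in time.
    """
    if address in cache:
        return cache[address]
    try:
        verdict = ask_primary(address)
    except (TimeoutError, ConnectionError):
        # Backup MX behaviour: when in doubt, accept. The result is not
        # cached, so the primary is asked again next time.
        return True
    cache[address] = verdict
    return verdict
```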

Relay domains

Secondary mail relays will relay for any domain for which the following holds:

  1. The domain is listed in a static text file of domains: /etc/exim4/relay_domains, or
  2. The secondary mail relay is listed as a secondary MX in DNS for the domain, and
  3. The higher priority MXes are in a configured list of allowed primaries

The latter is to prevent abuse; we don't really want people with control over a DNS zone abusing our mail servers as backup MXes.

Near the top of the configuration file, two domain lists are defined for domains to relay for:

domainlist relay_domains = lsearch;CONFDIR/relay_domains
domainlist secondary_domains = @mx_secondary/ignore=127.0.0.1

relay_domains contains domains explicitly listed in the text file /etc/exim4/relay_domains, and secondary_domains queries DNS whether the local host is listed as a secondary MX. Note: the two lists will usually overlap.

A host list is defined with the accepted primary mail relays. This list should contain only IP addresses; these are the only addresses to which @mx_secondary domains will be relayed. For domains explicitly configured in relay_domains, it doesn't matter what the primary MX is.

@mx_secondary domains use a separate dnslookup router, to check the higher priority MX records:

# Relay @mx_secondary domains only to these hosts
hostlist primary_mx = 66.230.200.240
# Route relay domains only if the higher prio MXes are in the allowed list

secondary:
       driver = dnslookup
       domains = ! +relay_domains : +secondary_domains
       transport = remote_smtp
       ignore_target_hosts = ! +primary_mx
       cannot_route_message = Primary MX(s) for $domain not in the allowed list
       no_more

All relevant (= higher priority) MX records not in hostlist primary_mx are removed from the list for consideration by Exim. In case there are no higher priority MX records which coincide with the primary_mx list, the MX list will be empty and the router will decline. As this router is run during address verification in the SMTP session as well, the RCPT command will be rejected.
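
The filtering described in the previous paragraph can be modelled roughly as follows. This Python sketch is an illustration only: it assumes the MX records have already been resolved to (preference, IP address) pairs, and the function name secondary_targets is made up for the example.

```python
def secondary_targets(mx_records, local_ip, primary_mx):
    """Return the hosts the secondary router may relay to, best MX first.

    mx_records: list of (preference, ip) pairs for the domain.
    An empty result means the router declines and RCPT is rejected.
    """
    own = [pref for pref, ip in mx_records if ip == local_ip]
    if not own:
        return []  # the local host is not an MX for this domain at all
    my_pref = min(own)
    # Only MX records with a lower preference number (higher priority) matter.
    higher = sorted((pref, ip) for pref, ip in mx_records if pref < my_pref)
    return [ip for pref, ip in higher if ip in primary_mx]
```

With primary_mx set to {"66.230.200.240"}, a domain whose primary MX is 66.230.200.240 yields that address, while a domain that lists the relay as backup for some unrelated primary yields an empty list.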

Exim's dnslookup router has a precondition check check_secondary_mx. However, the secondary_domains domainlist serves the same purpose, and using both at the same time in fact doesn't work, as by the time the check_secondary_mx check is run, Exim will already have removed the local host from the MX list (due to ignore_target_hosts), and the router will decline to run.

Note: this router should not be run for domains in domainlist relay_domains, as for those domains the MX rules need not be as stringent. They can be handled by the regular dnslookup router:

# Route non-local domains (including +relay_domains) via DNS MX and A records

dnslookup:
       driver = dnslookup
       domains = ! +local_domains
       transport = remote_smtp
       ignore_target_hosts = <; 0.0.0.0 ; 127.0.0.0/8 ; 10/8 ; 172.16/12 ; 192.168/16
       cannot_route_message = Cannot route to remote domain $domain
       no_more

IMAP server

The IMAP server is sanger. It only receives e-mail destined for its IMAP accounts; other mail is handled by mchenry. Outgoing mail is not sent directly, but routed via the mail relays, so the IMAP server should never build up a large mail queue itself.

Mail storage uses a single system user account vmail, which has been created with the command

# adduser --system --home /var/vmail --no-create-home --group --disabled-password --disabled-login vmail

Mail is stored under the directory /var/vmail, which should be created with the correct permissions:

# mkdir /var/vmail
# chown root:vmail /var/vmail
# chmod g+s /var/vmail

User Debian-exim needs to be part of the vmail group to access the mail directories:

# gpasswd -a Debian-exim vmail

TLS support

For SMTP mail submissions we require authentication over TLS/SSL. To make Exim support server-side TLS connections, an SSL certificate and private key need to be installed. In the main configuration file, set the following two configuration settings:

tls_certificate = /etc/ssl/certs/wikimedia.org.pem
tls_privatekey = /etc/ssl/private/wikimedia.org.key

The private key file should have file permissions set as restricted as possible, but Exim (running as user Debian-exim) should be able to read it. Therefore Debian-exim has been added to the ssl-cert group.

To advertise TLS to all connecting hosts, use:

tls_advertise_hosts = *

To start TLS by default on the SMTPS port, set:

tls_on_connect_ports = 465

There can be a problem with draining the random entropy pool on servers that are not very busy. Exim in Debian/Ubuntu is linked against GnuTLS instead of OpenSSL, and uses /dev/random. When it tries to regenerate the gnutls-params Diffie-Hellman parameters file, it can block because the entropy pool is empty, delaying all mail until more entropy is available. To avoid this, make sure that the Exim cron job can regenerate the parameters file outside Exim, using the certtool command:

# apt-get install gnutls-bin

Local mail submissions

There is a problem with mail submitted through the IMAP server with destinations that are local. All aliasing happens on the mail relay, so a mail to a mail address that exists as a local IMAP account would just be delivered locally, and never go to any of the aliases that might exist for the same mail address on the mail relay. Therefore we force all local mail submissions (recognizable by $received_protocol matching /e?smtpsa$/) to go via the mail relay(s). All routers that might handle such an address locally get an extra condition:

condition = ${if !match{$received_protocol}{\Nsmtpsa$\N}}

Because this condition is used identically on multiple routers, it's been defined as a macro NOT_LOCALLY_SUBMITTED at the top of the configuration file:

NOT_LOCALLY_SUBMITTED=${if !match{$received_protocol}{\Nsmtpsa$\N}}

Routers can thus use:

condition = NOT_LOCALLY_SUBMITTED
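
The effect of the macro can be checked with an equivalent regular expression in Python. This is only a sketch to show which values of $received_protocol count as local submissions; the function name is hypothetical.

```python
import re


def not_locally_submitted(received_protocol):
    """Python equivalent of the NOT_LOCALLY_SUBMITTED condition."""
    return re.search(r"smtpsa$", received_protocol) is None


for protocol in ("esmtpsa", "smtpsa", "esmtps", "esmtp", "local"):
    print(protocol, not_locally_submitted(protocol))
```

Only the authenticated TLS protocols (smtpsa and esmtpsa) match the pattern, so only mail received that way is forced through the mail relays.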

User filters

The second router, after system_aliases, applies only to IMAP accounts. It checks whether an IMAP account exists with the specified mail address, and whether that account has a custom user filter. A user filter can be an Exim filter, or a Sieve filter, and is meant to provide more or less the same functionality as procmail filters, i.e. sorting out mail into subfolders, rejecting based on certain criteria and the like.

User filters are loaded as text BLOBs into the account database, and can be changed using wmfmailadmin. If an account's filter field is set to NULL, the Exim setup will revert to a default filter, loaded from the file /etc/exim4/default_user_filter:

# Exim filter

if $h_X-Spam-Score matches "\\N\(\\+{5,}\)\\N" then
    save .Junk/
endif

This filter simply sorts mails classified by SpamAssassin as spam (score 5.0 or higher), into the Junk folder.
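
The pattern in the default filter looks for five or more plus signs in parentheses. The Python sketch below applies the same pattern; it assumes that the X-Spam-Score header has a form such as "5.3 (+++++)", which is not shown on this page.

```python
import re

# Same pattern as in the default filter: "(" + at least five "+" + ")".
SPAM_BAR = re.compile(r"\(\+{5,}\)")


def goes_to_junk(x_spam_score):
    """True if the (assumed) header value marks the mail as spam."""
    return SPAM_BAR.search(x_spam_score) is not None
```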

The router has some extra filter options set to deny the usage of certain functionality in filters that might compromise system security.

# Run a custom user filter, e.g. to sort mail into subfolders
# By default Exim filter CONFDIR/default_user_filter is run,
# which sorts mail classified spam into the Junk folder

user_filter:
       driver = redirect
       domains = +local_domains
       condition = NOT_LOCALLY_SUBMITTED
       router_home_directory = VMAIL/$domain/$local_part
       address_data = ${lookup sqlite{USERDB \
               SELECT id, filter NOTNULL AS hasfilter \
               FROM account \
               WHERE localpart='${quote_sqlite:$local_part}' \
                       AND domain='${quote_sqlite:$domain}' \
                       AND active='1'}{$value}fail}
       data = ${if eq{${extract{hasfilter}{$address_data}}}{1}{ \
               ${lookup sqlite{USERDB \
               SELECT filter \
               FROM account \
               WHERE id='${quote_sqlite:${extract{id}{$address_data}}}'}}} \
               {${readfile{CONFDIR/default_user_filter}}}}
       allow_filter
       forbid_filter_dlfunc
       forbid_filter_existstest
       forbid_filter_logwrite
       forbid_filter_lookup
       forbid_filter_perl
       forbid_filter_readfile
       forbid_filter_readsocket
       forbid_filter_run
       forbid_include
       forbid_pipe
       user = vmail
       group = vmail
       directory_transport = maildir_delivery
       # added for autoreply support
       reply_transport = reply_transport
       no_verify

The address_data query checks whether a matching account exists and is active. If it is, the id of the account will be stored in $address_data, along with a boolean value that represents the existence of a custom filter in the account. If the query fails because no matching account is found, the string expansion is forced to fail and the user_filter router is skipped. In the data query, the data previously looked up and stored in $address_data is used. If a custom filter exists for the account, it's looked up in an SQL query. Otherwise the default user filter file is read.
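
The two-step lookup can be tried outside Exim with Python's sqlite3 module. The sketch below is an approximation: the table layout in make_demo_db is an assumption based only on the column names used in the queries above, and select_filter is a hypothetical helper, not part of wmfmailadmin.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory stand-in for the account database (assumed schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, localpart TEXT, "
               "domain TEXT, active INTEGER, filter TEXT)")
    db.executemany(
        "INSERT INTO account (localpart, domain, active, filter) VALUES (?, ?, ?, ?)",
        [("alice", "wikimedia.org", 1, "# Exim filter\nsave .Lists/"),
         ("bob", "wikimedia.org", 1, None),
         ("carol", "wikimedia.org", 0, None)])
    return db


def select_filter(db, localpart, domain, default_filter):
    """Return the filter text to run, or None if the router would be skipped."""
    # Step 1 (address_data): is there an active account, and has it a filter?
    row = db.execute(
        "SELECT id, filter NOTNULL AS hasfilter FROM account "
        "WHERE localpart = ? AND domain = ? AND active = 1",
        (localpart, domain)).fetchone()
    if row is None:
        return None
    account_id, hasfilter = row
    # Step 2 (data): custom filter from the database, else the default file.
    if hasfilter:
        return db.execute("SELECT filter FROM account WHERE id = ?",
                          (account_id,)).fetchone()[0]
    return default_filter
```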

If the filter chooses to decline handling the mail, e.g. because no special action is required (it's not spam), then control is passed to the next router which will handle a normal INBOX Maildir delivery.

IMAP delivery

The next router handles delivery to local mail boxes. If a given mail address exists in the SQLite database, it's handed to the maildir_delivery transport:

# Delivery to a Maildir mail box.

local_user:
       driver = accept
       domains = +local_domains
       condition = NOT_LOCALLY_SUBMITTED
       local_part_suffix = +*
       local_part_suffix_optional
       address_data = ${lookup sqlite{USERDB \
               SELECT id, quota \
               FROM account \
               WHERE localpart='${quote_sqlite:$local_part}' \
                       AND domain='${quote_sqlite:$domain}' \
                       AND active='1'}{$value}fail}
       transport = maildir_delivery
       transport_home_directory = VMAIL/$domain/$local_part
       transport_current_directory = VMAIL

The local_part_suffix options allow an optional suffix on the local part, e.g. mark+something@.

This router is accompanied by the maildir_delivery appendfile transport, which delivers a message to a Maildir mail box:

# Exim appendfile transport for Maildir delivery

maildir_delivery:
       driver = appendfile
       maildir_format
       directory = ${if def:address_file{$address_file}{$home}}
       create_directory
       create_file = belowhome
       delivery_date_add
       envelope_to_add
       return_path_add
       user = vmail
       group = vmail
...

The transport only delivers to Maildir directories (maildir_format), determined by the directory parameter: if $address_file is defined, because it's been set by the user_filter router, then the path in that variable is used. Otherwise it uses the (transport) home directory as set by the local_user router.

If a (sub folder or top level) Maildir directory does not exist yet, it's created by Exim given that it's in or below the specified home directory (create_directory, create_file). The headers Delivery-date, Envelope-to and Return-path are added to the message before delivery. The delivery process runs as uid/gid vmail.

The second part of the transport implements quota support:

...
       # Quota support
       quota = ${if !eq{$received_protocol}{local}{${extract{quota}{$address_data}{${value}K}{0}}}}
       quota_is_inclusive = false
       quota_warn_threshold = 100%
       quota_warn_message = ${expand:${readfile{CONFDIR/quota_warn_message}}}
       maildir_use_size_file
       maildir_quota_directory_regex = ^(?:cur|new|\.(?!Trash).*)$
       maildir_tag = ,S=$message_size

The quota limit is stored in the quota column of the SQLite database, in kilobytes (this is enforced by the Dovecot plugins). The earlier routers store the limit, if any, as a keyed field in the $address_data variable, so the transport can extract it. This is only done for messages that do not have protocol local. System warnings such as those generated by Exim use protocol local; they therefore get a quota limit of 0 and are allowed through regardless.

The quota enforced is not inclusive (quota_is_inclusive), which means that the quota limit is only enforced after it has been exceeded. This is because otherwise a confusing situation could arise where big messages cannot be delivered because they would exceed the total mailbox quota size, whereas smaller messages would be let through. This behaviour is a little more consistent with what the user would expect.
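
The difference between inclusive and non-inclusive quota checking can be illustrated with a short Python sketch. It is a simplified model of the decision, with all sizes in kilobytes; delivery_allowed is a hypothetical name and the numbers in the example are made up.

```python
def delivery_allowed(mailbox_kb, message_kb, quota_kb, protocol, inclusive=False):
    """Model of the quota check in the maildir_delivery transport."""
    # Messages with protocol "local" get a quota of 0, which means no limit.
    if protocol == "local" or quota_kb <= 0:
        return True
    # An inclusive check counts the message being delivered; the
    # non-inclusive check used here only looks at the current mailbox size.
    size = mailbox_kb + (message_kb if inclusive else 0)
    return size <= quota_kb


# A 500 KB message for a mailbox holding 900 KB with a 1000 KB quota:
print(delivery_allowed(900, 500, 1000, "esmtp"))                  # True
print(delivery_allowed(900, 500, 1000, "esmtp", inclusive=True))  # False
```

After that delivery the mailbox holds 1400 KB, so the next message from outside is refused until the user cleans up, regardless of its size.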

Once the user fully exceeds the quota limit (quota_warn_threshold), a warning message as specified in the file /etc/exim4/quota_warn_message is sent to tell the user to clean up (quota_warn_message).

Some Maildir++ extensions are used: Exim uses a maildirsize file in the Maildir to keep track of the total size of the mail box more efficiently than by doing a stat() on all files (maildir_use_size_file). Also, the size of each message is appended as a suffix to its Maildir filename (maildir_tag), so both Exim and Dovecot can again avoid a stat(); a readdir() is enough. Because this is a black box mail system, this poses no security problems.

The Trash folder is exempted from the quota calculation, as counting it could cause problems when the user actually wants to clean up the mail box.
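
Which directories count towards the quota follows from maildir_quota_directory_regex. The same pattern can be tried in Python, whose re module supports the negative lookahead used here; the directory names below are examples only.

```python
import re

# Same pattern as maildir_quota_directory_regex in the transport.
QUOTA_DIR = re.compile(r"^(?:cur|new|\.(?!Trash).*)$")

for name in ("cur", "new", "tmp", ".Sent", ".Junk", ".Trash", ".Trash.2008"):
    print(name, bool(QUOTA_DIR.match(name)))
```

cur, new and ordinary subfolders such as .Sent match and are counted; tmp, .Trash and any folder whose name starts with .Trash are not.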

A small problem exists with the exemption of protocol local messages: in that case the quota is set to 0, which also makes Exim write this quota to the maildirsize file, until the next non-local message is delivered. However, Dovecot doesn't read the quota from this file but also retrieves it directly from the database, so this is not likely to cause any problems.

User left

In some circumstances we want to provide an automatic reply (bounce) to mails for accounts of users that have left the organization. This is implemented using an accept router and an autoreply transport.

The router simply accepts the message if it's for a local domain, and was not submitted locally, and hands it over to the left_message transport:

# Bounce/auto-reply messages for users that have left

user_left:
       driver = accept
       domains = +local_domains
       condition = NOT_LOCALLY_SUBMITTED
       require_files = CONFDIR/userleft/$domain/$local_part
       transport = left_message

The transport takes the message, and wraps it in a new bounce-style message, using the expanded template file /etc/exim4/userleft/$domain/$local_part.

# Autoreply bounce transport for users that have left the organization

left_message:
       driver = autoreply
       file = CONFDIR/userleft/$domain/$local_part
       file_expand
       return_message
       from = Wikimedia Foundation <postmaster@wikimedia.org>
       to = $sender_address
       reply_to = office@wikimedia.org
       subject = User ${quote_local_part:$local_part}@$domain has left the organization: returning message to sender

So, for each user that leaves the organization, the corresponding account must be set to inactive (not deleted!), and a file /etc/exim4/userleft/$domain/$local_part must be created. An example template file is available in /etc/exim4/userleft/TEMPLATE.

Vacation auto-reply

In order to enable the auto-reply feature, a transport needs to be defined:

reply_transport:
	driver = autoreply

Smart host

The last Exim router in the configuration file handles (outgoing) mail not destined for the local server; it sends mail for all domains to mchenry.wikimedia.org, or lists.wikimedia.org if the former is down.

# Send all mail not destined for the local machine via a set of
# mail relays ("smart hosts")

smart_route:
       driver = manualroute
       transport = remote_smtp
       route_list = *  mchenry.wikimedia.org:lists.wikimedia.org

SMTP authentication

We want our IMAP account users to be able to send mail through our mail servers from wherever they are, regardless of the network they are on. Therefore we use SMTP authentication, which is supported by most modern mail clients. TLS is used and enforced to encrypt the connection, so the password cannot be sniffed on the wire.

In Exim, SMTP authentication is controlled through the authenticators in the equally named section of the configuration file. The plaintext driver can take care of both the PLAIN and the LOGIN authentication standards.

# PLAIN authenticator
# Expects the password field to contain a "LDAP format" hash. Only
# (unsalted) {md5}, {sha1}, {crypt} and {crypt16} are supported.

plain:
       driver = plaintext
       public_name = PLAIN
       server_prompts = :
       server_condition = ${lookup sqlite{USERDB \
               SELECT password \
               FROM account \
               WHERE localpart||'@'||domain='${quote_sqlite:$auth2}' \
                       AND active='1'} \
               {${if crypteq{$auth3}{$value}}}{false}}
       server_set_id = $auth2
       server_advertise_condition = ${if def:tls_cipher}

With the PLAIN mechanism, three parameters ($auth1, $auth2, $auth3) are expected from the client. The first one should be empty, the second one should contain the username, the third one the plaintext password.

server_condition is a string expansion that should return either true or false, depending on whether the username and password can be verified to match with those in the user database. It does a SQL lookup in the SQLite database. If the lookup fails, false is returned. If the lookup succeeds, the password is matched using Exim's crypteq function, which supports the crypt, crypt16, md5 and sha1 hashes. The type of hash is expected to be prepended to the hash in curly brackets, e.g. "{SHA1}" - a format which Dovecot also uses.

Unfortunately none of the salted password hash schemes could be used, as for all commonly used formats, either Exim or Dovecot didn't support it. This can be remedied in the future, either by using the Dovecot authenticator in Exim 4.64, or by adding a base64 decoder to Exim's string expansion functions.

The server_set_id is set to the given username, and is the id used by Exim to identify this authenticated connection (for example, in log lines).

server_advertise_condition controls when the SMTP AUTH feature is advertised to connecting hosts in the EHLO reply. This is only done when a TLS encrypted connection has already been established, and thus $tls_cipher will be non-empty. Exim automatically refuses AUTH commands if the AUTH feature had not been advertised.
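
To make the PLAIN exchange and the hash comparison more concrete, here is a Python sketch of both steps. It is not Exim's implementation: it only handles unsalted {md5} and {sha1} values (hex or base64 encoded), leaves out {crypt} and {crypt16}, and the function names are assumptions.

```python
import base64
import hashlib


def parse_plain(response_b64):
    """Split a base64 PLAIN response into ($auth1, $auth2, $auth3)."""
    authzid, username, password = base64.b64decode(response_b64).split(b"\0")
    return authzid.decode(), username.decode(), password.decode()


def check_password(password, stored):
    """Compare a plaintext password with a '{md5}...' or '{sha1}...' hash."""
    scheme, sep, digest = stored.partition("}")
    algorithm = scheme[1:].lower()
    if not (scheme.startswith("{") and sep) or algorithm not in ("md5", "sha1"):
        return False  # unsupported or malformed hash type
    raw = hashlib.new(algorithm, password.encode()).digest()
    # Accept either the base64 or the hexadecimal encoding of the digest.
    return digest == base64.b64encode(raw).decode() or digest.lower() == raw.hex()
```

A client authenticating as mark@wikimedia.org sends the base64 encoding of an empty field, the username and the password, separated by NUL bytes; parse_plain recovers the three values and check_password plays the role of crypteq.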

Dovecot deliver

Dovecot deliver is no longer used. Exim's own Maildir delivery transport is used instead, because it allows more flexibility with quota and subfolder filtering.

The Dovecot configuration file path is /etc/dovecot/dovecot.conf. The Dovecot LDA needs to be able to read it while running under uid vmail, so the default file permissions are changed:

# chgrp vmail /etc/dovecot/dovecot.conf
# chmod g+r /etc/dovecot/dovecot.conf

If deliver is given a -d username argument, it will attempt an auth DB lookup, which is unnecessary as Exim can provide it with all relevant information. Therefore this argument should not be used.

The postmaster_address option needs to be set for deliver to work:

protocol lda {
  # Address to use when sending rejection mails.
  postmaster_address = postmaster@wikimedia.org
...

deliver needs to know where, and in what format to store mail. As it only has the home directory to work with, use that:

  # Deliver doesn't have username / address info but receives the home
  # directory from Exim in $HOME
  mail_location = maildir:%h

As the LDA is run under the restricted uid/gid vmail, it can't log to Dovecot's default log files without root permissions, so a separate log file is used:

...
  log_path = /var/log/dovecot-deliver.log
  info_log_path = /var/log/dovecot-deliver.log
}

User database syncing

In order to know what accounts exist on the IMAP server, the primary mail relay mchenry must have a (partial) copy of the accounts database. The SQLite database on sanger is rsynced every 15 minutes to mchenry by the cron job /etc/cron.d/rsync-userdb:

*/15 * * * *    root    rsync -a /var/vmaildb/ mchenry-rsync:/var/vmaildb

The relevant ssh keys are in /root/.ssh/rsync, and are set up in /root/.ssh/config.

Mail box cleanup

Mail boxes are automatically moved out of the way (daily) once an account ceases to exist completely in the account database. To handle this, a small script has been written, mbcleanup.py, available in SVN in the wmfmailadmin directory. It is run daily from /etc/cron.daily/mailbox-cleanup. It takes three arguments: the account database path, the mailboxes root path and the backup root path, in that order. From the account database it pulls a list of all existing accounts, and compares this with the set of mail boxes it finds in the two-level directory structure under the mailboxes root path (ignoring dot-directories and permission-denied errors). Superfluous mailboxes are then moved to the backup directory with a timestamp appended.
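
The comparison step can be sketched in Python as below. This is not the actual mbcleanup.py, only an illustration of the logic described above; it assumes the two directory levels are domain and local part (as in VMAIL/$domain/$local_part), and it only reports superfluous mail boxes instead of moving them.

```python
from pathlib import Path


def superfluous_mailboxes(accounts, mailbox_root):
    """Return (domain, localpart) pairs that have a mail box but no account.

    accounts: iterable of (domain, localpart) pairs from the account database.
    """
    found = set()
    for domain_dir in Path(mailbox_root).iterdir():
        if domain_dir.name.startswith(".") or not domain_dir.is_dir():
            continue  # ignore dot-directories and stray files
        try:
            entries = list(domain_dir.iterdir())
        except PermissionError:
            continue  # ignore directories that cannot be read
        for box in entries:
            if box.is_dir() and not box.name.startswith("."):
                found.add((domain_dir.name, box.name))
    return sorted(found - set(accounts))
```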

See also

  • Mailing lists for the setup of the mailing lists server.
  • Dovecot for a detailed setup of the IMAP server.
  • OTRS for specifics on the OTRS setup.

Puppet configuration

Troubleshooting

"Exim SMTP" Alerts

This checks the availability of the SMTP service (on TCP port 25) and the validity of the certificate on a given host.

Alert troubleshooting tips:

  • Ensure that Exim is running
  • Check Exim logs (in particular /var/log/exim4/paniclog) for signs of distress
  • Ensure that the certificate served by Exim on port 25 has not expired, or been revoked.
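
To check the certificate by hand, something like the following Python sketch can be used. It is not the monitoring check itself; the function names are made up, and it assumes the host offers STARTTLS on port 25 with a certificate that validates against the system trust store (otherwise the connection fails with an error, which is informative in its own right).

```python
import smtplib
import ssl
import time


def days_left(not_after, now=None):
    """Days until a certificate notAfter time such as 'Jan  5 09:34:43 2018 GMT'."""
    if now is None:
        now = time.time()
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def smtp_cert_not_after(host, port=25, timeout=10):
    """Connect to an SMTP server, start TLS and return the certificate notAfter field."""
    context = ssl.create_default_context()
    with smtplib.SMTP(host, port, timeout=timeout) as smtp:
        smtp.starttls(context=context)
        return smtp.sock.getpeercert()["notAfter"]
```

For example, days_left(smtp_cert_not_after("mx1001.wikimedia.org")) gives the number of days before the certificate served by that host expires.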

"Exim queue" Alerts

This checks the number of messages that currently can not be delivered and have been queued for later delivery.

Alert troubleshooting tips:

Review the mail queue and logs looking for a common cause of deferred messages. Frequent causes for deferred messages are:

  • User(s) with problems on their delivery server (full inbox, deleted account, etc.)
  • A remote mail system is down (e.g. mail to example.org is deferred because of an outage that example.org is working to fix)
  • DNS blocklist. Mail from our system is being blocked/deferred due to one of our mail server IPs being listed on an RBL.

Often the queue alerts are an early warning, or an effect of a service interruption outside our control, and will resolve by themselves in time. Still, it's important to confirm the issue is not within our own infrastructure.

External documentation