Overview of Wikimedia Mail
This section provides a high-level overview of how mail at Wikimedia is handled. More detail on each of these topics can be found in the rest of this page.
Product mail
Mail that is produced by MediaWiki. Please fill in this section.
Lists
All mail to @lists.wikimedia.org is handled by Mailman running on https://lists.wikimedia.org/. Public archived lists, private archived lists, and private unarchived lists are located there. There's some sort of synchronization to gmane.org.
Foundation mail
Mail for employees etc. All mail to @wikimedia.org domains is delivered to mx1001.wikimedia.org/mx2001.wikimedia.org (the MX record for wikimedia.org). From there, a few things happen:
- The recipient is checked against aliases in /etc/exim4/aliases/wikimedia.org. These are mostly ops-related (e.g., noc@) or older aliases. New aliases for ops should go here; new aliases for the rest of the organization should be created in Google.
- The recipient is checked against LDAP. LDAP contains a few different address types:
- most employees' mail is forwarded to user accounts at Google;
- newer aliases are forwarded to Google Groups;
- some legacy mailing lists are forwarded to @lists.wikimedia.org;
- some employees are identified as IMAP accounts and their mail is forwarded to sanger.
- There's a SQLite database that contains filters and addresses to forward to sanger.
- Use wmfmailadmin on sanger to modify these filters and such; it's rsynced back to mchenry.
- Mail for Request Tracker is forwarded to streber.
HowTo
This section lists some commonly needed actions and how to perform them.
Modify aliases
Right now mx1001 is the mail relay; it also handles all ops-related aliasing and some older aliases for the rest of the foundation. Each domain has its own alias file in /etc/exim4/aliases/, maintained in the puppet private repo.
New ops-oriented aliases (e.g., noc@, etc.) should be created in alias files, maintained in the puppet private repo. New requests for general foundation-related aliases should be redirected to OIT and be created as a Google Group.
To add/modify/remove an alias, simply edit the corresponding text file on polonium, e.g., /etc/exim4/aliases/wikimedia.org. No additional steps are necessary.
To use the same aliases for multiple domains, you can use symbolic links. Be careful, however: unqualified targets (i.e., mail addresses without a domain, like noc) that are not listed in the same alias file (for example, OTRS queues) may not work, as they do not exist in the symbolically linked domain. Use fully qualified addresses in that case.
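For illustration, an alias file shared between two domains via a symlink might contain entries like these (the addresses and targets are hypothetical):

```
# /etc/exim4/aliases/wikimedia.org (symlinked from e.g. /etc/exim4/aliases/wikipedia.org)
noc: ops-team@wikimedia.org      # fully qualified: safe in both domains
hostmaster: noc                  # unqualified: qualified per-domain, may not exist in the linked domain
```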
IMAP account management
Adding / removing a VRT System queue and mail addresses
Just add the queue in the VRT System with the appropriate mail addresses and be happy. Exim will automatically detect that the queue exists or has disappeared, and no involvement from Wikimedia admins is necessary.
Under some circumstances it's possible that, due to negative caching at the secondary MXes, a new mail address will only start working after up to two hours.
Adding / removing mail domains
Set up DNS MX records with mchenry.wikimedia.org as the primary MX, and lists.wikimedia.org as secondary, and things should already start to work. You'll probably want to add an alias file on the primary mail relay though, or no mail will be accepted.
If you don't want to rely on DNS MX records alone, you can also add the domain to the file /etc/exim4/local_domains on the primary mail relay, and /etc/exim4/relay_domains on the secondary mail relays, but this is not a requirement.
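For example, for a hypothetical new domain newdomain.example.org, that would amount to adding entries like these:

```
# /etc/exim4/local_domains on the primary mail relay
newdomain.example.org

# /etc/exim4/relay_domains on each secondary mail relay
newdomain.example.org
```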
Searching the logs
Exim 4's main log file is /var/log/exim4/mainlog. Using exigrep instead of grep may be helpful, as it combines (scattered) log lines per mail transaction.
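A typical invocation is exigrep pattern /var/log/exim4/mainlog. To illustrate what the grouping buys you, here is a sketch against a fabricated log excerpt (the message ID, hosts, and addresses are made up): all log lines of one transaction share the same message ID, so grepping for that ID pulls together the receive (<=), delivery (=>) and completion lines, which is what exigrep automates for arbitrary patterns:

```shell
# Fabricated three-line transaction in Exim's mainlog format
cat > /tmp/mainlog.sample <<'EOF'
2007-06-01 10:00:01 1Habcd-000123-AB <= sender@example.org H=mail.example.org [192.0.2.1]
2007-06-01 10:00:02 1Habcd-000123-AB => recipient@wikimedia.org R=aliases T=remote_smtp
2007-06-01 10:00:02 1Habcd-000123-AB Completed
EOF
# All log lines of this one transaction share the message ID:
grep '1Habcd-000123-AB' /tmp/mainlog.sample
```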
Design decisions
- Reliable mail delivery first
- Spam filtering and other tricks are needed, but reliable mail delivery for genuine mail should have a higher priority. False positives in mail rejects and incompatibilities should be kept to a minimum.
- Black box mail system, no user shell logins
- Few users would make good use of this anyway. It greatly simplifies network and host security, and allows the use of some (non-critical) non-standardized extensions between software components for greater performance, interoperability and features, because the system doesn't have to support whatever shell users might install to access things directly.
- IMAP only, no POP3
- IMAP has good client support nowadays, and largely solves the problem of having multiple clients. Backups can also be done centrally on the server side, and multiple folders with server-side mail filtering might be supported.
- Support for mail submission
- Through SMTP authentication we can allow our users to submit mails through the mail server, without them having to configure an outgoing mail server for whatever network they reside on. Can support multiple ports/protocols to evade firewalls.
- SSL/TLS access only, no plain-text
- Although client support for this is not yet universal, especially on mobile devices, the risks of using plain-text protocols are too high, especially with users visiting conferences and other locations with insecure wireless networks.
- Quota support
- Quotas can be set generously, especially for those who need the space, but they should be implemented to protect the system.
- Spam and virus filtering
- Is unfortunately necessary. Whether this should be global or per-user is to be determined.
- Multi-domain support
- We have many domains, and the mail setup should be able to distinguish between domains where necessary.
- Web access
- Some form of web-mail would be nice, although not critical at first and can be implemented at later stages.
- Backups
- At least daily, with snapshots.
- Cold failover
- Setting up a completely redundant system is probably a bit overkill at this stage, but we should make it easy and quick to set up a new mail system on other hardware in case of major breakage.
- Documentation
- Although not every aspect of the software involved can be described, of course, the specifics of the Wikimedia setup should be properly documented, and HOWTOs for commonly needed tasks should be provided.
Software
- MTA
- Exim : Great flexibility, very configurable, reliable, secure.
- IMAP server
- Dovecot : Fast, secure, flexible.
Formats used
- Maildir
- Safe, convenient format, moderately good performance, good software support.
- Password and user databases
- sqlite - Indexed file format, powerful SQL queries, no full-blown RDBMS needed. Easy maintenance, good software support, replication support. Also easy to change to MySQL/PostgreSQL should that ever be necessary. Supported by both Exim and Dovecot.
- Other data lookups
- either flat-file for small lists, or cdb for larger, indexed lookups.
Mailbox storage and mail delivery
- Ext3 as file system
- ReiserFS may be a bit faster, but Ext3 is more reliable. Make sure directory indexes are enabled.
- LVM
- For easy resizing, moving of data to other disks, and snapshots for backups.
- RAID-1
- The new mail servers have hardware RAID controllers, we'll probably use them.
- Exim's own Maildir delivery
- Though Exim has a good internal Maildir "transport", Dovecot's "deliver" LDA could use and update the Dovecot-specific indexes for greater performance. However, that would restrict some of Exim's flexibility, because no appendfile features (quotas) could be used, forcing the use of their deliver counterparts. The performance benefits were only marginal anyway, due to Dovecot's use of dnotify, so Exim's own Maildir delivery is used.
- fcntl() and dot-file locking
- The greatest common denominator; compatible with everything.
- Maildir++ quotas
- Standard, reasonably fast.
Authentication
- PLAIN authentication
- Universally supported for both IMAP and SMTP. Encrypted connections are used exclusively, so no elaborate hashing schemes needed.
- SMD5 or SSHA password scheme
- Salted hashing.
- SMTP authentication through either Exim's Dovecot authenticator, or using direct lookups
- Exim 4.64 has support for directly authenticating against Dovecot's authenticator processes, though this version is not in Ubuntu Feisty yet, so needs backporting. If direct lookups from Exim's authenticators are easy enough, use that. Also depends on the security model.
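The SMD5/SSHA construction mentioned above can be sketched in shell: an SSHA entry is base64(SHA-1(password + salt) + salt), prefixed with {SSHA}. The password and 4-byte salt below are examples only; real entries use random salts:

```shell
pw='secret'
salt='ab12'   # example salt; in practice use random bytes, e.g. from /dev/urandom
# base64 of the SHA-1 digest of password+salt, with the salt appended
digest=$({ printf '%s%s' "$pw" "$salt" | openssl dgst -sha1 -binary; printf '%s' "$salt"; } | base64)
echo "{SSHA}$digest"
```

A verifier repeats the same computation with the stored salt and compares digests, which is why no reversible encryption is needed.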
Layout
The mail setup consists of 2 general mail servers, plus a mailing lists server and a VRT System server. The two general mail servers are mchenry and sanger.
One server (mchenry) acts as relay; it accepts mail connections from outside, checks them for spam, viruses and other policy checks, and then queues and/or forwards to the appropriate internal mail server. It also accepts mail destined for outside domains from internal servers, including the application servers.
The other server, sanger, is the IMAP server. It accepts mail from mchenry and delivers it to local user mailboxes. Outgoing mail from SMTP-authenticated accounts is also accepted on this server and forwarded to mchenry, where it's queued and sent out. Web mail and other supportive applications related to user mail accounts and their administration will also run on sanger.
Lily, the mailing lists server, also acts as a secondary MX and forwards non-mailing list mail to mchenry. In case of downtime of mchenry, it might be able to send partial (IMAP account) mail to sanger directly, depending on the added complexity of the configuration. During a major hardware failure of sanger, mchenry (with identical hardware) should be able to be set up as the IMAP server.
Configuration details
Account database
The user & password account database on the IMAP server is stored in a SQLite database. This format is fast and convenient to use, and can easily be moved to MySQL or PostgreSQL should that be necessary later.
Initially, the schema of this database is intentionally kept simple, because simplicity is good. We could extend it with many tables supporting domains, aliases, other data and do a whole lot of joins to make it work, but right now we don't need that. For example, aliases are simply kept in a text file on the primary mail relay, which works well. If we ever need more features, it'll be easy to adapt the schema to the new situation.
Schema
CREATE TABLE account (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    localpart VARCHAR(128) NOT NULL,
    domain VARCHAR(128) NOT NULL,
    password VARCHAR(64) NOT NULL,
    quota INTEGER DEFAULT '0' NOT NULL,
    realname VARCHAR(64) NULL,
    active BOOLEAN DEFAULT '1' NOT NULL,
    filter BLOB NULL,
    UNIQUE (localpart, domain)
);
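As a sketch of how tools interact with this schema, here is the same table created and queried with the sqlite3 command-line client. The account, the password placeholder, and the /tmp path are purely illustrative; the final query mimics the kind of existence check the relay performs:

```shell
db=/tmp/userdb.sample.sqlite
rm -f "$db"
sqlite3 "$db" <<'EOF'
CREATE TABLE account (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    localpart VARCHAR(128) NOT NULL,
    domain VARCHAR(128) NOT NULL,
    password VARCHAR(64) NOT NULL,
    quota INTEGER DEFAULT '0' NOT NULL,
    realname VARCHAR(64) NULL,
    active BOOLEAN DEFAULT '1' NOT NULL,
    filter BLOB NULL,
    UNIQUE (localpart, domain)
);
INSERT INTO account (localpart, domain, password) VALUES ('jdoe', 'wikimedia.org', '{SSHA}placeholder');
EOF
# Existence check, as the relay's lookup does for routing decisions:
sqlite3 "$db" "SELECT count(*) FROM account WHERE localpart='jdoe' AND domain='wikimedia.org';"
```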
Mail relay
The current mail relay is mchenry.
As a mail relay needs to do a lot of DNS lookups, it's a good place for a DNS resolver, and therefore mchenry is pmtpa's secondary DNS recursor - although mchenry uses its own resolver as primary.
mchenry uses Exim 4, the standard Ubuntu Feisty exim4 exim4-daemon-heavy package. This package does some stupid things like running under a Debian-exim user, but not enough to warrant running our own modified version. All configuration lives in /etc/exim4, where exim4.conf is Exim's main configuration file.
The following domain and host lists are defined near the top of the configuration file:
# Standard lists
hostlist wikimedia_nets = <; 66.230.200.0/24 ; 145.97.39.128/26 ; 203.212.189.192/26 ; 211.115.107.128/26 ; 2001:610:672::/48
domainlist system_domains = @
domainlist relay_domains =
domainlist legacy_mailman_domains = wikimedia.org : wikipedia.org
domainlist local_domains = +system_domains : +legacy_mailman_domains : lsearch;CONFDIR/local_domains : @mx_primary/ignore=127.0.0.1
system_domains is a list for domains related to the functioning of the local system, e.g. mchenry.wikimedia.org and associated system users. It has little relevance to the rest of the Wikimedia mail setup, but makes sure that mail submitted by local software is handled properly.
relay_domains is a list for domains that are allowed to be relayed through this host.
local_domains is a compound list of all domains that are in some way processed locally. They are not routed using the standard dnslookup router. Besides the domains listed in /etc/exim4/local_domains, mail will also be accepted for any domain which has mchenry (or one of its interface IP addresses) listed in DNS as the primary MX. This could be abused by people with control over some arbitrary DNS zone of course, but since typically no alias file for it will exist, no mail address will be accepted in that case anyway.
For content scanning, temporary mbox files are written to /var/spool/exim4/scan, and deleted after scanning. To improve performance somewhat, this directory is mounted as a tmpfs filesystem, using the following line in /etc/fstab:
tmpfs /var/spool/exim4/scan tmpfs defaults 0 0
Resource limits
To behave gracefully under load, some resource limits are applied in the main configuration section:
# Resource control
check_spool_space = 50M
No mail delivery if there's less than 50MB free.
deliver_queue_load_max = 75.0
queue_only_load = 50.0
No mail delivery if system load is > 75, and queue-only (without immediate delivery) when load is > 50.
smtp_accept_max = 100
smtp_accept_max_per_host = ${if match_ip{$sender_host_address}{+wikimedia_nets}{50}{5}}
Accept at most 100 simultaneous SMTP connections, and at most 5 from the same host, unless it's a host from a Wikimedia network, in which case a higher limit (50) applies.
smtp_reserve_hosts = <; 127.0.0.1 ; ::1 ; +wikimedia_nets
Reserve SMTP connection slots for our own servers.
smtp_accept_queue_per_connection = 500
If more than 500 mails are sent in one connection, queue them without immediate delivery.
smtp_receive_timeout = 1m
Drop the connection if an SMTP line was not received within a 1 minute timeout.
remote_max_parallel = 25
Invoke at most 25 parallel delivery processes.
smtp_connect_backlog = 32
TCP SYN backlog parameter.
Aliases
Each Wikimedia domain (wikimedia.org, wikipedia.org, wiktionary.org, etc...) is now distinct and has its own aliases file, under /etc/exim4/aliases/. Alias files use the standard format. Unqualified address targets in the alias file (local parts without domain) are qualified to the same domain. Special :fail: and :defer: targets and pipe commands are also supported, see http://www.exim.org/exim-html-4.66/doc/html/spec_html/ch22.html#SECTspecitredli.
The following router takes care of this. It's run for all domains in the +local_domains domain list defined near the top of the Exim configuration file. It checks whether the file /etc/exim4/aliases/$domain exists, and then uses it to do an alias lookup.
# Use alias files /etc/exim4/aliases/$domain for domains like
# wikimedia.org, wikipedia.org, wiktionary.org etc.
aliases:
    driver = redirect
    domains = +local_domains
    require_files = CONFDIR/aliases/$domain
    data = ${lookup{$local_part}lsearch*{CONFDIR/aliases/$domain}}
    qualify_preserve_domain
    allow_fail
    allow_defer
    forbid_file
    include_directory = CONFDIR
    pipe_transport = address_pipe
If the exact address is not found in the alias file, it will do another lookup for the key *, so catchalls can be made for specific domains as well.
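Putting the pieces together, a hypothetical alias file combining ordinary aliases, a special target, and a catchall might look like this (all entries invented for illustration):

```
# /etc/exim4/aliases/example.wikimedia.org (hypothetical entries)
noc: ops-team@wikimedia.org
oldlist: :fail: This list has been retired
*: catchall@wikimedia.org
```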
LDAP accounts and aliases
As the office is now putting staff data in LDAP, the mail relay has been configured to use that as the primary mail account database. Two Exim routers have been added for this, one to lookup mail accounts (which are forwarded to Google Apps), and one for aliases in LDAP:
# LDAP accounts
ldap_account:
    driver = manualroute
    domains = wikimedia.org
    condition = ${lookup ldap \
        {user="cn=eximagent,ou=other,dc=corp,dc=wikimedia,dc=org" pass=LDAPPASSWORD \
        ldap:///ou=people,dc=corp,dc=wikimedia,dc=org?mail?sub?(&(objectClass=inetOrgPerson)(mail=${quote_ldap:$local_part}@$domain)(x121Address=1))} \
        {true}fail}
    local_part_suffix = +*
    local_part_suffix_optional
    transport = remote_smtp
    route_list = * aspmx.l.google.com
For mail addresses in domain wikimedia.org an LDAP query is done on the default LDAP servers, under the given base DN. Only the mail attribute is returned on successful lookup. The scope is set to sub (so a full subtree search is done), and the filter specifies that only inetOrgPerson objects should match, with the mail attribute matching the exact mail address being tested, and only if the x121Address field exists and is set to 1. If all these conditions are met, the mail is forwarded to Google Apps.
Mail addresses with an optional local part suffix of the form +whatever are also accepted the same as without the suffix, and are forwarded unmodified.
ldap_alias:
    driver = redirect
    domains = wikimedia.org
    data = ${lookup ldap \
        {user="cn=eximagent,ou=other,dc=corp,dc=wikimedia,dc=org" pass=LDAPPASSWORD \
        ldap:///ou=people,dc=corp,dc=wikimedia,dc=org?mail?sub?(&(objectClass=inetOrgPerson)(initials=${quote_ldap:$local_part}@$domain))} \
        {$value}fail}
Aliases are looked up in LDAP as well, using the initials attribute, and are rewritten to their canonical form as returned in the mail attribute.
IMAP mail
Mail destined for IMAP accounts on the IMAP server should be recognized and routed specially by the mail relay. Therefore the mail relay has a local copy of the accounts database, and uses a manualroute to route those mail addresses to the IMAP server:
imap:
    driver = manualroute
    domains = +local_domains
    condition = ${lookup sqlite{USERDB \
        SELECT * FROM account WHERE localpart='${quote_sqlite:$local_part}' AND domain='${quote_sqlite:$domain}'}}
    transport = remote_smtp
    route_list = * sanger.wikimedia.org
RT
RT is implemented on a separate domain (rt.wikimedia.org) and a separate server (streber). From the domain name, Exim knows to forward to it:
domainlist rt_domains = rt.wikimedia.org
# Send RT mails to the RT server
rt:
    driver = manualroute
    domains = +rt_domains
    route_list = * streber.wikimedia.org byname
    transport = remote_smtp
@wikimedia.org
Email is routed to Gmail by default. Routing to VRTS is defined in an alias file generated by vrts_aliases.py.
To migrate an address from VRTS to Gmail, ask a VRTS admin to mark the queue associated with the address as inactive in VRTS.
Other domains
All routing including VRTS is explicitly defined in Postfix config in the private Puppet repository.
To migrate an address from VRTS to Gmail, edit Postfix config and ask a VRTS admin to mark the queue associated with the address as inactive in VRTS - this needs to be coordinated to avoid dropping mail and unnecessary alerts.
SpamAssassin
SpamAssassin is installed using the default Ubuntu spamassassin package. A couple of configuration changes were made.
By default, spamd, if enabled, runs as root. To change this:
# adduser --system --home /var/lock/spamassassin --group --disabled-password --disabled-login spamd
The following settings were modified in /etc/default/spamassassin:
# Change to one to enable spamd
ENABLED=1
User preferences are disabled, spamd listens on the loopback interface only, and runs as user/group spamd:
OPTIONS="--max-children 5 --nouser-config --listen-ip=127.0.0.1 -u spamd -g spamd"
Run spamd with nice level 10:
# Set nice level of spamd
NICE="--nicelevel 10"
In /etc/spamassassin/local.cf, the following settings were changed:
trusted_networks 66.230.200.0/24 145.97.39.128/26 203.212.189.192/26 211.115.107.128/26
...so SpamAssassin knows which hosts it can trust.
We also enable the SARE (SpamAssassin Rules Emporium) repository in order to catch more spam (http://saupdates.openprotect.com/)
gpg --keyserver pgp.mit.edu --recv-keys BDE9DC10
gpg --armor -o pub.gpg --export BDE9DC10
sa-update --import pub.gpg
sa-update --allowplugins --gpgkey D1C035168C1EBC08464946DA258CDB3ABDE9DC10 --channel saupdates.openprotect.com
In /etc/cron.daily/spamassassin, change the following so that the rules are updated daily:
#sa-update || exit 0
sa-update --allowplugins --gpgkey D1C035168C1EBC08464946DA258CDB3ABDE9DC10 --channel saupdates.openprotect.com --channel updates.spamassassin.org || exit 0
We also want to speed up the SpamAssassin process as much as we can. To that end, it helps to compile all the rules. This is taken care of in the cron.daily file, but re2c is missing, so it needs to be installed:
apt-get install re2c sa-compile
One of the more useful features in spam-fighting is the Bayesian filter. It allows SpamAssassin to detect spam regardless of its rules. However, it needs to be enabled: in /etc/spamassassin/local.cf, the following settings were changed:
use_bayes 1
bayes_auto_learn 1
bayes_path /etc/spamassassin/bayes/bayes # <- This is important. In a virtual environment, omitting this line renders the Bayesian filter useless.
bayes_ignore_header X-Bogosity
bayes_ignore_header X-Spam-Flag
bayes_ignore_header X-Spam-Status
...so Bayes is able to learn, stores its database in a central location, and benefits everyone.
In order to improve on Bayes' usefulness, we want it to learn from the users what is spam and what is ham. To that end, we need to teach the Bayesian filter, so we are adding two folders to each user's INBOX (on Sanger). NOTE: you need to use maildirmake.dovecot to create these directories, and chown vmail:vmail them.
.INBOX.Bayes_Ham/
.INBOX.Bayes_Spam/
We also need to encourage everybody to use these folders, so we add these two lines at the end of everybody's subscriptions file (/var/vmail/wikimedia.org/USERNAME/subscriptions).
INBOX.Bayes_Ham
INBOX.Bayes_Spam
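A small sketch of automating that append, using a throwaway directory in place of /var/vmail/wikimedia.org (the user name jdoe is invented). The grep -qx guard keeps the operation idempotent, so rerunning it never duplicates the lines:

```shell
base=/tmp/vmail-example/wikimedia.org
mkdir -p "$base/jdoe"               # stand-in for a real user's maildir
touch "$base/jdoe/subscriptions"
# Append each folder name to every subscriptions file, once only
for sub in "$base"/*/subscriptions; do
    for folder in INBOX.Bayes_Ham INBOX.Bayes_Spam; do
        grep -qx "$folder" "$sub" || echo "$folder" >> "$sub"
    done
done
cat "$base/jdoe/subscriptions"
```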
Of course, once people start filling these new mailboxes, we need to process them. The more interesting part is that the mailboxes are on Sanger while the spam filtering happens on McHenry. We therefore need to move the messages in these folders over to McHenry so that sa-learn can be run against them. Here is a small shell script, meant to be run as user vmail on Sanger and located in /usr/local/bin/GatherBayesData.sh:
#!/bin/bash
cd /var/vmail/wikimedia.org
echo "Archiving users' Bayes mailboxes..."
tar zcvf /tmp/BayesLearning_`date +%Y-%m-%d--%H`.tgz `find ./ -name "*Bayes_[HS]*" -type d` >/dev/null 2>&1
echo "Transmitting Spam/Ham to McHenry... "
scp /tmp/BayesLearning_`date +%Y-%m-%d--%H`.tgz vmail@mchenry:. && rm /tmp/BayesLearning_`date +%Y-%m-%d--%H`.tgz >/dev/null 2>&1
for user in `ls -d */ | sed s/.$//g` ; do
    echo ""
    echo "=== WORKING WITH $user's MAILBOX ==="
    SpamFolder=`find ./$user/ -type d -name "*Bayes_Spam*" -exec basename {} \;`
    HamFolder=`find ./$user/ -type d -name "*Bayes_Ham*" -exec basename {} \;`
    if [ -d "$user/$SpamFolder/" ]
    then
        echo "Found Bayes mailbox(es): SPAM: $SpamFolder."
        echo "Purging Spam..."
        # Purging SPAM folder:
        find $user/$SpamFolder/{new,cur}/ -type f -exec rm -f {} \;
    else
        echo "No SPAM Bayes mailbox(es) found."
        continue
    fi
    if [ -d "$user/$HamFolder/" ]
    then
        echo "Found Bayes mailbox(es): HAM: $HamFolder."
        echo "Moving Ham messages back to their original place..."
        find $user/$HamFolder/cur/ -type f -exec mv {} $user/cur/ \;
        find $user/$HamFolder/new/ -type f -exec mv {} $user/new/ \;
    else
        echo "No HAM Bayes mailbox(es) found."
        continue
    fi
done
This script is pretty self-explanatory:
- Creates an archive of everybody's HAM and SPAM folders
- Sends said archive to McHenry for further processing
- Moves the HAM messages back into the user's INBOX (preserving each message's read/unread status)
- Permanently deletes the SPAM.
On the receiving end (McHenry) we also have a little bash script that passes messages to sa-learn: /var/vmail/process_bayes.sh
#!/bin/bash
cd /var/vmail
[ "$(ls -A *.tgz 2>/dev/null)" ] || { echo "Nothing to process..." ; exit 0 ; }
for file in `ls *.tgz` ; do
    echo "Processing $file..."
    echo "Creating temp dir: ./tmp_bayes"
    mkdir ./tmp_bayes
    echo "Extracting archive..."
    tar -C ./tmp_bayes -zxf $file
    echo "Analyzing HAM / SPAM for each user"
    cd tmp_bayes
    for user in `ls -d */ | sed s/.$//g` ; do
        echo ""
        echo "=== WORKING ON $user's MAILBOX ==="
        SpamFolder=`find $user -type d -name "*Bayes_Spam*" -exec basename {} \;`
        HamFolder=`find $user -type d -name "*Bayes_Ham*" -exec basename {} \;`
        echo "Found the following Bayes Mailboxes: SPAM: $user/$SpamFolder | HAM: $user/$HamFolder."
        echo "Learning Ham from $user:"
        sa-learn --ham $user/$HamFolder/{cur,new}/*
        echo "Learning Spam from $user:"
        sa-learn --spam $user/$SpamFolder/{cur,new}/*
    done
    echo "Completed analysis from $file."
    cd ..
    rm -rf ./tmp_bayes
    rm -f $file
done
Once again the script is pretty self-explanatory...
- Looks for an archive in /var/vmail (where Sanger sends the daily archive)
- Untars said file into a temp directory
- Crawls every user's directory for Spam and Ham messages
- Feeds those messages to sa-learn
- Cleans up.
Both these scripts are in cron so that they run every night at 10:30 / 11:00PM (PST):
- Sanger: /etc/cron.d/Bayes-collector:
MAILTO=fvassard@wikimedia.org
SHELL=/bin/bash
30 5 * * * vmail /usr/local/bin/GatherBayesData.sh
- McHenry: Root's crontab
MAILTO=fvassard@wikimedia.org
0 6 * * * /var/vmail/process_bayes.sh
The default X-Spam-Report headers are very long because they contain a "content preview", which is rather useless in our setup. This can be modified:
# Do not include the useless content preview
clear_report_template
report Spam detection software, running on the system "_HOSTNAME_", has
report identified this incoming email as possible spam. If you have any
report questions, see _CONTACTADDRESS_ for details.
report
report Content analysis details: (_SCORE_ points, _REQD_ required)
report
report " pts rule name description"
report ---- ---------------------- --------------------------------------------------
report _SUMMARY_
In Exim, SpamAssassin is called from the DATA ACL for domains in domain list spamassassin_domains. exim4.conf:
domainlist spamassassin_domains = *
acl_smtp_data = acl_check_data
acl_check_data:
    # Let's trust local senders to not send out spam
    accept hosts = +wikimedia_nets
           set acl_m0 = trusted relay

    # Run through spamassassin
    accept endpass
           acl = spamassassin

spamassassin:
    # Only run through SpamAssassin if requested for this domain and
    # the message is not too large
    accept condition = ${if >{$message_size}{400K}}

    # Add spam headers if score >= 1
    warn spam = nonexistent:true
         condition = ${if >{$spam_score_int}{10}{1}{0}}
         set acl_m0 = $spam_score ($spam_bar)
         set acl_m1 = $spam_report

    # Reject spam at high scores (> 12)
    deny message = This message scored $spam_score spam points.
         spam = nonexistent/defer_ok
         condition = ${if >{$spam_score_int}{120}{1}{0}}

    accept
First, mail not destined for a domain listed in spamassassin_domains is accepted, as are mails bigger than 400 KB. Then a spam check is done using the local spamd daemons. If that results in a score of at least 1, two ACL variables are set for adding X-Spam-Score: and X-Spam-Report: headers later, in the system filter. If the spam score is 12 or higher, the mail is rejected outright.
System filter
All mail is run through a so-called "system filter" that can do certain checks on the mail and determine actions. A system filter is run once per mail, and applies to all recipients.
The system filter is set using the main configuration option:
system_filter = CONFDIR/system_filter
In our setup the system filter is used to remove any untrusted spam checker headers, and to add our spam headers to the message. The file /etc/exim4/system_filter has the following content:
# Exim filter
if first_delivery then
    if $acl_m0 is not "trusted relay" then
        # Remove any SpamAssassin headers and add local ones
        headers remove X-Spam-Score:X-Spam-Report:X-Spam-Checker-Version:X-Spam-Status:X-Spam-Level
    endif
    if $acl_m0 is not "" and $acl_m0 is not "trusted relay" then
        headers add "X-Spam-Score: $acl_m0"
        headers add "X-Spam-Report: $acl_m1"
    endif
endif
Mailing lists
Mailing lists now live on a dedicated mailing lists server (lily) on a dedicated mail domain, lists.wikimedia.org. However, mail for old addresses such as info-en@wikipedia.org still comes in; it should be rewritten to the new addresses and then forwarded to the mailing lists server.
Near the top of the Exim configuration file a domain list is defined, which contains mail domains that can contain these old addresses:
domainlist legacy_mailman_domains = wikimedia.org : wikipedia.org : mail.wikimedia.org : mail.wikipedia.org
The following router, near the end of the routers section, checks if a given local part exists in the file /etc/exim4/legacy_mailing_lists, and rewrites it to the new address if it does, to be routed via the normal DNS MX/SMTP routers/transports. Since Mailman does not distinguish between domains, only a single local parts file for all legacy Mailman domains exists. This file only needs to contain the mailing list names; all suffixes are handled by the router.
# Alias old mailing list addresses to @lists.wikimedia.org on lily
legacy_mailing_lists:
    driver = redirect
    domains = +legacy_mailman_domains
    data = $local_part$local_part_suffix@lists.wikimedia.org
    local_parts = lsearch;CONFDIR/legacy_mailing_lists
    local_part_suffix = -bounces : -bounces+* : \
                        -confirm+* : -join : -leave : \
                        -owner : -request : -admin : \
                        -subscribe : -unsubscribe
    local_part_suffix_optional
Wiki mail
The application servers send out mail for wiki password reminders/changes, and e-mail notification on changes if enabled. These automated mass mailings are also accepted by the mail relay, mchenry, but are treated somewhat separately. To minimize the chance of external mail servers blocking mchenry's regular mail because of mass emails, these "wiki mails" are sent out using a separate IP.
Near the top of the configuration a macro is defined for the IP address to accept incoming wiki mail, and to use for sending it out to the world:
WIKI_INTERFACE=66.230.200.216
A hostlist is defined for the IP ranges that are allowed to relay from:
hostlist relay_from_hosts = <; @[] ; 66.230.200.0/24 ; 10.0.0.0/16
The rest of the configuration file uses the incoming interface address to distinguish wiki mail from regular mail. Therefore care must be taken that external hosts cannot connect using this interface address. An SMTP connect ACL takes care of this:
# Policy control acl_smtp_connect = acl_check_connect
acl_check_connect:
    # Deny external connections to the internal bulk mail submission
    # interface
    deny condition = ${if match_ip{$interface_address}{WIKI_INTERFACE}{true}{false}}
         ! hosts = +wikimedia_nets

    accept
Wiki mail gets picked up by the first router, selecting on incoming interface address and a specific header inserted by MediaWiki:
# Route mail generated by MediaWiki differently
wiki_mail:
    driver = dnslookup
    domains = ! +local_domains
    condition = ${if and{{match_ip{$interface_address}{WIKI_INTERFACE}}{eqi{$header_X-Mailer:}{MediaWiki mailer}}}}
    errors_to = wiki@wikimedia.org
    transport = bulk_smtp
    ignore_target_hosts = <; 0.0.0.0 ; 127.0.0.0/8 ; 0::0/0 ; 10/8 ; 172.16/12 ; 192.168/16
    no_verify
The router directs to a separate SMTP transport, bulk_smtp. no_verify is set because mails from the application servers are not verified anyway, to be as liberal as possible with incoming mails and keep the queues on the application servers small. Queue handling should be done on the mail relay. For other mail, this router is not applicable so is not needed for verification either.
The envelope sender is forced to wiki@wikimedia.org, as it may have been set to something else by sSMTP.
The bulk_smtp transport sets a different outgoing interface IP address, and a separate HELO string:
# Transport for sending out automated bulk (wiki) mail
bulk_smtp:
    driver = smtp
    hosts_avoid_tls = <; 0.0.0.0/0 ; 0::0/0
    interface = WIKI_INTERFACE
    helo_data = wiki-mail.wikimedia.org
Wiki mail also has a shorter retry/bounce time than regular mail; only 8 hours:
begin retry
* * senders=wiki@wikimedia.org F,1h,15m; G,8h,1h,1.5
Postmaster
For any local domain, postmaster@ should be accepted even if it's forgotten in alias files. A special redirect router takes care of this:
# Redirect postmaster@$domain if it hasn't been accepted before
postmaster:
    driver = redirect
    domains = +local_domains
    local_parts = postmaster
    data = postmaster@$primary_hostname
    cannot_route_message = Address $local_part@$domain does not exist
Internal address rewriting
Internal servers in the .pmtpa.wmnet domain sometimes send out mail, which gets rejected by mail servers in the outside world. Sender domain address verification cannot resolve the domain .pmtpa.wmnet, and the mail gets rejected. To solve this, mchenry rewrites the Envelope From to root@wikimedia.org for any mail that has a .pmtpa.wmnet sender address:
 #################
 # Rewrite rules #
 #################

 begin rewrite

 # Rewrite the envelope From for mails from internal servers in *.pmtpa.wmnet,
 # as they are usually rejected by sender domain address verification.
 *@*.pmtpa.wmnet    root@wikimedia.org    F
Secondary mail relay
lily is Wikimedia's secondary mail relay. It should do the same policy checks on incoming mail as the primary mail relay, so make sure its ACLs are equivalent for the relevant domains.
Lily does not have a copy/cache of the local parts which are accepted by the primary relay, as that is a dynamic process. Instead, it uses recipient address verification callouts, i.e. it asks the primary mail relay whether a recipient address would be accepted or not. In case the primary mail relay is unreachable, or does not respond within 5-30s, the address is assumed to exist and the mail is accepted - it is, after all, a backup MX. Callouts are cached, so resources are saved for frequently appearing destination addresses.
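In Exim terms, the callout verification might be expressed roughly as follows in the RCPT ACL. This is a sketch, not the exact production ACL; the 30s timeout is an assumption based on the 5-30s window mentioned above:

```
# Sketch: verify the recipient by callout to the primary MX.
# defer_ok accepts the address when the primary is unreachable or slow,
# as befits a backup MX; callout results are cached automatically.
accept  domains = +relay_domains : +secondary_domains
        verify  = recipient/callout=30s,defer_ok
```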
Relay domains
Secondary mail relays will relay for any domain for which the following holds:
- The domain is listed in a static text file of domains: /etc/exim4/relay_domains, or
- The secondary mail relay is listed as a secondary MX in DNS for the domain, and
- The higher priority MXes are in a configured list of allowed primaries
The latter is to prevent abuse; we don't really want people with control over a DNS zone abusing our mail servers as backup MXes.
Near the top of the configuration file, two domain lists are defined for domains to relay for:
 domainlist relay_domains = lsearch;CONFDIR/relay_domains
 domainlist secondary_domains = @mx_secondary/ignore=127.0.0.1
relay_domains contains domains explicitly listed in the text file /etc/exim4/relay_domains, and secondary_domains queries DNS whether the local host is listed as a secondary MX. Note: the two lists will usually overlap.
A host list is defined with the accepted primary mail relays. It should contain only IP addresses; these are the only hosts to which @mx_secondary domains will be relayed. For domains explicitly configured in relay_domains, it doesn't matter what the primary MX is.
@mx_secondary domains use a separate dnslookup router, to check the higher priority MX records:
 # Relay @mx_secondary domains only to these hosts
 hostlist primary_mx = 66.230.200.240
 # Route relay domains only if the higher prio MXes are in the allowed list
 secondary:
   driver = dnslookup
   domains = ! +relay_domains : +secondary_domains
   transport = remote_smtp
   ignore_target_hosts = ! +primary_mx
   cannot_route_message = Primary MX(s) for $domain not in the allowed list
   no_more
All relevant (= higher priority) MX records not in hostlist primary_mx are removed from the list for consideration by Exim. In case there are no higher priority MX records which coincide with the primary_mx list, the MX list will be empty and the router will decline. As this router is run during address verification in the SMTP session as well, the RCPT command will be rejected.
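The effect of ignore_target_hosts = ! +primary_mx can be modelled in a few lines of Python (an illustration only, not Exim's implementation):

```python
# Toy model of how the secondary router filters MX targets: hosts not in
# the allowed primary list are dropped, and an empty result makes the
# router decline (so verification rejects the RCPT command).
ALLOWED_PRIMARIES = {"66.230.200.240"}

def routable_targets(mx_hosts):
    """mx_hosts: IP addresses of the domain's higher-priority MX records."""
    kept = [ip for ip in mx_hosts if ip in ALLOWED_PRIMARIES]
    return kept or None                  # None = router declines
```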
Exim's dnslookup router has a precondition check check_secondary_mx. However, the secondary_domains domainlist serves the same purpose, and using both at the same time in fact doesn't work, as by the time the check_secondary_mx check is run, Exim will already have removed the local host from the MX list (due to ignore_target_hosts), and the router will decline to run.
Note: this router should not be run for domains in domainlist relay_domains, as the MX rules for those domains need not be as stringent. They can be handled by the regular dnslookup router:
 # Route non-local domains (including +relay_domains) via DNS MX and A records
 dnslookup:
   driver = dnslookup
   domains = ! +local_domains
   transport = remote_smtp
   ignore_target_hosts = <; 0.0.0.0 ; 127.0.0.0/8 ; 10/8 ; 172.16/12 ; 192.168/16
   cannot_route_message = Cannot route to remote domain $domain
   no_more
IMAP server
The IMAP server is sanger. It only receives e-mail destined for its IMAP accounts; other mail is handled by mchenry. Outgoing mail is not sent directly, but routed via the mail relays, so the IMAP server should never build up a large mail queue itself.
Mail storage uses a single system user account vmail, which has been created with the command
# adduser --system --home /var/vmail --no-create-home --group --disabled-password --disabled-login vmail
Mail is stored under the directory /var/vmail, which should be created with the correct permissions:
 # mkdir /var/vmail
 # chown root:vmail /var/vmail
 # chmod g+s /var/vmail
User Debian-exim needs to be part of the vmail group to access the mail directories:
# gpasswd -a Debian-exim vmail
TLS support
For SMTP mail submissions we require authentication over TLS/SSL. To make Exim support server-side TLS connections, an SSL certificate and private key need to be installed. In the main configuration file, set the following two options:
 tls_certificate = /etc/ssl/certs/wikimedia.org.pem
 tls_privatekey = /etc/ssl/private/wikimedia.org.key
The private key file should have file permissions set as restricted as possible, but Exim (running as user Debian-exim) should be able to read it. Therefore Debian-exim has been added to the ssl-cert group.
To advertise TLS to all connecting hosts, use:
tls_advertise_hosts = *
To start TLS by default on the SMTPS port, set:
tls_on_connect_ports = 465
On servers that are not very busy, there can be a problem with draining the random entropy pool. Exim in Debian/Ubuntu is linked against GnuTLS instead of OpenSSL, and uses /dev/random. When it tries to regenerate the gnutls-params Diffie-Hellman parameters file, it can block waiting for entropy, delaying all mail until more entropy becomes available. To avoid this, make sure that the Exim cron job can regenerate the parameters file outside Exim, using the certtool command:
# apt-get install gnutls-bin
Local mail submissions
There is a problem with mail submitted through the IMAP server whose destination is local. All aliasing happens on the mail relay, so mail to an address that exists as a local IMAP account would simply be delivered locally, never reaching any aliases that might exist for the same address on the mail relay. Therefore we force all local mail submissions (recognizable by $received_protocol matching /e?smtpsa$/) to go via the mail relay(s). All routers that might handle such an address locally get an extra condition:
condition = ${if !match{$received_protocol}{\Nsmtpsa$\N}}
Because this condition is used identically on multiple routers, it's been defined as a macro NOT_LOCALLY_SUBMITTED at the top of the configuration file:
NOT_LOCALLY_SUBMITTED=${if !match{$received_protocol}{\Nsmtpsa$\N}}
Routers can thus use:
condition = NOT_LOCALLY_SUBMITTED
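In Python terms, the macro's regex behaves like this: authenticated submissions arrive with protocol smtpsa or esmtpsa (SMTP over TLS with authentication), which /smtpsa$/ matches, so the condition is false for them and true for everything else:

```python
import re

# Model of the NOT_LOCALLY_SUBMITTED macro: true unless the mail was
# submitted by an authenticated TLS client (protocol ends in "smtpsa").
def not_locally_submitted(received_protocol):
    return re.search(r"smtpsa$", received_protocol) is None
```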
User filters
The second router, after system_aliases, applies only to IMAP accounts. It checks whether an IMAP account exists with the specified mail address, and whether that account has a custom user filter. A user filter can be an Exim filter, or a Sieve filter, and is meant to provide more or less the same functionality as procmail filters, i.e. sorting out mail into subfolders, rejecting based on certain criteria and the like.
User filters are loaded as text BLOBs into the account database, and can be changed using wmfmailadmin. If an account's filter field is set to NULL, the Exim setup will revert to a default filter, loaded from the file /etc/exim4/default_user_filter:
 # Exim filter
 if $h_X-Spam-Score matches "\\N\(\\+{5,}\)\\N"
 then
   save .Junk/
 endif
This filter simply sorts mail classified by SpamAssassin as spam (score 5.0 or higher) into the Junk folder.
The router has some extra filter options set to deny the usage of certain functionality in filters that might compromise system security.
 # Run a custom user filter, e.g. to sort mail into subfolders
 # By default Exim filter CONFDIR/default_user_filter is run,
 # which sorts mail classified spam into the Junk folder
 user_filter:
   driver = redirect
   domains = +local_domains
   condition = NOT_LOCALLY_SUBMITTED
   router_home_directory = VMAIL/$domain/$local_part
   address_data = ${lookup sqlite{USERDB \
                    SELECT id, filter NOTNULL AS hasfilter \
                    FROM account \
                    WHERE localpart='${quote_sqlite:$local_part}' \
                    AND domain='${quote_sqlite:$domain}' \
                    AND active='1'}{$value}fail}
   data = ${if eq{${extract{hasfilter}{$address_data}}}{1}{ \
             ${lookup sqlite{USERDB \
               SELECT filter \
               FROM account \
               WHERE id='${quote_sqlite:${extract{id}{$address_data}}}'}}} \
           {${readfile{CONFDIR/default_user_filter}}}}
   allow_filter
   forbid_filter_dlfunc
   forbid_filter_existstest
   forbid_filter_logwrite
   forbid_filter_lookup
   forbid_filter_perl
   forbid_filter_readfile
   forbid_filter_readsocket
   forbid_filter_run
   forbid_include
   forbid_pipe
   user = vmail
   group = vmail
   directory_transport = maildir_delivery
   # added for autoreply support
   reply_transport = reply_transport
   no_verify
The address_data query checks whether a matching account exists and is active. If it is, the id of the account will be stored in $address_data, along with a boolean value that represents the existence of a custom filter in the account. If the query fails because no matching account is found, the string expansion is forced to fail and the user_filter router is skipped. In the data query, the data previously looked up and stored in $address_data is used. If a custom filter exists for the account, it's looked up in an SQL query. Otherwise the default user filter file is read.
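The two-stage lookup can be modelled in Python with an in-memory SQLite database; the schema and sample rows below are illustrative, not the production database:

```python
import sqlite3

# Toy model of the two-stage lookup the user_filter router performs.
db = sqlite3.connect(":memory:")
db.execute('CREATE TABLE account (id INTEGER PRIMARY KEY, localpart TEXT,'
           ' domain TEXT, active TEXT, "filter" TEXT)')
db.execute("INSERT INTO account VALUES (1, 'alice', 'wikimedia.org', '1', NULL)")
db.execute("INSERT INTO account VALUES (2, 'bob', 'wikimedia.org', '1', '# custom')")

DEFAULT_FILTER = "# contents of /etc/exim4/default_user_filter"

def filter_for(localpart, domain):
    # Stage 1 (address_data): does an active account exist,
    # and does it have a custom filter?
    row = db.execute('SELECT id, "filter" NOTNULL FROM account'
                     " WHERE localpart=? AND domain=? AND active='1'",
                     (localpart, domain)).fetchone()
    if row is None:
        return None                       # lookup fails; the router is skipped
    account_id, has_filter = row
    if has_filter:
        # Stage 2 (data): fetch the custom filter by account id
        return db.execute('SELECT "filter" FROM account WHERE id=?',
                          (account_id,)).fetchone()[0]
    return DEFAULT_FILTER                 # fall back to the default filter file
```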
If the filter chooses to decline handling the mail, e.g. because no special action is required (it's not spam), then control is passed to the next router which will handle a normal INBOX Maildir delivery.
IMAP delivery
The next router handles delivery to local mail boxes. If a given mail address exists in the SQLite database, it's handed to the dovecot_delivery transport:
 # Delivery to a Maildir mail box.
 local_user:
   driver = accept
   domains = +local_domains
   condition = NOT_LOCALLY_SUBMITTED
   local_part_suffix = +*
   local_part_suffix_optional
   address_data = ${lookup sqlite{USERDB \
                    SELECT id, quota \
                    FROM account \
                    WHERE localpart='${quote_sqlite:$local_part}' \
                    AND domain='${quote_sqlite:$domain}' \
                    AND active='1'}{$value}fail}
   transport = maildir_delivery
   transport_home_directory = VMAIL/$domain/$local_part
   transport_current_directory = VMAIL
The local_part_suffix options accept an optional suffix to the local part, e.g. mark+something@; the suffix is stripped from $local_part before the database lookup.
This router is accompanied by the maildir_delivery appendfile transport, which delivers a message to a Maildir mail box:
 # Exim appendfile transport for Maildir delivery
 maildir_delivery:
   driver = appendfile
   maildir_format
   directory = ${if def:address_file{$address_file}{$home}}
   create_directory
   create_file = belowhome
   delivery_date_add
   envelope_to_add
   return_path_add
   user = vmail
   group = vmail
   ...
The transport only delivers to Maildir directories (maildir_format), determined by the directory parameter: if $address_file is defined, because it's been set by the user_filter router, then the path in that variable is used. Otherwise it uses the (transport) home directory as set by the local_user router.
If a (sub folder or top level) Maildir directory does not exist yet, it's created by Exim given that it's in or below the specified home directory (create_directory, create_file). The headers Delivery-date, Envelope-to and Return-path are added to the message before delivery. The delivery process runs as uid/gid vmail.
The second part of the transport implements quota support:
   ...
   # Quota support
   quota = ${if !eq{$received_protocol}{local}{${extract{quota}{$address_data}{${value}K}{0}}}}
   quota_is_inclusive = false
   quota_warn_threshold = 100%
   quota_warn_message = ${expand:${readfile{CONFDIR/quota_warn_message}}}
   maildir_use_size_file
   maildir_quota_directory_regex = ^(?:cur|new|\.(?!Trash).*)$
   maildir_tag = ,S=$message_size
The quota limit is stored in the quota column of the SQLite database, in kilobytes (this is enforced by the Dovecot plugins). The quota limit, if any, was stored as a keyed field in the $address_data variable by the earlier routers, and thus can be extracted by the transport. This is only done for messages that do not have protocol local. System warnings such as those generated by Exim use protocol local, and therefore get a quota limit of 0 and are allowed through regardless.
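In Python terms, the quota expansion behaves roughly like this, assuming $address_data holds key=value pairs as produced by the multi-column lookup (e.g. "id=2 quota=102400"):

```python
# Illustration (not Exim code) of the transport's quota expansion:
# protocol "local" bypasses the quota entirely (0 = no limit), otherwise
# the quota field is extracted from $address_data and suffixed with K.
def effective_quota(received_protocol, address_data):
    if received_protocol == "local":
        return "0"
    fields = dict(f.split("=", 1) for f in address_data.split())
    quota = fields.get("quota", "")
    return quota + "K" if quota else "0"
```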
The quota enforced is not inclusive (quota_is_inclusive), which means that the quota limit is only enforced after it has been exceeded. This is because otherwise a confusing situation could arise where big messages can not be delivered because they would exceed the total mailbox quota size, whereas smaller messages would be let through. This behaviour is a little more consistent with what the user would expect.
Once the user fully exceeds the quota limit (quota_warn_threshold), a warning message as specified in the file /etc/exim4/quota_warn_message is sent to tell the user to clean up (quota_warn_message).
Some Maildir++ extensions are used: Exim uses a maildirsize file in the Maildir to more efficiently keep track of the total size of the mail box, rather than doing a stat() on all files (maildir_use_size_file). Also, a suffix is appended to all Maildir filenames with the size of the message, so a stat() can again be avoided by both Exim and Dovecot, a readdir() is enough. Because this is a black box mail system, this poses no security problems.
The Trash folder is exempted from quota calculation, as this may cause problems when the user actually wants to clean up the mail box.
A small problem exists with the exemption of protocol local messages: in that case the quota is set to 0, which also makes Exim write this quota to the maildirsize file, until the next non-local message is delivered. However, Dovecot doesn't read the quota from this file but also retrieves it directly from the database, so this is not likely to cause any problems.
User left
In some circumstances we want to provide an automatic reply (bounce) to mails for accounts of users that have left the organization. This is implemented using an accept router and an autoreply transport:
The router simply accepts the message if it's for a local domain, and was not submitted locally, and hands it over to the left_message transport:
 # Bounce/auto-reply messages for users that have left
 user_left:
   driver = accept
   domains = +local_domains
   condition = NOT_LOCALLY_SUBMITTED
   require_files = CONFDIR/userleft/$domain/$local_part
   transport = left_message
The transport takes the message, and wraps it in a new bounce-style message, using the expanded template file /etc/exim4/userleft/$domain/$local_part.
 # Autoreply bounce transport for users that have left the organization
 left_message:
   driver = autoreply
   file = CONFDIR/userleft/$domain/$local_part
   file_expand
   return_message
   from = Wikimedia Foundation <postmaster@wikimedia.org>
   to = $sender_address
   reply_to = office@wikimedia.org
   subject = User ${quote_local_part:$local_part}@$domain has left the organization: returning message to sender
So, for each user that leaves the organization, the corresponding account must be set to inactive (not deleted!), and a file /etc/exim4/userleft/$domain/$local_part must be created. An example template file is available in /etc/exim4/userleft/TEMPLATE.
Vacation auto-reply
In order to enable the auto-reply feature, a transport needs to be defined:
 reply_transport:
   driver = autoreply
Smart host
The last Exim router in the configuration file handles (outgoing) mail not destined for the local server; it sends mail for all domains to mchenry.wikimedia.org, or lists.wikimedia.org if the former is down.
 # Send all mail not destined for the local machine via a set of
 # mail relays ("smart hosts")
 smart_route:
   driver = manualroute
   transport = remote_smtp
   route_list = * mchenry.wikimedia.org:lists.wikimedia.org
SMTP authentication
We want our IMAP account users to be able to send mail through our mail servers from wherever they are, regardless of the network they are on. Therefore we use SMTP authentication, which is supported by most modern mail clients. TLS is enforced to encrypt the connection, so the password cannot be sniffed on the wire.
In Exim, SMTP authentication is controlled through the authenticators in the section of the same name in the configuration file. The plaintext driver can take care of both the PLAIN and the LOGIN authentication standards.
 # PLAIN authenticator
 # Expects the password field to contain a "LDAP format" hash. Only
 # (unsalted) {md5}, {sha1}, {crypt} and {crypt16} are supported.
 plain:
   driver = plaintext
   public_name = PLAIN
   server_prompts = :
   server_condition = ${lookup sqlite{USERDB \
                        SELECT password \
                        FROM account \
                        WHERE localpart||'@'||domain='${quote_sqlite:$auth2}' \
                        AND active='1'} \
                      {${if crypteq{$auth3}{$value}}}{false}}
   server_set_id = $auth2
   server_advertise_condition = ${if def:tls_cipher}
With the PLAIN mechanism, three parameters ($auth1, $auth2, $auth3) are expected from the client. The first one should be empty, the second one should contain the username, the third one the plaintext password.
server_condition is a string expansion that should return either true or false, depending on whether the username and password can be verified to match with those in the user database. It does a SQL lookup in the SQLite database. If the lookup fails, false is returned. If the lookup succeeds, the password is matched using Exim's crypteq function, which supports the crypt, crypt16, md5 and sha1 hashes. The type of hash is expected to be prepended to the hash in curly brackets, e.g. "{SHA1}" - a format which Dovecot also uses.
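The comparison can be illustrated in Python. This sketch assumes base64-encoded digests, the form Dovecot uses (Exim's crypteq also recognizes hex digests), and omits the crypt/crypt16 schemes:

```python
import base64
import hashlib

# Illustration of the hash comparison crypteq performs for the unsalted
# LDAP-style "{scheme}hash" format, e.g. "{SHA1}<base64 digest>".
def check_password(plaintext, stored):
    scheme, sep, encoded = stored.partition("}")
    if not sep or not scheme.startswith("{"):
        return False
    algo = scheme[1:].lower()
    if algo not in ("md5", "sha1"):
        return False                      # crypt/crypt16 omitted in this sketch
    digest = hashlib.new(algo, plaintext.encode()).digest()
    return base64.b64encode(digest).decode() == encoded
```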
Unfortunately, none of the salted password hash schemes could be used: for every commonly used format, either Exim or Dovecot lacked support. This could be remedied in the future, either by using the Dovecot authenticator in Exim 4.64, or by adding a base64 decoder to Exim's string expansion functions.
The server_set_id is set to the given username, and is the id used by Exim to identify this authenticated connection (for example, in log lines).
server_advertise_condition controls when the SMTP AUTH feature is advertised to connecting hosts in the EHLO reply. This is only done when a TLS encrypted connection has already been established, and thus $tls_cipher will be non-empty. Exim automatically refuses AUTH commands if the AUTH feature had not been advertised.
Dovecot deliver
- Dovecot deliver is no longer used; instead, Exim's own Maildir delivery transport is used, as it allows more flexibility with quota handling and subfolder filtering.
The Dovecot configuration file path is /etc/dovecot/dovecot.conf. The Dovecot LDA needs to be able to read it while running under uid vmail, so the default file permissions are changed:
 # chgrp vmail /etc/dovecot/dovecot.conf
 # chmod g+r /etc/dovecot/dovecot.conf
If deliver is given a -d username argument, it will attempt an auth DB lookup, which is unnecessary as Exim can provide it with all relevant information. Therefore this argument should not be used.
The postmaster_address option needs to be set for deliver to work:
 protocol lda {
   # Address to use when sending rejection mails.
   postmaster_address = postmaster@wikimedia.org
   ...
deliver needs to know where, and in what format to store mail. As it only has the home directory to work with, use that:
 # Deliver doesn't have username / address info but receives the home
 # directory from Exim in $HOME
 mail_location = maildir:%h
As the LDA is run under the restricted uid/gid vmail, it can't log to Dovecot's default log files without root permissions, so a separate log file is used:
   ...
   log_path = /var/log/dovecot-deliver.log
   info_log_path = /var/log/dovecot-deliver.log
 }
User database syncing
In order to know which accounts exist on the IMAP server, the primary mail relay mchenry must have a (partial) copy of the accounts database. The SQLite database on sanger is rsynced to mchenry every 15 minutes by the cron job /etc/cron.d/rsync-userdb:
*/15 * * * * root rsync -a /var/vmaildb/ mchenry-rsync:/var/vmaildb
The relevant ssh keys are in /root/.ssh/rsync, and set up in /root/.ssh/config.
Mail box cleanup
Mail boxes are automatically moved out of the way once an account ceases to exist in the account database. To handle this, a small script has been written, mbcleanup.py, available in SVN in the wmfmailadmin directory. It is run daily from /etc/cron.daily/mailbox-cleanup. It takes three arguments: the account db path, the mailboxes root path and the backup root path, respectively. From the account database it pulls a list of all existing accounts, and compares this with the set of mail boxes it finds in the two-level directory structure under the mailbox root path (ignoring .dot-directories and directories it is not permitted to read). Superfluous mailboxes are then moved to the backup directory with a timestamp appended.
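The cleanup logic can be sketched in Python. This is a simplified model; the real mbcleanup.py's naming and timestamp format may differ:

```python
import os
import shutil
import time

# Simplified model of the mailbox cleanup: mailboxes with no matching
# account are moved to a backup directory with a timestamp appended.
def cleanup(accounts, mailbox_root, backup_root):
    """accounts: set of (domain, localpart) pairs from the account database."""
    moved = []
    for domain in os.listdir(mailbox_root):
        domain_dir = os.path.join(mailbox_root, domain)
        if domain.startswith(".") or not os.path.isdir(domain_dir):
            continue                      # ignore .dot-directories
        for localpart in os.listdir(domain_dir):
            if localpart.startswith(".") or (domain, localpart) in accounts:
                continue                  # account still exists: keep mailbox
            stamp = time.strftime("%Y%m%d%H%M%S")
            dest = os.path.join(backup_root, f"{domain}_{localpart}_{stamp}")
            shutil.move(os.path.join(domain_dir, localpart), dest)
            moved.append(dest)
    return moved
```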
See also
- Mailing lists for the setup of the mailing lists server.
- Dovecot for a detailed setup of the IMAP server.
- OTRS for specifics on the OTRS setup.
Puppet configuration
Troubleshooting
"Exim SMTP" Alerts
This checks the availability of the smtp service (on port tcp/25) and the certificate validity on a given host.
Alert troubleshooting tips:
- Ensure that Exim is running
- Check Exim logs (in particular /var/log/exim4/paniclog) for signs of distress
- Ensure that the certificate served by Exim on port 25 has not expired or been revoked.
"Exim queue" Alerts
This checks the number of messages that currently cannot be delivered and have been queued for later delivery.
Alert troubleshooting tips:
Review the mail queue and logs looking for a common cause of deferred messages. Frequent causes for deferred messages are:
- User(s) with problems on their delivery server (full inbox, deleted account, etc.)
- A remote mail system is down (e.g. mail to example.org is deferred because of an outage that example.org is working to fix)
- DNS blocklist. Mail from our system is being blocked/deferred due to one of our mail server IPs being listed on an RBL.
Often the queue alerts are an early warning of, or an effect of, a service interruption outside our control, and will resolve themselves in time. Still, it's important to confirm the issue is not within our own infrastructure.
External documentation
- The Exim website and wiki
- The Dovecot website and wiki