Wikimedia JPEG thumbnailing has a number of specificities detailed here.
The engine handling JPEGs in Thumbor is our custom ImageMagick engine. We use a custom Thumbor engine so that it can integrate all the special features detailed below.
JPEG thumbnails generated by Thumbor use a specific chroma subsampling value defined in the CHROMA_SUBSAMPLING Thumbor config variable, found in Puppet. In production this is set so that JPEGs generated by Thumbor use 4:2:0 subsampling. It's a popular choice on the web, because it reduces file size while keeping the effect on visual quality minimal.
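To give a rough idea of why 4:2:0 saves space, the sketch below counts the raw samples that go into the encoder for a few subsampling schemes. It only illustrates the arithmetic: the function name is made up for this example, and the final file size also depends on the compression quality and the image content.

```python
# Divisors applied to the two chroma planes (horizontal, vertical).
# The luma plane always stays at full resolution.
CHROMA_DIVISORS = {
    "4:4:4": (1, 1),  # no subsampling
    "4:2:2": (2, 1),  # chroma halved horizontally
    "4:2:0": (2, 2),  # chroma halved horizontally and vertically
}


def raw_samples(width, height, subsampling):
    """Count luma + chroma samples before compression (illustrative helper)."""
    h_div, v_div = CHROMA_DIVISORS[subsampling]
    luma = width * height
    # Round up, since partial chroma blocks still need a sample.
    chroma_width = -(-width // h_div)
    chroma_height = -(-height // v_div)
    return luma + 2 * chroma_width * chroma_height


full = raw_samples(800, 600, "4:4:4")
subsampled = raw_samples(800, 600, "4:2:0")
print(subsampled / full)
```

For an 800x600 image, 4:2:0 feeds the encoder half as many raw samples as 4:4:4, since each of the two chroma planes shrinks to a quarter of its original size.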
Wikimedia thumbnails can be requested with a special parameter that deliberately lowers the compression quality, typically so that they can be served to clients with low bandwidth. The compression quality used for those thumbnails is defined in the QUALITY_LOW Thumbor config variable, found in Puppet.
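As a minimal sketch of how these two settings could translate into ImageMagick options, assuming the engine passes them on the command line: `-quality` and `-sampling-factor` are real ImageMagick options, but the function name, the way the low-quality request is signalled, and the numeric values below are placeholders for illustration, not the production values from Puppet.

```python
def jpeg_output_args(low_quality_requested, config):
    """Build ImageMagick output options for a JPEG thumbnail (illustrative)."""
    # Pick the reduced quality only when the request asked for it.
    if low_quality_requested:
        quality = config["QUALITY_LOW"]
    else:
        quality = config["QUALITY"]
    return [
        "-quality", str(quality),
        "-sampling-factor", config["CHROMA_SUBSAMPLING"],
    ]


# Placeholder values, NOT the ones set in Puppet.
example_config = {
    "QUALITY": 87,
    "QUALITY_LOW": 40,
    "CHROMA_SUBSAMPLING": "4:2:0",
}

print(jpeg_output_args(True, example_config))
```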
Historically, since Wikimedia wikis have been consistent in making sure that JPEGs are photographs and that diagrams are uploaded as other file types, we have been able to visually optimize JPEGs for photographs. This takes the form of conditional sharpening logic, supported by a custom Thumbor plugin. The plugin can be applied to any file type (it really just passes the information to the engine, which has to apply it), and we apply it to JPEG originals by default, via the DEFAULT_FILTERS_JPEG Thumbor config variable, found in Puppet. That variable defines the sharpening value to be applied, as well as the resize ratio that acts as the threshold deciding whether sharpening is applied.
This technique makes resized JPEGs more visually pleasing, with edge details being more pronounced when JPEGs are drastically resized.
It isn't applied when the original isn't a JPEG, as those can often be text or diagrams, where the conditional sharpening would worsen quality. This is also why it's usually advised to upload photos as JPEGs: a photograph uploaded as a TIFF will often have soft-looking thumbnails. This constraint might evolve in the future if we have a way for users to define what is a photograph and what isn't. This might become feasible with structured data on Commons.
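The decision described above can be sketched as follows. This is only an illustration under stated assumptions: the function and parameter names are hypothetical, the ratio is computed from widths alone, and the threshold used in the usage lines is a placeholder rather than the value configured in DEFAULT_FILTERS_JPEG.

```python
def should_sharpen(original_width, thumbnail_width, ratio_threshold,
                   original_is_jpeg):
    """Decide whether a thumbnail gets sharpened (illustrative sketch)."""
    # Non-JPEG originals are often text or diagrams: never sharpen them.
    if not original_is_jpeg:
        return False
    # Sharpen only when the image is shrunk below the threshold ratio,
    # i.e. when the resize is drastic enough to soften edge details.
    resize_ratio = thumbnail_width / original_width
    return resize_ratio < ratio_threshold


# Placeholder threshold of 0.85 for the sake of the example.
print(should_sharpen(4000, 400, 0.85, True))    # drastic downscale
print(should_sharpen(1000, 900, 0.85, True))    # mild downscale
print(should_sharpen(4000, 400, 0.85, False))   # e.g. a TIFF original
```

In this sketch the sharpening amount itself is left out; it would be passed to the engine alongside the decision.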
To make JPEG thumbnails lighter, we reduce the size of the EXIF payload included in thumbnail images.
EXIF field filtering
We strip EXIF data, but to preserve attribution information we keep a few fields in thumbnails. The list of fields to keep is defined by the EXIF_FIELDS_TO_KEEP Thumbor config variable, found in Puppet.
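Conceptually, the filtering is an allowlist applied to the original's metadata. The sketch below shows the idea on a plain dictionary; the function name and the field list are examples chosen for illustration and are not the actual contents of EXIF_FIELDS_TO_KEEP.

```python
def filter_exif(exif, fields_to_keep):
    """Keep only allowlisted EXIF fields (illustrative sketch)."""
    return {name: value for name, value in exif.items()
            if name in fields_to_keep}


# Hypothetical allowlist, for illustration only.
example_fields_to_keep = ["Artist", "Copyright", "ImageDescription"]

original_exif = {
    "Artist": "Jane Doe",
    "Copyright": "CC BY-SA 4.0",
    "Make": "ExampleCam",
    "GPSLatitude": "48.85",
}

print(filter_exif(original_exif, example_fields_to_keep))
```

Everything not on the list, such as camera or location details in this example, is dropped from the thumbnail.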
ICC profile substitution
We replace sRGB ICC profiles with Facebook's TinyRGB profile, which achieves the same visual results with a much smaller payload. This mechanism is governed by the EXIF_TINYRGB_ICC_REPLACE Thumbor config variable, found in Puppet.
We apply the EXIF orientation to thumbnails and strip that EXIF field. This is to avoid inconsistencies in clients that may or may not apply the EXIF rotation. While most EXIF processing in Thumbor is done with exiftool, exiftool is overeager in its interpretation of the orientation fields, which is why we use exiv2 instead, whose conservative interpretation of the orientation field matches MediaWiki's.
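The apply-then-strip step can be sketched as below. The orientation values 3, 6 and 8 and their clockwise rotations come from the EXIF specification. Leaving the mirrored values (2, 4, 5, 7) unrotated is a simplification made for this example, not a statement about what exiv2 or MediaWiki do, and the function name is hypothetical.

```python
# Clockwise rotation, in degrees, needed to display the image upright.
ROTATION_FOR_ORIENTATION = {3: 180, 6: 90, 8: 270}


def plan_orientation_fix(exif):
    """Return (degrees to rotate, EXIF without the Orientation field)."""
    # A missing field is treated like orientation 1 (already upright).
    orientation = exif.get("Orientation", 1)
    degrees = ROTATION_FOR_ORIENTATION.get(orientation, 0)
    # Strip the field so clients don't rotate the thumbnail a second time.
    stripped = {name: value for name, value in exif.items()
                if name != "Orientation"}
    return degrees, stripped


print(plan_orientation_fix({"Orientation": 6, "Artist": "Jane Doe"}))
```

Baking the rotation into the pixels and removing the field means every client displays the thumbnail the same way, whether or not it honors EXIF orientation.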