Thumbor/JPEG

JPEG thumbnailing at Wikimedia has a number of specific behaviors, which are detailed here.

The engine handling JPEGs in Thumbor is our custom ImageMagick engine. A custom Thumbor engine is needed in order to integrate all the special features detailed below.

Chroma subsampling

JPEG thumbnails generated by Thumbor use the chroma subsampling value defined in the CHROMA_SUBSAMPLING Thumbor config variable, found in Deployment Charts. In production this is set so that JPEGs generated by Thumbor use 4:2:0 subsampling, a popular choice on the web because it reduces file size while keeping the effect on visual quality minimal.
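As a rough illustration of why 4:2:0 reduces file size, the sketch below counts the raw samples stored per image before compression. It is not Thumbor code; the function name and the example dimensions are assumptions made for this example, and the final JPEG size also depends on the quality setting and image content.

```python
import math


def sample_counts(width, height, h_factor=2, v_factor=2):
    """Return (luma_samples, chroma_samples) for one image.

    h_factor and v_factor are the horizontal and vertical chroma
    subsampling factors: (2, 2) models 4:2:0, (1, 1) models 4:4:4.
    """
    luma = width * height
    # Each of the two chroma planes (Cb and Cr) is stored at reduced resolution.
    chroma_plane = math.ceil(width / h_factor) * math.ceil(height / v_factor)
    return luma, 2 * chroma_plane


# Hypothetical 1024x768 thumbnail.
luma_420, chroma_420 = sample_counts(1024, 768)
luma_444, chroma_444 = sample_counts(1024, 768, 1, 1)

# Fraction of raw samples kept by 4:2:0 compared with no subsampling.
ratio = (luma_420 + chroma_420) / (luma_444 + chroma_444)
```

With these dimensions, 4:2:0 stores half as many raw samples as an image without subsampling, since each chroma plane shrinks to a quarter of its full size while the luma plane is untouched.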

qlow

Wikimedia thumbnails can be requested with a special parameter that deliberately lowers the compression quality (typically so that they can be served to clients with low bandwidth). The compression quality used for those thumbnails is defined in the QUALITY_LOW Thumbor config variable, found in Deployment Charts.
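The selection between the two quality levels can be pictured with the minimal sketch below. The function name and both numeric values are hypothetical and chosen only for illustration; the real values live in the Thumbor configuration, and the way the engine detects a low-quality request is not shown here.

```python
# Hypothetical values for illustration only; they are not the production settings.
QUALITY = 87
QUALITY_LOW = 40


def pick_quality(low_quality_requested, quality=QUALITY, quality_low=QUALITY_LOW):
    """Return the JPEG compression quality to use for a thumbnail request."""
    return quality_low if low_quality_requested else quality


normal = pick_quality(False)
reduced = pick_quality(True)
```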

Conditional sharpening

Historically, Wikimedia wikis have been consistent in making sure that JPEGs are photographs and that diagrams are uploaded as other file types, so we have been able to visually optimize JPEGs for photographs. This takes the form of conditional sharpening logic, supported by a custom Thumbor plugin. The plugin can be applied to any file type (it only passes the information to the engine, which has to apply it), and we apply it to JPEG originals by default via the DEFAULT_FILTERS_JPEG Thumbor config variable, found in Deployment Charts. That variable defines the sharpening value to apply, as well as the resize ratio that acts as the threshold deciding whether sharpening is applied.

This technique allows resized JPEGs to be more visually pleasing, with the edge details being more pronounced when JPEGs are drastically resized.

It isn't applied when the original isn't a JPEG, as such files are often text or diagrams, where conditional sharpening would worsen quality. This is also why it's usually advised to upload photos as JPEGs: a photograph uploaded as a TIFF will often have soft-looking thumbnails. This constraint might evolve if we get a way for users to define what is a photograph and what isn't, which might become feasible with structured data on Commons.
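The decision described in this section can be sketched as follows. This is not the plugin's code: the function name and parameters are hypothetical, and the sketch assumes that the resize ratio is the target width divided by the original width and that sharpening applies when that ratio falls below the configured threshold.

```python
def should_sharpen(original_format, original_width, target_width, ratio_threshold):
    """Decide whether a thumbnail should be sharpened.

    Assumptions (not confirmed by the plugin source):
    - only JPEG originals are eligible;
    - resize ratio = target_width / original_width;
    - sharpening applies when the ratio is below ratio_threshold.
    """
    if original_format.upper() not in ("JPEG", "JPG"):
        return False  # text and diagrams tend to look worse when sharpened
    if original_width <= 0:
        return False  # guard against invalid dimensions
    resize_ratio = target_width / original_width
    return resize_ratio < ratio_threshold


# Hypothetical threshold of 0.85.
drastic = should_sharpen("JPEG", 4000, 320, 0.85)   # large reduction
slight = should_sharpen("JPEG", 400, 380, 0.85)     # nearly the same size
not_jpeg = should_sharpen("TIFF", 4000, 320, 0.85)  # not a JPEG original
```

Under these assumptions only the first request is sharpened: the second is resized too little to cross the threshold, and the third is excluded because the original is a TIFF.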

EXIF processing

To make JPEG thumbnails lighter, we reduce the size of the EXIF payload included in thumbnail images.

EXIF field filtering

We strip EXIF data, but in order to preserve attribution information we keep a few fields in thumbnails. The list of fields to keep is defined by the EXIF_FIELDS_TO_KEEP Thumbor config variable, found in Deployment Charts.
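The allowlist approach can be sketched as below. The function name, the field names in the allowlist, and the sample metadata are all hypothetical and chosen for illustration; the production list is whatever EXIF_FIELDS_TO_KEEP contains, and the real stripping is performed on the file itself rather than on a dictionary.

```python
def filter_exif(fields, fields_to_keep):
    """Return only the EXIF fields whose names are in the allowlist."""
    keep = set(fields_to_keep)
    return {name: value for name, value in fields.items() if name in keep}


# Hypothetical allowlist and metadata, for illustration only.
allowlist = ["Artist", "Copyright", "ImageDescription"]
original_exif = {
    "Artist": "Jane Example",
    "Copyright": "CC BY-SA 4.0",
    "Make": "ExampleCam",
    "GPSLatitude": "48.8566",
}

thumbnail_exif = filter_exif(original_exif, allowlist)
```

Here only the Artist and Copyright fields survive in the thumbnail; fields outside the allowlist are dropped, and allowlisted fields missing from the original are simply absent.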

ICC profile substitution

We replace sRGB ICC profiles with Facebook's TinyRGB profile, which achieves the same visual results with a much smaller payload. This mechanism is governed by the EXIF_TINYRGB_PATH and EXIF_TINYRGB_ICC_REPLACE Thumbor config variables, found in Deployment Charts.

EXIF orientation

We apply the EXIF orientation to thumbnails and strip that EXIF field. This avoids inconsistencies between clients that may or may not apply the EXIF rotation. While most EXIF processing in Thumbor is done with exiftool, that tool is overeager in its interpretation of the orientation fields, which is why we use exiv2 for this step instead: its conservative interpretation of the orientation field matches MediaWiki's.
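To show what applying the orientation and stripping the field means, here is a minimal sketch that works on a small grid of pixel values rather than a real image. It is not the engine's code and it does not model the difference between exiftool and exiv2; the function names are hypothetical, and the mapping follows the standard meaning of EXIF orientation values 1 to 8.

```python
def apply_orientation(grid, orientation):
    """Return a new pixel grid with the EXIF orientation (1-8) applied."""
    rows = [list(row) for row in grid]
    if orientation == 2:  # stored mirrored horizontally
        return [row[::-1] for row in rows]
    if orientation == 3:  # stored rotated by 180 degrees
        return [row[::-1] for row in rows[::-1]]
    if orientation == 4:  # stored mirrored vertically
        return rows[::-1]
    if orientation == 5:  # stored transposed (flipped over the main diagonal)
        return [list(col) for col in zip(*rows)]
    if orientation == 6:  # needs a 90 degree clockwise rotation
        return [list(col) for col in zip(*rows[::-1])]
    if orientation == 7:  # stored flipped over the anti-diagonal
        return [list(col)[::-1] for col in zip(*rows)][::-1]
    if orientation == 8:  # needs a 90 degree counter-clockwise rotation
        return [list(col) for col in zip(*rows)][::-1]
    return rows  # 1 or unknown: leave the pixels untouched


def bake_orientation(grid, exif):
    """Apply the orientation to the pixels and drop the Orientation field."""
    orientation = exif.get("Orientation", 1)
    remaining = {k: v for k, v in exif.items() if k != "Orientation"}
    return apply_orientation(grid, orientation), remaining


pixels = [[1, 2, 3],
          [4, 5, 6]]
rotated, exif_out = bake_orientation(pixels, {"Orientation": 6, "Artist": "Jane Example"})
```

After this step the pixel data itself carries the rotation and the metadata no longer mentions it, so a client gets the same result whether or not it honors the EXIF orientation.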