User:Thcipriani/Blubber/Concepts

From Wikitech

Background

Blubber was initially developed to meet the build stage requirements of the Streamlined Service Delivery Design project (aka Release Pipeline, aka Continuous Delivery Pipeline). The original assumption was that developers might provide their own Dockerfile(s) for consumption by the pipeline. However, brief research and experimentation quickly made it apparent that writing an efficient and maintainable Dockerfile requires an inordinate degree of knowledge about layered filesystems, caching, directory context, an obscure configuration format and inheritance, and unpredictable instruction behavior. To have sufficient trust in what was being tested, and in what would eventually be deployed to production, Site Reliability Engineering, Release Engineering, and Services needed to adopt a better means of ingesting developer-provided image build configurations.

Release Engineering began experimenting with Blubber as a solution in early 2017 and officially took on the project in Q1 of fiscal year 2017-18. It continues to maintain and improve the project with support from SRE and Services.

Concepts

Declarative

Blubber provides developers with a simple YAML build configuration format for declaring:

  • what system dependencies their application requires
  • what language-specific dependency manager to delegate to
  • where the application files should be installed
  • how the application needs to be tested
  • how the application should run
  • what variations of this configuration are needed for development, testing, and production (or any other) environments

For example, a Node.js service might be configured as follows:

version: v3
base: docker-registry.wikimedia.org/nodejs-slim
apt: { packages: [librsvg2-2] }
lives:
  in: /srv/service

variants:
  build:
    base: docker-registry.wikimedia.org/nodejs-devel
    apt: { packages: [librsvg2-dev, git, pkg-config, build-essential] }
    node: { requirements: [package.json] }
    runs: { environment: { LINK: g++ } }
  test:
    includes: [build]
    entrypoint: [npm, test]
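
The relationship between the top-level settings, a variant, and the variants it lists under includes can be sketched in a few lines of Python. This is an illustration only, not Blubber's implementation: the merge rules used here (nested mappings merge key by key, lists accumulate without duplicates, scalar values are overridden) and the helper names merge, layers, and expand are assumptions made for the sketch. The configuration is written as a plain dict so that no YAML parser is needed.

```python
from copy import deepcopy

# The configuration above, transcribed as a plain dict.
CONFIG = {
    "version": "v3",
    "base": "docker-registry.wikimedia.org/nodejs-slim",
    "apt": {"packages": ["librsvg2-2"]},
    "lives": {"in": "/srv/service"},
    "variants": {
        "build": {
            "base": "docker-registry.wikimedia.org/nodejs-devel",
            "apt": {"packages": ["librsvg2-dev", "git", "pkg-config", "build-essential"]},
            "node": {"requirements": ["package.json"]},
            "runs": {"environment": {"LINK": "g++"}},
        },
        "test": {"includes": ["build"], "entrypoint": ["npm", "test"]},
    },
}


def merge(lower, upper):
    """Return a new dict with `upper` layered over `lower` (assumed merge rules)."""
    result = deepcopy(lower)
    for key, value in upper.items():
        current = result.get(key)
        if isinstance(value, dict) and isinstance(current, dict):
            # Nested mappings are merged key by key.
            result[key] = merge(current, value)
        elif isinstance(value, list) and isinstance(current, list):
            # Assumption: lists accumulate, skipping entries already present.
            result[key] = current + [item for item in value if item not in current]
        else:
            # Scalars (and mismatched types) are simply overridden.
            result[key] = deepcopy(value)
    return result


def layers(config, name):
    """Yield included variants depth first, then the variant itself.

    No cycle detection: a variant that includes itself would recurse forever.
    """
    variant = config["variants"][name]
    for included in variant.get("includes", []):
        yield from layers(config, included)
    yield {key: value for key, value in variant.items() if key != "includes"}


def expand(config, name):
    """Resolve one variant into a flat set of effective settings."""
    resolved = {
        key: deepcopy(value)
        for key, value in config.items()
        if key not in ("version", "variants")
    }
    for layer in layers(config, name):
        resolved = merge(resolved, layer)
    return resolved


if __name__ == "__main__":
    for key, value in expand(CONFIG, "test").items():
        print(f"{key}: {value}")
```

Under these assumed rules, the "test" variant ends up with the nodejs-devel base image and the build packages from "build", plus its own entry point, while still inheriting the install location from the top level.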

Stateless

Blubber runs as a stateless application: it needs only the YAML configuration and the name of a variant to output a valid Dockerfile. It does not depend on anything else from the project filesystem or on the existing state of Docker images. Given the same configuration and variant name, its output is completely deterministic.

To demonstrate this, Blubber currently runs as a microservice (Blubberoid) on Toolforge and can return a variant's Dockerfile over HTTP, for example via curl. Piping the configuration above to the command below yields the "test" variant's Dockerfile.

curl -s --data-binary @- http://tools.wmflabs.org/blubber/test
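
The same request can be issued from a script. The following is a minimal sketch using only Python's standard library; it assumes the endpoint from the curl command is still reachable and accepts the configuration as the raw POST body, exactly as curl sends it. The function names build_request and fetch_dockerfile are hypothetical and exist only for this example.

```python
import urllib.request

# URL prefix taken from the curl command above; the variant name is appended to it.
BLUBBEROID_URL = "http://tools.wmflabs.org/blubber"


def build_request(config_yaml, variant, base_url=BLUBBEROID_URL):
    """Build (but do not send) a POST carrying the YAML configuration as its body."""
    return urllib.request.Request(
        url=f"{base_url}/{variant}",
        data=config_yaml.encode("utf-8"),
        method="POST",
    )


def fetch_dockerfile(config_yaml, variant):
    """Send the request and return the response body as text.

    Assumption: the service answers with the generated Dockerfile as plain text.
    """
    request = build_request(config_yaml, variant)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")
```

Because the service is stateless, the request carries everything needed: the configuration in the body and the variant name in the URL path.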

Cache efficient

Blubber knows the idiosyncrasies of Docker's caching system and produces consistently ordered Dockerfile output that makes full use of it. In addition to formatting and ordering instructions properly, it knows how to delegate to package managers (e.g. Node's npm, Python's pip) in a cache-efficient way: package managers are re-run during image builds only when their related files (e.g. package.json, requirements.txt) have changed.
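
The ordering idea can be illustrated with a small Python sketch that emits Dockerfile instructions for a Node application. It is not Blubber's actual output: the function name node_instructions and the exact instructions are assumptions chosen to show the principle. Docker reuses a cached layer for a COPY instruction as long as the copied files are unchanged, so copying only the dependency manifests before running the package manager keeps the expensive install step cached across ordinary source edits.

```python
def node_instructions(requirements, app_dir):
    """Return Dockerfile lines ordered so the dependency layer survives source edits."""
    lines = [f"WORKDIR {app_dir}"]

    # 1. Copy only the dependency manifests. The cache for these layers
    #    depends on the contents of these files alone.
    for path in requirements:
        lines.append(f"COPY {path} ./")

    # 2. Install dependencies. This layer is rebuilt only when a layer
    #    above it (that is, a manifest) has changed.
    lines.append("RUN npm install")

    # 3. Copy the application source last, so frequent code changes
    #    invalidate only this final layer.
    lines.append("COPY . .")
    return lines


if __name__ == "__main__":
    print("\n".join(node_instructions(["package.json"], "/srv/service")))
```

Reversing steps 2 and 3 would still build a working image, but every source change would then invalidate the cache for the install step.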

Security focused

There's no built-in security model when writing raw Dockerfiles because it's assumed everything within your running container will be protected. This is simply false. Not all exploits are root exploits, and applications should still adhere to a sane security model for ownership and entry points so as to limit their attack surface and protect their runtime processes.

Blubber enforces a phased build process with dropped privileges and prevents users from inadvertently installing files as root or running their applications as a user that can write to application or system files.
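
A phased build of this kind might look roughly like the output of the following Python sketch. It is a simplified illustration under stated assumptions, not the Dockerfile Blubber actually generates: the function name phased_instructions, the user names app-owner and app-runner, and the specific shell commands are all hypothetical. The point is the sequence: privileged setup first, file installation as an unprivileged owner, and execution as a third user who does not own the application files.

```python
import json


def phased_instructions(packages, app_dir, entrypoint,
                        owner="app-owner", runner="app-runner"):
    """Return Dockerfile lines for a three-phase build with dropped privileges."""
    lines = []

    # Phase 1 (privileged): system packages, users, and the install directory.
    lines.append("USER root")
    if packages:
        lines.append("RUN apt-get update && apt-get install -y " + " ".join(packages))
    lines.append(f"RUN useradd --system {owner} && useradd --system {runner}")
    lines.append(f"RUN mkdir -p {app_dir} && chown {owner} {app_dir}")

    # Phase 2 (unprivileged owner): application files are installed as the
    # owner rather than as root.
    lines.append(f"USER {owner}")
    lines.append(f"WORKDIR {app_dir}")
    lines.append(f"COPY --chown={owner} . .")

    # Phase 3 (runtime): the process runs as a user that does not own the
    # application files or any system files.
    lines.append(f"USER {runner}")
    lines.append("ENTRYPOINT " + json.dumps(entrypoint))
    return lines


if __name__ == "__main__":
    print("\n".join(phased_instructions(["librsvg2-2"], "/srv/service", ["npm", "test"])))
```

Because the final USER instruction comes last, a compromised application process runs without ownership of its own code, which limits what an exploit can modify.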

Developer empowering

Blubber was designed with developer empowerment in mind. With the increased degree of trust afforded by its security model, Blubber can safely provide developers with configuration for defining all application dependencies, tests, and production entry points. And with a greater degree of trust in resulting images, Release Engineering and SRE can eventually provide developers with a more automated means of deployment.