MediaWiki Engineering/Guides/PHP optimisation tips

Some tips for making PHP code faster.

Benchmarking

Confirm any proposed performance optimisation by benchmarking it. The steps are documented at Measure backend performance#Benchmarking.

Array separation

Arrays in PHP have copy-on-write semantics. When you modify an array, if the reference count is one, the modification is done in place and is fast. If the reference count is more than one, the array needs to be copied before the modification can take place. This is termed "separation" in the PHP source, since you start with one value and separate it so that there are two values.
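Separation can be observed directly by watching memory usage. The following is a minimal sketch, not a rigorous benchmark: demonstrateSeparation() is a hypothetical name, and the exact byte counts depend on the PHP version.

```php
// Hypothetical demonstration: memory use around a write to a shared array.
function demonstrateSeparation(): array {
    $a = range( 0, 100000 );
    $before = memory_get_usage();

    // Assignment only increments the reference count; no elements are copied.
    $b = $a;
    $afterAssign = memory_get_usage();

    // The first write to $b sees a reference count of two, so the array
    // is separated (copied) before the element is appended.
    $b[] = 1;
    $afterWrite = memory_get_usage();

    return [
        'assign' => $afterAssign - $before,
        'write' => $afterWrite - $afterAssign,
    ];
}

$delta = demonstrateSeparation();
printf(
    "assignment: %d bytes, first write: %d bytes\n",
    $delta['assign'],
    $delta['write']
);
```

The assignment should cost close to nothing, while the first write should cost roughly the size of the whole array.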

Array separation means that innocuous-looking code can be surprisingly slow.

function getFirstElement($array) {
    return reset( $array );
}
$array = range(0, 1000000);
getFirstElement($array);

Compared to the following:

function getFirstElement($array) {
    return $array[0];
}
$array = range(0, 1000000);
getFirstElement($array);

Modification of the iteration pointer by reset() requires separation, leading to significantly more time and memory being spent. In the benchmark below, the mean time is 17.57ms versus 8.16ms, and peak memory usage is 52.01 MiB versus 36.00 MiB.

$ php maintenance/benchmarks/benchmarkEval.php \
 --setup 'function getFirstElement($array) { return reset( $array ); }' \
 --code '$array = range(0, 1000000); getFirstElement($array);'

Running PHP version 8.2.0

eval
 count: 100
 rate:  56.9/s
 mean:  17.57ms
 Peak memory usage: 52.01 MiB

$ php maintenance/benchmarks/benchmarkEval.php \
  --setup 'function getFirstElement($array) { return $array[0]; }' \
  --code '$array = range(0, 1000000); getFirstElement($array);'

Running PHP version 8.2.0

eval
  count: 100
  rate: 122.6/s
  mean: 8.16ms
  Peak memory usage: 36.00 MiB

Anything that takes an array as input and returns a modified version of the array will require O(N) time.

$a = [];
for ( $i = 0; $i < 1000; $i++ ) {
    $a = array_merge( $a, [ 1 ] );
}

With n = 1000 iterations, the above code snippet requires n(n+1)/2 = 500500 single-element operations, for a running time of about 1ms. The alternative using $a[] = 1 is about 50 times faster.
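The two approaches can be compared side by side. This is a rough sketch: buildWithMerge() and buildWithAppend() are hypothetical names, hrtime() requires PHP 7.3+, and a single timed run is only indicative, so use the benchmarking steps referenced above for real measurements.

```php
// Quadratic: each array_merge() call copies all elements accumulated so far.
function buildWithMerge( int $n ): array {
    $a = [];
    for ( $i = 0; $i < $n; $i++ ) {
        $a = array_merge( $a, [ 1 ] );
    }
    return $a;
}

// Linear: nothing else holds the array, so each append is done in place.
function buildWithAppend( int $n ): array {
    $a = [];
    for ( $i = 0; $i < $n; $i++ ) {
        $a[] = 1;
    }
    return $a;
}

$start = hrtime( true );
$merged = buildWithMerge( 1000 );
$mergeNs = hrtime( true ) - $start;

$start = hrtime( true );
$appended = buildWithAppend( 1000 );
$appendNs = hrtime( true ) - $start;

printf( "array_merge: %d ns, append: %d ns\n", $mergeNs, $appendNs );
```

Both functions return the same array; only the cost of building it differs.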

Some observations about built-in functions:

  • reset() and end() should be avoided. Since PHP 7.3, array_key_first() and array_key_last() are available as alternatives.
  • array_merge() should be replaced by a loop when the result replaces the first argument, especially if the first argument is large.
  • array_pop() and array_key_last() are fast as long as there are not too many holes at the end of the array.
  • array_push() is O(1), although $a[] = ... is faster unless there is a very large number of arguments.
  • array_splice() is slow despite its apparent in-place semantics. It always copies its input arguments.
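A few of these observations can be sketched in code. The helper names firstValue() and lastValue() are hypothetical and not part of MediaWiki; array_key_first() and array_key_last() require PHP 7.3+.

```php
// Returns the first value without touching the array's internal pointer,
// so no separation is triggered. Returns null for an empty array.
function firstValue( array $array ) {
    $key = array_key_first( $array );
    return $key === null ? null : $array[$key];
}

// Same idea for the last value, as an alternative to end().
function lastValue( array $array ) {
    $key = array_key_last( $array );
    return $key === null ? null : $array[$key];
}

// Instead of $log = array_merge( $log, $extra ), append in a loop.
// This matches array_merge() only for list-like arrays; string keys
// in $extra would be discarded here.
$log = [ 'a', 'b' ];
$extra = [ 'c', 'd' ];
foreach ( $extra as $value ) {
    $log[] = $value;
}
```

The loop touches only the new elements, whereas array_merge() builds a whole new array on every call.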

Constant factors

PHP code is compiled to an array of operations (an oparray). The PHP VM traverses the oparray, executing each opcode as it finds it. Some ops are faster than others.

  • Local variable access is heavily optimised and is generally fast. The only slow thing you can do with local variables is accessing them by name, e.g. $$varname = 1. This builds a hashtable of local variables at the first instance of such code in a function.
  • Function calls are relatively slow due to the need to initialise a new stack frame. Userspace function calls are slightly slower than built-in function calls.
  • Some things that look like functions are actually special opcodes. This makes them faster. For example, count(), strlen(), isset() and empty() are fast. Starting with PHP 7.4, array_key_exists() has been optimised to be as fast as, or faster than, isset(). To benefit in namespaced code, call it as \array_key_exists() or import it with use function array_key_exists.
  • Object construction is comparable to a function call.
  • Access to declared properties of an object is pretty fast, since this has been heavily optimised. It is faster to access a declared property than an undeclared property or an element of an associative array. But in a tight loop it might still be worthwhile to copy an object property to a local variable.
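Two of the points above, the fully qualified \array_key_exists() call and copying a property to a local variable, might look like this in practice. The Lookup class and sumKnown() function are hypothetical, and typed properties require PHP 7.4+.

```php
// Hypothetical class for illustration.
class Lookup {
    /** @var array<string,int> Declared property */
    public array $table = [ 'a' => 1, 'b' => 2 ];
}

function sumKnown( Lookup $lookup, array $keys ): int {
    // Copy the property to a local variable once, outside the loop.
    // Thanks to copy-on-write this does not duplicate the array.
    $table = $lookup->table;
    $sum = 0;
    foreach ( $keys as $key ) {
        // Fully qualified call. Inside a namespace, this lets the compiler
        // use the dedicated opcode instead of a runtime function lookup.
        if ( \array_key_exists( $key, $table ) ) {
            $sum += $table[$key];
        }
    }
    return $sum;
}

echo sumKnown( new Lookup(), [ 'a', 'b', 'c' ] ), "\n";
```

Whether the local copy pays off depends on how hot the loop is, so benchmark before and after.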

The PHP compiler is not as smart as a C compiler because it operates under strict time constraints. Do not assume the PHP compiler is going to help you out by optimising away your slow code. With a few exceptions, what you write is what it executes.
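One practical consequence is that loop-invariant work usually has to be hoisted by hand. The sketch below assumes the compiler does not move the repeated strtolower( $prefix ) call out of the loop; the function names are hypothetical.

```php
// Slower: strtolower( $prefix ) and strlen( $prefix ) run on every iteration.
function countMatchesSlow( array $lines, string $prefix ): int {
    $count = 0;
    foreach ( $lines as $line ) {
        if ( strncmp( strtolower( $line ), strtolower( $prefix ), strlen( $prefix ) ) === 0 ) {
            $count++;
        }
    }
    return $count;
}

// Faster: the loop-invariant work is done once, before the loop.
function countMatchesFast( array $lines, string $prefix ): int {
    $needle = strtolower( $prefix );
    $length = strlen( $needle );
    $count = 0;
    foreach ( $lines as $line ) {
        if ( strncmp( strtolower( $line ), $needle, $length ) === 0 ) {
            $count++;
        }
    }
    return $count;
}
```

Both functions return the same count; the second simply executes fewer operations per iteration.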

Caching and memoization

Memoization means caching the result of a function call in a way that is transparent to the caller. It is often an easy and effective way to improve performance.
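A minimal sketch of memoization in a class follows. TitleNormalizer and its methods are hypothetical names, not MediaWiki APIs; the ??= operator and typed properties require PHP 7.4+. The $slowCalls counter exists only to make the effect visible.

```php
// Hypothetical class for illustration.
class TitleNormalizer {
    /** @var array<string,string> Cached results, keyed by input */
    private array $cache = [];

    /** @var int Number of times the slow path ran (demonstration only) */
    public int $slowCalls = 0;

    public function normalize( string $title ): string {
        // Compute the result only if it is not already in the cache.
        // The caller cannot tell whether the cache was used.
        return $this->cache[$title] ??= $this->normalizeSlow( $title );
    }

    private function normalizeSlow( string $title ): string {
        $this->slowCalls++;
        return ucfirst( str_replace( ' ', '_', trim( $title ) ) );
    }
}

$normalizer = new TitleNormalizer();
$first = $normalizer->normalize( ' main page ' );
$second = $normalizer->normalize( ' main page ' );
```

The second call returns the cached string without running normalizeSlow() again. A cache like this grows without bound, so for a large or unbounded key space some eviction policy would be needed.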

The optimisation operator

Most languages have an optimisation operator. The optimisation operator makes any code faster when the operator is placed in front of the code in question.

 // $this->slowFunction();
 # $this->slowFunction();

In other words, it is faster to not do a thing than to do the thing. Engineers need to push back against expensive requirements. Product managers do not necessarily understand the cost of a requirement in terms of user observed latency or hardware cost.

Abstraction

Wirth's law states that software becomes slower as hardware becomes faster. Increasing abstractions and features keep pace with hardware improvements so that the benefits of hardware improvements are never seen by users. In fact, latency tends to increase over time.

Wirth's law is inherent in the way engineers think and operate. Engineers cannot resist the dopamine hit which comes from introducing a neat abstraction. This tendency must be consciously and continuously fought to preserve a reasonable user experience.