User:JLinehan (WMF)/Code Style
Abstract
This is to track ideas relating to source code style and the authoring of source code at the Foundation. I think that the code style rules are delicate and introduce an unusual level of friction for contributors. In fact, they often actively harm the readability of the code.
Definitions
Delicateness
Rules are delicate when there is only one way to get it right, many ways to get it wrong, and the default behavior is wrong.
Readability
- Things are consistently indented and there is a clear separation between nesting levels
- There is whitespace to separate sections
- Lines are a reasonably small length (80 cols is ideal) for human eye scanning and also to prevent wrapping on displays
- Things are aligned where appropriate to aid comprehension and clean up visual flow
- Things are named clearly and read clearly
- Most comments are block comments
Writeability
- Lines can be edited without thinking
- Comma-separated variable declarations violate this because the final line must end with ';', and all others with ','. Thus re-ordering or deleting lines can cause a syntax error if the last line (with the ';') or the first line (with the 'var') is moved or deleted.
- Most comments are block comments. This allows you to easily add and remove lines without re-formatting. It also discourages long lines, aiding in readability.
- WYSIWYG
- Tabs break this. Tab characters sacrifice writeability (and sometimes readability) for a single feature, which is the ability to change the tabstop. This is not worth the cost. Most people will either upload source files with a mixture of tabs and spaces, or with spaces. It is simple for most editors to expand all tabs into a certain number of spaces (making all whitespace into spaces), but it is not simple -- indeed usually impossible -- to replace all (groups of n) spaces into tabs. For example, block comments require an additional whitespace character, as does alignment, ASCII diagrams, etc. There is an abortive attempt to use '//' comments because they align with tabs, but this is fixing the wrong problem. Tabs are simply not worth the effort for the dubious benefit that they provide.
- Minimal extra hooks and processing and linting
- The artificial levels of whitespace for function arguments and array indexes is tedious and unnatural for all developers.
- The correct mixture of tabs and spaces is delicate. It is simply to set your editor to expand or not expand tabs, but it is not simple to ask that it only expand tabs when they are part of the line's indentation. We don't want to expand tabs if they belong to the "body" of the code, for example ASCII art comments.
- If you edit on different machines, or change editors, or ssh into different servers, the tooling required to do this may not be there and will need to be replicated. Your editor's configuration may differ. It is extremely irritating and tedious to satisfy the code style requirements.
- If you have formatting such as "flowerboxes" (block comments that reach horizontally across the screen with '***'), different tab sizes could push them past 80 columns or whatever wrap width you have decided to defend. Any element designed for 80 columns or really n columns is dependent on its indent level also being a fixed number m. You could argue that such elements should be avoided then, but again, what is the purpose of the tab character? Just so that you can change your indent level? Who cares? That's just confusing. As flowerboxes and other things make clear, there are many elements that will break if they are not using tabs, and we know that there are elements that legitimately cannot use tabs.
Deep thoughts
camelCase was a mistake
- You constantly have to make decisions: Is it
getHTMLorgetHtml? Is itgenerateIDorgenerateId? Is itsendHTTPPOST,sendHttpPOST, orsendHttpPost?- Naturally whatever decision you make when you write the code, everybody will make the same decision when they try to call it and it will not cause any mistakes or require looking anything up at all.
- With snake_case this can happen (although there are far fewer options), but you also have the option of enforcing an "everything is lowercase" rule, which completely eliminates the problem.
- Capitalization is not sufficient for distinguishing word boundaries: Sometimes you have numbers next to capital letters. The separation disappears.
getISO8601Timestamplooks like an absolute mess.getIso8601Timestampis not much better. Same forgetUUIDv4, or is itgetUuidV4? orgetUuidv4?getUUIDV4? - Kiss any scope hinting goodbye: You are completely unable to use capitalization to hint at variable scope. In more civilized societies, it is common to designate shared variables as capitalized, e.g.
Config, or evenCONFIG. In languages with constants, all caps is generally used to label constants. None of this is possible with camelCase. All variables look the same. I blame Java and its crusade against global scope for this.
Tabs
- Just
- Gerrit's code view expands tabs to 8 spaces, which, combined with our style guidelines, guarantees that code looks ridiculous.
Stacked declarations
- Combined with the comment style requirements, you get absurdities such as
var core, debugMode,
// config contains:
// - baseUrl: corresponds to the $wgEventLoggingBaseUri configuration in PHP.
// If set to false (default), then events will not be logged.
// - schemaRevision: Object mapping schema names to revision IDs
config = require( './data.json' );
and reasonable things such as
var baseURL = 'https://eventgate-logging.wmflabs.org/v1/events';
var regexWebkit = /^\s*at (?:(.*?)\()?(.*?:\d+:\d+)\)?\s*$/i;
/* - -- --- -- ----------- --- - -
* / / / | | | | \___
* line start / group 1, | | | | \
* / function | group 2, | any line
* any # of name | url:line:col | # of end
* spaces (maybe | | spaces
* empty) | |
* | |
* literal literal
* '(' ')'
* (or nothing)
*/
become
var baseURL = 'https://eventgate-logging.wmflabs.org/v1/events',
regexWebkit = /^\s*at (?:(.*?)\()?(.*?:\d+:\d+)\)?\s*$/i,
// - -- --- -- ----------- --- - -
// / / / | | | | \___
// line start / group 1, | | | | \
// / function | group 2, | any line
// any # of name | url:line:col | # of end
// spaces (maybe | | spaces
// empty) | |
// | |
// literal literal
// '(' ')'
// (or nothing)
The reason I think this strange convention survives is because so many people define their local variables inside of an object, where things are nice and aligned and there is little to no use of 'var'. The declaration is just the name of the property, which is aligned. Except this approach is wasteful in several ways, particularly in regards to minification.
Bonkers for whitespace
for (i=0; i<n; i++) {bad,for ( i = 0; i < n; i++ ) {good. Does it bother anyone else that it's noti ++though? Also, my fingers want to typefor ( i = 0 ; i < n ; i + + ) {.- What is keeping this readability breakthrough from taking the world by storm? You know how the New Yorker Magazine's style guide insists on spelling 'cooperate' as 'coöperate'?
The CI checks are strict and sometimes insensible.
For example, it fails code on the basis of object fields not being in camel case, however the object could be formatted that way because it is matching the API of another system to which it will be provided. That system may require snake case.
JSON.stringify( {
"user_agent": navigator.userAgent,
} );
Code must be hand-mangled
In order to satisfy JavaScript performance strategies that are nearly a decade out of date. For example, variables must be hoisted to the top of the function declaration, even above e.g. a branch that would return 90% of the calls of the function is the first thing.
function foo()
{
if (hardly_ever_happens === true) {
var a = '';
var b = [];
/* ... */
}
}
is invalid, we must do
function foo()
{
var a = '',
b = [];
if (hardly_ever_happens === true) {
/* ... */
}
}
Inevitably, the top of every function becomes a mange of jumbled declarations that make far more sense in the locality where they are used. For example:
var lines,
parts,
random,
i,
date = new Date(),
stack = '',
message = '';
Right when your eyes want to start learning about the function, they must first deal with this garbage heap. What are the benefits of this approach? What is possibly gained by this?
Hand-mangling code in this way, particularly the infamous stacked-declaration nonsense in Javascript, is ugly (there is a lack of symmetry because only one of the declarations is preceded by 'var'), irritating (deleting lines or re-shuffling is error-prone because you must always keep the last line with a ';', and the rest with ','), and sometimes even ambiguous (some styles indent the variables after the first 'var'. Do you put each on its own line, or one line? If they can fit, that looks cleaner, so maybe do that. It encourages inconsistency).
Mangling of this kind should be done by the code minifier. The source code should be written to maximize human-readability and human-maintainability, and then get transformed into a form that maximizes performance (size and execution speed).
To Do
Warning signs
- App teams are using Github, things that are not written in JS often bend the rules. The value of such a strict style guide for JS is reduced in the presence of such fragmentation. If the CI and Gerrit process were simpler, easier, and more inviting for JS, there would perhaps be less of a temptation to do this.
Implement a better JS minifier in ResourceLoader
The current minifier apparently only removes whitespace, essentially. Tools like UglifyJS3 are sophisticated and research indicates[1] that they could save us 20% in compressed file size over the current solution. That is reason enough to implement something. However it is also the case that the introduction of an intelligent minification process that would handle function and variable hoisting, reordering, renaming, re-writing of predicates, elimination of 'return' statements, etc. This means that we can write code to be good for humans, and then let the minifier fix the performance. It provides the best of both worlds.
Revisit the coding style guidelines after the introduction of a minifier
Use this as an opportunity to open a discussion about how we might update our code style guidelines.