Client errors

From Wikitech

The mw-client-error dashboard in Logstash gives an overview of JavaScript errors occurring in the MediaWiki software across deployed wikis. Error logging is currently enabled for all Wikipedia sites, Wikidata, MediaWiki and the beta cluster and test instances. New wikis can be added by making a Phabricator request.

Alerts

An alert fires when traffic to the client error logger is higher than normal levels. The alert can be configured in Grafana by those with edit access. Error spikes correspond to those displayed on the Logstash dashboard. Upon seeing this alert, please investigate errors within the last hour to identify the causes.

When the alert fires it could be for one of the following reasons:

  • the train just rolled out
  • a recent backport introduced an error
  • a fundraising campaign went live and contains errors
  • a spider or bot is running at high frequency with errors
  • someone edited MediaWiki:Common.js, MediaWiki:Vector.js, MediaWiki:Mobile.js or MediaWiki:Minerva.js on a wiki
  • a default gadget listed in the wiki's MediaWiki:Gadgets-definition contains an error

If the alert follows a backport or a train deployment, consider treating it as an indicator to block the train, unless someone says otherwise.

Debugging errors

Finding the source of the error using file url

When debugging client errors, the most useful field to check first is file_url. Use it to identify two things: 1) the likely source of the problem and 2) the skin.

This gives you the biggest hint as to whether the code comes from a gadget, deployed Wikimedia code or a user script. However, the field will sometimes be undefined. If the file url is `load.php`, this usually indicates a problem in Wikimedia deployed code, but if it has 'ext.gadget' in the URL then it's a gadget. Note, however, that the URL is not necessarily the definitive source of the problem. The file url may also tell you which skin is being used, e.g. monobook/timeless, which is likely to be important information for debugging.

If the file_url contains "> injectedScript" it is likely a user script e.g. common.js injecting a script tag via importScript. These are going to be harder to debug.
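The checks described above can be sketched as a small helper. This is only an illustration: the function name `classifyFileUrl` and the labels it returns are hypothetical, and, as noted above, the file url is a hint rather than a definitive answer.

```javascript
// Hypothetical helper: guess the likely source of an error and the skin
// from the file_url field. The returned labels are illustrative only.
function classifyFileUrl(fileUrl) {
  // The field is sometimes undefined, in which case nothing can be inferred.
  if (!fileUrl) {
    return { source: 'unknown', skin: null };
  }
  // "> injectedScript" suggests a script tag injected by a user script.
  if (fileUrl.includes('> injectedScript')) {
    return { source: 'user script', skin: null };
  }
  let source = 'unknown';
  let skin = null;
  try {
    const url = new URL(fileUrl);
    // load.php URLs carry the skin as a query parameter.
    skin = url.searchParams.get('skin');
    if (url.pathname.endsWith('/load.php')) {
      // 'ext.gadget' in a load.php URL points at a gadget; otherwise the
      // error is more likely to be in Wikimedia deployed code.
      source = fileUrl.includes('ext.gadget') ? 'gadget' : 'deployed code';
    }
  } catch (e) {
    // Not a parseable URL: leave the source as 'unknown'.
  }
  return { source, skin };
}
```

For example, `classifyFileUrl('https://www.mediawiki.org/w/load.php?lang=ro&modules=jquery%2Coojs-ui-core%2Coojs-ui-widgets&skin=vector&version=1rdqu')` returns `{ source: 'deployed code', skin: 'vector' }`.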

Finding the source of the error using stack trace

The next place to look is the stack_trace field. The trace shortener tool may help you read it more easily.

When using the stack trace, aim to find the exact location of the bug-causing code.

When scanning the stack trace field, look for URLs that identify the code. Note that an error may sometimes be associated with a Wikimedia file url while the stack trace indicates that an unsupported external URL is present, so look out for foreign domains, e.g. github/googleapis, here. In many stack traces, you must rely solely on the function names. Note also that the stack trace may contradict the file_url.

The following error, despite having a load.php file url, indicates that a gadget is the root cause.
stack trace:
at Object.history  eval at <anonymous> (URL1:8658:24
at Object.twinklefluff [as fluff]  eval at <anonymous> (URL1:8485:33
at HTMLDocument.TwinkleGlobal.load  eval at <anonymous> (URL1:6466:16
at mightThrow  URL2:49:149
at process  URL2:49:808

URL1: https://meta.wikimedia.org/w/index.php?title=User:TwinkleWikitechUser/TwinkleGlobal.js&action=raw&ctype=text/javascript:97:6), <anonymous>

URL2: https://www.mediawiki.org/w/load.php?lang=ro&modules=jquery%2Coojs-ui-core%2Coojs-ui-widgets&skin=vector&version=1rdqu

tags: input-kafka-clienterror-eqiad, kafka, es, es, throttle-exempt, normalized_message_untrimmed

file url: https://www.mediawiki.org/w/load.php?lang=ro&modules=jquery%2Coojs-ui-core%2Coojs-ui-widgets&skin=vector&version=1rdqu
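Looking for foreign domains in a stack trace can be sketched as follows. The function name `findForeignHosts` is hypothetical, and the list of Wikimedia domain suffixes is an assumption made for illustration; it is not exhaustive.

```javascript
// Assumption: an illustrative, incomplete list of Wikimedia domain suffixes.
const WIKIMEDIA_SUFFIXES = [
  'wikipedia.org',
  'wikimedia.org',
  'mediawiki.org',
  'wikidata.org'
];

// Hypothetical helper: return the hostnames found in a stack trace
// that do not belong to any of the suffixes listed above.
function findForeignHosts(stackTrace) {
  // Grab anything that looks like an http(s) URL, stopping at
  // whitespace or a closing parenthesis.
  const urls = stackTrace.match(/https?:\/\/[^\s)]+/g) || [];
  const foreign = new Set();
  for (const raw of urls) {
    let host;
    try {
      host = new URL(raw).hostname;
    } catch (e) {
      continue; // Skip anything that does not parse as a URL.
    }
    const isWikimedia = WIKIMEDIA_SUFFIXES.some(
      (suffix) => host === suffix || host.endsWith('.' + suffix)
    );
    if (!isWikimedia) {
      foreign.add(host);
    }
  }
  return [...foreign];
}
```

An empty result does not rule out a gadget or user script: in the example above both URLs are on Wikimedia domains, and it is the User: page title in URL1 that gives the source away.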

If you are still unable to locate the source of the code, the next step is to use codesearch (for Wikimedia deployed code) and the global search tool (for gadgets). Using the stack trace, pick the function that's most likely to be unique and closest to the top of the stack trace.

Consider the following stack trace:

at VeDmTreeModifier.ve.dm.TreeModifier.checkCanInsertNodeType  URL1:226:425
at VeDmTreeModifier.ve.dm.TreeModifier.pushMoveNodeOp  URL1:220:751
at VeDmTreeModifier.ve.dm.TreeModifier.processRetain  URL1:216:931
at VeDmTreeModifier.ve.dm.TreeModifier.processLinearOperation  URL1:214:115
at VeDmTreeModifier.ve.dm.TreeModifier.calculateTreeOperations  URL1:213:837
at VeDmTreeModifier.ve.dm.TreeModifier.process  URL1:213:158
at VeDmTransactionProcessor.ve.dm.TransactionProcessor.process  URL1:158:950
at VeDmDocument.ve.dm.Document.commit  URL1:281:942
at VeDmSurface.ve.dm.Surface.changeInternal  URL1:242:744
at VeDmSurface.ve.dm.Surface.change  URL1:242:256

URL1: https://ru.wikipedia.org/w/load.php?lang=ru&modules=ext.visualEditor.articleTarget%2Cbase%2Ccore%2CdesktopArticleTarget%2CdesktopTarget%2Cdiffing%2Cicons%2Clanguage%2Cmediawiki%2Cmwalienextension%2Cmwcore%2Cmwextensions%2Cmwformatting%2Cmwgallery%2Cmwimage%2Cmwlanguage%2Cmwlink%2Cmwmeta%2Cmwsave%2Cmwsignature%2Cmwtransclusion%2Csanitize%2Cswitching%2Cwelcome%7Cext.visualEditor.core.desktop%2Cutils%7Cext.visualEditor.mwextensions.desktop%7Cext.visualEditor.mwimage.core&skin=vector&version=c0osm

Here, the method name checkCanInsertNodeType is pretty unique. It's not likely to be repeated in many places. In fact, when we search for it in codesearch, we quickly get a single result, which allows us to deduce that this is a bug inside VisualEditor code.


Beware false positives. Some gadgets plug in to Wikimedia deployed code, so it's always possible that a gadget is triggering an error inside Wikimedia code by using an unexpected code path. When using global search, make sure to tick "use regular expressions" and, if there are a large number of results, to scope the search to the user/mediawiki namespaces.
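Picking search terms from a stack trace can be sketched as below. The helper name `searchCandidates` and the set of names treated as too generic are assumptions made for illustration; adjust the set to whatever proves noisy in practice.

```javascript
// Assumption: names considered too generic to be useful search terms.
const GENERIC_NAMES = new Set([
  '<anonymous>', 'process', 'mightThrow', 'access', 'attr', 'init'
]);

// Hypothetical helper: list function names from a stack trace,
// top of the stack first, skipping generic names and duplicates.
function searchCandidates(stackTrace) {
  const candidates = [];
  for (const line of stackTrace.split('\n')) {
    // Lines look like "at Some.dotted.path.methodName  URL1:226:425".
    const match = line.trim().match(/^at\s+(\S+)/);
    if (!match) {
      continue;
    }
    // Keep only the last segment, e.g. "checkCanInsertNodeType".
    const name = match[1].split('.').pop();
    if (!GENERIC_NAMES.has(name) && !candidates.includes(name)) {
      candidates.push(name);
    }
  }
  return candidates;
}
```

For the VisualEditor stack trace above, the first candidate returned is `checkCanInsertNodeType`, which is the term to try in codesearch first.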

Debugging generic stack traces

Some stack traces will be awfully generic and, even worse, may not include a definitive file url. In the following example, without a file url there's very little we can tell about the stack trace. Even though the URL present in the stack trace points to load.php, there's very little we can learn from it, except that the user is in the Vector skin and using jQuery.

at Uri.toString  <anonymous>:213:750
at attr  URL1:106:510
at access  URL1:53:378
at access  URL1:53:109
at jQuery.fn.init.attr  URL1:105:826
at HTMLDocument.<anonymous>  <anonymous>:206:826
at mightThrow  URL1:49:149
at process  URL1:49:808

URL1: https://en.wikipedia.org/w/load.php?lang=en&modules=jquery&skin=vector&version=tqh7e

Using user agent

Some browsers are not worth debugging. Always consider the user agent when debugging and use it to determine whether the error is worth your time.

Using message

With such generic errors, you will need to explore various things. The message itself may yield certain clues. An error like TypeError: Cannot read property 'insertBefore' of null may tell us little, since insertBefore is a commonly used method inside JavaScript code. The error TypeError: this.getAuthority is not a function, on the other hand, at least points us to a function called getAuthority, which a search on codesearch suggests could indicate a problem in TimedMediaHandler or MediaWiki's ForeignApi.
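Pulling a search term out of the message can be sketched as follows. The helper name `searchTermFromMessage` is hypothetical, and it only handles the two message shapes quoted above; message wording varies between browsers, so treat the patterns as assumptions.

```javascript
// Hypothetical helper: extract an identifier worth searching for
// from an error message, or null if none of the patterns match.
function searchTermFromMessage(message) {
  // e.g. "TypeError: this.getAuthority is not a function"
  const notAFunction = message.match(/([\w$.]+) is not a function/);
  if (notAFunction) {
    // Keep the last segment of a dotted path: "getAuthority".
    return notAFunction[1].split('.').pop();
  }
  // e.g. "TypeError: Cannot read property 'insertBefore' of null"
  const property = message.match(/Cannot read property '([^']+)' of/);
  if (property) {
    return property[1];
  }
  return null;
}
```

A term such as `insertBefore` will still match a great deal of code, so the extracted name is only useful when it is reasonably unique.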

Using URL and user identifier (session id/IP)

The URL field may be useful.

If the error always occurs on the same special page, that may indicate a problem with that special page.

If the error always occurs for the same user identifier, that might indicate a problem with a user script. Consider URLs which require visible on-wiki actions that might identify said user. For example, if the URL contains action=edit or action=submit, did the page recently get edited, and does that user have user scripts which may contain the bug? URLs that may reveal information include edit URLs, action=rollback, discussion pages, user pages, the draft namespace and certain special pages, e.g. Special:Block.

When you identify such errors, please consider fixing them. These kinds of errors, as you have just demonstrated, are privacy concerns.
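Checking whether errors cluster on one URL or one user identifier can be sketched as a simple tally over exported log events. The helper name `countBy` and the field names `url` and `session_id` used in the usage note are hypothetical; substitute whatever the exported events actually contain.

```javascript
// Hypothetical helper: count events per value of a field and return
// [value, count] pairs, most frequent first.
function countBy(events, field) {
  const counts = new Map();
  for (const event of events) {
    // Group events lacking the field under a placeholder key.
    const key = event[field] ?? '(missing)';
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}
```

Calling `countBy(events, 'url')` and `countBy(events, 'session_id')` on the same set of events shows whether a single page or a single user accounts for most of the errors.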

Fixing errors

When an error is encountered, you will likely want to mute the alert while it is addressed, to avoid spamming your colleagues with emails. To do this, click "view source" at the end of the email and sign in (it can help to manually alter the URL to use grafana-rw rather than grafana). Click silence and set a duration.

Suggested durations and actions:

  • If caused by Gerrit-based code, aim for a backport on the same day and mute for 12 hours.
  • If input from another team is needed, file a ticket as a deploy blocker, make sure it is acknowledged by the code owner and mute for 24 hours.
  • If a gadget is involved:
    • See https://www.mediawiki.org/wiki/Stable_interface_policy/Frontend#How_providers_can_make_breaking_changes_that_impact_widely_used_but_unmaintained_gadgets
    • For dependency errors, update MediaWiki:Gadgets-definition.
    • Where the bug is in code, if possible and you have a staff account:
      • Fix the issue by reverting the change that introduced the error, with a long edit summary detailing the error string and the function it occurs in.
      • If it is possible to fix the issue or defend against it, update the code with a long edit summary detailing the error string and the function it occurs in.
      • Consider opting the gadget out of errors.
    • If it is not possible to edit the gadget yourself, mute for 24 hours and write a comment on the talk page. Make sure to ping (using the @ symbol) the last three editors of the gadget.

Opting out of errors

Users can opt out of error logging by adding the following code to their user JS.

mw.loader.using('mediawiki.storage').then(function () {
  mw.storage.session.set( 'client-error-opt-out', '1' );
});

Infrastructure

Errors are reported via mediawiki.errorLogger and clientError.js in WikimediaEvents, and collected via EventGate (see T217142).