Logs

From Wikitech
This page is about server log files. For IRC channel logs, see e.g. https://wm-bot.wmcloud.org/

Logs of several sorts are generated across the cluster and collected in a single location replicated on some machines. Privileged users can explore most logs through the OpenSearch Dashboards front-end at https://logstash.wikimedia.org/.

The SRE Observability team is working on a common log format called ECS; see the linked doc and intro slides. ECS documentation can be found at https://doc.wikimedia.org/ecs/

For a quick reference of debugging techniques, see Logs/Runbook.

mwlog1002:/srv/mw-log/

These record wfDebugLog() and similar calls in MediaWiki (see especially mw:Structured logging). All cluster-wide logs are aggregated here (configured through $wmgUdp2logDest, see also wmgMonologChannels). There are dozens of log files, which amounted to around 15 GB compressed per day as of April 2015. Some are not sent to logstash (settings) and some are sampled. Log archives are stored for a variable amount of time, up to 90 days (per data retention guideline). Note that logstash also records the context data for structured logging, so it might contain significantly more information than the files.

Source: All appserver clusters.

Directories:

  • archive/: Directory holding a limited number of previous days of the same logs (compressed once a day).

General channels:

  • exception.log: Fatal exceptions that receive either a localised "Internal error" page, or a Wikimedia Error page rendered by php-wmerrors.
    • Error pages report a request ID, e.g. "[d84af39036] 2011-04-01: Fatal exception of type MWException".
    • To find the complete stack trace, search for d84af39036 in exception.log, or search for reqId:"d84af39036" in Logstash on the "mediawiki" dashboard.
  • apache2.log: aggregated Apache error logs, see #syslog
  • api.log: API requests and their parameters (including redacted POST payloads, and temporary PII). This used to be sampled, but sampling was dropped during 2014-2015. As of Nov 2015, the log is flushed every 30 days.
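
The request ID lookup described for exception.log can be sketched in Python. This is a minimal illustration rather than an established tool: it assumes the log is plain text and that the request ID appears verbatim on every relevant line. The function name find_request_id is hypothetical, and the sample log content is invented for the demo.

```python
import tempfile
from pathlib import Path


def find_request_id(log_path, req_id):
    """Return every line of the log file that contains req_id (plain substring match)."""
    matches = []
    # errors="replace" keeps the scan going if the log contains invalid UTF-8.
    with open(log_path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if req_id in line:
                matches.append(line.rstrip("\n"))
    return matches


# Demo on a throwaway file; the line layout here is invented for illustration only.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "exception.log"
    sample.write_text(
        "[d84af39036] Fatal exception of type MWException\n"
        "[0123456789] unrelated entry\n"
        "#0 stack frame for d84af39036\n",
        encoding="utf-8",
    )
    demo_matches = find_request_id(sample, "d84af39036")
    for match in demo_matches:
        print(match)
```

On a real log host the same function would be pointed at the actual exception.log path. For compressed files under archive/, the file would need to be opened with a decompressing reader instead of open().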

Specific components:

  • antispoof.log: Collision check passes and failures from the AntiSpoof extension. This checks for strings that look the same using different Unicode characters (such as spoofed usernames).
  • badpass.log: Failed login attempts to wikis.
  • captcha.log: Captcha attempts (both failed and successful attempts).
  • centralauth.log (2013-05-09–), centralauth-bug39996.log, centralauthrename.log (2014-07-14–): (temporary) debug logs for bugzilla:35707, bugzilla:39996, bugzilla:67875. In theory, rare events; can include username and page visited/request made.
  • CirrusSearch.log: Logs various information from Cirrus (update/query failures and other debug output). Cirrus now uses the analytics platform to log search requests (Analytics/Data/Cirrus).
  • CirrusSearchSlowRequests.log: Logs slow requests
  • CirrusSearchChangeFailed.log: Logs update failures
  • external.log: ExternalStore blob fetch failures (see External storage)
  • imagemove.log: Page renames in the File namespace that take place (both failed and successful renames).
  • memcached.log: Memcached for MediaWiki (WANObjectCache, misc ephemeral data, rate limiting counters, advisory locks).
  • poolcounter.log: PoolCounter failures (connection problems, excess queue size, wait timeouts).
  • redis.log: Redis query and connection failures (might involve sessions, job queues, and some other assorted features).
  • resourceloader.log: Exceptions related to ResourceLoader.
  • JobExecutor.log: Tracks job queue activity, including errors (both failed and successful runs).
    • Can be used to produce stats on jobs run on the various wikis, e.g. with Tim's perl ~/job-stats.pl runJobs.log.
  • swift-backend.log: Errors in the SwiftFileBackend class (timeouts and HTTP 500 type errors for file and listing reads/writes).
  • slow-parse.log (since May 2012; 6 months archive)
  • spam.log: SimpleAntiSpam honeypot hits from bots (attempted user actions are discarded).
  • XWikimediaDebug.log: see X-Wikimedia-Debug#Debug logging.

The syslog for all application servers can be found in apache2.log on mwlog1001 or in /srv/syslog/apache.log on centrallog1001. This includes things like segmentation faults.

5xx errors

5xx errors are available on centrallog1001.eqiad.wmnet:/srv/weblog/webrequest/5xx.json, and in Logstash on the Varnish 5xx dashboard.
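
A quick tally of status codes from a file like 5xx.json might look like the sketch below. It assumes the file holds one JSON object per line and that the status code lives in a field named http_status. Both are assumptions, so check the actual file before relying on them. The function name count_statuses and the sample records are hypothetical.

```python
import json
from collections import Counter


def count_statuses(lines, status_field="http_status"):
    """Tally values of status_field across JSON-lines input, skipping unparseable lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate truncated or partially written lines
        if isinstance(record, dict) and status_field in record:
            counts[str(record[status_field])] += 1
    return counts


# Invented sample records; the field names are assumptions, not the real schema.
sample_lines = [
    '{"http_status": "503", "uri_host": "example.org"}',
    '{"http_status": "500", "uri_host": "example.org"}',
    '{"http_status": "503", "uri_host": "example.org"}',
    "not json",
]
status_counts = count_statuses(sample_lines)
print(status_counts.most_common())
```

Passing an open file handle instead of a list works the same way, since the function only iterates over lines.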

mwmaint Maintenance scripts

See Maintenance server#Access recent runs.

deploy1002:/var/log/l10updatelog/l10update.log

Source: scap

  • l10update.log: Error log for LocalisationUpdate runs.

vanadium:/var/log/eventlogging/

  • various: Logs of EventLogging entries. Potentially useful, in case their transformation into SQL records fails.

Request logs

Logs of any kind of request, e.g. viewing a wiki page, editing, using the API, loading an image.

  • Analytics/Data/Webrequest: "wmf.webrequest" is the name of an unsampled request archive in Hive. We started deleting older wmf.webrequest data in March 2015 and currently keep 62 days.

centrallog1002:/srv/weblog/webrequest

The cache (outer layer) request logs; see Squid logging#Log files.

The 1:1000 sampled logs are used for about 15 monthly and quarterly reports and day to day operations (source).
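
When working with the 1:1000 sampled logs, counts have to be scaled back up to estimate real traffic. A minimal sketch of that arithmetic follows. The function names are hypothetical, and the error estimate assumes requests are sampled independently, which may not match how the sampling is actually implemented.

```python
import math

SAMPLING_RATE = 1000  # one request in every 1000 is logged, per the text above


def estimate_total(sampled_count, sampling_rate=SAMPLING_RATE):
    """Scale a count observed in sampled logs up to an estimated true total."""
    return sampled_count * sampling_rate


def relative_error(sampled_count):
    """Rough relative standard error of the estimate, assuming independent sampling."""
    if sampled_count <= 0:
        raise ValueError("need at least one sampled request")
    return 1.0 / math.sqrt(sampled_count)


estimated = estimate_total(4321)
print(estimated)
print(round(relative_error(4321), 4))
```

Small sampled counts give noisy estimates: under the independence assumption, 100 sampled requests carry about a 10% relative error, so rare events are better investigated in unsampled data.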

Beta cluster

The mw:Beta cluster has a similar logging configuration to production. Various server logs are written to the remote syslog server deployment-mwlog02.deployment-prep.eqiad1.wikimedia.cloud in /srv/mw-log.

Apache access logs are written to /var/log/apache2/other_vhosts_access.log on each beta cluster host.

See mw:Beta_Cluster#Testing_changes_on_Beta_Cluster for information on how to access the beta logstash web UI.

Mailservers

exim logs are retained for 90 days (see phabricator:T167333).

Dead

Lucene (search)

Each host logs at /a/search/log/log (now less noisy), see Search#Trouble on how to identify which host serves what pool etc.

fenari:/home/wikipedia/syslog

Source: All apaches

  • apache.log: Error log of all apaches (includes stderr of PHP, so PHP Notices, PHP Warnings etc.)
    • Use fatalmonitor to aggregate this into a (tailing) report
    • This has been deprecated in favor of fluorine:/a/mw-log/apache2.log and logstash.

fenari:/var/log/

Source: Machine-specific logs