Data Engineering/TOC
This team was previously known as the Analytics team; therefore, much of the documentation on this wiki was created under the Analytics/ namespace.
Whilst the team name has changed to Data Engineering, we still work extensively with Analytics data and systems, so it does not make sense to rename these pages in bulk. For this reason, the tables of contents below cover both the Analytics and Data_Engineering namespaces. Content pages will be migrated on a case-by-case basis.
Table of Contents - Data Engineering
Table of Contents - Analytics
- AQS/Editors by country
- AQS/Legacy Pagecounts
- AQS/Media metrics
- AQS/Mediarequests
- AQS/Mediarequests/Limitations
- AQS/Pageviews
- AQS/Pageviews/Pageviews per project
- AQS/Unique Devices
- AQS/Wikistats 2
- AQS/Wikistats 2/DataQuality/VettingPerProjectFamilies
- AQS/Wikistats 2/DataQuality/Vetting of mediarequest metrics
- AQS/Wikistats 2/Data Quality/VettingPerProject
- AQS/Wikistats 2/Metrics Definition
- Alexa
- Archive/2015 data warehouse experiments
- Archive/2015 data warehouse experiments/2014-12-02 verifications
- Archive/2015 data warehouse experiments/2015-01-14 verifications
- Archive/2015 data warehouse experiments/2015-02-03 verifications
- Archive/AQS -RESTBase
- Archive/AQS - DataStore
- Archive/CommunityBacklog
- Archive/Dashboards - Limn
- Archive/Data/Mobile requests stream
- Archive/Data/Pagecounts-all-sites
- Archive/Data/Pagecounts-raw
- Archive/Data/Webrequests sampled
- Archive/Data/Zero webrequests
- Archive/EventLogging pipeline
- Archive/Geowiki
- Archive/Global-Dev Dashboard
- Archive/Hadoop - Logstash
- Archive/Hadoop Logging - Solutions Overview
- Archive/Hadoop Logging - Solutions Recommendation
- Archive/Hadoop Streaming
- Archive/Kraken/Meetings
- Archive/Kraken/Meetings/ArchitectureReview
- Archive/Limn
- Archive/Mingle
- Archive/Pageviews/Aggregation
- Archive/Pentaho
- Archive/Products
- Archive/Webrequest partitions monitorin
- Archive/Webstatscollector
- Archive/Wikipedia Zero
- Archive/Wikistats2.0/Design
- Archive/gp.wmflabs.org
- Cluster/Hue
- Cluster/Hue/Administration
- DataRequests
- Data Lake
- Data Lake/Content
- Data Lake/Content/Mediawiki wikitext current
- Data Lake/Content/Mediawiki wikitext history
- Data Lake/Content/Wikidata entity
- Data Lake/Content/Wikidata item page link
- Data Lake/Edits
- Data Lake/Edits/Edit hourly
- Data Lake/Edits/Geoeditors
- Data Lake/Edits/Geoeditors/Public
- Data Lake/Edits/MediaWiki history
- Data Lake/Edits/Mediawiki history dumps
- Data Lake/Edits/Mediawiki history dumps/FAQ
- Data Lake/Edits/Mediawiki history dumps/Python spark examples
- Data Lake/Edits/Mediawiki history dumps/Scala spark examples
- Data Lake/Edits/Mediawiki history reduced
- Data Lake/Edits/Mediawiki page history
- Data Lake/Edits/Mediawiki project namespace map
- Data Lake/Edits/Mediawiki user history
- Data Lake/Edits/Metrics
- Data Lake/Edits/Public
- Data Lake/Edits/Structured data/Commons entity
- Data Lake/Events
- Data Lake/ORES
- Data Lake/ORES/Historified scores
- Data Lake/ORES/Recent scores
- Data Lake/Traffic
- Data Lake/Traffic/Banner activity
- Data Lake/Traffic/BotDetection
- Data Lake/Traffic/Browser general
- Data Lake/Traffic/Caching
- Data Lake/Traffic/Interlanguage
- Data Lake/Traffic/Mediacounts
- Data Lake/Traffic/Pagecounts-ez
- Data Lake/Traffic/Pageview actor
- Data Lake/Traffic/Pageview hourly
- Data Lake/Traffic/Pageview hourly/Fingerprinting Over Time
- Data Lake/Traffic/Pageview hourly/Identity reconstruction analysis
- Data Lake/Traffic/Pageview hourly/K Anonymity Threshold Analysis
- Data Lake/Traffic/Pageview hourly/Sanitization
- Data Lake/Traffic/Pageview hourly/Sanitization algorithm proposal
- Data Lake/Traffic/Pageviews
- Data Lake/Traffic/Pageviews/Bots
- Data Lake/Traffic/Pageviews/Bots Research
- Data Lake/Traffic/Pageviews/Redirects
- Data Lake/Traffic/Projectview hourly
- Data Lake/Traffic/SessionLength
- Data Lake/Traffic/Unique Devices
- Data Lake/Traffic/Unique Devices/Automated traffic correction
- Data Lake/Traffic/Unique Devices/Last access solution
- Data Lake/Traffic/Unique Devices/Last access solution/Validation
- Data Lake/Traffic/UserRetention
- Data Lake/Traffic/Virtualpageview hourly
- Data Lake/Traffic/Webrequest
- Data Lake/Traffic/Webrequest/RawIPUsage
- Data Lake/Traffic/Webrequest/Tagging
- Data Lake/Traffic/mediawiki api request
- Data Lake/Traffic/mobile apps session metrics
- Data Lake/Traffic/mobile apps uniques
- Data Lake/Traffic/referrer daily
- Data Lake/Traffic/referrer daily/Dashboard
- Data access
- Data access guidelines
- Data quality/Entrophy alarms
- Data quality/User agent entropy
- Differential
- Doc proposal
- Fundraising
- Mysql/Utility Datasets
- Pageviews
- Performance
- Projects/Data Lake/Edits History
- Projects/Data Lake/SQL Engine on Cloud/Appendix
- Projects/Public Data Lake
- Research/CitationDataset
- Systems
- Systems/AQS
- Systems/AQS/Scaling
- Systems/AQS/Scaling/2016/Hardware Refresh
- Systems/AQS/Scaling/2017/Cluster Expansion
- Systems/AQS/Scaling/2020/Cluster Expansion
- Systems/AQS/Scaling/LoadTesting
- Systems/Airflow
- Systems/Airflow/Airflow testing instance tutorial
- Systems/Airflow/Developer guide
- Systems/Airflow/Instances
- Systems/Anaconda
- Systems/Archiva
- Systems/Clients
- Systems/Cluster
- Systems/Cluster/AMD GPU
- Systems/Cluster/Beeline
- Systems/Cluster/BrowserReports
- Systems/Cluster/Coordinator
- Systems/Cluster/Data Format Experiments
- Systems/Cluster/Data deletion and sanitization
- Systems/Cluster/Deploy/Refinery
- Systems/Cluster/Deploy/Refinery-source
- Systems/Cluster/Edit data loading
- Systems/Cluster/Edit history administration
- Systems/Cluster/Edit serving layer
- Systems/Cluster/Geolocation
- Systems/Cluster/Geotagging
- Systems/Cluster/Gobblin
- Systems/Cluster/Hadoop
- Systems/Cluster/Hadoop/Administration
- Systems/Cluster/Hadoop/Alerts
- Systems/Cluster/Hadoop/Load
- Systems/Cluster/Hadoop/Test
- Systems/Cluster/Hive
- Systems/Cluster/Hive/Alerts
- Systems/Cluster/Hive/Avro
- Systems/Cluster/Hive/Compression
- Systems/Cluster/Hive/Counting uniques
- Systems/Cluster/Hive/Queries
- Systems/Cluster/Hive/Queries/Wikidata
- Systems/Cluster/Hive/QueryUsingUDF
- Systems/Cluster/Kafka/Capacity
- Systems/Cluster/Mediawiki History Snapshot Check
- Systems/Cluster/Mediawiki history reduced algorithm
- Systems/Cluster/Mysql Meta
- Systems/Cluster/Oozie
- Systems/Cluster/Oozie/Administration
- Systems/Cluster/Page and user history reconstruction
- Systems/Cluster/Page and user history reconstruction algorithm
- Systems/Cluster/Revision augmentation and denormalization
- Systems/Cluster/Spark
- Systems/Cluster/Spark/Administration
- Systems/Cluster/System Users
- Systems/Cluster/Workflow management tools study
- Systems/DB Replica
- Systems/Dashiki
- Systems/Dashiki/Configuration
- Systems/DataHub
- Systems/DataHub/Upgrading
- Systems/Dealing with data loss alarms
- Systems/Druid
- Systems/Druid/Alerts
- Systems/Druid/Load test
- Systems/EventLogging
- Systems/EventLogging/Administration
- Systems/EventLogging/Architecture
- Systems/EventLogging/Backfilling
- Systems/EventLogging/Data representations
- Systems/EventLogging/EventCapsule
- Systems/EventLogging/Monitoring
- Systems/EventLogging/NotErrorLogging
- Systems/EventLogging/Outages
- Systems/EventLogging/Performance
- Systems/EventLogging/Publishing
- Systems/EventLogging/Sanitization vs Aggregation
- Systems/EventLogging/Schema Guidelines
- Systems/EventLogging/Sensitive Fields
- Systems/EventLogging/TestingOnBetaCluster
- Systems/EventLogging/User agent sanitization
- Systems/EventStreams
- Systems/Event Data retention
- Systems/Event Data retention/AppInstallId
- Systems/Event Sanitization
- Systems/Exporting from HDFS to Swift
- Systems/Geoeditors
- Systems/Hive to Druid Ingestion Pipeline
- Systems/Jupyter
- Systems/Jupyter-SWAP
- Systems/Jupyter/Administration
- Systems/Jupyter/Tips
- Systems/Kerberos
- Systems/Kerberos/UserGuide
- Systems/Maintenance Schedule
- Systems/Managing systemd timers
- Systems/Manual maintenance
- Systems/Manual maintenance/Refined flags script
- Systems/MariaDB
- Systems/Matomo
- Systems/Presto
- Systems/Presto/Administration
- Systems/Presto/Query Logger
- Systems/Refine
- Systems/Reportupdater
- Systems/Siege
- Systems/Superset
- Systems/Superset/Date functions
- Systems/Tier2
- Systems/Turnilo
- Systems/Varnishkafka
- Systems/Wikimetrics
- Systems/Wikimetrics/Adding New Features
- Systems/Wikimetrics/Adding New Features/CentralAuth Cohorts
- Systems/Wikimetrics/Adding New Features/Tag Cohorts
- Systems/Wikimetrics/Global metrics
- Systems/Wikistats
- Systems/Wikistats/Traffic
- Systems/Wikistats2/Metrics/FAQ
- Systems/Wikistats 2
- Systems/ua-parser
- Systems/ua-parser/2019-09-18 Update
- Team/Conferences
- Team/Conferences/Apache Big Data Europe - November 2016
- Team/MailingList
- Team/Office Hours
- Team/Quarterly Reviews
- Web publication
- Wikistats/Deprecation of Wikistats 1
- Wikistats2.0/Map Component
- Wikistats 2/Smoke Testing
- analytics.wikimedia.org
Tables of contents - With Redirects
Data Engineering
Analytics
- AQS/Editors by country
- AQS/Legacy Pagecounts
- AQS/Media metrics
- AQS/Mediarequests
- AQS/Mediarequests/Limitations
- AQS/Pageviews
- AQS/Pageviews/Pageviews per project
- AQS/Unique Devices
- AQS/Wikistats 2
- AQS/Wikistats 2/DataQuality/VettingPerProjectFamilies
- AQS/Wikistats 2/DataQuality/Vetting of mediarequest metrics
- AQS/Wikistats 2/Data Quality/VettingPerProject
- AQS/Wikistats 2/Metrics Definition
- Alexa
- Archive/2015 data warehouse experiments
- Archive/2015 data warehouse experiments/2014-12-02 verifications
- Archive/2015 data warehouse experiments/2015-01-14 verifications
- Archive/2015 data warehouse experiments/2015-02-03 verifications
- Archive/AQS -RESTBase
- Archive/AQS - DataStore
- Archive/CommunityBacklog
- Archive/Dashboards - Limn
- Archive/Data/Mobile requests stream
- Archive/Data/Pagecounts-all-sites
- Archive/Data/Pagecounts-raw
- Archive/Data/Webrequests sampled
- Archive/Data/Zero webrequests
- Archive/EventLogging pipeline
- Archive/Geowiki
- Archive/Global-Dev Dashboard
- Archive/Hadoop - Logstash
- Archive/Hadoop Logging - Solutions Overview
- Archive/Hadoop Logging - Solutions Recommendation
- Archive/Hadoop Streaming
- Archive/Kraken/Meetings
- Archive/Kraken/Meetings/ArchitectureReview
- Archive/Limn
- Archive/Mingle
- Archive/Pageviews/Aggregation
- Archive/Pentaho
- Archive/Products
- Archive/Webrequest partitions monitorin
- Archive/Webstatscollector
- Archive/Wikipedia Zero
- Archive/Wikistats2.0/Design
- Archive/gp.wmflabs.org
- Cluster/Hue
- Cluster/Hue/Administration
- DataRequests
- Data Lake
- Data Lake/Content
- Data Lake/Content/Mediawiki wikitext current
- Data Lake/Content/Mediawiki wikitext history
- Data Lake/Content/Wikidata entity
- Data Lake/Content/Wikidata item page link
- Data Lake/Edits
- Data Lake/Edits/Edit hourly
- Data Lake/Edits/Geoeditors
- Data Lake/Edits/Geoeditors/Public
- Data Lake/Edits/MediaWiki history
- Data Lake/Edits/Mediawiki history dumps
- Data Lake/Edits/Mediawiki history dumps/FAQ
- Data Lake/Edits/Mediawiki history dumps/Python spark examples
- Data Lake/Edits/Mediawiki history dumps/Scala spark examples
- Data Lake/Edits/Mediawiki history reduced
- Data Lake/Edits/Mediawiki page history
- Data Lake/Edits/Mediawiki project namespace map
- Data Lake/Edits/Mediawiki user history
- Data Lake/Edits/Metrics
- Data Lake/Edits/Public
- Data Lake/Edits/Structured data/Commons entity
- Data Lake/Events
- Data Lake/ORES
- Data Lake/ORES/Historified scores
- Data Lake/ORES/Recent scores
- Data Lake/Traffic
- Data Lake/Traffic/Banner activity
- Data Lake/Traffic/BotDetection
- Data Lake/Traffic/Browser general
- Data Lake/Traffic/Caching
- Data Lake/Traffic/Interlanguage
- Data Lake/Traffic/Mediacounts
- Data Lake/Traffic/Pagecounts-ez
- Data Lake/Traffic/Pageview actor
- Data Lake/Traffic/Pageview hourly
- Data Lake/Traffic/Pageview hourly/Fingerprinting Over Time
- Data Lake/Traffic/Pageview hourly/Identity reconstruction analysis
- Data Lake/Traffic/Pageview hourly/K Anonymity Threshold Analysis
- Data Lake/Traffic/Pageview hourly/Sanitization
- Data Lake/Traffic/Pageview hourly/Sanitization algorithm proposal
- Data Lake/Traffic/Pageviews
- Data Lake/Traffic/Pageviews/Bots
- Data Lake/Traffic/Pageviews/Bots Research
- Data Lake/Traffic/Pageviews/Redirects
- Data Lake/Traffic/Projectview hourly
- Data Lake/Traffic/SessionLength
- Data Lake/Traffic/Unique Devices
- Data Lake/Traffic/Unique Devices/Automated traffic correction
- Data Lake/Traffic/Unique Devices/Last access solution
- Data Lake/Traffic/Unique Devices/Last access solution/Validation
- Data Lake/Traffic/UserRetention
- Data Lake/Traffic/Virtualpageview hourly
- Data Lake/Traffic/Webrequest
- Data Lake/Traffic/Webrequest/RawIPUsage
- Data Lake/Traffic/Webrequest/Tagging
- Data Lake/Traffic/mediawiki api request
- Data Lake/Traffic/mobile apps session metrics
- Data Lake/Traffic/mobile apps uniques
- Data Lake/Traffic/referrer daily
- Data Lake/Traffic/referrer daily/Dashboard
- Data access
- Data access guidelines
- Data quality/Entrophy alarms
- Data quality/User agent entropy
- Differential
- Doc proposal
- Fundraising
- Mysql/Utility Datasets
- Pageviews
- Performance
- Projects/Data Lake/Edits History
- Projects/Data Lake/SQL Engine on Cloud/Appendix
- Projects/Public Data Lake