Category:Data platform
Technical documentation for users of the Data Platform, which is maintained primarily by Data Platform Engineering. Documentation aimed mainly at administrators and maintainers of the infrastructure, pipelines, and components is also categorized under Category:Data_platform_systems.
Pages in category "Data platform"
The following 200 pages are in this category, out of 228 total.
D
- Data Platform
- Data Platform Engineering/Ops week
- Data Platform/Analyze data
- Data Platform/AQS
- Data Platform/AQS/Media metrics
- Data Platform/AQS/Mediarequests/Limitations
- Data Platform/AQS/Pageviews/Pageviews per project
- Data Platform/AQS/Wikistats 2/Data Quality/VettingPerProject
- Data Platform/AQS/Wikistats 2/DataQuality/Vetting of mediarequest metrics
- Data Platform/AQS/Wikistats 2/DataQuality/VettingPerProjectFamilies
- Data Platform/Dashboard tutorial
- Data Platform/Data access
- Data Platform/Data access guidelines
- Data Platform/Data Incident management
- Data Platform/Data Lake
- Data Platform/Data Lake/Content
- Data Platform/Data Lake/Content/Mediawiki wikitext current
- Data Platform/Data Lake/Content/Mediawiki wikitext history
- Data Platform/Data Lake/Content/Wikidata entity
- Data Platform/Data Lake/Content/Wikidata item page link
- Data Platform/Data Lake/Data Issues/2021-02-09 Unique Devices By Family Overcount
- Data Platform/Data Lake/Data Issues/2021-06-04 Traffic Data Loss
- Data Platform/Data Lake/Data Issues/2023-01-08 Webrequest Data Loss
- Data Platform/Data Lake/Data Issues/2023-11 eventgate-analytics-external Data Loss
- Data Platform/Data Lake/Edits
- Data Platform/Data Lake/Edits/Edit hourly
- Data Platform/Data Lake/Edits/Geoeditors
- Data Platform/Data Lake/Edits/Geoeditors/Public
- Data Platform/Data Lake/Edits/MediaWiki history
- Data Platform/Data Lake/Edits/MediaWiki history dumps
- Data Platform/Data Lake/Edits/MediaWiki history dumps/FAQ
- Data Platform/Data Lake/Edits/Mediawiki history dumps/Python Dask examples
- Data Platform/Data Lake/Edits/Mediawiki history dumps/Python Pandas examples
- Data Platform/Data Lake/Edits/MediaWiki history dumps/Python spark examples
- Data Platform/Data Lake/Edits/MediaWiki history dumps/Scala spark examples
- Data Platform/Data Lake/Edits/Mediawiki history reduced
- Data Platform/Data Lake/Edits/MediaWiki history/Revision identity reverts
- Data Platform/Data Lake/Edits/Mediawiki page history
- Data Platform/Data Lake/Edits/Mediawiki project namespace map
- Data Platform/Data Lake/Edits/Mediawiki user history
- Data Platform/Data Lake/Edits/Metrics
- Data Platform/Data Lake/Edits/Public
- Data Platform/Data Lake/Edits/Structured data/Commons entity
- Data Platform/Data Lake/Events
- Data Platform/Data Lake/Project History
- Data Platform/Data Lake/Public Data Lake
- Data Platform/Data Lake/Traffic
- Data Platform/Data Lake/Traffic/Banner activity
- Data Platform/Data Lake/Traffic/BotDetection
- Data Platform/Data Lake/Traffic/Browser general
- Data Platform/Data Lake/Traffic/Caching
- Data Platform/Data Lake/Traffic/Interlanguage
- Data Platform/Data Lake/Traffic/Mediacounts
- Data Platform/Data Lake/Traffic/mediawiki api request
- Data Platform/Data Lake/Traffic/mobile apps session metrics
- Data Platform/Data Lake/Traffic/mobile apps uniques
- Data Platform/Data Lake/Traffic/Pagecounts-ez
- Data Platform/Data Lake/Traffic/Pageview actor
- Data Platform/Data Lake/Traffic/Pageview hourly
- Data Platform/Data Lake/Traffic/Pageview hourly/Fingerprinting Over Time
- Data Platform/Data Lake/Traffic/Pageview hourly/Identity reconstruction analysis
- Data Platform/Data Lake/Traffic/Pageview hourly/K Anonymity Threshold Analysis
- Data Platform/Data Lake/Traffic/Pageview hourly/Sanitization
- Data Platform/Data Lake/Traffic/Pageview hourly/Sanitization algorithm proposal
- Data Platform/Data Lake/Traffic/Pageviews
- Data Platform/Data Lake/Traffic/Pageviews/Bots
- Data Platform/Data Lake/Traffic/Pageviews/Bots Research
- Data Platform/Data Lake/Traffic/Pageviews/Redirects
- Data Platform/Data Lake/Traffic/Projectview hourly
- Data Platform/Data Lake/Traffic/ReaderCounts
- Data Platform/Data Lake/Traffic/referrer daily
- Data Platform/Data Lake/Traffic/referrer daily/Dashboard
- Data Platform/Data Lake/Traffic/SessionLength
- Data Platform/Data Lake/Traffic/Unique Devices
- Data Platform/Data Lake/Traffic/Unique Devices/Automated traffic correction
- Data Platform/Data Lake/Traffic/Unique Devices/Last access solution
- Data Platform/Data Lake/Traffic/Unique Devices/Last access solution/Validation
- Data Platform/Data Lake/Traffic/UserRetention
- Data Platform/Data Lake/Traffic/Virtualpageview hourly
- Data Platform/Data Lake/Traffic/Webrequest
- Data Platform/Data Lake/Traffic/Webrequest/RawIPUsage
- Data Platform/Data Lake/Traffic/Webrequest/Tagging
- Data Platform/Data lifecycle management
- Data Platform/Data modeling guidelines
- Data Platform/Data quality/Entrophy alarms
- Data Platform/Data quality/User agent entropy
- Data Platform/Dataset archiving and deletion
- Data Platform/Dataset creation
- Data Platform/Discover data
- Data Platform/Evaluations
- Data Platform/Evaluations/2021 data catalog selection
- Data Platform/Evaluations/2021 data catalog selection/Rubric
- Data Platform/Evaluations/2021 data catalog selection/Rubric/Amundsen
- Data Platform/Evaluations/2021 data catalog selection/Rubric/Atlas
- Data Platform/Evaluations/2021 data catalog selection/Rubric/DataHub
- Data Platform/Evaluations/2021 data catalog selection/Rubric/OpenMetadata
- Data Platform/Evaluations/Data Format Experiments
- Data Platform/Evaluations/Dumps
- Data Platform/Evaluations/Event Platform/EventStreams
- Data Platform/Evaluations/Event Platform/Stream Processing/Framework Evaluation
- Data Platform/Evaluations/SQL Engine on Cloud
- Data Platform/Evaluations/Workflow management tools study
- Data Platform/Event Sanitization
- Data Platform/Fundraising
- Data Platform/Geoeditors
- Data Platform/Internal API requests
- Data Platform/Mysql/Utility Datasets
- Data Platform/Sessions
- Data Platform/Systems
- Data Platform/Systems/Airflow
- Data Platform/Systems/Airflow/Developer guide
- Data Platform/Systems/Airflow/Developer guide/Python Job Repos
- Data Platform/Systems/Airflow/Instances
- Data Platform/Systems/Airflow/Upgrading
- Data Platform/Systems/Analytics Meta
- Data Platform/Systems/analytics.wikimedia.org
- Data Platform/Systems/AQS
- Data Platform/Systems/AQS/Scaling
- Data Platform/Systems/AQS/Scaling/2016/Hardware Refresh
- Data Platform/Systems/AQS/Scaling/2017/Cluster Expansion
- Data Platform/Systems/AQS/Scaling/2020/Cluster Expansion
- Data Platform/Systems/AQS/Scaling/LoadTesting
- Data Platform/Systems/Archiva
- Data Platform/Systems/Bigtop Packages
- Data Platform/Systems/Ceph
- Data Platform/Systems/Clients
- Data Platform/Systems/Cluster
- Data Platform/Systems/Cluster/Geotagging
- Data Platform/Systems/Cluster/Hadoop/Load
- Data Platform/Systems/Cluster/Spark History
- Data Platform/Systems/Conda
- Data Platform/Systems/Coordinator
- Data Platform/Systems/Dashiki
- Data Platform/Systems/Dashiki/Configuration
- Data Platform/Systems/Data deletion and sanitization
- Data Platform/Systems/Data Quality
- Data Platform/Systems/DataHub
- Data Platform/Systems/DataHub/Administration
- Data Platform/Systems/DataHub/Data Catalog Documentation Guide
- Data Platform/Systems/DataHub/Upgrading
- Data Platform/Systems/DB Replica
- Data Platform/Systems/Dealing with data loss alarms
- Data Platform/Systems/Druid
- Data Platform/Systems/Druid/Alerts
- Data Platform/Systems/Druid/Load test
- Data Platform/Systems/Edit data loading
- Data Platform/Systems/Edit history administration
- Data Platform/Systems/Edit serving layer
- Data Platform/Systems/Event Data retention
- Data Platform/Systems/Event Data retention/AppInstallId
- Data Platform/Systems/EventLogging
- Data Platform/Systems/EventLogging/Administration
- Data Platform/Systems/EventLogging/Architecture
- Data Platform/Systems/EventLogging/Backfilling
- Data Platform/Systems/EventLogging/Data representations
- Data Platform/Systems/EventLogging/EventCapsule
- Data Platform/Systems/EventLogging/Monitoring
- Data Platform/Systems/EventLogging/NotErrorLogging
- Data Platform/Systems/EventLogging/Outages
- Data Platform/Systems/EventLogging/Performance
- Data Platform/Systems/EventLogging/Publishing
- Data Platform/Systems/EventLogging/Sanitization vs Aggregation
- Data Platform/Systems/EventLogging/Schema Guidelines
- Data Platform/Systems/EventLogging/Sensitive Fields
- Data Platform/Systems/EventLogging/TestingOnBetaCluster
- Data Platform/Systems/EventLogging/User agent sanitization
- Data Platform/Systems/Exporting from HDFS to Swift
- Data Platform/Systems/Geolocation
- Data Platform/Systems/Gobblin
- Data Platform/Systems/Hadoop
- Data Platform/Systems/Hadoop Event Ingestion Lifecycle
- Data Platform/Systems/Hadoop/Administration
- Data Platform/Systems/Hadoop/Alerts
- Data Platform/Systems/Hadoop/Test
- Data Platform/Systems/Hive
- Data Platform/Systems/Hive to Druid Ingestion Pipeline
- Data Platform/Systems/Hive/Alerts
- Data Platform/Systems/Hive/Avro
- Data Platform/Systems/Hive/Compression
- Data Platform/Systems/Hive/Counting uniques
- Data Platform/Systems/Hive/Queries
- Data Platform/Systems/Hive/Queries/Wikidata
- Data Platform/Systems/Hive/Querying using UDFs
- Data Platform/Systems/Iceberg
- Data Platform/Systems/Iceberg/Migration Dependencies
- Data Platform/Systems/Java
- Data Platform/Systems/Jupyter
- Data Platform/Systems/Jupyter/Administration
- Data Platform/Systems/Kerberos
- Data Platform/Systems/Kerberos/Administration
- Data Platform/Systems/Maintenance Schedule
- Data Platform/Systems/Managing systemd timers
- Data Platform/Systems/Manual maintenance
- Data Platform/Systems/MariaDB
- Data Platform/Systems/Matomo
- Data Platform/Systems/Mediawiki history reduced algorithm
- Data Platform/Systems/Mediawiki History Snapshot Check
- Data Platform/Systems/Page and user history reconstruction
- Data Platform/Systems/Page and user history reconstruction algorithm