Talk:Analytics/Archive/Doc proposal

From Wikitech

Commenting on Joseph's proposalː

Analytics/Archives/

  2015 data warehouse experiments
  2015 data warehouse experiments/2014-12-02 verifications
  2015 data warehouse experiments/2015-01-14 verifications
  2015 data warehouse experiments/2015-02-03 verifications
  
  Cluster/ETL
  Cluster/Logging Solutions Overview
  Cluster/Logging Solutions Recommendation
  Cluster/Streaming -- Rename Hadoop Streaming for xml dumps?
  Cluster/Webrequest partitions

  Dashboards (archived)

  Data/Mobile requests stream
  Data/Webrequests sampled
  Data/Zero webrequests

  Data/Pagecounts-all-sites
  Data/Pagecounts-ez  >>> Danː not this one, this one's active and better than the pageviews dumps actually
  Data/Pagecounts-raw
>>> Danː I'm ok with archiving these but we should keep linking to them for context I think

  Kraken/Meetings
  Kraken/Meetings/ArchitectureReview
  Kraken/Meetings/SecurityReview

  Global-Dev Dashboard

  Limn

  Mingle

  Pageviews/Aggregation

  Pentaho
  Products

  TOC

  Webstatscollector

  Wikipedia Zero

  Wikistats2.0
  Wikistats2.0/Design

  gp.wmflabs.org

  statsv -- delete?
>>> Danː isn't this still used?  I think wiki folks generally frown on deleting, archive is ok if it's not used


Analytics/Tutorials
>>> Danː ooh, I like this, Tutorials will be a great place to put more content

  Dashboards


Analytics/Data Lake
  analytics.wikimedia.org

Analytics/Data Lake/Traffic

  Data access

  Data/Pageviews --> Pageviews 
  Data/Redirects --> Pageviews/Redirects
  Bots --> Pageviews/Bots
  
  Data/UserRetention --> User Retention

  Datasets/  <-- core page?
    Cluster/BrowserReports --> Browser Reports + enhance
  
  Monitoring (to create)

  -- Remove data below  >>> Danː what? No remove, all this stuff is awesome, especially sanitizing pages
  Data/ApiAction 
  Data/Browser general -- How does it relate to BrowserReports??  >>> Danː This is the intermediate table queried from reportupdater-queries
  Data/Cirrus -- Discuss with Discovery, delete?
  Data/Mediacounts
  Data/Pageview hourly
  Data/Pageview hourly/Fingerprinting Over Time
  Data/Pageview hourly/Identity reconstruction analysis
  Data/Pageview hourly/K Anonymity Threshold Analysis
  Data/Pageview hourly/Sanitization
  Data/Pageview hourly/Sanitization algorithm proposal
  Data/Projectview hourly
  Data/Unique Devices
  Data/Webrequest
  Data/Webrequest/RawIPUsage
  Data/mobile apps session metrics

  Pageviews

  PageviewAPI
  LegacyPageviewAPI

  PageviewAPI/Capacity - Delete ?
  PageviewAPI/DataStore
  PageviewAPI/RESTBase

  Unique Devices
  Unique Devices/Last access solution
  Unique clients/Last access solution/BotResearch
  Unique clients/Last access solution/Validation



Analytics/Data Lake/Edits
  Data Lake <-- Core page
  -- Remove Data Lake/Schemas/ below   >>> Danː I'm not sure what you mean by remove, these pages are also important.
  Data Lake/Schemas/Mediawiki history
  Data Lake/Schemas/Mediawiki page history
  Data Lake/Schemas/Mediawiki user history
  Data Lake/Schemas/Metric results


Analytics/Systems

  AQS
  AQS/Scaling
  AQS/Scaling/2016/Hardware Refresh
  AQS/Scaling/2017/Cluster Expansion
  AQS/Scaling/LoadTesting
  
  Archiva

  Cluster/Druid --> Druid
  Cluster/Druid/Load test --> Druid/Load test

  Cluster
  Cluster/Access
  Cluster/Beeline
  Cluster/Camus
  Cluster/Data Format Experiments
  Cluster/Deploy/Refinery
  Cluster/Deploy/Refinery-source
  Cluster/Deploy a fix to incorrect camus partitionning
  Cluster/Geotagging
  Cluster/Hadoop -- Update ??
  Cluster/Hadoop/Administration
  Cluster/Hadoop/Load
  Cluster/Hardware -- Update ??
  Cluster/Hive  --- Update (tables)
  Cluster/Hive/Avro
  Cluster/Hive/Compression
  Cluster/Hive/Counting uniques
  Cluster/Hive/Mediawiki --> Empty - To delete?
  Cluster/Hive/Queries
  Cluster/Hive/Queries/Wikidata
  Cluster/Hive/QueryUsingUDF
  Cluster/Hive/Schemas -- Replaced by datalake dataset - Delete?
  Cluster/Kafka/Capacity
  Kafka Udp2log --> Cluster/Kafka/Kafka Udp2log
  Cluster/Logstash -- Really ???
  Cluster/MediaWiki Avro Logging -- Split to DataLake??
  Cluster/Oozie
  Cluster/Oozie/Administration
  Cluster/Ports
  Cluster/Puppet
  Cluster/Spark -- Update oozie part
  Geolocation --> Cluster/Geolocation

  Conferences
  Conferences/Apache Big Data Europe - November 2016

  Dashiki

  Datastores/Evaluation

  -- Update Data Lake/Pipeline to mediawiki history pipeline
  Data Lake/Pipeline/Data loading
  Data Lake/Pipeline/Denormalization and historification
  Data Lake/Pipeline/Page and user history reconstruction
  Data Lake/Pipeline/Page and user history reconstruction algorithm and optimizations
  Data Lake/Pipeline/Serving layer
  -- As above, create a traffic pipeline

  EventLogging
  EventLogging/Administration
  EventLogging/Architecture
  EventLogging/Backfilling
  EventLogging/Data representations
  EventLogging/Data retention and auto-purging
  EventLogging/Monitoring
  EventLogging/New pipeline
  EventLogging/Outages
  EventLogging/Performance
  EventLogging/Publishing
  EventLogging/Sanitization vs Aggregation
  EventLogging/Sensitive Fields
  EventLogging/TestingOnBetaCluster

  EventStreams <-- In systems really??

  Wikimetrics
  Wikimetrics/Adding New Features
  Wikimetrics/Adding New Features/CentralAuth Cohorts
  Wikimetrics/Adding New Features/Tag Cohorts
  Wikimetrics/Global metrics

  Geowiki

  Reportupdater

  Siege

  Varnishkafka

  Vital Signs

  Wikistats

  piwik

OTHERS:

  DataRequests --> In FAQ? 
  DataResearch --> milimetric/DataResearch ?
  DataResearch/VisualEditor --> milimetric/DataResearch/VisualEditor ?
  Data Lake/Doc proposal -- Delete
  
  Datasets << Removed?

  FAQ
  MailingList - In FAQ ?

  Onboarding   >>> Danː move to main

  Oncall  >>> Danː move to main

  Team -- Move to main?  >>> Danː yes
  Tier2 -- Move to main  >>> Danː yes
