Talk:Analytics/Archive/Doc proposal
(Redirected from Talk:Analytics/Data Lake/Doc proposal)
Commenting on Joseph's proposal:
Analytics/Archives/
2015 data warehouse experiments
2015 data warehouse experiments/2014-12-02 verifications
2015 data warehouse experiments/2015-01-14 verifications
2015 data warehouse experiments/2015-02-03 verifications
Cluster/ETL
Cluster/Logging Solutions Overview
Cluster/Logging Solutions Recommendation
Cluster/Streaming -- Rename Hadoop Streaming for xml dumps?
Cluster/Webrequest partitions
Dashboards (archived)
Data/Mobile requests stream
Data/Webrequests sampled
Data/Zero webrequests
Data/Pagecounts-all-sites
Data/Pagecounts-ez >>> Dan: not this one; it's active and actually better than the pageviews dumps
Data/Pagecounts-raw
>>> Dan: I'm ok with archiving these, but I think we should keep linking to them for context
Kraken/Meetings
Kraken/Meetings/ArchitectureReview
Kraken/Meetings/SecurityReview
Global-Dev Dashboard
Limn
Mingle
Pageviews/Aggregation
Pentaho
Products
TOC
Webstatscollector
Wikipedia Zero
Wikistats2.0
Wikistats2.0/Design
gp.wmflabs.org
statsv -- delete?
>>> Dan: isn't this still used? I think wiki folks generally frown on deleting; archiving is ok if it's not used
Analytics/Tutorials
>>> Dan: ooh, I like this, Tutorials will be a great place to put more content
Dashboards
Analytics/Data Lake
analytics.wikimedia.org
Analytics/Data Lake/Traffic
Data access
Data/Pageviews --> Pageviews
Data/Redirects --> Pageviews/Redirects
Bots --> Pageviews/Bots
Data/UserRetention --> User Retention
Datasets/ <-- core page?
Cluster/BrowserReports --> Browser Reports + enhance
Monitoring (to create)
-- Remove data below >>> Dan: what? No, don't remove; all this stuff is awesome, especially the sanitizing pages
Data/ApiAction
Data/Browser general -- How does it relate to BrowserReports? >>> Dan: This is the intermediate table queried from reportupdater-queries
Data/Cirrus -- Discuss with Discovery, delete?
Data/Mediacounts
Data/Pageview hourly
Data/Pageview hourly/Fingerprinting Over Time
Data/Pageview hourly/Identity reconstruction analysis
Data/Pageview hourly/K Anonymity Threshold Analysis
Data/Pageview hourly/Sanitization
Data/Pageview hourly/Sanitization algorithm proposal
Data/Projectview hourly
Data/Unique Devices
Data/Webrequest
Data/Webrequest/RawIPUsage
Data/mobile apps session metrics
Pageviews
PageviewAPI
LegacyPageviewAPI
PageviewAPI/Capacity - Delete ?
PageviewAPI/DataStore
PageviewAPI/RESTBase
Unique Devices
Unique Devices/Last access solution
Unique clients/Last access solution/BotResearch
Unique clients/Last access solution/Validation
Analytics/Data Lake/Edits
Data Lake <-- Core page
-- Remove Data Lake/Schemas/ below >>> Dan: I'm not sure what you mean by remove, these pages are also important.
Data Lake/Schemas/Mediawiki history
Data Lake/Schemas/Mediawiki page history
Data Lake/Schemas/Mediawiki user history
Data Lake/Schemas/Metric results
Analytics/Systems
AQS
AQS/Scaling
AQS/Scaling/2016/Hardware Refresh
AQS/Scaling/2017/Cluster Expansion
AQS/Scaling/LoadTesting
Archiva
Cluster/Druid --> Druid
Cluster/Druid/Load test --> Druid/Load test
Cluster
Cluster/Access
Cluster/Beeline
Cluster/Camus
Cluster/Data Format Experiments
Cluster/Deploy/Refinery
Cluster/Deploy/Refinery-source
Cluster/Deploy a fix to incorrect camus partitionning
Cluster/Geotagging
Cluster/Hadoop -- Update ??
Cluster/Hadoop/Administration
Cluster/Hadoop/Load
Cluster/Hardware -- Update ??
Cluster/Hive --- Update (tables)
Cluster/Hive/Avro
Cluster/Hive/Compression
Cluster/Hive/Counting uniques
Cluster/Hive/Mediawiki --> Empty - To delete?
Cluster/Hive/Queries
Cluster/Hive/Queries/Wikidata
Cluster/Hive/QueryUsingUDF
Cluster/Hive/Schemas -- Replaced by datalake dataset - Delete?
Cluster/Kafka/Capacity
Kafka Udp2log --> Cluster/Kafka/Kafka Udp2log
Cluster/Logstash -- Really ???
Cluster/MediaWiki Avro Logging -- Split to DataLake??
Cluster/Oozie
Cluster/Oozie/Administration
Cluster/Ports
Cluster/Puppet
Cluster/Spark -- Update oozie part
Geolocation --> Cluster/Geolocation
Conferences
Conferences/Apache Big Data Europe - November 2016
Dashiki
Datastores/Evaluation
-- Update Data Lake/Pipeline to mediawiki history pipeline
Data Lake/Pipeline/Data loading
Data Lake/Pipeline/Denormalization and historification
Data Lake/Pipeline/Page and user history reconstruction
Data Lake/Pipeline/Page and user history reconstruction algorithm and optimizations
Data Lake/Pipeline/Serving layer
-- As above, create a traffic pipeline
EventLogging
EventLogging/Administration
EventLogging/Architecture
EventLogging/Backfilling
EventLogging/Data representations
EventLogging/Data retention and auto-purging
EventLogging/Monitoring
EventLogging/New pipeline
EventLogging/Outages
EventLogging/Performance
EventLogging/Publishing
EventLogging/Sanitization vs Aggregation
EventLogging/Sensitive Fields
EventLogging/TestingOnBetaCluster
EventStreams <-- In systems really??
Wikimetrics
Wikimetrics/Adding New Features
Wikimetrics/Adding New Features/CentralAuth Cohorts
Wikimetrics/Adding New Features/Tag Cohorts
Wikimetrics/Global metrics
Geowiki
Reportupdater
Siege
Varnishkafka
Vital Signs
Wikistats
piwik
OTHERS:
DataRequests --> In FAQ?
DataResearch --> milimetric/DataResearch ?
DataResearch/VisualEditor --> milimetric/DataResearch/VisualEditor ?
Data Lake/Doc proposal -- Delete
Datasets << Removed?
FAQ
MailingList - In FAQ ?
Onboarding >>> Dan: move to main
Oncall >>> Dan: move to main
Team -- Move to main? >>> Dan: yes
Tier2 -- Move to main >>> Dan: yes