Talk:Analytics/Archive/Doc proposal
Commenting on Joseph's proposal:
Analytics/Archives/
  2015 data warehouse experiments
  2015 data warehouse experiments/2014-12-02 verifications
  2015 data warehouse experiments/2015-01-14 verifications
  2015 data warehouse experiments/2015-02-03 verifications
  Cluster/ETL
  Cluster/Logging Solutions Overview
  Cluster/Logging Solutions Recommendation
  Cluster/Streaming -- Rename Hadoop Streaming for xml dumps?
  Cluster/Webrequest partitions
  Dashboards (archived)
  Data/Mobile requests stream
  Data/Webrequests sampled
  Data/Zero webrequests
  Data/Pagecounts-all-sites
  Data/Pagecounts-ez >>> Dan: not this one, this one's active and better than the pageviews dumps actually
  Data/Pagecounts-raw >>> Dan: I'm ok with archiving these but we should keep linking to them for context I think
  Kraken/Meetings
  Kraken/Meetings/ArchitectureReview
  Kraken/Meetings/SecurityReview
  Global-Dev Dashboard
  Limn
  Mingle
  Pageviews/Aggregation
  Pentaho
  Products TOC
  Webstatscollector
  Wikipedia Zero
  Wikistats2.0
  Wikistats2.0/Design
  gp.wmflabs.org
  statsv -- delete? >>> Dan: isn't this still used? I think wiki folks generally frown on deleting, archive is ok if it's not used
Analytics/Tutorials >>> Dan: ooh, I like this, Tutorials will be a great place to put more content
  Dashboards
Analytics/Data Lake
  analytics.wikimedia.org
Analytics/Data Lake/Traffic
  Data access
  Data/Pageviews --> Pageviews
  Data/Redirects --> Pageviews/Redirects
  Bots --> Pageviews/Bots
  Data/UserRetention --> User Retention
  Datasets/ <-- core page?
  Cluster/BrowserReports --> Browser Reports + enhance
  Monitoring (to create)
  -- Remove data below >>> Dan: what? No remove, all this stuff is awesome, especially sanitizing pages
  Data/ApiAction
  Data/Browser general -- How does it relate to BrowserReports?? >>> Dan: This is the intermediate table queried from reportupdater-queries
  Data/Cirrus -- Discuss with Discovery, delete?
  Data/Mediacounts
  Data/Pageview hourly
  Data/Pageview hourly/Fingerprinting Over Time
  Data/Pageview hourly/Identity reconstruction analysis
  Data/Pageview hourly/K Anonymity Threshold Analysis
  Data/Pageview hourly/Sanitization
  Data/Pageview hourly/Sanitization algorithm proposal
  Data/Projectview hourly
  Data/Unique Devices
  Data/Webrequest
  Data/Webrequest/RawIPUsage
  Data/mobile apps session metrics
  Pageviews
  PageviewAPI
  LegacyPageviewAPI
  PageviewAPI/Capacity - Delete ?
  PageviewAPI/DataStore
  PageviewAPI/RESTBase
  Unique Devices
  Unique Devices/Last access solution
  Unique clients/Last access solution/BotResearch
  Unique clients/Last access solution/Validation
Analytics/Data Lake/Edits
  Data Lake <-- Core page
  -- Remove Data Lake/Schemas/ below >>> Dan: I'm not sure what you mean by remove, these pages are also important.
  Data Lake/Schemas/Mediawiki history
  Data Lake/Schemas/Mediawiki page history
  Data Lake/Schemas/Mediawiki user history
  Data Lake/Schemas/Metric results
Analytics/Systems
  AQS
  AQS/Scaling
  AQS/Scaling/2016/Hardware Refresh
  AQS/Scaling/2017/Cluster Expansion
  AQS/Scaling/LoadTesting
  Archiva
  Cluster/Druid --> Druid
  Cluster/Druid/Load test --> Druid/Load test
  Cluster
  Cluster/Access
  Cluster/Beeline
  Cluster/Camus
  Cluster/Data Format Experiments
  Cluster/Deploy/Refinery
  Cluster/Deploy/Refinery-source
  Cluster/Deploy a fix to incorrect camus partitionning
  Cluster/Geotagging
  Cluster/Hadoop -- Update ??
  Cluster/Hadoop/Administration
  Cluster/Hadoop/Load
  Cluster/Hardware -- Update ??
  Cluster/Hive --- Update (tables)
  Cluster/Hive/Avro
  Cluster/Hive/Compression
  Cluster/Hive/Counting uniques
  Cluster/Hive/Mediawiki --> Empty - To delete?
  Cluster/Hive/Queries
  Cluster/Hive/Queries/Wikidata
  Cluster/Hive/QueryUsingUDF
  Cluster/Hive/Schemas -- Replaced by datalake dataset - Delete?
  Cluster/Kafka/Capacity
  Kafka Udp2log --> Cluster/Kafka/Kafka Udp2log
  Cluster/Logstash -- Really ???
  Cluster/MediaWiki Avro Logging -- Split to DataLake??
  Cluster/Oozie
  Cluster/Oozie/Administration
  Cluster/Ports
  Cluster/Puppet
  Cluster/Spark -- Update oozie part
  Geolocation --> Cluster/Geolocation
  Conferences
  Conferences/Apache Big Data Europe - November 2016
  Dashiki
  Datastores/Evaluation -- Update
  Data Lake/Pipeline to mediawiki history pipeline
  Data Lake/Pipeline/Data loading
  Data Lake/Pipeline/Denormalization and historification
  Data Lake/Pipeline/Page and user history reconstruction
  Data Lake/Pipeline/Page and user history reconstruction algorithm and optimizations
  Data Lake/Pipeline/Serving layer
  -- Similarly as above, create traffic pipeline
  EventLogging
  EventLogging/Administration
  EventLogging/Architecture
  EventLogging/Backfilling
  EventLogging/Data representations
  EventLogging/Data retention and auto-purging
  EventLogging/Monitoring
  EventLogging/New pipeline
  EventLogging/Outages
  EventLogging/Performance
  EventLogging/Publishing
  EventLogging/Sanitization vs Aggregation
  EventLogging/Sensitive Fields
  EventLogging/TestingOnBetaCluster
  EventStreams <-- In systems really??
  Wikimetrics
  Wikimetrics/Adding New Features
  Wikimetrics/Adding New Features/CentralAuth Cohorts
  Wikimetrics/Adding New Features/Tag Cohorts
  Wikimetrics/Global metrics
  Geowiki
  Reportupdater
  Siege
  Varnishkafka
  Vital Signs
  Wikistats
  piwik
OTHERS:
  DataRequests --> In FAQ?
  DataResearch --> milimetric/DataResearch ?
  DataResearch/VisualEditor --> milimetric/DataResearch/VisualEditor ?
  Data Lake/Doc proposal -- Delete
  Datasets << Removed?
  FAQ
  MailingList - In FAQ ?
  Onboarding >>> Dan: move to main
  Oncall >>> Dan: move to main
  Team -- Move to main? >>> Dan: yes
  Tier2 -- Move to main >>> Dan: yes