User:SRodlund/PAWS examples lists
This is a list of existing PAWS notebooks created by users that can serve as examples for others. The list includes notebooks that employ database connections and API connections and are useful to individuals wishing to complete research and on-wiki tasks.
To create the Wiki replicas list, I looked at all notebooks using the PyMySQL library and sorted through them, removing any that employed JOINs (which may become problematic as Wiki replicas go through planned upgrades), any test or practice notebooks, and notebooks that were obvious assignments or labs. I looked for examples that would be useful templates or examples for others to follow in their own work with PAWS and databases.
To create the API notebooks list, I looked at all PAWS notebooks created in 2019 and 2020. I removed any test, practice, and incomplete notebooks, then looked for notebooks that would serve as useful templates or examples for others to understand how to complete on-wiki tasks using PAWS. I looked for a variety of notebooks that interact with a range of projects: Wikipedia, Wikimedia Commons, Wikidata, and others.
The notebooks were also marked with specific topic tags (see key below) to show what they cover and what tasks they are best suited for.
Ultimately, these example notebooks will (hopefully) help us create a set of how-tos for specific tasks that make it simpler for folks working on wikis.
A visual key to help keep track of what examples and tutorials are available
Wiki replicas and datasets
PAWS notebook: Working with Wiki replicas and datasets
Tutorials and How-tos
- My First Notebook - A quick tutorial that explores how to connect to the database and make a query.
- Revision histories - A lab that explores introductory data extraction and analysis of Wikipedia data
- Hyperlink networks (APIs)
- Collaboration networks - A lab that explores how to analyze the structure of collaborations in Wikipedia data about users' revisions across multiple articles
- Pageviews - A lab that explores how to analyze Wikipedia pageview data
- Wikimedia public tools for researchers - In this notebook you will find a set of working examples that connect Wikimedia research resources to Jupyter notebooks and import the data into pandas DataFrames.
- SQL Demo and examples - A variety of examples for working with SQL and PAWS
- How-to - Visualizing Wikipedia topics - Connect to the database and use several Python libraries to create visualizations of data from Wikipedia
- How-to - Teahouse question archive builder - This notebook will build a queryable data object out of a parsed thread dataset
- How-to - Event Stream, API, Database connections - A variety of methods for accessing data about revisions
- Querying Wikidata with SPARQL
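The database tutorials above all begin by opening a connection to a Wiki replica. A minimal sketch of that first step, assuming PAWS exposes replica credentials in the `MYSQL_USERNAME` and `MYSQL_PASSWORD` environment variables and that replicas are reachable at `<wiki>.analytics.db.svc.wikimedia.cloud` with a `<wiki>_p` database (check both against current Wikitech documentation). The helpers `replica_params` and `titles_like` are hypothetical. The query touches a single table, in keeping with the no-JOINs criterion used for this list.

```python
import os

# A single-table query (no JOINs); PyMySQL fills in the %s placeholders safely.
TITLE_QUERY = (
    "SELECT page_title FROM page "
    "WHERE page_namespace = %s AND page_title LIKE %s "
    "LIMIT %s"
)


def replica_params(wiki: str = "enwiki") -> dict:
    """Build connection parameters for a Wiki replica (hypothetical helper).

    Assumes PAWS-style credentials in environment variables and the
    <wiki>.analytics.db.svc.wikimedia.cloud host naming scheme.
    """
    return {
        "host": f"{wiki}.analytics.db.svc.wikimedia.cloud",
        "user": os.environ.get("MYSQL_USERNAME", ""),
        "password": os.environ.get("MYSQL_PASSWORD", ""),
        "database": f"{wiki}_p",
        "charset": "utf8mb4",
    }


def titles_like(pattern: str, wiki: str = "enwiki", namespace: int = 0, limit: int = 10) -> list:
    """Return up to `limit` page titles matching a SQL LIKE pattern."""
    import pymysql  # imported lazily so this cell loads even without PyMySQL

    conn = pymysql.connect(**replica_params(wiki))
    try:
        with conn.cursor() as cur:
            cur.execute(TITLE_QUERY, (namespace, pattern, limit))
            rows = cur.fetchall()
    finally:
        conn.close()
    # page_title is a binary column, so values usually arrive as bytes.
    return [r[0].decode("utf-8") if isinstance(r[0], bytes) else r[0] for r in rows]
```

For example, `titles_like("Berlin%")` would list up to ten main-namespace titles starting with "Berlin". Stored titles use underscores in place of spaces, so patterns should too.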
Wikidata dumps tutorials
These are three Wikidata dump tutorials that Isaac suggested might be part of a dumps-specific notebook tutorial.
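Most Wikidata dump work starts by streaming entities out of the compressed JSON dump rather than loading it whole. A minimal sketch, assuming the layout of the `latest-all.json.bz2` entity dump: a single JSON array with one entity per line, each line ending in a comma except the last. The function names are hypothetical.

```python
import bz2
import json
from typing import Iterable, Iterator, Optional


def parse_dump_lines(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one entity dict per line of a Wikidata JSON dump.

    Assumes one entity per line, so each line parses on its own once the
    trailing comma is stripped.
    """
    for line in lines:
        line = line.strip().rstrip(",")
        if line in ("", "[", "]"):
            continue  # skip blank lines and the array's opening/closing brackets
        yield json.loads(line)


def iter_entities(path: str) -> Iterator[dict]:
    """Stream entities from a bz2-compressed dump without loading it into memory."""
    with bz2.open(path, "rt", encoding="utf-8") as f:
        yield from parse_dump_lines(f)


def english_label(entity: dict) -> Optional[str]:
    """Return the entity's English label, or None if it has none."""
    return entity.get("labels", {}).get("en", {}).get("value")
```

Pairing `iter_entities` with `itertools.islice` lets you inspect the first few entities of a multi-gigabyte dump before committing to a full pass.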
Wiki replica Helper
This will be invaluable for anyone working with Wiki replicas and PAWS
- Yuvi's Replica Helper - This is an importable notebook that provides simple helpers for performing queries on the labsdb replica databases from PAWS. It is stateful and designed to be easy to use in an interactive setup.
- Example of Yuvi's replica helper in use
- Another example of Yuvi's replica helper in use
Note: This list only includes notebooks without JOINs in their SQL queries, since JOINs may not work correctly after planned changes to Wiki replicas. For a list of notebooks that include JOINs by USER-ID, see this list: https://wikitech.wikimedia.org/wiki/User:SRodlund/PAWS_examples_lists/notebooks_with_joins
- Find Wikidata Q ids for all pages in category
- Curation log
- Get count of unreviewed pages per creation day, by autoconfirmed status
- Get the recent changes of the day
- Common edits by WMF staff
- How many NPP pages marked for deletion are actually deleted?
- Teahouse Answers
- Language revision counts per day
- SELECT page_title FROM page WHERE page_title LIKE '% %'
- Wikidata database - Names similar to Karl
- Number of pages with "Berlin" - Wikimedia DE
- Changes made to pages using PyMySQL and Pywikibot - HY Wikipedia
- User Ids and their edit counts - Teahouse
- Get top viewed categories
- Tables Download
- Querying Media Counts - WikiLovesAfrica
- Querying images and how often they were used - WikiLovesAfrica
- This notebook contains functions for article comparison
- Edit notices - En Wikipedia
- A look at Barnstars
- Images not marked for fair use
- Wiki abuse filter list
- What is the annual volume of patrolling?
- Accessing page protections
- Wikimedia - public dumps
- Inferring countries from articles - public dumps
- Pageviews - public dumps
- A variety of tasks with dumps
- Public dumps
- Generic notebook for dump processing
- Simplified Wikidata dumps
- Extract pages containing a keyword from a dump
- Call SPARQL with Python
- Building layered maps using SPARQL
- Add references to items already in Wikidata
- Get Wikipedia languages SPARQL query
- Run a Wikidata query in an iframe and display the results
- Get Wikidata info from an arbitrary URL
- Species without English descriptions - Wikidata
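Several notebooks above (for example, "Call SPARQL with Python") send SPARQL to the Wikidata Query Service. A minimal standard-library sketch, assuming the public endpoint `https://query.wikidata.org/sparql`, which returns SPARQL JSON results when given `format=json`. Wikimedia asks clients to send a descriptive User-Agent; the one below is a placeholder, and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"
# Placeholder User-Agent: replace with your tool name and contact details.
USER_AGENT = "PAWS-example/0.1 (https://example.org; you@example.org)"

# Example query: five items that are instances of (P31) house cat (Q146).
QUERY = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 5
"""


def sparql_request(query: str) -> urllib.request.Request:
    """Build a GET request for the Wikidata Query Service."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def bindings_to_rows(result: dict) -> list:
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in result["results"]["bindings"]
    ]


def run_sparql(query: str) -> list:
    """Send the query and return flattened rows (requires network access)."""
    with urllib.request.urlopen(sparql_request(query)) as resp:
        return bindings_to_rows(json.load(resp))
```

Flattening the bindings this way makes it straightforward to pass the rows to `pandas.DataFrame` for further analysis.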
PAWS notebook: API Connections
Tutorials and How-tos
- MediaWiki page history - The MediaWiki REST API lets you build apps and scripts that interact with any MediaWiki-based wiki. In this tutorial, we'll use the REST API page history endpoints to explore the history of articles on English Wikipedia.
- MediaWiki REST API examples - This notebook contains a variety of MediaWiki REST API examples: search pages, autocomplete page title, get page history, get page history counts, get revision, compare revisions, get page, get page offline, get page source, get languages, get files, get files on a page, create page, update page.
- Wikimedia Feeds intro - Many Wikipedias include daily featured articles and other curated content on their homepages. You can see an example of this content on the main page of English, German, and French Wikipedias. The Wikifeeds API lets you access this content programmatically and add high-quality, multilingual content to your apps.
- Create an image grid using free images from Wikimedia Commons - This guide uses the MediaWiki REST API to explore media files on Wikimedia Commons. Wikimedia Commons is a collection of over 60,000,000 freely usable media files, many of which are used in Wikipedia articles.
- Reuse free images from Wikimedia Commons - This guide uses the MediaWiki REST API to explore media files on Wikimedia Commons. Wikimedia Commons is a collection of over 60,000,000 freely usable media files, many of which are used in Wikipedia articles.
- Exploring page history - The MediaWiki REST API lets you build apps and scripts that interact with any MediaWiki-based wiki. In this tutorial, we'll use the REST API page history endpoints to explore the history of articles on English Wikipedia.
- Search Wikipedia articles - The MediaWiki REST API lets you build apps and scripts that interact with any MediaWiki-based wiki. In this tutorial, we'll use the REST API search endpoints to search for articles about the Solar System on English Wikipedia.
- Retrieving free knowledge - This guide uses the MediaWiki REST API to explore articles on English Wikipedia.
- Wikipedia page stats comparison - This guide uses the MediaWiki REST API to explore articles on English Wikipedia.
- Get featured content from English Wikipedia - The Wikifeeds API provides convenient access to content featured on the Main Page of English Wikipedia.
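The tutorials above call two HTTP APIs. A minimal sketch of the URLs they build, assuming the MediaWiki REST API base path `/w/rest.php/v1` (page history and page search endpoints) and the Wikimedia API Portal feed endpoint `https://api.wikimedia.org/feed/v1/wikipedia/{lang}/featured/{YYYY}/{MM}/{DD}`. The helper names and User-Agent are hypothetical placeholders; check the endpoints against current documentation before relying on them.

```python
import datetime
import json
import urllib.parse
import urllib.request

REST_BASE = "https://{lang}.wikipedia.org/w/rest.php/v1"
FEED_BASE = "https://api.wikimedia.org/feed/v1/wikipedia/{lang}/featured"
USER_AGENT = "PAWS-example/0.1 (you@example.org)"  # placeholder contact details


def page_history_url(title: str, lang: str = "en") -> str:
    """URL for the REST API page history endpoint of one page."""
    # Titles use underscores for spaces; other special characters (like "/") are percent-encoded.
    encoded = urllib.parse.quote(title.replace(" ", "_"), safe="")
    return f"{REST_BASE.format(lang=lang)}/page/{encoded}/history"


def search_url(query: str, limit: int = 10, lang: str = "en") -> str:
    """URL for the REST API full-text page search endpoint."""
    params = urllib.parse.urlencode({"q": query, "limit": limit})
    return f"{REST_BASE.format(lang=lang)}/search/page?{params}"


def featured_url(day: datetime.date, lang: str = "en") -> str:
    """URL for a given day's featured content feed."""
    return f"{FEED_BASE.format(lang=lang)}/{day:%Y/%m/%d}"


def get_json(url: str) -> dict:
    """Fetch a URL and decode the JSON body (requires network access)."""
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For instance, `get_json(search_url("solar system", limit=5))` mirrors the "Search Wikipedia articles" tutorial's starting point.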
Various notebooks using APIs
- Action API tests
- Article quality demo
- Blocked users Wikipedia DE
- Get namespace names - MediaWiki API
- Find vandalism on a given set of pages
- Find pages translated from English to Hindi
- Understand impact of the content translation tool
- Content translation exploration - A complex notebook featuring content translation. Super interesting; not sure if it is entirely useful for this purpose.
- Wikidata API example - update descriptions
- Extracting Covid-19 data from English Wikipedia
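Notebooks such as "Get namespace names - MediaWiki API" use the Action API at `api.php` rather than the REST API. A minimal sketch of that request, assuming `action=query&meta=siteinfo&siprop=namespaces` with `formatversion=2`, where each namespace object carries its numeric `id` and localized `name`. The helper names and User-Agent are hypothetical.

```python
import json
import urllib.parse
import urllib.request

USER_AGENT = "PAWS-example/0.1 (you@example.org)"  # placeholder contact details


def namespaces_url(lang: str = "en", project: str = "wikipedia") -> str:
    """Build an Action API URL asking for a wiki's namespace list."""
    params = {
        "action": "query",
        "meta": "siteinfo",
        "siprop": "namespaces",
        "format": "json",
        "formatversion": "2",
    }
    return f"https://{lang}.{project}.org/w/api.php?" + urllib.parse.urlencode(params)


def namespace_names(response: dict) -> dict:
    """Map namespace id -> localized name (the main namespace's name is '')."""
    return {ns["id"]: ns["name"] for ns in response["query"]["namespaces"].values()}


def fetch_namespace_names(lang: str = "en", project: str = "wikipedia") -> dict:
    """Fetch and parse the namespace list (requires network access)."""
    req = urllib.request.Request(namespaces_url(lang, project), headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req) as resp:
        return namespace_names(json.load(resp))
```

Keeping the parsing in `namespace_names` separate from the network call makes it easy to test against a saved response.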
Pywikibot (Uses MediaWiki API)
- Add copyright to items in Wikidata
- Add copyright, creator to items in Wikidata
- Add awards to Wikidata category Sports Hall of Fame
- Add references to items already in Wikidata
- Auto Wikiproject
- Add short descriptions to biographies on Wikipedia EN
- Add items to Wikidata
- Change qualifier in P39 statements - Wikidata
- Make changes to pages using PyMySQL and Pywikibot - HY Wikipedia - On-wiki task using replicas and API
- Remove broken files
- Investigate bot issues
- Policy changes - ZH Wikipedia - Uses databases, pywikibot, json files, etc
- Teahouse archives answers - Uses databases, pywikibot, json files, etc
- Analyze number of new editors per month
- Categorize images after the end of Wiki Loves Love
- Clean history merge list - Wikiproject history
- Categorize images from Wiki Loves Earth
- Move and recategorize patronymic names on Commons
- Dead interlanguage links
- Fix BDA Ids on Wikidata
- Fix titles on Wikidata
- Get articles without images
- Global replace in Wikipedia DE
- Categorize graves in cemeteries - Commons
- Mass remove claims - Wikidata
- A script to move pages
- Get files with NASA image template - Commons
- Remove redirect class
- Check userpage authorship - RU Wikipedia
- Fix bad interwiki links
- Recategorize and move pages
- Upload text
- Parse data from talk pages
- Add a property to a category - Wikidata
- Autostatus update for Wikiproject
- Batch delete and unlink images
- Identify unhelpful file names on Commons
- Bulk deprecate a template
- Bulk deprecate an index parameter
- Add statements to candidates in Canada elections - Wikidata
- Move all pages from one subcategory to another
- Create new user pages
- Redirect a talk page
- Relicense uploads to Wikimedia Commons
- Replace page text
- Update a redirect
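Many of the Pywikibot notebooks above (for example, "Replace page text" and "Global replace in Wikipedia DE") follow the same load, transform, save pattern. A minimal sketch, assuming Pywikibot is already configured for your account in PAWS. `pywikibot.Site`, `pywikibot.Page`, `Page.text`, and `Page.save(summary=...)` are Pywikibot calls, while `replace_in_text` and `replace_on_page` are hypothetical helpers. Keeping the text transformation a pure function lets you check it on sample wikitext before saving anything.

```python
import re


def replace_in_text(text: str, pattern: str, replacement: str):
    """Apply a regex replacement; return (new_text, number_of_replacements)."""
    return re.subn(pattern, replacement, text)


def replace_on_page(title: str, pattern: str, replacement: str, summary: str,
                    lang: str = "en", family: str = "wikipedia",
                    dry_run: bool = True) -> int:
    """Replace text on one page; only saves when dry_run is False."""
    import pywikibot  # imported lazily; requires a configured Pywikibot

    site = pywikibot.Site(lang, family)
    page = pywikibot.Page(site, title)
    new_text, count = replace_in_text(page.text, pattern, replacement)
    if count and not dry_run:
        page.text = new_text
        page.save(summary=summary)
    return count
```

Defaulting to `dry_run=True` means a first run only reports how many replacements would be made, which is a reasonable safeguard for bulk edits.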