Talk:SRE/Data Persistence/Design Review/Global Editor Metrics 2025
keyspace name?
Table name: [WIP]
edits_per_user_per_page_daily
Another bikeshed that needs paint is keyspace name. Keyspace is (somewhat) analogous to a "db" in MySQL/MariaDB terms, it's a namespace for grouping tables. Functionally speaking, all of the tables that correspond with a keyspace share replication and durability configuration. It's also possible to create roles and grants that apply to all of the tables within a keyspace, rather than to have to apply them individually. Otherwise it mainly boils down to organization.
We created a new keyspace (called commons) for Commons Impact Metrics, and one for for Image Suggestions (image_suggestions of course). I reckon that if we had other, sufficiently similar tables coming, we could group them under a name that made sense... In the absence of anything more meaningful, maybe global_editor_metrics? That'd make the fully-qualified table name: global_editor_metrics.edits_per_user_page_daily.
-- EEvans (WMF) (talk) 19:58, 16 September 2025 (UTC)
- Hm! What are the AQS / pageview API metrics keyspaces?
- For short I'd drop the 'global' part. If in the future someone wants metrics there that aren't 'global' but still about individual editors, I think we can reuse the keyspace we create for this. So e.g.
editor_metrics. edits_per_user_page_dailyotto (talk) 17:25, 22 September 2025 (UTC)- On second thought, while these metrics are about 'editors', they are more generally about 'users that have edited'. I could see Growth or some other team also wanting to store metrics about 'moderators', who sometimes make edits, but also sometimes do other kinds of things (thank you pages, talk pages, deleting pages, RevisionDelete / redacting pages, etc.)
- So, let's talk this mediawiki_user_metrics? Or, if that is too verbose (since we don't qualify with MW elsewhere), then perhaps just 'user_metrics'? I'll change the document to reflect this. otto (talk) 17:42, 22 September 2025 (UTC)
- After more discussion, we decided to go with just 'analytics' for this and new Analytics API (AQS) related tables. otto (talk) 13:28, 16 October 2025 (UTC)
Does page_id need to be bigint?
I'm wondering if page_id needs to be bigint. đ¤
I think it's a 32bit unsigned (mw:Manual:Page_table), and int in Cassandra is a 32bit signed. Up to 4,294,967,295 versus 2,147,483,647 (so strictly speaking, it should be a bigint), but do we expect to ever have 2.1B pages? I ask mostly because I've never stopped and thought about this before, and we've used int for page and revision IDs pretty liberally (the latter of which is an unsigned bigint in mediawiki đŹ).
-- EEvans (WMF) (talk) 20:15, 16 September 2025 (UTC)
- Oh, def should be bigint. All ids should be unsigned. I didn't realize that! int vs bigint in Hive world just means int vs long, so I assumed the same.
- Changing to bigint. otto (talk) 17:22, 22 September 2025 (UTC)
- Oh, I misunderstood. Okay there is no 'long' or unsigned in cassandra types. Let's go with bigint for these.
- Will change the _count fields to be bigints as well. otto (talk) 17:31, 22 September 2025 (UTC)
pageviews_top_pages_per_editor_monthly
Should wiki_id be of type text, rather than int?
Is views intentionally bigint? EEvans (WMF) (talk) 21:10, 29 October 2025 (UTC)