Jump to content

Talk:SRE/Data Persistence/Design Review/Global Editor Metrics 2025

From Wikitech

keyspace name?

Table name: [WIP] edits_per_user_per_page_daily

Another bikeshed that needs paint is keyspace name. Keyspace is (somewhat) analogous to a "db" in MySQL/MariaDB terms, it's a namespace for grouping tables. Functionally speaking, all of the tables that correspond with a keyspace share replication and durability configuration. It's also possible to create roles and grants that apply to all of the tables within a keyspace, rather than to have to apply them individually. Otherwise it mainly boils down to organization.

We created a new keyspace (called commons) for Commons Impact Metrics, and one for for Image Suggestions (image_suggestions of course). I reckon that if we had other, sufficiently similar tables coming, we could group them under a name that made sense... In the absence of anything more meaningful, maybe global_editor_metrics? That'd make the fully-qualified table name: global_editor_metrics.edits_per_user_page_daily.

-- EEvans (WMF) (talk) 19:58, 16 September 2025 (UTC)Reply
Hm! What are the AQS / pageview API metrics keyspaces?
For short I'd drop the 'global' part. If in the future someone wants metrics there that aren't 'global' but still about individual editors, I think we can reuse the keyspace we create for this. So e.g.
editor_metrics. edits_per_user_page_daily otto (talk) 17:25, 22 September 2025 (UTC)Reply
On second thought, while these metrics are about 'editors', they are more generally about 'users that have edited'. I could see Growth or some other team also wanting to store metrics about 'moderators', who sometimes make edits, but also sometimes do other kinds of things (thank you pages, talk pages, deleting pages, RevisionDelete / redacting pages, etc.)
So, let's talk this mediawiki_user_metrics? Or, if that is too verbose (since we don't qualify with MW elsewhere), then perhaps just 'user_metrics'? I'll change the document to reflect this. otto (talk) 17:42, 22 September 2025 (UTC)Reply
After more discussion, we decided to go with just 'analytics' for this and new Analytics API (AQS) related tables. otto (talk) 13:28, 16 October 2025 (UTC)Reply

Does page_id need to be bigint?

I'm wondering if page_id needs to be bigint. 🤔

I think it's a 32bit unsigned (mw:Manual:Page_table), and int in Cassandra is a 32bit signed. Up to 4,294,967,295 versus 2,147,483,647 (so strictly speaking, it should be a bigint), but do we expect to ever have 2.1B pages? I ask mostly because I've never stopped and thought about this before, and we've used int for page and revision IDs pretty liberally (the latter of which is an unsigned bigint in mediawiki 😬).

-- EEvans (WMF) (talk) 20:15, 16 September 2025 (UTC)Reply

Oh, def should be bigint. All ids should be unsigned. I didn't realize that! int vs bigint in Hive world just means int vs long, so I assumed the same.
Changing to bigint. otto (talk) 17:22, 22 September 2025 (UTC)Reply
Oh, I misunderstood. Okay there is no 'long' or unsigned in cassandra types. Let's go with bigint for these.
Will change the _count fields to be bigints as well. otto (talk) 17:31, 22 September 2025 (UTC)Reply

pageviews_top_pages_per_editor_monthly

Should wiki_id be of type text, rather than int?

Is views intentionally bigint? EEvans (WMF) (talk) 21:10, 29 October 2025 (UTC)Reply