Talk:Robot policy
The intro is a bit harsh. Bots are our most productive user class. Please make it friendlier!
There's no need to lead off with "Failing to comply with these rules will generally result in... you as an Operator eventually banned from accessing our resources" or the all caps "Failure to follow these guidelines may result in your bot being blocked or heavily rate-limited."
- first try prolific light rate-limits before scaring people about blocking and heavy limits :)
- clarify that rate-limits come first, and how bots will be notified that they are being limited
Currently the page mentions specifics that may conflict with the operation of useful good-faith low-volume bots. Clarify that this is a dialogue to keep the sites running smoothly, feedback welcome to tune the requirements, &c. e.g.:
"Bots that don't follow these guidelines may be rate-limited ... and informed of the reason, and pointed to ways to request higher quotas or exemptions. Bots that repeatedly try to get around these guidelines or rate limits may be blocked."
Sj (talk) 16:26, 9 April 2025 (UTC)
- @Sj: Bots are, by far, our most expensive users overall. A tiny sliver of them are run by editors who are here to contribute edits or support editors, which are the uses you're thinking of when you say "bots", but that's not what's meant here. Jdforrester (WMF) (talk) 17:32, 9 April 2025 (UTC)
- I don't think that the tone of language on this page is going to affect the most expensive subsets of robots affected by this policy. It will affect the good-faith devs who know enough to find this page in the first place.
- I should very much like to know the classes of bots and their distribution, including abusive bots [which we never needed a detailed policy like this to block], misconfigured ones, trained ones, those belonging to the emperor, &c. Many of them are devoted to disseminating knowledge, and understanding how and why is among other things a way of identifying new knowledge flows by following demand. In this context, given related threads over the past year, I think first of bots and agents run by popular apps and AI services with millions of users, whose organizations tend to be supporters of the wikiverse, while also operating at high volumes and not yet being in equilibrium with their ecosystem.
- The fact that this page jumped from no mention of rate-limits at all to leading off with [permanent] banning and blocking suggests a spikiness of internal policy that doesn't need to leak through into public communication. There's always time to convey context, and make it a warm dialogue. Sj (talk) 22:10, 9 April 2025 (UTC)
- I agree with you that the tone of that part is a bit too harsh; that's clear from some of the feedback I got. I think your proposed wording works better.
- But please rest assured, abusive bots are already being pointed to this policy and this policy is mostly intended for them. GLavagetto (WMF) (talk) 04:51, 11 April 2025 (UTC)
- Also on why we need this change in the policy: the SRE team has been playing whack-a-mole for the best part of the last two years; we've even purpose-built systems to make filtering of traffic easier and faster.
- And it's getting worse: just in the last month, we've had to add 9 filtering rules for new abusive bots. Which means not only 9 incident responses, but also a lot of work to make sure we eventually remove filters that are not needed and might cause some false positives.
- The only way out of this - which again I don't think is sustainable - is to move from episodic enforcement of our rules to systematic enforcement.
- That's also why I tried to cover more systems in the updated policy, including media files and e.g. phabricator.
- GLavagetto (WMF) (talk) 06:01, 11 April 2025 (UTC)
Consolidating overlapping pages
@GLavagetto (WMF) thank you for these major documentation improvements!
I just noticed the page bot traffic, which is linked from https://developer.wikimedia.org/use-content/bot-traffic/. Since it only has instructions for indicating an operator's IP ranges, it seems unnecessarily complicated for it to be on a separate page. Why don't we merge that page into this one? Neil Shah-Quinn (WMF) (talk) 20:41, 11 April 2025 (UTC)
- The reason that page is separated is that, eventually, we want to have a proper intake process with a form, where the content of that page should go to. Having it on wikitech right now is just a placeholder while we figure out a proper process for handling requests at scale. GLavagetto (WMF) (talk) 13:49, 14 April 2025 (UTC)
Actually, along the same lines, what do you think about moving the content on this page onto the developer portal page? I feel like this would have a number of benefits:
- The page would look more official and serious, whereas a normal wiki page like this looks more like documentation
- Readers would see the concrete advice on this page first, whereas the developer portal currently points first to the more formal, wordy pages on Governance Wiki
- Avoids frictionless editing of the page (normally, that's a feature, not a bug, but I think a serious, high-visibility policy like this is an exception).
It does look like it might be difficult to write a detailed page within the architecture of the developer portal, so if that's the case, maybe the best compromise would be to make the developer portal page link only to this page. This page already links to the other pages as relevant, so few changes would be needed. We could then protect this page to get the same benefit of a speed bump for edits (it doesn't need to be super strict; I think limiting it to auto-confirmed users would be plenty). Neil Shah-Quinn (WMF) (talk) 20:57, 11 April 2025 (UTC)
- The developer portal is deliberately designed to provide user journey focused links to on-wiki documentation. The particular content you are wondering about changing was added in phab:T388051 based on the needs of the https://enterprise.wikimedia.com/ team. -- BryanDavis (talk) 00:27, 16 April 2025 (UTC)
Year update 2025
Against. The rules are missing the following scenario (it comes from practice with my bot): I have a bot, I make changes in a single thread, and I want a workflow like download-[edit-save]-wait_X_s. It seems to me that the new rules are unnecessarily complicated for this scenario. (Automatically translated from the original.) Dušan Kreheľ (talk) 18:24, 7 June 2025 (UTC)
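For readers unfamiliar with the pattern, the single-threaded download-[edit-save]-wait_X_s workflow described in the comment above could be sketched roughly as follows. This is only an illustration, not anything the policy prescribes: `run_serial_bot` and its `download`, `edit`, and `save` callables are hypothetical names, and the wait value is whatever X the operator chooses.

```python
import time
from typing import Callable, Iterable


def run_serial_bot(
    titles: Iterable[str],
    download: Callable[[str], str],      # hypothetical: fetch page text by title
    edit: Callable[[str], str],          # hypothetical: return the modified text
    save: Callable[[str, str], None],    # hypothetical: write the new text back
    wait_seconds: float,                 # the "X" in wait_X_s, chosen by the operator
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Process pages strictly one at a time and return the number of saves."""
    saved = 0
    for title in titles:
        old_text = download(title)
        new_text = edit(old_text)
        if new_text != old_text:
            # Only save when the edit actually changed something.
            save(title, new_text)
            saved += 1
        # Fixed pause after every page, so requests never overlap.
        sleep(wait_seconds)
    return saved
```

Because the loop is serial, the bot never has more than one request in flight, and its request rate is bounded by the chosen wait.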
Bot traffic page
This should link to or incorporate Bot traffic which is completely undiscoverable now. I guess the "Provide a URL where we can download a JSON formatted list of CIDRs from which your requests will originate." sentence is trying to say the same thing, but it's not very clear. Tgr (WMF) (talk) 15:36, 18 July 2025 (UTC)
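To make the quoted sentence more concrete, here is a minimal sketch of how an operator might sanity-check such a file before publishing it. The thread does not say what schema is expected, so the flat JSON array of CIDR strings assumed here is only a guess, and `parse_cidr_list` is a hypothetical helper name.

```python
import ipaddress
import json
from typing import List


def parse_cidr_list(payload: str) -> List[str]:
    """Parse a JSON array of CIDR strings and return them normalized.

    Assumption: the file is a flat JSON array such as
    ["192.0.2.0/24", "2001:db8::/32"]; the real expected format may differ.
    """
    entries = json.loads(payload)
    if not isinstance(entries, list):
        raise ValueError("expected a JSON array of CIDR strings")
    networks = []
    for entry in entries:
        if not isinstance(entry, str):
            raise ValueError(f"not a string: {entry!r}")
        # strict=True rejects entries with host bits set, e.g. 192.0.2.1/24.
        networks.append(str(ipaddress.ip_network(entry, strict=True)))
    return networks
```

Running a check like this before publishing catches typos such as an address with host bits set, which `ipaddress.ip_network` rejects with a `ValueError`.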