Help talk:Toolforge/Running Pywikibot scripts (advanced)/Archives/2022

From Wikitech
Warning! Please do not post any new comments on this page. This is a discussion archive. See the current discussion or the archives index.

Shared pywikibot file usage instructions need to be updated for v7.0.0 changes (was: does not work)

The "Using the shared Pywikibot files (recommended setup)" section does not work. I followed the steps carefully, and they failed. I then installed Pywikibot locally on Toolforge, and that worked. —usernamekiran (talk) 17:58, 4 March 2022 (UTC)

@Usernamekiran, do you have any specific errors to share? Anecdotally, I can say that I used the shared setup instructions in officewikibot just a couple of weeks ago and was able to successfully configure the framework and run scripts on the grid engine via both manual submission and cron. -- BryanDavis (talk) 00:44, 5 March 2022 (UTC)
@BryanDavis: After following the steps carefully, when I ran the command python3 /data/project/shared/pywikibot/stable/generate_user_files.py I got "file pywikibot not found" or something to that effect. I apologise, I can't recall the exact message. —usernamekiran (talk) 15:59, 5 March 2022 (UTC)
@BryanDavis: Hi. I tried it again. After following the previous commands properly, when I enter python3 /data/project/shared/pywikibot/stable/generate_user_files.py it fails with the error python3: can't open file '/data/project/shared/pywikibot/stable/generate_user_files.py': [Errno 2] No such file or directory. I think that in my last attempt I had created the user files manually (in the .pywikibot directory). I will try that again, see whether it works, and report back here. —usernamekiran (talk) 16:43, 7 March 2022 (UTC)
@Usernamekiran, you are correct that /data/project/shared/pywikibot/stable/generate_user_files.py is not available on disk today. I double-checked the $HOME/.bash_history on the officewikibot tool I mentioned and found that the file was in that location 4 weeks ago. The new location is /data/project/shared/pywikibot/stable/pywikibot/scripts/generate_user_files.py. This move appears to be part of the changes in pywikibot 7.0.0, which was released on 2022-02-26. I will try to update the page to reflect these changes, but everyone's help is welcome in updating the documentation. -- BryanDavis (talk) 17:06, 7 March 2022 (UTC)
Special:Diff/1954677 updates the paths for generate_user_files.py and version.py in our local docs. -- BryanDavis (talk) 17:16, 7 March 2022 (UTC)
@BryanDavis: Sorry, I did not see your reply earlier. I manually added user-config, user-password (with 600 permissions), and user-fixes.py. All I could do was successfully run /data/project/shared/pywikibot/core/pywikibot/scripts/version.py; I could not do anything else. When I had Pywikibot installed locally on Toolforge, I could use everything (I have removed it now, for the second time). —usernamekiran (talk) 17:25, 7 March 2022 (UTC)
Pinging Klein Muçi, who has been helpful with this issue and might want to stay updated. —usernamekiran (talk) 17:42, 7 March 2022 (UTC)
Bryan, Klein: does this also mean that the contents of .bash_profile should be updated? Currently it contains export PYTHONPATH=/data/project/shared/pywikibot/stable:/data/project/shared/pywikibot/stable/scripts —usernamekiran (talk) 18:12, 7 March 2022 (UTC)
I'm nowhere near being an expert in this matter, but I doubt it will have to be changed. I tried to run my existing scripts directly and through jsub, without making any changes, and they ran normally. Strangely... - Klein Muçi (talk) 19:15, 7 March 2022 (UTC)
@Klein When running directly, how, and at which command prompt, do you run the scripts? I am not sure what I am doing wrong. —usernamekiran (talk) 20:02, 7 March 2022 (UTC)
I believe those are still the correct paths to add to PYTHONPATH. The release notes only mention the generate_family_file.py, generate_user_files.py, shell.py and version.py scripts being moved to pywikibot/scripts. -- BryanDavis (talk) 21:17, 7 March 2022 (UTC)
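
For readers hitting the same "No such file or directory" error, the relocation discussed in this thread can be checked with a small lookup. This is only a sketch: the two relative locations come from the comments above (top level before 7.0.0, pywikibot/scripts afterwards), while the helper name find_script and the CANDIDATE_SUBDIRS constant are made up for illustration and are not part of Pywikibot.

```python
from pathlib import Path
from typing import Optional

# Locations taken from the discussion above: 7.0.0 moved the script under
# pywikibot/scripts; earlier releases kept it at the top level ("").
# The new layout is tried first.
CANDIDATE_SUBDIRS = ("pywikibot/scripts", "")


def find_script(root: Path, name: str) -> Optional[Path]:
    """Return the first existing location of `name` under `root`, or None."""
    for subdir in CANDIDATE_SUBDIRS:
        candidate = root / subdir / name
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    shared = Path("/data/project/shared/pywikibot/stable")
    print(find_script(shared, "generate_user_files.py"))
```

Run on a host where neither path exists, this prints None rather than failing.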

multiple user-fixes.py

@Bryan Hi. I hope you are doing well. I have been meaning to thank you and Klein Muçi, and to update you on my progress with the issue discussed in the section above, but I kept getting caught up with one thing or another. I will do it shortly though. What I wanted to ask is this: currently my user-fixes.py contains around 200 entries for a find-and-replace task. It contains nothing except find-and-replace entries. There are around 400 more words that need to be added to the source code (currently pending approval/vetting by the mrwiki community). What is the optimal maximum number of entries for the task, so that it does not become a burden on the server? I think this list will keep growing over time, but at a slower pace after this. Currently, the cron job is set to run once every 24 hours (4 PM UTC).

Two solutions come to mind. First: making a few different user-fixes files, if that is possible. Second: I can create different "fixes" within the same file, limiting each fix to 100 entries. Then I can set the cron job to run each fix at a different time, e.g. fix1 at 1 PM, fix2 at 2 PM, and so on. Kindly let me know what you think. Thanks a lot in advance, —usernamekiran (talk) 17:35, 5 April 2022 (UTC)

Your tool is limited to a maximum amount of CPU and RAM that it can use at any given time to protect the Toolforge servers. Keep an eye on how your single job is doing, and as long as it is not being stopped for exceeding its resource quota or taking longer to process things than you can allow, things should be fine. You can start working on splitting things up into smaller pieces if you start to hit limits. -- BryanDavis (talk) 17:47, 5 April 2022 (UTC)
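
If splitting ever does become necessary, the second option (several named fixes of at most 100 entries each in one file) could be generated rather than maintained by hand. The sketch below assumes the usual user-fixes.py layout, in which each fix is a dictionary with 'regex', 'msg' and 'replacements' keys. The function name split_into_fixes, the "fix" prefix and the edit summary are hypothetical.

```python
from typing import Dict, List, Tuple

Replacement = Tuple[str, str]


def split_into_fixes(
    pairs: List[Replacement],
    size: int = 100,
    prefix: str = "fix",
    summary: str = "Bot: find and replace",  # placeholder edit summary
) -> Dict[str, dict]:
    """Split one long replacement list into named fixes of at most `size` entries."""
    if size < 1:
        raise ValueError("size must be at least 1")
    fixes: Dict[str, dict] = {}
    for start in range(0, len(pairs), size):
        name = f"{prefix}{start // size + 1}"  # fix1, fix2, ...
        fixes[name] = {
            "regex": False,  # plain find and replace, as described above
            "msg": {"_default": summary},
            "replacements": pairs[start:start + size],
        }
    return fixes


if __name__ == "__main__":
    words = [(f"old{i}", f"new{i}") for i in range(250)]
    for name, fix in split_into_fixes(words).items():
        print(name, len(fix["replacements"]))
```

Inside an actual user-fixes.py the result would presumably be merged with fixes.update(...), after which each chunk could be scheduled separately with -fix:fix1, -fix:fix2, and so on.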
Thanks Bryan. Also, I recently got an email with the subject "[REMINDER] - Your KIRANBOT4 Project Is Still Running On Strech Grid Engine". My command prompt is tools.kiranbot4@tools-sgebastion-08:~$. The email contains a few links, but I couldn't find any page with clear steps for migrating to the Buster grid or Kubernetes. There are some steps given at en:User:Novem Linguae/Essays/Toolforge bot tutorial, but they are about setting up from scratch, not about migrating. Also, my tool does not have a webservice, only Pywikibot. How should I migrate, and which do you recommend: the Buster grid or Kubernetes? I apologise for bothering you so much. —usernamekiran (talk) 08:18, 6 April 2022 (UTC)
@Usernamekiran, see News/Toolforge Stretch deprecation#Move_a_cron_job for specific instructions on moving a job from the legacy Stretch grid to the new Buster grid. Today I would recommend the Buster grid for most pywikibot users. There has been discussion of making it easier to run pywikibot jobs on the Kubernetes cluster, but today it is an under-documented and under-supported use case. -- BryanDavis (talk) 16:45, 6 April 2022 (UTC)
Hello Bryan. I think I did it, but I am not sure how successful I was. Is there any way to check whether I migrated successfully? —usernamekiran (talk) 14:33, 8 April 2022 (UTC)
@Usernamekiran, the report at https://grid-deprecation.toolforge.org/u/usernamekiran will eventually be empty if you succeeded in migrating off of stretch or disabling all jobs for tools you maintain. That report is updated once per hour. A more manual check can be done by looking at the qstat -xml output for each tool. The hostname component of the <queue_name> that the job is running under tells you if the queue is running on stretch (contains -09nn where nn is any two digits) or buster (contains -10-). At the time I am writing this, your KiranBOT job is running on continuous@tools-sgeexec-0920.tools.eqiad.wmflabs, which indicates it is still on the stretch grid. -- BryanDavis (talk) 16:01, 8 April 2022 (UTC)
@Bryan I stopped all the jobs. —usernamekiran (talk) 17:55, 8 April 2022 (UTC)
@Bryan Hi. Currently, I have only two active jobs, and these jobs are no longer visible on https://grid-deprecation.toolforge.org/t/kiranbot4 because they now run on the Buster grid. The remaining two jobs will drop off the list after a week, I think. Thanks again for all your help. Just one more question: just as we can see the jobs running on the old grid, is there a way to see jobs running on the Buster grid? Perhaps a command I can run to see which grid I am using? Thanks a lot again. —usernamekiran (talk) 17:36, 9 April 2022 (UTC)

A more manual check can be done by looking at the qstat -xml output for each tool. The hostname component of the <queue_name> that the job is running under tells you if the queue is running on stretch (contains -09nn where nn is any two digits) or buster (contains -10- ).

-- BryanDavis (talk) 22:36, 10 April 2022 (UTC)
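
The hostname rule quoted above is mechanical enough to script. The following is a rough sketch, not an official tool: it applies exactly the two patterns from the comment (-09nn for stretch, -10- for buster) to every <queue_name> element found in qstat -xml output. The function names are invented here, and the overall shape of the XML is assumed rather than documented.

```python
import re
import xml.etree.ElementTree as ET
from typing import List

# Patterns from the comment above: "-09nn" (nn = two digits) means stretch,
# "-10-" means buster.
_STRETCH_HOST = re.compile(r"-09\d{2}")
_BUSTER_MARKER = "-10-"


def classify_queue(queue_name: str) -> str:
    """Classify a queue name such as 'continuous@host' by its hostname part."""
    host = queue_name.split("@", 1)[-1]
    if _STRETCH_HOST.search(host):
        return "stretch"
    if _BUSTER_MARKER in host:
        return "buster"
    return "unknown"  # e.g. a pending job with an empty queue name


def grids_in_qstat_xml(xml_text: str) -> List[str]:
    """Classify every <queue_name> element, wherever it sits in the XML tree."""
    root = ET.fromstring(xml_text)
    return [classify_queue(el.text or "") for el in root.iter("queue_name")]


if __name__ == "__main__":
    import subprocess

    # Assumes qstat is on PATH, as it is on a grid bastion.
    output = subprocess.run(
        ["qstat", "-xml"], capture_output=True, text=True, check=True
    ).stdout
    print(grids_in_qstat_xml(output))
```

For the queue mentioned earlier in this thread, classify_queue("continuous@tools-sgeexec-0920.tools.eqiad.wmflabs") returns "stretch".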

the new installation guide errs as well

Hello. I was trying to make a fresh, clean install from the start. I tried to follow the latest installation guide, but I got various errors, including ModuleNotFoundError: No module named 'pip' when I tried to install "requests", because "requests" was not installed either. Everything goes well up until step 3. I tried several times and got different errors each time. All of my attempts used the manual/non-interactive method of configuring Pywikibot. I also tried running the commands manually instead of executing pwb_venv.sh. On my most recent attempt, I followed the guide word for word, and when I entered the command to generate the user files, I got the following error and gave up.

 
Traceback (most recent call last):
  File "/data/project/kiranbot4/pwbvenv/bin/pwb", line 5, in <module>
    from pywikibot.scripts.pwb import run
ModuleNotFoundError: No module named 'pywikibot.scripts.pwb'
CRITICAL: Exiting due to uncaught exception <class 'ModuleNotFoundError'>

@JJMC89, BryanDavis: could you please look into it and see what is going wrong? I don't think the issue is on my end. —usernamekiran (talk) 14:49, 21 December 2022 (UTC)

You're running into phab:T320851. I've added a warning with a workaround for git installs. — JJMC89 (T·C) 17:13, 21 December 2022 (UTC)
I am still getting No module named 'requests', and I am still unable to install dependencies. —usernamekiran (talk) 02:31, 22 December 2022 (UTC)
The last section of your pwb_venv.sh doesn't match the instructions. Copy it exactly but omit the last line if you don't have any additional dependencies. After running the command from step 3, the end of $HOME/setup-venv.out should be
Successfully built pywikibot
Installing collected packages: urllib3, six, PyMySQL, PyJWT, oauthlib, mwparserfromhell, idna, charset-normalizer, certifi, requests, requests-oauthlib, pywikibot, mwoauth
Successfully installed PyJWT-2.6.0 PyMySQL-1.0.2 certifi-2022.12.7 charset-normalizer-2.1.1 idna-3.4 mwoauth-0.3.8 mwparserfromhell-0.6.4 oauthlib-3.2.2 pywikibot-7.7.2 requests-2.28.1 requests-oauthlib-1.3.1 six-1.16.0 urllib3-1.26.13
After that, if you are having issues with modules not being found, then you are skipping parts of the instructions. For pwb commands, you need to be in Kubernetes with the venv activated. — JJMC89 (T·C) 03:32, 22 December 2022 (UTC)
@JJMC89: I successfully got the version, but when I tried to run replace.py I got: SyntaxError: Non-UTF-8 code starting with '\xa1' in file /data/project/kiranbot4/pwbvenv/bin/python on line 2, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details. This did not happen with my previous installation. —usernamekiran (talk) 03:11, 22 December 2022 (UTC)
How are you running it? When I run (pwbvenv) tools.jjmc89-bot-dev@shell-1671679587:~$ python3 pywikibot-core/pwb.py replace -help, I get the expected help output. — JJMC89 (T·C) 03:32, 22 December 2022 (UTC)
I got the expected output when I followed your syntax. The replace script also worked for wikipedia when I used (pwbvenv) tools.kiranbot4@shell-1671700027:~$ python3 pywikibot-core/pwb.py replace <lots of arguments>.
Now I am unable to execute it from a shell script (for cron jobs). After running the shell script, I am getting No module named 'requests' error. The contents of my .sh file are:
#!/bin/bash
source $HOME/.bash_profile
$HOME/pwbvenv/bin/python3 $HOME/pywikibot-core/pwb.py replace -page:Wikipedia:धूळपाटी/KiranBOT_II -page:User:KiranBOT_II/sandbox_3 -fix:colon -fix:visarg -fix:name1 -exceptinsidetag:ref -exceptinsidetag:comment -exceptinsidetag:math -exceptinsidetag:hyperlink -exceptinsidetag:template -exceptinsidetag:nowiki -lang:mr
@JJMC89: In my previous installation, I didn't use the webservice/interactive Kubernetes shell; I used the virtual environment without starting it. What should I do so that I can use shell scripts in a YAML file? —usernamekiran (talk) 09:23, 22 December 2022 (UTC)
That was the method: special:permalink/2023685#Using the virtual environment without activating it. —usernamekiran (talk) 09:31, 22 December 2022 (UTC)
  • I made a clean local install using the steps provided in the permalink above. I apologise for the inconvenience. Your help and prompt replies are much appreciated, JJMC. I apologise again. —usernamekiran (talk) 10:05, 22 December 2022 (UTC)
  • @JJMC89: Using the installation from the permalink method, I was able to run my custom script directly from the console (with the command tools.kiranbot4@tools-sgebastion-11:~$ $HOME/pwbvenv/bin/python3 $HOME/custom-script.py), and I was also able to run the shell scripts directly from the terminal with tools.kiranbot4@tools-sgebastion-11:~$ ./task1-a.sh. They ran without any problem from the terminal, but when I ran them with cron, my .err file showed an error along the lines of "No module named 'requests'". I tried to install it, but it kept saying that requests is already installed. So I installed Pywikibot again, this time according to the live guide. I made two changes though. First, I changed the syntax for installing the dependencies in the pwb_venv.sh file to
    # install dependencies
    pip install --upgrade pip setuptools wheel
    # in $HOME/pywikibot-core
    pip install $HOME/pywikibot-core mwparserfromhell
    pip install $HOME/pywikibot-core mwoauth
    pip install $HOME/pywikibot-core requests
    pip install $HOME/pywikibot-core pymysql
    pip install $HOME/pywikibot-core bs4
    
The second deviation is that I did not start the webservice shell. With this installation, I am able to run the cron jobs successfully, but I can't run Python scripts or shell scripts directly from the terminal. It says "No module named 'requests'". If I try to install "requests" in the virtual environment, I get "ModuleNotFoundError: No module named 'pip'". I think I will not be able to install any dependencies now. What do you think is going wrong? —usernamekiran (talk) 14:40, 29 December 2022 (UTC)
@JJMC89: fixed ping. —usernamekiran (talk) 14:41, 29 December 2022 (UTC)
I do not need to run the scripts directly from the terminal, since I can do that from my local machine installation as well, but I can't run the "version" command (I get the "no requests module" error, and I tried both of the commands from the guide). I am happy as long as the cron jobs work, and they are working currently. —usernamekiran (talk) 16:36, 29 December 2022 (UTC)
The last five lines of pwb_venv.sh should just be this one: pip install $HOME/pywikibot-core[html,mwparserfromhell,mwoauth,mysql]. Currently it is installing pywikibot 5 times. Generally, you shouldn't run scripts from the bastion. If you're going to, the venv in k8s (webservice shell or toolforge-jobs) and on the bastion should not be the same one. Attempting to use the same venv in both places will lead to the errors that you are seeing. — JJMC89 (T·C) 17:40, 29 December 2022 (UTC)
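
When a module is found in one place (cron or k8s) but not another (the bastion), it can help to print what each environment actually sees. The snippet below is a generic diagnostic sketch using only the standard library. The function name describe_environment and the default module list are illustrative choices, not something taken from the Toolforge guide.

```python
import importlib.util
import sys
from typing import Dict, Iterable


def describe_environment(
    modules: Iterable[str] = ("pip", "requests", "pywikibot"),
) -> Dict[str, object]:
    """Report which interpreter is running and which top-level modules it can find."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        # Inside a venv, sys.prefix differs from sys.base_prefix.
        "in_venv": sys.prefix != sys.base_prefix,
        "modules": {
            name: importlib.util.find_spec(name) is not None for name in modules
        },
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Running the same file with the venv's python3 from the bastion and from a job, then comparing the two outputs, should show whether both are really using a working copy of the same environment.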