User:SDineshKumar/Learning&Dev

Week 37

Goal
  1. Get 1st Commit to Prod - ✅
    1. https://gerrit.wikimedia.org/r/c/operations/puppet/+/721413
  2. Explore and figure out which projects are the most active and highest priority here - ✅
    1. GitLab integration and MWCLI seem to be the major ongoing projects.
Mentors / Facilitators
  1. Ryan Kemper and Erik (Ebernhardson) from the Search team.
Learnings
  1. Two accounts need to be created - WikiMediaDev and MediaWiki - Link
  2. Shockingly, there seems to be minimal to no CD, and it looks like most deployments happen manually.
    1. Deployments happen through a Train Process - https://wikitech.wikimedia.org/wiki/Deployments/Train
  3. Builds/CI and config management are handled by Jenkins and Puppet respectively.
    1. Puppet looks like it has a steep learning curve due to its own declarative language and compilation difficulties.
      1. https://wikitech.wikimedia.org/wiki/Puppet
      2. https://wikitech.wikimedia.org/wiki/Puppet_coding
    2. Every team / dev has to work with the Puppet config to set up, plan and deploy their code to Prod. That is one of the reasons the repo is big and gets more merges than usual (highly active repo).
  4. One of the major projects under way across the build, release and SRE teams is setting up GitLab. The goal seems to be to move towards CD.
  5. Deployments are managed by an (intuitively named) tool, Scap - https://wikitech.wikimedia.org/wiki/Scap
  6. Gerrit's UX felt a bit unintuitive. That should fade once I get used to it.
    1. Every code review is a patch. There is no need to push a separate branch to the repo's remote and merge there; the patch itself gets merged on the remote.
    2. New devs can also add themselves as reviewers to a code review, provide comments/suggestions, and improve the code by creating a new patch.
    3. Triggering an automatic Jenkins build needs higher-level permissions.
  7. Phab is cool for managing issues/tasks and projects (both sprint & kanban) and, surprisingly, has its own pastebin as well.
    1. Though Phab development stopped in Q2 2021 and GitLab also provides its own issue management and boards, we will continue to stick with Phab - https://phabricator.wikimedia.org/T245575
    2. It seems tasks on a Phab board need to be sorted by priority / last updated time by hand. Saving a default setting on a board applies it to all of that board's users, which is not very intuitive.
  8. Helping out the Releng team (especially with anything that looks into Prod) will take some more time, since trust has to be earned first. https://phabricator.wikimedia.org/T291235
  9. Surprisingly, the search feature on Wikipedia is actually powered by Elasticsearch, run by a dedicated platform team here that also runs a data querying service - https://www.mediawiki.org/wiki/Wikimedia_Search_Platform
  10. Tools and microservices on Toolforge run on K8s.
  11. Thankfully, Docker is used extensively, as expected. Most of the tools / services appear to use Debian as their base image. CentOS or Red Hat based OS usage seems minimal to zero across WMF overall.
  12. Seems we have a new DC in Singapore.
Yet to figure out
  1. If Elasticsearch is serving Prod article search traffic, are we also maintaining and using a separate ELK stack for log management? ✅
    1. Yes, Logstash and an ELK stack are run for managing logs. They are handled by the SRE Observability team.
  2. How does paging work here? Who is getting paged right now? - ✅
    1. Klaxon - https://klaxon.wikimedia.org/ - Access is restricted to staff/employed engineers for safety reasons. It would be great if such tools said upfront that access is restricted, so that volunteers/contributors need not try to log in and fail.
  3. What are the burning tasks right now, and who is firefighting them?
  4. Is there a tool for reviewing and verifying Change Management with peers/contributors before touching Prod?
    1. It seems Phabricator can handle that as well, along with calendaring. Yet to see a separate tool for CM.
  5. Do we have runbooks? ✅
    1. Yes - https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Runbooks - Go through them as time permits <LATER>
  6. Where / how does load testing happen before deploying to Prod?
  7. Is there a chaos testing environment?
  8. How does the security team look for vulnerabilities in tools or services built on Toolforge or on VPS?
    1. How does threat modelling happen for the services built on Toolforge, and who does it?
Next Goal
  1. Look further for ways to contribute to the Releng / Search teams through work that uses Java / Python / Ruby and can make it to Prod.

Week 41

Goal
  1. Early this week, received a suggestion from Gehel to pick up the task https://phabricator.wikimedia.org/T278378 and collaborate with Ryan/Erik - WIP
    1. Synced up with Ryan and checked his commit https://gerrit.wikimedia.org/r/c/operations/software/spicerack/+/716532
    2. Plan from my side is two-fold.
      1. CR-1 => Move the endpoints config to Puppet and deploy it.
      2. CR-2 => Remove the configs in spicerack.
  2. Look further for ways to contribute to the Releng / Search teams through work that uses Java / Python / Ruby and can make it to Prod. ✅ => Task 1.
  3. Walk through some designs and see how design reviews happen via Pholio. <LATER>
  4. Explore what's happening with K8s and make a commit there, probably a Helm chart! <LATER>
  5. Review someone's code and help to get it to Prod if possible.
  6. Figure out
    1. Why does my user page on meta.wikimedia get auto-deleted after I create it?
    2. Is there a provision for users to change their username? If so, change TheReadOnly to SDineshKumar on meta, phab, etc.
  7. <LATER> As you learn, see if you can create a contribution doc specific to DevOps/SRE and get it reviewed and approved to be featured on https://www.mediawiki.org/wiki/How_to_contribute
  8. <LATER> Read up on past incidents and the corrections performed - https://wikitech.wikimedia.org/wiki/Incident_status
Learnings
  1. Setting up a watchlist on the projects page is awesome.
  2. Weekly deployments via the Train appear to be high priority and are done manually.
    1. https://wikitech.wikimedia.org/wiki/How_to_deploy_code
  3. GitLab seems to be open to public contributors; see if setting up SSH keys works. - ✅
    1. It works, using the same keys from the existing setup.
  4. Surprisingly, a pic from one of the other Silicon Valleys, rather than the Bay Area, is featured on the contribution page. The page also has a really comprehensive list of learning materials for future work with PHP when needed.
  5. Permissions to check before requesting help or picking up a task to contribute to.
    1. https://wikitech.wikimedia.org/wiki/SRE/LDAP/Groups
  6. [Best_Practices] - https://www.mediawiki.org/wiki/Phabricator/Creating_and_renaming_projects#Good_practices_for_name_and_description
  7. SREs mainly work in one of these projects/teams - https://wikitech.wikimedia.org/wiki/SRE. So DevOps & Build/Release Engineering do not fall under the SRE umbrella here. Also, the on-call week is called clinic duty here, which sounds like a classic name.
ToDo / Yet to figure out
  1. Figure out why SSO won't work and why a contributor needs to create 2 separate accounts for Toolforge/VPS.
    1. https://wikitech.wikimedia.org/wiki/Help:Create_a_Wikimedia_developer_account#Step_2:_Decide_which_service_you_need_a_Wikimedia_developer_account_for