Talk:SRE/business case/Disposable Development Environment

From Wikitech
Latest comment: 11 months ago by JHathaway in topic The scope of the Dev envs isn't very clear

5.2.5

jbond, can you expand on why this would probably make pontoon incompatible with WMCS? JHathaway (talk) 23:00, 29 July 2022 (UTC)

I have tried to clarify, but I'll expand a bit here. In pontoon we need to disable a bunch of puppet classes that are incompatible with the WMCS base puppet policy. The main one that comes to mind is the admin module, as WMCS uses LDAP authentication. Maintaining compatibility with WMCS whilst trying to replicate production as closely as possible seems like a counter-intuitive goal and has more been born out of working with what we have. Further to this, by maintaining the WMCS compatibility we ensure that pontoon only works with WMCS, needing additional work to be compatible with more classic vanilla cloud environments, e.g. GCP, AWS etc. Jbond (talk) 09:14, 1 August 2022 (UTC)
Thanks jbond, that helps. I tweaked the clarification a bit. JHathaway (talk) 16:36, 1 August 2022 (UTC)

Owner?

I think an important section should be added about whose responsibility it would be to maintain whatever solution is developed. Development environment methodologies without an owner tend to degrade over time and require significant effort to keep running smoothly. JHathaway (talk) 23:00, 29 July 2022 (UTC)

I agree, however I'm curious what @Jobo thinks, as they created the business case template. That said, I'm not sure I have an answer to who the owner would be. Jbond (talk) 09:17, 1 August 2022 (UTC)
Ensuring sustainable maintenance is a very valid concern, and we need to decide on ownership before moving forward with the project. Maybe some cooperation with Release Engineering would be advisable. I'll give it some thought and ask around. That being said, even though I think it's important information to include in this project, I wouldn't add it as a permanent section of the business case template. Jobo (talk) 11:40, 8 August 2022 (UTC)

Physical Environment?

I think it would be helpful to add a Physical Environment as another option. Though it would be a tremendous amount of work and has significant downsides, it does provide features that are impossible to obtain with any of the other solutions, e.g. testing a ganeti upgrade on real hardware, or testing our re-imaging process. JHathaway (talk) 23:02, 29 July 2022 (UTC)

I think physical hardware environments are useful, however I'm not sure it fits with this business case, specifically given its name, "Disposable ...". I see this case as more about creating the tooling and shared services for development environments, as opposed to actually creating the environments. I also think that we should focus this proposal strictly on the development process and not things like staging or performance testing.

I think it would be nice if the tooling could also spin up a development environment, but my gut feeling is that it would add a significant amount of work to support. As a side note, you can generally ask dc-ops for some spare hardware if you need to test something on physical hardware. Jbond (talk) 09:21, 1 August 2022 (UTC)

Perhaps it would be helpful to add an out of scope section, which acknowledges that a physical staging environment would have unique benefits, but this proposal is targeting the many other development needs that can be served without having a duplicate physical environment? JHathaway (talk) 16:45, 1 August 2022 (UTC)
There is an intersection with the network world. For example, testing a Bird upgrade would benefit from recreating the server-to-network interactions. But for the scope of this project they could be replaced by simplified VMs or daemons running only the needed network service.
@Cathal Mooney did some work regarding network virtualization for testing. It's probably out of scope to have a unified system, but it is worth keeping in mind in case some efforts can be shared. Ayounsi (talk) 12:21, 5 August 2022 (UTC)

Sample Development Tasks

I think it might be helpful to add sample development tasks and show how the different development options might help their development. Here are some sample projects and how I would order which environment methodology would be the most helpful:

General Methodologies

  1. Physical Deployment
  2. Cloud Deployments
  3. Container Deployments

Sample Dev Work

  1. Deploying single sign-on solution for our infrastructure
    1. Container Deployments
    2. Cloud Deployments
    3. Physical Deployment
  2. Network automation change
    1. Physical Deployment
  3. Testing a Ganeti Upgrade
    1. Physical Deployment
  4. Adding config parameters to Apache
    1. Container Deployments
    2. Physical Deployment
    3. Cloud Deployments
  5. Testing a new version of Exim
    1. Physical Deployment
    2. Container Deployments
    3. Cloud Deployments JHathaway (talk) 23:07, 29 July 2022 (UTC)
    My personal view is that network automation changes should be very much out of scope of this. I think @Cathal Mooney has been playing with a tool similar to GNS3 which would probably be better for network automation testing.
    Thanks @John Bond. I think the network is sufficiently different from the rest of the environment that it probably makes sense to treat it as a special case for now, yes. I'm working on a framework to build a virtualized replica of our networking nodes using containerlab, which will hopefully be able to cover much of what we need.
    For "Adding config parameters to Apache", can you expand on why a physical environment is better than a disposable cloud environment? Similarly, why is a physical environment the most preferred option for exim changes? My general view is that the cloud environment should be so close to production as to be no different from physical (other than performance). Containers will likely have some differences, but we should aim for them to be small and known. So the difference between cloud and container becomes a choice between accuracy (VMs) and speed (containers). Jbond (talk) 09:26, 1 August 2022 (UTC)
    I think that is okay if networking automation testing is outside of the scope of this proposal, but I think it would be valuable to note that in the doc. JHathaway (talk) 16:46, 1 August 2022 (UTC)
    Apologies for the lack of clarity in my topic. I think of a physical environment as having the highest fidelity to production in comparison to other methods. That high fidelity makes it easier in some instances to have confidence that a change will behave as you expect in production. For example, exim, or rather email in general, relies on external systems: public DNS, public internet, public mail servers. Ideally a physical environment would connect with those systems in a similar manner as production and provide confidence that a change, say to how we perform SPF checks in exim, will work as expected in production. Hopefully a cloud or container based solution would allow you to mock out some of those interactions, but I don't think it is an easy problem to solve. The apache config case is less compelling, as I agree with you that a cloud environment or container environment could get very close to production, but it is still nice to know that your curl command is hitting almost the exact same stack as it will in production. JHathaway (talk) 16:56, 1 August 2022 (UTC)

The scope of the Dev envs isn't very clear

Hi,

Thanks for this! I wanna point out that reading through the doc, I had issues figuring out what the scope is as well as the intended audience. My current understanding is that the target is specifically people that maintain infrastructure (of which SRE is a very large part) and the scope is easier Puppet module/profile/role development. Is this understanding correct?


Even if my understanding is wrong, I think that the doc (and at least the parent task in the resulting task hierarchy under T337970) needs to make the 2 above items (intended audience and scope) pretty clear, otherwise people will struggle to understand the use cases and make assumptions, only to be frustrated when they turn out wrong. Alexandros Kosiaris (talk) 07:54, 2 June 2023 (UTC)

Thanks @Alexandros Kosiaris for the helpful feedback. I took an initial stab at clarifying the intent of the work on T337970. Please let me know if that helps; I admit I struggled a bit to find the clearest description. JHathaway (talk) 21:40, 2 June 2023 (UTC)