Wikimedia Cloud Services team/EnhancementProposals/2020 Network refresh/2021-01-12-checkin
2021-01-12 WMCS network checkin
agenda
- Q3 goals sync
- make sure we know each other's Q3 OKRs and such
- procurement of hardware for new edge network setup:
- codfw cloudgw device procurement https://phabricator.wikimedia.org/T268016
- eqiad cloudgw devices procurement https://phabricator.wikimedia.org/T270705
- renaming labtestvirt2003 to cloudgw https://phabricator.wikimedia.org/T271519
- codfw cloudsw TBD
- Production Cloud services relationship
notes
Q3 OKRs:
Cloud Services:
1. Finish introducing cloudgw into the network (see HW Phabricator tickets)
2. Remove two NAT exceptions
3. Establish guidelines for cloud and production relationship (deliver as we go, publish scenarios as we agree)
SRE:
1. New resources (network engineer!)
2. 10G / DC upgrades
Faidon: In general, let's try to limit the number of open items / tasks under consideration. Focus and finish things to completion. Front-line defenses continue to be important, especially in today's climate. Still trying to catch up from Q2, time continues to be limited. Can we delay cloudgw? What would this block or prevent?
Brooke: The conversation will change architecture decisions: how to handle NFS, wikireplicas, and other needs going forward. However, cloudgw is mostly an implementation piece.
Nicholas: The design / understanding for the relationship is important and shouldn't be delayed.
Arturo: People don't know how to cross realms, so ideas are not pursued because they feel it's impossible. Proposal is to provide guidelines for how to introduce services that have cross-realm data needs. And it can be an evolving document. https://wikitech.wikimedia.org/wiki/Wikimedia_Cloud_Services_team/EnhancementProposals/Production_Cloud_services_relationship
Faidon: Need similar guidelines for production. Some should use anycast, some should use LVS, etc. This document could link to a similar abstraction for production guidelines. Cross-tenant? Cross-region? Is this layer 2 or layer 3 (tunnel or isolated)? No answers today, but multi-DC is coming. In VM and in cloud. Not in VM and not in cloud. Anything in between, or a mix?
Arturo: OpenStack Ironic does this. Proposal is a poor man's implementation of this.
Faidon: We are likely to see more of this. There's always been a gap to "test" a new thing. You can do VMs or be in production. No real testbed. Where does Toolforge fit?
Arturo: No plans to run Toolforge outside of Cloud VPS. Distant feature is ??
Brooke: Toolforge is interesting because it touches every case. Virtualized on VMs. Then has NFS, partly utilized to get credentials for production DBs. Also talks to wikireplicas in production. No plans to move out. K8s network overlaid on virtual network.
Faidon: Any additional overlay?
Brooke: Yes, service network on top using Calico.