Wikimedia Cloud Services team/EnhancementProposals/2020 Network refresh/2021-01-12-checkin

2021-01-12 WMCS network checkin

agenda

notes

Q3 OKRs:
    Cloud Services:

    1. Finish introducing cloudgw into the network (see HW phabricator tickets)

    2. Remove two NAT exceptions

    3. Establish guidelines for cloud and production relationship (deliver as we go, publish scenarios as we agree)

    SRE:

    1. New resources (network engineer!)

    2. 10G / DC Upgrades



Faidon:

    In general, let's try to limit the number of open items / tasks under consideration. Focus on finishing things.


    Front-line defenses continue to be important, especially in today's climate.


    Still trying to catch up from Q2, time continues to be limited


    Can we delay cloudgw? What would this block or prevent?


Brooke:

    The conversation will change architecture decisions, such as how to handle NFS, wikireplicas, and other needs going forward. However, cloudgw is mostly an implementation piece.


Nicholas:

    The design / understanding of the relationship is important and shouldn't be delayed.


Arturo: 

    People don't know how to cross realms, so ideas are not pursued because they seem impossible. The proposal is to provide guidelines for how to introduce services that have cross-realm data needs. It can be an evolving document.


https://wikitech.wikimedia.org/wiki/Wikimedia_Cloud_Services_team/EnhancementProposals/Production_Cloud_services_relationship

Faidon:

    Need similar guidelines for production. Some should use anycast, some should use LVS, etc. This document could link to a similar abstraction to production guidelines.


    Cross-tenant? Cross-region? Is this layer 2 or layer 3 (tunnel or isolated)?


    No answers today, but multi-dc is coming.


    In a VM and in cloud. Not in a VM and not in cloud. Anything in between, or a mix?


Arturo:

    OpenStack Ironic does this. The proposal is a poor man's implementation of it.


Faidon:

    We are likely to see more of this. There has always been a gap when it comes to "testing" a new thing. You can use VMs or be in production. There is no real testbed.


    Where does Toolforge fit?


Arturo:

    No plans to run Toolforge outside of Cloud VPS. Distant feature is ??


Brooke:
    Toolforge is interesting because it touches every case. It is virtualized on VMs. It has NFS, partly used to get credentials for production DBs. It also talks to wikireplicas in production. No plans to move out. The Kubernetes network is overlaid on the virtual network.

Faidon:
    Any additional overlay?
    
Brooke:
    Yes, a service network on top, using Calico.