Wikimedia Cloud Services team/EnhancementProposals/2020 Network refresh/2021-02-24-checkin


2021-02-24 WMCS network checkin

agenda


notes

  • No more exceptions for dumps. Two exceptions dropped! ;-)
  • Procurement of hardware for the new edge network setup:
    • codfw cloudsw https://phabricator.wikimedia.org/T272348 (needs discussion)
    • No existing budget underrun to cover this
    • Concern about timing, and about what the hardware is needed for (testing? matching prod?)
    • OK for this to happen in Q4 or later
    • Purpose: Dallas (codfw) is a mirrored setup of the production deployment in eqiad, used as a testbed for making changes. Want to ensure Dallas matches the changes made in eqiad with the cloudgw boxes
    • This would not be part of any "second" region for cloud.
    • This sounds potentially expensive. Can we reduce the need for testing replica? Can it be virtualized?
    • Scale is different; it's a minimal representation
    • For SRE, "testing" happens in production using incremental changes; no real testbed exists. Encourage finding ways to do this.
    • A second region could remove the need for this
  • Cross-Realm traffic guidelines
    • Proposal to publish the 3 common use cases, with the last (complex) case marked as a DRAFT
    • Request to create a list of services, along with current thoughts on how they might map to the cases
      • Want to avoid forcing decisions before we understand them
    • Would be useful to understand the scope of the problems and services; a wikitable was suggested
    • End goal is to be able to craft new services utilizing the new document
    • Hard to map services to cases, but could map use cases to cases
    • The last case is like a loose hypervisor (HV): manually managed with Puppet, not controlled by Neutron
    • "we can map flows more than we can map services"
    • Toolforge is composed of many services, such as NFS