Wikimedia Cloud Services team/EnhancementProposals/2020 Network refresh/2021-02-24-checkin
2021-02-24 WMCS network checkin
agenda
- https://phabricator.wikimedia.org/T271058
  - buy new boxes? WMCS to decide
- procurement of hardware for new edge network setup:
  - codfw cloudsw https://phabricator.wikimedia.org/T272348 (needs discussion)
- Cross-Realm traffic guidelines document (previously known as Production Cloud services relationship)
notes
- Faidon: regarding the cloudnet1004 HW issues (https://phabricator.wikimedia.org/T271058), feel free to request new boxes if we need them
- No more exceptions for dumps. Two exceptions dropped! ;-)
- CloudNAT https://phabricator.wikimedia.org/T209011
  - Faidon / SRE ready to help
  - WMCS will look again and reach out
- procurement of hardware for new edge network setup:
  - codfw cloudsw https://phabricator.wikimedia.org/T272348 (needs discussion)
    - No existing budget underrun to cover this
    - Concerns about timing and about what it is needed for (testing? matching prod?)
    - OK for this to be Q4 or later
    - Purpose: Dallas (codfw) is a mirrored setup of the production deployment in eqiad, used as a testbed for making changes. Want to ensure Dallas matches the changes being made in eqiad with the cloudgw boxes
    - This would not be part of any "second" region for cloud.
    - This sounds potentially expensive. Can we reduce the need for a testing replica? Can it be virtualized?
    - Scale is different; it's a minimal representation
    - SRE does its "testing" in production using incremental changes; no real testbed exists. Encouragement to find ways to do this.
    - A second region could remove the need for this
- Cross-Realm traffic guidelines
  - Proposal to publish the 3 common use cases, with the last (complex) case as a DRAFT
  - Request to create a list of services, with current thoughts on how each might map to the cases
  - Want to avoid forcing decisions before we understand them
  - It would be useful to understand the scope of the problems and services; a wikitable was suggested
  - End goal is to be able to craft new services using the new document
  - Hard to map services to cases, but use cases could be mapped to cases
  - The last case is like a loose hypervisor (HV): manually managed with Puppet, not controlled by Neutron
  - "we can map flows more than we can map services"
  - Toolforge is composed of many services, such as NFS