Traffic cache hardware

From Wikitech

This is an overview of our currently deployed and active cache hardware at the Traffic layer.

Hardware classes

We have purchased and retired multiple classes of server hardware over the years, in staggered timeframes. In general we will always have several overlapping hardware classes in service, as their various warranty and support periods expire at different times. These are the currently active hardware configuration classes:

Label | Model     | CPU Type/Speed                | Phys. Cores | RAM    | Cache storage                              | NIC speed, type, driver | DC Ops Config
L     | Dell R430 | 2x Xeon E5-2650 v4 @ 2.20 GHz | 24          | 384 GB | 2x Intel S3710 800 GB SSD                  | 10G, BCM57810, bnx2x    | Legacy
F1    | Dell R440 | 2x Xeon Gold 5118 @ 2.3 GHz   | 24          | 384 GB | 1x Samsung PM1725a 1.6 TB NVMe (U.2 SFF)   | 10G, BCM57412, bnxt     | F-10G, +storage card
F2    | Dell R440 | 2x Xeon Gold 5118 @ 2.3 GHz   | 24          | 384 GB | 1x Samsung PM1725b 1.6 TB NVMe (HHHL card) | 10G, BCM57412, bnxt     | F-10G, +storage card
F3    | Dell R440 | 2x Xeon Gold 5118 @ 2.3 GHz   | 24          | 384 GB | 1x Samsung PM1725b 1.6 TB NVMe (HHHL card) | 10/25G, BCM57412, bnxt  | F-10G, +storage card, +10/25G NIC variant
F4-T  | Dell R450 | 2x Xeon Gold 5318Y @ 2.1 GHz  | 48          | 512 GB | 1x 6.4 TB NVMe card                        | 10/25G, BCM57414, bnxt  | F
F4-U  | Dell R450 | 2x Xeon Gold 5318Y @ 2.1 GHz  | 48          | 512 GB | 2x 6.4 TB NVMe card                        | 10/25G, BCM57414, bnxt  | F
  • The new DC Ops Config F from mid-2022 is exclusive to these edge cache roles, and thus our F4 config using it includes the storage cards and NIC upgrades as part of its base definition.

Deployed hardware

Currently deployed hardware by data center and caching cluster.

Current as of October 2022
Data center | cache_text         | cache_upload       | total
eqiad       | 8x F1              | 8x F1              | 16x F1
codfw       | 8x F2              | 8x F2              | 16x F2
esams       | 8x F2              | 8x F2              | 16x F2
ulsfo       | 8x F4-T            | 8x F4-U            | 16x F4
eqsin       | 6x L + 2x F2       | 6x L + 2x F2       | 12x L + 4x F2 = 16
drmrs       | 8x F3              | 8x F3              | 16x F3
total       | 6x L + 42x Fn = 48 | 6x L + 42x Fn = 48 | 12x L + 84x Fn = 96
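As a sanity check on the totals row above, a few lines of Python can re-derive the per-class counts. The figures are transcribed by hand from the table (not pulled from any inventory system), and the F4-T/F4-U split is counted together as F4:

```python
from collections import Counter

# Per-DC node counts for one caching cluster, transcribed from the table above.
# cache_text and cache_upload are symmetric in node count per data center.
per_dc_cluster = {
    "eqiad": {"F1": 8},
    "codfw": {"F2": 8},
    "esams": {"F2": 8},
    "ulsfo": {"F4": 8},  # F4-T for text, F4-U for upload
    "eqsin": {"L": 6, "F2": 2},
    "drmrs": {"F3": 8},
}

cluster_total = Counter()
for counts in per_dc_cluster.values():
    cluster_total.update(counts)

# Double for the two symmetric clusters to get the grand total.
grand_total = Counter({cls: 2 * n for cls, n in cluster_total.items()})

legacy = grand_total["L"]
fn = sum(n for cls, n in grand_total.items() if cls != "L")
print(legacy, fn, legacy + fn)  # 12 84 96
```

This reproduces the 6x L + 42x Fn = 48 per cluster and 12x L + 84x Fn = 96 overall shown in the totals row.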

Proposed FY22-23 changes + refreshes

  • ulsfo and eqsin get refreshed to new-standard 16xF4 config in first half of the FY.
  • The 8x off-cycle (newer, still in warranty) F-nodes in ulsfo and eqsin are shipped to eqiad.
  • eqiad installs these into the new E+F rows, for a number of reasons:
    • Utilize the new rows in eqiad in general (more load/redundancy spread)
    • Test impact of expanded server counts in general
    • Re-use good hardware rather than discarding it just because it was purchased off-cycle
    • Buys us time to push natural eqiad warranty refresh out another FY, spreading out refresh cycles better (too many this year!)
    • Allow the F4 refreshes in ulsfo+eqsin to be whole-DC upgrades, since F4 enables whole-DC architecture changes in traffic routing.
  • esams gets refreshed in Q4 to the same new F4 config as ulsfo+eqsin (we have some time and space to adjust this based on earlier outcomes if necessary)
Data center | cache_text    | cache_upload  | total                  | Note
eqiad       | 8x F1 + 4x F2 | 8x F1 + 4x F3 | 16x F1 + 4x F2 + 4x F3 | reinforced this FY
codfw       | 8x F2         | 8x F2         | 16x F2                 | no changes this FY
esams       | 8x F4-T       | 8x F4-U       | 16x F4                 | refreshed to F4 in Q4
ulsfo       | 8x F4-T       | 8x F4-U       | 16x F4                 | refreshed to F4 in Q1
eqsin       | 8x F4-T       | 8x F4-U       | 16x F4                 | refreshed to F4 in Q2
drmrs       | 8x F3         | 8x F3         | 16x F3                 | no changes this FY
total       | 52x Fn        | 52x Fn        | 104x Fn                |
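The proposed end-state totals can be checked the same way. The per-DC figures below are transcribed from the table above, and the 96-node baseline comes from the current (October 2022) deployment table:

```python
# Proposed FY22-23 end-state node counts per caching cluster, per data center.
# cache_text and cache_upload remain symmetric in node count.
proposed_per_dc = {
    "eqiad": 8 + 4,  # 8x F1 plus 4 reinforcement nodes (F2 for text, F3 for upload)
    "codfw": 8,
    "esams": 8,
    "ulsfo": 8,
    "eqsin": 8,
    "drmrs": 8,
}

per_cluster = sum(proposed_per_dc.values())  # nodes per cluster
grand_total = 2 * per_cluster                # both clusters combined
growth = grand_total - 96                    # vs. the 96 currently deployed
print(per_cluster, grand_total, growth)      # 52 104 8
```

This confirms 52x Fn per cluster and 104x Fn overall, a net growth of 8 nodes over the current fleet, all of it from the eqiad reinforcement.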