Catalyst/Scaling

Project Catalyst will need to scale as it takes on more workloads. The original topology was a single VM with 4 cores and 8 GB of memory. The general plan is to add more Kubernetes nodes (VMs) as needed, but doing so raises some considerations, noted below.

How to add more nodes

Adding additional nodes to k3s is straightforward, but please note the "Unknowns" section below. A consolidated sketch of the commands is given after the list.

  1. request additional resources from CloudVPS (if needed) for the catalyst project
    1. be sure to request enough storage for the new nodes (80 GB per node is the current recommendation)
  2. create a new VM
    1. on horizon:
      1. create a new 80 GB volume
      2. create new instance with
        1. latest Debian image
        2. 4 cores and 8 GB of memory
        3. the http security group applied to the VM
        4. the 80 GB volume attached
    2. on the new VM
      1. setup k3s-data partition
        1. mkfs.ext4 /dev/vdb
        2. mkdir /k3s-data
        3. echo "/dev/vdb /k3s-data ext4 defaults,nofail 0 2" >> /etc/fstab
        4. mount -a
      2. install k3s
        1. rsync --rsync-path="sudo rsync" -L k3s.catalyst.eqiad1.wikimedia.cloud:/var/lib/rancher/k3s/server/node-token /tmp/node-token
        2. curl -sfL https://get.k3s.io | sh -s - --data-dir /k3s-data --token-file /tmp/node-token --server https://k3s.catalyst.eqiad1.wikimedia.cloud:6443
        3. rm /tmp/node-token
        4. sudo kubectl get nodes
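
Taken together, the on-VM steps above amount to a short shell session. The following is an illustrative sketch, assuming the extra volume shows up as /dev/vdb, the commands are run with sudo on the new VM, and the existing server is reachable at k3s.catalyst.eqiad1.wikimedia.cloud; adjust the device name and hostname for your instance.

# Format and mount the dedicated k3s data volume (assumed to be attached as /dev/vdb)
sudo mkfs.ext4 /dev/vdb
sudo mkdir /k3s-data
echo "/dev/vdb /k3s-data ext4 defaults,nofail 0 2" | sudo tee -a /etc/fstab
sudo mount -a

# Pull the cluster join token from the existing server
rsync --rsync-path="sudo rsync" -L k3s.catalyst.eqiad1.wikimedia.cloud:/var/lib/rancher/k3s/server/node-token /tmp/node-token

# Install k3s and join the existing cluster, keeping its state on the data volume
# (the upstream install docs pass flags via "sh -s -" and give the server as a full https URL on port 6443)
curl -sfL https://get.k3s.io | sh -s - --data-dir /k3s-data --token-file /tmp/node-token --server https://k3s.catalyst.eqiad1.wikimedia.cloud:6443
rm /tmp/node-token

# Confirm the new node has joined the cluster
sudo kubectl get nodes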

Unknowns

We use Rancher's Local Path Provisioner for persistent storage. It is normally safe to add more nodes, with the following caveats:

  1. data managed by the local-path provisioner is not replicated to, and cannot be shared with, the new nodes
    1. we have a "repository" pool that new environments use to quickly get extensions, etc.; it works by having multiple persistent volumes point to the same location on the node's disk (see the sketch after this list)
    2. the pool is populated by patchdemo
    3. we have a task for a separate service to manage the pool (phab:T376273), but we will have to ensure that service runs on every node (or use an alternate method of populating the pool, phab:T376273#10203342)
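
To make the repository pool caveat concrete, the sketch below shows the general shape of the pattern: a PersistentVolume whose hostPath points at a shared pool directory on the node's local disk, with one such volume created per environment. This is a minimal illustration under assumptions, not Catalyst's actual manifests; the name, path, capacity, and "manual" storage class are placeholders.

# Illustrative only: a PersistentVolume backed by a pool directory on the node's disk.
# A second environment would get its own PV with a different metadata.name but the same
# hostPath.path, so both mount the same on-disk pool. Names and paths are placeholders.
cat <<'EOF' | sudo kubectl apply -f -
apiVersion: v1
kind: PersistentVolume
metadata:
  name: repo-pool-env1
spec:
  capacity:
    storage: 20Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: manual
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /k3s-data/repository-pool
EOF

Because the pool lives on a single node's filesystem, only pods scheduled on that node can mount it, which is exactly the replication caveat described above.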