Data Engineering/Systems/Ceph
The Data Engineering team is currently evaluating Ceph for two purposes:
- To evaluate Ceph's S3-compatible interface (radosgw) as a replacement for, or an addition to, Hadoop's HDFS file system (see the access sketch after this list).
- To provide block storage to workloads running on the dse-k8s Kubernetes cluster.
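As a minimal sketch of the first use case, the snippet below writes and lists objects through a radosgw S3 endpoint using boto3. The endpoint URL, bucket name, and credentials are placeholders for illustration, not this cluster's actual values.

```python
import boto3

# Placeholder endpoint and credentials -- substitute the values issued
# for the cluster. radosgw speaks the S3 wire protocol, so the standard
# AWS SDK works unchanged apart from the endpoint_url.
s3 = boto3.client(
    "s3",
    endpoint_url="https://rgw.example.wmnet",  # hypothetical endpoint
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

s3.create_bucket(Bucket="demo-bucket")
s3.put_object(Bucket="demo-bucket", Key="hello.txt", Body=b"hello ceph")

# List what we just wrote.
for obj in s3.list_objects_v2(Bucket="demo-bucket").get("Contents", []):
    print(obj["Key"], obj["Size"])
```

Because radosgw implements the S3 API, existing S3 tooling (boto3, s3cmd, Hadoop's S3A connector) should work against it with only an endpoint change.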
Project Status
The project is in a pre-production state. We have five servers in eqiad, and the cluster is currently being commissioned.
Here is the original design document.
The Phabricator epic ticket is T324660.
Cluster Architecture
The current plan is a co-located configuration across five hosts, with each host running:
- 1 monitor daemon
- 20 OSD daemons
- 1 (or more) radosgw daemons
Each host has a 10 Gbps network connection to its switch. If we start to hit this throughput ceiling, we can request an upgrade to 25 Gbps by changing the optics.
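As a minimal sketch of interacting with the cluster programmatically, the snippet below uses the librados Python bindings (the python3-rados package) to read aggregate usage and check monitor quorum. The conffile path and the presence of a client keyring are assumptions about a standard client setup, not this cluster's actual configuration.

```python
import json
import rados

# Assumes python3-rados is installed and a client keyring plus
# /etc/ceph/ceph.conf are in place -- placeholder paths, not the
# cluster's confirmed layout.
cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()

# Overall capacity and usage, aggregated across all OSDs.
stats = cluster.get_cluster_stats()
print("used/total (kB):", stats["kb_used"], "/", stats["kb"])

# Ask the monitors for cluster status (equivalent to `ceph status`).
ret, outbuf, errs = cluster.mon_command(
    json.dumps({"prefix": "status", "format": "json"}), b""
)
status = json.loads(outbuf)
print("health:", status["health"]["status"])
print("monitors in quorum:", status["quorum_names"])

cluster.shutdown()
```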
Storage Configuration
Each of the five hosts has the following primary storage devices:
| Count | Capacity (each) | Technology | Make/Model | Total (per host) | Use Case |
|---|---|---|---|---|---|
| 12 | 18 TB | HDD | Seagate Exos X18 nearline SAS | 216 TB | Cold tier |
| 8 | 3.8 TB | SSD | Kioxia RM6 mixed-use | 30.4 TB | Hot tier |
That makes the raw capacity of the five-node cluster:
- Cold tier: 1.08 PB
- Hot tier: 152 TB
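A quick sanity check of those totals, with a reminder that usable capacity is lower than raw. The 3-way replication factor below is Ceph's common default, used purely as an illustrative assumption rather than this cluster's confirmed data-protection scheme.

```python
# Per-host raw capacity, from the table above.
hosts = 5
cold_per_host_tb = 12 * 18     # 216 TB of HDD per host
hot_per_host_tb = 8 * 3.8      # 30.4 TB of SSD per host

cold_raw_tb = hosts * cold_per_host_tb   # 1080 TB = 1.08 PB
hot_raw_tb = hosts * hot_per_host_tb     # 152 TB

# Usable capacity depends on the data-protection scheme; with 3-way
# replication (an assumption, not this cluster's confirmed setting),
# divide raw capacity by 3.
replication = 3
print(f"cold raw: {cold_raw_tb:.0f} TB, usable @3x: {cold_raw_tb / replication:.0f} TB")
print(f"hot raw:  {hot_raw_tb:.1f} TB, usable @3x: {hot_raw_tb / replication:.1f} TB")
```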