This is the first post in a series of blog posts about Ceph storage cluster performance.

In our daily business we see many setups: some perform well, but most of them perform very badly. First things first:

What can affect the overall performance of a ceph cluster?

A slow network (latency!), bad/slow disks, and a lack of CPU cycles.

5 cents about networks:

What we have observed is that in 90% of all setups, the bottleneck is the network itself. One can think of the network as the wire between the mainboard and the disk in a regular computer.

With ceph, latency matters. Even if the disk itself is rather fast, a slow network (high latency) slows down every write and read, because data has to be written to and read from the other ceph nodes over that network.

Here are some numbers you should test and keep in mind for your own setup:

# ping -c 100 <IP of your ceph node>

Run the above ping test from one node to another.

0.05-0.07 ms average latency is OK.
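If you want to test all of your nodes in one go, a small loop like the following can help (just a sketch; the IPs are placeholders, replace them with the addresses of your own ceph nodes):

for ip in 10.0.0.1 10.0.0.2 10.0.0.3; do
    printf '%s: ' "$ip"
    # -q suppresses per-packet output; the last line is the rtt min/avg/max/mdev summary
    ping -c 100 -q "$ip" | tail -n 1
done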

# Setup 1: Intel X710 10G fibre, cross-connected to each other node. No switch involved.

rtt min/avg/max/mdev = 0.039/0.061/0.105/0.017 ms

# Setup 2: Mellanox 40G QSFP with a Mellanox switch.

rtt min/avg/max/mdev = 0.036/0.069/0.117/0.020 ms

# Setup 3: Intel X710 10G fibre, connected to a Zyxel 10G switch.

rtt min/avg/max/mdev = 0.109/0.215/0.390/0.050 ms

So you see that each device between the nodes adds latency (compare the same Intel NICs with and without a switch).

Also notice that even though a switch is involved in the Mellanox setup, the 40G links keep the latency almost as low as in the switchless setup.

5 cents about disks/storage:

If you test disks, always test them in the same way to get comparable values.

Here are some tests (the fio package must be installed):

fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75
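If you also want a pure write number, you can run the same test as a random-write-only variant (same parameters otherwise; test-write is just a placeholder filename, this is only a sketch):

fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test-write --filename=test-write --bs=4k --iodepth=64 --size=4G --readwrite=randwrite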

All tests are done inside a VM on top of a ceph cluster.

# Setup 1: ADATA SX8200PNP NVMe with a PCIe-to-M.2 adapter card.

Overall ~35k IOPS read and ~12k IOPS write with Intel X710 10G, interconnected without a switch.
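To get a feeling for how much overhead the VM and Ceph layers add, you can run the same fio command once directly on the ceph node against the NVMe (a sketch; /mnt/nvme-test is a hypothetical mount point on that NVMe, adjust --filename to your setup):

fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test-host --filename=/mnt/nvme-test/fio-testfile --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75

The gap between this result and the result inside the VM shows what the network and replication cost you.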

Things to check/keep in mind (some quick check commands follow the list):

PCIe adapter cards operating at full link width (x8 instead of a slower x4).

PCIe at the latest version (4.0/5.0), not 2.0 from 2007.

Are the adapter cards in the "right" slot? If you have more than one CPU, check the board manual to see which PCIe slots belong to which CPU.

AppArmor disabled (should be off)

All CPU mitigations off (unsafe, but possible in isolated environments).
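A few commands that can help verify the points above (a sketch; 03:00.0 is a placeholder PCI address, replace it with the address of your NVMe/adapter):

# negotiated PCIe link speed and width (compare LnkSta against LnkCap)
lspci -vv -s 03:00.0 | grep -E 'LnkCap|LnkSta'

# NUMA node / CPU the device is attached to (relevant on multi-socket boards)
cat /sys/bus/pci/devices/0000:03:00.0/numa_node

# AppArmor status
aa-status

# currently active CPU mitigations (the kernel boot parameter mitigations=off disables them)
grep . /sys/devices/system/cpu/vulnerabilities/*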
