We recently expanded our Cloud Storage with additional nodes, which we call “Next Gen Cloud Storage” because they provide more performance and capacity. Servers that provide storage for our cloud are called “Storage Nodes”.
The setup consists of four HP Enterprise servers, each providing 18TB of storage. The storage nodes use enterprise SAS disks in hardware RAID 10, which already offers good performance on its own.
To boost storage performance even further, each node also contains two NVMe drives in RAID 1. These serve as a cache in front of the SAS RAID 10 array, which consists of 12 drives per node.
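To make the RAID 10 and cache figures more concrete, here is a minimal Python sketch of two back-of-envelope formulas. The first is RAID 10 usable capacity, which is half of raw capacity because every drive is mirrored. The second is the average latency of a cache sitting in front of a slower array, weighted by the cache hit ratio. The 3 TB drive size is an assumption, inferred from 18TB across 12 drives. The latency values are purely illustrative, and the model ignores bcache specifics such as the write caching mode.

```python
"""Back-of-envelope model of one storage node (illustrative figures only)."""


def raid10_usable_tb(drive_count: int, drive_tb: float) -> float:
    """RAID 10 stripes across mirrored pairs, so half of the raw capacity is usable."""
    if drive_count < 4 or drive_count % 2:
        raise ValueError("RAID 10 needs an even number of drives, at least 4")
    return drive_count * drive_tb / 2


def effective_latency_ms(hit_ratio: float, cache_ms: float, backing_ms: float) -> float:
    """Average latency when `hit_ratio` of all I/Os is served by the cache
    and the remainder falls through to the backing array."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cache_ms + (1.0 - hit_ratio) * backing_ms


# Assumed: 12 drives of 3 TB each, which matches the 18TB per node quoted above.
print(raid10_usable_tb(12, 3.0))  # 18.0

# Assumed latencies: 0.1 ms for the NVMe cache, 5 ms for the SAS RAID 10 array.
for h in (0.0, 0.5, 0.9):
    print(h, effective_latency_ms(h, cache_ms=0.1, backing_ms=5.0))
```

Because the miss path is much slower than the cache, even a modest drop in hit ratio pulls the average latency noticeably toward the SAS figure.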
Each node also has redundant 20Gbit connectivity, which can easily be extended to 40Gbit if needed.
For network connectivity, two 10G/40G Juniper QFX switches provide far more than enough capacity while operating redundantly as a Juniper Virtual Chassis.
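To put these link speeds in perspective, here is a rough lower bound for moving one node's full 18TB over the network at line rate. It assumes decimal terabytes, a single transfer that receives the full 20Gbit (or 40Gbit after an upgrade), and no protocol or disk overhead:

```latex
t_{20} = \frac{18 \times 10^{12}\,\mathrm{B} \times 8\,\mathrm{bit/B}}{20 \times 10^{9}\,\mathrm{bit/s}}
       = \frac{1.44 \times 10^{14}\,\mathrm{bit}}{2 \times 10^{10}\,\mathrm{bit/s}}
       = 7200\,\mathrm{s} = 2\,\mathrm{h},
\qquad
t_{40} = \frac{1.44 \times 10^{14}\,\mathrm{bit}}{4 \times 10^{10}\,\mathrm{bit/s}}
       = 3600\,\mathrm{s} = 1\,\mathrm{h}
```

Real transfers will take longer once protocol and disk overhead are included.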
The storage layer runs as GlusterFS distributed-replicated volumes whose bricks sit on bcache devices, which cache data on the NVMe drives. For data safety and consistency, we use arbiter volumes.
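To illustrate how arbiter volumes group their bricks, here is a Python sketch that models a hypothetical layout across four nodes. GlusterFS splits the bricks passed to `gluster volume create ... replica 3 arbiter 1` into consecutive sets of three. The third brick in each set acts as the arbiter, which stores only metadata. The host names, brick paths and sizes below are invented for this example and do not represent our actual configuration.

```python
from dataclasses import dataclass

REPLICA = 3  # two data bricks plus one arbiter per replica set


@dataclass(frozen=True)
class Brick:
    host: str
    path: str
    size_tb: float

    def spec(self) -> str:
        return f"{self.host}:{self.path}"


def replica_sets(bricks: list[Brick]) -> list[list[Brick]]:
    """Split bricks into consecutive replica sets, in the order they are listed."""
    if len(bricks) % REPLICA:
        raise ValueError(f"brick count must be a multiple of {REPLICA}")
    sets = [bricks[i:i + REPLICA] for i in range(0, len(bricks), REPLICA)]
    for s in sets:
        # Two bricks of one set on the same host would defeat the replication.
        if len({b.host for b in s}) != len(s):
            raise ValueError(f"replica set {[b.spec() for b in s]} reuses a host")
    return sets


def usable_tb(bricks: list[Brick]) -> float:
    """Each set holds as much as its smallest data brick; the last brick
    of each set is the arbiter and is not counted."""
    return sum(min(b.size_tb for b in s[:-1]) for s in replica_sets(bricks))


def create_command(volume: str, bricks: list[Brick]) -> str:
    replica_sets(bricks)  # validate the layout before building the command
    specs = " ".join(b.spec() for b in bricks)
    return f"gluster volume create {volume} replica 3 arbiter 1 {specs}"


# Hypothetical chained layout: arbiters live on nodes that hold data for the other set.
BRICKS = [
    Brick("node1", "/bricks/data", 18.0),
    Brick("node2", "/bricks/data", 18.0),
    Brick("node3", "/bricks/arbiter", 0.1),
    Brick("node3", "/bricks/data", 18.0),
    Brick("node4", "/bricks/data", 18.0),
    Brick("node1", "/bricks/arbiter", 0.1),
]

print(create_command("gv0", BRICKS))
print(usable_tb(BRICKS))  # 36.0
```

In this assumed layout, files are distributed across two replica sets. Usable capacity is therefore one data brick per set (36 TB), and no set keeps two of its bricks on the same host.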
For performance monitoring, we implemented checks for iowait as well as for drive and cache response times. Their metrics are pulled into InfluxDB and presented to our staff in Grafana.
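The checks are not described in detail here, so the following is only a Python sketch of how such metrics can be derived on Linux. It computes iowait as a share of CPU time from two samples of `/proc/stat`, and the average time per completed I/O from two samples of `/proc/diskstats`; the latter is comparable to the combined `await` figure that `iostat` reports. The sketch assumes the results are written to InfluxDB in its line protocol. The measurement, tag and field names, and the sample values, are hypothetical.

```python
"""Sketch: derive iowait and disk response time from /proc samples and
format them as InfluxDB line protocol. Names and sample data are hypothetical."""


def cpu_iowait_percent(stat_before: str, stat_after: str) -> float:
    """Share of CPU time spent in iowait between two /proc/stat samples."""
    def parse(text: str) -> list[int]:
        for line in text.splitlines():
            if line.startswith("cpu "):
                # user nice system idle iowait irq softirq steal; guest time is
                # already included in user/nice, so later columns are skipped.
                return [int(v) for v in line.split()[1:9]]
        raise ValueError("no aggregate 'cpu' line found")

    delta = [a - b for a, b in zip(parse(stat_after), parse(stat_before))]
    total = sum(delta)
    return 100.0 * delta[4] / total if total else 0.0


def disk_await_ms(before: str, after: str, device: str) -> float:
    """Average milliseconds per completed I/O on `device` between two /proc/diskstats samples."""
    def parse(text: str) -> tuple[int, int, int, int]:
        for line in text.splitlines():
            f = line.split()
            if len(f) >= 14 and f[2] == device:
                # reads completed, ms reading, writes completed, ms writing
                return int(f[3]), int(f[6]), int(f[7]), int(f[10])
        raise ValueError(f"device {device!r} not found")

    r0, rms0, w0, wms0 = parse(before)
    r1, rms1, w1, wms1 = parse(after)
    ios = (r1 - r0) + (w1 - w0)
    return ((rms1 - rms0) + (wms1 - wms0)) / ios if ios else 0.0


def _esc(s: str) -> str:
    # Tag keys, tag values and field keys must escape commas, equals signs and spaces.
    return s.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def line_protocol(measurement: str, tags: dict[str, str], fields: dict[str, float], ts_ns: int) -> str:
    """Build one line-protocol record with float fields and a nanosecond timestamp."""
    m = measurement.replace(",", r"\,").replace(" ", r"\ ")
    tag_part = "".join(f",{_esc(k)}={_esc(v)}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{_esc(k)}={float(v)}" for k, v in fields.items())
    return f"{m}{tag_part} {field_part} {ts_ns}"


# Hypothetical samples taken a few seconds apart.
STAT_BEFORE = "cpu  1000 0 500 8000 500 0 0 0 0 0\n"
STAT_AFTER = "cpu  1100 0 550 8800 550 0 0 0 0 0\n"
DISK_BEFORE = "   8       0 sda 1000 0 8000 2000 500 0 4000 1500 0 3000 3500\n"
DISK_AFTER = "   8       0 sda 1100 0 8800 2300 600 0 4800 1700 0 3400 4000\n"
TS = 1_700_000_000_000_000_000

print(line_protocol("cpu", {"host": "node1"},
                    {"iowait_pct": cpu_iowait_percent(STAT_BEFORE, STAT_AFTER)}, TS))
# cpu,host=node1 iowait_pct=5.0 1700000000000000000
print(line_protocol("disk", {"host": "node1", "device": "sda"},
                    {"await_ms": disk_await_ms(DISK_BEFORE, DISK_AFTER, "sda")}, TS))
# disk,device=sda,host=node1 await_ms=2.5 1700000000000000000
```

The same `disk_await_ms` call works for any block device listed in `/proc/diskstats`, so the NVMe cache devices and the RAID array can be tracked side by side.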
Why NVMe Cache?
We already run several HDD and SSD nodes and wanted to take disk I/O to another level, hence the name “Next Generation”. We are convinced of the performance gains offered by NVMe drives, which we already use in several dedicated servers.
Next on our roadmap is getting our new NVMe-cached SSD (Gluster) storage up and running. We’re currently awaiting a delivery of several SSD drives and expect to have another high-performance SSD storage cluster deployed by the end of next week.
Beyond that, we’re already planning to invest in pure NVMe storage, using U.2 NVMe drives and 40Gbit Ethernet.