Optimizing Ceph storage with advanced networking solutions
Collaborators: Marc THOLL, Pol WARNIMONT, Marcin RZECZKOWSKI
Challenges with traditional storage
Over the last few years, our company has been running its virtualization stack with a traditional storage approach, utilizing dedicated storage appliances such as Dell EMC storage bays and connecting them to our servers via iSCSI or Fibre Channel (FC). For redundancy, we deployed the appliances in pairs. Our all-flash storage solutions were provided by Huawei.
With our continuing push to provide our customers with cloud resources originating entirely from within the borders of Luxembourg, guaranteeing data sovereignty, and with our advancing AI research project, the need for a highly scalable and reliable data storage solution has grown significantly.
First attempt at a solution
To address this new demand, we initially opted to modify our approach and moved to a dedicated Storage Area Network (SAN). We operate two data centers and set up two independent SANs, each consisting of one switch per site, interconnected via two geographically distinct passive wavelengths. The storage bays and servers were each connected to both SANs, providing four storage paths in total. Multipathing ensured redundancy across these paths, even though the individual paths themselves were not redundant. Each server had two network cards, with one port per card connected to each SAN, for a total of four paths. This configuration ensured that we would never lose more than two paths in the event of any single device failing.
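For illustration, the node-side policy for such a four-path setup is typically expressed in dm-multipath. The following is a minimal sketch only, with the device-specific sections omitted; the exact settings depend on the storage array and are not our verbatim production configuration:

```
# /etc/multipath.conf -- illustrative sketch, not a verbatim production config
defaults {
    user_friendly_names  yes
    # spread I/O across all available paths instead of active/passive
    path_grouping_policy multibus
    path_selector        "round-robin 0"
    # return to recovered paths automatically
    failback             immediate
    # queue I/O while paths flap instead of failing immediately
    no_path_retry        queue
}
```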
Despite these measures, once everything was in place, we noticed that we were not achieving the expected performance. Traditional storage systems often come with challenges such as scalability issues, management overhead, and vendor lock-in, which further motivated our search for alternative solutions.
Given the current geopolitical landscape and the lack of European-made storage hardware, we chose to take a different approach. By eliminating vendor lock-in and returning to a core aspect of our identity, open-source solutions, we aim to foster innovation and maintain flexibility in our infrastructure.
Exploring Ceph: A new approach to storage
Our system administrators, always eager to explore new technologies, proposed the idea of using Ceph for our storage needs. Our past experiences with traditional storage systems had been less than satisfactory, and we were motivated to explore something innovative and potentially more robust.
Ceph is an open-source storage platform designed to provide excellent performance, reliability, and scalability. It unifies object, block, and file storage in a single cluster, making it a versatile solution for various storage needs.
After an analysis concluded that Ceph could fulfill our requirements in terms of redundancy and scalability, we moved on to choosing the right hardware. The decision fell on Supermicro servers, each equipped with two AMD EPYC 7313 16-core processors and 128 GB of RAM.
For the network setup, we decided to utilize the existing infrastructure with the two independent SANs. After some initial hurdles in finding the correct configurations and waiting for the hardware to arrive, we were ready to start testing. However, we soon encountered a significant obstacle.
In Ceph, there are two distinct networks:
- Cluster network:
- Used by the servers to synchronize data among themselves.
- Public network:
- Handles cluster management traffic (Ceph MON).
- Facilitates Ceph connections to data users, such as virtualization hosts.
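In ceph.conf, these two networks are set with the public_network and cluster_network options. A minimal sketch, where the subnets are placeholders rather than our production addressing:

```
# /etc/ceph/ceph.conf -- sketch; subnets are illustrative placeholders
[global]
    # network used by clients, virtualization hosts and the MONs
    public_network  = 192.0.2.0/24
    # network used by the OSDs for replication and recovery traffic
    cluster_network = 198.51.100.0/24
```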
The challenge we faced was that the public network in Ceph can only be configured with a single IP network. This limitation could have posed a significant hurdle for our redundancy requirements.
To address this issue, we considered two potential solutions:
- New chassis switches: Purchasing new chassis switches might have resolved the issue, but it came at a high cost and offered limited additional benefits.
- Stacking switches with LACP: This approach, widely suggested in forums, involved stacking the switches and using Link Aggregation Control Protocol (LACP). However, it contradicted our initial design philosophy for the two SANs. We had deliberately avoided stacking due to past negative experiences, such as stacks splitting during operation, leading to network outages or loops that disrupted the entire network.
Innovative routing solutions for Ceph
While brainstorming solutions, our network department proposed an innovative idea. Although Ceph's public network can only handle one IP prefix, there is no requirement for all IPs to be within the same broadcast domain. This insight led us to consider routing traffic as a viable solution.
Initial routing setup
Our initial routing solution involved connecting two Ceph nodes to each SAN Layer 3 switch. The switches would run OSPF (Open Shortest Path First) and announce the networks of each node. Each node was connected using all four of its interfaces in a single LACP port channel (a sketch of this bond configuration follows the pros and cons below). This approach offered several advantages and disadvantages:
Pros:
- Ease of setup: OSPF runs on only four switches, simplifying the configuration process.
- Interface redundancy: LACP provides redundancy at the interface level, enhancing reliability.
Cons:
- Switch failure impact: Losing a switch results in the loss of two entire Ceph nodes, compromising redundancy.
- Limited load balancing: Load balancing is confined to the LACP trunk, which may not fully utilize available bandwidth.
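For reference, the node side of this initial design amounted to one bond over all four interfaces. A minimal sketch in Debian ifupdown style, where the interface names and address are assumptions:

```
# /etc/network/interfaces -- sketch of the four-port LACP bond
# interface names and the address are illustrative assumptions
auto bond0
iface bond0 inet static
    address 192.0.2.11/24
    bond-slaves ens1f0 ens1f1 ens2f0 ens2f1
    # 802.3ad = LACP; L3+L4 hashing spreads flows across the members
    bond-mode 802.3ad
    bond-xmit-hash-policy layer3+4
    bond-miimon 100
```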
Optimized routing solution
Dissatisfied with the limitations of the initial approach, we developed a more robust solution: full routing. In this configuration, each Ceph node runs FRRouting (FRR) and participates in OSPF itself. This setup offers several key benefits (a configuration sketch follows the list):
- Enhanced redundancy: Each node is connected to two switches, ensuring that the failure of one switch does not result in node loss. To further bolster reliability, Bidirectional Forwarding Detection (BFD) is employed for sub-second failure detection in the event of one or multiple link failures. Additionally, Equal-Cost Multi-Path (ECMP) routing is utilized to provide multiple redundant routes, enhancing the overall resilience of the network.
- Efficient use of interfaces: Both the public and cluster networks operate within the same routed network, enabling Ceph to dynamically utilize the server's full bandwidth for both networks based on current demands.
- Service stability: Cluster and public services run on loopback IPs, ensuring they are not tied to specific interfaces that could go down.
- Improved load balancing: ECMP routing provides each destination with eight distinct paths, allowing traffic to be distributed effectively across multiple routes, optimizing bandwidth utilization and ensuring balanced load sharing.
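To make this concrete, here is a minimal FRR sketch of such a node. All router IDs, subnets and interface names are illustrative assumptions rather than our production values, and the bfdd daemon must additionally be enabled in /etc/frr/daemons:

```
! /etc/frr/frr.conf -- sketch of a routed Ceph node (values are illustrative)
! the Ceph services bind to a loopback /32 configured on the host, e.g.:
!   ip addr add 192.0.2.11/32 dev lo
router ospf
 ospf router-id 192.0.2.11
 ! install up to 8 equal-cost paths per destination (ECMP)
 maximum-paths 8
 ! advertise the loopback and the point-to-point uplinks
 network 192.0.2.11/32 area 0
 network 198.51.100.0/31 area 0
 network 198.51.100.2/31 area 0
!
interface ens1f0
 ! sub-second failure detection on the uplink
 ip ospf bfd
!
interface ens1f1
 ip ospf bfd
! further uplinks follow the same pattern
```

Because the service addresses live on the loopback and are advertised as /32s, any surviving uplink keeps them reachable. On Linux, additionally setting net.ipv4.fib_multipath_hash_policy=1 hashes ECMP flows on layer 4 ports as well, which helps spread Ceph's many TCP connections across the available paths.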
Results and insights
After implementing the full routing solution for our Ceph cluster, we conducted a series of tests to evaluate its performance and redundancy. It's important to note that the Ceph instance is not yet highly optimized, but the initial results are promising. We achieved random read/write rates of 3.6 GB/s, demonstrating a significant improvement over our previous setup. These tests were conducted using industry-standard benchmarking tools to ensure accuracy and reliability.
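To give an idea of the kind of test involved (the exact tooling, pool name and PG count here are assumptions, not a record of our test setup), a raw throughput run with Ceph's built-in rados bench could look like this:

```
# create a throwaway benchmark pool (name and PG count are illustrative)
ceph osd pool create bench 128
# 60-second write test, keeping the objects for the read tests
rados bench -p bench 60 write --no-cleanup
# sequential and random read tests against the written objects
rados bench -p bench 60 seq
rados bench -p bench 60 rand
# remove the benchmark objects afterwards
rados -p bench cleanup
```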
In addition to performance testing, we also assessed the redundancy of the system. Our tests revealed that a node can lose all but one link without any impact on functionality, aside from a reduction in available bandwidth. Similarly, all but one of the inter-data center (Inter-DC) links can fail without affecting the system, though this also results in reduced available bandwidth. These findings confirm the robustness and reliability of our routed Ceph solution.
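Such a failover test can be as simple as downing uplinks one by one while watching the routing daemons and the cluster. A sketch, with interface names as assumptions:

```
# on one node: take down all but one uplink (names are illustrative)
ip link set ens1f0 down
ip link set ens1f1 down
ip link set ens2f0 down
# confirm BFD has converged and the cluster is still healthy
vtysh -c 'show bfd peers'
ceph -s
```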
Future plans
Looking ahead, we are considering several enhancements to further improve our Ceph infrastructure. One of our primary goals is to add more inter-DC links between the SANs. This expansion will allow for even better load balancing on the inter-DC side, ensuring optimal performance and resilience.
By continuously evaluating and refining our network configuration, we aim to achieve a highly optimized Ceph cluster that meets our performance and redundancy requirements, ultimately supporting our core business operations more effectively.