Optimizing Ceph storage with advanced networking solutions
Collaborators: Marc THOLL, Pol WARNIMONT, Marcin RZECZKOWSKI
Challenges with traditional storage
Over the last few years, our company has been running its virtualization stack with a traditional storage approach, utilizing dedicated storage appliances such as Dell EMC storage bays and connecting them to our servers via iSCSI or Fibre Channel (FC). For redundancy, we deployed the appliances in pairs. Our all-flash storage solutions were provided by Huawei.
With our continuing push to provide our customers with cloud resources originating entirely from within the borders of Luxembourg, guaranteeing data sovereignty, and with our advancing AI research project, the need for a highly scalable and reliable data storage solution has grown significantly.
First attempt at a solution
To address this new demand, we initially opted to modify our approach and moved to a dedicated Storage Area Network (SAN). We operate two data centers and set up two independent SANs, each consisting of one switch per site, interconnected via two geographically distinct passive wavelengths. The storage bays and servers were each connected to both SANs, providing four storage paths in total. Multipathing ensured redundancy across these paths, even though the individual paths themselves were not redundant. Each server had two network cards, with one port per card connected to each SAN, for a total of four paths. This configuration ensured that we would never lose more than two paths in the event of any single device failing.
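For illustration, the node-side policy for such a four-path setup is typically expressed in dm-multipath. The following is a minimal sketch only, with the device-specific sections omitted; the exact settings depend on the storage array and are not our verbatim production configuration:

```
# /etc/multipath.conf -- illustrative sketch, not a verbatim production config
defaults {
    user_friendly_names  yes
    # spread I/O across all available paths instead of active/passive
    path_grouping_policy multibus
    path_selector        "round-robin 0"
    # return to recovered paths automatically
    failback             immediate
    # queue I/O while paths flap instead of failing immediately
    no_path_retry        queue
}
```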
Despite these measures, once everything was in place, we noticed that we were not achieving the expected performance. Traditional storage systems often come with challenges such as scalability issues, management overhead, and vendor lock-in, which further motivated our search for alternative solutions.
Given the current geopolitical landscape and the lack of European-made storage hardware, we chose to take a different approach. By eliminating vendor lock-in and returning to a core aspect of our identity, open-source solutions, we aim to foster innovation and maintain flexibility in our infrastructure.
Exploring Ceph: A new approach to storage
Our system administrators, always eager to explore new technologies, proposed the idea of using Ceph for our storage needs. Our past experiences with traditional storage systems had been less than satisfactory, and we were motivated to explore something innovative and potentially more robust.
Ceph is an open-source storage platform designed to provide excellent performance, reliability, and scalability. It unifies object, block, and file storage in a single cluster, making it a versatile solution for various storage needs.
After an analysis concluded that Ceph could fulfill our requirements in terms of redundancy and scalability, we moved on to choosing the right hardware. The decision fell on Supermicro servers, each equipped with two AMD EPYC 7313 16-core processors and 128 GB of RAM.
For the network setup, we decided to utilize the existing infrastructure with the two independent SANs. After some initial hurdles in finding the correct configurations and waiting for the hardware to arrive, we were ready to start testing. However, we soon encountered a significant obstacle.
In Ceph, there are two distinct networks:
- Cluster network:
- Used by the servers to synchronize data among themselves.
- Public network:
- Handles cluster management traffic (Ceph MON).
- Facilitates Ceph connections to data users, such as virtualization hosts.
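In ceph.conf, these two networks are set with the public_network and cluster_network options. A minimal sketch, where the subnets are placeholders rather than our production addressing:

```
# /etc/ceph/ceph.conf -- sketch; subnets are illustrative placeholders
[global]
    # network used by clients, virtualization hosts and the MONs
    public_network  = 192.0.2.0/24
    # network used by the OSDs for replication and recovery traffic
    cluster_network = 198.51.100.0/24
```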
The challenge we faced was that the public network in Ceph can only be configured with a single IP network. This limitation could have posed a significant hurdle for our redundancy requirements.
To address this issue, we considered two potential solutions:
- New chassis switches: Purchasing new chassis switches might have resolved the issue, but it came at a high cost and offered limited additional benefits.
- Stacking switches with LACP: This approach, widely suggested in forums, involved stacking the switches and using Link Aggregation Control Protocol (LACP). However, it contradicted our initial design philosophy for the two SANs. We had deliberately avoided stacking due to past negative experiences, such as stacks splitting during operation, leading to network outages or loops that disrupted the entire network.
Innovative routing solutions for Ceph
While brainstorming solutions, our network department proposed an innovative idea. Although Ceph's public network can only handle one IP prefix, there is no requirement for all IPs to be within the same broadcast domain. This insight led us to consider routing traffic as a viable solution.
Initial routing setup
Our initial routing solution involved connecting two Ceph nodes to each SAN Layer 3 switch. The switches would run OSPF (Open Shortest Path First) and announce the networks of each node. Each node was connected using all four of its interfaces in a single LACP port channel (a sketch of this bond configuration follows the pros and cons below). This approach offered several advantages and disadvantages:
Pros:
- Ease of setup: OSPF runs on only four switches, simplifying the configuration process.
- Interface redundancy: LACP provides redundancy at the interface level, enhancing reliability.
Cons:
- Switch failure impact: Losing a switch results in the loss of two entire Ceph nodes, compromising redundancy.
- Limited load balancing: Load balancing is confined to the LACP trunk, which may not fully utilize available bandwidth.
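For reference, the node side of this initial design amounted to one bond over all four interfaces. A minimal sketch in Debian ifupdown style, where the interface names and address are assumptions:

```
# /etc/network/interfaces -- sketch of the four-port LACP bond
# interface names and the address are illustrative assumptions
auto bond0
iface bond0 inet static
    address 192.0.2.11/24
    bond-slaves ens1f0 ens1f1 ens2f0 ens2f1
    # 802.3ad = LACP; L3+L4 hashing spreads flows across the members
    bond-mode 802.3ad
    bond-xmit-hash-policy layer3+4
    bond-miimon 100
```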
Optimized routing solution
Dissatisfied with the limitations of the initial approach, we developed a more robust solution: full routing. In this configuration, each Ceph node runs FRRouting (FRR) and participates in OSPF itself. This setup offers several key benefits (a configuration sketch follows the list):
- Enhanced redundancy: Each node is connected to two switches, ensuring that the failure of one switch does not result in node loss. To further bolster reliability, Bidirectional Forwarding Detection (BFD) is employed for sub-second failure detection in the event of one or multiple link failures. Additionally, Equal-Cost Multi-Path (ECMP) routing is utilized to provide multiple redundant routes, enhancing the overall resilience of the network.
- Efficient use of interfaces: Both the public and cluster networks operate within the same routed network, enabling Ceph to dynamically utilize the server's full bandwidth for both networks based on current demands.
- Service stability: Cluster and public services run on loopback IPs, ensuring they are not tied to specific interfaces that could go down.
- Improved load balancing: ECMP routing provides each destination with eight distinct paths, allowing traffic to be distributed effectively across multiple routes, optimizing bandwidth utilization and ensuring balanced load sharing.
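To make this concrete, here is a minimal FRR sketch of such a node. All router IDs, subnets and interface names are illustrative assumptions rather than our production values, and the bfdd daemon must additionally be enabled in /etc/frr/daemons:

```
! /etc/frr/frr.conf -- sketch of a routed Ceph node (values are illustrative)
! the Ceph services bind to a loopback /32 configured on the host, e.g.:
!   ip addr add 192.0.2.11/32 dev lo
router ospf
 ospf router-id 192.0.2.11
 ! install up to 8 equal-cost paths per destination (ECMP)
 maximum-paths 8
 ! advertise the loopback and the point-to-point uplinks
 network 192.0.2.11/32 area 0
 network 198.51.100.0/31 area 0
 network 198.51.100.2/31 area 0
!
interface ens1f0
 ! sub-second failure detection on the uplink
 ip ospf bfd
!
interface ens1f1
 ip ospf bfd
! further uplinks follow the same pattern
```

Because the service addresses live on the loopback and are advertised as /32s, any surviving uplink keeps them reachable. On Linux, additionally setting net.ipv4.fib_multipath_hash_policy=1 hashes ECMP flows on layer 4 ports as well, which helps spread Ceph's many TCP connections across the available paths.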
Results and insights
After implementing the full routing solution for our Ceph cluster, we conducted a series of tests to evaluate its performance and redundancy. It's important to note that the Ceph instance is not yet highly optimized, but the initial results are promising. We achieved random read/write rates of 3.6 GB/s, demonstrating a significant improvement over our previous setup. These tests were conducted using industry-standard benchmarking tools to ensure accuracy and reliability.
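To give an idea of the kind of test involved (the exact tooling, pool name and PG count here are assumptions, not a record of our test setup), a raw throughput run with Ceph's built-in rados bench could look like this:

```
# create a throwaway benchmark pool (name and PG count are illustrative)
ceph osd pool create bench 128
# 60-second write test, keeping the objects for the read tests
rados bench -p bench 60 write --no-cleanup
# sequential and random read tests against the written objects
rados bench -p bench 60 seq
rados bench -p bench 60 rand
# remove the benchmark objects afterwards
rados -p bench cleanup
```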
In addition to performance testing, we also assessed the redundancy of the system. Our tests revealed that a node can lose all but one link without any impact on functionality, aside from a reduction in available bandwidth. Similarly, all but one of the inter-data center (Inter-DC) links can fail without affecting the system, though this also results in reduced available bandwidth. These findings confirm the robustness and reliability of our routed Ceph solution.
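Such a failover test can be as simple as downing uplinks one by one while watching the routing daemons and the cluster. A sketch, with interface names as assumptions:

```
# on one node: take down all but one uplink (names are illustrative)
ip link set ens1f0 down
ip link set ens1f1 down
ip link set ens2f0 down
# confirm BFD has converged and the cluster is still healthy
vtysh -c 'show bfd peers'
ceph -s
```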
Future plans
Looking ahead, we are considering several enhancements to further improve our Ceph infrastructure. One of our primary goals is to add more inter-DC links between the SANs. This expansion will allow for even better load balancing on the inter-DC side, ensuring optimal performance and resilience.
By continuously evaluating and refining our network configuration, we aim to achieve a highly optimized Ceph cluster that meets our performance and redundancy requirements, ultimately supporting our core business operations more effectively.