Ceph: meet the Open Source SDS king
In the era of Software Defined Everything, storage is one of the least agile building blocks and one of the toughest to abstract. Ceph stands out as the leading open source SDS solution, with interesting and unique features that enable enterprises to build and maintain complex storage solutions.
Software Defined Storage – what is it
Software defined storage is just what it sounds like: storage defined by software. In the past, large storage arrays were built exclusively by specialized manufacturers such as NetApp. The complexity of storing huge quantities of data reliably, without data loss or corruption, was delegated to the hardware and its manufacturers. Software played a marginal role in this ecosystem.
With the advent of cloud technologies and Big Data, old storage models and hardware soon proved unfit: scaling and keeping data consistent across nodes were major problems. During this period new storage solutions such as HDFS were born. Above all, object storage became predominant in cloud architectures, with Amazon S3, the de facto object storage leader, paving the way.
These solutions are commonly referred to as Software Defined Storage, a buzzword that covers storage technologies not bound to specific hardware or subject to vendor lock-in. Unlike traditional arrays, SDS solutions can be deployed on commodity hardware and do not require specific hardware configurations.
Ceph: SDS at its finest
Ceph is an open source project backed by many IT corporations such as Red Hat, SUSE, Canonical, Fujitsu and Intel. The name comes from the term cephalopod. Ceph is fairly unique because it is a distributed storage system that exposes three different types of storage:
- Object-level: accessible through Amazon S3-compatible and OpenStack Swift-compatible APIs, served by the RADOS Gateway (RGW).
- Block-level: accessible through the RBD (RADOS Block Device) interface, which has a native Linux kernel driver, and through iSCSI.
- Filesystem-level: the level furthest removed from Ceph's inner workings; CephFS provides a POSIX-compliant filesystem interface.
Among all the available solutions, Ceph is the only one that provides advanced features such as snapshots, compression (no deduplication yet) and thin provisioning while exposing all three levels mentioned above. To the best of my knowledge, no other software, commercial or not, open or closed source, is able to do the same.
Ceph is designed to scale out on thousands of nodes and reach exabyte-level storage.
Understanding Ceph's inner workings may be daunting at first, but the good news is that you don't really need to know how Ceph performs its magic (unless you are going to install it). Nevertheless, here is a brief architectural overview. A Ceph cluster is built around two core node types (a production cluster also runs manager daemons, plus metadata servers for CephFS and RADOS Gateways for object access):
- Ceph OSD nodes: OSD stands for Object Storage Daemon, and these are the nodes that store data. The Ceph OSD daemon (note the redundancy) runs on these nodes, usually with one OSD per disk.
- Ceph Monitor nodes: these nodes maintain the cluster maps that clients and OSDs need to locate objects.
At its core, every file, block or object stored in Ceph is treated by the system as an object, and each OSD is responsible for storing those objects and handling the operations on them. So, how are these objects stored on disk? Traditionally, on… another filesystem (more on that in the next section).
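To make "a block device is just a set of objects" concrete, here is a toy model. It is not librbd. It assumes 4 MiB objects and borrows the `rbd_data.<image-id>.<object-number>` naming style for illustration only, and the image id `abc123` is hypothetical. It also shows why thin provisioning falls out naturally: an object only exists once something has been written into its range.

```python
# Toy model, NOT librbd: a block device striped over fixed-size objects.
# Assumptions: 4 MiB objects and a naming style modelled on
# "rbd_data.<image-id>.<object-number>"; the image id used below is hypothetical.
from typing import Dict

OBJECT_SIZE = 4 * 1024 * 1024  # 4 MiB per object


class ToyBlockImage:
    def __init__(self, image_id: str, size: int) -> None:
        self.image_id = image_id
        self.size = size
        # Objects are created only when first written: thin provisioning.
        self.objects: Dict[str, bytearray] = {}

    def object_name(self, object_no: int) -> str:
        return f"rbd_data.{self.image_id}.{object_no:016x}"

    def write(self, offset: int, data: bytes) -> None:
        if offset < 0 or offset + len(data) > self.size:
            raise ValueError("write outside image bounds")
        pos = 0
        while pos < len(data):
            # Which object does this byte fall into, and where inside it?
            object_no, within = divmod(offset + pos, OBJECT_SIZE)
            chunk = data[pos:pos + OBJECT_SIZE - within]
            obj = self.objects.setdefault(self.object_name(object_no), bytearray())
            if len(obj) < within + len(chunk):
                obj.extend(bytes(within + len(chunk) - len(obj)))  # zero-fill gap
            obj[within:within + len(chunk)] = chunk
            pos += len(chunk)

    def read(self, offset: int, length: int) -> bytes:
        out = bytearray()
        pos = 0
        while pos < length:
            object_no, within = divmod(offset + pos, OBJECT_SIZE)
            n = min(length - pos, OBJECT_SIZE - within)
            obj = self.objects.get(self.object_name(object_no), b"")
            piece = bytes(obj[within:within + n])
            out += piece + bytes(n - len(piece))  # never-written ranges read as zeros
            pos += n
        return bytes(out)


if __name__ == "__main__":
    img = ToyBlockImage("abc123", size=1024 ** 3)  # 1 GiB image, no objects yet
    img.write(OBJECT_SIZE - 2, b"spans")  # crosses an object boundary
    print(sorted(img.objects))  # two objects: ...0000 and ...0001
    print(img.read(OBJECT_SIZE - 2, 5))  # b'spans'
```

The 1 GiB image costs nothing until it is written. A single small write that straddles a boundary touches exactly two objects.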
Internally, Ceph organizes objects in pools and keeps a configurable number of replicas of each object (pools can also use erasure coding instead of plain replication). Data is checksummed to ensure integrity, and snapshots are supported, for example per pool or per RBD image. Ceph also integrates well with KVM and libvirt, bringing the power of SDS to open source virtualization.
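The key idea behind pools and replicas is that any client holding the cluster map can compute where an object lives, with no central lookup service. The sketch below is a deliberately simplified model, not Ceph's code. It hashes the object name into a placement group (PG), then picks replicas with rendezvous hashing, which stands in for the real CRUSH algorithm. SHA-256 and zlib's CRC-32 likewise stand in for Ceph's rjenkins hash and BlueStore's default crc32c. The pool id, `pg_num` and OSD ids are hypothetical.

```python
# Simplified placement model, NOT Ceph's implementation:
#   object name --hash--> placement group (PG) --rendezvous hashing--> OSDs
# Real Ceph uses the rjenkins hash and the CRUSH algorithm, and BlueStore
# defaults to crc32c checksums; SHA-256, rendezvous hashing and zlib's CRC-32
# are stand-ins here. Pool id, pg_num and OSD ids are hypothetical.
import hashlib
import zlib
from typing import List


def _hash64(*parts: str) -> int:
    digest = hashlib.sha256("/".join(parts).encode()).digest()
    return int.from_bytes(digest[:8], "big")


def placement_group(pool_id: int, object_name: str, pg_num: int) -> str:
    """Map an object to a PG id formatted like '<pool>.<pg in hex>'."""
    return f"{pool_id}.{_hash64(object_name) % pg_num:x}"


def osds_for_pg(pgid: str, osds: List[int], replicas: int = 3) -> List[int]:
    """Rank OSDs by a per-PG score and keep the top `replicas` (first = primary)."""
    if replicas > len(osds):
        raise ValueError("not enough OSDs for the requested replica count")
    ranked = sorted(osds, key=lambda osd: _hash64(pgid, str(osd)), reverse=True)
    return ranked[:replicas]


def checksum(data: bytes) -> int:
    return zlib.crc32(data)


if __name__ == "__main__":
    cluster = [0, 1, 2, 3, 4, 5]
    pg = placement_group(pool_id=1, object_name="my-object", pg_num=128)
    acting = osds_for_pg(pg, cluster)
    # Every client computes the same answer from the same map; no lookup service.
    print(pg, acting)
    # Each replica stores the data together with its checksum...
    stored = {osd: (b"hello", checksum(b"hello")) for osd in acting}
    # ...which is verified when the data is read back.
    data, crc = stored[acting[0]]
    print(checksum(data) == crc)  # True
```

Rendezvous hashing shares a useful property with CRUSH in spirit. Removing an OSD only moves the placement groups that were actually using it, so the rest of the cluster stays put.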
How does Ceph store data on disks?
In the previous section you learned that Ceph has traditionally stored data on top of another filesystem. That may sound odd at first, but it really isn't. To read and write objects on physical disks, Ceph's original backend relies on local filesystems:
- XFS: the recommended filesystem for production use.
- Btrfs: mentioned for its capabilities, but not recommended for production use.
- ext4: to be avoided due to its limitations.
All the abstractions and levels exposed by Ceph are ultimately mapped to local files (each object is a file); exactly how a file is laid out on the drive depends on the underlying filesystem.
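The "each object is a file" idea can be sketched in a few lines. This is only in the spirit of FileStore and is not its real on-disk layout. The `<root>/<pgid>/<escaped object name>` scheme is hypothetical. What the sketch does show is that every object write goes through the local filesystem's own machinery: create, flush, fsync, rename.

```python
# Toy "one object = one file" store, in the spirit of FileStore but NOT its
# real on-disk layout: the "<root>/<pgid>/<escaped object name>" scheme is hypothetical.
import os
import tempfile
from urllib.parse import quote


class ToyFileStore:
    def __init__(self, root: str) -> None:
        self.root = root

    def _path(self, pgid: str, name: str) -> str:
        # Escape the object name so characters such as '/' are safe in a filename.
        return os.path.join(self.root, pgid, quote(name, safe=""))

    def put(self, pgid: str, name: str, data: bytes) -> None:
        path = self._path(pgid, name)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        # Write to a temporary file, flush it to disk, then atomically rename it:
        # every object write goes through the local filesystem's own machinery.
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path))
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)

    def get(self, pgid: str, name: str) -> bytes:
        with open(self._path(pgid, name), "rb") as f:
            return f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        store = ToyFileStore(root)
        store.put("1.7f", "images/cat.jpg", b"\xff\xd8 fake jpeg bytes")
        print(os.listdir(os.path.join(root, "1.7f")))  # ['images%2Fcat.jpg']
        print(store.get("1.7f", "images/cat.jpg"))
```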
This architecture, known as FileStore (objects stored as files on an underlying filesystem), was somewhat problematic. Stacking Ceph on top of a general-purpose filesystem added latency and redundant work, since data was written both to the OSD journal and to the filesystem. With Ceph Luminous (v12.2.0), released in 2017, a new storage backend called BlueStore became stable. BlueStore writes directly to raw block devices, eliminating the need for an underlying filesystem, and delivers higher performance than FileStore. Since Luminous, BlueStore is the default backend for new OSDs.
Migrating to BlueStore does not have to happen all at once: older clusters can convert OSDs selectively, one at a time. In fact, a Ceph cluster can run with mixed OSD backends without problems.
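During a gradual migration it helps to know which OSDs are still on FileStore. The helper below is a sketch. It assumes the JSON printed by `ceph osd metadata --format json` is a list of per-OSD entries with `id` and `osd_objectstore` fields, so verify the field names on your release. The sample input is made up.

```python
# Tally OSD backends in a (possibly mixed) cluster.
# Assumption: the input is the JSON list printed by `ceph osd metadata --format json`,
# where each entry has an "id" and an "osd_objectstore" field; verify on your release.
import json
from collections import Counter
from typing import List


def backend_counts(metadata_json: str) -> Counter:
    return Counter(osd.get("osd_objectstore", "unknown") for osd in json.loads(metadata_json))


def filestore_osds(metadata_json: str) -> List[int]:
    """OSD ids still on FileStore, i.e. candidates for the next conversion."""
    return sorted(osd["id"] for osd in json.loads(metadata_json)
                  if osd.get("osd_objectstore") == "filestore")


if __name__ == "__main__":
    # Made-up sample input for illustration only.
    sample = json.dumps([
        {"id": 0, "osd_objectstore": "bluestore"},
        {"id": 1, "osd_objectstore": "filestore"},
        {"id": 2, "osd_objectstore": "filestore"},
    ])
    counts = backend_counts(sample)
    print(counts["bluestore"], counts["filestore"])  # 1 2
    print(filestore_osds(sample))  # [1, 2]
```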
Comparison with Amazon S3
S3 is the dominant object storage API, and Amazon S3 is the major object storage player. Ceph exposes an S3-compatible interface so that applications written for S3 can work against a Ceph cluster.
Amazon S3 is great for object storage; however, older non-cloud applications may struggle to migrate to it. In this scenario, Ceph's block-level storage can support legacy applications. To be fair, Amazon offers Elastic Block Store (EBS) for the same purpose.
No comparison can be made at the architecture level, since Amazon S3 and EBS are managed services operated by Amazon.
Comparison with OpenStack Swift
The Swift API is a REST API used to access OpenStack Swift Object Storage. Ceph supports the Swift API and can be used for the same purposes. Swift cannot offer block- or file-level access. In OpenStack, block-level access comes from Cinder, which can use Swift as a target for volume backups.
The Ceph vs Swift debate is pretty hot in OpenStack environments. Each has its own upsides and downsides. Ceph is strongly consistent and has better latency but struggles in multi-region deployments, whereas Swift is eventually consistent and has worse latency but copes better with multi-region deployments.
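A toy model can make the consistency trade-off concrete. It assumes three replicas and compares two acknowledgement rules. The first acknowledges a write only after every replica has applied it, which is roughly what a replicated Ceph pool does. The second acknowledges after a majority and lets the last replica catch up in the background, roughly Swift's write quorum plus replicators. Neither real protocol is modelled in detail.

```python
# Toy contrast between acknowledging a write after ALL replicas have it and
# after only a quorum. This models neither Ceph's nor Swift's actual protocol.
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Replica:
    data: Dict[str, bytes] = field(default_factory=dict)


class ToyCluster:
    def __init__(self, ack_after_all: bool, n: int = 3) -> None:
        self.ack_after_all = ack_after_all
        self.replicas = [Replica() for _ in range(n)]
        # Replica updates that happen after the client was already acknowledged.
        self.pending: List[Tuple[int, str, bytes]] = []

    def write(self, key: str, value: bytes) -> None:
        quorum = len(self.replicas) // 2 + 1
        for i, replica in enumerate(self.replicas):
            if self.ack_after_all or i < quorum:
                replica.data[key] = value  # applied before the write is acknowledged
            else:
                self.pending.append((i, key, value))  # background replication, later

    def read(self, key: str, replica: int) -> Optional[bytes]:
        return self.replicas[replica].data.get(key)

    def replicate(self) -> None:
        """Apply the background updates (the 'eventually' in eventual consistency)."""
        for i, key, value in self.pending:
            self.replicas[i].data[key] = value
        self.pending.clear()


if __name__ == "__main__":
    for ack_after_all in (True, False):
        c = ToyCluster(ack_after_all)
        c.write("obj", b"v1")
        # Read the last replica right after the acknowledged write.
        print(ack_after_all, c.read("obj", replica=2))  # b'v1' if ack_after_all, else None
        c.replicate()
        print(ack_after_all, c.read("obj", replica=2))  # b'v1' in both modes
```

Waiting for every replica is what keeps reads consistent. It is also why high-latency links between regions hurt: each write is only as fast as the slowest replica.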
Although a bit outdated, this excellent article by Mirantis is worth a look: Ceph vs Swift – An Architect's Perspective.
Comparison with GlusterFS
GlusterFS is a distributed filesystem that exposes filesystem-level access using an internal architecture similar to FileStore (files are stored on local filesystems on each node). GlusterFS is pretty fast compared to Ceph, but it needs low latency between nodes to work and doesn't provide as many features.
Ceph, Containers, Kubernetes and OpenShift
Although Ceph is a complete solution to storage needs, its integration with container technologies such as Docker or Kubernetes still needs to be carefully engineered. Kubernetes lists RBD and CephFS (the filesystem level) among its Persistent Volume types, but setting them up is a manual process and can be difficult to get working.
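To give an idea of the manual work involved, the sketch below builds a PersistentVolume manifest for Kubernetes' in-tree `rbd` volume plugin. Every name, monitor address and size in it is hypothetical. The RBD image must already exist in the pool, and the referenced Secret must hold the Ceph user's key. Note that recent Kubernetes releases have deprecated the in-tree `rbd` and `cephfs` plugins in favour of the Ceph CSI driver, so check what your cluster version supports.

```python
# Builds a Kubernetes PersistentVolume for the legacy in-tree "rbd" volume plugin.
# All names, addresses and sizes are hypothetical. The RBD image must already exist
# in the pool, and the referenced Secret must hold the Ceph user's key.
import json
from typing import Iterable


def rbd_persistent_volume(name: str, monitors: Iterable[str], pool: str, image: str,
                          size_gi: int, secret_name: str, user: str = "admin",
                          fs_type: str = "ext4") -> dict:
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name},
        "spec": {
            "capacity": {"storage": f"{size_gi}Gi"},
            "accessModes": ["ReadWriteOnce"],
            "rbd": {
                "monitors": list(monitors),  # Ceph monitor addresses
                "pool": pool,
                "image": image,
                "user": user,
                "secretRef": {"name": secret_name},
                "fsType": fs_type,  # filesystem used on the block image
                "readOnly": False,
            },
        },
    }


if __name__ == "__main__":
    pv = rbd_persistent_volume("ceph-pv-demo", ["10.0.0.1:6789", "10.0.0.2:6789"],
                               pool="kube", image="demo-image", size_gi=10,
                               secret_name="ceph-secret")
    # kubectl accepts JSON manifests: save this output and run `kubectl apply -f pv.json`
    print(json.dumps(pv, indent=2))
```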
OpenShift, on the other hand, has a clear path to Ceph integration, and Red Hat is working hard to make this procedure more seamless.