Ceph: meet the Open SDS Storage king

by mark · Published 16 May 2018 · Updated 22 May 2018

In the era of software-defined everything, storage is one of the least agile and toughest building blocks to abstract. Ceph is the leading Open Source SDS solution, with interesting and unique features that enable enterprises to build and maintain complex storage solutions.

Software Defined Storage – what is it

Software defined storage is just what it sounds like: storage defined by software. In the past, large storage arrays were built exclusively by specialized manufacturers such as NetApp. The complexity of storing huge quantities of data reliably, while avoiding data loss and corruption, was delegated to hardware and hardware manufacturers. Software played a marginal role in this ecosystem.

With the advent of Cloud technologies and Big Data, old storage models and hardware were soon found to be unfit. Scaling and data consistency across nodes were major problems. During this period new storage solutions such as HDFS were born. Above all, object storage became predominant in cloud architectures, with Amazon S3, the de facto object storage leader, paving the way.

These solutions are commonly referred to as Software Defined Storage, a buzzword that encompasses storage technologies that are not bound to specific hardware or subject to vendor lock-in. Unlike the storage arrays of the past, SDS solutions can be deployed on commodity hardware and do not require specific hardware configurations.

Ceph: SDS at its finest

Ceph is an Open Source project backed by many IT corporations such as Red Hat, SUSE, Canonical, Fujitsu and Intel. The name comes from the term cephalopod. Ceph is one of a kind because it is a distributed storage system that exposes different types of storage:

Object-level: accessible through Amazon S3-compatible and OpenStack Swift-compatible APIs.
Block-level: accessible through the RBD interface (native on Linux) and iSCSI.
Filesystem-level: the level most abstracted from Ceph’s inner workings; it provides a POSIX-compliant filesystem interface.

Among all the different solutions, Ceph is the only one that provides advanced features such as snapshots, compression (no deduplication yet) and thin provisioning while exposing all three of the levels described above. To the best of my knowledge no other software, commercial or not, open or closed source, is able to do the same.

Ceph is designed to scale out to thousands of nodes and reach exabyte-level capacity.

Ceph Architecture

Understanding Ceph’s inner workings may be daunting at first, but the good news is that you don’t really need to understand how Ceph performs its magic unless you’re going to install it. Nevertheless, here’s a brief architectural overview. There are two fundamental Ceph node types:

Ceph OSD nodes: OSD stands for Object Storage Daemon; these are the nodes that store data. The Ceph OSD daemon (notice the redundancy) runs on these nodes, with a separate OSD for each disk.
Ceph Monitor nodes: the nodes that store the cluster maps needed to locate and retrieve objects.

At its core, every file, block or object stored in Ceph is treated as an object by the system, and each OSD is responsible for storing and managing operations related to those objects. So, how are these objects stored on disks? They are stored on… another filesystem (more in the next section).
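To make the "everything is an object" idea concrete, here is a minimal Python sketch of how a large block image or file could be cut into fixed-size objects. The 4 MiB size matches the default RBD object size, but the naming scheme and the `split_into_objects` and `reassemble` helpers are assumptions made for illustration, not Ceph's actual internals.

```python
from typing import Dict

OBJECT_SIZE = 4 * 1024 * 1024  # 4 MiB, the default RBD object size


def split_into_objects(name: str, data: bytes,
                       object_size: int = OBJECT_SIZE) -> Dict[str, bytes]:
    """Cut `data` into fixed-size chunks, each stored under its own object name.

    The "<name>.<index>" naming is hypothetical; it only needs to sort in
    chunk order so the data can be put back together.
    """
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    objects = {}
    for index, offset in enumerate(range(0, len(data), object_size)):
        objects[f"{name}.{index:08d}"] = data[offset:offset + object_size]
    return objects


def reassemble(objects: Dict[str, bytes]) -> bytes:
    """Concatenate the chunks in name order to recover the original data."""
    return b"".join(objects[key] for key in sorted(objects))
```

With a tiny object size of 4 bytes, a 10-byte payload becomes three objects (4, 4 and 2 bytes), and `reassemble` returns the original bytes.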

Internally Ceph organizes objects in pools and keeps a number of replicas of each object. Each object is checksummed to ensure data integrity and snapshots can be performed per-object. Ceph also integrates well with KVM and libvirt, providing the power of SDS to open virtualization.
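The pool, replica and checksum ideas can be sketched with a toy model. This is not how Ceph places data (its real placement algorithm, CRUSH, is far more elaborate); the `ToyPool` class, the hash-ranking placement and the use of CRC-32 are all assumptions chosen to keep the example short.

```python
import hashlib
import zlib
from typing import Dict, Iterable, List, Tuple


class ToyPool:
    """A pool that keeps `replicas` checksummed copies of every object."""

    def __init__(self, osd_ids: Iterable[int], replicas: int = 3) -> None:
        self.osd_ids = sorted(osd_ids)
        if replicas < 1 or replicas > len(self.osd_ids):
            raise ValueError("replicas must be between 1 and the number of OSDs")
        self.replicas = replicas
        # One in-memory dict per OSD: object name -> (data, checksum).
        self.osds: Dict[int, Dict[str, Tuple[bytes, int]]] = {
            osd: {} for osd in self.osd_ids
        }

    def _placement(self, name: str) -> List[int]:
        # Rank OSDs by a hash of (object name, OSD id) and take the first few.
        # Any client can recompute this, so no lookup table is needed.
        ranked = sorted(
            self.osd_ids,
            key=lambda osd: hashlib.sha256(f"{name}:{osd}".encode()).hexdigest(),
        )
        return ranked[: self.replicas]

    def put(self, name: str, data: bytes) -> List[int]:
        checksum = zlib.crc32(data)
        targets = self._placement(name)
        for osd in targets:
            self.osds[osd][name] = (data, checksum)
        return targets

    def get(self, name: str) -> bytes:
        # Return the first replica whose checksum still matches its data.
        for osd in self._placement(name):
            entry = self.osds[osd].get(name)
            if entry is not None and zlib.crc32(entry[0]) == entry[1]:
                return entry[0]
        raise KeyError(f"no intact replica of {name!r}")
```

If one replica is corrupted, `get` skips it because the checksum no longer matches and serves the object from another OSD.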

How does Ceph store data on disks?

In the previous section you learned that Ceph stores data using another filesystem. At first this may seem complicated, but it really isn’t. To read and write objects on physical disks, Ceph leverages local filesystems:

XFS: the suggested filesystem for production use.
Btrfs: mentioned for its capabilities, but not suggested for production use.
Ext4: to be avoided due to its limitations.

All the abstractions and the different levels exposed by Ceph are ultimately mapped to local files (each object is a file). How exactly each file is laid out on the drive depends on the underlying filesystem.
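The "each object is a file" mapping can be illustrated with a few lines of Python. This is only a sketch of the idea: the `ToyFileStore` class and its hex-encoded file names are assumptions, and Ceph's real on-disk layout is considerably more involved.

```python
from pathlib import Path
from typing import List


class ToyFileStore:
    """Stores every object as one regular file under a root directory."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, object_name: str) -> Path:
        # Hex-encode the name so characters such as "/" are filesystem-safe.
        return self.root / object_name.encode("utf-8").hex()

    def write(self, object_name: str, data: bytes) -> None:
        self._path(object_name).write_bytes(data)

    def read(self, object_name: str) -> bytes:
        return self._path(object_name).read_bytes()

    def list_objects(self) -> List[str]:
        # Decode the file names back into the original object names.
        return sorted(
            bytes.fromhex(path.name).decode("utf-8")
            for path in self.root.iterdir()
        )
```

Where the bytes of each file end up on the drive is entirely up to the local filesystem holding the root directory, which is the layering the paragraph above describes.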

This architecture, however, introduced latency and redundancy. The whole design, known as FileStore (objects stored as files on an underlying filesystem), was somewhat problematic. In 2017, with Ceph Luminous (v12.2.0), a new storage backend called BlueStore was introduced. This new backend delivers higher performance than FileStore and eliminates the need for an underlying filesystem. Since Luminous, BlueStore is the default backend for new OSDs.

BlueStore is awesome, but older clusters don’t have to switch all at once: OSDs can be migrated selectively, one at a time. As a matter of fact, a Ceph cluster can run with mixed OSD backends without problems.
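When migrating one OSD at a time, it helps to know which backend each OSD currently uses. The sketch below parses JSON shaped like the output of `ceph osd metadata`, which to my knowledge reports the backend in an `osd_objectstore` field; verify this against your release. The sample data and the helper names are made up for illustration.

```python
import json
from collections import Counter
from typing import Dict, Optional

# Made-up sample; real output contains many more fields per OSD.
SAMPLE_METADATA = """
[
  {"id": 0, "osd_objectstore": "filestore"},
  {"id": 1, "osd_objectstore": "bluestore"},
  {"id": 2, "osd_objectstore": "filestore"}
]
"""


def backends_by_osd(metadata_json: str) -> Dict[int, str]:
    """Map each OSD id to the storage backend it reports."""
    return {
        entry["id"]: entry.get("osd_objectstore", "unknown")
        for entry in json.loads(metadata_json)
    }


def backend_summary(metadata_json: str) -> Counter:
    """Count how many OSDs run each backend."""
    return Counter(backends_by_osd(metadata_json).values())


def next_osd_to_migrate(metadata_json: str) -> Optional[int]:
    """Return the lowest-numbered OSD that is not on BlueStore yet, if any."""
    pending = sorted(
        osd for osd, backend in backends_by_osd(metadata_json).items()
        if backend != "bluestore"
    )
    return pending[0] if pending else None
```

For the sample above, the summary shows two FileStore OSDs and one BlueStore OSD, and OSD 0 is the next candidate for migration.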

Comparison with Amazon S3

S3 is the dominant object storage API and Amazon S3 is the dominant object storage service. Ceph exposes an S3-compatible interface so that applications written for S3 can work against a Ceph cluster.
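In practice, S3 compatibility usually means an existing S3 client only needs to be pointed at a different endpoint. The sketch below builds the keyword arguments for `boto3.client`; the endpoint URL and credentials are placeholders, the helper names are hypothetical, and boto3 is imported lazily so the configuration part runs without it installed.

```python
from typing import Any, Dict


def s3_client_kwargs(endpoint_url: str, access_key: str,
                     secret_key: str) -> Dict[str, Any]:
    """Arguments that point a boto3 S3 client at a non-Amazon endpoint."""
    return {
        "service_name": "s3",
        "endpoint_url": endpoint_url,  # the Ceph S3 endpoint, not amazonaws.com
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
    }


def make_client(kwargs: Dict[str, Any]):
    """Create the client; requires the third-party boto3 package."""
    import boto3  # imported here so the helper above works without boto3
    return boto3.client(**kwargs)


# Placeholder values: replace with your cluster's endpoint and keys.
EXAMPLE_KWARGS = s3_client_kwargs(
    "http://ceph-gateway.example.com:7480", "ACCESS_KEY", "SECRET_KEY"
)
```

With a reachable endpoint, `make_client(EXAMPLE_KWARGS).put_object(Bucket="demo", Key="hello.txt", Body=b"hi")` would be the same call an application makes against Amazon S3; only the endpoint differs.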

Amazon S3 is great for object storage; however, older, non-cloud applications may struggle to migrate. In this scenario Ceph offers block-level storage and can be used to support legacy applications. To be fair, Amazon offers Elastic Block Store (EBS), which is used for the same purpose.

No other comparison can be made at the architecture level, since Amazon S3 and EBS are managed by Amazon.

Comparison with OpenStack Swift

The Swift API is a REST API that is used to access OpenStack Swift Object Storage. Ceph supports the Swift API and can be used for the same purposes. Swift can’t offer block- or file-level access; however, Cinder can be used to offer block-level access, and it can use Swift as a backup target for volumes.

The Ceph vs Swift matter is pretty hot in OpenStack environments. Each has its own upsides and downsides. For example, Ceph is strongly consistent and has better latency, but struggles in multi-region deployments. Swift, on the other hand, is eventually consistent and has worse latency, but doesn’t struggle as much in multi-region deployments.

Although a bit outdated, this excellent article by Mirantis is worth a look: Ceph vs Swift – An Architect’s Perspective.

Comparison with GlusterFS

GlusterFS is a distributed filesystem that exposes filesystem-level access, leveraging an internal architecture similar to FileStore. GlusterFS is pretty fast compared to Ceph, but it needs low latency between nodes to work and doesn’t provide as many features.

Ceph, Containers, Kubernetes and OpenShift

Although Ceph is a complete solution to storage needs, its integration with container technologies such as Docker or Kubernetes still needs to be carefully engineered. For Kubernetes, RBD and CephFS (the filesystem level) both appear in the list of Persistent Volume types; however, setting them up is a manual process and can be difficult to get working.
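To give an idea of what the manual process involves, the sketch below builds a PersistentVolume manifest for the in-tree `rbd` volume type that Kubernetes offered when this article was written (newer clusters generally rely on a CSI driver instead). The monitor address, pool, image and secret names are placeholders, and the `rbd_persistent_volume` helper is hypothetical.

```python
import json
from typing import Any, Dict, List


def rbd_persistent_volume(name: str, monitors: List[str], pool: str,
                          image: str, size: str,
                          secret_name: str) -> Dict[str, Any]:
    """Build a PersistentVolume manifest backed by a Ceph RBD image."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name},
        "spec": {
            "capacity": {"storage": size},
            # A block device is mounted read-write by a single node at a time.
            "accessModes": ["ReadWriteOnce"],
            "rbd": {
                "monitors": monitors,   # Ceph Monitor addresses
                "pool": pool,           # pool that holds the image
                "image": image,         # RBD image, created beforehand
                "user": "admin",        # Ceph user; placeholder
                "secretRef": {"name": secret_name},  # Secret holding the key
                "fsType": "ext4",       # filesystem created on the block device
            },
        },
    }


# Placeholder values for illustration only.
MANIFEST = rbd_persistent_volume(
    name="ceph-pv",
    monitors=["10.0.0.1:6789"],
    pool="kube",
    image="demo-image",
    size="1Gi",
    secret_name="ceph-secret",
)
MANIFEST_JSON = json.dumps(MANIFEST, indent=2)
```

The RBD image and the Secret have to exist before the volume can be mounted, which is a good part of why the setup feels manual.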

OpenShift on the other hand has a clear path to Ceph integration, and Red Hat is working hard to make this procedure more seamless.

Image courtesy of mark | marksei

mark

The IT guy with a slight look of boredom in his eyes. Freelancer. Current interests: Kubernetes, Tensorflow, shiny new things.

Alexander T says:

11 May 2019 at 15:07

Thank you for the overview.
