Discussion:
design a rados-based distributed kv store that supports scan ops
Plato Zhang
2014-09-30 09:23:15 UTC
Hi all,

We plan to develop a new distributed key-value storage engine, which
would be based on rados and support range-scan operations. It will
also be open source.

We know rados is already a kv storage engine, and can do pretty well
if we use KeyValueStore.
However, it lacks the ability to scan a key range, since keys are
hashed into PGs.
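
To illustrate why hashing defeats range scans, here is a minimal sketch.
It is not how Ceph actually places objects: `PG_COUNT`, `pg_for_key`, and
the use of MD5 are assumptions made for illustration only (Ceph uses its
own hash and placement logic). It only shows that lexicographically
adjacent keys scatter across placement groups.

```python
import hashlib

PG_COUNT = 8  # hypothetical number of placement groups


def pg_for_key(key: str, pg_count: int = PG_COUNT) -> int:
    """Map a key to a PG by hashing it (MD5 is a stand-in, not Ceph's hash)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "little") % pg_count


# Sixteen adjacent keys that a range scan would want to read together.
keys = [f"user:{i:04d}" for i in range(16)]
placement = {k: pg_for_key(k) for k in keys}
pgs_touched = sorted(set(placement.values()))

for k in keys:
    print(k, "-> pg", placement[k])
print("PGs touched by one contiguous key range:", pgs_touched)
```

Because placement depends only on the hash, a scan over a key range has
no small set of PGs to visit; in general it has to consult all of them.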

In our scenarios, scanning is a common need. We considered splitting
our key space into a fixed number of ranges, mapping each range to a
rados object, and then storing the key/value pairs in those rados
objects. However, this approach has two main disadvantages:
1. We cannot adjust the number of ranges as the cluster scales.
2. For a good split, we must correctly predict the distribution of
keys, or we'll run into serious load-balancing problems.
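
The fixed-range scheme described above can be sketched as follows. This
is an in-memory illustration only: the `BOUNDS` split points, the
`objects` list (plain dicts standing in for rados objects), and the
`put`/`scan` helpers are all hypothetical names, not part of any rados
API.

```python
import bisect

# Hypothetical split points chosen up front. Range i holds keys in
# [BOUNDS[i-1], BOUNDS[i]); the first and last ranges are open-ended.
BOUNDS = ["g", "n", "t"]
objects = [dict() for _ in range(len(BOUNDS) + 1)]  # stand-ins for rados objects


def range_index(key: str) -> int:
    """Return the index of the fixed range that owns this key."""
    return bisect.bisect_right(BOUNDS, key)


def put(key: str, value) -> None:
    objects[range_index(key)][key] = value


def scan(start: str, end: str):
    """Return sorted (key, value) pairs with start <= key < end."""
    out = []
    if start >= end:
        return out
    # Only the ranges overlapping [start, end) need to be read.
    for i in range(range_index(start), range_index(end) + 1):
        for k in sorted(objects[i]):
            if start <= k < end:
                out.append((k, objects[i][k]))
    return out


# A skewed workload: most keys fall into the first range.
for n, name in enumerate(["apple", "avocado", "banana", "cherry", "zebra"], 1):
    put(name, n)

sizes = [len(o) for o in objects]
print(scan("a", "c"))
print("entries per range object:", sizes)
```

The scan only touches the overlapping range objects, but the sizes show
the second disadvantage: with a skewed key distribution one object takes
almost all of the load, and the split points cannot be changed later.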

So we decided to develop a new service based on rados (we want to use
rados to unify our storage infrastructure).

Before we start, we have a few questions:
1. Does a similar system/module (scan-supported and rados-based)
already exist?
2. Besides the basic key-value functions, what features are you
most interested in?
3. Any other suggestions?

Thanks!
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Yan, Zheng
2014-09-30 13:35:21 UTC
Post by Plato Zhang
1. Does a similar system/module (scan-supported and rados-based)
already exist?
FYI:
Directory fragments in the MDS satisfy all of the above requirements,
but the code is not general enough to be used outside the MDS.
Sage Weil
2014-09-30 14:03:52 UTC
Post by Plato Zhang
1. Does a similar system/module (scan-supported and rados-based)
already exist?
Take a look at the ceph.git/src/key_value_store/ directory. It contains a
(mostly complete) implementation of a distributed B-tree built on top of
rados objects. There is a blog post and a write-up about it here:

http://ceph.com/community/summer-adventures-with-ceph-building-a-b-tree/
http://ceph.com/papers/CawthonKeyValueStore.pdf

This work hasn't gotten any love since it was first written, but may make
a good starting point!
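
For readers unfamiliar with the general idea, here is a rough sketch of
an ordered index over leaf objects that split as they fill, which avoids
fixing the number of ranges in advance. This is not the code in
src/key_value_store/ and makes no claim about how that code works: the
`SplittingStore` class, the `MAX_LEAF` threshold, and the in-memory dicts
standing in for rados objects are all assumptions for illustration. It
also ignores concurrency; a real implementation on rados would need each
split to be atomic with respect to other clients.

```python
import bisect

MAX_LEAF = 4  # hypothetical maximum number of entries per leaf object


class SplittingStore:
    """Ordered index of lower bounds over leaves that split when too full."""

    def __init__(self, max_leaf: int = MAX_LEAF):
        self.max_leaf = max_leaf
        self.lows = [""]    # sorted lower bounds, one per leaf
        self.leaves = [{}]  # leaves[i] holds keys in [lows[i], lows[i+1])

    def _leaf_for(self, key: str) -> int:
        return bisect.bisect_right(self.lows, key) - 1

    def put(self, key: str, value) -> None:
        i = self._leaf_for(key)
        self.leaves[i][key] = value
        if len(self.leaves[i]) > self.max_leaf:
            self._split(i)

    def _split(self, i: int) -> None:
        # Split at the median key so both halves stay within the limit.
        keys = sorted(self.leaves[i])
        mid = keys[len(keys) // 2]
        left = {k: self.leaves[i][k] for k in keys if k < mid}
        right = {k: self.leaves[i][k] for k in keys if k >= mid}
        self.leaves[i:i + 1] = [left, right]
        self.lows.insert(i + 1, mid)

    def scan(self, start: str, end: str):
        """Return sorted (key, value) pairs with start <= key < end."""
        out = []
        if start >= end:
            return out
        i = self._leaf_for(start)
        while i < len(self.leaves) and self.lows[i] < end:
            leaf = self.leaves[i]
            out.extend((k, leaf[k]) for k in sorted(leaf) if start <= k < end)
            i += 1
        return out


store = SplittingStore()
for n in range(10):
    store.put(f"k{n:02d}", n)

print(store.lows)
print(store.scan("k03", "k07"))
```

The number of leaves grows with the data instead of being chosen up
front, and a scan walks only the leaves whose ranges overlap the request.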

sage