Discussion:
Regarding key/value interface
Somnath Roy
2014-09-12 01:11:57 UTC
Permalink
Hi Sage/Haomai,
If I have a key/value backend that supports transactions and range queries (and I don't need any explicit caching etc.) and I want to replace filestore (and the leveldb omap) with it, which interface do you recommend I derive from: ObjectStore directly, or KeyValueDB?
I have already integrated this backend by deriving from the ObjectStore interfaces earlier (in the pre-key/value-interface days), but have not tested it thoroughly enough to see what functionality is broken (basic RGW/RBD functionality works fine).
Basically, I want to know the advantages (and disadvantages) of deriving from the new key/value interfaces.
Also, what state is it in? Is it feature complete, supporting all the ObjectStore interfaces such as clone?

Thanks & Regards
Somnath

________________________________

PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).
Sage Weil
2014-09-12 01:30:50 UTC
Permalink
Hi Somnath,
Everything is supported, I think, except perhaps for some IO hints that don't
make sense in a k/v context. The big things that you get by using
KeyValueStore and plugging into the lower-level interface are:

- striping of file data across keys
- efficient clone
- a zillion smaller methods that aren't conceptually difficult to
implement but are tedious to do.

The other nice thing about reusing this code is that you can use a leveldb
or rocksdb backend as a reference for testing or performance or whatever.

The main thing that will be a challenge going forward, I predict, is
making storage of the object byte payload in key/value pairs efficient. I
think KeyValueStore is doing some simple striping, but it will suffer for
small overwrites (like 512-byte or 4k writes from an RBD). There are
probably some pretty simple heuristics and tricks that can be done to
mitigate the most common patterns, but there is no simple solution since
the backends generally don't support partial value updates (I assume yours
doesn't either?). But, any work done here will benefit the other backends
too so that would be a win.
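
To make the small-overwrite cost concrete, here is a minimal sketch, assuming a backend that can only get or put whole values and a 1 MiB stripe value. The `WholeValueStore` class, the `overwrite` helper, and the key name `obj.0` are hypothetical and are not taken from KeyValueStore; the point is only that a 4 KiB overwrite turns into a full read and a full rewrite of the stripe.

```python
class WholeValueStore:
    """Toy key/value store that can only get or put whole values."""

    def __init__(self):
        self.data = {}
        self.bytes_read = 0
        self.bytes_written = 0

    def get(self, key):
        value = self.data.get(key, b"")
        self.bytes_read += len(value)
        return value

    def put(self, key, value):
        self.data[key] = value
        self.bytes_written += len(value)


def overwrite(store, key, offset, payload):
    """Apply a partial overwrite as a read-modify-write of the whole value."""
    old = store.get(key)
    if len(old) < offset:
        old = old + b"\0" * (offset - len(old))  # zero-fill any gap
    new = old[:offset] + payload + old[offset + len(payload):]
    store.put(key, new)


STRIPE = 1024 * 1024  # assumed 1 MiB stripe value
store = WholeValueStore()
store.put("obj.0", b"a" * STRIPE)
store.bytes_read = store.bytes_written = 0  # measure only the overwrite

overwrite(store, "obj.0", 8192, b"b" * 4096)
amplification = store.bytes_written / 4096
print(store.bytes_read, store.bytes_written, amplification)
```

Under these assumptions the 4 KiB write reads 1 MiB and writes 1 MiB, so the bytes written are 256 times the payload.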

sage
Somnath Roy
2014-09-12 01:46:25 UTC
Permalink
Makes perfect sense, Sage.

Regarding striping of file data, are you saying the key/value interface will do the following for me?

1. For an RBD image with a 4 MB object size, a write request coming to the key/value interface will be chunked (say, the full 4 MB object) into smaller pieces (of configurable size?) and striped across multiple key/value pairs?

2. Also, on read, it will take care of gathering the pieces and sending the data back?


Thanks & Regards
Somnath


Sage Weil
2014-09-12 01:54:59 UTC
Permalink
Precisely.
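
The chunk-on-write, gather-on-read behavior confirmed here can be sketched roughly as follows. This is an illustration only, not KeyValueStore's code: the dict standing in for the backend, the key format, and the names `stripe_key`, `write_object`, and `read_object` are all assumptions, as is the 1 MiB stripe size.

```python
STRIPE_SIZE = 1024 * 1024  # assumed, configurable stripe size


def stripe_key(name, index):
    # Zero-padded index so the keys of one object sort in offset order.
    return f"{name}.{index:08d}"


def write_object(kv, name, data, stripe_size=STRIPE_SIZE):
    """Split a full-object write into one key/value pair per stripe."""
    for index, start in enumerate(range(0, len(data), stripe_size)):
        kv[stripe_key(name, index)] = data[start:start + stripe_size]


def read_object(kv, name, offset, length, stripe_size=STRIPE_SIZE):
    """Gather the stripes covering [offset, offset + length) and join them.

    Assumes the object was written contiguously from offset 0.
    """
    if length <= 0:
        return b""
    first = offset // stripe_size
    last = (offset + length - 1) // stripe_size
    buf = b"".join(kv.get(stripe_key(name, i), b"") for i in range(first, last + 1))
    skip = offset - first * stripe_size
    return buf[skip:skip + length]


kv = {}  # stand-in for the key/value backend
payload = bytes(range(256)) * (4 * 1024 * 1024 // 256)  # one 4 MiB object
write_object(kv, "rbd_obj", payload)
chunk = read_object(kv, "rbd_obj", 1048000, 2000)  # read spanning two stripes
```

With these values the 4 MiB object lands in four keys, and the read that crosses the first stripe boundary is served by joining two of them.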

A smarter thing we might want to make it do in the future would be to take
a 4 KB write, create a new key that logically overwrites part of the
larger, say, 1MB key, and apply it on read. And maybe give up and rewrite
the entire 1MB stripe after too many small overwrites have accumulated.
Something along those lines to reduce the cost of small IOs to large
objects.
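
One way to picture this overlay idea is the sketch below. It is a toy model under stated assumptions, not a proposed implementation: `OverlayStripe`, the `.ov.` key suffix, and the `MAX_OVERLAYS` threshold are invented for illustration, overlay values are kept as Python tuples rather than serialized bytes, and every write is assumed to fit inside the stripe.

```python
STRIPE_SIZE = 1024 * 1024  # assumed stripe size
MAX_OVERLAYS = 4           # assumed threshold before a full rewrite


class OverlayStripe:
    """One stripe: a base value plus small overlay keys applied on read."""

    def __init__(self, kv, key):
        self.kv = kv
        self.key = key
        self.kv.setdefault(key, bytes(STRIPE_SIZE))
        self.overlays = []  # overlay key names, oldest first

    def write(self, offset, payload):
        # Store the small write under its own key instead of rewriting the base.
        okey = f"{self.key}.ov.{len(self.overlays)}"
        self.kv[okey] = (offset, bytes(payload))
        self.overlays.append(okey)
        if len(self.overlays) > MAX_OVERLAYS:
            self.compact()

    def read(self):
        # Start from the base value and replay the overlays in write order.
        buf = bytearray(self.kv[self.key])
        for okey in self.overlays:
            offset, payload = self.kv[okey]
            buf[offset:offset + len(payload)] = payload
        return bytes(buf)

    def compact(self):
        # Too many overlays: fold them into the base and drop the overlay keys.
        self.kv[self.key] = self.read()
        for okey in self.overlays:
            del self.kv[okey]
        self.overlays.clear()


kv = {}  # stand-in for the key/value backend
stripe = OverlayStripe(kv, "obj.0")
stripe.write(4096, b"x" * 4096)  # stored as a small overlay key
```

The trade-off is that each small write costs only a small put, while reads pay to replay the overlays until a compaction rewrites the stripe.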

sage
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Somnath Roy
2014-09-12 03:54:53 UTC
Permalink
Thanks, Sage.
Basically, we are doing similar chunking in our current implementation, which is derived from ObjectStore.
Moving to the key/value interface will save us from that :-)
Also, I was thinking we may want to do compression (and later maybe dedupe?) at that key/value layer as well.

Yes, partial reads/writes are definitely a performance killer for object stores, and ours is no exception. We need to see how we can counter that.

But I think these are reasons enough for me to move our implementation to the key/value interfaces.

Regards
Somnath


Sage Weil
2014-09-12 04:10:24 UTC
Permalink
Sounds good.

By the way, hopefully this is a pretty painless process of wrapping your
kv library with the KeyValueDB interface. If not, that will be good to
know. I'm hoping it will fit well with a broad range of backends, but so
far we've only done leveldb/rocksdb (same interface) and kinetic. I'd
like to see us try LMDB in this context as well...
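
As a rough picture of what wrapping a kv library behind a common interface involves, here is a sketch built around the two capabilities mentioned at the start of the thread, transactions and range queries. The `KVBackend` interface and its method names are hypothetical and are not Ceph's actual KeyValueDB API; `MemBackend` is an in-memory stand-in of the kind that could serve as a reference when testing a real backend.

```python
from abc import ABC, abstractmethod
from bisect import bisect_left, insort


class KVBackend(ABC):
    """Hypothetical minimal backend interface (not Ceph's KeyValueDB API)."""

    @abstractmethod
    def submit(self, ops):
        """Apply a batch of ("set", key, value) and ("rm", key) operations."""

    @abstractmethod
    def get(self, key):
        """Return the value for key, or None if absent."""

    @abstractmethod
    def scan(self, start, end):
        """Yield (key, value) pairs with start <= key < end, in key order."""


class MemBackend(KVBackend):
    """In-memory reference backend for comparison against a real one."""

    def __init__(self):
        self._data = {}
        self._keys = []  # sorted list of keys, used for range queries

    def submit(self, ops):
        # Reject unknown op names before changing anything.
        for op in ops:
            if op[0] not in ("set", "rm"):
                raise ValueError(f"unknown op {op[0]!r}")
        for op in ops:
            if op[0] == "set":
                _, key, value = op
                if key not in self._data:
                    insort(self._keys, key)
                self._data[key] = value
            elif op[1] in self._data:
                del self._data[op[1]]
                self._keys.pop(bisect_left(self._keys, op[1]))

    def get(self, key):
        return self._data.get(key)

    def scan(self, start, end):
        i = bisect_left(self._keys, start)
        while i < len(self._keys) and self._keys[i] < end:
            yield self._keys[i], self._data[self._keys[i]]
            i += 1


db = MemBackend()
db.submit([("set", "b", b"2"), ("set", "a", b"1"), ("set", "c", b"3")])
first_two = list(db.scan("a", "c"))
```

A wrapper for a real library would implement the same three methods on top of that library's own batch and iterator calls, and the two backends could then be driven with the same operations and compared.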

sage
Allen Samuels
2014-09-12 05:48:05 UTC
Permalink
Another thing we're looking into is compression. The intersection of compression and object striping (fracturing) is interesting. Is the striping variable on a per-object basis?
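
One possible shape for that intersection, sketched under assumptions: compress each stripe on its own, so a read only has to decompress the stripes it touches. The function names, key format, and 64 KiB stripe size below are illustrative choices and do not describe anything KeyValueStore does; the sketch uses Python's standard `zlib` module.

```python
import zlib

STRIPE_SIZE = 64 * 1024  # assumed stripe size, kept small for the example


def write_compressed(kv, name, data, stripe_size=STRIPE_SIZE):
    """Compress each stripe on its own so stripes stay independently readable."""
    for index, start in enumerate(range(0, len(data), stripe_size)):
        kv[f"{name}.{index:08d}"] = zlib.compress(data[start:start + stripe_size])


def read_stripe(kv, name, index):
    """Decompress a single stripe without touching its neighbours."""
    return zlib.decompress(kv[f"{name}.{index:08d}"])


kv = {}  # stand-in for the key/value backend
data = b"abcd" * (64 * 1024)  # 256 KiB of highly compressible data
write_compressed(kv, "obj", data)
third = read_stripe(kv, "obj", 2)
stored = sum(len(v) for v in kv.values())
```

Stripes are still addressed by their uncompressed offset, but the stored values become variable-length, which is part of what makes the interaction with stripe size interesting.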

Allen Samuels
Chief Software Architect, Emerging Storage Solutions

951 SanDisk Drive, Milpitas, CA 95035
T: +1 408 801 7030| M: +1 408 780 6416
***@SanDisk.com

Haomai Wang
2014-09-12 02:24:54 UTC
Permalink
Post by Somnath Roy
Makes perfect sense, Sage.
Regarding striping of file data, are you saying the key/value interface will do
the following for me?
1. For an RBD image with a 4 MB object size, a write request coming to the
key/value interface will be chunked (say, the full 4 MB object) into smaller
pieces (of configurable size?) and striped across multiple key/value pairs?
Yes, and the stripe size can be configured.
Post by Somnath Roy
2. Also, on read, it will take care of gathering the pieces and sending the data back?
Do you have any other ideas? By the way, could you tell us more about your
key/value interface? I'm doing some work on an NVMe interface with an Intel NVMe
SSD.
--
Best Regards,

Wheat
Haomai Wang
2014-09-12 02:27:46 UTC
Permalink
Post by Somnath Roy
Make perfect sense Sage..
Regarding striping of filedata, You are saying KeyValue interface will do the following for me?
1. Say in case of rbd image of order 4 MB, a write request coming to Key/Value interface, it will chunk the object (say full 4MB) in smaller sizes (configurable ?) and stripe it as multiple key/value pair ?
Yes, and the stripe size can be configurated.
Post by Somnath Roy
2. Also, while reading it will take care of accumulating and send it back.
Do you have any other idea? By the way, could you tell more about your
key/value interface. I'm doing some jobs for NVMe interface with intel
NVMe SSD.
Post by Somnath Roy
Thanks & Regards
Somnath
-----Original Message-----
Sent: Thursday, September 11, 2014 6:31 PM
To: Somnath Roy
Subject: Re: Regarding key/value interface
Hi Somnath,
Post by Somnath Roy
Hi Sage/Haomai,
If I have a key/value backend that supports transactions and range queries
(and I don't need any explicit caching etc.) and I want to replace
filestore (and leveldb omap) with it, which interface do you recommend
I derive from: ObjectStore directly, or KeyValueDB?
I have already integrated this backend by deriving from the ObjectStore
interfaces earlier (pre key/value interface days) but have not tested it
thoroughly enough to see what functionality is broken (basic
functionality of RGW/RBD is working fine).
Basically, I want to know what the advantages (and disadvantages)
of deriving it from the new key/value interfaces are.
Also, what state is it in? Is it feature complete, supporting all
the ObjectStore interfaces such as clone?
- striping of file data across keys
- efficient clone
- a zillion smaller methods that aren't conceptually difficult to implement but are tedious to do.
The other nice thing about reusing this code is that you can use a leveldb or rocksdb backend as a reference for testing or performance or whatever.
The main thing that will be a challenge going forward, I predict, is making storage of the object byte payload in key/value pairs efficient. I think KeyValueStore is doing some simple striping, but it will suffer for small overwrites (like 512-byte or 4k writes from an RBD). There are probably some pretty simple heuristics and tricks that can be done to mitigate the most common patterns, but there is no simple solution since the backends generally don't support partial value updates (I assume yours doesn't either?). But any work done here will benefit the other backends too, so that would be a win.
sage
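
To make the small-overwrite cost concrete: if a backend cannot update part of a value, a 512-byte write into a 64 KiB stripe has to read the whole stripe, patch it, and write all 64 KiB back, which is 128 times the payload. The sketch below shows that read-modify-write path over a plain dict. The key layout, the function names, and the stripe sizes are assumptions for illustration, not how KeyValueStore is implemented.

```python
def stripe_key(object_name, index):
    # Hypothetical key layout, one key per stripe.
    return f"{object_name}/{index:08d}"


def overwrite(store, object_name, offset, payload, stripe_size):
    """Apply a partial overwrite without partial value updates.

    Every touched stripe is read in full, patched, and written back in full.
    Returns the number of bytes written back to the store.
    """
    rewritten = 0
    pos = 0
    while pos < len(payload):
        index, within = divmod(offset + pos, stripe_size)
        n = min(stripe_size - within, len(payload) - pos)
        key = stripe_key(object_name, index)
        old = store.get(key, b"")
        # Zero-fill if the existing stripe is shorter than the end of the write.
        buf = bytearray(old.ljust(within + n, b"\0"))
        buf[within:within + n] = payload[pos:pos + n]
        store[key] = bytes(buf)
        rewritten += len(buf)
        pos += n
    return rewritten
```

A smaller stripe size lowers this cost for small writes but increases the number of keys per object, which is the trade-off the heuristics mentioned above would need to balance.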
________________________________
--
Best Regards,

Wheat
Somnath Roy
2014-09-12 04:37:00 UTC
Permalink
Hi Haomai,
Replies inline.

Thanks & Regards
Somnath

-----Original Message-----
From: Haomai Wang [mailto:***@gmail.com]
Sent: Thursday, September 11, 2014 7:28 PM
To: Somnath Roy
Cc: Sage Weil; ceph-***@lists.ceph.com; ceph-***@vger.kernel.org
Subject: Re: Regarding key/value interface
Post by Somnath Roy
Makes perfect sense, Sage.
Regarding striping of file data, are you saying the KeyValue interface will do the following for me?
1. Say, for an rbd image of order 4 MB, when a write request comes to the Key/Value interface, it will chunk the object (say the full 4 MB) into smaller sizes (configurable?) and stripe it as multiple key/value pairs?
Yes, and the stripe size can be configured.

[Somnath] That's great, thanks
Post by Somnath Roy
2. Also, while reading, it will take care of accumulating the stripes and sending them back?
Do you have any other ideas?

[Somnath] No, I was just asking

By the way, could you tell me more about your key/value interface? I'm doing some work on an NVMe interface with an Intel NVMe SSD.

[Somnath] It has the following interfaces:

1. Init and shutdown
2. A container concept
3. Read/write objects, delete objects, enumerate objects, multi put/get support
4. Transaction semantics
5. Range query support
6. Container-level snapshots
7. Statistics

Let me know if you need any specifics.
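
For readers trying to picture the feature list above, here is a toy in-memory stand-in with one method per listed capability. It is only a sketch: the class name, every method name, and the semantics chosen (for example, half-open key ranges and all-or-nothing commits) are assumptions for illustration and say nothing about the real backend's API.

```python
class ToyBackend:
    """In-memory illustration; all names here are hypothetical."""

    def __init__(self):                        # 1. init (shutdown would release resources)
        self.containers = {}
        self.stats = {"puts": 0, "gets": 0, "deletes": 0}   # 7. statistics

    def create_container(self, name):          # 2. container concept
        self.containers.setdefault(name, {})

    def put(self, container, key, value):      # 3. write an object
        self.containers[container][key] = value
        self.stats["puts"] += 1

    def get(self, container, key):             # 3. read an object
        self.stats["gets"] += 1
        return self.containers[container][key]

    def delete(self, container, key):          # 3. delete an object
        del self.containers[container][key]
        self.stats["deletes"] += 1

    def multi_put(self, container, items):     # 3. multi put
        for key, value in items.items():
            self.put(container, key, value)

    def commit(self, container, ops):          # 4. transaction: all or nothing
        staged = dict(self.containers[container])
        for op in ops:
            if op[0] == "put":
                staged[op[1]] = op[2]
            elif op[0] == "delete":
                del staged[op[1]]              # KeyError aborts before anything is visible
            else:
                raise ValueError(f"unknown op {op[0]!r}")
        self.containers[container] = staged    # publish the staged state in one step

    def range_query(self, container, start, end):   # 5. keys in [start, end), sorted
        data = self.containers[container]
        return [(k, data[k]) for k in sorted(data) if start <= k < end]

    def snapshot(self, container):             # 6. container-level snapshot (a copy)
        return dict(self.containers[container])
```

A backend with roughly these capabilities covers what striping needs: a transaction to write all stripes of an object together and a range query to read them back in order.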

Thanks & Regards
Somnath