Discussion:
Regarding key/value interface
Somnath Roy
2014-10-02 22:47:40 UTC
Permalink
Hi Sage/Haomai,

I was going through the key/value store implementation and have one basic question regarding the way it is designed.

I think the key/value interface assumes there will be a filesystem on top of the device. I saw that mount accesses files like superblock/fsid. So, for example, /var/lib/ceph/osd/ceph-0 should be a filesystem path, right?
If so, this may not always be the case, as there are key/value stores that can work on a raw device. In that case, these files (superblock/fsid) would also need to go into the key/value db.
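
To illustrate the idea of keeping superblock/fsid inside the key/value db instead of in files, here is a minimal sketch. It is not the actual Ceph code: the `DictKV` class, the `META_PREFIX` value, and the helper names are all hypothetical, and a plain dict stands in for a backend that works on a raw device.

```python
import uuid

# Hypothetical reserved key prefix for store-level metadata (an assumption,
# not a name taken from Ceph).
META_PREFIX = "__meta__/"


class DictKV:
    """In-memory stand-in for a key/value backend that needs no filesystem."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        # Returns None when the key has never been written.
        return self._data.get(key)


def write_meta(db, fsid, superblock):
    """Store what would otherwise be the fsid and superblock files as keys."""
    db.put(META_PREFIX + "fsid", str(fsid).encode())
    db.put(META_PREFIX + "superblock", superblock)


def read_fsid(db):
    """Read the fsid back at mount time; None means the store is uninitialised."""
    raw = db.get(META_PREFIX + "fsid")
    return uuid.UUID(raw.decode()) if raw is not None else None
```

With this layout, mount would only need a handle to the backend, not a filesystem path, to find the fsid.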

Let me know if I am missing anything.

Thanks & Regards
Somnath



________________________________

PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Haomai Wang
2014-10-03 05:25:26 UTC
Permalink
Correct, maybe we can move this super metadata into the backend!
--
Best Regards,

Wheat
Varada Kari
2014-10-03 10:02:09 UTC
Permalink
I am not sure whether RocksDB/LevelDB can work on a raw device. When I looked at the code, they were writing to a mount point/directory.

Varada

Sage Weil
2014-10-03 15:03:10 UTC
Permalink
Post by Varada Kari
I am not sure, if Rocksdb/LevelDB can work on a raw device. When I
looked at code they were doing write to mount point/directory.
Yeah. But as Somnath points out, others will take a raw device.

I think the main challenge there will be that there is some miscellaneous
stuff that Ceph stashes in those directories to bootstrap OSDs. Mainly
there's the keyring and a 'done' file. Probably we should add a small
file that simply names the backend so that the OSD can start up with an
existing store despite a change in ceph.conf.

Somnath, I don't think this is particularly problematic, though. The dir
can remain and contain a symlink to the raw device.
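
A rough sketch of that arrangement, assuming the OSD directory holds a small file naming the backend plus a symlink to the raw device. The file name `backend`, the link name `block`, and both function names are hypothetical choices for illustration, not existing Ceph conventions.

```python
import os


def prepare_osd_dir(osd_dir, backend_name, raw_device):
    """Record the backend name and point a symlink at the raw device."""
    os.makedirs(osd_dir, exist_ok=True)

    # Small file that names the backend, so the OSD can start with an
    # existing store even if ceph.conf later changes.
    with open(os.path.join(osd_dir, "backend"), "w") as f:
        f.write(backend_name + "\n")

    # Replace any stale link before creating the new one.
    link = os.path.join(osd_dir, "block")
    if os.path.lexists(link):
        os.remove(link)
    os.symlink(raw_device, link)


def load_osd_dir(osd_dir):
    """Return (backend name, resolved device path) for an existing OSD dir."""
    with open(os.path.join(osd_dir, "backend")) as f:
        backend = f.read().strip()
    device = os.path.realpath(os.path.join(osd_dir, "block"))
    return backend, device
```

For example, `prepare_osd_dir("/var/lib/ceph/osd/ceph-0", "keyvaluestore", "/dev/foo")` would leave the keyring and 'done' file where they are today and only add the two new entries.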

If we want to have hot-swappability, maybe it's possible to carve off a
tiny partition on the device? If that doesn't work, we'll have to get
more creative (like teach ceph-disk how to interact with the raw device
:/).

sage
Allen Samuels
2014-10-03 16:05:07 UTC
Permalink
What would be good is a set of best practices so that if you need to move the media to another location (logical or physical, i.e., different slot, different address, different server, etc.), all of the needed support files travel along together. Some seat belts for this would be a really good thing.

Allen Samuels
Chief Software Architect, Emerging Storage Solutions

951 SanDisk Drive, Milpitas, CA 95035
T: +1 408 801 7030| M: +1 408 780 6416
***@SanDisk.com


Somnath Roy
2014-10-03 22:15:36 UTC
Permalink
Sage,
Ideally all the files should go into the key/value db (for better portability), but yes, I think we can live with the small partition on the drive that you mentioned for the bootstrap files, plus a symlink under the current directory pointing to the other raw partition on the disk for the key/value db to use.
But ceph-disk needs to take care of these things during installation. Is anybody looking into that part?

Thanks & Regards
Somnath

Sage Weil
2014-10-03 22:35:10 UTC
Permalink
Post by Somnath Roy
Sage,
Ideally all the files should go into key/value db (for better
portability purpose) but yes, I think we can live with the small
partition as you mentioned in the drive for the bootstrap files and
creating a sym link under current directory pointing to the other RAW
partition on the disk for key/value db to use.
Cool.
Post by Somnath Roy
But, ceph-disk needs to take care of these things during installation.
Is anybody looking into that part ?
Not yet. I think the high-level goal should be to maintain the basic usage
of ceph-disk, i.e.,

ceph-disk prepare /dev/foo

Then we'd need to teach ceph-disk about the various ways that
it needs to prepare the device, like what partitions to create and how
big they should be. With the journal-skipping behavior Haomai just
added we're calling into ceph-osd to ask the backend what it wants. I
think that model is probably the most flexible. The question is what
ceph-disk should do then...

1) small partition for metadata, second partition used directly by the
backend library
2) one big partition

For 2, we'd need some way for ceph-disk and other tools to get at the
metadata (osd uuid, ceph auth keys, whoami file, etc.). I'm not sure it's
worth the hassle if it doesn't break your backend to carve off a tiny
partition for that...
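
The two options can be sketched as a layout decision made after asking the backend what it wants. This is only an illustration: the `Partition` type, the `plan_layout` function, the `split_metadata` flag, and the 100 MB metadata size are all assumptions, not ceph-disk behavior.

```python
from dataclasses import dataclass

# Assumed size of the tiny metadata partition; the thread gives no number.
META_SIZE_MB = 100


@dataclass
class Partition:
    name: str
    size_mb: int
    use: str


def plan_layout(device_mb, split_metadata):
    """Return the partitions to create for a device of device_mb megabytes.

    split_metadata=True  -> option 1: small metadata partition, second
                            partition used directly by the backend library.
    split_metadata=False -> option 2: one big partition; the metadata then
                            has to live inside the backend itself.
    """
    if split_metadata:
        if device_mb <= META_SIZE_MB:
            raise ValueError("device too small for a metadata partition")
        return [
            Partition("meta", META_SIZE_MB, "filesystem for bootstrap files"),
            Partition("data", device_mb - META_SIZE_MB, "raw, owned by backend"),
        ]
    return [Partition("data", device_mb, "raw, metadata stored in backend")]
```

Under option 1, tools such as ceph-disk could keep reading the osd uuid, auth keys, and whoami file from the small filesystem without knowing anything about the backend.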

sage
Varada Kari
2014-10-03 10:11:30 UTC
Permalink
I am not sure whether RocksDB/LevelDB can work on a raw device. When I looked at the code, they were writing to a mount point/directory.

Varada
