[ceph-users] RADOS pool snaps and RBD
Sage Weil
2014-10-21 22:46:03 UTC
Hi Xavier,

[Moving this to ceph-devel]
Hi Sage,
Yes, I know about rbd diff, but the motivation behind this was to be
able to dump an entire RBD pool via RADOS to another cluster, as our
primary cluster uses quite expensive SSD storage and we would like to
avoid constantly keeping one snapshot for every RBD image.
The idea would be to use the mtime of an object to determine whether it
changed after the last backup; if it changed, we just dump it to the
backup cluster. I understand it would consume more space than an rbd
snapshot diff, since we would dump the whole object rather than just the
new data, but on the other hand the extra space would only be used on
the destination cluster, which uses cheap rotational disks. In effect it
would be a sort of RADOS-level incremental backup for RBD pools.
So far we have found a way to do it; the only issue is that the data
would change during the dump, as the RBD images are in use. That is why
we wanted to use rados pool snapshots, but since pool snapshots and RBD
snapshots are mutually exclusive, this could be a problem.
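The mtime selection described above can be sketched as pure logic. Everything here (`select_changed`, the object names, the timestamps) is hypothetical; in a real implementation the (name, mtime) pairs would come from a librados object listing plus a per-object stat:

```python
from datetime import datetime, timezone

def select_changed(objects, last_backup):
    """Pick the objects modified since the last backup pass.

    `objects` is an iterable of (name, mtime) pairs, standing in for
    what a librados object listing plus per-object stat would return;
    `last_backup` is the timestamp of the previous backup.
    """
    return [name for name, mtime in objects if mtime > last_backup]

# Toy listing standing in for a pool's RBD data objects.
last = datetime(2014, 10, 20, tzinfo=timezone.utc)
listing = [
    ("rbd_data.1234.0000000000000000",
     datetime(2014, 10, 21, tzinfo=timezone.utc)),  # written after the backup
    ("rbd_data.1234.0000000000000001",
     datetime(2014, 10, 19, tzinfo=timezone.utc)),  # untouched since then
]
changed = select_changed(listing, last)
```

Only the first object would then be shipped to the backup cluster; anything older than the last pass is skipped.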
Right. For RBD you need to snapshot each image independently.
After playing a little with RBD snapshots I realized we could use RBD
snapshots instead of rados pool snapshots to get a consistent copy of an
image, but I can't find a way to retrieve the original object after the
RBD snapshot using the rados command.
Yeah, the CLI command won't do it. You need to use the C, C++, or Python
API directly. Basically, you need to map the data object back to the RBD
header (which you can do by looking for the rbd_header object with the
same fingerprint as the data block) to get the list of snaps for that
image. You use that to construct a snap context to read the right snap...

...but, that is all done for you by librbd. Just (using the C/C++/Python
librbd bindings) list the images in the pool, and for each image open a
handle for the right snapshot, and export that way. It will probably even
perform better since the IO load is spread across the cluster instead of
traversing each PG in order...
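A rough sketch of that per-image export loop, assuming an image handle opened with the Python librbd bindings (`rbd.Image(ioctx, name, snapshot=...)`): the copy helper below only relies on the `size()` and `read(offset, length)` methods such a handle provides, and `export_image` itself is a made-up name:

```python
def export_image(img, out, chunk=4 << 20):
    """Stream an opened (snapshot) image into a writable file-like
    object in fixed-size chunks (4 MiB, the default RBD object size)."""
    size = img.size()
    off = 0
    while off < size:
        data = img.read(off, min(chunk, size - off))
        out.write(data)
        off += len(data)
    return off  # bytes copied
```

With the real bindings this would be driven by something like iterating `rbd.RBD().list(ioctx)` and opening each image at the backup snapshot; the snapshot name and destination are up to the backup tool.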

As I understand it, RBD uses RADOS object snapshots to provide snapshots
for the RADOS objects that make up an RBD image. Here is the output of
rados listsnaps for some of an image's data objects after an RBD
snapshot (only one of them has a clone):

  cloneid  snaps  size     overlap
  head     -      4194304

  cloneid  snaps  size     overlap
  8        8      4194304  [0~3145728,3149824~1044480]
  head     -      4194304

  cloneid  snaps  size     overlap
  head     -      4194304

So I need a way to retrieve the "head" (which, and maybe I'm totally
wrong, I understand should be the original object before the snapshot)
to dump it to another Ceph cluster, preferably using a command-line
tool, as I'm trying to prototype something in shell script.
How RBD works (rbd_directory, rbd_header, omap values, etc.) seems
pretty clear, but it seems RBD is using some kind of RADOS object-level
snapshots, and I could not find documentation about that feature.
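For what it's worth, the object-level behavior visible in that listsnaps output can be mimicked with a toy resolver. This is a deliberate simplification of the OSD's snap-context handling and every name here is invented: each clone records the snap ids it serves, and a read at a snap id falls through to the head when no clone covers it.

```python
HEAD = -1  # sentinel for the live object ("head" in listsnaps output)

def resolve(clones, snapid):
    """Pick which clone serves a read at `snapid`.

    `clones` is a list of (cloneid, snaps) pairs, oldest first,
    mirroring `rados listsnaps` output. A read at HEAD, or at a snap
    id newer than every clone, is served by the head object.
    """
    if snapid != HEAD:
        for cloneid, snaps in clones:
            if snapid in snaps:
                return cloneid
    return HEAD

# One clone (id 8) covering snap 8, as in the listsnaps output above.
clones = [(8, {8})]
```

So a read of the image's snapshot lands on clone 8, while a plain read (and a plain `rados get`) is served by the head.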
Best regards,
Xavier Trilla P.
Silicon Hosting

Did you know that SiliconHosting now answers
your technical questions for free?
More information at: siliconhosting.com/qa/
-----Original Message-----
Sent: Tuesday, October 21, 2014 05:45
To: Xavier Trilla
Subject: Re: [ceph-users] RADOS pool snaps and RBD
It seems Ceph doesn't allow rados pool snapshots on RBD pools which have or had RBD snapshots. They only work on RBD pools which never had a RBD snapshot.
On a fresh pool, a rados pool snapshot works:

  rados mkpool test-pool 1024 1024 replicated
  rbd -p test-pool create --size=102400 test-image
  ceph osd pool mksnap test-pool rados-snap

But once an RBD snapshot has been created on the pool, the pool
snapshot fails:

  rados mkpool test-pool 1024 1024 replicated
  rbd -p test-pool create --size=102400 test-image
  rbd -p test-pool snap create
  Error EINVAL: pool test-pool is in unmanaged snaps mode
I've been checking the source code and it seems to be the expected behavior, but I did not manage to find any information about this "unmanaged snaps mode". I also did not find any mention of RBD snapshots and pool snapshots being mutually exclusive. Even deleting all the RBD snapshots in a pool does not re-enable RADOS pool snapshots.
- Are RBD and RADOS snapshots mutually exclusive?
Xinxin already mentioned this, but to confirm, yes.
- What does the "unmanaged snaps mode" message mean?
It means the librados user is managing its own snapshot metadata. In this case, that's RBD; it stores information about which snapshots apply to which images in the RBD header object.
- Is there any way to revert a pool status to allow RADOS pool snapshots after all RBD snapshots are removed?
We are designing a quite interesting way to perform incremental
backups of RBD pools managed by OpenStack Cinder. The idea is to do
the incremental backup at the RADOS level: use the mtime property of
each object and compare it against the time of the last backup / pool
snapshot. That way it should be really easy to find the modified
objects and transfer only those, making it easier to implement a DR
solution. But the issue explained here would be a big problem, as the
backup solution would stop working as soon as a single user creates an
RBD snapshot on the pool (for example, via Cinder Backup).
This is already possible using the export-diff and import-diff functions of RBD at per-image granularity. I think the only thing it doesn't provide is the ability to build a consistency group of many images and snapshot them together.
Note also that listing all objects to find the changed ones is not very efficient. The export-diff function is currently also not very efficient (it enumerates image objects), but the 'object map' changes that Jason is working on for RBD will fix this and make it quite fast.
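To make the granularity point concrete, here is a toy chunk-level diff in the spirit of export-diff. This is not the real export-diff format or algorithm, just an illustration of shipping only changed extents; the function name and the tiny chunk size are made up:

```python
def chunk_diff(old, new, chunk=4):
    """Return [(offset, data)] extents where `new` differs from `old`,
    compared in fixed-size chunks (a tiny chunk size for illustration;
    RBD would work at its 4 MiB object size)."""
    diff = []
    for off in range(0, len(new), chunk):
        if old[off:off + chunk] != new[off:off + chunk]:
            diff.append((off, new[off:off + chunk]))
    return diff

old = b"aaaabbbbcccc"
new = b"aaaaBBBBcccc"
delta = chunk_diff(old, new)
```

The real export-diff stream also encodes snapshot boundaries and image metadata; the point here is only that the delta is a list of (offset, data) extents rather than a copy of the full image.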
I hope somebody can give us more information about this "unmanaged
snaps mode", or point us to a way to revert this behavior once all RBD
snapshots have been removed from a pool.
Best regards,
Xavier Trilla P.
Silicon Hosting