Discussion:
set_alloc_hint old osds
Samuel Just
2014-09-11 20:19:58 UTC
http://tracker.ceph.com/issues/9419

librbd unconditionally sends set_alloc_hint. Do we require that users
upgrade the osds first? Also, should the primary respond with
ENOTSUPP if any replicas don't support it?
-Sam
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Gregory Farnum
2014-09-11 20:30:16 UTC
Post by Samuel Just
http://tracker.ceph.com/issues/9419
librbd unconditionally sends set_alloc_hint. Do we require that users
upgrade the osds first? Also, should the primary respond with
ENOTSUPP if any replicas don't support it?
Something closer to the second option, I think...but then you run into
the problem where maybe the PG gets moved from a set of new OSDs to a
set of old ones that don't support the op. :/ I think for anything
that goes to disk you need to go through a full features-in-the-osdmap
process like we did for erasure coding.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
Samuel Just
2014-09-11 20:33:48 UTC
That part is harmless: the transaction would be recreated for the new
acting set, taking into account the new acting set's features. It
doesn't have any actual effect on the contents of the object.
-Sam
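
To illustrate the point above, here is a minimal sketch of rebuilding a
transaction against whatever acting set is current. It assumes the hint
is simply left out when any member lacks the feature. The names
FEATURE_ALLOC_HINT, Osd, and build_transaction are hypothetical and do
not correspond to Ceph's actual code.

```python
from dataclasses import dataclass

# Hypothetical feature bit, used only for this illustration.
FEATURE_ALLOC_HINT = 1 << 0


@dataclass(frozen=True)
class Osd:
    osd_id: int
    features: int  # bitmask of features this OSD advertises


def common_features(acting):
    """Bitwise AND of the feature masks of every OSD in the acting set."""
    mask = ~0
    for osd in acting:
        mask &= osd.features
    return mask


def build_transaction(client_ops, acting):
    """Rebuild the op list for the given acting set.

    The alloc hint is dropped when any member lacks the feature; all
    other ops pass through unchanged.
    """
    feats = common_features(acting)
    txn = []
    for op in client_ops:
        if op == "set_alloc_hint" and not feats & FEATURE_ALLOC_HINT:
            continue  # an older OSD is present, so omit the hint
        txn.append(op)
    return txn


new_osds = [Osd(0, FEATURE_ALLOC_HINT), Osd(1, FEATURE_ALLOC_HINT)]
mixed_osds = [Osd(0, FEATURE_ALLOC_HINT), Osd(2, 0)]
ops = ["set_alloc_hint", "write"]

print(build_transaction(ops, new_osds))
print(build_transaction(ops, mixed_osds))
```

Because the transaction is derived from the current acting set each
time, a PG that moves to older OSDs gets a transaction without the hint.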
Gregory Farnum
2014-09-11 20:40:03 UTC
Does the hint not go into the pg log, where it could be retried on an older OSD?
Samuel Just
2014-09-11 20:46:13 UTC
No, we don't put the transaction into the pg log.
-Sam
Gregory Farnum
2014-09-11 21:05:13 UTC
Oh, in that case the peers could just share their supported ops with
the primary or something (like we do with mon commands). That sounds
good to me, anyway?
-Greg
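
A rough sketch of that idea, assuming each peer reports the set of op
names it understands and the primary keeps the intersection. The
function name supported_by_all and the shape of the peer reports are
made up for illustration; nothing here reflects an existing Ceph
interface.

```python
def supported_by_all(primary_ops, peer_reports):
    """Return the ops understood by the primary and by every peer.

    primary_ops:  set of op names the primary itself supports
    peer_reports: dict mapping peer OSD id -> set of op names it reported
    """
    common = set(primary_ops)
    for ops in peer_reports.values():
        common &= set(ops)
    return common


# Hypothetical reports: peer 2 is an older OSD without set_alloc_hint.
peer_reports = {
    1: {"read", "write", "set_alloc_hint"},
    2: {"read", "write"},
}
common = supported_by_all({"read", "write", "set_alloc_hint"}, peer_reports)
print(sorted(common))
```

The primary would recompute this whenever the acting set changes, so
the result always describes the OSDs that will actually apply the op.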
Samuel Just
2014-09-11 21:21:17 UTC
Yeah, so that's part of it. The larger question is whether it's ok
for the client to indiscriminately send that op in the first place.
-Sam
Ilya Dryomov
2014-09-12 08:15:26 UTC
Post by Samuel Just
Yeah, so that's part of it. The larger question is whether it's ok
for the client to indiscriminately send that op in the first place.
FWIW, I think it's got to be. We don't control all the clients, and
I believe I mentioned this to Sage or Josh a while back. We set FAILOK
to make older OSDs ignore the alloc hint op, but of course that
doesn't help if it's (one of) the replica OSDs that is older. When
merging alloc hint, it was understood that if there are any older OSDs
in the acting set they will crash in FileStore, but nothing was done
about it.

The full feature bit sounded like overkill, especially given that
alloc hint doesn't affect the data layout, older OSDs can still read
and write fine, and after all it's just a hint. Having the primary
return -EOPNOTSUPP based on lists of supported ops sounds to me like
a good idea, both for the alloc hint op and for future ops.

Thanks,

Ilya
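
As a sketch of how the two pieces could fit together, assume the primary
already knows which ops every OSD in the acting set supports. An
unsupported op then yields -EOPNOTSUPP, unless the client marked it with
FAILOK, in which case it is skipped and the remaining ops still run.
FLAG_FAILOK and run_ops are hypothetical names for this illustration,
not Ceph's real flag constant or code path.

```python
import errno

# Hypothetical stand-in for the per-op FAILOK flag mentioned above.
FLAG_FAILOK = 1 << 0


def run_ops(ops, acting_set_ops):
    """Decide the outcome of a client request on the primary.

    ops:            list of (op_name, flags) tuples from the client
    acting_set_ops: op names supported by every OSD in the acting set
    Returns (result, applied): result is 0 or a negative errno, and
    applied lists the ops that would go into the transaction.
    """
    applied = []
    for name, flags in ops:
        if name not in acting_set_ops:
            if flags & FLAG_FAILOK:
                continue  # failure is tolerated, so skip only this op
            return -errno.EOPNOTSUPP, []
        applied.append(name)
    return 0, applied


old_acting_set_ops = {"read", "write"}

# librbd-style request: the hint carries FAILOK, the write does not.
print(run_ops([("set_alloc_hint", FLAG_FAILOK), ("write", 0)],
              old_acting_set_ops))

# Without FAILOK the whole request is rejected.
print(run_ops([("set_alloc_hint", 0), ("write", 0)], old_acting_set_ops))
```

With this shape the replicas never see an op they cannot decode, which
is the crash case described above.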