Guang Yang
2014-08-19 06:30:11 UTC
Hi ceph-devel,
David (cc'ed) reported a bug (http://tracker.ceph.com/issues/9128) which
we came across in our test cluster during failure testing. To reproduce
it, leave one OSD daemon down and in for a day while keeping write
traffic going. When the OSD daemon was started again, it hit the suicide
timeout and killed itself.
After some analysis (details in the bug), David found that the op thread
was busy searching for missing objects. As the volume to search
increases, the thread is expected to run for a correspondingly long
time. Please refer to the bug for detailed logs.
One simple fix is to let the op thread reset the suicide timeout
periodically while it is doing long-running work. Another fix might be
to cut the work into smaller pieces?
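To make the two options concrete, here is a rough sketch of how they
combine: scan in chunks and reset the timeout after each chunk. This is
not Ceph code. Heartbeat, SuicideTimeout, scan_in_chunks, the grace
value and the chunk size are all hypothetical names made up for
illustration, and the real watchdog check runs in a separate thread
rather than inline as it does here.

```python
import time


class SuicideTimeout(Exception):
    """Raised when a worker went longer than `grace` without a reset."""


class Heartbeat:
    """Toy stand-in for a per-thread watchdog handle (hypothetical)."""

    def __init__(self, grace, clock=time.monotonic):
        self.grace = grace
        self.clock = clock
        self.deadline = clock() + grace

    def reset(self):
        # The worker reports progress; push the deadline forward.
        self.deadline = self.clock() + self.grace

    def is_healthy(self):
        # What a watchdog would check: has the deadline passed?
        return self.clock() <= self.deadline


def scan_in_chunks(objects, is_missing, hb, chunk_size):
    """Search `objects` for missing ones, one chunk at a time.

    After each chunk the heartbeat is checked and then reset, so the
    timeout only has to cover one chunk, not the whole scan.
    """
    missing = []
    for start in range(0, len(objects), chunk_size):
        chunk = objects[start:start + chunk_size]
        missing.extend(obj for obj in chunk if is_missing(obj))
        if not hb.is_healthy():
            raise SuicideTimeout("chunk took longer than the grace period")
        hb.reset()
    return missing
```

With a chunk size small enough to finish within the grace period, the
scan can run for as long as it needs. Passing the whole list as a single
chunk behaves like the current code and trips the timeout once the
volume to search is large enough.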
Any suggestions are welcome.
Thanks,
Guang
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html