Discussion:
ms_tcp_nodelay no effect on kernel-rbd
Chaitanya Huilgol
2014-09-04 11:39:38 UTC
Permalink
Hi,

In our benchmarking tests we observed that the ms_tcp_nodelay option in ceph.conf has no effect on the kernel rbd client, and as expected we see poor latency numbers at lower queue depths for 4K random reads. Latency increases significantly from qd=2 up to qd=24 and then starts tapering off at higher queue depths.
We did not find a kernel_setsockopt() call with TCP_NODELAY in the kernel RBD/libceph (messenger.c) source. Unless we are missing something, it looks like the kernel RBD client currently does not set this, and that is hurting latency numbers at lower queue depths.
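
For comparison, the call we expected to find in net/ceph/messenger.c would look roughly like the sketch below (the helper name is made up, this is not actual libceph code):

    #include <linux/net.h>      /* kernel_setsockopt(), struct socket */
    #include <linux/socket.h>   /* SOL_TCP */
    #include <linux/tcp.h>      /* TCP_NODELAY */

    /* Hypothetical helper: disable Nagle's algorithm on a messenger
     * socket, the in-kernel equivalent of a userspace
     * setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, ...). */
    static int ceph_tcp_set_nodelay(struct socket *sock)
    {
            int optval = 1;

            return kernel_setsockopt(sock, SOL_TCP, TCP_NODELAY,
                                     (char *)&optval, sizeof(optval));
    }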

I have tested with userspace fio (rbd engine) and rados bench, and we see similar latency behavior when ms_tcp_nodelay is set to false. However, setting it to true gives consistently low latency numbers at all queue depths.
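
For reference, the knob we toggled for the userspace runs is the messenger option in ceph.conf:

    [global]
            ; disable Nagle's algorithm on messenger TCP sockets
            ; (userspace default is true)
            ms tcp nodelay = true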

Any ideas/thoughts on this?

OS Ubuntu 14.04
Kernel: 3.13.0-24-generic #46-Ubuntu SMP
Ceph: Latest Master

Regards,
Chaitanya

Ilya Dryomov
2014-09-04 13:50:06 UTC
Permalink
On Thu, Sep 4, 2014 at 3:39 PM, Chaitanya Huilgol
Post by Chaitanya Huilgol
Hi,
In our benchmarking tests we observed that the ms_tcp_nodelay option in ceph.conf has no effect on the kernel rbd client, and as expected we see poor latency numbers at lower queue depths for 4K random reads. Latency increases significantly from qd=2 up to qd=24 and then starts tapering off at higher queue depths.
We did not find a kernel_setsockopt() call with TCP_NODELAY in the kernel RBD/libceph (messenger.c) source. Unless we are missing something, it looks like the kernel RBD client currently does not set this, and that is hurting latency numbers at lower queue depths.
I have tested with userspace fio (rbd engine) and rados bench, and we see similar latency behavior when ms_tcp_nodelay is set to false. However, setting it to true gives consistently low latency numbers at all queue depths.
Any ideas/thoughts on this?
OS Ubuntu 14.04
Kernel: 3.13.0-24-generic #46-Ubuntu SMP
Ceph: Latest Master
No, we don't set TCP_NODELAY in the kernel client, but I think we can
add it as an rbd map/mount option. Sage?
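
Usage could be something like this (the option name is hypothetical, nothing is wired up yet):

    # hypothetical libceph option passed through rbd map
    rbd map rbd/myimage -o tcp_nodelay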

Thanks,

Ilya
Sage Weil
2014-09-06 13:22:55 UTC
Permalink
Post by Ilya Dryomov
On Thu, Sep 4, 2014 at 3:39 PM, Chaitanya Huilgol
Post by Chaitanya Huilgol
Hi,
In our benchmarking tests we observed that the ms_tcp_nodelay option in ceph.conf has no effect on the kernel rbd client, and as expected we see poor latency numbers at lower queue depths for 4K random reads. Latency increases significantly from qd=2 up to qd=24 and then starts tapering off at higher queue depths.
We did not find a kernel_setsockopt() call with TCP_NODELAY in the kernel RBD/libceph (messenger.c) source. Unless we are missing something, it looks like the kernel RBD client currently does not set this, and that is hurting latency numbers at lower queue depths.
I have tested with userspace fio (rbd engine) and rados bench, and we see similar latency behavior when ms_tcp_nodelay is set to false. However, setting it to true gives consistently low latency numbers at all queue depths.
Any ideas/thoughts on this?
OS Ubuntu 14.04
Kernel: 3.13.0-24-generic #46-Ubuntu SMP
Ceph: Latest Master
No, we don't set TCP_NODELAY in the kernel client, but I think we can
add it as an rbd map/mount option. Sage?
We definitely can, and I think more importantly it should be on by
default, as it is in userspace. I'm surprised we missed that. :( IIRC we
are carefully setting the MORE (or CORK?) flag on all but the last write
for a message, but I take it there is a socket-level option we missed?
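
(For concreteness, the pattern I mean is roughly the sketch below, not the actual messenger code. MSG_MORE lets the stack coalesce the pieces of a single message into fewer segments, but it does not disable Nagle, so without TCP_NODELAY the last small segment can still sit in the socket buffer waiting for an ACK.)

    #include <linux/net.h>      /* kernel_sendmsg(), struct socket */
    #include <linux/socket.h>   /* MSG_MORE, MSG_DONTWAIT, MSG_NOSIGNAL */
    #include <linux/types.h>    /* bool */
    #include <linux/uio.h>      /* struct kvec */

    /* Sketch: set MSG_MORE on every write of a message except the
     * last one, hinting to TCP that more data is coming. */
    static int send_piece(struct socket *sock, struct kvec *vec,
                          size_t len, bool more)
    {
            struct msghdr msg = {
                    .msg_flags = MSG_DONTWAIT | MSG_NOSIGNAL,
            };

            if (more)
                    msg.msg_flags |= MSG_MORE;

            return kernel_sendmsg(sock, &msg, vec, 1, len);
    }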

sage
Ilya Dryomov
2014-09-06 14:44:17 UTC
Permalink
Post by Sage Weil
Post by Ilya Dryomov
On Thu, Sep 4, 2014 at 3:39 PM, Chaitanya Huilgol
Post by Chaitanya Huilgol
Hi,
In our benchmarking tests we observed that the ms_tcp_nodelay option in ceph.conf has no effect on the kernel rbd client, and as expected we see poor latency numbers at lower queue depths for 4K random reads. Latency increases significantly from qd=2 up to qd=24 and then starts tapering off at higher queue depths.
We did not find a kernel_setsockopt() call with TCP_NODELAY in the kernel RBD/libceph (messenger.c) source. Unless we are missing something, it looks like the kernel RBD client currently does not set this, and that is hurting latency numbers at lower queue depths.
I have tested with userspace fio (rbd engine) and rados bench, and we see similar latency behavior when ms_tcp_nodelay is set to false. However, setting it to true gives consistently low latency numbers at all queue depths.
Any ideas/thoughts on this?
OS Ubuntu 14.04
Kernel: 3.13.0-24-generic #46-Ubuntu SMP
Ceph: Latest Master
No, we don't set TCP_NODELAY in the kernel client, but I think we can
add it as an rbd map/mount option. Sage?
We definitely can, and I think more importantly it should be on by
default, as it is in userspace. I'm surprised we missed that. :( IIRC we
are carefully setting the MORE (or CORK?) flag on all but the last write
for a message, but I take it there is a socket-level option we missed?
Yeah, but also see http://tracker.ceph.com/issues/9345.

Thanks,

Ilya