Discussion:
10/14/2014 Weekly Ceph Performance Meeting
Mark Nelson
2014-10-15 03:40:54 UTC
Permalink
Hi All,

Just a reminder that the weekly performance meeting is on Wednesdays at
8AM PST!

Etherpad URL:
http://pad.ceph.com/p/performance_weekly

To join the Meeting:
https://bluejeans.com/268261044

To join via Browser:
https://bluejeans.com/268261044/browser

To join with Lync:
https://bluejeans.com/268261044/lync


To join via Room System:
Video Conferencing System: bjn.vc -or- 199.48.152.152
Meeting ID: 268261044

To join via Phone:
1) Dial:
+1 408 740 7256
+1 888 240 2560(US Toll Free)
+1 408 317 9253(Alternate Number)
(see all numbers - http://bluejeans.com/numbers)
2) Enter Conference ID: 268261044

Mark
Alexandre DERUMIER
2014-10-15 06:22:11 UTC
Permalink
Hi,

About performance: maybe it would also be great to include client-side performance?

Currently I see 2 performance problems with librbd:

1) The CPU usage is quite huge. (I'm CPU bound on an 8-core E5-2603 v2 @ 1.80GHz, at 40000 iops of 4k reads using fio-rbd.)

2) In qemu, it's impossible to reach more than around 7000 iops with 1 disk (maybe it's also related to CPU or the number of threads).
I have also tried the new qemu iothread/dataplane feature, but it doesn't help.
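For reference, a minimal fio invocation for the rbd-engine case above could look like the sketch below (it needs an fio built with the rbd engine; the pool, image and client names are hypothetical):

fio --name=rbd-4k-randread --ioengine=rbd --clientname=admin \
    --pool=rbd --rbdname=testimg --direct=1 \
    --rw=randread --bs=4k --iodepth=32 --numjobs=1 --runtime=60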





Mark Nelson
2014-10-15 13:26:20 UTC
Permalink
Post by Alexandre DERUMIER
Hi,
About performance: maybe it would also be great to include client-side performance?
Sure! Please feel free to add this or other topics that are
useful/interesting to the etherpad. Please include your name though so
we know who's brought it up. Even if we don't get to everything it will
provide useful topics for the subsequent weeks.
Post by Alexandre DERUMIER
1) The CPU usage is quite huge. (I'm CPU bound on an 8-core E5-2603 v2 @ 1.80GHz, at 40000 iops of 4k reads using fio-rbd.)
Interesting. Have you taken a look with perf or other tools to see where time is being spent?
Post by Alexandre DERUMIER
2) In qemu, it's impossible to reach more than around 7000 iops with 1 disk (maybe it's also related to CPU or the number of threads).
I have also tried the new qemu iothread/dataplane feature, but it doesn't help.
1 disk meaning 1 OSD, or 1 disk meaning 1 volume on a VM? If it's 1 volume, does adding another volume on the same VM help? I'm not familiar with the new qemu options, so that would be good to discuss at the meeting too!
Alexandre DERUMIER
2014-10-15 14:42:04 UTC
Permalink
Post by Mark Nelson
Sure! Please feel free to add this or other topics that are useful/interesting to the etherpad. Please include your name though so we know who's brought it up. Even if we don't get to everything it will provide useful topics for the subsequent weeks.
OK, great, I'll do it.
Post by Mark Nelson
Post by Alexandre DERUMIER
Currently I see 2 performance problems with librbd:
1) The CPU usage is quite huge. (I'm CPU bound on an 8-core E5-2603 v2 @ 1.80GHz, at 40000 iops of 4k reads using fio-rbd.)
Interesting. Have you taken a look with perf or other tools to see where time is being spent?
Not yet, but I can try to do it; I'll have time next week.
Post by Mark Nelson
Post by Alexandre DERUMIER
2) In qemu, it's impossible to reach more than around 7000 iops with 1 disk (maybe it's also related to CPU or the number of threads).
I have also tried the new qemu iothread/dataplane feature, but it doesn't help.
1 disk meaning 1 OSD, or 1 disk meaning 1 volume on a VM?
Yes, 1 disk = 1 volume on the VM.
Post by Mark Nelson
If it's 1 volume, does adding another volume on the same VM help?
As far as I remember, yes. I'll test again to confirm.

Note that when benching with fio-rbd, I need to increase the client number too:

(1 client  - queue depth 32: ~8000 iops
 2 clients - queue depth 32: ~16000 iops
 ...)

So maybe it's related.
Post by Mark Nelson
I'm not familiar with the new qemu options, so that would be good to discuss at the meeting too!

The dataplane/iothread feature allows a virtio disk to reach around 1,000,000 iops vs 100,000 iops without dataplane:
http://www.linux-kvm.org/wiki/images/1/17/Kvm-forum-2013-Effective-multithreading-in-QEMU.pdf

Syntax to enable it:
qemu -object iothread,id=iothread0 -device virtio-blk-pci,iothread=iothread0,....
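For completeness, a fuller command line combining the iothread with an rbd-backed drive might look like the sketch below (the pool, image and cephx user names are hypothetical, and it assumes a qemu new enough to support -object iothread):

qemu-system-x86_64 ... \
    -object iothread,id=iothread0 \
    -drive file=rbd:rbd/testimg:id=admin,format=raw,if=none,cache=writeback,id=drive0 \
    -device virtio-blk-pci,iothread=iothread0,drive=drive0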



Regards,

Alexandre

Alexandre DERUMIER
2014-10-15 15:54:35 UTC
Permalink
Post by Alexandre DERUMIER
2) In qemu, it's impossible to reach more than around 7000 iops with 1 disk (maybe it's also related to CPU or the number of threads).
I have also tried the new qemu iothread/dataplane feature, but it doesn't help.
Post by Mark Nelson
If it's 1 volume, does adding another volume on the same VM help?
Post by Alexandre DERUMIER
As far as I remember, yes. I'll test again to confirm.
I have done the test: it scales with multiple virtio disks on multiple rbd volumes.

(Not sure, but maybe it's related to the iodepth bug shown in the Intel slides at this meeting?)


Alexandre DERUMIER
2014-10-15 16:04:27 UTC
Permalink
Post by Alexandre DERUMIER
I have done the test: it scales with multiple virtio disks on multiple rbd volumes.
(Not sure, but maybe it's related to the iodepth bug shown in the Intel slides at this meeting?)
I'll do a test next week with virtio-scsi; it seems it's possible to use multiple queues:

-device virtio-scsi-pci,id=scsi0,num_queues=8,...

So, maybe it'll help.
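A full device section for that test might look like the sketch below (the rbd pool/image/user names are hypothetical; the num_queues value is the one from above):

qemu-system-x86_64 ... \
    -drive file=rbd:rbd/testimg:id=admin,format=raw,if=none,id=drive0 \
    -device virtio-scsi-pci,id=scsi0,num_queues=8 \
    -device scsi-hd,drive=drive0,bus=scsi0.0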



Chen, Xiaoxi
2014-10-16 03:27:13 UTC
Permalink
We also saw this before; it seems to be because QEMU uses a single thread for IO. I enabled the debug log in librbd and found that the submitting thread is always the same. Assuming the backend is powerful enough, how many IOs qemu can send out == how many IOPS we can get.

The upper bound may vary depending on how fast QEMU can send out a request. I remember we tried different CPU models and got the best result with a very old NHM CPU (high frequency at 2.93GHz, taking ~0.03 ms to send out a request, which bounded us at 22K read IOPS), better than a newer SNB CPU. Frequency plays an important part since it's a single thread.
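A quick way to confirm the single submission thread from the host side is to watch per-thread CPU usage of the qemu process while the benchmark runs, e.g. (the binary name is an assumption):

top -H -p $(pidof qemu-system-x86_64)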

Alexandre DERUMIER
2014-10-16 06:25:25 UTC
Permalink
Post by Chen, Xiaoxi
We also saw this before; it seems to be because QEMU uses a single thread for IO. I enabled the debug log in librbd and found that the submitting thread is always the same. Assuming the backend is powerful enough, how many IOs qemu can send out == how many IOPS we can get.
The upper bound may vary depending on how fast QEMU can send out a request. I remember we tried different CPU models and got the best result with a very old NHM CPU (high frequency at 2.93GHz, taking ~0.03 ms to send out a request, which bounded us at 22K read IOPS), better than a newer SNB CPU. Frequency plays an important part since it's a single thread.
Hi, indeed qemu uses a single thread per queue.
I think it's not a problem with common storage (nfs, scsi, ...) because they use less CPU resource than librbd, so you can easily reach > 100000 iops with 1 thread.

Now, the good news: virtio-scsi supports multiqueue, with the num_queues option.
I have done a bench with num_queues, and now I can finally reach 50000 iops with a single qemu disk -> rbd volume :)

If you use libvirt, here is the config:
http://www.redhat.com/archives/libvir-list/2013-April/msg00021.html

Now, about fio-rbd CPU usage, something is really strange:

fio (rbd engine) on the qemu host: 40000 iops - 8 cores at 100%
fio (aio) inside qemu with virtio-scsi + num_queues: 50000 iops - around 4 cores at 100%

So, maybe something is bad in the fio-rbd implementation?

I need to bench again to be sure about this; I'll send results in some days.
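For anyone reproducing the guest-side half of that comparison, the aio run inside the VM would be something along these lines (the device path, numjobs and runtime are assumptions, not the exact job used):

fio --name=guest-4k-randread --ioengine=libaio --direct=1 \
    --filename=/dev/sdb --rw=randread --bs=4k --iodepth=32 --numjobs=4 --runtime=60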



----- Mail original -----=20

De: "Xiaoxi Chen" <***@intel.com>=20
=C3=80: "Alexandre DERUMIER" <***@odiso.com>, "Mark Nelson" <mark=
=***@inktank.com>=20
Cc: ceph-***@vger.kernel.org=20
Envoy=C3=A9: Jeudi 16 Octobre 2014 05:27:13=20
Objet: RE: 10/14/2014 Weekly Ceph Performance Meeting=20

We also find this before, seems it's because QEMU use single thread for=
IO, I tried to enable the debug log of in librbd, find that the thread=
ed is always the same. Assuming the backend is power enough, so how man=
y IOs can be sent out by qemu =3D=3D how many IOPS can we get.=20

The upper bound may varies depends on how fast QEMU can send out a requ=
est, I remember we have tried different CPU model and get best with an =
very old NHM CPU(with high frequency to 2.93Ghz, take ~0.03 to send out=
a request and we get 22K read IOPS bound), better than new SNB CPU. Fr=
equency do play important part since it's single thread.=20

-----Original Message-----=20
=46rom: ceph-devel-***@vger.kernel.org [mailto:ceph-devel-***@vger.=
kernel.org] On Behalf Of Alexandre DERUMIER=20
Sent: Wednesday, October 15, 2014 11:55 PM=20
To: Mark Nelson=20
Cc: ceph-***@vger.kernel.org=20
Subject: Re: 10/14/2014 Weekly Ceph Performance Meeting=20
2) in qemu, it's impossible to reach more than around 7000iops with 1=
disk. (maybe is it also related to cpu or threads number).=20
I have also try with the new qemu iothread/dataplane feature, but it =
doesn't help.=20
If its 1 volume, does adding another volume on the same VM help?=20
Post by Alexandre DERUMIER
As far I remember, yes . I'll test again to confirm.=20
I have done test, It's scaling with multiple virtio disk on multiple rb=
d volume.=20

(Not sure, but maybe it's related to iodepth bug showed in this meeting=
in intel slides ?)=20


----- Mail original -----=20

De: "Alexandre DERUMIER" <***@odiso.com>=20
=C3=80: "Mark Nelson" <***@inktank.com>=20
Cc: ceph-***@vger.kernel.org=20
Envoy=C3=A9: Mercredi 15 Octobre 2014 16:42:04=20
Objet: Re: 10/14/2014 Weekly Ceph Performance Meeting=20
Sure! Please feel free to add this or other topics that are=20
useful/interesting to the etherpad. Please include your name though s=
o=20
we know who's brought it up. Even if we don't get to everything it=20
will provide useful topics for the subsequent weeks.=20
Ok,great,I'll do it.=20
=20
Currently I see 2 performance problems with librbd:=20
=20
1) The cpu usage is quite huge. (I'm cpu bound with 8cores CPU E5-260=
3=20
Interesting. Have you taken a look with perf or other tools to see=20
where time is being spent?=20
Not yet, but I can try to do it, I'll have time next week.=20
=20
2) in qemu, it's impossible to reach more than around 7000iops with 1=
disk. (maybe is it also related to cpu or threads number).=20
I have also try with the new qemu iothread/dataplane feature, but it =
doesn't help.=20
1 disk meaning 1 OSD, or 1 disk meaning 1 volume on a VM?=20
yes, 1 disk =3D 1 volume on VM=20
If its 1 volume, does adding another volume on the same VM help?=20
As far I remember, yes . I'll test again to confirm.=20

Note that when benching with fio-rbd, I need to increase the client num=
ber too.=20

(1client - queue depth 32: +- 8000iops=20
2clients - queue depth 32: +- 16000iops ....=20
)=20
So maybe it's related=20
I'm not familiar with the new qemu options, so that would be good to=20
discuss at=20
the meeting too!=20

The dataplane/iothread feature allow virtio disk to reach around 1.000.=
000 iops vs 100.000 iops without dataplane http://www.linux-kvm.org/wik=
i/images/1/17/Kvm-forum-2013-Effective-multithreading-in-QEMU.pdf=20

Syntax to enable it:=20
qemu -object iothread,id=3Diothread0 -device virtio-blk-pci,iothread=3D=
iothread0,....=20



Regards,=20

Alexandre=20

----- Mail original -----=20

De: "Mark Nelson" <***@inktank.com>=20
=C3=80: "Alexandre DERUMIER" <***@odiso.com>, "Mark Nelson" <mark=
=***@inktank.com>=20
Cc: ceph-***@vger.kernel.org=20
Envoy=C3=A9: Mercredi 15 Octobre 2014 15:26:20=20
Objet: Re: 10/14/2014 Weekly Ceph Performance Meeting=20

On 10/15/2014 01:22 AM, Alexandre DERUMIER wrote:=20
Hi,=20
=20
about performance, maybe could it be great to also include client sid=
e performance ?=20

Sure! Please feel free to add this or other topics that are useful/inte=
resting to the etherpad. Please include your name though so we know who=
's brought it up. Even if we don't get to everything it will provide us=
eful topics for the subsequent weeks.=20
=20
Currently I see 2 performance problems with librbd:=20
=20
1) The cpu usage is quite huge. (I'm cpu bound with 8cores CPU E5-260=
3 v2 @ 1.80GHz, with 40000iops 4k read using fio-rbd)=20

Interesting. Have you taken a look with perf or other tools to see=20
where time is being spent?=20
=20
2) in qemu, it's impossible to reach more than around 7000iops with 1=
disk. (maybe is it also related to cpu or threads number).=20
I have also try with the new qemu iothread/dataplane feature, but it =
doesn't help.=20

1 disk meaning 1 OSD, or 1 disk meaning 1 volume on a VM? If its 1=20
volume, does adding another volume on the same VM help? I'm not=20
familiar with the new qemu options, so that would be good to discuss at=
=20
the meeting too!=20
=20
=20
=20
=20
=20
--=20
To unsubscribe from this list: send the line "unsubscribe ceph-devel" i=
n=20
the body of a message to ***@vger.kernel.org=20
More majordomo info at http://vger.kernel.org/majordomo-info.html=20
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" i=
n
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Chen, Xiaoxi
2014-10-16 06:53:05 UTC
Permalink
Post by Alexandre DERUMIER
Hi, indeed qemu uses a single thread per queue.
I think it's not a problem with common storage (nfs, scsi, ...) because they use less CPU resource than librbd, so you can easily reach > 100000 iops with 1 thread.
I think it's a common problem shared by all storage backends (nfs, scsi), but Ceph takes longer to send out an IO (30us), while NFS and scsi are really simple (no crush, no striping, etc.) and may only take 3us to send out an IO, so their upper bound is 10x that of Ceph.
Post by Alexandre DERUMIER
Now, the good news: virtio-scsi supports multiqueue, with the num_queues option.
I have done a bench with num_queues, and now I can finally reach 50000 iops with a single qemu disk -> rbd volume :)
http://www.redhat.com/archives/libvir-list/2013-April/msg00021.html
Thanks, this is really very helpful for us. We will try it out, thank you.

Sage Weil
2014-10-23 21:53:57 UTC
Permalink
Post by Chen, Xiaoxi
Post by Alexandre DERUMIER
Hi, indeed qemu uses a single thread per queue.
I think it's not a problem with common storage (nfs, scsi, ...) because they use less CPU resource than librbd, so you can easily reach > 100000 iops with 1 thread.
I think it's a common problem shared by all storage backends (nfs, scsi), but Ceph takes longer to send out an IO (30us), while NFS and scsi are really simple (no crush, no striping, etc.) and may only take 3us to send out an IO, so their upper bound is 10x that of Ceph.
I would be very interested in seeing where the CPU time is actually spent.
I know there is some locking contention in librbd, but with the librados changes that layer at least should have a much lower overhead. We also should be preserving mappings for PGs in most cases to avoid much time being spent in CRUSH. It would be very interesting to be proven wrong, though!

sage
Milosz Tanski
2014-10-23 22:06:17 UTC
Permalink
Post by Sage Weil
Post by Chen, Xiaoxi
Post by Alexandre DERUMIER
Hi, indeed qemu uses a single thread per queue.
I think it's not a problem with common storage (nfs, scsi, ...) because they use less CPU resource than librbd, so you can easily reach > 100000 iops with 1 thread.
I think it's a common problem shared by all storage backends (nfs, scsi), but Ceph takes longer to send out an IO (30us), while NFS and scsi are really simple (no crush, no striping, etc.) and may only take 3us to send out an IO, so their upper bound is 10x that of Ceph.
I would be very interested in seeing where the CPU time is actually spent.
I know there is some locking contention in librbd, but with the librados changes that layer at least should have a much lower overhead. We also should be preserving mappings for PGs in most cases to avoid much time being spent in CRUSH. It would be very interesting to be proven wrong, though!
Measuring both should be simple. perf is really great at getting these kinds of answers quickly. The CPU time part is pretty easy to measure, so I'll skip that. To measure contention you can have perf watch the futex syscall tracepoint (e.g. syscalls:sys_exit_futex). That way you measure every time a lock was contended... and not only which lock, but also which code path.

Make sure you compile the application (or at the very least the library) without the omit-frame-pointer gcc flag, i.e. with -fno-omit-frame-pointer (frame pointer omission is the default on x86_64). Then the perf commands to do that are:

shell $ perf record -g -e syscalls:sys_exit_futex ./my_appname
shell $ perf report

You should have an answer to contention points at the end of your
test. I would offer to help more, but don't use RBD and have no
experience with those parts.
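For the CPU-time side mentioned above, a minimal sketch is to sample the running process with call graphs; the binary name, sampling rate and duration here are arbitrary choices:

shell $ perf record -F 99 -g -p $(pidof qemu-system-x86_64) -- sleep 30
shell $ perf report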

Make sure you're using a newish kernel / newish perf with support for tracepoints, and make sure you have the right permissions to use tracepoint events in perf.
--
Milosz Tanski
CTO
16 East 34th Street, 15th floor
New York, NY 10016

p: 646-253-9055
e: ***@adfin.com
Matt W. Benjamin
2014-10-15 15:50:35 UTC
Permalink
Having trouble joining. No feedback on why from the BlueJeans plugin.

Matt
--
Matt Benjamin
The Linux Box
206 South Fifth Ave. Suite 150
Ann Arbor, MI 48104

http://linuxbox.com

tel. 734-761-4689
fax. 734-769-8938
cel. 734-216-5309
Mark Nelson
2014-10-15 15:58:41 UTC
Permalink
No good! What browser? I can try to find out if anyone else has had
problems since RH uses bluejeans pretty much across the company.

Mark
Post by Matt W. Benjamin
Having trouble joining. No feedback on why from the BlueJeans plugin.
Matt
Matt W. Benjamin
2014-10-15 16:44:12 UTC
Permalink
I was running on (ahem) Windows. I tried with both Firefox and Chrome. It worked last week and previously.

Matt
Post by Mark Nelson
No good! What browser? I can try to find out if anyone else has had
problems since RH uses bluejeans pretty much across the company.
Mark
--
Matt Benjamin
The Linux Box
206 South Fifth Ave. Suite 150
Ann Arbor, MI 48104

http://linuxbox.com

tel. 734-761-4689
fax. 734-769-8938
cel. 734-216-5309
Zhang, Jian
2014-10-17 05:58:37 UTC
Permalink
Hi,
This is a follow-up on the OSD wbthrottle slowness caused by fdatasync, discussed at the weekly performance meeting.
For 4K random write tests (qd=2) with the qemu rbd driver and 50 volumes:
* We are able to get ~3100 IOPS on our setup with the Firefly (0.80.5) default parameters
* We can get ~3200 IOPS on the same setup if we set the wbthrottle start_flusher thresholds to 1 (see the tunings below)
* If we replace fdatasync with sync_file_range, we are able to get 4457 IOPS, a 41% improvement compared with the default parameters

Setup configuration:
* 4 OSD nodes (E3-1275 @ 3.5GHz, 32GB memory)
* Each node has 10x Seagate 3TB 7200rpm HDDs, with 2x Intel 400GB SSDs as journals
* Network is 10Gb

Wbthrottle tunings:
filestore_wbthrottle_xfs_ios_start_flusher=1
filestore_wbthrottle_xfs_bytes_start_flusher=1
filestore_wbthrottle_xfs_inodes_start_flusher=1
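For reference, these options would typically live in the [osd] section of ceph.conf; a sketch of such a config (the section placement is an assumption about this particular setup):

[osd]
filestore_wbthrottle_xfs_ios_start_flusher = 1
filestore_wbthrottle_xfs_bytes_start_flusher = 1
filestore_wbthrottle_xfs_inodes_start_flusher = 1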

fdatasync -> sync_file_range:

diff codes_backup/WBThrottle.cc ceph-0.80.5/src/os/WBThrottle.cc
163c163
< #ifdef HAVE_FDATASYNC
---
> /*#ifdef HAVE_FDATASYNC
164a165,167
> */
> #ifdef HAVE_SYNC_FILE_RANGE
>   ::sync_file_range(**wb.get<1>(), wb.get<2>().offset, wb.get<2>().len, SYNC_FILE_RANGE_WRITE);
210c213
< wbiter->second.first.add(nocache, len, 1);
---
> wbiter->second.first.add(nocache, len, 1, offset, len);

[***@a-ceph04 opt]# diff codes_backup/WBThrottle.h ceph-0.80.5/src/os/WBThrottle.h
72a73,74
>   uint64_t offset;
>   uint64_t len;
74c76
< void add(bool _nocache, uint64_t _size, uint64_t _ios) {
---
> void add(bool _nocache, uint64_t _size, uint64_t _ios, uint64_t _offset, uint64_t _len) {
78a81,82
>   offset = _offset;
>   len = _len;
Thanks
Jian