Discussion:
Weekly Ceph Performance Meeting Invitation
Mark Nelson
2014-09-30 19:27:59 UTC
Hi All,

I put together a bluejeans meeting for the Ceph performance meeting
tomorrow at 8AM PST. Hope to see you there!

To join the Meeting:
https://bluejeans.com/268261044

To join via Browser:
https://bluejeans.com/268261044/browser

To join with Lync:
https://bluejeans.com/268261044/lync


To join via Room System:
Video Conferencing System: bjn.vc -or- 199.48.152.152
Meeting ID: 268261044

To join via Phone:
1) Dial:
+1 408 740 7256
+1 888 240 2560 (US Toll Free)
+1 408 317 9253 (Alternate Number)
(see all numbers - http://bluejeans.com/numbers)
2) Enter Conference ID: 268261044

Mark
Haomai Wang
2014-10-01 07:00:54 UTC
Thanks, Mark!

It's a pity that I can't join since I'll be on a flight. I hope I'll
have time to view the recording (does one exist?).

As a reminder, AsyncMessenger (https://github.com/yuyuyu101/ceph/tree/msg-event-worker-mode)
is ready for developers to test. At an IO depth of 1 (4k randwrite),
AsyncMessenger saves 10% latency compared to SimpleMessenger. But I
still haven't had time to deploy a large cluster for performance
testing.
--
Best Regards,

Wheat
Sage Weil
2014-10-01 16:17:37 UTC
Thanks, everyone, for joining! I took some notes during the session in
the etherpad:

http://pad.ceph.com/p/performance_weekly

The session was also recorded, although I'm not sure how we get to that.
:)

I think we covered most of the notes that people had added leading up to
the meeting. Hopefully everyone has a better view of what work is in
progress and who is contributing, and we can follow up with more detailed
discussions on ceph-devel.

Let's plan on the same time slot next week?

Thanks!
sage
Mark Nelson
2014-10-01 16:23:03 UTC
Post by Sage Weil
Thanks, everyone, for joining! I took some notes during the session in
http://pad.ceph.com/p/performance_weekly
The session was also recorded, although I'm not sure how we get to that.
:)
Just verified that it worked; I'll send it out in a separate email to
make it easy to find.
Matt W. Benjamin
2014-10-01 17:33:03 UTC
Sorry we missed this one; we had a combination of a meeting conflict and a technical problem.
We'll join next week.

Matt
--
Matt Benjamin
The Linux Box
206 South Fifth Ave. Suite 150
Ann Arbor, MI 48104

http://linuxbox.com

tel. 734-761-4689
fax. 734-769-8938
cel. 734-216-5309
Somnath Roy
2014-10-02 17:39:43 UTC
Please share your opinion on this.

-----Original Message-----
From: Sage Weil [mailto:***@redhat.com]
Sent: Wednesday, October 01, 2014 3:57 PM
To: Somnath Roy
Cc: Mark Nelson; Kasper Dieter; Andreas Bluemle; Paul Von-Stamwitz
Subject: RE: Weekly Ceph Performance Meeting Invitation
Yes Sage, it's all reads. Each call to lfn_open() will incur this
lookup on an FDCache miss (which will be the case ~99% of the time).
The following patch will certainly help the write path (which is
exciting!) but not reads, since reads don't go through the transaction path.
My understanding is that in the read path only two calls per IO go to
the filestore: one getattr for the "_" xattr, followed by a read of the
same object. If we could somehow club these two requests together,
reads would benefit. I did a prototype earlier that passed the fd (and
path) to the replicated PG during the getattr call and reused the same
fd/path for the subsequent read. That improved both performance and
CPU usage, but it goes against the ObjectStore interface abstraction :-(
Basically, the sole purpose of the FDCache is to serve this kind of
scenario, but since it is now sharded by object hash (and the FDCache
itself is CPU intensive) it is not helping much. Maybe sharding by PG
(col_id) could help here?
I suspect a more fruitful approach would be to make a read-side handle-based API for objectstore... so you can 'open' an object, keep that handle to the ObjectContext, and then do subsequent read operations against that.
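For illustration only, here is a rough sketch of what such a read-side handle API could look like; the names ReadHandle, open_read(), and the method signatures below are hypothetical, not existing ObjectStore methods:

  // Hypothetical sketch of a read-side, handle-based ObjectStore API.
  // None of these types or methods exist in Ceph today.
  #include <cstddef>
  #include <cstdint>
  #include <memory>
  #include <string>
  #include <vector>

  struct ReadHandle {
    int fd = -1;           // object file, opened once
    std::string path;      // result of the (expensive) LFNIndex lookup
  };

  class ReadableObjectStore {
  public:
    virtual ~ReadableObjectStore() = default;
    // Resolve the object name to a path and open it exactly once.
    virtual std::unique_ptr<ReadHandle> open_read(const std::string& oid) = 0;
    // Subsequent calls reuse the handle and skip the lookup/open entirely.
    virtual int getattr(ReadHandle& h, const std::string& name,
                        std::vector<char>& out) = 0;
    virtual int read(ReadHandle& h, uint64_t off, size_t len,
                     std::vector<char>& out) = 0;
  };

The PG could keep such a handle next to the ObjectContext, so the getattr("_") and the data read for the same request would cost only one path lookup and one open() between them.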

Sharding the FDCache per PG would help with lock contention, yes, but is that the limiter or are we burning CPU?
Also, I don't think the Ceph IO path is very memory intensive, so we
can afford to spend some memory on caching. For example, if we had an
object_context cache at the ReplicatedPG level (the cache is there
today, but the contexts are not persisted in it), performance (and CPU
usage) would improve dramatically. I know there can be a lot of PGs,
so memory usage could be a challenge, but we can certainly control
that by limiting the per-PG cache size and so on. The size of an
object_context instance shouldn't be much, I guess. I did some
prototyping on that too and got a significant improvement. This
eliminates the getattr path on a cache hit.
Can you propose this on ceph-devel? I think this is promising. And probably quite easy to implement.
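As a minimal sketch of the idea (ObjectContext below is a stand-in type; the real ReplicatedPG context machinery is more involved), a size-bounded per-PG LRU cache could look like this:

  // Hypothetical sketch of a size-bounded, per-PG object context cache.
  #include <cstddef>
  #include <list>
  #include <memory>
  #include <string>
  #include <unordered_map>
  #include <utility>

  struct ObjectContext { /* object_info, xattrs, locks, ... */ };
  using ObjectContextRef = std::shared_ptr<ObjectContext>;

  class PGObjectContextCache {
    using Entry = std::pair<std::string, ObjectContextRef>;
    size_t max_size;
    std::list<Entry> lru;                                   // front = most recently used
    std::unordered_map<std::string, std::list<Entry>::iterator> index;

  public:
    explicit PGObjectContextCache(size_t max) : max_size(max) {}

    // Returns nullptr on a miss; a hit skips the getattr("_") into the filestore.
    ObjectContextRef lookup(const std::string& oid) {
      auto it = index.find(oid);
      if (it == index.end())
        return nullptr;
      lru.splice(lru.begin(), lru, it->second);             // refresh recency
      return it->second->second;
    }

    void insert(const std::string& oid, ObjectContextRef obc) {
      auto it = index.find(oid);
      if (it != index.end()) {
        it->second->second = std::move(obc);
        lru.splice(lru.begin(), lru, it->second);
        return;
      }
      lru.emplace_front(oid, std::move(obc));
      index[oid] = lru.begin();
      if (lru.size() > max_size) {                          // keep per-PG memory bounded
        index.erase(lru.back().first);
        lru.pop_back();
      }
    }
  };

A real implementation would also need invalidation on writes and a trimming policy, but even this shape bounds memory per PG while eliminating the getattr on a hit.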
Another challenge for reads (and probably for writes too) is
sequential IO with rbd. With the Linux default read_ahead, sequential
read performance is significantly lower than random read performance
with the latest code at an IO size of, say, 64K. The obvious reason is
that with rbd's default 4MB object size, a lot of sequential 64K reads
land on the same PG and get bottlenecked there. Increasing the
read_ahead size improves performance, but that affects random
workloads. I think a PG-level cache should help here. Striped images
from librbd probably won't face this problem, but krbd does not
support striping, so it is definitely a problem there.
I still think the key here is a comprehensive set of IO hints. Then it's a problem of making sure we are using them effectively...
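Purely as an illustration of what request-level IO hints could mean (these flags, fields, and the should_prefetch() helper are hypothetical, not the existing librados or ObjectStore API), a read could carry an access-pattern hint that lets the OSD prefetch into a PG-level cache instead of relying on the kernel's read_ahead:

  // Hypothetical hint flags attached to a read request; illustration only.
  #include <cstddef>
  #include <cstdint>

  enum IOHintFlags : uint32_t {
    HINT_NONE       = 0,
    HINT_SEQUENTIAL = 1u << 0,   // client expects the next extents to be read soon
    HINT_RANDOM     = 1u << 1,   // prefetching would be wasted effort
    HINT_WILLNEED   = 1u << 2,   // keep this object's data/attrs warm in a PG cache
  };

  struct ReadRequest {
    uint64_t offset = 0;
    size_t   length = 0;
    uint32_t hints  = HINT_NONE; // OR-ed IOHintFlags, carried with the op to the OSD
  };

  // On the OSD, a sequential hint could trigger prefetch of the next extent of the
  // 4MB rbd object, so the following 64K reads on the same PG hit a local cache.
  inline bool should_prefetch(const ReadRequest& r) {
    return (r.hints & HINT_SEQUENTIAL) != 0;
  }

Something along these lines would let krbd-style sequential streams benefit without inflating read_ahead globally and hurting random workloads.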
We can discuss these in the next meeting if this sounds interesting.
Yeah, but let's discuss on list first, no reason to wait!

s
Thanks & Regards
Somnath
-----Original Message-----
Sent: Wednesday, October 01, 2014 1:14 PM
To: Somnath Roy
Cc: Mark Nelson; Kasper Dieter; Andreas Bluemle; Paul Von-Stamwitz
Subject: RE: Weekly Ceph Performance Meeting Invitation
CPU-wise, the following are still hurting us in Giant. (A lot of
fixes, like the IndexManager work, went into Giant that helped on the
CPU consumption side as well.)
1. LFNIndex lookup logic. I have a fix that will save around one CPU
core on that path. I have yet to address the comments Greg/Sam made on
it, but there is a lot of room for improvement here.
Have you looked at
https://github.com/ceph/ceph/commit/74b1cf8bf1a7a160e6ce14603df63a46b22d8b98 ?
The patch is incomplete, but with that change we should be able to drop to a single path lookup per ObjectStore::Transaction (as opposed to one for each op in the transaction that touches the given object). I'm not sure whether you were looking at ops that had a lot of those, or whether they were simple single-IO type operations? That would only help on the write path; I think you said you've been focusing on reads.
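In spirit, that change memoizes the LFNIndex name-to-path resolution for the lifetime of one transaction. A simplified, hypothetical sketch of the idea (lfn_lookup() below is a stand-in for the real LFNIndex translation, not the actual code in the linked commit):

  // Hypothetical sketch: resolve an object's on-disk path once per transaction,
  // even when several ops in the transaction touch the same object.
  #include <map>
  #include <string>

  class TransactionPathCache {
    std::map<std::string, std::string> resolved;  // object id -> path, for one transaction
  public:
    const std::string& resolve(const std::string& oid) {
      auto it = resolved.find(oid);
      if (it != resolved.end())
        return it->second;                        // hit: skip the LFNIndex walk
      return resolved.emplace(oid, lfn_lookup(oid)).first->second;
    }
  private:
    // Stand-in for the real (and expensive) LFNIndex lookup.
    static std::string lfn_lookup(const std::string& oid) {
      return "/var/lib/ceph/osd/ceph-0/current/" + oid;
    }
  };

As noted above, this only helps writes, since reads never go through an ObjectStore::Transaction in the first place.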
2. The buffer class is very CPU intensive. Fixing that part will
help every Ceph component.
+1
sage
Haomai Wang
2014-10-03 05:28:14 UTC
Post by Somnath Roy
[...]
For example, if we can have an object_context cache at the ReplicatedPG
level (now the cache is there but the contexts are not persisted), the
performance (and CPU usage) will be improved dramatically.
[...]
Can you propose this on ceph-devel? I think this is promising. And probably quite easy to implement.
Yes, we have already implemented this, and it can save nearly 100us
per IO on a cache hit. We will make a pull request next week. :-)
--
Best Regards,

Wheat
Somnath Roy
2014-10-03 22:17:46 UTC
That's great, Haomai. Looking forward to this pull request.

Thanks & Regards
Somnath

-----Original Message-----
From: Haomai Wang [mailto:***@gmail.com]
Sent: Thursday, October 02, 2014 10:28 PM
To: Somnath Roy
Cc: ceph-devel
Subject: Re: FW: Weekly Ceph Performance Meeting Invitation