Discussion:
Weekly performance meeting
Sage Weil
2014-09-25 18:27:00 UTC
Permalink
Hi everyone,

A number of people have approached me about how to get more involved with
the current work on improving performance and how to better coordinate
with other interested parties. A few meetings have taken place offline
with good results but only a few interested parties were involved.

Ideally, we'd like to move as much of this discussion into the public
forums: ceph-***@vger.kernel.org and #ceph-devel. That isn't always
sufficient, however. I'd like to also set up a regular weekly meeting
using google hangouts or bluejeans so that all interested parties can
share progress. There are a lot of things we can do during the Hammer
cycle to improve things but it will require some coordination of effort.

Among other things, we can discuss:

- observed performance limitations
- high level strategies for addressing them
- proposed patch sets and their performance impact
- anything else that will move us forward

One challenge is timezones: there are developers in the US, China, Europe,
and Israel who may want to join. As a starting point, how about next
Wednesday, 15:00 UTC? If I didn't do my tz math wrong, that's

8:00 (PDT, California)
15:00 (UTC)
18:00 (IDT, Israel)
23:00 (CST, China)

That is surely not the ideal time for everyone but it can hopefully be a
starting point.

I've also created an etherpad for collecting discussion/agenda items at

http://pad.ceph.com/p/performance_weekly

Is there interest here? Please let everyone know if you are actively
working in this area and/or would like to join, and update the pad above
with the topics you would like to discuss.

Thanks!
sage
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Matt W. Benjamin
2014-09-25 19:03:14 UTC
Permalink
Hi Sage,

Great idea, we'd certainly be interested in attending.

We have a variety of potentially relevant work, some (but not all) at least tangentially related to RDMA. Our attention is necessarily split between upstream and internal branches (that currently don't have CRUSH, PGs, etc), but we'd like to upstream as much generally useful work as we can.

Thanks,

Matt
--
Matt Benjamin
The Linux Box
206 South Fifth Ave. Suite 150
Ann Arbor, MI 48104

http://linuxbox.com

tel. 734-761-4689
fax. 734-769-8938
cel. 734-216-5309
Mark Nelson
2014-09-25 19:30:50 UTC
Permalink
I'll be there too, and 100% agree with you Kasper! :)

Mark
Post by Kasper Dieter
Hi Sage,
I'm definitely interested in joining this weekly call starting Oct 1st.
Thanks for this initiative!
- how can we reduce the number of threads in the system
-- including to avoid the context switches in between
-- including to avoid the queues and locks in between
- how we can reduce the number of lines of code
-- including the multiple system calls for each IO
- how we can introduce a highly efficient timestamp collection of the most important FN check-points
(see for example the attached file)
to measure the change and effect of our actions
Best Regards,
-Dieter
Mark Nelson
2014-09-25 19:31:44 UTC
Permalink
Oops, replied too fast and meant to put Dieter! :)

Mark
Sage Weil
2014-09-25 19:41:37 UTC
Permalink
Hi Dieter,
Post by Kasper Dieter
Hi Sage,
I'm definitely interested in joining this weekly call starting Oct 1st.
Thanks for this initiative!
Great! Please add these notes to

http://pad.ceph.com/p/performance_weekly
Post by Kasper Dieter
- how can we reduce the number of threads in the system
-- including to avoid the context switches in between
-- including to avoid the queues and locks in between
You should take a look at Haomai's AsyncMessenger implementation he posted
a few weeks back. I'm not sure how much testing it's seen, but it
essentially refactors SimpleMessenger into a state machine and uses
libevent to schedule work.
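
To illustrate the general idea of replacing blocking per-connection threads with an event-driven state machine, here is a minimal sketch. It is not Haomai's AsyncMessenger and makes no libevent calls; the `Connection` struct, the `ConnState` names, and the one-byte length framing are all assumptions invented for illustration. An event loop (libevent or any other) would call `on_readable` whenever a socket has data.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical connection states; the real AsyncMessenger states are not
// described in this thread.
enum class ConnState { ReadHeader, ReadPayload, Dispatch, Closed };

struct Connection {
  ConnState state = ConnState::ReadHeader;
  std::string inbuf;            // bytes received but not yet consumed
  std::size_t payload_len = 0;  // length announced by the current header
  std::string last_message;     // most recent complete payload
  int dispatched = 0;           // number of messages handed upward
};

// Called by the event loop when the socket is readable. Rather than a
// dedicated reader thread blocking on the socket, the handler advances the
// state machine as far as the buffered bytes allow, then returns.
void on_readable(Connection& c, const std::string& bytes) {
  c.inbuf += bytes;
  for (;;) {
    switch (c.state) {
    case ConnState::ReadHeader:
      if (c.inbuf.empty()) return;  // wait for more data
      // Toy framing (an assumption): one length byte, then the payload.
      c.payload_len = static_cast<unsigned char>(c.inbuf[0]);
      c.inbuf.erase(0, 1);
      c.state = ConnState::ReadPayload;
      break;
    case ConnState::ReadPayload:
      if (c.inbuf.size() < c.payload_len) return;  // wait for more data
      c.last_message = c.inbuf.substr(0, c.payload_len);
      c.inbuf.erase(0, c.payload_len);
      c.state = ConnState::Dispatch;
      break;
    case ConnState::Dispatch:
      ++c.dispatched;  // a real messenger would deliver the message here
      c.state = ConnState::ReadHeader;
      break;
    case ConnState::Closed:
      return;
    }
  }
}
```

Because the handler never blocks, one thread can service many connections, which is where the savings in threads and context switches would come from.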
Post by Kasper Dieter
- how we can reduce the number of lines of code
-- including the multiple system calls for each IO
I have some planned changes to ObjectStore::Transaction that make it
handle-based. This will make it easier to cache and avoid lots of dup
lookups (and lay some of the groundwork for the proposed KeyFileStore).
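
A rough sketch of what "handle-based" could mean in general: resolve a name once, then reuse the resulting handle so that repeated operations skip the lookup. This is not the planned ObjectStore::Transaction interface, which the thread does not show; `Store`, `open`, `write`, and the `lookups` counter are hypothetical names used only to make the saved lookups visible.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

struct Object { std::string data; };

class Store {
 public:
  using Handle = std::size_t;

  // Resolve a name once (creating the object if needed); later operations
  // can reuse the returned handle.
  Handle open(const std::string& name) {
    ++lookups;
    auto it = index_.find(name);
    if (it != index_.end()) return it->second;
    objects_.emplace_back();
    index_.emplace(name, objects_.size() - 1);
    return objects_.size() - 1;
  }

  // Name-based write: pays for a lookup on every call.
  void write_by_name(const std::string& name, const std::string& d) {
    objects_[open(name)].data += d;
  }

  // Handle-based write: no name lookup at all.
  void write(Handle h, const std::string& d) { objects_[h].data += d; }

  const std::string& read(Handle h) const { return objects_[h].data; }

  int lookups = 0;  // counts name resolutions, for illustration only

 private:
  std::unordered_map<std::string, Handle> index_;
  std::vector<Object> objects_;
};
```

Two writes by name cost two lookups, while any number of writes through one handle cost a single lookup at `open` time.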
Post by Kasper Dieter
- how we can introduce a highly efficient timestamp collection of the most important FN check-points
(see for example the attached file)
to measure the change and effect of our actions
I think most of us are looking to LTTng or systemtap for this. You should
also check out the thread 'Instrumenting RADOS with Zipkin + LTTng' from
several weeks ago for a pretty promising tracing strategy.
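
For readers unfamiliar with checkpoint-style timing, the sketch below records named timestamps along an operation's path and reports the time between consecutive checkpoints. It is a plain std::chrono illustration, not LTTng, systemtap, or Zipkin, and it is far heavier than a real tracepoint; the `OpTrace` class and the checkpoint names are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

class OpTrace {
 public:
  using Clock = std::chrono::steady_clock;

  // Record a named checkpoint with the current monotonic time.
  void mark(std::string name) {
    points_.emplace_back(std::move(name), Clock::now());
  }

  // Microseconds elapsed between each pair of consecutive checkpoints.
  std::vector<std::pair<std::string, long long>> deltas_us() const {
    std::vector<std::pair<std::string, long long>> out;
    for (std::size_t i = 1; i < points_.size(); ++i) {
      auto d = std::chrono::duration_cast<std::chrono::microseconds>(
          points_[i].second - points_[i - 1].second);
      out.emplace_back(points_[i - 1].first + " -> " + points_[i].first,
                       static_cast<long long>(d.count()));
    }
    return out;
  }

 private:
  std::vector<std::pair<std::string, Clock::time_point>> points_;
};
```

Calling `mark("recv")`, `mark("enqueue")`, and `mark("commit")` on one operation yields two deltas, which can then be averaged over many operations to compare before and after a change.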

sage
Somnath Roy
2014-09-25 20:20:05 UTC
Permalink
Sage,
This will be helpful; I am planning to attend too.

Thanks & Regards
Somnath

Vu Pham
2014-09-25 21:15:41 UTC
Permalink
Hi Sage,

Thanks for this initiative.
I'll definitely attend this weekly call.

I'd like to discuss how to make XioMessenger/RDMA useful and how it can
improve performance.

thanks
-vu
Dror Goldenberg
2014-09-26 06:42:37 UTC
Permalink
Great initiative Sage.
I will attend as well.

-Dror

Dror Goldenberg |VP Software Architecture| Mellanox Technologies Ltd.
Work: +972 74 7237324 | Cell +972 54 4478308 |Fax: +972 4 959 3245




Paul Von-Stamwitz
2014-09-26 01:50:32 UTC
Permalink
Thanks, Sage.

I'm in, too.

Paul

Haomai Wang
2014-09-26 02:27:46 UTC
Permalink
Thanks, Sage!

I'll be on a flight on Oct 1. :-(

My team is now mainly working on Ceph performance, and we have
observed these points:

1. encode/decode adds noticeable latency, especially in
ObjectStore::Transaction. I'm keen to refactor the ObjectStore API to
avoid the encode/decode code. It seems this is already listed in the
notes (- remove serialization from ObjectStore::Transaction (ymmv)).
2. The threadpool/workqueue model adds obvious latency. Should we
consider implementing a performance-optimized workqueue to replace the
existing critical workqueues, such as op_wq in OSD.h and op_wq in
FileStore.h? In my AsyncMessenger implementation, I will try a simple
custom workqueue implementation to improve performance.
3. Large locks in the client library, such as in ObjectCacher.
--
Best Regards,

Wheat
Haomai Wang
2014-09-26 02:45:38 UTC
Permalink
Some more detailed optimization points:

1. FileStore/KeyValueStore worker threads compete for a global
object ("meta" collection, "infos" oid) that is used only by omap_*
methods. (https://github.com/ceph/ceph/pull/2502)
2. Sparse recovery when using fiemap (https://github.com/ceph/ceph/pull/2137)
--
Best Regards,

Wheat
Dong Yuan
2014-09-26 06:30:54 UTC
Permalink
Here is some data that supports Haomai's points.
Post by Haomai Wang
1. encode/decode adds noticeable latency, especially in
ObjectStore::Transaction. I'm keen to refactor the ObjectStore API to
avoid the encode/decode code. It seems this is already listed in the
notes (- remove serialization from ObjectStore::Transaction (ymmv)).
My environment: a single OSD on a single SSD with filestore_blackhole = true.

With full transaction encoding, 10000 4K WriteFull operations from a
single thread take about 14.3s. Without transaction encoding, the same
test finishes in about 11.5s.

Considering that the FileStore needs to decode the bufferlist too,
encode/decode costs more than 20% of the time!

Oprofile results confirm this problem too: methods used by
encode/decode sometimes take 9 of the top 10 slots.
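
As a quick check of the timings above: removing encode saves (14.3 - 11.5) / 14.3, or about 19.6% of the run, which is roughly 280us per operation (1430us versus 1150us). The "more than 20%" figure additionally counts the FileStore-side decode, which this test does not measure separately. The helper names below are hypothetical; the sketch only reproduces the arithmetic.

```cpp
#include <cassert>
#include <cmath>

// Fraction of the baseline run time that disappears when a step is removed.
double saved_fraction(double with_step_s, double without_step_s) {
  return (with_step_s - without_step_s) / with_step_s;
}

// Average time per operation, in microseconds.
double per_op_us(double total_s, int ops) {
  return total_s * 1e6 / ops;
}
```

With the numbers from this message, `saved_fraction(14.3, 11.5)` is about 0.196 and `per_op_us(14.3, 10000) - per_op_us(11.5, 10000)` is about 280.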
Post by Haomai Wang
2. The threadpool/workqueue model adds obvious latency. Should we
consider implementing a performance-optimized workqueue to replace the
existing critical workqueues, such as op_wq in OSD.h and op_wq in
FileStore.h? In my AsyncMessenger implementation, I will try a simple
custom workqueue implementation to improve performance.
When I analyzed the latency of a 4K object WriteFull operation, I put
static probes into the code to measure the time spent in OpWQ. I tested
10000 4K object WriteFull operations and averaged the results.

I found that each IO spends 158us in the OpWQ: 30us to enqueue, 108us
waiting in the queue, and 20us to dequeue. That is more than 20% of the
time spent in the PG layer (not including the msg and os layers) when
encode is ignored.

Maybe a more efficient ThreadPool/WorkQueue model is needed, or at
least some improvements to the WorkQueues in the IO path to reduce the
latency.
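
The breakdown above adds up as 30 + 108 + 20 = 158us, so about 68% of the OpWQ time (108 / 158) is spent waiting in the queue rather than enqueuing or dequeuing. As a minimal sketch of how such a wait time can be probed, the queue below stamps each item on enqueue and accumulates the residence time on dequeue. It is not Ceph's OpWQ or the static probes used for the numbers above; `ProbedQueue` and its members are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

class ProbedQueue {
 public:
  using Clock = std::chrono::steady_clock;
  using Micros = std::chrono::microseconds;

  // Stamp the item with its enqueue time while holding the queue lock.
  void enqueue(std::function<void()> work) {
    std::lock_guard<std::mutex> lock(mu_);
    items_.push_back(Item{std::move(work), Clock::now()});
  }

  // Runs one item if available; returns false when the queue is empty.
  bool run_one() {
    Item item;
    {
      std::lock_guard<std::mutex> lock(mu_);
      if (items_.empty()) return false;
      item = std::move(items_.front());
      items_.pop_front();
      // Time the item spent sitting in the queue.
      total_wait_ += std::chrono::duration_cast<Micros>(Clock::now() -
                                                        item.queued_at);
      ++completed_;
    }
    item.work();  // run the work outside the lock
    return true;
  }

  // Average queue residence time in microseconds (0 if nothing ran yet).
  long long avg_wait_us() const {
    std::lock_guard<std::mutex> lock(mu_);
    return completed_ ? total_wait_.count() / completed_ : 0;
  }

  int completed() const {
    std::lock_guard<std::mutex> lock(mu_);
    return completed_;
  }

 private:
  struct Item {
    std::function<void()> work;
    Clock::time_point queued_at;
  };
  mutable std::mutex mu_;
  std::deque<Item> items_;
  Micros total_wait_{0};
  int completed_ = 0;
};
```

Comparing `avg_wait_us()` across different queue designs under the same load gives a like-for-like view of the "in the queue" component.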
--
Dong Yuan
Email:***@gmail.com
Somnath Roy
2014-09-26 06:40:44 UTC
Permalink
Haomai/Dong,
Have you tried this with the latest shardedpool/WQ model, which is already in the Giant branch?
IOs will go through this path in the latest code, not through op_wq.
Yes, we also saw encode/decode consuming a lot of CPU time, and if I remember correctly the profiler was pointing at bufferlist::append in many such cases.

Thanks & Regards
Somnath

-----Original Message-----
From: Dong Yuan [mailto:***@gmail.com]
Sent: Thursday, September 25, 2014 11:31 PM
To: Haomai Wang
Subject: Re: Weekly performance meeting

Some data can support Haomai's points.
Post by Haomai Wang
1. encode/decode plays remarkable latency, especially in
ObjectStore::Transaction. I'm urgen in refactor ObjectStore API to
avoid encode/decode codes. It seemed has be signed in note(- remove
serialization from ObjectStore::Transaction (ymmv))
My environment: a single OSD on a single SSD with filestore_blackhole = true.

With all transactions encoded, 10000 4K WriteFull operations from a single thread take about 14.3s. Without transaction encoding, the same test finishes in about 11.5s.

Considering that the FileStore needs to decode the bufferlist too, encode/decode costs more than 20% of the time!

Oprofile results confirm this problem too: methods used by encode/decode sometimes take 9 of the top 10 spots.
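
As a quick check of the figures above, here is a minimal Python sketch of the arithmetic. Only the three measured numbers come from the message; the function name `encode_overhead` is illustrative. Encode alone accounts for roughly 19.6% of the run, so the "more than 20%" estimate relies on the additional decode cost in the FileStore, which was not measured separately here.

```python
# Figures quoted above: 10000 single-threaded 4K WriteFull operations.
OPS = 10_000
WITH_ENCODE_S = 14.3      # total run time with transaction encode
WITHOUT_ENCODE_S = 11.5   # total run time without transaction encode


def encode_overhead(with_s, without_s, ops):
    """Return (saved seconds, share of the full run, saved microseconds per op)."""
    saved_s = with_s - without_s
    share = saved_s / with_s
    per_op_us = saved_s / ops * 1e6
    return saved_s, share, per_op_us


saved_s, share, per_op_us = encode_overhead(WITH_ENCODE_S, WITHOUT_ENCODE_S, OPS)
print(f"encode alone: {saved_s:.1f}s total, {share:.1%} of the run, "
      f"{per_op_us:.0f}us per op")
```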
Post by Haomai Wang
2. obvious latency for threadpool/workqueue model. Do we consider to
impl performance optimization workqueue to replace existing critical
workqueue such as op_wq in OSD.h and op_wq in FileStore.h. Now in my
AsyncMessenger impl, I will try to use custom and simple workqueue
impl to improve performance.
When analyzing the latency of a 4K object WriteFull operation, I put static probes into the code to measure the time spent in OpWQ. I tested 10000 4K object WriteFull operations and averaged the results.

I found that each IO spends 158us in the OpWQ: 30us to enqueue, 108us waiting in the queue, and 20us to dequeue. That is more than 20% of the time spent in the PG layer (not including the msg and os layers) when encode is ignored.

Maybe a more efficient ThreadPool/WorkQueue model is needed, or at least some improvement to the WorkQueues in the IO path, to reduce the latency.
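
The OpWQ breakdown can be tabulated the same way. This is a small sketch using only the three timings quoted above; the upper bound on PG-layer time is inferred from the "more than 20%" statement and is not a measured value.

```python
# Per-IO OpWQ timings quoted above, in microseconds.
opwq_us = {"enqueue": 30, "in_queue": 108, "dequeue": 20}

total_us = sum(opwq_us.values())
shares = {stage: t / total_us for stage, t in opwq_us.items()}

# "More than 20% of the PG layer" implies the PG layer takes less than
# total / 0.20 per IO. This is an inferred bound, not a measurement.
pg_layer_upper_bound_us = total_us / 0.20

for stage, frac in shares.items():
    print(f"{stage:>8}: {opwq_us[stage]:>3}us ({frac:.0%} of OpWQ time)")
print(f"total: {total_us}us, implied PG-layer time < {pg_layer_upper_bound_us:.0f}us")
```

Most of the OpWQ time is spent waiting in the queue rather than in the enqueue or dequeue steps themselves.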
Post by Haomai Wang
Thanks for sage!
I'm on the flight at Oct 1. :-(
Now my team is mainly worked on the performance of ceph, we have
1. encode/decode plays remarkable latency, especially in
ObjectStore::Transaction. I'm urgen in refactor ObjectStore API to
avoid encode/decode codes. It seemed has be signed in note(- remove
serialization from ObjectStore::Transaction (ymmv))
2. obvious latency for threadpool/workqueue model. Do we consider to
impl performance optimization workqueue to replace existing critical
workqueue such as op_wq in OSD.h and op_wq in FileStore.h. Now in my
AsyncMessenger impl, I will try to use custom and simple workqueue
impl to improve performance.
3. Large lock in client library such as ObjectCacher

Dong Yuan
2014-09-26 07:20:06 UTC
Permalink
Post by Somnath Roy
Have you tried this with latest shardedpool/WQ model which is already in the Giant branch ?
IOS will be going with this path in the latest code not with op_wq.
Not yet. I mainly work on Firefly now.

I will try it when I have time and report the results. :)
Post by Somnath Roy
Yes, we also saw encode/decode was consuming lot of cpu times and if I remember correctly profiler was pointing bufferlist::append in many of such cases.
Exactly right. :)
--
Dong Yuan
Email:***@gmail.com
Milosz Tanski
2014-09-26 12:58:56 UTC
Permalink
Post by Dong Yuan
Some data can support Haomai's points.
Post by Haomai Wang
1. encode/decode plays remarkable latency, especially in
ObjectStore::Transaction. I'm urgen in refactor ObjectStore API to
avoid encode/decode codes. It seemed has be signed in note(- remove
serialization from ObjectStore::Transaction (ymmv))
My environment, single OSD on a single SSD with filestore_blackhole = true.
With All transaction encode, 10000 4K WriteFull operations by single
thread need about 14.3s. While without transaction encode, the same
test can be finished in about 11.5s.
Considering the FileStore needs to decode the bufferlist too,
encode/decode cost more than 20% time!
Oprofile results can validate this problem too: methods used by
encode/decode sometimes take 9 of the top 10.
Post by Haomai Wang
2. obvious latency for threadpool/workqueue model. Do we consider to
impl performance optimization workqueue to replace existing critical
workqueue such as op_wq in OSD.h and op_wq in FileStore.h. Now in my
AsyncMessenger impl, I will try to use custom and simple workqueue
impl to improve performance.
When I analyze the latency of a 4K object WriteFull operation, I put
static probes into codes to measure times used by OpWQ. I test 10000
4K object WriteFull operations and average the results.
I found it spends 158us for the OpWQ for each IO, including 30us to
enqueue, 108us in the queue, and 20us to dequeue. It takes more than
20% time of PG layer (not including msg and os layer) when encode is
ignored.
Maybe a more effective ThreadPool/WorkQueue Model is needed or at
least some improvement for WorkQueues in the IO path to reduce the
latency.
There are a number of things here. I haven't looked at the code in Giant,
so take my statements here with a grain of salt.

First, I have recently submitted a series of patches to the kernel to add
a new preadv2 syscall that lets you do a "fast read" out of the page
cache. The point is that you can skip the whole disk IO queue in
user space in the cases where the data is already cached (thus reducing
latency). Obviously this doesn't do much for writes (yet; Christoph
Hellwig is working on that). Samba expressed an interest in using these
new syscalls as well.

LWN article about it: http://lwn.net/Articles/612483/
Here's the latest patch:
http://thread.gmane.org/gmane.linux.kernel.aio.general/4306
The architecture that would benefit from "fast reads":
(image not preserved in the archive)
Previous version of the patch (mostly because there was a lot more
conversation there): https://lkml.org/lkml/2014/9/17/671
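
To illustrate the "fast read" idea, here is a minimal Python sketch. It assumes a Linux kernel and Python build that expose `os.preadv` with the `RWF_NOWAIT` flag, used here as a stand-in for the non-blocking flag in the proposed patches (the flag name in the patch series may differ). The helper name `read_cached_or_fallback` is hypothetical. When the flag is unavailable or the data is not cached, the sketch simply falls back to an ordinary blocking `os.pread`.

```python
import os


def read_cached_or_fallback(fd, length, offset):
    """Try a non-blocking page-cache read first, then fall back to pread.

    Returns (data, fast), where fast is True only if the whole range was
    served by the non-blocking attempt.
    """
    nowait = getattr(os, "RWF_NOWAIT", None)
    if nowait is not None and hasattr(os, "preadv"):
        buf = bytearray(length)
        try:
            got = os.preadv(fd, [buf], offset, nowait)
            if got == length:
                return bytes(buf), True
        except OSError:
            # EAGAIN (data not cached), EOPNOTSUPP (no filesystem
            # support), or similar: take the slow path below.
            pass
    # Slow path: in a real server this is where the request would be
    # handed to the blocking IO queue / thread pool.
    return os.pread(fd, length, offset), False
```

A caller would serve the request inline when `fast` is true and only pay the queueing cost on a cache miss.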

Second, when you have a very fast SSD device that can do up to 100k
IOPS, the naive queueing/thread pool implementation becomes an issue.
Like you mentioned, it adds a lot of extra latency. The solution for this
is not easy. Thankfully, people have done lots of research work for
you, but you'll still need lots of trial and error to get it figured out.
Here are some common strategies:

- You're going to have to reconsider your queue. Obviously you're going to
want to get away from a single crude mutex. First, is it multiple queues
each with a mutex, or non-locking queues? Then, if you choose non-locking,
how are you going to build your queueing system? Is it going to be a single
MPMC queue (slowest), MPSC (faster, but things will get stuck behind slow
requests), MPSC with work stealing (complicated), or a FastFlow-style
network of SPSC queues (needs an arbiter thread)?
- How do you handle an empty queue? Spin with fallback reduces latency,
but it does waste CPU cycles which could be used by a different OSD
process / EC decoding.
- Eventcount versus semaphore (for blocking / notification); after all,
you don't want to spin forever. You really want an eventcount, since
you don't want to have a mutex with your semaphore (since
that's what you tried getting rid of). Here you get into
platform-specific implementations (futexes).
- If the queue has priorities, is it okay if our priorities aren't
perfectly enforced? In general this really complicates things, and you're
pretty much best off having a FastFlow-like queue so your arbiter
thread can do some kind of prioritization.
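
The "spin with fallback" policy for an empty queue can be sketched as follows. This is only an illustration of the policy, not of a lock-free queue or an eventcount: Python's `queue.SimpleQueue` is used as a convenient stand-in, and the names `pop_spin_then_block` and `worker`, the spin count, and the `None` shutdown marker are all assumptions of this sketch.

```python
import queue
import threading


def pop_spin_then_block(q, spins=1000, timeout=None):
    """Poll the queue up to `spins` times, then fall back to a blocking wait."""
    for _ in range(spins):
        try:
            return q.get_nowait()      # fast path: no sleeping, lowest latency
        except queue.Empty:
            pass                       # still empty, keep spinning
    return q.get(timeout=timeout)      # slow path: sleep until an item arrives


def worker(q, results):
    while True:
        item = pop_spin_then_block(q)
        if item is None:               # None is this sketch's shutdown marker
            return
        results.append(item * 2)


q = queue.SimpleQueue()
results = []
t = threading.Thread(target=worker, args=(q, results))
t.start()
for i in range(5):
    q.put(i)
q.put(None)
t.join()
print(results)
```

The spin count is the tuning knob: a larger value lowers wake-up latency under load but burns more CPU when the queue stays empty, which is the trade-off described in the list above.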
Post by Sage Weil
One challenge is timezones: there are developers in the US, China, Europe,
and Israel who may want to join. As a starting point, how about next
Wednesday, 15:00 UTC? If I didn't do my tz math wrong, that's
8:00 (PDT, California)
15:00 (UTC)
18:00 (IDT, Israel)
23:00 (CST, China)
I'd love to participate and contribute to the discussion and solution,
but due to my obligations it's hard to commit to a weekly time, so it's
my hope that a lot of this is done on the mailing list.
--
Milosz Tanski
CTO
16 East 34th Street, 15th floor
New York, NY 10016

p: 646-253-9055
e: ***@adfin.com
Christoph Hellwig
2014-09-26 13:02:39 UTC
Permalink
Post by Milosz Tanski
First, I have recently submitted a series of patches to kernel to add
a new preadv2 syscall that lets you do a "fast read" out of the page
cache the point being that you can skip the whole disk IO queue in
user space in the cases it's already cached (thus reducing the
latency). Obviously this doesn't do much for writes (yet, Christoph
Heldwig is working on that). Samba expressed an interest using these
new syscalls as well.
We could also implement it for writes, but it would be a bit more
complicated. If there is a compelling use case it might be worth
exploring.

Sage Weil
2014-09-26 15:37:56 UTC
Permalink
Post by Milosz Tanski
Post by Sage Weil
One challenge is timezones: there are developers in the US, China, Europe,
and Israel who may want to join. As a starting point, how about next
Wednesday, 15:00 UTC? If I didn't do my tz math wrong, that's
8:00 (PDT, California)
15:00 (UTC)
18:00 (IDT, Israel)
23:00 (CST, China)
I'd love to participate and contributed to the discussion and solution
but due to my obligations it's hard to commit to a weekly time so it's
my hope that a lot of this is done on the mailing list.
I agree. One thing that has quickly become clear is that there are a
*lot* of areas to address and we can't possibly discuss them all in any
detail in a single meeting. I think the key will be to break things down
into specific areas of investigation that can be discussed in detail
on-list, and to use the meeting to coordinate activities, share latest
results, and so forth.

sage
Mark Nelson
2014-09-26 15:57:26 UTC
Permalink
Post by Sage Weil
I agree. One thing that has quickly become clear is that there are a
*lot* of areas to address and we can't possibly discuss them all in any
detail in a single meeting. I think the key will be to break things down
into specific areas of investigation that can be discussed in detail
on-list, and to use the meeting to coordinate activities, share latest
results, and so forth.
That sounds like a great plan to me. There's a *ton* of work to do and
more than enough to go around! I'm really excited by the response we've
gotten. Not sure if google hangouts will be able to accommodate the
number of people!
Guang Yang
2014-09-26 02:47:29 UTC
Permalink
Hi Sage,
We are very interested in joining (and contributing effort) as well. The following issues are of particular interest to us:
1> A large number of small files degrades performance, mostly due to file system lookups (even worse with EC).
2> The Messenger uses too many threads, which is a burden on high-density hardware (where I believe Haomai has already made great progress).

Thanks,
Guang
Mark Nelson
2014-09-26 13:12:40 UTC
Permalink
Post by Guang Yang
Hi Sage,
1> Large number of small files bring performance degradation most due to file system lookup (even worst with EC).
Have you tried decreasing vfs_cache_pressure to retain dentries and
inodes in cache? I've had good luck improving performance for medium
sized IO workloads doing this.
Post by Guang Yang
2> Messenger uses too many threads which bring burden for high density hardware (which I believe Haomai already has great progress).
Yes. The biggest thing on my personal wish list has been to move to a
hybrid threading/event processing model.
Guang Yang
2014-09-27 03:05:12 UTC
Permalink
Post by Mark Nelson
Post by Guang Yang
Hi Sage,
1> Large number of small files bring performance degradation most due to file system lookup (even worst with EC).
Have you tried decreasing vfs_cache_pressure to retain dentries and inodes in cache? I've had good luck improve performance for medium sized IO workloads doing this.
Yeah, we changed the setting from its default value of 100 to 20, and it improved dentry/inode cache retention (we also tried setting it to 1 but got OOM under some traffic patterns). Even with the setting change, given that the object size is several hundred KB, we still observed lookup misses, which increase latency. This became worse when we moved to EC because: 1) there are more files on each system, and 2) the long tail determines the latency.
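
For reference, the setting discussed above is exposed on Linux at `/proc/sys/vm/vfs_cache_pressure`. The following is a minimal sketch for inspecting it; the helper names `read_vfs_cache_pressure` and `describe` are illustrative. Changing the value requires root (for example by writing the new number to the same file) and is deliberately left out of the sketch.

```python
from pathlib import Path

# Standard procfs location of the sysctl on Linux.
VFS_CACHE_PRESSURE = Path("/proc/sys/vm/vfs_cache_pressure")


def read_vfs_cache_pressure(path=VFS_CACHE_PRESSURE):
    """Return the current vm.vfs_cache_pressure value as an int."""
    return int(Path(path).read_text().strip())


def describe(value, default=100):
    """Summarise how a value compares with the kernel default of 100."""
    if value < default:
        return f"{value}: kernel prefers to retain dentry/inode caches"
    if value > default:
        return f"{value}: kernel prefers to reclaim dentry/inode caches"
    return f"{value}: kernel default"


if VFS_CACHE_PRESSURE.exists():
    print(describe(read_vfs_cache_pressure()))
```

As the message above notes, very low values such as 1 retain caches aggressively and led to OOM under some traffic patterns, so any change is worth testing against the real workload.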
Matt W. Benjamin
2014-09-26 03:39:33 UTC
Permalink
Hi,

1. Agree (our item in the pad). We have prototyped something that works for us, but some work might be needed to make it useful upstream.

2. Accelio/XioMessenger does well on the front side. It will be interesting to compare notes.

Matt
Post by Haomai Wang
Thanks for sage!
I'm on the flight at Oct 1. :-(
Now my team is mainly worked on the performance of ceph, we have
1. encode/decode plays remarkable latency, especially in
ObjectStore::Transaction. I'm urgen in refactor ObjectStore API to
avoid encode/decode codes. It seemed has be signed in note(- remove
serialization from ObjectStore::Transaction (ymmv))
2. obvious latency for threadpool/workqueue model. Do we consider to
impl performance optimization workqueue to replace existing critical
workqueue such as op_wq in OSD.h and op_wq in FileStore.h. Now in my
AsyncMessenger impl, I will try to use custom and simple workqueue
impl to improve performance.
3. Large lock in client library such as ObjectCacher
--
Best Regards,
Wheat
--
Matt Benjamin
The Linux Box
206 South Fifth Ave. Suite 150
Ann Arbor, MI 48104

http://linuxbox.com

tel. 734-761-4689
fax. 734-769-8938
cel. 734-216-5309
Zhang, Jian
2014-09-26 06:40:01 UTC
Permalink
Sage,
It is really great to have a performance meeting running. We'd like to join the meeting.
So far, we have a list of topics that can be discussed at the next meeting (it seems we already have many topics listed here, and it's a PRC holiday next week).
* Full SSD setup performance limitations
* Cache tiering optimization proposals
* EC performance enhancement

Thanks
Jian


Loic Dachary
2014-09-26 07:25:18 UTC
Permalink
Hi,

I added a section to http://pad.ceph.com/p/performance_weekly about the current efforts to optimize/benchmark erasure code plugins. @Zhang, Jian : the ISA erasure code plugin Andreas Peters worked on is not mentioned because it does not need any optimization, as far as I can tell ;-)

Cheers
--
Loïc Dachary, Artisan Logiciel Libre
Zhang, Jian
2014-09-26 08:37:00 UTC
Permalink
Loic,
Sorry for the confusion. We have observed up to ~20% EC performance degradation and have some initial findings and want to have a discussion.

Thanks
Jian


Loic Dachary
2014-09-26 08:56:16 UTC
Permalink
Hi,

I added

- isa : performances improvement expected ~20% [jian.zhang]

to http://pad.ceph.com/p/performance_weekly. Do you have more information about this degradation?

Cheers
--
Loïc Dachary, Artisan Logiciel Libre
Andreas Bluemle
2014-10-01 09:30:58 UTC
Permalink
Hi,

to illustrate the "number of threads issue":
we had set up a cluster with 11 storage nodes and a total of 375 OSDs,
i.e. roughly 30 to 40 OSDs per storage node.
Looking at one of the storage nodes when the
cluster is idle (no client I/O, no scrub) we encounter

- up to 82,000 ceph-osd threads,
or approx. 2,000 threads per OSD

- a CPU load of 20%:
this is on a storage node with 12 CPU cores,
which means that more than 2 CPU cores are busy

- a network load of almost 50,000 packets/second:
separate cluster and public network,
12,000 packets per second on each network interface,
outgoing and incoming (heartbeats?)
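
As a rough cross-check, the figures above can be recomputed in a few lines of Python. This is only a back-of-the-envelope sketch: it assumes the OSDs are spread evenly across the nodes and that each node has exactly one interface per network, neither of which is stated explicitly.

```python
# Figures quoted in the message above.
storage_nodes = 11
total_osds = 375
threads_on_node = 82_000
cpu_cores = 12
cpu_load = 0.20
packets_per_direction = 12_000  # per interface, per direction

# Assumptions (not stated in the message):
interfaces = 2  # one for the cluster network, one for the public network
directions = 2  # outgoing and incoming

osds_per_node = total_osds / storage_nodes         # about 34 on average
threads_per_osd = threads_on_node / osds_per_node  # about 2,400 on average
busy_cores = cpu_cores * cpu_load                  # about 2.4 cores
packets_per_second = packets_per_direction * interfaces * directions

print(f"OSDs per node:   {osds_per_node:.1f}")
print(f"threads per OSD: {threads_per_osd:.0f}")
print(f"busy CPU cores:  {busy_cores:.1f}")
print(f"packets/second:  {packets_per_second}")
```

With the average of about 34 OSDs per node this gives roughly 2,400 threads per OSD; a node at the upper end of the range (40 OSDs) gives about 2,050, which matches the "approx. 2,000" figure. The packet count comes to 48,000 per second, in line with "almost 50,000".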


Regards

Andreas Bluemle


On Thu, 25 Sep 2014 21:27:28 +0200
Post by Matt W. Benjamin
Hi Sage,
I'm definitely interested in joining this weekly call starting Oct
1st. Thanks for this initiative!
- how can we reduce the number of threads in the system
-- including to avoid the context switches in between
-- including to avoid the queues and locks in between
- how can we reduce the number of lines of code
-- including the multiple system calls for each IO
- how can we introduce a highly efficient timestamp collection at the
most important FN check-points (see for example the attached file)
to measure the change and effect of our actions
Best Regards,
-Dieter
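
The timestamp collection at function check-points that Dieter mentions could look roughly like the Python sketch below. It is not taken from the attached file, which is not part of this thread; the `Checkpoints` class and the labels are hypothetical, and a real implementation inside the OSD would be written in C++ with a much cheaper clock read.

```python
import time


class Checkpoints:
    """Collect (label, timestamp) pairs for one operation.

    Hypothetical helper; the labels stand in for the function
    check-points mentioned above.
    """

    def __init__(self):
        self._marks = []

    def mark(self, label):
        # perf_counter_ns() is monotonic and returns an integer,
        # so differences are exact.
        self._marks.append((label, time.perf_counter_ns()))

    def deltas(self):
        """Return (from_label, to_label, elapsed_ns) for adjacent marks."""
        return [
            (a[0], b[0], b[1] - a[1])
            for a, b in zip(self._marks, self._marks[1:])
        ]


if __name__ == "__main__":
    cp = Checkpoints()
    cp.mark("received")
    time.sleep(0.001)  # stand-in for time spent queued
    cp.mark("dequeued")
    time.sleep(0.002)  # stand-in for the actual write
    cp.mark("committed")
    for start, end, ns in cp.deltas():
        print(f"{start} -> {end}: {ns / 1000:.0f} us")
```

Recording only a label and a raw counter value at each check-point, and deferring all arithmetic and formatting until afterwards, keeps the cost on the I/O path small enough that the measurement itself does not distort the result much.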
--
Andreas Bluemle mailto:***@itxperts.de
ITXperts GmbH http://www.itxperts.de
Balanstrasse 73, Geb. 08 Phone: (+49) 89 89044917
D-81541 Muenchen (Germany) Fax: (+49) 89 89044910

Company details: http://www.itxperts.de/imprint.htm