Discussion:
[RFC] New Message Implementation Based on Event
Haomai Wang
2014-09-12 03:09:59 UTC
Hi all,

Recently, I did some basic work on a new messenger implementation based on
events (https://github.com/yuyuyu101/ceph/tree/msg-event). The basic
idea is that we use a Processor thread for each Messenger to monitor
all sockets and dispatch ready fds to a thread pool. The event mechanism can
be epoll, kqueue, poll, or select. A thread in the thread pool then
reads from or writes to the socket and dispatches messages.
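
To make the idea concrete, here is a minimal sketch of the pattern (not the
actual branch code; the names WorkerPool, processor_loop, and handle_io are
made up for illustration), using epoll as the event mechanism:

```cpp
// Minimal sketch: one event thread monitors sockets with epoll and hands
// ready fds to a pool of worker threads, which do the actual read/write
// and message dispatch.
#include <sys/epoll.h>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class WorkerPool {
  std::vector<std::thread> workers;
  std::queue<int> ready_fds;          // fds with pending I/O
  std::mutex lock;
  std::condition_variable cond;
  bool stop = false;

 public:
  explicit WorkerPool(int n, std::function<void(int)> handle_io) {
    for (int i = 0; i < n; ++i) {
      workers.emplace_back([this, handle_io] {
        while (true) {
          int fd;
          {
            std::unique_lock<std::mutex> l(lock);
            cond.wait(l, [this] { return stop || !ready_fds.empty(); });
            if (stop && ready_fds.empty()) return;
            fd = ready_fds.front();
            ready_fds.pop();
          }
          handle_io(fd);              // read/write the socket, dispatch message
        }
      });
    }
  }
  void queue(int fd) {
    { std::lock_guard<std::mutex> l(lock); ready_fds.push(fd); }
    cond.notify_one();
  }
  ~WorkerPool() {
    { std::lock_guard<std::mutex> l(lock); stop = true; }
    cond.notify_all();
    for (auto& w : workers) w.join();
  }
};

// The Processor thread: wait for readiness events and dispatch fds.
void processor_loop(int epfd, WorkerPool& pool) {
  epoll_event events[64];
  while (true) {
    int n = epoll_wait(epfd, events, 64, -1);
    if (n < 0) break;
    for (int i = 0; i < n; ++i)
      pool.queue(events[i].data.fd);
  }
}
```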

The branch has now passed basic tests. Before making it more stable and
running it through more QA suites, I want to do some benchmark tests against
the pipe implementation on a large-scale cluster; I would like to use at
least 100 OSDs (SSD-backed) and hundreds of clients. In the benchmark with
only one OSD so far, the client gets the same latency as with the pipe
implementation, and the latency stdev is smaller.

The background for this implementation is that the pipe implementation
incurs too much overhead from context switches and thread resources. In
our environment, several ceph-osd daemons run on compute nodes that also
run KVM processes.

Do you have any thoughts on this, or any serious concerns compared to the pipe implementation?
--
Best Regards,

Wheat
Sage Weil
2014-09-15 15:51:35 UTC
Hi Haomai,
I haven't had time to look at this in much detail yet, but at a high
level, this looks awesome! It sounds like using an event lib for this is
a good approach, and from a quick skim it looks like you've already done
the hard work of breaking all of the logic in Pipe.cc into a state
machine.
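
For anyone skimming, the rough shape of that per-connection state machine
(a toy sketch with made-up state names, not the code in the branch) is
something like:

```cpp
// Illustrative only: a per-connection state machine driven by readiness
// events instead of a dedicated blocking reader thread as in Pipe.cc.
// The state names are hypothetical and far fewer than a real protocol needs.
enum class ConnState {
  CONNECTING,      // TCP connect / handshake in flight
  OPEN,            // handshake done, exchanging messages
  CLOSED
};

struct ConnectionSM {
  ConnState state = ConnState::CONNECTING;

  // Called by a worker thread when the event loop reports the fd readable.
  // Each readiness event advances the state machine one step and returns.
  void handle_readable() {
    switch (state) {
      case ConnState::CONNECTING: /* finish handshake */ state = ConnState::OPEN; break;
      case ConnState::OPEN:       /* read header/payload, dispatch message */     break;
      case ConnState::CLOSED:     /* ignore */                                    break;
    }
  }
};
```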

How much testing have you done with this?

I hope to find more time this week to look in more detail, but wanted to
let you know I didn't miss this before that :)

Cheers-
sage
Haomai Wang
2014-09-16 02:33:47 UTC
As for testing, so far I have mainly passed the tests in src/tests such as
ceph_test_rados. Because the Messenger has no unit tests, I have to deploy
this branch to my dev cluster to test it. I'm thinking of making the
ms_inject* options available in this Messenger for failure coverage.
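
Roughly the kind of hook I have in mind (the names FaultConfig and
maybe_inject_failure are made up for illustration; the real
ms_inject_socket_failures plumbing in the pipe code is more involved):

```cpp
// Illustrative only: inject a socket failure roughly once every
// `inject_socket_failures` I/O operations, so the reconnect/reset paths
// get exercised under test.
#include <sys/socket.h>
#include <cstdlib>

struct FaultConfig {
  // 0 disables injection; otherwise fail about 1 in N calls.
  unsigned inject_socket_failures = 0;
};

// Call before every read/write on a connection's socket.
// Returns true if a failure was injected (caller treats it as an error).
inline bool maybe_inject_failure(const FaultConfig& conf, int fd) {
  if (conf.inject_socket_failures == 0)
    return false;
  if (std::rand() % conf.inject_socket_failures != 0)
    return false;
  ::shutdown(fd, SHUT_RDWR);   // force the connection's error path
  return true;
}
```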
--
Best Regards,

Wheat
Sage Weil
2014-10-07 14:59:33 UTC
Hi Haomai,

A branch cleaning up the messenger interface a bit more just merged.
Everything is now using the Messenger::create() factory method, and the
type of messenger instantiated is controlled by the ms_type config option.
There's also a reorg of the SimpleMessenger files into msg/simple/.
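
For context, the shape of the factory is roughly this (a simplified sketch,
not the actual Messenger::create() signature in src/msg/, and the "async"
backend name below is just a placeholder):

```cpp
// Simplified sketch of the factory idea; the real Messenger::create()
// takes more arguments (CephContext, entity name, nonce, ...).
#include <memory>
#include <stdexcept>
#include <string>

struct Messenger {
  virtual ~Messenger() = default;
  // ... bind(), send_message(), etc.
};

struct SimpleMessenger : Messenger {};   // existing pipe-based backend
struct AsyncMessenger  : Messenger {};   // hypothetical event-based backend

// ms_type comes from the config ("simple" today; a new value would select
// the event-based backend).
std::unique_ptr<Messenger> create_messenger(const std::string& ms_type) {
  if (ms_type == "simple")
    return std::make_unique<SimpleMessenger>();
  if (ms_type == "async")
    return std::make_unique<AsyncMessenger>();
  throw std::runtime_error("unknown ms_type: " + ms_type);
}
```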

Do you mind rebasing your series onto the latest master?

Since this is an optional backend, I think we can take the same route as
KeyValueStore and merge it early so that it is easier to test and
improve.

Will you be able to join the performance call tomorrow?
sage
Mark Nelson
2014-10-07 17:50:00 UTC
Btw, just wanted to say that this is fantastic and exactly what I was
hoping we'd eventually do. :)

Mark
M Ranga Swami Reddy
2014-10-07 18:52:36 UTC
Hi,
Could you please clarify whether this event-based messenger can be used with
radosgw, and whether its messages can be propagated outside of Ceph (e.g., to
OpenStack metering services)?

Thanks
Swami
Haomai Wang
2014-10-08 01:46:54 UTC
No problem, I just came back from a long vacation. I will create a PR ASAP.
--
Best Regards,

Wheat