RBD readahead strategies

Sage Weil

2014-09-12 03:05:39 UTC

Post by Adam Crume
I've been testing a few strategies for RBD readahead and wanted to
share my results as well as ask for input.
I have four sample workloads that I replayed at maximum speed with
rbd-replay. boot-ide and boot-virtio are captured from booting a VM
with the image on the IDE and virtio buses, respectively. Likewise,
grep-ide and grep-virtio are captured from a large grep run. (I'm not
entirely sure why the IDE and virtio workloads are different, but part
of it is the number of pending requests allowed.)
- none: No readahead.
- plain: My initial implementation. The readahead window doubles for
each readahead request, up to a limit, and resets when a random
request is detected.
- aligned: Same as above, but readahead requests are aligned with
object boundaries, when possible.
- eager: When activated, read to the end of the object.
For all of these, 10 sequential requests trigger readahead, the
maximum readahead size is 4 MB, and "rbd readahead disable after
bytes" is disabled (meaning that readahead is enabled for the entire
workload). The object size is the default 4 MB, and data is striped
over a single object. (Alignment with stripes or object sets is
ignored for now.)
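To make the difference between the strategies concrete, here is a minimal sketch of how each one might pick the byte range to prefetch. The function names are hypothetical, and "aligned" is modeled simply as truncating the request at the next object boundary, which is only one plausible reading of "aligned with object boundaries, when possible".

```python
OBJECT_SIZE = 4 * 1024 * 1024  # default 4 MB object size mentioned in the post


def plain_extent(offset, window):
    """Prefetch `window` bytes from `offset`, ignoring object boundaries."""
    return offset, window


def aligned_extent(offset, window, object_size=OBJECT_SIZE):
    """Like plain, but stop at the next object boundary (assumed interpretation)."""
    boundary = (offset // object_size + 1) * object_size
    return offset, min(window, boundary - offset)


def eager_extent(offset, object_size=OBJECT_SIZE):
    """Prefetch from `offset` to the end of the object that contains it."""
    boundary = (offset // object_size + 1) * object_size
    return offset, boundary - offset


def objects_touched(offset, length, object_size=OBJECT_SIZE):
    """Number of objects a byte range spans, i.e. object reads it would need."""
    if length <= 0:
        return 0
    first = offset // object_size
    last = (offset + length - 1) // object_size
    return last - first + 1
```

With a 2 MB window starting 3 MB into an object, the plain range spans two objects, while the aligned and eager ranges both stop at the 4 MB boundary and touch only one.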
workload     strategy  time (seconds)   RA ops  RA MB  read ops  read MB
boot-ide     none       46.22 +/- 0.41       0      0     57516      407
boot-ide     plain      11.42 +/- 0.25     281    203     57516      407
boot-ide     aligned    11.46 +/- 0.13     276    201     57516      407
boot-ide     eager      12.48 +/- 0.61     111    303     57516      407
boot-virtio  none        9.05 +/- 0.25       0      0     11851      393
boot-virtio  plain       8.05 +/- 0.38     451    221     11851      393
boot-virtio  aligned     7.86 +/- 0.27     452    213     11851      393
boot-virtio  eager       9.17 +/- 0.34     249    600     11851      393
grep-ide     none      138.55 +/- 1.67       0      0    130104     3044
grep-ide     plain     136.07 +/- 1.57     397    867    130104     3044
grep-ide     aligned   137.30 +/- 1.77     379    844    130104     3044
grep-ide     eager     138.77 +/- 1.52     346    993    130104     3044
grep-virtio  none      120.73 +/- 1.33       0      0    130061     2820
grep-virtio  plain     121.29 +/- 1.28    1186   1485    130061     2820
grep-virtio  aligned   123.32 +/- 1.29    1139   1409    130061     2820
grep-virtio  eager     127.75 +/- 1.32     842   2218    130061     2820
(The time is the mean wall-clock time +/- the margin of error with
99.7% confidence. RA=readahead.)
Right off the bat, readahead is a huge improvement for the boot-ide
workload, which is no surprise because it issues 50,000 sequential,
single-sector reads. (Why the early boot process is so inefficient is
open for speculation, but that's a real, natural workload.)
boot-virtio also sees an improvement, although not nearly so dramatic.
The grep workloads show no statistically significant improvement.
One conclusion I draw is that 'eager' is, well, too eager. 'aligned'
shows no statistically significant difference from 'plain', and
'plain' is no worse than 'none' (at statistically significant levels)
and sometimes better.
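One rough way to check these conclusions against the table is to see whether the mean +/- margin ranges of two strategies overlap. This is a conservative rule of thumb and not necessarily the test used for the statements above; the helper names are hypothetical, and the numbers are copied from the table.

```python
# (mean seconds, margin of error) copied from the results table
RESULTS = {
    "boot-ide":    {"none": (46.22, 0.41), "plain": (11.42, 0.25),
                    "aligned": (11.46, 0.13), "eager": (12.48, 0.61)},
    "boot-virtio": {"none": (9.05, 0.25), "plain": (8.05, 0.38),
                    "aligned": (7.86, 0.27), "eager": (9.17, 0.34)},
    "grep-ide":    {"none": (138.55, 1.67), "plain": (136.07, 1.57),
                    "aligned": (137.30, 1.77), "eager": (138.77, 1.52)},
    "grep-virtio": {"none": (120.73, 1.33), "plain": (121.29, 1.28),
                    "aligned": (123.32, 1.29), "eager": (127.75, 1.32)},
}


def intervals_overlap(a, b):
    """True if the ranges mean +/- margin of a and b share any point."""
    (mean_a, err_a), (mean_b, err_b) = a, b
    return mean_a - err_a <= mean_b + err_b and mean_b - err_b <= mean_a + err_a


def compare(workload, first, second):
    """Describe `second` relative to `first`: faster, slower, or unclear."""
    a, b = RESULTS[workload][first], RESULTS[workload][second]
    if intervals_overlap(a, b):
        return "no clear difference"
    return "faster" if b[0] < a[0] else "slower"


if __name__ == "__main__":
    for workload in RESULTS:
        for strategy in ("plain", "aligned", "eager"):
            print(workload, strategy, "vs none:",
                  compare(workload, "none", strategy))
```

Under this criterion 'plain' beats 'none' on both boot workloads and is indistinguishable from it on both grep workloads, 'aligned' never separates from 'plain', and 'eager' comes out slower than 'none' on grep-virtio.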
Should the readahead strategy be configurable, or should we just stick
with whichever seems the best one? Is there anything big I'm missing?

Aligned seems like it will result in fewer IOs on the backend, even if it
is no faster from the client's perspective, right? That makes me think we
should go with that if we have to choose one.

Have you looked at what it might take to put the readahead logic in
ObjectCacher somewhere, or in some other piece of shared code that would
allow us to subsume the Client.cc readahead code as well? Perhaps simply
wrapping the readahead logic in a single class such that the calling code
is super simple (just feeds in current offset and conditionally issues a
readahead IO) would work as well.
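As a rough illustration of that last idea, the sketch below wraps the "plain" policy from Adam's description in one small class, so a caller only reports each read and gets back an optional range to prefetch. The class name, method name, and the 64 KB initial window are hypothetical and not taken from any existing Ceph code; the trigger count and 4 MB limit follow the values in the quoted post.

```python
class ReadaheadTracker:
    """Tracks sequential reads and suggests readahead ranges (illustrative only)."""

    def __init__(self, trigger_requests=10, max_window=4 << 20,
                 initial_window=64 << 10):
        self.trigger_requests = trigger_requests
        self.max_window = max_window
        self.initial_window = initial_window  # assumed; not given in the post
        self.sequential = 0      # length of the current sequential run
        self.window = initial_window
        self.next_offset = None  # where the next sequential read would start
        self.ra_end = 0          # end of the range already prefetched

    def update(self, offset, length):
        """Record a read; return (ra_offset, ra_length) to prefetch, or None."""
        if offset == self.next_offset:
            self.sequential += 1
        else:
            # Random request: start a new run and reset the window.
            self.sequential = 1
            self.window = self.initial_window
            self.ra_end = 0
        self.next_offset = offset + length

        if self.sequential < self.trigger_requests:
            return None
        if self.next_offset < self.ra_end:
            return None  # reader is still inside already-prefetched data

        extent = (self.next_offset, self.window)
        self.ra_end = self.next_offset + self.window
        # The window doubles for each readahead request, up to the limit.
        self.window = min(self.window * 2, self.max_window)
        return extent
```

The calling code then reduces to something like `extent = tracker.update(offset, length)` followed by issuing a readahead IO when `extent` is not None, which would make it easier to share between librbd and Client.cc. Object-boundary alignment could be applied to the returned range in one place.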

sage