Discussion:
[PATCH 0/3] libceph, rbd: don't lockup under memory pressure
Ilya Dryomov
2014-10-10 18:27:03 UTC
Permalink
Hello,

These are fixes for the problem that Micha was running into and helped
debug. It appeared to be a stupid bug in messenger and to a lesser
extent in rbd - workqueues in memory reclaim paths really do need to
have WQ_MEM_RECLAIM set.

Thanks,

Ilya


Ilya Dryomov (3):
libceph: ceph-msgr workqueue needs a resque worker
rbd: rbd workqueues need a resque worker
rbd: use a single workqueue for all devices

drivers/block/rbd.c | 32 ++++++++++++++++++--------------
net/ceph/messenger.c | 6 +++++-
2 files changed, 23 insertions(+), 15 deletions(-)
--
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Ilya Dryomov
2014-10-10 18:27:04 UTC
Permalink
Commit f363e45fd118 ("net/ceph: make ceph_msgr_wq non-reentrant")
effectively removed WQ_MEM_RECLAIM flag from ceph_msgr_wq. This is
wrong - libceph is very much a memory reclaim path, so restore it.

Cc: ***@vger.kernel.org # needs backporting for < 3.12
Signed-off-by: Ilya Dryomov <***@redhat.com>
Tested-by: Micha Krause <***@krausam.de>
---
net/ceph/messenger.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
index 9764c771cfb1..559c9f619c20 100644
--- a/net/ceph/messenger.c
+++ b/net/ceph/messenger.c
@@ -292,7 +292,11 @@ int ceph_msgr_init(void)
if (ceph_msgr_slab_init())
return -ENOMEM;

- ceph_msgr_wq = alloc_workqueue("ceph-msgr", 0, 0);
+ /*
+ * The number of active work items is limited by the number of
+ * connections, so leave @max_active at default.
+ */
+ ceph_msgr_wq = alloc_workqueue("ceph-msgr", WQ_MEM_RECLAIM, 0);
if (ceph_msgr_wq)
return 0;
--
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Sage Weil
2014-10-14 15:56:19 UTC
Permalink
Post by Ilya Dryomov
Commit f363e45fd118 ("net/ceph: make ceph_msgr_wq non-reentrant")
effectively removed WQ_MEM_RECLAIM flag from ceph_msgr_wq. This is
wrong - libceph is very much a memory reclaim path, so restore it.
---
net/ceph/messenger.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
index 9764c771cfb1..559c9f619c20 100644
--- a/net/ceph/messenger.c
+++ b/net/ceph/messenger.c
@@ -292,7 +292,11 @@ int ceph_msgr_init(void)
if (ceph_msgr_slab_init())
return -ENOMEM;
- ceph_msgr_wq = alloc_workqueue("ceph-msgr", 0, 0);
+ /*
+ * The number of active work items is limited by the number of
+ */
+ ceph_msgr_wq = alloc_workqueue("ceph-msgr", WQ_MEM_RECLAIM, 0);
if (ceph_msgr_wq)
return 0;
--
1.7.10.4
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
More majordomo info at http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Ilya Dryomov
2014-10-10 18:27:05 UTC
Permalink
Need to use WQ_MEM_RECLAIM for our workqueues to prevent I/O lockups
under memory pressure - we sit on the memory reclaim path.

Cc: ***@vger.kernel.org # 3.17, needs backporting for 3.16
Signed-off-by: Ilya Dryomov <***@redhat.com>
Tested-by: Micha Krause <***@krausam.de>
---
drivers/block/rbd.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 7712ae65753c..0a54c588e433 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -5242,7 +5242,8 @@ static int rbd_dev_device_setup(struct rbd_device *rbd_dev)
set_capacity(rbd_dev->disk, rbd_dev->mapping.size / SECTOR_SIZE);
set_disk_ro(rbd_dev->disk, rbd_dev->mapping.read_only);

- rbd_dev->rq_wq = alloc_workqueue("%s", 0, 0, rbd_dev->disk->disk_name);
+ rbd_dev->rq_wq = alloc_workqueue("%s", WQ_MEM_RECLAIM, 0,
+ rbd_dev->disk->disk_name);
if (!rbd_dev->rq_wq) {
ret = -ENOMEM;
goto err_out_mapping;
--
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Sage Weil
2014-10-14 15:57:57 UTC
Permalink
Post by Ilya Dryomov
Need to use WQ_MEM_RECLAIM for our workqueues to prevent I/O lockups
under memory pressure - we sit on the memory reclaim path.
---
drivers/block/rbd.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 7712ae65753c..0a54c588e433 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -5242,7 +5242,8 @@ static int rbd_dev_device_setup(struct rbd_device *rbd_dev)
set_capacity(rbd_dev->disk, rbd_dev->mapping.size / SECTOR_SIZE);
set_disk_ro(rbd_dev->disk, rbd_dev->mapping.read_only);
- rbd_dev->rq_wq = alloc_workqueue("%s", 0, 0, rbd_dev->disk->disk_name);
+ rbd_dev->rq_wq = alloc_workqueue("%s", WQ_MEM_RECLAIM, 0,
+ rbd_dev->disk->disk_name);
if (!rbd_dev->rq_wq) {
ret = -ENOMEM;
goto err_out_mapping;
--
1.7.10.4
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
More majordomo info at http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Ilya Dryomov
2014-10-10 18:27:06 UTC
Permalink
Using one queue per device doesn't make much sense given that our
workfn processes "devices" and not "requests". Switch to a single
workqueue for all devices.

Signed-off-by: Ilya Dryomov <***@redhat.com>
---
drivers/block/rbd.c | 33 ++++++++++++++++++---------------
1 file changed, 18 insertions(+), 15 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 0a54c588e433..be8d44af6ae1 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -342,7 +342,6 @@ struct rbd_device {

struct list_head rq_queue; /* incoming rq queue */
spinlock_t lock; /* queue, flags, open_count */
- struct workqueue_struct *rq_wq;
struct work_struct rq_work;

struct rbd_image_header header;
@@ -402,6 +401,8 @@ static struct kmem_cache *rbd_segment_name_cache;
static int rbd_major;
static DEFINE_IDA(rbd_dev_id_ida);

+static struct workqueue_struct *rbd_wq;
+
/*
* Default to false for now, as single-major requires >= 0.75 version of
* userspace rbd utility.
@@ -3452,7 +3453,7 @@ static void rbd_request_fn(struct request_queue *q)
}

if (queued)
- queue_work(rbd_dev->rq_wq, &rbd_dev->rq_work);
+ queue_work(rbd_wq, &rbd_dev->rq_work);
}

/*
@@ -5242,16 +5243,9 @@ static int rbd_dev_device_setup(struct rbd_device *rbd_dev)
set_capacity(rbd_dev->disk, rbd_dev->mapping.size / SECTOR_SIZE);
set_disk_ro(rbd_dev->disk, rbd_dev->mapping.read_only);

- rbd_dev->rq_wq = alloc_workqueue("%s", WQ_MEM_RECLAIM, 0,
- rbd_dev->disk->disk_name);
- if (!rbd_dev->rq_wq) {
- ret = -ENOMEM;
- goto err_out_mapping;
- }
-
ret = rbd_bus_add_dev(rbd_dev);
if (ret)
- goto err_out_workqueue;
+ goto err_out_mapping;

/* Everything's ready. Announce the disk to the world. */

@@ -5263,9 +5257,6 @@ static int rbd_dev_device_setup(struct rbd_device *rbd_dev)

return ret;

-err_out_workqueue:
- destroy_workqueue(rbd_dev->rq_wq);
- rbd_dev->rq_wq = NULL;
err_out_mapping:
rbd_dev_mapping_clear(rbd_dev);
err_out_disk:
@@ -5512,7 +5503,6 @@ static void rbd_dev_device_release(struct device *dev)
{
struct rbd_device *rbd_dev = dev_to_rbd_dev(dev);

- destroy_workqueue(rbd_dev->rq_wq);
rbd_free_disk(rbd_dev);
clear_bit(RBD_DEV_FLAG_EXISTS, &rbd_dev->flags);
rbd_dev_mapping_clear(rbd_dev);
@@ -5716,11 +5706,21 @@ static int __init rbd_init(void)
if (rc)
return rc;

+ /*
+ * The number of active work items is limited by the number of
+ * rbd devices, so leave @max_active at default.
+ */
+ rbd_wq = alloc_workqueue(RBD_DRV_NAME, WQ_MEM_RECLAIM, 0);
+ if (!rbd_wq) {
+ rc = -ENOMEM;
+ goto err_out_slab;
+ }
+
if (single_major) {
rbd_major = register_blkdev(0, RBD_DRV_NAME);
if (rbd_major < 0) {
rc = rbd_major;
- goto err_out_slab;
+ goto err_out_wq;
}
}

@@ -5738,6 +5738,8 @@ static int __init rbd_init(void)
err_out_blkdev:
if (single_major)
unregister_blkdev(rbd_major, RBD_DRV_NAME);
+err_out_wq:
+ destroy_workqueue(rbd_wq);
err_out_slab:
rbd_slab_exit();
return rc;
@@ -5749,6 +5751,7 @@ static void __exit rbd_exit(void)
rbd_sysfs_cleanup();
if (single_major)
unregister_blkdev(rbd_major, RBD_DRV_NAME);
+ destroy_workqueue(rbd_wq);
rbd_slab_exit();
}
--
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Loading...