Discussion:
70+ OSD are DOWN and not coming up
Karan Singh
2014-05-20 09:02:18 UTC
Hello Cephers, I need your suggestions for troubleshooting.

My cluster is struggling badly: 70+ OSDs are down out of 165.

Problem --> OSDs are getting marked down and out of the cluster, and the cluster is degraded. Checking the logs of the failed OSDs, we see weird entries that are continuously being generated.

OSD debug logs: http://pastebin.com/agTKh6zB


2014-05-20 10:19:03.699886 7f2328e237a0 0 osd.158 357532 done with init, starting boot process
2014-05-20 10:19:03.700093 7f22ff621700 0 -- 192.168.1.112:6802/3807 >> 192.168.1.109:6802/910005982 pipe(0x8698500 sd=35 :33500 s=1 pgs=0 cs=0 l=0 c=0x83018c0).connect claims to be 192.168.1.109:6802/63896 not 192.168.1.109:6802/910005982 - wrong node!
2014-05-20 10:19:03.700152 7f22ff621700 0 -- 192.168.1.112:6802/3807 >> 192.168.1.109:6802/910005982 pipe(0x8698500 sd=35 :33500 s=1 pgs=0 cs=0 l=0 c=0x83018c0).fault with nothing to send, going to standby
2014-05-20 10:19:09.551269 7f22fdd12700 0 -- 192.168.1.112:6802/3807 >> 192.168.1.109:6803/1176009454 pipe(0x56aee00 sd=53 :40060 s=1 pgs=0 cs=0 l=0 c=0x533fd20).connect claims to be 192.168.1.109:6803/63896 not 192.168.1.109:6803/1176009454 - wrong node!
2014-05-20 10:19:09.551347 7f22fdd12700 0 -- 192.168.1.112:6802/3807 >> 192.168.1.109:6803/1176009454 pipe(0x56aee00 sd=53 :40060 s=1 pgs=0 cs=0 l=0 c=0x533fd20).fault with nothing to send, going to standby
2014-05-20 10:19:09.703901 7f22fd80d700 0 -- 192.168.1.112:6802/3807 >> 192.168.1.113:6802/13870 pipe(0x56adf00 sd=137 :42889 s=1 pgs=0 cs=0 l=0 c=0x8302aa0).connect claims to be 192.168.1.113:6802/24612 not 192.168.1.113:6802/13870 - wrong node!
2014-05-20 10:19:09.704039 7f22fd80d700 0 -- 192.168.1.112:6802/3807 >> 192.168.1.113:6802/13870 pipe(0x56adf00 sd=137 :42889 s=1 pgs=0 cs=0 l=0 c=0x8302aa0).fault with nothing to send, going to standby
2014-05-20 10:19:10.243139 7f22fd005700 0 -- 192.168.1.112:6802/3807 >> 192.168.1.112:6800/14114 pipe(0x56a8f00 sd=146 :43726 s=1 pgs=0 cs=0 l=0 c=0x8304780).connect claims to be 192.168.1.112:6800/2852 not 192.168.1.112:6800/14114 - wrong node!
2014-05-20 10:19:10.243190 7f22fd005700 0 -- 192.168.1.112:6802/3807 >> 192.168.1.112:6800/14114 pipe(0x56a8f00 sd=146 :43726 s=1 pgs=0 cs=0 l=0 c=0x8304780).fault with nothing to send, going to standby
2014-05-20 10:19:10.349693 7f22fc7fd700 0 -- 192.168.1.112:6802/3807 >> 192.168.1.109:6800/13492 pipe(0x8698c80 sd=156 :0 s=1 pgs=0 cs=0 l=0 c=0x83070c0).fault with nothing to send, going to standby


# ceph -v
ceph version 0.80-469-g991f7f1 (991f7f15a6e107b33a24bbef1169f21eb7fcce2c)
# ceph osd stat
osdmap e357073: 165 osds: 91 up, 165 in
flags noout
I have tried the following:

1. Restarting the problematic OSDs, but no luck.
2. Restarting the entire host, but no luck; the OSDs are still down and keep logging the same message:

2014-05-20 10:19:10.243139 7f22fd005700 0 -- 192.168.1.112:6802/3807 >> 192.168.1.112:6800/14114 pipe(0x56a8f00 sd=146 :43726 s=1 pgs=0 cs=0 l=0 c=0x8304780).connect claims to be 192.168.1.112:6800/2852 not 192.168.1.112:6800/14114 - wrong node!
2014-05-20 10:19:10.243190 7f22fd005700 0 -- 192.168.1.112:6802/3807 >> 192.168.1.112:6800/14114 pipe(0x56a8f00 sd=146 :43726 s=1 pgs=0 cs=0 l=0 c=0x8304780).fault with nothing to send, going to standby
2014-05-20 10:19:10.349693 7f22fc7fd700 0 -- 192.168.1.112:6802/3807 >> 192.168.1.109:6800/13492 pipe(0x8698c80 sd=156 :0 s=1 pgs=0 cs=0 l=0 c=0x83070c0).fault with nothing to send, going to standby
2014-05-20 10:22:23.312473 7f2307e61700 0 osd.158 357781 do_command r=0
2014-05-20 10:22:23.326110 7f2307e61700 0 osd.158 357781 do_command r=0 debug_osd=0/5
2014-05-20 10:22:23.326123 7f2307e61700 0 log [INF] : debug_osd=0/5
2014-05-20 10:34:08.161864 7f230224d700 0 -- 192.168.1.112:6802/3807 >> 192.168.1.102:6808/13276 pipe(0x8698280 sd=22 :41078 s=2 pgs=603 cs=1 l=0 c=0x8301600).fault with nothing to send, going to standby

3. The disks do not have errors; there are no messages in dmesg or /var/log/messages.

4. There was a similar bug in the past (http://tracker.ceph.com/issues/4006); I don't know whether it has come back in Firefly.

5. No activity was performed on the cluster recently, except creating some pools and keys for Cinder/Glance integration.

6. The nodes have enough free resources for the OSDs.

7. There are no issues with the network; OSDs are down across all cluster nodes, not just a single node.


****************************************************************
Karan Singh
Systems Specialist , Storage Platforms
CSC - IT Center for Science,
Keilaranta 14, P. O. Box 405, FIN-02101 Espoo, Finland
mobile: +358 503 812758
tel. +358 9 4572001
fax +358 9 4572302
http://www.csc.fi/
****************************************************************
Sage Weil
2014-05-20 15:18:08 UTC
Post by Karan Singh
Hello Cephers, I need your suggestions for troubleshooting.
My cluster is struggling badly: 70+ OSDs are down out of 165.
Problem --> OSDs are getting marked down and out of the cluster, and the cluster is
degraded. Checking the logs of the failed OSDs, we see weird entries that are
continuously being generated.
Tracking this at http://tracker.ceph.com/issues/8387

The most recent bits you posted in the ticket don't quite make sense: the
OSD is trying to connect to an address for an OSD that is currently marked
down. I suspect this is just timing between when the logs were captured
and when the ceph osd dump was captured. To get a complete picture,
please:

1) add

debug osd = 20
debug ms = 1

in the [osd] section and restart all OSDs (a runtime alternative is sketched after these steps)

2) ceph osd set nodown

(to prevent flapping)

3) find some OSD that is showing these messages

4) capture a 'ceph osd dump' output.
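
(If restarting every OSD is disruptive, the same debug levels can usually be
injected into the running daemons instead; a minimal sketch, assuming a
Firefly-era ceph CLI with injectargs support:

   ceph tell osd.* injectargs '--debug-osd 20 --debug-ms 1'

Settings injected this way do not survive a daemon restart, so keep the [osd]
entries in ceph.conf as well.)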

Also happy to debug this interactively over IRC; that will likely be
faster!

Thanks-
sage
Post by Karan Singh
Osd Debug logs ::  http://pastebin.com/agTKh6zB
1. 2014-05-20 10:19:03.699886 7f2328e237a0  0 osd.158 357532 done with
init, starting boot process
2. 2014-05-20 10:19:03.700093 7f22ff621700  0 -- 192.168.1.112:6802/3807 >>
192.168.1.109:6802/910005982 pipe(0x8698500 sd=35 :33500 s=1 pgs=0 cs=0
l=0 c=0x83018c0).connect claims to be 192.168.1.109:6802/63896 not
192.168.1.109:6802/910005982 - wrong node!
3. 2014-05-20 10:19:03.700152 7f22ff621700  0 -- 192.168.1.112:6802/3807 >>
192.168.1.109:6802/910005982 pipe(0x8698500 sd=35 :33500 s=1 pgs=0 cs=0
l=0 c=0x83018c0).fault with nothing to send, going to standby
4. 2014-05-20 10:19:09.551269 7f22fdd12700  0 -- 192.168.1.112:6802/3807 >>
192.168.1.109:6803/1176009454 pipe(0x56aee00 sd=53 :40060 s=1 pgs=0 cs=0
l=0 c=0x533fd20).connect claims to be 192.168.1.109:6803/63896 not
192.168.1.109:6803/1176009454 - wrong node!
5. 2014-05-20 10:19:09.551347 7f22fdd12700  0 -- 192.168.1.112:6802/3807 >>
192.168.1.109:6803/1176009454 pipe(0x56aee00 sd=53 :40060 s=1 pgs=0 cs=0
l=0 c=0x533fd20).fault with nothing to send, going to standby
6. 2014-05-20 10:19:09.703901 7f22fd80d700  0 -- 192.168.1.112:6802/3807 >>
192.168.1.113:6802/13870 pipe(0x56adf00 sd=137 :42889 s=1 pgs=0 cs=0 l=0
c=0x8302aa0).connect claims to be 192.168.1.113:6802/24612 not
192.168.1.113:6802/13870 - wrong node!
7. 2014-05-20 10:19:09.704039 7f22fd80d700  0 -- 192.168.1.112:6802/3807 >>
192.168.1.113:6802/13870 pipe(0x56adf00 sd=137 :42889 s=1 pgs=0 cs=0 l=0
c=0x8302aa0).fault with nothing to send, going to standby
8. 2014-05-20 10:19:10.243139 7f22fd005700  0 -- 192.168.1.112:6802/3807 >>
192.168.1.112:6800/14114 pipe(0x56a8f00 sd=146 :43726 s=1 pgs=0 cs=0 l=0
c=0x8304780).connect claims to be 192.168.1.112:6800/2852 not
192.168.1.112:6800/14114 - wrong node!
9. 2014-05-20 10:19:10.243190 7f22fd005700  0 -- 192.168.1.112:6802/3807 >>
192.168.1.112:6800/14114 pipe(0x56a8f00 sd=146 :43726 s=1 pgs=0 cs=0 l=0
c=0x8304780).fault with nothing to send, going to standby
10. 2014-05-20 10:19:10.349693 7f22fc7fd700  0 -- 192.168.1.112:6802/3807 >>
192.168.1.109:6800/13492 pipe(0x8698c80 sd=156 :0 s=1 pgs=0 cs=0 l=0
c=0x83070c0).fault with nothing to send, going to standby
1. ceph -v
ceph version 0.80-469-g991f7f1
(991f7f15a6e107b33a24bbef1169f21eb7fcce2c) #
1. ceph osd stat
osdmap e357073: 165 osds: 91 up, 165 in
flags noout #
1. Restarting the problematic OSDs , but no luck
2.  i restarted entire host but no luck, still osds are down and getting the
same mesage
1. 2014-05-20 10:19:10.243139 7f22fd005700  0 -- 192.168.1.112:6802/3807 >>
192.168.1.112:6800/14114 pipe(0x56a8f00 sd=146 :43726 s=1 pgs=0 cs=0 l=0
c=0x8304780).connect claims to be 192.168.1.112:6800/2852 not
192.168.1.112:6800/14114 - wrong node!
2. 2014-05-20 10:19:10.243190 7f22fd005700  0 -- 192.168.1.112:6802/3807 >>
192.168.1.112:6800/14114 pipe(0x56a8f00 sd=146 :43726 s=1 pgs=0 cs=0 l=0
c=0x8304780).fault with nothing to send, going to standby
3. 2014-05-20 10:19:10.349693 7f22fc7fd700  0 -- 192.168.1.112:6802/3807 >>
192.168.1.109:6800/13492 pipe(0x8698c80 sd=156 :0 s=1 pgs=0 cs=0 l=0
c=0x83070c0).fault with nothing to send, going to standby
4. 2014-05-20 10:22:23.312473 7f2307e61700  0 osd.158 357781 do_command r=0
5. 2014-05-20 10:22:23.326110 7f2307e61700  0 osd.158 357781 do_command r=0
debug_osd=0/5
6. 2014-05-20 10:22:23.326123 7f2307e61700  0 log [INF] : debug_osd=0/5
7. 2014-05-20 10:34:08.161864 7f230224d700  0 -- 192.168.1.112:6802/3807 >>
192.168.1.102:6808/13276 pipe(0x8698280 sd=22 :41078 s=2 pgs=603 cs=1
l=0 c=0x8301600).fault with nothing to send, going to standby
3. Disks do not have errors , no message in dmesg and /var/log/messages
4. there was a bug in the past http://tracker.ceph.com/issues/4006 , dont
know it again came bacin in Firefly
5. Recently no activity performed on cluster , except some pool and keys
creation for cinder /glance integration
6. Nodes have enough free resources for osds.
7. No issues with network , osds are down on all cluster nodes. not from a single node.
****************************************************************
Karan Singh 
Systems Specialist , Storage Platforms
CSC - IT Center for Science,
Keilaranta 14, P. O. Box 405, FIN-02101 Espoo, Finland
mobile: +358 503 812758
tel. +358 9 4572001
fax +358 9 4572302
http://www.csc.fi/
****************************************************************
Karan Singh
2014-05-21 12:37:50 UTC
Hello Sage

The nodown and noout flags are set on the cluster.

# ceph status
cluster 009d3518-e60d-4f74-a26d-c08c1976263c
health HEALTH_WARN 1133 pgs degraded; 44 pgs incomplete; 42 pgs stale; 45 pgs stuck inactive; 42 pgs stuck stale; 2602 pgs stuck unclean; recovery 206/2199 objects degraded (9.368%); 40/165 in osds are down; nodown,noout flag(s) set
monmap e4: 4 mons at {storage0101-ib=192.168.100.101:6789/0,storage0110-ib=192.168.100.110:6789/0,storage0114-ib=192.168.100.114:6789/0,storage0115-ib=192.168.100.115:6789/0}, election epoch 18, quorum 0,1,2,3 storage0101-ib,storage0110-ib,storage0114-ib,storage0115-ib
osdmap e358031: 165 osds: 125 up, 165 in
flags nodown,noout
pgmap v604305: 4544 pgs, 6 pools, 4309 MB data, 733 objects
3582 GB used, 357 TB / 361 TB avail
206/2199 objects degraded (9.368%)
1 inactive
5 stale+active+degraded+remapped
1931 active+clean
2 stale+incomplete
21 stale+active+remapped
380 active+degraded+remapped
38 incomplete
1403 active+remapped
2 stale+active+degraded
1 stale+remapped+incomplete
746 active+degraded
11 stale+active+clean
3 remapped+incomplete


Here is my ceph.conf: http://pastebin.com/KZdgPJm7 (debug osd and debug ms are set).
I tried restarting all OSD services on node-13; the services came up after several attempts of "service ceph restart": http://pastebin.com/yMk86YHh
For node 14, all services are up:

[***@storage0114-ib ~]# service ceph status
=== osd.142 ===
osd.142: running {"version":"0.80-475-g9e80c29"}
=== osd.36 ===
osd.36: running {"version":"0.80-475-g9e80c29"}
=== osd.83 ===
osd.83: running {"version":"0.80-475-g9e80c29"}
=== osd.107 ===
osd.107: running {"version":"0.80-475-g9e80c29"}
=== osd.47 ===
osd.47: running {"version":"0.80-475-g9e80c29"}
=== osd.130 ===
osd.130: running {"version":"0.80-475-g9e80c29"}
=== osd.155 ===
osd.155: running {"version":"0.80-475-g9e80c29"}
=== osd.60 ===
osd.60: running {"version":"0.80-475-g9e80c29"}
=== osd.118 ===
osd.118: running {"version":"0.80-475-g9e80c29"}
=== osd.98 ===
osd.98: running {"version":"0.80-475-g9e80c29"}
=== osd.70 ===
osd.70: running {"version":"0.80-475-g9e80c29"}
=== mon.storage0114-ib ===
mon.storage0114-ib: running {"version":"0.80-475-g9e80c29"}
[***@storage0114-ib ~]#

But ceph osd tree says osd.118 is down:

-10 29.93 host storage0114-ib
36 2.63 osd.36 up 1
47 2.73 osd.47 up 1
60 2.73 osd.60 up 1
70 2.73 osd.70 up 1
83 2.73 osd.83 up 1
98 2.73 osd.98 up 1
107 2.73 osd.107 up 1
118 2.73 osd.118 down 1
130 2.73 osd.130 up 1
142 2.73 osd.142 up 1
155 2.73 osd.155 up 1

I restarted the osd.118 service and the restart was successful, but it is still showing as down in ceph osd tree. I waited 30 minutes for it to stabilize, but it is still not showing UP.
Moreover, it is generating HUGE logs: http://pastebin.com/mDYnjAni



The problem now is that if I manually visit every host and check "service ceph status", all services are running on all 15 hosts. But this is not reflected in ceph osd tree and ceph -s, which continue to show the OSDs as DOWN.
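
(For reference, a quick way to compare what each host reports with what the
cluster reports is a small loop like the one below; the host list is only an
example based on the monitor names above:

   for host in storage0101-ib storage0110-ib storage0114-ib storage0115-ib; do
       echo "== $host =="
       ssh "$host" service ceph status | grep running
   done
   ceph osd tree | grep down

Any OSD that shows as "running" on its host but "down" in the tree output is
exactly the mismatch described here.)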

My IRC id is ksingh; let me know by email once you are available on IRC (my time zone is Finland, +2).



- Karan Singh -
Craig Lewis
2014-05-22 01:34:21 UTC
Post by Sage Weil
Post by Karan Singh
Hello Cephers, I need your suggestions for troubleshooting.
My cluster is struggling badly: 70+ OSDs are down out of 165.
Problem --> OSDs are getting marked down and out of the cluster, and the cluster is
degraded. Checking the logs of the failed OSDs, we see weird entries that are
continuously being generated.
Also happy to debug this interactively over IRC; that will likely be
faster!
Thanks-
sage
If you do this over IRC, can you please post a summary to the mailing
list?

I believe I'm having this issue as well.
--
*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email clewis-***@public.gmane.org

*Central Desktop. Work together in ways you never thought possible.*
Connect with us Website <http://www.centraldesktop.com/> | Twitter
<http://www.twitter.com/centraldesktop> | Facebook
<http://www.facebook.com/CentralDesktop> | LinkedIn
<http://www.linkedin.com/groups?gid=147417> | Blog
<http://cdblog.centraldesktop.com/>
Sage Weil
2014-05-22 04:15:55 UTC
Post by Craig Lewis
If you do this over IRC, can you please post a summary to the mailing
list? 
I believe I'm having this issue as well.
In the other case, we found that some of the OSDs were behind on processing
maps (by several thousand epochs). The trick here, to give them a chance
to catch up, is:

ceph osd set noup
ceph osd set nodown
ceph osd set noout

and wait for them to stop spinning on the CPU. You can check which map
each OSD is on with

ceph daemon osd.NNN status

to see which epoch they are on and compare that to

ceph osd stat

Once they are within 100 epochs or fewer,

ceph osd unset noup

and let them all start up.
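
(A rough sketch of that check, run on each storage node; the admin socket path
below is the default and may differ on your systems:

   for sock in /var/run/ceph/ceph-osd.*.asok; do
       echo -n "$sock: "
       ceph --admin-daemon "$sock" status | grep newest_map
   done
   ceph osd stat    # compare newest_map against the current osdmap epoch

'ceph daemon osd.NNN status' and 'ceph --admin-daemon <socket> status' are two
ways of asking the same daemon which maps it has.)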

We haven't determined whether the original problem was caused by this or
the other way around; we'll see once they are all caught up.

sage
Craig Lewis
2014-05-22 07:26:52 UTC
Post by Sage Weil
Post by Craig Lewis
If you do this over IRC, can you please post a summary to the mailing
list?
I believe I'm having this issue as well.
In the other case, we found that some of the OSDs were behind on processing
maps (by several thousand epochs). The trick here, to give them a chance
to catch up, is:
ceph osd set noup
ceph osd set nodown
ceph osd set noout
and wait for them to stop spinning on the CPU. You can check which map
each OSD is on with
ceph daemon osd.NNN status
to see which epoch they are on and compare that to
ceph osd stat
Once they are within 100 epochs or fewer,
ceph osd unset noup
and let them all start up.
We haven't determined whether the original problem was caused by this or
the other way around; we'll see once they are all caught up.
sage
I was seeing the CPU spinning too, so I think it is the same issue.
Thanks for the explanation! I've been pulling my hair out for weeks.


I can give you a data point for the "how". My problems started with a
kswapd problem on Ubuntu 12.04.4 (kernel 3.5.0-46-generic
#70~precise1-Ubuntu). kswapd was consuming 100% CPU, and it was
blocking the ceph-osd processes. Once I prevented kswapd from doing
that, my OSDs couldn't recover. noout and nodown didn't help; the OSDs
would suicide and restart.


Upgrading to Ubuntu 14.04 seems to have helped. The cluster isn't all
clear yet, but it's getting better. The cluster is finally healthy
after 2 weeks of incomplete and stale. It's still unresponsive, but
it's making progress. I am still seeing OSDs consuming 100% CPU, but
only the OSDs that are actively deep-scrubbing. Once the deep-scrub
finishes, the OSD starts behaving again. They seem to be slowly getting
better, which matches up with your explanation.


I'll go ahead and set noup. I don't think it's necessary at this point,
but it's not going to hurt.

I'm running Emperor, and it looks like 'osd status' isn't supported. Not a
big deal though. Deep-scrub has made it through half of the PGs in the
last 36 hours, so I'll just watch for another day or two. This is a
slave cluster, so I have that luxury.
--
*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email clewis-***@public.gmane.org

*Central Desktop. Work together in ways you never thought possible.*
Connect with us Website <http://www.centraldesktop.com/> | Twitter
<http://www.twitter.com/centraldesktop> | Facebook
<http://www.facebook.com/CentralDesktop> | LinkedIn
<http://www.linkedin.com/groups?gid=147417> | Blog
<http://cdblog.centraldesktop.com/>