Chinaunix

[mooseFS] MooseFS crashed: the master starts fine, but the chunkservers won't come up. What should I do?

#1, posted 2011-06-24 16:01
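1 master
1 metalogger
4 chunkservers

Situation:
  One machine failed earlier and was replaced; the replacement has been online for about 4 days.
  All 4 machines then developed disk failures at the same time, so we started emergency recovery.
  We stopped all services (we forgot to shut down chunkservers 1 and 3 and only stopped them a day later), then shut down the master and the metalogger.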
  
  =============================================
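All the failed machines have now been repaired and their disks restored. Machine 2 had some problems during recovery, so it is on hold for now.

  Machine 4 has recovered successfully; its status is "no errors".
  Machines 1 and 3 both show status "damaged" and cannot rejoin MFS no matter what I try.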

  =============================================

Master log, from startup to shutdown:
  Jun 24 15:10:14 MDS mfsmaster[24224]: chunk 0000000000031806 has only invalid copies (1) - please repair it manually
  Jun 24 15:10:14 MDS mfsmaster[24224]: chunk 0000000000031806_00000001 - invalid copy on (172.21.93.205 - ver:00000000)
  Jun 24 15:11:00 MDS mfsmaster[24224]: chunkservers status:
  Jun 24 15:11:00 MDS mfsmaster[24224]: server 1 (ip: 172.21.93.205, port: 9422): usedspace: 1729372667904 (1610.60 GiB), totalspace: 13479880687616 (12554.12 GiB), usage: 12.83%
  Jun 24 15:11:00 MDS mfsmaster[24224]: total: usedspace: 1729372667904 (1610.60 GiB), totalspace: 13479880687616 (12554.12 GiB), usage: 12.83%
  Jun 24 15:11:00 MDS mfsmaster[24224]: no meta loggers connected !!!
  Jun 24 15:12:00 MDS mfsmaster[24224]: chunkservers status:
  Jun 24 15:12:00 MDS mfsmaster[24224]: server 1 (ip: 172.21.93.205, port: 9422): usedspace: 1729372667904 (1610.60 GiB), totalspace: 13479880687616 (12554.12 GiB), usage: 12.83%
  Jun 24 15:12:00 MDS mfsmaster[24224]: total: usedspace: 1729372667904 (1610.60 GiB), totalspace: 13479880687616 (12554.12 GiB), usage: 12.83%
  Jun 24 15:12:00 MDS mfsmaster[24224]: no meta loggers connected !!!
  Jun 24 15:12:01 MDS monitor: P2 OR P4 ERROR to GET
  Jun 24 15:12:50 MDS xinetd[12778]: START: nrpe pid=24326 from=172.21.90.9
  Jun 24 15:12:50 MDS xinetd[12778]: EXIT: nrpe status=0 pid=24326 duration=0(sec)
  Jun 24 15:13:00 MDS mfsmaster[24224]: chunkservers status:
  Jun 24 15:13:00 MDS mfsmaster[24224]: server 1 (ip: 172.21.93.205, port: 9422): usedspace: 1729372667904 (1610.60 GiB), totalspace: 13479880687616 (12554.12 GiB), usage: 12.83%
  Jun 24 15:13:00 MDS mfsmaster[24224]: total: usedspace: 1729372667904 (1610.60 GiB), totalspace: 13479880687616 (12554.12 GiB), usage: 12.83%
  Jun 24 15:13:00 MDS mfsmaster[24224]: no meta loggers connected !!!
  Jun 24 15:13:27 MDS xinetd[12778]: START: nrpe pid=24331 from=172.21.90.9
  Jun 24 15:13:27 MDS xinetd[12778]: EXIT: nrpe status=0 pid=24331 duration=0(sec)
  Jun 24 15:13:50 MDS mfsmaster[24224]: chunkserver register begin (packet version: 5) - ip: 172.21.93.206, port: 9422
  Jun 24 15:13:50 MDS mfsmaster[24224]: chunkserver register end (packet version: 5) - ip: 172.21.93.206, port: 9422, usedspace: 6661750644736 (6204.24 GiB), totalspace: 14975217172480 (13946.76 GiB)
  Jun 24 15:13:53 MDS mfsmaster[24224]: (172.21.93.206:9422) chunk: 000000000018813F replication status: 29
  Jun 24 15:13:54 MDS mfsmaster[24224]: (172.21.93.206:9422) chunk: 00000000001881BA replication status: 22
  Jun 24 15:13:57 MDS mfsmaster[24224]: (172.21.93.206:9422) chunk: 00000000000D832B replication status: 29
  Jun 24 15:13:58 MDS mfsmaster[24224]: (172.21.93.206:9422) chunk: 00000000001583A6 replication status: 13
  Jun 24 15:13:58 MDS mfsmaster[24224]: (172.21.93.206:9422) chunk: 00000000000382B0 replication status: 13
  Jun 24 15:14:00 MDS mfsmaster[24224]: chunkservers status:
  Jun 24 15:14:00 MDS mfsmaster[24224]: server 1 (ip: 172.21.93.206, port: 9422): usedspace: 6661756018688 (6204.24 GiB), totalspace: 14975217172480 (13946.76 GiB), usage: 44.49%
  Jun 24 15:14:00 MDS mfsmaster[24224]: total: usedspace: 6661756018688 (6204.24 GiB), totalspace: 14975217172480 (13946.76 GiB), usage: 44.49%
  Jun 24 15:14:00 MDS mfsmaster[24224]: no meta loggers connected !!!
  Jun 24 15:14:00 MDS mfsmaster[24224]: (172.21.93.205:9422) chunk: 0000000000048235 replication status: 13
  Jun 24 15:14:00 MDS mfsmaster[24224]: (172.21.93.205:9422) chunk: 00000000000282B0 replication status: 13
  Jun 24 15:14:01 MDS monitor: P2 OR P4 ERROR to GET
  Jun 24 15:14:50 MDS xinetd[12778]: START: nrpe pid=24347 from=172.21.90.9
  Jun 24 15:14:50 MDS xinetd[12778]: EXIT: nrpe status=0 pid=24347 duration=0(sec)
  Jun 24 15:15:00 MDS mfsmaster[24224]: chunkservers status:
  Jun 24 15:15:00 MDS mfsmaster[24224]: server 1 (ip: 172.21.93.206, port: 9422): usedspace: 6661756018688 (6204.24 GiB), totalspace: 14975217172480 (13946.76 GiB), usage: 44.49%
  Jun 24 15:15:00 MDS mfsmaster[24224]: total: usedspace: 6661756018688 (6204.24 GiB), totalspace: 14975217172480 (13946.76 GiB), usage: 44.49%
  Jun 24 15:15:00 MDS mfsmaster[24224]: no meta loggers connected !!!
  Jun 24 15:16:00 MDS mfsmaster[24224]: chunkservers status:
  Jun 24 15:16:00 MDS mfsmaster[24224]: server 1 (ip: 172.21.93.206, port: 9422): usedspace: 6661755617280 (6204.24 GiB), totalspace: 14975217172480 (13946.76 GiB), usage: 44.49%
  Jun 24 15:16:00 MDS mfsmaster[24224]: total: usedspace: 6661755617280 (6204.24 GiB), totalspace: 14975217172480 (13946.76 GiB), usage: 44.49%
  Jun 24 15:16:00 MDS mfsmaster[24224]: no meta loggers connected !!!
  Jun 24 15:16:01 MDS monitor: P2 OR P4 ERROR to GET
  Jun 24 15:16:50 MDS xinetd[12778]: START: nrpe pid=24366 from=172.21.90.9
  Jun 24 15:16:50 MDS xinetd[12778]: EXIT: nrpe status=0 pid=24366 duration=0(sec)
  Jun 24 15:17:00 MDS mfsmaster[24224]: chunkservers status:
  Jun 24 15:17:00 MDS mfsmaster[24224]: server 1 (ip: 172.21.93.206, port: 9422): usedspace: 6661755617280 (6204.24 GiB), totalspace: 14975217172480 (13946.76 GiB), usage: 44.49%
  Jun 24 15:17:00 MDS mfsmaster[24224]: total: usedspace: 6661755617280 (6204.24 GiB), totalspace: 14975217172480 (13946.76 GiB), usage: 44.49%
  Jun 24 15:17:00 MDS mfsmaster[24224]: no meta loggers connected !!!
  Jun 24 15:17:17 MDS mfsmaster[24224]: chunkserver register begin (packet version: 5) - ip: 172.21.93.203, port: 9422
  Jun 24 15:17:17 MDS mfsmaster[24224]: chunkserver register end (packet version: 5) - ip: 172.21.93.203, port: 9422, usedspace: 6985866539008 (6506.10 GiB), totalspace: 14975217172480 (13946.76 GiB)
  Jun 24 15:17:18 MDS mfsmaster[24224]: (172.21.93.206:9422) chunk: 00000000000CE3BE replication status: 22
  Jun 24 15:17:19 MDS mfsmaster[24224]: (172.21.93.206:9422) chunk: 000000000017E439 replication status: 22
  Jun 24 15:17:20 MDS mfsmaster[24224]: (172.21.93.206:9422) chunk: 000000000015E4B4 replication status: 29
  Jun 24 15:17:26 MDS mfsmaster[24224]: (172.21.93.206:9422) chunk: 000000000009E52F replication status: 28
  Jun 24 15:17:26 MDS mfsmaster[24224]: (172.21.93.206:9422) chunk: 00000000000FE52F replication status: 28
  Jun 24 15:17:32 MDS mfsmaster[24224]: (172.21.93.206:9422) chunk: 000000000015E811 replication status: 28
  Jun 24 15:17:32 MDS mfsmaster[24224]: (172.21.93.206:9422) chunk: 000000000019E811 replication status: 28
  Jun 24 15:17:34 MDS mfsmaster[24224]: (172.21.93.206:9422) chunk: 000000000001EA78 replication status: 13
  Jun 24 15:17:34 MDS mfsmaster[24224]: (172.21.93.206:9422) chunk: 000000000015EAF3 replication status: 13
  Jun 24 15:17:35 MDS mfsmaster[24224]: (172.21.93.203:9422) chunk: 00000000000DE52F replication status: 21
  Jun 24 15:17:35 MDS mfsmaster[24224]: (172.21.93.203:9422) chunk: 00000000000AE52F replication status: 21
  Jun 24 15:18:00 MDS mfsmaster[24224]: chunkservers status:
  Jun 24 15:18:00 MDS mfsmaster[24224]: server 1 (ip: 172.21.93.206, port: 9422): usedspace: 6661758066688 (6204.25 GiB), totalspace: 14975217172480 (13946.76 GiB), usage: 44.49%
  Jun 24 15:18:00 MDS mfsmaster[24224]: total: usedspace: 6661758066688 (6204.25 GiB), totalspace: 14975217172480 (13946.76 GiB), usage: 44.49%
  Jun 24 15:18:00 MDS mfsmaster[24224]: no meta loggers connected !!!
  Jun 24 15:18:01 MDS monitor: P2 OR P4 ERROR to GET
  Jun 24 15:18:27 MDS xinetd[12778]: START: nrpe pid=24384 from=172.21.90.9
  Jun 24 15:18:27 MDS xinetd[12778]: EXIT: nrpe status=0 pid=24384 duration=0(sec)
  Jun 24 15:19:00 MDS mfsmaster[24224]: chunkservers status:
  Jun 24 15:19:00 MDS mfsmaster[24224]: server 1 (ip: 172.21.93.206, port: 9422): usedspace: 6661758066688 (6204.25 GiB), totalspace: 14975217172480 (13946.76 GiB), usage: 44.49%
  Jun 24 15:19:00 MDS mfsmaster[24224]: total: usedspace: 6661758066688 (6204.25 GiB), totalspace: 14975217172480 (13946.76 GiB), usage: 44.49%
  Jun 24 15:19:00 MDS mfsmaster[24224]: no meta loggers connected !!!
  Jun 24 15:19:46 MDS xinetd[12778]: START: rsync pid=24391 from=172.21.93.202
  Jun 24 15:19:46 MDS xinetd[12778]: EXIT: rsync status=255 pid=24391 duration=0(sec)
  Jun 24 15:19:50 MDS xinetd[12778]: START: nrpe pid=24392 from=172.21.90.9
  Jun 24 15:19:50 MDS xinetd[12778]: EXIT: nrpe status=0 pid=24392 duration=0(sec)
  Jun 24 15:20:00 MDS mfsmaster[24224]: chunkservers status:
  Jun 24 15:20:00 MDS mfsmaster[24224]: server 1 (ip: 172.21.93.206, port: 9422): usedspace: 6661758066688 (6204.25 GiB), totalspace: 14975217172480 (13946.76 GiB), usage: 44.49%
  Jun 24 15:20:00 MDS mfsmaster[24224]: total: usedspace: 6661758066688 (6204.25 GiB), totalspace: 14975217172480 (13946.76 GiB), usage: 44.49%
  Jun 24 15:20:00 MDS mfsmaster[24224]: no meta loggers connected !!!
  Jun 24 15:20:01 MDS monitor: P2 OR P4 ERROR to GET
  Jun 24 15:20:50 MDS xinetd[12778]: START: nrpe pid=24409 from=172.21.90.9
  Jun 24 15:20:50 MDS xinetd[12778]: EXIT: nrpe status=0 pid=24409 duration=0(sec)
  Jun 24 15:21:00 MDS mfsmaster[24224]: chunkservers status:
  Jun 24 15:21:00 MDS mfsmaster[24224]: server 1 (ip: 172.21.93.206, port: 9422): usedspace: 6661758066688 (6204.25 GiB), totalspace: 14975217172480 (13946.76 GiB), usage: 44.49%
  Jun 24 15:21:00 MDS mfsmaster[24224]: total: usedspace: 6661758066688 (6204.25 GiB), totalspace: 14975217172480 (13946.76 GiB), usage: 44.49%
  Jun 24 15:21:00 MDS mfsmaster[24224]: no meta loggers connected !!!
  Jun 24 15:21:45 MDS mfsmaster[24224]: connection with CS(172.21.93.203) has been closed by peer
  Jun 24 15:21:45 MDS mfsmaster[24224]: chunkserver disconnected - ip: 172.21.93.203, port: 9422, usedspace: 0 (0.00 GiB), totalspace: 0 (0.00 GiB)
  Jun 24 15:22:00 MDS mfsmaster[24224]: chunkservers status:
  Jun 24 15:22:00 MDS mfsmaster[24224]: server 1 (ip: 172.21.93.206, port: 9422): usedspace: 6661758066688 (6204.25 GiB), totalspace: 14975217172480 (13946.76 GiB), usage: 44.49%
  Jun 24 15:22:00 MDS mfsmaster[24224]: total: usedspace: 6661758066688 (6204.25 GiB), totalspace: 14975217172480 (13946.76 GiB), usage: 44.49%
  Jun 24 15:22:00 MDS mfsmaster[24224]: no meta loggers connected !!!
  Jun 24 15:22:00 MDS mfsmaster[24224]: connection with CS(172.21.93.206) has been closed by peer
  Jun 24 15:22:00 MDS mfsmaster[24224]: chunkserver disconnected - ip: 172.21.93.206, port: 9422, usedspace: 6661758066688 (6204.25 GiB), totalspace: 14975217172480 (13946.76 GiB)
  Jun 24 15:22:01 MDS monitor: P2 OR P4 ERROR to GET
  Jun 24 15:22:18 MDS mfsmaster[24224]: connection with CS(172.21.93.205) has been closed by peer
  Jun 24 15:22:18 MDS mfsmaster[24224]: chunkserver disconnected - ip: 172.21.93.205, port: 9422, usedspace: 0 (0.00 GiB), totalspace: 0 (0.00 GiB)
Thanks, everyone.
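A log like the one above is easier to reason about once the per-chunkserver events are separated from the once-a-minute status noise. The following is only a rough sketch: the regular expressions are modelled on the message wording quoted in this post (mfs-1.6.x), and the function name `summarize` and the `SAMPLE` list are illustrative assumptions, not part of MooseFS.

```python
import re
from collections import Counter

# Regexes modelled on the mfsmaster syslog lines quoted in the post;
# other MooseFS versions may word these messages differently.
REGISTER = re.compile(
    r"chunkserver register end .*? ip: ([\d.]+), port: \d+, "
    r"usedspace: (\d+) \([^)]*\), totalspace: (\d+)"
)
DISCONNECT = re.compile(
    r"chunkserver disconnected - ip: ([\d.]+), port: \d+, usedspace: (\d+)"
)
INVALID = re.compile(r"chunk ([0-9A-F]{16}) has only invalid copies")
REPLICATION = re.compile(
    r"\(([\d.]+):\d+\) chunk: [0-9A-F]{16} replication status: \d+"
)


def summarize(lines):
    """Collect per-chunkserver events from mfsmaster log lines."""
    summary = {
        "registered": {},                # ip -> (usedspace, totalspace) in bytes
        "disconnected": {},              # ip -> usedspace reported at disconnect
        "invalid_chunks": [],            # chunk ids that have only invalid copies
        "replication_msgs": Counter(),   # ip -> number of "replication status" lines
    }
    for line in lines:
        m = REGISTER.search(line)
        if m:
            summary["registered"][m.group(1)] = (int(m.group(2)), int(m.group(3)))
            continue
        m = DISCONNECT.search(line)
        if m:
            summary["disconnected"][m.group(1)] = int(m.group(2))
            continue
        m = INVALID.search(line)
        if m:
            summary["invalid_chunks"].append(m.group(1))
            continue
        m = REPLICATION.search(line)
        if m:
            summary["replication_msgs"][m.group(1)] += 1
    return summary


# A few lines copied from the log in this post, used as sample input.
SAMPLE = [
    "Jun 24 15:10:14 MDS mfsmaster[24224]: chunk 0000000000031806 has only invalid copies (1) - please repair it manually",
    "Jun 24 15:10:14 MDS mfsmaster[24224]: chunk 0000000000031806_00000001 - invalid copy on (172.21.93.205 - ver:00000000)",
    "Jun 24 15:13:50 MDS mfsmaster[24224]: chunkserver register end (packet version: 5) - ip: 172.21.93.206, port: 9422, usedspace: 6661750644736 (6204.24 GiB), totalspace: 14975217172480 (13946.76 GiB)",
    "Jun 24 15:13:53 MDS mfsmaster[24224]: (172.21.93.206:9422) chunk: 000000000018813F replication status: 29",
    "Jun 24 15:14:00 MDS mfsmaster[24224]: (172.21.93.205:9422) chunk: 0000000000048235 replication status: 13",
    "Jun 24 15:21:45 MDS mfsmaster[24224]: chunkserver disconnected - ip: 172.21.93.203, port: 9422, usedspace: 0 (0.00 GiB), totalspace: 0 (0.00 GiB)",
    "Jun 24 15:22:00 MDS mfsmaster[24224]: chunkserver disconnected - ip: 172.21.93.206, port: 9422, usedspace: 6661758066688 (6204.25 GiB), totalspace: 14975217172480 (13946.76 GiB)",
]

if __name__ == "__main__":
    result = summarize(SAMPLE)
    for ip, (used, total) in result["registered"].items():
        # GiB = bytes / 2**30, usage = used / total, as in the status lines.
        print(f"{ip}: {used / 2**30:.2f} GiB used, {100 * used / total:.2f}% full")
    for ip, used in result["disconnected"].items():
        print(f"{ip} disconnected while reporting {used} bytes used")
    print("chunks with only invalid copies:", result["invalid_chunks"])
```

Feeding it the whole syslog file instead of `SAMPLE` (for example `summarize(open(path))`, with `path` being wherever the master's syslog lives) would make it easy to see which chunkservers dropped out while reporting zero space and which ones produced the replication messages.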

#2, posted 2011-06-24 17:15
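The version is mfs-1.6.20-2.
After a closer look, I found that bad blocks appeared during the repair, which is why data files were lost. I am not asking to recover everything; I just want to save as much of the data as possible. How should I go about it?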
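One generic way to "save as much as possible" is a best-effort copy: walk the mounted file system, copy every file that can still be read, and record the ones that fail instead of aborting. The sketch below is not MooseFS-specific and makes assumptions: that the file system can still be mounted for reading, that damaged files surface as ordinary I/O errors, and that the function name `salvage_tree` and the example paths are purely hypothetical.

```python
import os
import shutil


def salvage_tree(src_root, dst_root):
    """Copy every readable file under src_root into dst_root.

    Returns (copied, failed): a list of source paths that were copied and
    a list of (source path, error message) pairs for files that were not.
    """
    copied, failed = [], []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(target_dir, name)
            try:
                shutil.copy2(src, dst)
                copied.append(src)
            except OSError as exc:
                # Do not keep a half-written copy of a file that failed mid-read.
                if os.path.lexists(dst):
                    os.remove(dst)
                failed.append((src, str(exc)))
    return copied, failed


if __name__ == "__main__":
    # Hypothetical paths: an MFS mount point and a backup directory on healthy disks.
    ok, bad = salvage_tree("/mnt/mfs", "/backup/mfs-salvage")
    print(f"copied {len(ok)} files, {len(bad)} could not be read")
    for path, err in bad:
        print("FAILED", path, err)
```

The list of failed paths is the useful part: it shows exactly which files still need manual repair, while everything else is already safe on other storage.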

#3, posted 2011-06-25 18:51
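Sigh, it seems not many people have had to do this kind of recovery. There are plenty of write-ups about recovering from a master crash, but a chunkserver crash is very dangerous too, especially for someone like me with few replicas, a lot of data, and a high hardware failure rate. I really need a storage system that is very stable and safe.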

#4, posted 2011-06-27 14:58
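Sigh, I ran out of options and had to modify the source code, and even that solved only part of the problem. In this respect MooseFS really does not seem as good as Hadoop: once Hadoop enters safe_mode it is fairly safe and you don't have to worry about writes, whereas here there seems to be no way to stop writes. For nodes with fragile disks, that only makes a bad situation worse.
I strongly suggest that MooseFS change some of its mechanisms and add a read-only safe recovery mode for users in an emergency.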