Jan 16 03:33:21 mds kernel: [ 249.964236] LDISKFS-fs: mounted filesystem with ordered data mode.
Jan 16 03:33:21 mds kernel: [ 250.109722] Lustre: MGS MGS started
Jan 16 03:33:21 mds kernel: [ 250.282758] Lustre: Enabling user_xattr
Jan 16 03:33:21 mds kernel: [ 250.286323] Lustre: 4236:0mds_fs.c:460:mds_init_server_data()) RECOVERY: service testfs-MDT0000, 1 recoverable clients, last_transno 335
Jan 16 03:33:21 mds kernel: [ 250.327847] Lustre: MDT testfs-MDT0000 now serving dev (testfs-MDT0000/758d56e2-979c-b8cc-4ccf-5edb0bcd88af), but will be in recovery for at least 5:00, or until 1 client reconnect. During this time new clients will not be allowed to connect. Recovery progress can be monitored by watching /proc/fs/lustre/mds/testfs-MDT0000/recovery_status.
Jan 16 03:33:21 mds kernel: [ 250.327884] Lustre: 4236:0lproc_mds.c:262:lprocfs_wr_group_upcall()) testfs-MDT0000: group upcall set to /usr/sbin/l_getgroups
Jan 16 03:33:21 mds kernel: [ 250.327891] Lustre: testfs-MDT0000.mdt: set parameter group_upcall=/usr/sbin/l_getgroups
Jan 16 03:33:21 mds kernel: [ 250.328247] Lustre: 4236:0mds_lov.c:1008:mds_notify()) MDS testfs-MDT0000: in recovery, not resetting orphans on testfs-OST0000_UUID
Jan 16 03:33:26 mds kernel: [ 255.323684] Lustre: Request x7 sent from testfs-OST0000-osc to NID 192.168.6.22@tcp 5s ago has timed out (limit 5s).
Jan 16 03:33:26 mds kernel: [ 255.325477] Lustre: cmd=cf00d 0:testfs-mdtlov 1:testfs-OST0000_UUID 2:0 3:1
Jan 16 03:33:26 mds kernel: [ 255.331677] Lustre: Failing over testfs-MDT0000
Jan 16 03:33:26 mds kernel: [ 255.331703] Lustre: *** setting obd testfs-MDT0000 device 'unknown-block(147,0)' read-only ***
Jan 16 03:33:27 mds kernel: [ 255.360260] Turning device drbd0 (0x9300000) read-only
Jan 16 03:33:27 mds kernel: [ 255.360370] Lustre: Failing over testfs-mdtlov
Jan 16 03:33:27 mds kernel: [ 255.362071] Lustre: testfs-MDT0000: shutting down for failover; client state will be preserved.
Jan 16 03:33:27 mds kernel: [ 255.362254] Lustre: MDT testfs-MDT0000 has stopped.
Jan 16 03:33:27 mds kernel: [ 255.393366] Lustre: MGS has stopped.
Author: yftty  Time: 2009-01-17 20:49
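It looks like the client crashed, which caused the MDS data recovery to fail. The system detected the anomaly and went into a read-only state, while the client state has been preserved throughout.
A failover operation needs to be performed.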
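
To double-check the order of events before doing the failover, the log can be reduced to a short timeline. Below is a minimal sketch in Python, assuming the log is in the same syslog format as the one posted above. The function name `summarize` and the regular expressions are my own illustration, written only against the messages visible in this log; they are not part of any Lustre tool.

```python
import re

# Hypothetical patterns, written against the messages in the log above.
EVENT_PATTERNS = [
    ("recovery_start", re.compile(r"RECOVERY: service (\S+), (\d+) recoverable clients")),
    ("request_timeout", re.compile(r"Request \S+ sent from (\S+) to NID (\S+) .* has timed out")),
    ("failover", re.compile(r"Failing over (\S+)")),
    ("read_only", re.compile(r"setting obd (\S+) device .* read-only")),
    ("stopped", re.compile(r"(MDT \S+|MGS) has stopped")),
]

# Kernel uptime stamp, e.g. "kernel: [ 255.331677]".
TIMESTAMP = re.compile(r"kernel: \[\s*(\d+\.\d+)\]")


def summarize(lines):
    """Return (uptime_seconds, event_name, captured_fields) for each recognized line."""
    events = []
    for line in lines:
        stamp = TIMESTAMP.search(line)
        if not stamp:
            continue  # not a kernel log line
        for name, pattern in EVENT_PATTERNS:
            hit = pattern.search(line)
            if hit:
                events.append((float(stamp.group(1)), name, hit.groups()))
                break
    return events


# Three lines copied from the log above.
SAMPLE = [
    "Jan 16 03:33:21 mds kernel: [ 250.286323] Lustre: 4236:0mds_fs.c:460:mds_init_server_data()) RECOVERY: service testfs-MDT0000, 1 recoverable clients, last_transno 335",
    "Jan 16 03:33:26 mds kernel: [ 255.323684] Lustre: Request x7 sent from testfs-OST0000-osc to NID 192.168.6.22@tcp 5s ago has timed out (limit 5s).",
    "Jan 16 03:33:26 mds kernel: [ 255.331677] Lustre: Failing over testfs-MDT0000",
]

if __name__ == "__main__":
    for uptime, name, fields in summarize(SAMPLE):
        print(f"{uptime:10.3f}  {name:16s} {fields}")
```

Run against the whole log, this lists recovery start, the request timeout, the failover, the read-only switch and the service stops in order, which makes it easier to see which peer timed out and how soon afterwards the MDT was failed over.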