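Two HP560 servers + a disk array, configured as a two-node cluster.
Yesterday both servers suddenly stopped seeing the array. After rebooting the servers, everything went back to normal. Part of the messages log is below: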
Mar 5 04:02:22 localhost syslogd 1.4.1: restart.
Mar 5 04:48:51 localhost kernel: tg3: eth1: Link is down.
Mar 5 04:48:51 localhost clusvcmgrd[1523]: <crit> Invalid reply!
Mar 5 04:48:54 localhost kernel: tg3: eth1: Link is up at 1000 Mbps, full duplex.
Mar 5 04:48:54 localhost kernel: tg3: eth1: Flow control is off for TX and off for RX.
Mar 5 04:48:54 localhost kernel: scsi(0): LOOP DOWN detected.
Mar 5 04:48:56 localhost clusvcmgrd[1523]: <crit> Couldn't connect to member #0: Connection timed out
Mar 5 04:48:56 localhost clusvcmgrd[1523]: <err> Unable to obtain cluster lock: No locks available
Mar 5 04:49:02 localhost kernel: scsi(0): LOOP DEAD detected.
Mar 5 04:49:02 localhost kernel: SCSI disk error : host 0 channel 0 id 1 lun 0 return code = 10000
Mar 5 04:49:02 localhost kernel: I/O error: dev 08:03, sector 9728
Mar 5 04:49:02 localhost kernel: SCSI disk error : host 0 channel 0 id 1 lun 0 return code = 10000
Mar 5 04:49:02 localhost kernel: I/O error: dev 08:03, sector 9736
Mar 5 04:49:02 localhost kernel: SCSI disk error : host 0 channel 0 id 1 lun 0 return code = 10000
Mar 5 04:49:02 localhost kernel: I/O error: dev 08:03, sector 9744
Mar 5 04:49:02 localhost kernel: SCSI disk error : host 0 channel 0 id 1 lun 0 return code = 10000
Mar 5 04:49:02 localhost kernel: I/O error: dev 08:02, sector 289
Mar 5 04:49:04 localhost kernel: SCSI disk error : host 0 channel 0 id 1 lun 0 return code = 10000
Mar 5 04:49:04 localhost kernel: I/O error: dev 08:03, sector 9752
Mar 5 04:49:04 localhost kernel: SCSI disk error : host 0 channel 0 id 1 lun 0 return code = 10000
Mar 5 04:49:04 localhost kernel: I/O error: dev 08:03, sector 9760
Mar 5 04:49:04 localhost kernel: SCSI disk error : host 0 channel 0 id 1 lun 0 return code = 10000
Mar 5 04:49:04 localhost kernel: I/O error: dev 08:03, sector 9768
Mar 5 04:49:06 localhost cluquorumd[1502]: <warning> --> Commencing STONITH <--
Mar 5 04:49:06 localhost cluquorumd[1502]: <warning> STONITH: Falsely claiming that 10.0.11.153 has been fenced
Mar 5 04:49:06 localhost cluquorumd[1502]: <crit> STONITH: Data integrity may be compromised!
Mar 5 04:52:19 localhost syslogd 1.4.1: restart.
Mar 5 04:52:19 localhost syslog: syslogd startup succeeded
Mar 5 04:52:19 localhost kernel: klogd 1.4.1, log source = /proc/kmsg started.
Mar 5 04:52:19 localhost kernel: Linux version 2.4.21-27.ELsmp (bhcompile@bugs.build.redhat.com) (gcc version 3.2.3 20030502 (Red Hat Linux 3.2.3-47)) #1 SMP Wed Dec 1 21:59:02 EST 2004
Mar 5 04:52:19 localhost kernel: BIOS-provided physical RAM map:
Mar 5 04:52:19 localhost kernel: BIOS-e820: 0000000000000000 - 000000000009f400 (usable)
Mar 5 04:52:19 localhost kernel: BIOS-e820: 000000000009f400 - 00000000000a0000 (reserved)
Mar 5 04:52:19 localhost kernel: BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
Mar 5 04:52:19 localhost kernel: BIOS-e820: 0000000000100000 - 000000007fffa000
Could anyone tell me what caused the array to be lost? Thanks!
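For anyone reading through an excerpt like this, it can help to pull out when each kind of event first shows up. Below is a minimal Python sketch, assuming the classic syslog line layout seen in the log above. The marker strings are copied from the excerpt; `MARKERS`, `LINE_RE`, and `first_occurrences` are illustrative names and not part of any cluster tool. It only orders events and does not diagnose the cause.

```python
import re

# Classic syslog layout: "Mon D HH:MM:SS host tag: message"
LINE_RE = re.compile(r"^(\w{3})\s+(\d+)\s+(\d\d:\d\d:\d\d)\s+(\S+)\s+(.*?):\s(.*)$")

# Substrings taken verbatim from the log excerpt (an assumption about what matters).
MARKERS = ["Link is down", "LOOP DOWN", "LOOP DEAD", "I/O error", "STONITH"]


def first_occurrences(lines):
    """Return (marker, time) pairs for the first line containing each marker, in log order."""
    seen = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines that do not look like syslog entries
        time, message = m.group(3), m.group(6)
        for marker in MARKERS:
            if marker in message and marker not in seen:
                seen[marker] = time
    return list(seen.items())  # dicts keep insertion order, so this is log order


SAMPLE = [
    "Mar 5 04:48:51 localhost kernel: tg3: eth1: Link is down.",
    "Mar 5 04:48:51 localhost clusvcmgrd[1523]: <crit> Invalid reply!",
    "Mar 5 04:48:54 localhost kernel: scsi(0): LOOP DOWN detected.",
    "Mar 5 04:49:02 localhost kernel: scsi(0): LOOP DEAD detected.",
    "Mar 5 04:49:02 localhost kernel: I/O error: dev 08:03, sector 9728",
    "Mar 5 04:49:06 localhost cluquorumd[1502]: <warning> --> Commencing STONITH <--",
]

if __name__ == "__main__":
    for marker, time in first_occurrences(SAMPLE):
        print(time, marker)
```

Run against the excerpt, this puts the eth1 link drop (04:48:51) ahead of LOOP DOWN (04:48:54), LOOP DEAD and the first I/O errors (04:49:02), and the STONITH attempt (04:49:06). The same function can be fed the full file with `first_occurrences(open(path))`, where `path` is wherever the messages log lives on the node.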