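Originally posted by jerrywjl on 2008-8-13 23:14
This is a cluster on RHEL3. If the two machines reboot alternately, that suggests the two nodes may be fencing each other.
I have never built an RHEL3 cluster, so I can't say for certain, but I think using a direct cable for the heartbeat definitely won't work, because once the heartbeat drops and isn't restored, the nodes will mutually fe ...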
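This HA system has been in use since 2006, and none of the hardware has been touched in that time.
The problem appeared suddenly after running normally all along, so it should have nothing to do with the direct cable.
I now suspect the HA software. Running the startup procedure by hand, one step at a time, works without problems. From the log, it looks as though the application's startup script cannot be found. I don't really understand this, since everything works when done manually.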
----------------------------------
Aug 11 23:56:03 jifei1 syslogd 1.4.1: restart.
Aug 11 23:56:03 jifei1 syslog: syslogd startup succeeded
Aug 11 23:56:03 jifei1 kernel: klogd 1.4.1, log source = /proc/kmsg started.
Aug 11 23:56:03 jifei1 kernel: Linux version 2.4.21-32.ELsmp (bhcompile@tweety.build.redhat.com) (gcc version 3.2.3 20030502 (Red Hat Linux 3.2.3-52)) #1 SMP Fri Apr 15 21:17:59 EDT 2005
Aug 11 23:56:03 jifei1 kernel: BIOS-provided physical RAM map:
Aug 11 23:56:03 jifei1 kernel: BIOS-e820: 0000000000000000 - 000000000009f400 (usable)
Aug 11 23:56:03 jifei1 kernel: BIOS-e820: 000000000009f400 - 00000000000a0000 (reserved)
Aug 11 23:56:03 jifei1 kernel: BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
Aug 11 23:56:03 jifei1 kernel: BIOS-e820: 0000000000100000 - 000000007fffa000 (usable)
Aug 11 23:56:03 jifei1 kernel: BIOS-e820: 000000007fffa000 - 0000000080000000 (ACPI data)
Aug 11 23:56:03 jifei1 kernel: BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
Aug 11 23:56:03 jifei1 kernel: BIOS-e820: 00000000fee00000 - 00000000fee10000 (reserved)
Aug 11 23:56:03 jifei1 kernel: BIOS-e820: 00000000ffc00000 - 0000000100000000 (reserved)
Aug 11 23:56:03 jifei1 kernel: 1151MB HIGHMEM available.
Aug 11 23:56:03 jifei1 syslog: klogd startup succeeded
Aug 11 23:56:03 jifei1 kernel: 896MB LOWMEM available.
Aug 11 23:56:03 jifei1 kernel: found SMP MP-table at 000f4fd0
Aug 11 23:56:03 jifei1 kernel: hm, page 000f4000 reserved twice.
Aug 11 23:56:03 jifei1 kernel: hm, page 000f5000 reserved twice.
Aug 11 23:56:03 jifei1 kernel: hm, page 000f2000 reserved twice.
/restart
Aug 11 23:59:38 jifei1 clusvcmgrd: [1378]: <err> service error: Cannot stop user script for wapftp
Aug 11 23:59:38 jifei1 clusvcmgrd: [1462]: <notice> service notice: Stopping service jiesuan ...
Aug 11 23:59:38 jifei1 clusvcmgrd: [1462]: <notice> service notice: Running user script '/home/tjjifei/cluster/jiesuan.sh stop'
Aug 11 23:59:38 jifei1 su(pam_unix)[1491]: session opened for user tjjifei by (uid=0)
Aug 11 23:59:38 jifei1 su(pam_unix)[1491]: session closed for user tjjifei
Aug 11 23:59:38 jifei1 clusvcmgrd: [1462]: <err> service error: User script '/home/tjjifei/cluster/jiesuan.sh stop' returned error 1
Aug 11 23:59:38 jifei1 clusvcmgrd: [1462]: <err> service error: -bash: line 1: cd: /home/tjjifei/jiesuanapp/jiesuan/: No such file or directory
Aug 11 23:59:38 jifei1 clusvcmgrd: [1462]: <err> service error: Cannot stop user script for jiesuan
Aug 11 23:59:38 jifei1 clusvcmgrd: [1544]: <notice> service notice: Stopping service jifee ...
Aug 11 23:59:38 jifei1 clusvcmgrd: [1544]: <notice> service notice: Running user script '/home/tjjifei/cluster/jifei.sh stop'
Aug 11 23:59:38 jifei1 su(pam_unix)[1572]: session opened for user tjjifei by (uid=0)
Aug 11 23:59:39 jifei1 su(pam_unix)[1572]: session closed for user tjjifei
Aug 11 23:59:39 jifei1 clusvcmgrd: [1544]: <err> service error: User script '/home/tjjifei/cluster/jifei.sh stop' returned error 1
Aug 11 23:59:39 jifei1 clusvcmgrd: [1544]: <err> service error: -bash: line 1: cd: /home/tjjifei/jifeiapp/jifee: No such file or directory
Aug 11 23:59:39 jifei1 clusvcmgrd: [1544]: <err> service error: Cannot stop user script for jifee
----------------------------------
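To make the failing paths in the excerpt above easier to spot, here is a minimal sketch that pulls the `cd: ... No such file or directory` errors out of the clusvcmgrd lines. The sample lines are copied from the log above; the function name `missing_directories` and the regular expression are my own illustration, not part of the cluster software, and the pattern is only assumed to fit the message layout seen in this excerpt.

```python
import re

# Sample lines copied from the log excerpt in this post.
SAMPLE_LOG = """\
Aug 11 23:59:38 jifei1 clusvcmgrd: [1462]: <err> service error: User script '/home/tjjifei/cluster/jiesuan.sh stop' returned error 1
Aug 11 23:59:38 jifei1 clusvcmgrd: [1462]: <err> service error: -bash: line 1: cd: /home/tjjifei/jiesuanapp/jiesuan/: No such file or directory
Aug 11 23:59:38 jifei1 clusvcmgrd: [1462]: <err> service error: Cannot stop user script for jiesuan
Aug 11 23:59:39 jifei1 clusvcmgrd: [1544]: <err> service error: -bash: line 1: cd: /home/tjjifei/jifeiapp/jifee: No such file or directory
"""

# Hypothetical pattern: matches only the message layout seen in this excerpt.
MISSING_DIR = re.compile(
    r"clusvcmgrd: \[(?P<pid>\d+)\]: <err> service error: "
    r".*cd: (?P<path>\S+): No such file or directory"
)


def missing_directories(log_text):
    """Return (pid, path) pairs for each failed `cd` reported by clusvcmgrd."""
    found = []
    for line in log_text.splitlines():
        match = MISSING_DIR.search(line)
        if match:
            found.append((match.group("pid"), match.group("path")))
    return found


if __name__ == "__main__":
    for pid, path in missing_directories(SAMPLE_LOG):
        print(f"clusvcmgrd[{pid}] could not cd to {path}")
```

In this excerpt it is the `stop` scripts that fail, and in both cases the directory the script tries to `cd` into was reported missing at that moment. The extracted paths can then be compared with what exists on the node when the same scripts are run by hand.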