Last edited by shitoryu on 2012-01-20 01:06
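Following documentation I found online, I configured RHCS and added an http cluster service. Its resources are a virtual IP, a filesystem (gfs), and the httpd startup script.
In failover testing so far, when the network on node ha1.com fails or ha1.com is shut down, the http service moves to node ha2.com.
But when I unplug the fibre cable from the HBA card on ha1 and check the cluster status with clustat,
it shows the following: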
Member Name            ID    Status
------ ----            ----  ------
ha1.com                1     Online, Local,rgmanager
ha2.com                2     Online, rgmanager

Service Name           Owner (Last)    State
------- ----           ----- ------    -----
service:redhat_http    (ha2.com)       recoverable
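The service never moves to ha2. Only after ha1 is shut down completely does the service move to ha2.com.
My question: the http service's resources include the filesystem, and a broken HBA link means the filesystem can no longer be accessed, so why did the failover not happen?
The log on ha1 is as follows:
Jan 18 19:38:26 ha1 kernel: qla2xxx 0000:06:01.0: LIP reset occured (f7f7).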
Jan 18 19:38:31 ha1 kernel: qla2xxx 0000:06:01.0: LOOP DOWN detected (2 e678 0).
Jan 18 19:38:42 ha1 kernel: rport-0:0-0: blocked FC remote port time out: saving binding
Jan 18 19:38:46 ha1 kernel: sd 0:0:0:1: SCSI error: return code = 0x00010000
Jan 18 19:38:46 ha1 kernel: end_request: I/O error, dev sda, sector 48386825
Jan 18 19:38:46 ha1 kernel: GFS2: fsid=new_cluster:gfs1.0: gfs2_quotad: statfs error -5
Jan 18 19:39:16 ha1 kernel: sd 0:0:0:1: SCSI error: return code = 0x00010000
Jan 18 19:39:16 ha1 kernel: end_request: I/O error, dev sda, sector 48386825
Jan 18 19:39:16 ha1 kernel: GFS2: fsid=new_cluster:gfs1.0: gfs2_quotad: statfs error -5
Jan 18 19:39:46 ha1 kernel: sd 0:0:0:1: SCSI error: return code = 0x00010000
Jan 18 19:39:46 ha1 kernel: end_request: I/O error, dev sda, sector 47443065
Jan 18 19:39:46 ha1 kernel: Buffer I/O error on device sda9, logical block 14439
Jan 18 19:39:46 ha1 kernel: lost page write due to I/O error on sda9
Jan 18 19:39:46 ha1 kernel: Buffer I/O error on device sda9, logical block 14440
Jan 18 19:39:46 ha1 kernel: lost page write due to I/O error on sda9
Jan 18 19:39:46 ha1 kernel: Buffer I/O error on device sda9, logical block 14441
Jan 18 19:39:46 ha1 kernel: lost page write due to I/O error on sda9
Jan 18 19:39:46 ha1 kernel: Buffer I/O error on device sda9, logical block 14442
Jan 18 19:39:46 ha1 kernel: lost page write due to I/O error on sda9
Jan 18 19:39:46 ha1 kernel: Buffer I/O error on device sda9, logical block 14443
Jan 18 19:39:46 ha1 kernel: lost page write due to I/O error on sda9
Jan 18 19:39:46 ha1 kernel: sd 0:0:0:1: SCSI error: return code = 0x00010000
Jan 18 19:39:46 ha1 kernel: end_request: I/O error, dev sda, sector 47443105
Jan 18 19:39:46 ha1 kernel: Buffer I/O error on device sda9, logical block 14444
Jan 18 19:39:46 ha1 kernel: lost page write due to I/O error on sda9
Jan 18 19:39:46 ha1 kernel: GFS2: fsid=new_cluster:gfs1.0: fatal: I/O error
Jan 18 19:39:46 ha1 kernel: GFS2: fsid=new_cluster:gfs1.0: block = 14444
Jan 18 19:39:46 ha1 kernel: GFS2: fsid=new_cluster:gfs1.0: function = log_write_header, file = fs/gfs2/log.c, line = 622
Jan 18 19:39:46 ha1 kernel: GFS2: fsid=new_cluster:gfs1.0: about to withdraw this file system
Jan 18 19:39:46 ha1 kernel: GFS2: fsid=new_cluster:gfs1.0: telling LM to withdraw
Jan 18 19:40:09 ha1 clurgmgrd: [2984]: <err> /share_fs is not a directory
Jan 18 19:40:09 ha1 clurgmgrd[2984]: <notice> status on clusterfs "share_fs" returned 1 (generic error)
Jan 18 19:40:09 ha1 clurgmgrd[2984]: <notice> Stopping service service:redhat_http
Jan 18 19:40:10 ha1 avahi-daemon[2480]: Withdrawing address record for 192.168.101.15 on eth0.
Jan 18 19:42:41 ha1 kernel: qla2xxx 0000:06:01.0: Loop down - aborting ISP.
Jan 18 19:42:41 ha1 kernel: qla2xxx 0000:06:01.0: Performing ISP error recovery - ha= f7c602e0.
Jan 18 19:43:01 ha1 kernel: qla2xxx 0000:06:01.0: Cable is unplugged...
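To make the timing in the log above easier to read, here is a small Python sketch that measures how long after the link went down GFS2 decided to withdraw and rgmanager started stopping the service. The three log lines are copied from the log above. The marker names (`link_down`, `gfs2_withdraw`, `service_stop`), the helper functions, and the year 2012 are my own assumptions for illustration (syslog timestamps carry no year); this is not part of any RHCS tool.

```python
from datetime import datetime

# Three key lines copied verbatim from the ha1 log.
LOG = """\
Jan 18 19:38:31 ha1 kernel: qla2xxx 0000:06:01.0: LOOP DOWN detected (2 e678 0).
Jan 18 19:39:46 ha1 kernel: GFS2: fsid=new_cluster:gfs1.0: about to withdraw this file system
Jan 18 19:40:09 ha1 clurgmgrd[2984]: <notice> Stopping service service:redhat_http
"""

# Hypothetical event names mapped to substrings that identify them.
MARKERS = {
    "link_down": "LOOP DOWN detected",
    "gfs2_withdraw": "about to withdraw this file system",
    "service_stop": "Stopping service",
}


def first_seen(log_text, year=2012):
    """Return the first timestamp at which each marker appears.

    The year is an assumption, since syslog lines do not include one.
    """
    seen = {}
    for line in log_text.splitlines():
        # The first 15 characters are the syslog timestamp, e.g. "Jan 18 19:38:31".
        stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        for name, needle in MARKERS.items():
            if needle in line and name not in seen:
                seen[name] = stamp
    return seen


def delays(seen):
    """Return seconds elapsed since the link-down event for each marker."""
    start = seen["link_down"]
    return {name: int((t - start).total_seconds()) for name, t in seen.items()}


result = delays(first_seen(LOG))
for name, seconds in result.items():
    print(f"{name}: +{seconds}s")
```

For these three lines it reports the GFS2 withdraw 75 seconds and the service stop 98 seconds after LOOP DOWN, so rgmanager did notice the failed filesystem and began stopping the service on ha1.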
Thanks!