Problem after restarting a single node in an Oracle cluster

zhangyudong1987 posted on 2013-09-29 16:42

Environment: Solaris 11.1 + IPMP + 11gR2 RAC

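After the second node was rebooted, its resources failed over to the first node, which is expected. But once the second node finished rebooting, its resources would not start, and starting them manually did not work either.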
root@csustdb1:/oracle/app/grid_home/bin# ./crs_stat -t -v
Name           Type           R/RA   F/FT   Target    State     Host
----------------------------------------------------------------------
ora.DATA.dg    ora....up.type 0/5    0/     ONLINE    ONLINE    csustdb1
ora....ER.lsnr ora....er.type 0/5    0/     ONLINE    ONLINE    csustdb1
ora....N1.lsnr ora....er.type 0/5    0/0    ONLINE    ONLINE    csustdb1
ora.OCR.dg     ora....up.type 0/5    0/     ONLINE    ONLINE    csustdb1
ora.asm        ora.asm.type   0/5    0/     ONLINE    ONLINE    csustdb1
ora....SM1.asm application    0/5    0/0    ONLINE    ONLINE    csustdb1
ora....B1.lsnr application    0/5    0/0    ONLINE    ONLINE    csustdb1
ora....db1.gsd application    0/5    0/0    OFFLINE   OFFLINE
ora....db1.ons application    0/3    0/0    ONLINE    ONLINE    csustdb1
ora....db1.vip ora....t1.type 0/0    0/0    ONLINE    ONLINE    csustdb1
ora....db2.vip ora....t1.type 0/0    0/0    ONLINE    ONLINE    csustdb1
ora.cvu        ora.cvu.type   0/5    0/0    ONLINE    ONLINE    csustdb1
ora.gsd        ora.gsd.type   0/5    0/     OFFLINE   OFFLINE
ora....network ora....rk.type 2/5    0/     ONLINE    ONLINE    csustdb1
ora.oc4j       ora.oc4j.type  0/1    0/2    ONLINE    ONLINE    csustdb1
ora.ons        ora.ons.type   0/3    0/     ONLINE    ONLINE    csustdb1
ora.scan1.vip  ora....ip.type 0/0    0/0    ONLINE    ONLINE    csustdb1

root@csustdb2:/oracle/app/grid_home/bin# ./crsctl start cluster
CRS-2672: Attempting to start 'ora.cssd' on 'csustdb2'
CRS-2672: Attempting to start 'ora.diskmon' on 'csustdb2'
CRS-2676: Start of 'ora.diskmon' on 'csustdb2' succeeded

CRS-2674: Start of 'ora.cssd' on 'csustdb2' failed
CRS-2679: Attempting to clean 'ora.cssd' on 'csustdb2'
CRS-2681: Clean of 'ora.cssd' on 'csustdb2' succeeded
CRS-5804: Communication error with agent process
CRS-2672: Attempting to start 'ora.cssd' on 'csustdb2'
CRS-2672: Attempting to start 'ora.diskmon' on 'csustdb2'
CRS-2676: Start of 'ora.diskmon' on 'csustdb2' succeeded
CRS-2674: Start of 'ora.cssd' on 'csustdb2' failed
CRS-2679: Attempting to clean 'ora.cssd' on 'csustdb2'
CRS-2681: Clean of 'ora.cssd' on 'csustdb2' succeeded
CRS-5804: Communication error with agent process
CRS-4000: Command Start failed, or completed with errors.
The second node cannot start its resources.
Everything is normal again once both nodes are rebooted.
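
To see at a glance which resource keeps failing, the crsctl output above can be summarized. This is only a minimal sketch: it assumes the output has been saved as plain text, and failed_starts is a hypothetical helper name, not part of any Oracle tool. It only counts CRS-2674 lines and does not explain why the start failed.

```python
import re

# Matches lines such as: CRS-2674: Start of 'ora.cssd' on 'csustdb2' failed
FAILED_START = re.compile(r"CRS-2674: Start of '([^']+)' on '([^']+)' failed")


def failed_starts(crsctl_output):
    """Count failed start attempts per (resource, node) pair."""
    counts = {}
    for line in crsctl_output.splitlines():
        match = FAILED_START.search(line)
        if match:
            key = (match.group(1), match.group(2))
            counts[key] = counts.get(key, 0) + 1
    return counts


# Lines copied from the output posted above.
sample = """\
CRS-2672: Attempting to start 'ora.cssd' on 'csustdb2'
CRS-2676: Start of 'ora.diskmon' on 'csustdb2' succeeded
CRS-2674: Start of 'ora.cssd' on 'csustdb2' failed
CRS-5804: Communication error with agent process
CRS-2674: Start of 'ora.cssd' on 'csustdb2' failed
CRS-4000: Command Start failed, or completed with errors.
"""

print(failed_starts(sample))
```

Here only ora.cssd on csustdb2 is reported, so the CSS daemon on the second node is the place to start looking.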

fflixiang posted on 2013-10-07 16:04

Check whether the ohasd.bin and cssdagent processes are running. If they are not, run crsctl start has to start them. If that does not help, look at the ohasd logs under <GRID_HOME>/log/<host>/agent/ohasd
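
The process check can be sketched as follows, assuming the text of ps -ef from the affected node has been captured into a string. missing_daemons is a hypothetical helper, and the sample process listing is invented for illustration (only the Grid home path comes from the first post).

```python
def missing_daemons(ps_output, required=("ohasd.bin", "cssdagent")):
    """Return the required process names that appear on no line of the ps text."""
    lines = ps_output.splitlines()
    return [name for name in required
            if not any(name in line for line in lines)]


# Invented ps -ef excerpt: ohasd.bin is running, cssdagent is not.
sample_ps = """\
    root  1201     1   0 10:02:11 ?   0:05 /oracle/app/grid_home/bin/ohasd.bin reboot
    root  1350     1   0 10:02:40 ?   0:01 /oracle/app/grid_home/bin/orarootagent.bin
"""

print(missing_daemons(sample_ps))
```

If the returned list is not empty, that is the case where crsctl start has is worth trying before digging into the logs.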

flutter posted on 2013-10-09 15:10

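There must be something wrong with the HA resource control. When node 2 is rebooted, its resources automatically fail over to node 1; once node 2 has come back up normally, the resources automatically move back. At that point you cannot start those resources by hand, because they are already running and in use.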

If they cannot move back, the HA communication is probably faulty: node 1 believes node 2 is down, so both nodes have to be started together.

www_xylove posted on 2013-10-10 16:38

Reboot the servers.

zhangyudong1987 posted on 2013-10-11 10:52

Reply to 3# flutter

Rebooting both nodes at the same time does fix it, but if both nodes have to be rebooted together every time, there is not much point in having RAC.

zhangyudong1987 posted on 2013-10-11 10:53

Reply to 4# www_xylove

It only works when both machines are rebooted together; rebooting just one does not.

www_xylove posted on 2013-10-11 22:57

Reply to 5# zhangyudong1987

It is just one way to work around the problem.

duolanshizhe posted on 2013-10-14 10:55

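In a two-node RAC, when one node is rebooted, its resources automatically move to the other node. Once the node has finished rebooting, it should automatically take back its own resources, such as the VIP. If that does not happen, read the relevant log files carefully to analyze and locate the cause. Under normal conditions no manual intervention is needed at this point.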
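
Whether a node VIP has been taken back can be checked from the Name and Host columns of crs_stat -t. This is a rough sketch under stated assumptions: node VIP names end in the node name followed by .vip (possibly truncated, as in the listing in the first post), SCAN VIPs are skipped because they belong to no particular node, and displaced_vips is a hypothetical helper.

```python
import re

# Accepts full names (ora.csustdb2.vip) and truncated ones (ora....db2.vip).
VIP_NAME = re.compile(r"^ora\.(?:\.\.\.)?(?P<node>[^.]+)\.vip$")


def displaced_vips(rows):
    """Return (name, host) pairs for node VIPs hosted away from their own node.

    rows is an iterable of (resource_name, host) pairs taken from the
    Name and Host columns of crs_stat -t.
    """
    displaced = []
    for name, host in rows:
        if name.startswith("ora.scan"):
            continue  # SCAN VIPs are not tied to a single node
        match = VIP_NAME.match(name)
        # A truncated name only keeps the tail of the node name,
        # so compare against the end of the host name.
        if match and host and not host.endswith(match.group("node")):
            displaced.append((name, host))
    return displaced


# Name/Host pairs as they appear in the listing from the first post.
rows = [
    ("ora....db1.vip", "csustdb1"),
    ("ora....db2.vip", "csustdb1"),
    ("ora.scan1.vip", "csustdb1"),
]

print(displaced_vips(rows))
```

With the listing from the first post, the VIP of the second node is reported as still sitting on csustdb1, which is the state the logs would need to explain.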

flutter posted on 2013-10-14 12:54

Reply to 5# zhangyudong1987

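Under normal conditions there is of course no problem. If it fails every time, it can only mean something is wrong with the software or hardware environment of your RAC cluster, and that needs to be checked, located, and fixed.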
