幽冥公子 posted on 2011-09-16 16:28

Resource group won't come online with Sun Cluster 3.2 on Solaris 10 U9; please help analyze

Last edited by 幽冥公子 on 2011-09-16 16:33

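As the title says, I am setting up RAC on Solaris 10 U9. The cluster is installed and comes up with a normal status.
But when I add the resource group, it reports errors: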
# scstat -g

-- Resource Groups and Resources --

            Group Name       Resources
            ----------       ---------
 Resources: rac-framework-rg rac_framework rac_udlm rac_svm


-- Resource Groups --

            Group Name       Node Name                State          Suspended
            ----------       ---------                -----          ---------
     Group: rac-framework-rg pxlhis1                  Online faulted No
     Group: rac-framework-rg pxlhis2                  Online faulted No


-- Resources --

            Resource Name    Node Name                State          Status Message
            -------------    ---------                -----          --------------
  Resource: rac_framework    pxlhis1                  Start failed   Faulted - Error in previous reconfiguration.
  Resource: rac_framework    pxlhis2                  Start failed   Faulted - Error in previous reconfiguration.

  Resource: rac_udlm         pxlhis1                  Offline        Offline
  Resource: rac_udlm         pxlhis2                  Offline        Offline

  Resource: rac_svm          pxlhis1                  Offline        Offline
  Resource: rac_svm          pxlhis2                  Offline        Offline
I checked two logs, /var/adm/messages and /var/cluster/ucmm/ucmm_reconf.log, and found the following errors:
/var/adm/messages:

Sep 16 16:03:36 pxlhis1 SUNWscucm.ucmm_reconf: Error was detected in previous reconfiguration: "svm exited with error 1 in step cmmstart"
Sep 16 16:03:36 pxlhis1 SUNWscucm.ucmm_reconf: The ucmmd daemon will not be started due to errors in previous reconfiguration.
Sep 16 16:03:36 pxlhis1 SC: Validation failed. The ucmmd daemon will not be started on this node.
Sep 16 16:03:36 pxlhis1 Cluster.RGM.global.rgmd: Method <bin/rac_framework_start> failed on resource <rac_framework> in resource group <rac-framework-rg>
Sep 16 16:03:36 pxlhis1 Cluster.RGM.global.rgmd: resource rac_framework state on node pxlhis1 change to R_START_FAILED
Sep 16 16:03:36 pxlhis1 Cluster.RGM.global.rgmd: resource rac_framework status on node pxlhis1 change to R_FM_FAULTED
Sep 16 16:03:36 pxlhis1 Cluster.RGM.global.rgmd: resource rac_framework status msg on node pxlhis1 change to <>
Sep 16 16:03:36 pxlhis1 Cluster.RGM.global.rgmd: resource group rac-framework-rg state on node pxlhis1 change to RG_PENDING_ON_STARTED
Sep 16 16:03:36 pxlhis1 Cluster.RGM.global.rgmd: resource group rac-framework-rg state on node pxlhis1 change to RG_ONLINE
Sep 16 16:03:37 pxlhis1 Cluster.RGM.global.rgmd: resource rac_framework state on node pxlhis2 change to R_START_FAILED
Sep 16 16:03:37 pxlhis1 Cluster.RGM.global.rgmd: resource rac_framework status on node pxlhis2 change to R_FM_FAULTED
Sep 16 16:03:37 pxlhis1 Cluster.RGM.global.rgmd: resource rac_framework status msg on node pxlhis2 change to <>
Sep 16 16:03:37 pxlhis1 Cluster.RGM.global.rgmd: resource group rac-framework-rg state on node pxlhis2 change to RG_ONLINE
Sep 16 16:03:37 pxlhis1 Cluster.RGM.global.rgmd: failback attempt failed on resource group <rac-framework-rg> with error <resource group failed to start on chosen node; may fail over to other nodes>
Sep 16 16:03:37 pxlhis1 Cluster.RGM.global.rgmd: resource rac_framework status msg on node pxlhis1 change to <Error in previous reconfiguration.>
Sep 16 16:03:37 pxlhis1 Cluster.RGM.global.rgmd: resource rac_framework status msg on node pxlhis2 change to <Error in previous reconfiguration.>
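
The rgmd lines above interleave state, status, and status-message changes for both nodes, which makes the sequence hard to follow. As a minimal sketch, the state transitions of a single resource could be pulled out like this. `rgm_states` is a hypothetical helper name, not a Sun Cluster command, and it assumes the message wording stays exactly as in the excerpt above ("resource <name> state on node <node> change to <STATE>").

```shell
#!/bin/sh
# Hypothetical helper (not part of Sun Cluster): print "<node> <state>" for
# every rgmd state change of one resource found in a syslog-style file.
rgm_states() {
    # $1: resource name, e.g. rac_framework
    # $2: log file; defaults to the path quoted in this post
    res="$1"
    log="${2:-/var/adm/messages}"
    # The line ends in "... on node <node> change to <STATE>", so the node is
    # the fourth field from the end and the state is the last field.
    grep "resource $res state on node " "$log" | awk '{ print $(NF-3), $NF }'
}
```

Usage would be `rgm_states rac_framework /var/adm/messages`. The search string deliberately includes "state on node" so that the "status" and "status msg" lines, and the "resource group ... state" lines, are skipped.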

/var/cluster/ucmm/ucmm_reconf.log:
Fri Sep 16 15:59:29 CST 2011 SUNWscucm.ucmm_reconf Step: cmmstart CURRNODES=0
Fri Sep 16 15:59:29 CST 2011 SUNWscucm.ucmm_reconf ucmm reconfiguration step start started
Fri Sep 16 15:59:29 CST 2011 SUNWscucm.ucmm_reconf svm started in cmmstart
Fri Sep 16 15:59:29 CST 2011 SUNWscucm.ucmm_reconf udlm started in cmmstart
Fri Sep 16 15:59:29 CST 2011 SUNWudlm.udlmreconfig.udlmstart_cmd Starting the Unix DLM.
Fri Sep 16 15:59:29 CST 2011 SUNWscmd.svmreconfig Running: /usr/lib/lvm/metaclust -d 1 -t 120 -V 1.0 start 1
Fri Sep 16 15:59:30 CST 2011 SUNWscmd.svmreconfig Completed: /usr/lib/lvm/metaclust -t 120 -V 1.0 start 1 return_code=1
Fri Sep 16 15:59:30 CST 2011 SUNWudlm.udlmreconfig.udlmstart_cmd /var/run/udlm_running not created, waiting
Fri Sep 16 15:59:31 CST 2011 SUNWudlm.udlmreconfig.udlmstart_cmd /var/run/udlm_running created after 1 seconds.
Fri Sep 16 15:59:31 CST 2011 SUNWscucm.ucmm_reconf svm completed with error 1 in cmmstart
Fri Sep 16 15:59:31 CST 2011 SUNWscucm.ucmm_reconf Step: cmmabort CURRNODES=0
Fri Sep 16 15:59:31 CST 2011 SUNWscucm.ucmm_reconf ucmm reconfiguration step abort started
Fri Sep 16 15:59:31 CST 2011 SUNWscucm.ucmm_reconf svm started in cmmabort
Fri Sep 16 15:59:31 CST 2011 SUNWscucm.ucmm_reconf udlm started in cmmabort
Fri Sep 16 15:59:32 CST 2011 SUNWscmd.svmreconfig Running: /usr/lib/lvm/metaclust -d 1 -t 120 -V 1.0 abort
Fri Sep 16 15:59:32 CST 2011 SUNWscmd.svmreconfig Completed: /usr/lib/lvm/metaclust -t 120 -V 1.0 abort return_code=0
Fri Sep 16 15:59:32 CST 2011 SUNWscucm.ucmm_reconf svm completed successfully in cmmabort
Fri Sep 16 15:59:32 CST 2011 SUNWscucm.ucmm_reconf udlm completed successfully in cmmabort
Fri Sep 16 15:59:32 CST 2011 SUNWscucm.ucmm_reconf ucmm reconfiguration step abort completed
Fri Sep 16 16:03:36 CST 2011 SUNWscucm.ucmm_reconf Step: validate
Fri Sep 16 16:03:36 CST 2011 SUNWscucm.ucmm_reconf ucmm reconfiguration step validate started
Fri Sep 16 16:03:36 CST 2011 SUNWscucm.ucmm_reconf svm started in validate
Fri Sep 16 16:03:36 CST 2011 SUNWscucm.ucmm_reconf udlm started in validate
Fri Sep 16 16:03:36 CST 2011 SUNWscucm.ucmm_reconf svm completed successfully in validate
Fri Sep 16 16:03:36 CST 2011 SUNWscucm.ucmm_reconf udlm completed successfully in validate
Fri Sep 16 16:03:36 CST 2011 SUNWscucm.ucmm_reconf ucmm reconfiguration step validate completed
Fri Sep 16 16:03:36 CST 2011 SUNWscucm.ucmm_reconf Error was detected in previous reconfiguration: "svm exited with error 1 in step cmmstart"
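
Since this log mixes the svm and udlm modules line by line, it can help to list only the lines that record a failure. A minimal sketch, assuming the wording seen in the excerpt above ("completed with error" for a module, "return_code=N" for a command); `ucmm_failures` is a hypothetical helper name, not a Sun Cluster command.

```shell
#!/bin/sh
# Hypothetical helper (not part of Sun Cluster): show only the failure lines
# of a ucmm_reconf.log-style file.
ucmm_failures() {
    # $1: log file; defaults to the path quoted in this post
    log="${1:-/var/cluster/ucmm/ucmm_reconf.log}"
    # Keep lines where a module reports "completed with error", or where a
    # command finished with a non-zero return_code.
    awk '/completed with error/ || /return_code=[1-9]/' "$log"
}
```

Run against the excerpt above, this pattern matches two lines: the `/usr/lib/lvm/metaclust ... start 1 return_code=1` line logged by SUNWscmd.svmreconfig, and "svm completed with error 1 in cmmstart". No udlm line matches.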

One section of it looks very suspicious to me:
Fri Sep 16 15:59:30 CST 2011 SUNWudlm.udlmreconfig.udlmstart_cmd /var/run/udlm_running not created, waiting
Fri Sep 16 15:59:31 CST 2011 SUNWudlm.udlmreconfig.udlmstart_cmd /var/run/udlm_running created after 1 seconds.
Fri Sep 16 15:59:31 CST 2011 SUNWscucm.ucmm_reconf svm completed with error 1 in cmmstart
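I don't understand why udlm_running could not be created.
The OS is Solaris 10 U9 and the cluster software is suncluster_3_2u2-ga-solaris-sparc. I checked that the SUNW.udlm patch is already installed, and it has been registered.
I would be very grateful if the experts on this forum could help analyze this.
I spent the whole afternoon deleting and recreating the resource group, on both node 1 and node 2, and it still does not work.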