Sun Cluster 3.2 on Solaris 10 U9: resource group will not come online, please help analyze
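Last edited by 幽冥公子 on 2011-09-16 16:33.
As the title says, I am setting up RAC on Solaris 10 U9. The cluster software is installed and the cluster comes up with normal status.
But when I add the resource group, it reports errors: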
# scstat -g
-- Resource Groups and Resources --
Group Name Resources
---------- ---------
Resources: rac-framework-rg rac_framework rac_udlm rac_svm
-- Resource Groups --
Group Name Node Name State Suspended
---------- --------- ----- ---------
Group: rac-framework-rg pxlhis1 Online faulted No
Group: rac-framework-rg pxlhis2 Online faulted No
-- Resources --
Resource Name Node Name State Status Message
------------- --------- ----- --------------
Resource: rac_framework pxlhis1 Start failed Faulted - Error in previous reconfiguration.
Resource: rac_framework pxlhis2 Start failed Faulted - Error in previous reconfiguration.
Resource: rac_udlm pxlhis1 Offline Offline
Resource: rac_udlm pxlhis2 Offline Offline
Resource: rac_svm pxlhis1 Offline Offline
Resource: rac_svm pxlhis2 Offline Offline
I checked two logs, /var/adm/messages and /var/cluster/ucmm/ucmm_reconf.log, and found the following errors:
/var/adm/messages:
Sep 16 16:03:36 pxlhis1 SUNWscucm.ucmm_reconf: Error was detected in previous reconfiguration: "svm exited with error 1 in step cmmstart"
Sep 16 16:03:36 pxlhis1 SUNWscucm.ucmm_reconf: The ucmmd daemon will not be started due to errors in previous reconfiguration.
Sep 16 16:03:36 pxlhis1 SC: Validation failed. The ucmmd daemon will not be started on this node.
Sep 16 16:03:36 pxlhis1 Cluster.RGM.global.rgmd: Method <bin/rac_framework_start> failed on resource <rac_framework> in resource group <rac-framework-rg>
Sep 16 16:03:36 pxlhis1 Cluster.RGM.global.rgmd: resource rac_framework state on node pxlhis1 change to R_START_FAILED
Sep 16 16:03:36 pxlhis1 Cluster.RGM.global.rgmd: resource rac_framework status on node pxlhis1 change to R_FM_FAULTED
Sep 16 16:03:36 pxlhis1 Cluster.RGM.global.rgmd: resource rac_framework status msg on node pxlhis1 change to <>
Sep 16 16:03:36 pxlhis1 Cluster.RGM.global.rgmd: resource group rac-framework-rg state on node pxlhis1 change to RG_PENDING_ON_STARTED
Sep 16 16:03:36 pxlhis1 Cluster.RGM.global.rgmd: resource group rac-framework-rg state on node pxlhis1 change to RG_ONLINE
Sep 16 16:03:37 pxlhis1 Cluster.RGM.global.rgmd: resource rac_framework state on node pxlhis2 change to R_START_FAILED
Sep 16 16:03:37 pxlhis1 Cluster.RGM.global.rgmd: resource rac_framework status on node pxlhis2 change to R_FM_FAULTED
Sep 16 16:03:37 pxlhis1 Cluster.RGM.global.rgmd: resource rac_framework status msg on node pxlhis2 change to <>
Sep 16 16:03:37 pxlhis1 Cluster.RGM.global.rgmd: resource group rac-framework-rg state on node pxlhis2 change to RG_ONLINE
Sep 16 16:03:37 pxlhis1 Cluster.RGM.global.rgmd: failback attempt failed on resource group <rac-framework-rg> with error <resource group failed to start on chosen node and might fail over to other node(s)>
Sep 16 16:03:37 pxlhis1 Cluster.RGM.global.rgmd: resource rac_framework status msg on node pxlhis1 change to <Error in previous reconfiguration.>
Sep 16 16:03:37 pxlhis1 Cluster.RGM.global.rgmd: resource rac_framework status msg on node pxlhis2 change to <Error in previous reconfiguration.>
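To make the rgmd messages above easier to read, here is a minimal sketch that keeps only the last reported state of each resource on each node. The sample lines are copied from the excerpt; the function name `last_resource_states` and the regular expression are my own illustration, not part of Sun Cluster.

```python
import re

# Sample lines copied from the /var/adm/messages excerpt in this post.
SAMPLE = """\
Sep 16 16:03:36 pxlhis1 Cluster.RGM.global.rgmd: resource rac_framework state on node pxlhis1 change to R_START_FAILED
Sep 16 16:03:36 pxlhis1 Cluster.RGM.global.rgmd: resource group rac-framework-rg state on node pxlhis1 change to RG_ONLINE
Sep 16 16:03:37 pxlhis1 Cluster.RGM.global.rgmd: resource rac_framework state on node pxlhis2 change to R_START_FAILED
"""

# Matches "resource <name> state on node <node> change to <state>".
# Lines about "resource group ..." or "... status ..." do not match.
STATE_RE = re.compile(r"rgmd: resource (\S+) state on node (\S+) change to (\S+)")


def last_resource_states(text):
    """Return {(resource, node): last state seen} for rgmd state-change lines."""
    states = {}
    for line in text.splitlines():
        match = STATE_RE.search(line)
        if match:
            resource, node, state = match.groups()
            states[(resource, node)] = state  # later lines overwrite earlier ones
    return states


if __name__ == "__main__":
    for (resource, node), state in sorted(last_resource_states(SAMPLE).items()):
        print(resource, node, state)
```

On the sample lines this reports rac_framework as R_START_FAILED on both pxlhis1 and pxlhis2, which matches the scstat output.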
/var/cluster/ucmm/ucmm_reconf.log:
Fri Sep 16 15:59:29 CST 2011 SUNWscucm.ucmm_reconf Step: cmmstart CURRNODES=0
Fri Sep 16 15:59:29 CST 2011 SUNWscucm.ucmm_reconf ucmm reconfiguration step start started
Fri Sep 16 15:59:29 CST 2011 SUNWscucm.ucmm_reconf svm started in cmmstart
Fri Sep 16 15:59:29 CST 2011 SUNWscucm.ucmm_reconf udlm started in cmmstart
Fri Sep 16 15:59:29 CST 2011 SUNWudlm.udlmreconfig.udlmstart_cmd Starting the Unix DLM.
Fri Sep 16 15:59:29 CST 2011 SUNWscmd.svmreconfig Running: /usr/lib/lvm/metaclust -d 1 -t 120 -V 1.0 start 1
Fri Sep 16 15:59:30 CST 2011 SUNWscmd.svmreconfig Completed: /usr/lib/lvm/metaclust -t 120 -V 1.0 start 1 return_code=1
Fri Sep 16 15:59:30 CST 2011 SUNWudlm.udlmreconfig.udlmstart_cmd /var/run/udlm_running not created, waiting
Fri Sep 16 15:59:31 CST 2011 SUNWudlm.udlmreconfig.udlmstart_cmd /var/run/udlm_running created after 1 seconds.
Fri Sep 16 15:59:31 CST 2011 SUNWscucm.ucmm_reconf svm completed with error 1 in cmmstart
Fri Sep 16 15:59:31 CST 2011 SUNWscucm.ucmm_reconf Step: cmmabort CURRNODES=0
Fri Sep 16 15:59:31 CST 2011 SUNWscucm.ucmm_reconf ucmm reconfiguration step abort started
Fri Sep 16 15:59:31 CST 2011 SUNWscucm.ucmm_reconf svm started in cmmabort
Fri Sep 16 15:59:31 CST 2011 SUNWscucm.ucmm_reconf udlm started in cmmabort
Fri Sep 16 15:59:32 CST 2011 SUNWscmd.svmreconfig Running: /usr/lib/lvm/metaclust -d 1 -t 120 -V 1.0 abort
Fri Sep 16 15:59:32 CST 2011 SUNWscmd.svmreconfig Completed: /usr/lib/lvm/metaclust -t 120 -V 1.0 abort return_code=0
Fri Sep 16 15:59:32 CST 2011 SUNWscucm.ucmm_reconf svm completed successfully in cmmabort
Fri Sep 16 15:59:32 CST 2011 SUNWscucm.ucmm_reconf udlm completed successfully in cmmabort
Fri Sep 16 15:59:32 CST 2011 SUNWscucm.ucmm_reconf ucmm reconfiguration step abort completed
Fri Sep 16 16:03:36 CST 2011 SUNWscucm.ucmm_reconf Step: validate
Fri Sep 16 16:03:36 CST 2011 SUNWscucm.ucmm_reconf ucmm reconfiguration step validate started
Fri Sep 16 16:03:36 CST 2011 SUNWscucm.ucmm_reconf svm started in validate
Fri Sep 16 16:03:36 CST 2011 SUNWscucm.ucmm_reconf udlm started in validate
Fri Sep 16 16:03:36 CST 2011 SUNWscucm.ucmm_reconf svm completed successfully in validate
Fri Sep 16 16:03:36 CST 2011 SUNWscucm.ucmm_reconf udlm completed successfully in validate
Fri Sep 16 16:03:36 CST 2011 SUNWscucm.ucmm_reconf ucmm reconfiguration step validate completed
Fri Sep 16 16:03:36 CST 2011 SUNWscucm.ucmm_reconf Error was detected in previous reconfiguration: "svm exited with error 1 in step cmmstart"
One part of it looks very suspicious to me:
Fri Sep 16 15:59:30 CST 2011 SUNWudlm.udlmreconfig.udlmstart_cmd /var/run/udlm_running not created, waiting
Fri Sep 16 15:59:31 CST 2011 SUNWudlm.udlmreconfig.udlmstart_cmd /var/run/udlm_running created after 1 seconds.
Fri Sep 16 15:59:31 CST 2011 SUNWscucm.ucmm_reconf svm completed with error 1 in cmmstart
I don't understand why udlm_running could not be created.
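Going only by the ucmm_reconf.log excerpt, udlm_running is reported as created after 1 second, and the only nonzero return code comes from the metaclust call in the svm step. A minimal sketch for pulling such lines out of the log might look like this. The sample lines are copied from the log above; the function name `find_failures` and the two patterns are assumptions made for illustration, not an official tool.

```python
import re

# Sample lines copied from the /var/cluster/ucmm/ucmm_reconf.log excerpt in this post.
SAMPLE = """\
Fri Sep 16 15:59:29 CST 2011 SUNWscmd.svmreconfig Running: /usr/lib/lvm/metaclust -d 1 -t 120 -V 1.0 start 1
Fri Sep 16 15:59:30 CST 2011 SUNWscmd.svmreconfig Completed: /usr/lib/lvm/metaclust -t 120 -V 1.0 start 1 return_code=1
Fri Sep 16 15:59:31 CST 2011 SUNWudlm.udlmreconfig.udlmstart_cmd /var/run/udlm_running created after 1 seconds.
Fri Sep 16 15:59:31 CST 2011 SUNWscucm.ucmm_reconf svm completed with error 1 in cmmstart
Fri Sep 16 15:59:32 CST 2011 SUNWscmd.svmreconfig Completed: /usr/lib/lvm/metaclust -t 120 -V 1.0 abort return_code=0
"""

RETURN_CODE_RE = re.compile(r"return_code=(\d+)")
STEP_ERROR_RE = re.compile(r"\S+ completed with error \d+ in \S+")


def find_failures(text):
    """Return log lines with a nonzero return_code or a 'completed with error' message."""
    failures = []
    for line in text.splitlines():
        rc = RETURN_CODE_RE.search(line)
        if rc and int(rc.group(1)) != 0:
            failures.append(line)
        elif STEP_ERROR_RE.search(line):
            failures.append(line)
    return failures


if __name__ == "__main__":
    for line in find_failures(SAMPLE):
        print(line)
```

On the sample lines this prints the metaclust "start 1" line with return_code=1 and the "svm completed with error 1 in cmmstart" line; the udlm_running lines are not flagged. To scan the real log, the text of /var/cluster/ucmm/ucmm_reconf.log could be read and passed in instead of SAMPLE.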
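The OS is Solaris 10 U9 and the cluster software is suncluster_3_2u2-ga-solaris-sparc. I checked that the SUNW.udlm patch is already installed, and the resource type has been registered.
I would be very grateful if the experts on this forum could help me analyze this.
I spent the whole afternoon deleting and recreating the resource group, on both node 1 and node 2, and nothing worked.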