Last edited by king3171 on 2010-10-31 18:04
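The last time I installed Sun Cluster was two years ago, and that was 3.1. Two years on I am installing Cluster 3.2, which I expected to be much the same as 3.1, yet I have run into a strange problem and hope to find some leads here. To be honest, compared with Sun Cluster, I find HP's MC/ServiceGuard and IBM's HACMP much easier to configure, and their problems easier to resolve.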
Hardware: 2 x Sun M4000
Storage: EMC
Software: Solaris 10 with the latest patches, Sun Cluster 3.2 (2009 release)
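I ran installer on both hosts to install the Cluster software. When I then run scinstall on node 1 to configure the cluster, it always reports a configuration failure once it gets to configuring node 2. I have checked the configuration of both hosts against the installation and configuration guide over and over without finding the cause. The installation log follows: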
bash-3.00# more scinstall.log.2476
*** Create a New Cluster ***
Sat Oct 30 17:07:21 CST 2010
Attempting to contact "ccip-db1" ...
Searching for a remote configuration method ...
scrcmd -N ccip-db1 test isfullyinstalled
The Sun Cluster framework software is installed.
scrcmd to "ccip-db1" - return status 1.
rsh ccip-db1 -n "/bin/sh -c '/bin/true; /bin/echo SC_COMMAND_STATUS=\$?'"
SC_COMMAND_STATUS=0
rsh to "ccip-db1" - return status 0.
ssh root@ccip-db1 -o "BatchMode yes" -o "StrictHostKeyChecking yes" -n "/bin/sh -c '/bin/true; /bin/echo SC_COMMAND_STATUS=\$?'"
SC_COMMAND_STATUS=0
ssh to "ccip-db1" - return status 0.
The Sun Cluster framework is able to complete the configuration
process without remote shell access.
Plumbing network address 172.16.0.0 on adapter bge1 >> NOT DUPLICATE ... done
Plumbing network address 172.16.0.0 on adapter e1000g1 >> NOT DUPLICATE ... done
----------------------------------
- Cluster Creation -
----------------------------------
Testing for "/globaldevices" on "ccip-db2" ...
Testing for "/globaldevices" on "ccip-db1" ...
scrcmd -N ccip-db1 chk_globaldev fs /globaldevices
Starting discovery of the cluster transport configuration.
===========================
ccip-db2
===========================
scrconf -n cmd=discover_send,adapters=bge1:e1000g1,vlans=0:0,token=suncluster_ccip,sendcount=30
===========================
ccip-db1
===========================
scrcmd -N ccip-db1 autodiscovery 0:0 suncluster_ccip 2 30
e1000g1:0:ccip-db2:e1000g1:0
bge1:0:ccip-db2:bge1:0
quit
===========================
The following connections were discovered:
ccip-db2:bge1 switch1 ccip-db1:bge1
ccip-db2:e1000g1 switch2 ccip-db1:e1000g1
Completed discovery of the cluster transport configuration.
Started cluster check on "ccip-db2".
Started cluster check on "ccip-db1".
cluster check completed with no errors or warnings for "ccip-db2".
cluster check completed with no errors or warnings for "ccip-db1".
===========================
ccip-db2
===========================
/usr/cluster/lib/scadmin/lib/cmd_sccheck
cluster check -X -k installtime -v -o /var/cluster/logs/install/cluster_check
initializing...
initializing xml output...
loading auxiliary data...
filtering out checks not marked with one of keywords: installtime
starting check run...
ccip-db2: M6708613 skipped: not a keyword match
ccip-db2: S6708255 skipped: not a keyword match
ccip-db2: M6336822 skipped: not a keyword match
ccip-db2: S6708589 skipped: not a keyword match
ccip-db2: S6708638.... starting: Node has insufficient physical memory.
ccip-db2: S6708638 passed
ccip-db2: S6708496.... starting: Cluster node (3.1 or later) OpenBoot Prom (O...
ccip-db2: S6708496 passed
ccip-db2: S6708502 skipped: not a keyword match
ccip-db2: S6708479 skipped: not a keyword match
ccip-db2: S6708586 skipped: not a keyword match
ccip-db2: S6708592 skipped: not a keyword match
ccip-db2: S6708599 skipped: not a keyword match
ccip-db2: S6708605.... starting: The /dev/rmt directory is missing.
ccip-db2: S6708605 passed
ccip-db2: S6708606.... starting: Multiple network interfaces on a single subn...
ccip-db2: S6708606 passed
ccip-db2: S6708641 skipped: not a keyword match
ccip-db2: S6708644 skipped: not a keyword match
ccip-db2: S6708642.... starting: /proc fails to mount periodically during reb...
searching /var/adm/messages
searching /var/adm/messages.0
searching /var/adm/messages.1
ccip-db2: S6708642 passed
ccip-db2: S6708689 skipped: not a keyword match
finished check run
finishing xml output...
Maximum severity of all violations: No Violations
Reports in: /var/cluster/logs/install/cluster_check/
cleaning up...
***************************************************************************
*
* cluster check (ver 1.0)
*
***************************************************************************
Report Date: 2010.10.30 at 17.08.56 CST
2010.10.30 at 09.08.56 GMT
Command run on host:
85a4bc50- ccip-db2
Checks run on nodes:
ccip-db2
Unique Checks: 5
===========================================================================
*
* Summary of Single Node Check Results for ccip-db2
*
===========================================================================
Checks Considered: 5
Results by Status
-----------------
Violated : 0
Insufficient Data : 0
Execution Error : 0
Unknown Status : 0
Information Only : 0
Not Applicable : 0
Passed : 5
Violations by Severity
----------------------
Critical : 0
High : 0
Moderate : 0
Low : 0
---------------------------------------------------------------------------
*
* Details for 5 Passed Checks on ccip-db2
*
---------------------------------------------------------------------------
* Check ID: S6708638 ***
--------------------------
* Severity: Moderate
* Problem Statement: Node has insufficient physical memory.
* Check ID: S6708496 ***
--------------------------
* Severity: Moderate
* Problem Statement: Cluster node (3.1 or later) OpenBoot Prom (OBP) has local-mac-address? variable set to 'false'.
* Check ID: S6708605 ***
--------------------------
* Severity: Critical
* Problem Statement: The /dev/rmt directory is missing.
* Check ID: S6708606 ***
--------------------------
* Severity: Moderate
* Problem Statement: Multiple network interfaces on a single subnet have the same MAC address.
* Check ID: S6708642 ***
--------------------------
* Severity: Critical
* Problem Statement: /proc fails to mount periodically during reboots.
===========================================================================
*
* End of Report 2010.10.30 at 17.08.56 CST
*
===========================================================================
===========================
ccip-db1
===========================
scrcmd -N ccip-db1 sccheck
cluster check -X -k installtime -v -o /var/cluster/logs/install/cluster_check
initializing...
initializing xml output...
loading auxiliary data...
filtering out checks not marked with one of keywords: installtime
starting check run...
ccip-db1: M6708613 skipped: not a keyword match
ccip-db1: S6708255 skipped: not a keyword match
ccip-db1: M6336822 skipped: not a keyword match
ccip-db1: S6708589 skipped: not a keyword match
ccip-db1: S6708638.... starting: Node has insufficient physical memory.
ccip-db1: S6708638 passed
ccip-db1: S6708496.... starting: Cluster node (3.1 or later) OpenBoot Prom (O...
ccip-db1: S6708496 passed
ccip-db1: S6708502 skipped: not a keyword match
ccip-db1: S6708479 skipped: not a keyword match
ccip-db1: S6708586 skipped: not a keyword match
ccip-db1: S6708592 skipped: not a keyword match
ccip-db1: S6708599 skipped: not a keyword match
ccip-db1: S6708605.... starting: The /dev/rmt directory is missing.
ccip-db1: S6708605 passed
ccip-db1: S6708606.... starting: Multiple network interfaces on a single subn...
ccip-db1: S6708606 passed
ccip-db1: S6708641 skipped: not a keyword match
ccip-db1: S6708644 skipped: not a keyword match
ccip-db1: S6708642.... starting: /proc fails to mount periodically during reb...
searching /var/adm/messages
searching /var/adm/messages.0
searching /var/adm/messages.1
ccip-db1: S6708642 passed
ccip-db1: S6708689 skipped: not a keyword match
finished check run
finishing xml output...
Maximum severity of all violations: No Violations
Reports in: /var/cluster/logs/install/cluster_check/
cleaning up...
***************************************************************************
*
* cluster check (ver 1.0)
*
***************************************************************************
Report Date: 2010.10.30 at 17.08.55 CST
2010.10.30 at 09.08.55 GMT
Command run on host:
85a4bc44- ccip-db1
Checks run on nodes:
ccip-db1
Unique Checks: 5
===========================================================================
*
* Summary of Single Node Check Results for ccip-db1
*
===========================================================================
Checks Considered: 5
Results by Status
-----------------
Violated : 0
Insufficient Data : 0
Execution Error : 0
Unknown Status : 0
Information Only : 0
Not Applicable : 0
Passed : 5
Violations by Severity
----------------------
Critical : 0
High : 0
Moderate : 0
Low : 0
---------------------------------------------------------------------------
*
* Details for 5 Passed Checks on ccip-db1
*
---------------------------------------------------------------------------
* Check ID: S6708638 ***
--------------------------
* Severity: Moderate
* Problem Statement: Node has insufficient physical memory.
* Check ID: S6708496 ***
--------------------------
* Severity: Moderate
* Problem Statement: Cluster node (3.1 or later) OpenBoot Prom (OBP) has local-mac-address? variable set to 'false'.
* Check ID: S6708605 ***
--------------------------
* Severity: Critical
* Problem Statement: The /dev/rmt directory is missing.
* Check ID: S6708606 ***
--------------------------
* Severity: Moderate
* Problem Statement: Multiple network interfaces on a single subnet have the same MAC address.
* Check ID: S6708642 ***
--------------------------
* Severity: Critical
* Problem Statement: /proc fails to mount periodically during reboots.
===========================================================================
*
* End of Report 2010.10.30 at 17.08.55 CST
*
===========================================================================
===========================
===========================
ccip-db1
===========================
scrcmd -N ccip-db1 test isinstalling
"" is not running.
scrcmd -N ccip-db1 test isconfigured
Sun Cluster is not configured.
Configuring "ccip-db1" ...
scrcmd -N ccip-db1 install -logfile /var/cluster/logs/install/scinstall.log.2476 -k -C ccip -F -T node=ccip-db2,node=ccip-db1,authtype=sys -w netaddr=172.16.0.0,netmask=255.255.240.0,maxnodes=64,maxprivatenets=10,numvirtualclusters=12 -A trtype=dlpi,name=e1000g1 -A trtype=dlpi,name=bge1 -B type=switch,name=switch2 -B type=switch,name=switch1 -m endpoint=:e1000g1,endpoint=switch2 -m endpoint=:bge1,endpoint=switch1
scinstall: /global/.devices/node@1 is not found
scinstall: scinstall did NOT complete successfully!
Checking device to use for global devices file system ... done
Initializing cluster name to "ccip" ... done
Initializing authentication options ... done
Initializing configuration for adapter "e1000g1" ... done
Initializing configuration for adapter "bge1" ... done
Initializing configuration for switch "switch2" ... done
Initializing configuration for switch "switch1" ... done
Initializing configuration for cable ... done
Initializing configuration for cable ... done
Initializing private network address options ... done
Setting the node ID for "ccip-db1" ... done (id=1)
Checking for global devices global file system ... done
Log file - /var/cluster/logs/install/scinstall.log.2476
Failed to configure "ccip-db1".
scinstall: scinstall did NOT complete successfully!
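Since the log is long, it may help to pull out only the lines where scinstall itself reports a problem. The following is a minimal sketch, not part of Sun Cluster: the function name `scinstall_failures` is made up for illustration, and it assumes that failure lines begin with `scinstall: ` or `Failed to configure `, as they do in the log above. It also assumes a grep that accepts `-E`; on Solaris 10 that would presumably be `/usr/xpg4/bin/grep` rather than `/usr/bin/grep`.

```shell
# Hypothetical helper: print, with line numbers, the lines of a scinstall log
# that look like failure reports (prefixes taken from the log above).
scinstall_failures() {
    logfile="$1"
    # -n prefixes each match with its line number; -E enables alternation.
    grep -n -E '^(scinstall: |Failed to configure )' "$logfile"
}

# Example call, using the log path shown in the post:
#   scinstall_failures /var/cluster/logs/install/scinstall.log.2476
```

On the log above this would show the `/global/.devices/node@1 is not found` message together with the two "did NOT complete successfully" lines and the `Failed to configure "ccip-db1"` line, which makes it easier to see which step failed first.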