Problem installing and configuring Sun Cluster 3.2, help needed

论坛徽章:
0
跳转到指定楼层
1 [收藏(0)] [报告]
发表于 2010-10-30 20:06 |只看该作者 |倒序浏览
本帖最后由 king3171 于 2010-10-31 18:04 编辑

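The last time I installed Sun Cluster was two years ago, and that was 3.1. Two years on I am installing Cluster 3.2, which I assumed would be much the same as 3.1, yet I have run into a strange problem. I hope to find some leads here. Frankly, compared with Sun Cluster, I find HP's MC/ServiceGuard and IBM's HACMP much easier to configure, and problems with them easier to resolve.
Hardware: 2 x Sun M4000
Storage: EMC
Software: Solaris 10 patched to the latest level, Sun Cluster 3.2 (2009 release)
I ran installer on both hosts to install the Cluster software. When I run scinstall on node 1 to do the configuration, it always reports a configuration failure when it gets to node 2. I have checked the configuration of both hosts against the installation guide over and over without finding the cause. The installation log follows: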
bash-3.00# more scinstall.log.2476

*** Create a New Cluster ***
Sat Oct 30 17:07:21 CST 2010

    Attempting to contact "ccip-db1" ...

    Searching for a remote configuration method ...

scrcmd -N ccip-db1 test isfullyinstalled
The Sun Cluster framework software is installed.
scrcmd to "ccip-db1" - return status 1.

rsh ccip-db1 -n "/bin/sh -c '/bin/true; /bin/echo SC_COMMAND_STATUS=\$?'"
SC_COMMAND_STATUS=0
rsh to "ccip-db1" - return status 0.

ssh root@ccip-db1 -o "BatchMode yes" -o "StrictHostKeyChecking yes" -n "/bin/sh -c '/bin/true; /bin/echo SC_COMMAND_STATUS=\$?'"
SC_COMMAND_STATUS=0
ssh to "ccip-db1" - return status 0.

    The Sun Cluster framework is able to complete the configuration
    process without remote shell access.

    Plumbing network address 172.16.0.0 on adapter bge1 >> NOT DUPLICATE ... done
    Plumbing network address 172.16.0.0 on adapter e1000g1 >> NOT DUPLICATE ... done

----------------------------------
- Cluster Creation -
----------------------------------

    Testing for "/globaldevices" on "ccip-db2" ...
    Testing for "/globaldevices" on "ccip-db1" ...

scrcmd -N ccip-db1 chk_globaldev fs /globaldevices


    Starting discovery of the cluster transport configuration.

===========================
ccip-db2
===========================
scrconf -n cmd=discover_send,adapters=bge1:e1000g1,vlans=0:0,token=suncluster_ccip,sendcount=30

===========================
ccip-db1
===========================

scrcmd -N ccip-db1 autodiscovery 0:0 suncluster_ccip 2 30
e1000g1:0:ccip-db2:e1000g1:0
bge1:0:ccip-db2:bge1:0
quit

===========================

    The following connections were discovered:

        ccip-db2:bge1  switch1  ccip-db1:bge1
        ccip-db2:e1000g1  switch2  ccip-db1:e1000g1

    Completed discovery of the cluster transport configuration.

    Started cluster check on "ccip-db2".
    Started cluster check on "ccip-db1".

    cluster check completed with no errors or warnings for "ccip-db2".
    cluster check completed with no errors or warnings for "ccip-db1".

===========================
ccip-db2
===========================

/usr/cluster/lib/scadmin/lib/cmd_sccheck
cluster check -X -k installtime -v -o /var/cluster/logs/install/cluster_check
  initializing...
  initializing xml output...
  loading auxiliary data...
  filtering out checks not marked with one of keywords: installtime
  starting check run...
     ccip-db2:   M6708613       skipped: not a keyword match
     ccip-db2:   S6708255       skipped: not a keyword match
     ccip-db2:   M6336822       skipped: not a keyword match
     ccip-db2:   S6708589       skipped: not a keyword match
     ccip-db2:   S6708638.... starting:  Node has insufficient physical memory.      
     ccip-db2:   S6708638       passed
     ccip-db2:   S6708496.... starting:  Cluster node (3.1 or later) OpenBoot Prom (O...
     ccip-db2:   S6708496       passed
     ccip-db2:   S6708502       skipped: not a keyword match
     ccip-db2:   S6708479       skipped: not a keyword match
     ccip-db2:   S6708586       skipped: not a keyword match
     ccip-db2:   S6708592       skipped: not a keyword match
     ccip-db2:   S6708599       skipped: not a keyword match
     ccip-db2:   S6708605.... starting:  The /dev/rmt directory is missing.         
     ccip-db2:   S6708605       passed
     ccip-db2:   S6708606.... starting:  Multiple network interfaces on a single subn...
     ccip-db2:   S6708606       passed
     ccip-db2:   S6708641       skipped: not a keyword match
     ccip-db2:   S6708644       skipped: not a keyword match
     ccip-db2:   S6708642.... starting:  /proc fails to mount periodically during reb...
        searching /var/adm/messages
        searching /var/adm/messages.0
        searching /var/adm/messages.1
     ccip-db2:   S6708642       passed
     ccip-db2:   S6708689       skipped: not a keyword match
  finished check run
  finishing xml output...
  Maximum severity of all violations: No Violations
  Reports in: /var/cluster/logs/install/cluster_check/
  cleaning up...
***************************************************************************
*
*       cluster check           (ver 1.0)
*
***************************************************************************

    Report Date:        2010.10.30 at 17.08.56 CST
                        2010.10.30 at 09.08.56 GMT
    Command run on host:
                        85a4bc50- ccip-db2
    Checks run on nodes:
                        ccip-db2

    Unique Checks: 5

===========================================================================
*
*       Summary of Single Node Check Results for ccip-db2
*
===========================================================================
        
        Checks Considered: 5
        
        Results by Status
        -----------------
            Violated          :   0
            Insufficient Data :   0
            Execution Error   :   0
            Unknown Status    :   0
            Information Only  :   0
            Not Applicable    :   0
            Passed            :   5
        
        Violations by Severity
        ----------------------
            Critical          :   0
            High              :   0
            Moderate          :   0
            Low               :   0
        
---------------------------------------------------------------------------
*
*       Details for 5 Passed Checks on ccip-db2
*
---------------------------------------------------------------------------

        *  Check ID: S6708638  ***
        --------------------------
            *  Severity: Moderate

            *  Problem Statement: Node has insufficient physical memory.


        *  Check ID: S6708496  ***
        --------------------------
            *  Severity: Moderate

            *  Problem Statement: Cluster node (3.1 or later) OpenBoot Prom (OBP) has local-mac-address? variable set to 'false'.


        *  Check ID: S6708605  ***
        --------------------------
            *  Severity: Critical

            *  Problem Statement: The /dev/rmt directory is missing.


        *  Check ID: S6708606  ***
        --------------------------
            *  Severity: Moderate

            *  Problem Statement: Multiple network interfaces on a single subnet have the same MAC address.


        *  Check ID: S6708642  ***
        --------------------------
            *  Severity: Critical

            *  Problem Statement: /proc fails to mount periodically during reboots.



===========================================================================
*
*       End of Report 2010.10.30 at 17.08.56 CST
*
===========================================================================

===========================
ccip-db1
===========================

scrcmd -N ccip-db1 sccheck
cluster check -X -k installtime -v -o /var/cluster/logs/install/cluster_check
  initializing...
  initializing xml output...
  loading auxiliary data...
  filtering out checks not marked with one of keywords: installtime
  starting check run...
     ccip-db1:   M6708613       skipped: not a keyword match
     ccip-db1:   S6708255       skipped: not a keyword match
     ccip-db1:   M6336822       skipped: not a keyword match
     ccip-db1:   S6708589       skipped: not a keyword match
     ccip-db1:   S6708638.... starting:  Node has insufficient physical memory.      
     ccip-db1:   S6708638       passed
     ccip-db1:   S6708496.... starting:  Cluster node (3.1 or later) OpenBoot Prom (O...
     ccip-db1:   S6708496       passed
     ccip-db1:   S6708502       skipped: not a keyword match
     ccip-db1:   S6708479       skipped: not a keyword match
     ccip-db1:   S6708586       skipped: not a keyword match
     ccip-db1:   S6708592       skipped: not a keyword match
     ccip-db1:   S6708599       skipped: not a keyword match
     ccip-db1:   S6708605.... starting:  The /dev/rmt directory is missing.         
     ccip-db1:   S6708605       passed
     ccip-db1:   S6708606.... starting:  Multiple network interfaces on a single subn...
     ccip-db1:   S6708606       passed
     ccip-db1:   S6708641       skipped: not a keyword match
     ccip-db1:   S6708644       skipped: not a keyword match
     ccip-db1:   S6708642.... starting:  /proc fails to mount periodically during reb...
        searching /var/adm/messages
        searching /var/adm/messages.0
        searching /var/adm/messages.1
     ccip-db1:   S6708642       passed
     ccip-db1:   S6708689       skipped: not a keyword match
  finished check run
  finishing xml output...
  Maximum severity of all violations: No Violations
  Reports in: /var/cluster/logs/install/cluster_check/
  cleaning up...
***************************************************************************
*
*       cluster check           (ver 1.0)
*
***************************************************************************

    Report Date:        2010.10.30 at 17.08.55 CST
                        2010.10.30 at 09.08.55 GMT
    Command run on host:
                        85a4bc44- ccip-db1
    Checks run on nodes:
                        ccip-db1

    Unique Checks: 5

===========================================================================
*
*       Summary of Single Node Check Results for ccip-db1
*
===========================================================================
        
        Checks Considered: 5
        
        Results by Status
        -----------------
            Violated          :   0
            Insufficient Data :   0
            Execution Error   :   0
            Unknown Status    :   0
            Information Only  :   0
            Not Applicable    :   0
            Passed            :   5
        
        Violations by Severity
        ----------------------
            Critical          :   0
            High              :   0
            Moderate          :   0
            Low               :   0
        
---------------------------------------------------------------------------
*
*       Details for 5 Passed Checks on ccip-db1
*
---------------------------------------------------------------------------

        *  Check ID: S6708638  ***
        --------------------------
            *  Severity: Moderate

            *  Problem Statement: Node has insufficient physical memory.


        *  Check ID: S6708496  ***
        --------------------------
            *  Severity: Moderate

            *  Problem Statement: Cluster node (3.1 or later) OpenBoot Prom (OBP) has local-mac-address? variable set to 'false'.


        *  Check ID: S6708605  ***
        --------------------------
            *  Severity: Critical

            *  Problem Statement: The /dev/rmt directory is missing.


        *  Check ID: S6708606  ***
        --------------------------
            *  Severity: Moderate

            *  Problem Statement: Multiple network interfaces on a single subnet have the same MAC address.


        *  Check ID: S6708642  ***
        --------------------------
            *  Severity: Critical

            *  Problem Statement: /proc fails to mount periodically during reboots.



===========================================================================
*
*       End of Report 2010.10.30 at 17.08.55 CST
*
===========================================================================

===========================



===========================
ccip-db1
===========================


scrcmd -N ccip-db1 test isinstalling
"" is not running.

scrcmd -N ccip-db1 test isconfigured
Sun Cluster is not configured.

    Configuring "ccip-db1" ...

scrcmd -N ccip-db1 install -logfile /var/cluster/logs/install/scinstall.log.2476 -k -C ccip -F -T node=ccip-db2,node=ccip-db1,authtype=sys -w netaddr=172.16.0.0,netmask=255.255.240.0,maxnodes=64,maxprivatenets=10,numvirtualclusters=12 -A trtype=dlpi,name=e1000g1 -A trtype=dlpi,name=bge1 -B type=switch,name=switch2 -B type=switch,name=switch1 -m endpoint=:e1000g1,endpoint=switch2 -m endpoint=:bge1,endpoint=switch1
scinstall:  /global/.devices/node@1 is not found

scinstall:  scinstall did NOT complete successfully!


Checking device to use for global devices file system ... done

Initializing cluster name to "ccip" ... done
Initializing authentication options ... done
Initializing configuration for adapter "e1000g1" ... done
Initializing configuration for adapter "bge1" ... done
Initializing configuration for switch "switch2" ... done
Initializing configuration for switch "switch1" ... done
Initializing configuration for cable ... done
Initializing configuration for cable ... done
Initializing private network address options ... done



Setting the node ID for "ccip-db1" ... done (id=1)


Checking for global devices global file system ... done

Log file - /var/cluster/logs/install/scinstall.log.2476


Failed to configure "ccip-db1".



scinstall:  scinstall did NOT complete successfully!
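
In a log this long, the lines that matter are the ones scinstall prefixes with its own name. The following is a minimal sketch for pulling them out, assuming a POSIX shell and grep; the function name scinstall_errors is made up for illustration and is not part of Sun Cluster.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Sun Cluster): print, with line
# numbers, every line of a scinstall log that starts with "scinstall:".
scinstall_errors() {
    # $1: path to a scinstall log file
    grep -n '^scinstall:' "$1"
}

# Example call, using the log path shown in the post:
#   scinstall_errors /var/cluster/logs/install/scinstall.log.2476
```

Run against the log above, this would surface the "/global/.devices/node@1 is not found" message without paging through the cluster check report.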

#2 | Posted 2010-10-30 21:48
scinstall:  /global/.devices/node@1 is not found

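Did something go wrong when this global devices file system was created? Did you reserve space for it when you partitioned the disk?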

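#3 | Posted 2010-10-31 17:58
I gave slice c0t0d0s6 1 GB and mounted it on /globaldevices. When scinstall runs the configuration it tests the /globaldevices directory, and that test passed as well. According to the documentation, the cluster automatically changes /globaldevices to /global/.devices/node@1 during configuration, yet it now reports "/global/.devices/node@1 is not found". That puzzles me, because the /globaldevices mount point exists and has enough space. Frustrating...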
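
One way to see which of the two mount points a node currently has on record is to look at the mount-point column of /etc/vfstab. The sketch below assumes that the rename described in the post shows up there, and that an awk with -v support is available (on Solaris 10 that would likely mean nawk or /usr/xpg4/bin/awk rather than /usr/bin/awk). The function name vfstab_has_mountpoint is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the vfstab-style file $1 has a
# non-comment entry whose mount point (third field) is exactly $2.
vfstab_has_mountpoint() {
    awk -v mp="$2" '
        $1 !~ /^#/ && $3 == mp { found = 1 }
        END { exit !found }
    ' "$1"
}

# Example calls:
#   vfstab_has_mountpoint /etc/vfstab /globaldevices && echo "still /globaldevices"
#   vfstab_has_mountpoint /etc/vfstab /global/.devices/node@1 && echo "already renamed"
```

This only reports what the file says; it does not explain why scinstall failed to find the renamed mount point.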

#4 | Posted 2010-10-31 18:45
You installed both nodes together, right? How about trying them separately?

#5 | Posted 2010-10-31 18:45
local-mac-address=false?

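#6 | Posted 2010-10-31 22:32
Last edited by king3171 on 2010-10-31 22:33

local-mac-address=true. I tried the nodes separately, and configuring a single node on its own fails too. To be honest, I had already done one installation before this, and that first one went smoothly: I ran scinstall on DB1, the other node DB2 was configured successfully, and once DB2 had finished and rebooted automatically, the configuration of DB1 began. When DB1 finished and started its reboot, my bad luck began. I was installing over a remote telnet session (I had forgotten that it reboots automatically), so my connection dropped as soon as DB1 rebooted. When I reconnected to DB1 the cluster state was abnormal, so I went to the machine room, connected over the serial port, reinstalled the operating system, reapplied the patches, and redid the installation and configuration. That is where I am stuck now. The same configuration, done a second time, no longer works. I am very frustrated.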

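#7 | Posted 2010-11-01 00:15
Last edited by king3171 on 2010-11-07 18:06

The problem has been solved for a while now, so I should come back and report.

I removed the configuration and reran scinstall, this time using a lofi device instead of the default globaldevices, and it worked. I did not have the energy to dig into why. The result: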

# df -k
Filesystem            kbytes    used   avail capacity  Mounted on
/dev/md/dsk/d0       30259230 7433336 22523302    25%    /
/devices                   0       0       0     0%    /devices
ctfs                       0       0       0     0%    /system/contract
proc                       0       0       0     0%    /proc
mnttab                     0       0       0     0%    /etc/mnttab
swap                 45660912    2176 45658736     1%    /etc/svc/volatile
objfs                      0       0       0     0%    /system/object
sharefs                    0       0       0     0%    /etc/dfs/sharetab
fd                         0       0       0     0%    /dev/fd
swap                 45658808      72 45658736     1%    /tmp
swap                 45683720   24984 45658736     1%    /var/run
/dev/md/dsk/d4       51637369 4367168 46753828     9%    /app
/dev/lofi/126          95771    4247   81947     5%    /global/.devices/node@2
/dev/lofi/127          95771    4261   81933     5%    /global/.devices/node@1
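
To confirm the same result on another node, the two global-devices lines can be filtered out of the df listing. This is a minimal sketch, assuming a POSIX awk and df output laid out as above with the mount point in the last column; the function name global_devices_mounts is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read `df -k` output on standard input and print the
# device and mount point of every /global/.devices/node@N file system.
global_devices_mounts() {
    awk '$NF ~ /^\/global\/\.devices\/node@[0-9]+$/ { print $1, $NF }'
}

# Example call:
#   df -k | global_devices_mounts
```

A device name under /dev/lofi in the first column indicates the lofi-backed setup described in this post rather than a dedicated disk slice.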