The cluster won't come up; desperately begging for the Sun Cluster 3.0 and agent installation packages
I'm at my wits' end. Two Sun Fire 880 servers with two 3510 arrays on Solaris 8: the cluster won't boot, although boot -x works fine.
The boot messages are as follows:
SUNW,pci-gem0: Using Gigabit SERDES Interface
SUNW,pci-gem0: Auto-Negotiated 1000 Mbps Full-Duplex Link Up
Could not read symbolic link for: /dev/rdsk/c2t44d1s2 path not loaded
No such file or directory
Could not read symbolic link for: /dev/rdsk/c2t44d0s2 path not loaded
No such file or directory
Booting as part of a cluster
NOTICE: CMM: Node hh_db1 (nodeid = 1) with votecount = 1 added.
NOTICE: CMM: Node hh_db2 (nodeid = 2) with votecount = 1 added.
NOTICE: CMM: Quorum device 1 (/dev/did/rdsk/d17s2) added; votecount = 1, bitmask of nodes with configured paths = 0x3.
WARNING: CMM: Open failed with error '(No such device or address)' and errno = 6 for quorum device 1 with gdevname '/dev/did/rdsk/d17s2'.
NOTICE: clcomm: Adapter eri0 constructed
NOTICE: clcomm: Path hh_db2:eri0 - hh_db1:eri0 being constructed
NOTICE: clcomm: Adapter qfe0 constructed
NOTICE: clcomm: Path hh_db2:qfe0 - hh_db1:qfe0 being constructed
NOTICE: CMM: Node hh_db2: attempting to join cluster.
NOTICE: clcomm: Path hh_db2:qfe0 - hh_db1:qfe0 being initiated
NOTICE: clcomm: Path hh_db2:qfe0 - hh_db1:qfe0 online
NOTICE: CMM: Node hh_db1 (nodeid: 1, incarnation #: 1336489999) has become reachable.
NOTICE: CMM: Cluster has reached quorum.
NOTICE: CMM: Node hh_db1 (nodeid = 1) is up; new incarnation number = 1336489999.
NOTICE: CMM: Node hh_db2 (nodeid = 2) is up; new incarnation number = 1336490017.
NOTICE: CMM: Cluster members: hh_db1 hh_db2.
NOTICE: CMM: node reconfiguration #1 completed.
NOTICE: CMM: Node hh_db2: joined cluster.
NOTICE: CCR: Waiting for repository synchronization to finish.
NOTICE: clcomm: Path hh_db2:eri0 - hh_db1:eri0 being initiated
NOTICE: clcomm: Path hh_db2:eri0 - hh_db1:eri0 online
Could not read symbolic link for: /dev/rdsk/c2t44d1s2 path not loaded
No such file or directory
Could not read symbolic link for: /dev/rdsk/c2t44d0s2 path not loaded
No such file or directory
VxVM general startup...
The system is coming up.Please wait.
Starting cpudiagd ... done.
starting rpc services: rpcbind done.
Setting netmask of lo0:1 to 255.255.255.255
Setting netmask of ce1 to 255.255.255.0
Setting netmask of ge0 to 255.255.255.0
Setting netmask of eri0 to 255.255.255.128
Setting netmask of eri0:1 to 255.255.255.252
Setting netmask of qfe0 to 255.255.255.128
Setting default IPv4 interface for multicast: add net 224.0/4: gateway hh_db2
syslog service starting.
obtaining access to all attached disks
Configuring the /dev/global directory (global devices)
Print services started.
volume management starting.
panic/thread=30005a2e3a0: CMM: Cluster lost operational quorum; aborting.
000002a1026d7450 cl_runtime:__0FZsc_syslog_msg_log_no_argsPviTCPCcTB+60 (30005f74000, 3, 0, 784c8a7c, 2a1026d7650, 3)
%l0-3: 00000300055900e0 0000000000000e90 0000000000000001 0000030003348538
%l4-7: 0000000000000002 0000000000000000 0000000000000000 0000030003348810
000002a1026d7500 cl_runtime:__0f5CosNsc_syslog_msgDlogiTBPCce+1c (30004fcc4d8, 3, 0, 784c8a7c, 0, 30003348538)
%l0-3: 000002a1026d77c0 0000000000000001 0000030003348538 0000000000000002
%l4-7: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
000002a1026d75b0 cl_comm:__0fOautomaton_implMqcheck_statev+6ec (30004f2c008, 1, 2a1026d76bc, 784c9790, 30004f2c080, 0)
%l0-3: 00000000783a6ff0 0000030004f2c1c0 0000030004f2c008 0000000000000000
%l4-7: 0000000000000041 0000030001d75ea8 0000000000000002 0000030001d75ed0
I suspect the DID devices and the quorum device have gone bad, but with the cluster unable to come up, how do I change the DID and quorum device configuration?
Is reinstalling the cluster the only way out? I don't have the 3.0 installation packages.

Boot with boot -x and check whether the 3510 and the network are normal.

After booting with boot -x, the 3510 is normal and the network is normal too. I manually mounted the related VERITAS volumes and temporarily restored service on a single node.
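For reference, that kind of manual single-node recovery under boot -x goes roughly like this (a sketch only; the disk group name ipasdg is taken from the logs further down, while the volume and mount-point names are placeholders):

# Import the disk group, clearing the stale import lock left by the crashed node
vxdg -C import ipasdg
# Start all volumes in the group
vxvol -g ipasdg startall
# Mount the file systems by hand (volume and mount-point names are examples)
mount -F vxfs /dev/vx/dsk/ipasdg/vol01 /export/ipas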
Note: both hosts come up normally with boot -x.

Check the links. The cluster heartbeat and quorum both look fine; it may be the resource group's volume mounts that are failing and panicking the node.

rf00147 wrote (2012-05-09 10:36):
"Check the links. The cluster heartbeat and quorum both look fine; it may be the resource group's volume mounts that are failing and panicking the node."
The boot messages report an error on the quorum device.
Also, the VERITAS dg cannot be imported; I suspect a DID problem is behind that.
Right now both nodes reboot the moment they start, and there is no way to get into cluster mode. How can I modify the configuration?

A quorum error alone will not panic a node; as long as the heartbeat is up, the node stays up. Also, judging from your log, it panicked after joining the cluster.

Reply to abbend (post #5):
There was nothing else for it; in the end I rebuilt with Cluster 3.1 and the fault is resolved. :-L

Do both hosts report this error: panic/thread=30005a2e3a0: CMM: Cluster lost operational quorum; aborting.
/dev/rdsk/c2t44d1s2
/dev/rdsk/c2t44d0s2
One of these two disks must be /dev/did/rdsk/d17s2, which is why the quorum disk went bad.
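To confirm that mapping while a node is up under boot -x, the standard SC 3.0 DID admin command can be used (a sketch; d17 and the c2t44 paths are taken from the messages above):

# Show which physical device backs DID instance d17 on the local node
scdidadm -l d17
# Or dump the cluster-wide DID map and look for the suspect disks
scdidadm -L | grep c2t44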
Scenario:
* The quorum device has failed.
* All nodes are out of cluster mode.
Solution:
Use the following procedures to remove the existing quorum device and
create a new quorum device.
Example for two node cluster, named node-0 and node-1:
------------------------------------------------------
1. Reboot each node with the boot -x command.
2. Edit the /etc/cluster/ccr/infrastructure file on all nodes.
3. Change the installmode from disabled to enabled:
cluster.properties.installmode enabled
4. Delete all quorum devices by removing the lines that start with
cluster.quorum_devices, and save the file:
cluster.quorum_devices.1.name d4
cluster.quorum_devices.1.state enabled
cluster.quorum_devices.1.properties.votecount 1
cluster.quorum_devices.1.properties.gdevname /dev/did/rdsk/d4s2
cluster.quorum_devices.1.properties.path_1 enabled
cluster.quorum_devices.1.properties.path_2 enabled
5. Regenerate the checksum of the infrastructure file by running the
following command on node-0:
/usr/cluster/lib/sc/ccradm -i /etc/cluster/ccr/infrastructure -o
6. Regenerate the checksum of the infrastructure file by running the
following command on node-1:
/usr/cluster/lib/sc/ccradm -i /etc/cluster/ccr/infrastructure
7. Boot both nodes back into cluster mode.
8. Rerun scsetup to reset installmode and recreate the quorum device.
(A consolidated transcript of these steps follows.)
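Put together, the two-node recovery above amounts to the following sequence (a sketch; the vi edit stands in for steps 3 and 4):

ok boot -x                           # step 1, from the OBP prompt on both nodes
vi /etc/cluster/ccr/infrastructure   # steps 2-4 on both nodes: set installmode to
                                     # enabled, delete the cluster.quorum_devices.* lines
/usr/cluster/lib/sc/ccradm -i /etc/cluster/ccr/infrastructure -o   # step 5, node-0
/usr/cluster/lib/sc/ccradm -i /etc/cluster/ccr/infrastructure      # step 6, node-1
reboot                               # step 7, both nodes boot into cluster mode
scsetup                              # step 8, reset installmode, add a new quorum device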
Example for three node cluster (OPS/RAC environment):
--------------------------------------------------
1. Delete all references to the quorum device by removing the lines that
start with cluster.quorum_devices on all nodes, and save the file:
cluster.quorum_devices.1.name d4
cluster.quorum_devices.1.state enabled
cluster.quorum_devices.1.properties.votecount 1
cluster.quorum_devices.1.properties.gdevname /dev/did/rdsk/d4s2
cluster.quorum_devices.1.properties.path_1 enabled
cluster.quorum_devices.1.properties.path_2 enabled
2. Regenerate the checksum of the infrastructure file by running the
following command on node-0:
/usr/cluster/lib/sc/ccradm -i /etc/cluster/ccr/infrastructure -o
3. Regenerate the checksum of the infrastructure file by running the
following command on node-1:
/usr/cluster/lib/sc/ccradm -i /etc/cluster/ccr/infrastructure
4. Regenerate the checksum of the infrastructure file by running the
following command on node-2:
/usr/cluster/lib/sc/ccradm -i /etc/cluster/ccr/infrastructure
5. Boot the first node into cluster mode. The node will hang at:
NOTICE: CMM: Cluster doesn't have operational quorum yet; waiting for
quorum.
6. Boot the second node into cluster mode.
7. The first and second nodes should meet the quorum requirement and boot
into cluster mode.
8. Boot the third node into cluster mode. This node should join the cluster
without issue.
NOTE: With configurations larger than three nodes, continue booting
nodes until the quorum count is reached.
9. Once all nodes are in cluster mode, run scsetup to create the new quorum
device.
10. Ensure the new quorum device is allotted the correct number of votes, based
on the number of nodes able to access it (a verification sketch follows).
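Once the nodes are back in cluster mode with the new quorum device, the vote counts can be verified with the standard SC 3.x status commands (a sketch):

# Show quorum votes per node and per quorum device
scstat -q
# Dump the cluster configuration and filter for the quorum entries
scconf -p | grep -i quorum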
东方蜘蛛 wrote (2012-05-10 00:21):
"Do both hosts report this error: panic/thread=30005a2e3a0: CMM: Cluster lost operational quorum; abort ..."
On the other node the main problem at the time was that the dg could not be imported, yet I could import it by hand. The errors were as follows:
May 9 15:16:44 Cluster.Framework: stderr: /usr/cluster/lib/sc/run_reserve: 886 Segmentation Fault(coredump)
May 9 15:16:44 Cluster.Framework: stderr: vxvm:vxdg: ERROR
May 9 15:16:44 Cluster.Framework: stderr: : Disk group ipasdg: No such disk group is imported
Fatal error: could not deport VxVM diskgroup ipasdg. Halting node.
root@ # May 9 15:16:44 /usr/lib/snmp/snmpdx: received signal 15
May 9 15:16:44 rpcbind: rpcbind terminating on signal.
syncing file systems... done
WARNING: CMM: Node being shut down.
Thanks, everyone.