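Fault description: After node HDBS1 is rebooted, it can ping the gateway for the first few minutes. A few minutes later the NICs report errors and the gateway can no longer be pinged. Node 2 (HDBS2) can ping HDBS1's IP, but HDBS1 cannot ping HDBS2.
Sun Cluster version: 3.2
OS version: Solaris 10
The IPMP configuration is: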
more /etc/hostname.nxge0
HDBS1 netmask + broadcast + group sc_ipmp0 up \
addif HDBS1_TEST netmask + broadcast + deprecated -failover up
more /etc/hostname.nxge2
HDBS1_2 netmask + broadcast + group sc_ipmp1 up \
addif HDBS1_2_TEST netmask + broadcast + deprecated -failover up
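For readers comparing the two files, here is a minimal Python sketch that summarizes a hostname file of this form: the data address, the IPMP group, and any logical address marked -failover (the usual way a test address for probe-based failure detection is declared). The function name summarize_hostname_file and the dictionary layout are made up for illustration and are not part of Solaris or Sun Cluster. It only understands the simple layout shown above.

```python
import re


def summarize_hostname_file(text):
    """Hypothetical helper: summarize one /etc/hostname.<if> file.

    Returns the data address, the IPMP group name, and the logical
    addresses that carry the -failover flag (test addresses).
    """
    # Join backslash-continued lines into one logical line.
    logical = re.sub(r"\\\s*\n", " ", text).strip()
    # Everything before the first "addif" configures the primary address.
    parts = re.split(r"\baddif\b", logical)
    primary = parts[0].split()
    summary = {
        "data_address": primary[0] if primary else None,
        "group": None,
        "test_addresses": [],
    }
    if "group" in primary:
        idx = primary.index("group")
        if idx + 1 < len(primary):
            summary["group"] = primary[idx + 1]
    for extra in parts[1:]:
        tokens = extra.split()
        # An address marked -failover stays bound to this interface.
        if tokens and "-failover" in tokens:
            summary["test_addresses"].append(tokens[0])
    return summary


# Content copied from /etc/hostname.nxge0 in the post.
NXGE0 = """HDBS1 netmask + broadcast + group sc_ipmp0 up \\
addif HDBS1_TEST netmask + broadcast + deprecated -failover up
"""

print(summarize_hostname_file(NXGE0))
```

Run against both files, this shows one interface per group (nxge0 in sc_ipmp0, nxge2 in sc_ipmp1), each with its own test address. With that layout a group has no second interface to fail over to.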
The Sun Cluster status output is:
-- Cluster Nodes --
Node name Status
--------- ------
Cluster node: HDBS1 Online
Cluster node: HDBS2 Online
------------------------------------------------------------------
-- Cluster Transport Paths --
Endpoint Endpoint Status
-------- -------- ------
Transport path: HDBS1:nxge3 HDBS2:nxge3 Path online
Transport path: HDBS1:nxge1 HDBS2:nxge1 Path online
------------------------------------------------------------------
-- Quorum Summary --
Quorum votes possible: 3
Quorum votes needed: 2
Quorum votes present: 3
-- Quorum Votes by Node --
Node Name Present Possible Status
--------- ------- -------- ------
Node votes: HDBS1 1 1 Online
Node votes: HDBS2 1 1 Online
-- Quorum Votes by Device --
Device Name Present Possible Status
----------- ------- -------- ------
Device votes: /dev/did/rdsk/d8s2 1 1 Online
------------------------------------------------------------------
-- Device Group Servers --
Device Group Primary Secondary
------------ ------- ---------
-- Device Group Spares --
Device Group Spare Nodes
------------ -----------
-- Device Group Inactives --
Device Group Inactive Nodes
------------ --------------
-- Device Group Transitions --
Device Group In Transition Nodes
------------ -------------------
-- Device Group Status --
Device Group Status
------------ ------
-- Multi-owner Device Groups --
Device Group Online Status
------------ -------------
------------------------------------------------------------------
-- Resource Groups and Resources --
Group Name Resources
---------- ---------
Resources: hdbs_cluster oracle2 oracle1 oracle-listener oracle-services oracle-storage-db oracle-storage-index
-- Resource Groups --
Group Name Node Name State Suspended
---------- --------- ----- ---------
Group: hdbs_cluster HDBS1 Offline No
Group: hdbs_cluster HDBS2 Online No
-- Resources --
Resource Name Node Name State Status Message
------------- --------- ----- --------------
Resource: oracle2 HDBS1 Offline Offline - LogicalHostname offline.
Resource: oracle2 HDBS2 Online Online - LogicalHostname online.
Resource: oracle1 HDBS1 Offline Offline - LogicalHostname offline.
Resource: oracle1 HDBS2 Online Online - LogicalHostname online.
Resource: oracle-listener HDBS1 Offline Offline
Resource: oracle-listener HDBS2 Online Online
Resource: oracle-services HDBS1 Offline Offline
Resource: oracle-services HDBS2 Online Online
Resource: oracle-storage-db HDBS1 Offline Offline
Resource: oracle-storage-db HDBS2 Online Online
Resource: oracle-storage-index HDBS1 Offline Offline
Resource: oracle-storage-index HDBS2 Online Online
------------------------------------------------------------------
-- IPMP Groups --
Node Name Group Status Adapter Status
--------- ----- ------ ------- ------
IPMP Group: HDBS1 sc_ipmp1 Online nxge2 Offline
IPMP Group: HDBS1 sc_ipmp0 Online nxge0 Offline
IPMP Group: HDBS2 sc_ipmp1 Online nxge2 Online
IPMP Group: HDBS2 sc_ipmp0 Online nxge0 Online
-- IPMP Groups in Zones --
Zone Name Group Status Adapter Status
--------- ----- ------ ------- ------
------------------------------------------------------------------
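The relevant part of the output above is the IPMP Groups table: both adapters on HDBS1 are reported Offline while the ones on HDBS2 are Online. As a rough sketch, the Python below pulls those rows out of a saved copy of the status output. The helper name offline_adapters is hypothetical, and it assumes the "IPMP Group:" rows have exactly the five fields shown in this post.

```python
def offline_adapters(status_text):
    """Hypothetical helper: list adapters that are not reported Online.

    Returns (node, group, group_status, adapter) tuples taken from the
    "IPMP Group:" rows of the cluster status output.
    """
    rows = []
    for line in status_text.splitlines():
        line = line.strip()
        if not line.startswith("IPMP Group:"):
            continue
        fields = line[len("IPMP Group:"):].split()
        if len(fields) != 5:
            continue  # skip anything that does not look like a data row
        node, group, group_status, adapter, adapter_status = fields
        if adapter_status != "Online":
            rows.append((node, group, group_status, adapter))
    return rows


# Rows copied from the status output in the post.
SAMPLE = """\
IPMP Group: HDBS1 sc_ipmp1 Online nxge2 Offline
IPMP Group: HDBS1 sc_ipmp0 Online nxge0 Offline
IPMP Group: HDBS2 sc_ipmp1 Online nxge2 Online
IPMP Group: HDBS2 sc_ipmp0 Online nxge0 Online
"""

for row in offline_adapters(SAMPLE):
    print(row)
```

For the sample this prints the two HDBS1 rows (nxge2 in sc_ipmp1 and nxge0 in sc_ipmp0), which matches the node where hdbs_cluster is Offline.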
The system log shows the following errors:
Aug 25 14:55:28 HDBS1 /scsi_vhci/ssd@g600a0b800075ee2500000bec55cb8951 (ssd14): Command Timeout on path /pci@400/pci@0/pci@d/QLGC,qlc@0,1/fp@0,0 (fp3)
Aug 25 14:55:28 HDBS1 scsi: [ID 243001 kern.warning] WARNING: /scsi_vhci (scsi_vhci0):
Aug 25 14:55:28 HDBS1 /scsi_vhci/ssd@g600a0b800075ee2500000bf155cb8afd (ssd12): Command Timeout on path /pci@400/pci@0/pci@d/QLGC,qlc@0,1/fp@0,0 (fp3)
Aug 25 14:55:29 HDBS1 scsi: [ID 243001 kern.warning] WARNING: /scsi_vhci (scsi_vhci0):
Aug 25 14:55:29 HDBS1 /scsi_vhci/ssd@g600a0b800075ee2500000bef55cb89e4 (ssd13): Command Timeout on path /pci@400/pci@0/pci@d/QLGC,qlc@0,1/fp@0,0 (fp3)
Aug 25 15:31:44 HDBS1 usba: [ID 912658 kern.info] USB 2.0 interface (usbif928,0.config1.1) operating at hi speed (USB 2.x) on USB 2.0 external hub: input@1, hid3 at bus address 3
Aug 25 15:31:44 HDBS1 usba: [ID 349649 kern.info] OEM Mass Storage plus ABCDEF0123456789
Aug 25 15:31:44 HDBS1 genunix: [ID 936769 kern.info] hid3 is /pci@400/pci@0/pci@1/pci@0/usb@0,2/hub@4/device@4/input@1
Aug 25 16:20:55 HDBS1 in.mpathd[297]: [ID 594170 daemon.error] NIC failure detected on nxge2 of group sc_ipmp1
Aug 25 16:20:55 HDBS1 Cluster.PNM: [ID 890413 daemon.notice] sc_ipmp1: state transition from OK to DOWN.
Aug 25 16:20:55 HDBS1 in.mpathd[297]: [ID 594170 daemon.error] NIC failure detected on nxge0 of group sc_ipmp0
Aug 25 16:20:55 HDBS1 Cluster.PNM: [ID 890413 daemon.notice] sc_ipmp0: state transition from OK to DOWN.
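To line the network failures up against the rest of the log, the in.mpathd entries can be extracted with a small script. The following is only a sketch: the regular expression is written against the exact message format shown above ("NIC failure detected on ... of group ..."), and the names MPATHD_FAILURE and nic_failures are made up for this example.

```python
import re

# Pattern written against the in.mpathd lines quoted in this post;
# other Solaris releases may word the message differently.
MPATHD_FAILURE = re.compile(
    r"^(?P<when>\w{3}\s+\d+\s+[\d:]{8})\s+(?P<host>\S+)\s+in\.mpathd\[\d+\]:.*"
    r"NIC failure detected on (?P<nic>\S+) of group (?P<group>\S+)"
)


def nic_failures(log_text):
    """Hypothetical helper: return (timestamp, host, nic, group) tuples."""
    found = []
    for line in log_text.splitlines():
        match = MPATHD_FAILURE.search(line)
        if match:
            found.append(
                (match["when"], match["host"], match["nic"], match["group"])
            )
    return found


# Lines copied from the system log in the post.
LOG = """\
Aug 25 16:20:55 HDBS1 in.mpathd[297]: [ID 594170 daemon.error] NIC failure detected on nxge2 of group sc_ipmp1
Aug 25 16:20:55 HDBS1 Cluster.PNM: [ID 890413 daemon.notice] sc_ipmp1: state transition from OK to DOWN.
Aug 25 16:20:55 HDBS1 in.mpathd[297]: [ID 594170 daemon.error] NIC failure detected on nxge0 of group sc_ipmp0
Aug 25 16:20:55 HDBS1 Cluster.PNM: [ID 890413 daemon.notice] sc_ipmp0: state transition from OK to DOWN.
"""

for entry in nic_failures(LOG):
    print(entry)
```

Both entries carry the same timestamp, 16:20:55. Two adapters in two different groups failing in the same second may point to something they share (for example the probe target or the upstream switch) rather than two separate NIC faults, but the log alone does not prove that.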