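Our company has two servers configured with Sun Cluster. The two nodes are hrsms01 and hrsms02, and the database originally ran on hrsms01.
A few days ago hrsms01 crashed suddenly because of a faulty memory module, and the database failed over automatically to hrsms02. After the memory was replaced and hrsms01 rejoined the cluster, I ran: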
root@hrsms02 # scswitch -z -g oracle-rg -h hrsms01
scswitch: Resource group oracle-rg failed to start on chosen node and may fail over to other node(s)
The log is as follows:
Sep 19 14:37:52 hrsms01 Cluster.RGM.rgmd: [ID 525628 daemon.notice] CMM: Cluster has reached quorum.
Sep 19 14:37:52 hrsms01 Cluster.RGM.rgmd: [ID 377347 daemon.notice] CMM: Node hrsms01 (nodeid = 1) is up; new incarnation number = 1
348036672.
Sep 19 14:37:52 hrsms01 Cluster.RGM.rgmd: [ID 377347 daemon.notice] CMM: Node hrsms02 (nodeid = 2) is up; new incarnation number = 1
348030597.
Sep 19 14:37:52 hrsms01 Cluster.RGM.rgmd: [ID 707948 daemon.notice] launching method <bin/oracle_server_boot> for resource <oracle-r
es>, resource group <oracle-rg>, timeout <30> seconds
Sep 19 14:37:52 hrsms01 Cluster.RGM.rgmd: [ID 707948 daemon.notice] launching method <bin/oracle_listener_boot> for resource <oracle
-lsn>, resource group <oracle-rg>, timeout <30> seconds
Sep 19 14:37:52 hrsms01 Cluster.RGM.rgmd: [ID 736390 daemon.notice] method <bin/oracle_listener_boot> completed successfully for res
ource <oracle-lsn>, resource group <oracle-rg>, time used: 0% of timeout <30 seconds>
Sep 19 14:37:52 hrsms01 Cluster.RGM.rgmd: [ID 736390 daemon.notice] method <bin/oracle_server_boot> completed successfully for resou
rce <oracle-res>, resource group <oracle-rg>, time used: 0% of timeout <30 seconds>
Sep 19 14:37:54 hrsms01 snmpXdmid: [ID 723131 daemon.error] Error in Adding Row for Subscription Table Entry
Sep 19 14:37:54 hrsms01 snmpXdmid: [ID 132663 daemon.error] Failed to add filter to SP for Event delivery
Sep 19 14:37:54 hrsms01 Cluster.scdpmd: [ID 922726 daemon.notice] The status of device: /dev/did/rdsk/d1s0 is set to MONITORED
Sep 19 14:37:54 hrsms01 Cluster.scdpmd: [ID 922726 daemon.notice] The status of device: /dev/did/rdsk/d2s0 is set to MONITORED
Sep 19 14:37:54 hrsms01 Cluster.scdpmd: [ID 922726 daemon.notice] The status of device: /dev/did/rdsk/d4s0 is set to MONITORED
Sep 19 14:37:54 hrsms01 Cluster.scdpmd: [ID 922726 daemon.notice] The status of device: /dev/did/rdsk/d5s0 is set to MONITORED
Sep 19 14:37:54 hrsms01 Cluster.scdpmd: [ID 977412 daemon.notice] The state of the path to device: /dev/did/rdsk/d4s0 has changed to
FAILED
Sep 19 14:37:54 hrsms01 Cluster.scdpmd: [ID 977412 daemon.notice] The state of the path to device: /dev/did/rdsk/d5s0 has changed to
FAILED
Sep 19 14:37:54 hrsms01 Cluster.scdpmd: [ID 922726 daemon.notice] The status of device: /dev/did/rdsk/d6s0 is set to MONITORED
Sep 19 14:37:54 hrsms01 Cluster.scdpmd: [ID 922726 daemon.notice] The status of device: /dev/did/rdsk/d7s0 is set to MONITORED
Sep 19 14:37:54 hrsms01 Cluster.scdpmd: [ID 977412 daemon.notice] The state of the path to device: /dev/did/rdsk/d6s0 has changed to
FAILED
Sep 19 14:37:54 hrsms01 Cluster.scdpmd: [ID 977412 daemon.notice] The state of the path to device: /dev/did/rdsk/d7s0 has changed to
FAILED
Sep 19 14:37:54 hrsms01 Cluster.scdpmd: [ID 489913 daemon.notice] The state of the path to device: /dev/did/rdsk/d1s0 has changed to
OK
Sep 19 14:37:54 hrsms01 Cluster.scdpmd: [ID 489913 daemon.notice] The state of the path to device: /dev/did/rdsk/d2s0 has changed to
OK
Sep 19 14:38:00 hrsms01 pseudo: [ID 129642 kern.info] pseudo-device: vol0
Sep 19 14:38:00 hrsms01 genunix: [ID 936769 kern.info] vol0 is /pseudo/vol@0
Sep 19 14:38:29 hrsms01 genunix: [ID 408822 kern.info] NOTICE: ce1: no fault external to device; service available
Sep 19 14:38:29 hrsms01 genunix: [ID 611667 kern.info] NOTICE: ce1: xcvr addr:0x01 - link up 100 Mbps full duplex
Sep 19 14:40:00 hrsms01 pseudo: [ID 129642 kern.info] pseudo-device: devinfo0
Sep 19 14:40:00 hrsms01 genunix: [ID 936769 kern.info] devinfo0 is /pseudo/devinfo@0
Sep 19 14:50:25 hrsms01 Cluster.RGM.rgmd: [ID 707948 daemon.notice] launching method <hafoip_prenet_start> for resource <plmmcsg>, r
esource group <oracle-rg>, timeout <300> seconds
Sep 19 14:50:26 hrsms01 Cluster.RGM.rgmd: [ID 736390 daemon.notice] method <hafoip_prenet_start> completed successfully for resource
<plmmcsg>, resource group <oracle-rg>, time used: 0% of timeout <300 seconds>
Sep 19 14:50:26 hrsms01 Cluster.RGM.rgmd: [ID 707948 daemon.notice] launching method <hastorageplus_prenet_start> for resource <orac
le-ha>, resource group <oracle-rg>, timeout <1800> seconds
Sep 19 14:50:28 hrsms01 Cluster.Framework: [ID 801593 daemon.notice] stdout: becoming primary for plmds
Sep 19 14:50:29 hrsms01 Cluster.Framework: [ID 801593 daemon.error] stderr: metaset: hrsms01: plmds: there are no existing databases
Sep 19 14:50:29 hrsms01 Cluster.Framework: [ID 801593 daemon.error] stderr: metaset: hrsms01: plmds: must be owner of the set for th
is command
Sep 19 14:51:04 hrsms01 Cluster.Framework: [ID 801593 daemon.notice] stdout: becoming primary for plmds
Sep 19 14:51:05 hrsms01 Cluster.Framework: [ID 801593 daemon.error] stderr: metaset: hrsms01: plmds: there are no existing databases
Sep 19 14:51:05 hrsms01 Cluster.Framework: [ID 801593 daemon.error] stderr: metaset: hrsms01: plmds: must be owner of the set for th
is command
Sep 19 14:51:08 hrsms01 SC[SUNW.HAStoragePlus:2,oracle-rg,oracle-ha,hastorageplus_prenet_start_private]: [ID 500133 daemon.warning]
Device switchover of global service plmds associated with path /u02 to this node failed: Node failed to become the primary.
Sep 19 14:51:08 hrsms01 SC[SUNW.HAStoragePlus:2,oracle-rg,oracle-ha,hastorageplus_prenet_start_private]: [ID 500133 daemon.warning]
Device switchover of global service plmds associated with path /u03 to this node failed: Node failed to become the primary.
Sep 19 14:51:08 hrsms01 SC[SUNW.HAStoragePlus:2,oracle-rg,oracle-ha,hastorageplus_prenet_start_private]: [ID 790080 daemon.error] Gl
obal service plmds associated with path /u02 is unable to become a primary on node 1.
Sep 19 14:51:08 hrsms01 Cluster.RGM.rgmd: [ID 938318 daemon.error] Method <hastorageplus_prenet_start> failed on resource <oracle-ha
> in resource group <oracle-rg> [exit code <1>, time used: 2% of timeout <1800 seconds>]
Sep 19 14:51:08 hrsms01 Cluster.RGM.rgmd: [ID 707948 daemon.notice] launching method <hastorageplus_stop> for resource <oracle-ha>,
resource group <oracle-rg>, timeout <1800> seconds
Sep 19 14:51:08 hrsms01 Cluster.RGM.rgmd: [ID 736390 daemon.notice] method <hastorageplus_stop> completed successfully for resource
<oracle-ha>, resource group <oracle-rg>, time used: 0% of timeout <1800 seconds>
Sep 19 14:51:08 hrsms01 Cluster.RGM.rgmd: [ID 707948 daemon.notice] launching method <hafoip_stop> for resource <plmmcsg>, resource
group <oracle-rg>, timeout <300> seconds
Sep 19 14:51:08 hrsms01 ip: [ID 347787 kern.notice] TCP_IOC_ABORT_CONN: local = 192.168.099.070:0, remote = 000.000.000.000:0, start
= -2, end = 6
Sep 19 14:51:08 hrsms01 ip: [ID 440816 kern.notice] TCP_IOC_ABORT_CONN: aborted 0 connection
Sep 19 14:51:08 hrsms01 Cluster.RGM.rgmd: [ID 736390 daemon.notice] method <hafoip_stop> completed successfully for resource <plmmcs
g>, resource group <oracle-rg>, time used: 0% of timeout <300 seconds>
Sep 19 14:51:08 hrsms01 Cluster.RGM.rgmd: [ID 707948 daemon.notice] launching method <hastorageplus_postnet_stop> for resource <orac
le-ha>, resource group <oracle-rg>, timeout <1800> seconds
I first looked at these entries:
Sep 19 14:37:54 hrsms01 Cluster.scdpmd: [ID 977412 daemon.notice] The state of the path to device: /dev/did/rdsk/d4s0 has changed to
FAILED
Sep 19 14:37:54 hrsms01 Cluster.scdpmd: [ID 977412 daemon.notice] The state of the path to device: /dev/did/rdsk/d5s0 has changed to
FAILED
Sep 19 14:37:54 hrsms01 Cluster.scdpmd: [ID 977412 daemon.notice] The state of the path to device: /dev/did/rdsk/d6s0 has changed to
FAILED
Sep 19 14:37:54 hrsms01 Cluster.scdpmd: [ID 977412 daemon.notice] The state of the path to device: /dev/did/rdsk/d7s0 has changed to
FAILED
hrsms01>#scdidadm -L
1 hrsms01:/dev/rdsk/c0t0d0 /dev/did/rdsk/d1
2 hrsms01:/dev/rdsk/c0t1d0 /dev/did/rdsk/d2
3 hrsms01:/dev/rdsk/c0t6d0 /dev/did/rdsk/d3
4 hrsms02:/dev/rdsk/c6t600A0B80001F71B200000AB44509B55Ed0 /dev/did/rdsk/d4
4 hrsms01:/dev/rdsk/c2t600A0B80001F71B200000AB44509B55Ed0 /dev/did/rdsk/d4
5 hrsms02:/dev/rdsk/c6t600A0B800018EF9D000000154509B527d0 /dev/did/rdsk/d5
5 hrsms01:/dev/rdsk/c2t600A0B800018EF9D000000154509B527d0 /dev/did/rdsk/d5
6 hrsms02:/dev/rdsk/c6t600A0B80001F71B200000AB34509B4DEd0 /dev/did/rdsk/d6
6 hrsms01:/dev/rdsk/c2t600A0B80001F71B200000AB34509B4DEd0 /dev/did/rdsk/d6
7 hrsms02:/dev/rdsk/c6t600A0B800018EF9D000000134509B4BDd0 /dev/did/rdsk/d7
7 hrsms01:/dev/rdsk/c2t600A0B800018EF9D000000134509B4BDd0 /dev/did/rdsk/d7
8 hrsms02:/dev/rdsk/c1t0d0 /dev/did/rdsk/d8
11 hrsms02:/dev/rdsk/c0t0d0 /dev/did/rdsk/d11
12 hrsms02:/dev/rdsk/c1t5d0 /dev/did/rdsk/d12
13 hrsms02:/dev/rdsk/c1t1d0 /dev/did/rdsk/d13
14 hrsms02:/dev/rdsk/c1t2d0 /dev/did/rdsk/d14
16 hrsms02:/dev/rdsk/c1t4d0 /dev/did/rdsk/d16
17 hrsms02:/dev/rdsk/c1t3d0 /dev/did/rdsk/d17
8187 hrsms02:/dev/rmt/1 /dev/did/rmt/5
8188 hrsms01:/dev/rmt/2 /dev/did/rmt/4
8189 hrsms01:/dev/rmt/1 /dev/did/rmt/3
8190 hrsms02:/dev/rmt/0 /dev/did/rmt/2
8191 hrsms01:/dev/rmt/0 /dev/did/rmt/1
hrsms01>#scdpm -p all
hrsms01:/dev/did/rdsk/d1 Ok
hrsms01:/dev/did/rdsk/d2 Ok
hrsms01:/dev/did/rdsk/d4 Fail
hrsms01:/dev/did/rdsk/d5 Fail
hrsms01:/dev/did/rdsk/d6 Fail
hrsms01:/dev/did/rdsk/d7 Fail
hrsms02:/dev/did/rdsk/d12 Ok
hrsms02:/dev/did/rdsk/d13 Ok
hrsms02:/dev/did/rdsk/d14 Ok
hrsms02:/dev/did/rdsk/d16 Ok
hrsms02:/dev/did/rdsk/d17 Ok
hrsms02:/dev/did/rdsk/d4 Ok
hrsms02:/dev/did/rdsk/d5 Ok
hrsms02:/dev/did/rdsk/d6 Ok
hrsms02:/dev/did/rdsk/d7 Ok
hrsms02:/dev/did/rdsk/d8 Ok
The output confirms that these devices really cannot be accessed from hrsms01:
hrsms01:/dev/did/rdsk/d4 Fail
hrsms01:/dev/did/rdsk/d5 Fail
hrsms01:/dev/did/rdsk/d6 Fail
hrsms01:/dev/did/rdsk/d7 Fail
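To avoid reading the two listings by eye, here is a minimal sketch of how the failed paths could be pulled out of the `scdpm -p all` output and mapped back to the physical device names from `scdidadm -L`. The function names `failed_paths` and `did_to_ctd` are made up for illustration, and the sketch assumes the output has exactly the column layout shown above. On Solaris the stock `/usr/bin/awk` may not accept `-v`, so `nawk` or `/usr/xpg4/bin/awk` would probably have to be substituted.

```shell
# failed_paths: read `scdpm -p all` style output on stdin and print
# "node did-device" for each path whose status column is "Fail".
failed_paths() {
    awk '$2 == "Fail" {
        n = index($1, ":")
        print substr($1, 1, n - 1), substr($1, n + 1)
    }'
}

# did_to_ctd NODE DID: read `scdidadm -L` style output on stdin and print
# the physical device path that NODE uses for the given DID device.
did_to_ctd() {
    awk -v node="$1" -v did="$2" '
        $3 == did && index($2, node ":") == 1 {
            print substr($2, length(node) + 2)
        }'
}
```

Intended use would be along the lines of `scdpm -p all | failed_paths`, then `scdidadm -L | did_to_ctd hrsms01 /dev/did/rdsk/d4` for each failed entry. For the listings above, every failed path belongs to hrsms01 and maps to a c2t600A0B80... device, that is, to the shared array LUNs and not to the local disks.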
I first tried to access /dev/did/rdsk/d4 on hrsms01:
hrsms01>#prtvtoc /dev/did/rdsk/d4s2
prtvtoc: /dev/did/rdsk/d4s2: No such device or address
On hrsms02, however, the same command works:
root@hrsms02 # prtvtoc /dev/did/rdsk/d4s2
* /dev/did/rdsk/d4s2 partition map
*
* Dimensions:
* 512 bytes/sector
* 64 sectors/track
* 64 tracks/cylinder
* 4096 sectors/cylinder
* 25600 cylinders
* 25598 accessible cylinders
*
* Flags:
* 1: unmountable
* 10: read-only
Sector Last
* Partition Tag Flags Sector Count Sector Mount Directory
0 4 00 12288 104837120 104849407
7 4 01 0 12288 12287
Checking the disk information with format:
hrsms01>#format
Searching for disks...done
AVAILABLE DISK SELECTIONS:
0. c0t0d0 <SUN146G cyl 14087 alt 2 hd 24 sec 848>
/ssm@0,0/pci@18,600000/pci@2/scsi@2/sd@0,0
1. c0t1d0 <SUN146G cyl 14087 alt 2 hd 24 sec 848>
/ssm@0,0/pci@18,600000/pci@2/scsi@2/sd@1,0
root@hrsms02 # format
Searching for disks...done
AVAILABLE DISK SELECTIONS:
0. c1t0d0 <SUN146G cyl 14087 alt 2 hd 24 sec 848>
/pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000014c3e031d7,0
1. c1t1d0 <SUN146G cyl 14087 alt 2 hd 24 sec 848>
/pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w500000e011e31a01,0
2. c1t2d0 <SUN146G cyl 14087 alt 2 hd 24 sec 848>
/pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w500000e011e7be61,0
3. c1t3d0 <SUN146G cyl 14087 alt 2 hd 24 sec 848>
/pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w500000e011e80151,0
4. c1t4d0 <SUN146G cyl 14087 alt 2 hd 24 sec 848>
/pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w500000e011e7fad1,0
5. c1t5d0 <SUN146G cyl 14087 alt 2 hd 24 sec 848>
/pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w500000e011e3c451,0
6. c6t600A0B80001F71B200000AB34509B4DEd0 <IBM-1722-600-0520 cyl 51198 alt 2 hd 256 sec 64>
/scsi_vhci/ssd@g600a0b80001f71b200000ab34509b4de
7. c6t600A0B80001F71B200000AB44509B55Ed0 <IBM-1722-600-0520 cyl 25598 alt 2 hd 64 sec 64>
/scsi_vhci/ssd@g600a0b80001f71b200000ab44509b55e
8. c6t600A0B800018EF9D000000134509B4BDd0 <IBM-1722-600-0520 cyl 51198 alt 2 hd 256 sec 64>
/scsi_vhci/ssd@g600a0b800018ef9d000000134509b4bd
9. c6t600A0B800018EF9D000000154509B527d0 <IBM-1722-600-0520 cyl 25996 alt 2 hd 64 sec 64>
/scsi_vhci/ssd@g600a0b800018ef9d000000154509b527
Next, the hardware configuration:
hrsms01>#cfgadm -al
Ap_Id Type Receptacle Occupant Condition
N0.IB6 PCI+_I/O_Bo connected configured ok
N0.IB6::pci0 io connected configured ok
N0.IB6::pci1 io connected configured ok
N0.IB6::pci2 io connected configured ok
N0.IB6::pci3 io connected configured ok
N0.IB8 PCI+_I/O_Bo connected configured ok
N0.IB8::pci0 io connected configured ok
N0.IB8::pci1 io connected configured ok
N0.IB8::pci2 io connected configured ok
N0.IB8::pci3 io connected configured ok
N0.SB0 unknown empty unconfigured unknown
N0.SB2 unknown empty unconfigured unknown
N0.SB4 CPU_V3 connected configured ok
N0.SB4::cpu0 cpu connected configured ok
N0.SB4::cpu1 cpu connected configured ok
N0.SB4::cpu2 cpu connected configured ok
N0.SB4::cpu3 cpu connected configured ok
N0.SB4::memory memory connected configured ok
c0 scsi-bus connected configured unknown
c0::dsk/c0t0d0 disk connected configured unknown
c0::dsk/c0t1d0 disk connected configured unknown
c0::dsk/c0t6d0 CD-ROM connected configured unknown
c0::es/ses0 processor connected configured unknown
c0::es/ses1 processor connected configured unknown
c0::rmt/0 tape connected configured unknown
c1 scsi-bus connected unconfigured unknown
c6 fc connected unconfigured unknown
c7 fc connected unconfigured unknown
c8 fc-fabric connected unconfigured unknown
c8::200600a0b81f71b4 disk connected unconfigured unknown
c8::210100e08ba7607a unknown connected unconfigured unknown
c9 fc-fabric connected configured unknown
c9::200700a0b81f71b4 disk connected unconfigured unknown
c9::210000e08b87607a unknown connected unconfigured unknown
c9::500308c146699004 tape connected configured unknown
c9::500308c146699007 tape connected configured unknown
root@hrsms02 # cfgadm -al
Ap_Id Type Receptacle Occupant Condition
c0 scsi-bus connected configured unknown
c0::dsk/c0t0d0 CD-ROM connected configured unknown
c1 fc-private connected configured unknown
c1::21000014c3e031d7 disk connected configured unknown
c1::500000e011e31a01 disk connected configured unknown
c1::500000e011e3c451 disk connected configured unknown
c1::500000e011e7be61 disk connected configured unknown
c1::500000e011e7fad1 disk connected configured unknown
c1::500000e011e80151 disk connected configured unknown
c1::5080020000251231 ESI connected configured unknown
c4 fc-fabric connected configured unknown
c4::200700a0b81f71b4 disk connected configured unknown
c4::210000e08b11eb72 unknown connected unconfigured unknown
c4::500308c146699004 tape connected unconfigured unknown
c4::500308c146699007 tape connected unconfigured unknown
c5 fc-fabric connected configured unknown
c5::200600a0b81f71b4 disk connected configured unknown
c5::210000e08b11ea72 unknown connected unconfigured unknown
pcisch0:hpc1_slot0 ethernet/hp connected configured ok
pcisch0:hpc1_slot1 ethernet/hp connected configured ok
pcisch0:hpc1_slot2 mult/hp connected configured ok
pcisch0:hpc1_slot3 ethernet/hp connected configured ok
pcisch2:hpc2_slot4 unknown empty unconfigured unknown
pcisch2:hpc2_slot5 unknown empty unconfigured unknown
pcisch2:hpc2_slot6 unknown empty unconfigured unknown
pcisch3:hpc0_slot7 unknown empty unconfigured unknown
pcisch3:hpc0_slot8 vgs8514/hp connected configured ok
usb0/1 unknown empty unconfigured ok
usb0/2 unknown empty unconfigured ok
usb0/3 unknown empty unconfigured ok
usb0/4 unknown empty unconfigured ok
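Can anyone tell me whether the state shown above is normal?
On hrsms01, one of the two HBAs is configured and the other is unconfigured, which looks wrong to me.
On hrsms02, both HBAs are configured.
In a healthy Sun Cluster environment, once the resources have been switched to the other node, is it expected that format on this node can no longer see the shared disks?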
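To make the HBA comparison between the two nodes less error-prone, here is a minimal sketch that filters a `cfgadm -al` listing down to the Fibre Channel controllers and disks whose Occupant column reads "unconfigured". The function name `unconfigured_fc` is hypothetical, and the filter assumes the five-column layout (Ap_Id, Type, Receptacle, Occupant, Condition) shown above with one entry per line.

```shell
# unconfigured_fc: read `cfgadm -al` style output on stdin and print the
# Ap_Id of every FC controller or disk whose Occupant is "unconfigured".
# Rows of other types (empty slots, "unknown" HBA ports) are ignored.
unconfigured_fc() {
    awk '$4 == "unconfigured" && ($2 ~ /^fc/ || $2 == "disk") { print $1 }'
}
```

It would be run as `cfgadm -al | unconfigured_fc` on each node. In the listings above, the array ports 200600a0b81f71b4 and 200700a0b81f71b4 are unconfigured on hrsms01 (under c8 and c9) but configured on hrsms02 (under c5 and c4). If that really is the only difference, and not a cabling or zoning problem, then `cfgadm -c configure` on those attachment points might be worth trying, but that is an assumption I have not verified on this system.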