
Chinaunix

[HACMP Cluster] A problem with HACMP

#1  Posted 2008-04-11 21:38
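The OS level is 53-05-csp and HACMP is version 5.3 with the latest fixes applied. Since this is only a test, the resource group contains nothing but a service IP. Some topology and configuration information follows.
# ./cllsif -S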
ZHQZA_boot           boot       net_ether_01 ether      public     ZHQZ_A     192.1.2.93                        en0                               255.255.255.0
ZHQZA_stdy           boot       net_ether_01 ether      public     ZHQZ_A     192.1.1.93                        en2                               255.255.255.0
ZHQZ_service         service    net_ether_01 ether      public     ZHQZ_A     130.1.1.93                                                          255.255.255.0
ZHQZAtty0            service    net_rs232_01 rs232      serial     ZHQZ_A     /dev/tty0                                                                     
ZHQZB_boot           boot       net_ether_01 ether      public     ZHQZ_B     192.1.2.91                        en0                               255.255.255.0
ZHQZB_stdy           boot       net_ether_01 ether      public     ZHQZ_B     192.1.1.91                        en2                               255.255.255.0
ZHQZ_service         service    net_ether_01 ether      public     ZHQZ_B     130.1.1.93                                                          255.255.255.0
ZHQZBtty0            service    net_rs232_01 rs232      serial     ZHQZ_B     /dev/tty0                                                                     
# ./cllscf
Cluster Name: zhqzcluster
Cluster Connection Authentication Mode: Standard
Cluster Message Authentication Mode: None
Cluster Message Encryption: None
Use Persistent Labels for Communication: No
There were 2 networks defined : net_ether_01, net_rs232_01
There are 2 nodes in this cluster.

NODE ZHQZ_A:
        This node has 2 service IP label(s):

        Service IP Label ZHQZ_service:
                IP address:     130.1.1.93
                Hardware Address:      
                Network:        net_ether_01
                Attribute:      public
                Aliased Address?:       Enable

        Service IP Label ZHQZ_service has 2 communication interfaces.
                (Alternate Service) Communication Interface 1: ZHQZA_boot
                IP address:     192.1.2.93
                Network:        net_ether_01
                Attribute:      public
                        Alias address for heartbeat:   
                (Alternate Service) Communication Interface 2: ZHQZA_stdy
                IP address:     192.1.1.93
                Network:        net_ether_01
                Attribute:      public
                        Alias address for heartbeat:   
        Service IP Label ZHQZ_service has no communication interfaces for recovery.


        Service IP Label ZHQZAtty0:
                IP address:     /dev/tty0
                Hardware Address:      
                Network:        net_rs232_01
                Attribute:      serial
                Aliased Address?:       Disable

        Service IP Label ZHQZAtty0 has no communication interfaces.
        Service IP Label ZHQZAtty0 has no communication interfaces for recovery.


NODE ZHQZ_B:
        This node has 2 service IP label(s):

        Service IP Label ZHQZ_service:
                IP address:     130.1.1.93
                Hardware Address:      
                Network:        net_ether_01
                Attribute:      public
                Aliased Address?:       Enable

        Service IP Label ZHQZ_service has 2 communication interfaces.
                (Alternate Service) Communication Interface 1: ZHQZB_boot
                IP address:     192.1.2.91
                Network:        net_ether_01
                Attribute:      public
                        Alias address for heartbeat:   
                (Alternate Service) Communication Interface 2: ZHQZB_stdy
                IP address:     192.1.1.91
                Network:        net_ether_01
                Attribute:      public
                        Alias address for heartbeat:   
        Service IP Label ZHQZ_service has no communication interfaces for recovery.


        Service IP Label ZHQZBtty0:
                IP address:     /dev/tty0
                Hardware Address:      
                Network:        net_rs232_01
                Attribute:      serial
                Aliased Address?:       Disable

        Service IP Label ZHQZBtty0 has no communication interfaces.
        Service IP Label ZHQZBtty0 has no communication interfaces for recovery.




Breakdown of network connections:

Connections to network net_ether_01
        Node ZHQZ_A is connected to network net_ether_01 by these interfaces:
                ZHQZA_boot
                ZHQZA_stdy
                ZHQZ_service

        Node ZHQZ_B is connected to network net_ether_01 by these interfaces:
                ZHQZB_boot
                ZHQZB_stdy
                ZHQZ_service


Connections to network net_rs232_01
        Node ZHQZ_A is connected to network net_rs232_01 by these interfaces:
                ZHQZAtty0

        Node ZHQZ_B is connected to network net_rs232_01 by these interfaces:
                ZHQZBtty0
# lssrc -ls topsvcs
Subsystem         Group            PID     Status
topsvcs          topsvcs          323822  active
Network Name   Indx Defd  Mbrs  St   Adapter ID      Group ID
net_ether_01_0 [ 0] 2     1     S    192.1.2.91      192.1.2.91     
net_ether_01_0 [ 0] en0              0x47fdc011      0x47fdc025
HB Interval = 2.000 secs. Sensitivity = 12 missed beats
Missed HBs: Total: 12 Current group: 0
Packets sent    : 377 ICMP 11 Errors: 0 No mbuf: 0
Packets received: 498 ICMP 53 Dropped: 0
NIM's PID: 327848
net_ether_01_1 [ 1] 2     0     D    192.1.1.91     
net_ether_01_1 [ 1] en2            
HB Interval = 2.000 secs. Sensitivity = 12 missed beats
Missed HBs: Total: 12 Current group: 12
Packets sent    : 291 ICMP 11 Errors: 0 No mbuf: 0
Packets received: 486 ICMP 53 Dropped: 0
NIM's PID: 323904
rs232_0        [ 2] 2     0     D    255.255.0.1   
rs232_0        [ 2] tty0            
HB Interval = 2.000 secs. Sensitivity = 5 missed beats
Missed HBs: Total: 5 Current group: 0
Packets sent    : 803 ICMP 0 Errors: 0 No mbuf: 0
Packets received: 365 ICMP 0 Dropped: 0
NIM's PID: 299346
  2 locally connected Clients with PIDs:
haemd(389516) hagsd(393352)
  Dead Man Switch Enabled:
     reset interval = 1 seconds
     trip  interval = 48 seconds
  Configuration Instance = 7
  Daemon employs no security
  Segments pinned: Text Data.
  Text segment size: 784 KB. Static data segment size: 1526 KB.
  Dynamic data segment size: 3841. Number of outstanding malloc: 175
  User time 0 sec. System time 0 sec.
  Number of page faults: 0. Process swapped out 0 times.
  Number of nodes up: 1. Number of nodes down: 1.
  Nodes up : 2
# ./cllsnode

NODE ZHQZ_A:
        Interfaces to network net_ether_01
                Communication Interface: Name ZHQZA_boot, Attribute public, IP address 192.1.2.93
                Communication Interface: Name ZHQZA_stdy, Attribute public, IP address 192.1.1.93
                Communication Interface: Name ZHQZ_service, Attribute public, IP address 130.1.1.93
        Interfaces to network net_rs232_01
                Communication Interface: Name ZHQZAtty0, Attribute serial, IP address /dev/tty0

NODE ZHQZ_B:
        Interfaces to network net_ether_01
                Communication Interface: Name ZHQZB_boot, Attribute public, IP address 192.1.2.91
                Communication Interface: Name ZHQZB_stdy, Attribute public, IP address 192.1.1.91
                Communication Interface: Name ZHQZ_service, Attribute public, IP address 130.1.1.93
        Interfaces to network net_rs232_01
                Communication Interface: Name ZHQZBtty0, Attribute serial, IP address /dev/tty0
# ./cllsres
FORCED_VARYON="false"
FSCHECK_TOOL="fsck"
FS_BEFORE_IPADDR="false"
RECOVERY_METHOD="sequential"
SERVICE_LABEL="ZHQZ_service"
SSA_DISK_FENCING="false"
VG_AUTO_IMPORT="false"
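The resource group is a default cascading resource group. The problem: when both network adapters on a node go down, the resources do not move and the other machine does not take over. When I shut one machine down with halt -q, the other machine does not take over either. errpt shows errors from topsvcs, grpsvcs and haemd, and ps -ef | grep cluster shows haemd and harmd processes.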
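
The `lssrc -ls topsvcs` output in this post can be checked mechanically. Below is a minimal sketch in Python, assuming the ring lines keep the column order shown in the header (Network Name, Indx, Defd, Mbrs, St, Adapter ID) and assuming that St = S denotes a stable ring, as RSCT documentation describes it. The function names `parse_rings` and `incomplete_rings` are made up for illustration; the sample text is copied from the output above.

```python
import re

# One heartbeat ring line: name, [index], defined, members, state, adapter ID.
# The second line of each pair (interface name, hex IDs) does not match.
RING_RE = re.compile(
    r"^(?P<name>\S+)\s+\[\s*(?P<index>\d+)\]\s+"
    r"(?P<defined>\d+)\s+(?P<members>\d+)\s+(?P<state>[A-Z])\s+(?P<adapter>\S+)"
)


def parse_rings(text):
    """Return one dict per heartbeat ring found in `lssrc -ls topsvcs` output."""
    rings = []
    for line in text.splitlines():
        match = RING_RE.match(line.strip())
        if match:
            rings.append({
                "name": match.group("name"),
                "defined": int(match.group("defined")),
                "members": int(match.group("members")),
                "state": match.group("state"),
                "adapter": match.group("adapter"),
            })
    return rings


def incomplete_rings(rings):
    """Names of rings that are not stable or have fewer members than defined."""
    return [
        r["name"] for r in rings
        if r["state"] != "S" or r["members"] < r["defined"]
    ]


# Sample copied from the post (assumption: spacing may differ on other levels).
TOPSVCS_SAMPLE = """\
Network Name   Indx Defd  Mbrs  St   Adapter ID      Group ID
net_ether_01_0 [ 0] 2     1     S    192.1.2.91      192.1.2.91
net_ether_01_0 [ 0] en0              0x47fdc011      0x47fdc025
net_ether_01_1 [ 1] 2     0     D    192.1.1.91
net_ether_01_1 [ 1] en2
rs232_0        [ 2] 2     0     D    255.255.0.1
rs232_0        [ 2] tty0
"""

if __name__ == "__main__":
    rings = parse_rings(TOPSVCS_SAMPLE)
    for ring in rings:
        print(ring["name"], ring["state"], f'{ring["members"]}/{ring["defined"]}')
    print("rings needing attention:", incomplete_rings(rings))
```

On the posted output all three rings are flagged: the en0 ring has only one of its two defined members, and the en2 and rs232 rings are not in state S. The same output reports one node up and one down, so part of this may simply reflect the test in progress when it was captured.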

#2  Posted 2008-04-12 13:06
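First move the resources manually and see whether that works. If it does not, check the resource configuration. If it does, check the following:
1. Look for related error messages in hacmp.out
2. Check whether the heartbeats are healthy
3. Check the network state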
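
Step 1 of the checklist can be partly automated. Here is a minimal sketch in Python, assuming hacmp.out marks events with `EVENT START:` and `EVENT COMPLETED:` lines and that the last field of a COMPLETED line is the exit status, which is the format visible in the hacmp.out excerpts quoted in this thread. The name `summarize_events` and the sample lines are illustrative and not part of HACMP.

```python
import re

# Timestamp, then "EVENT START:" or "EVENT COMPLETED:", then name and arguments.
EVENT_RE = re.compile(
    r"^\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}\s+EVENT (START|COMPLETED): (.+)$"
)


def summarize_events(lines):
    """Return (completed, pending).

    completed: list of (event with arguments, exit status) in log order.
    pending:   events that started but have no COMPLETED line yet.
    """
    completed = []
    pending = []
    for line in lines:
        match = EVENT_RE.match(line.strip())
        if not match:
            continue
        kind, rest = match.group(1), match.group(2).strip()
        if kind == "START":
            pending.append(rest)
            continue
        # COMPLETED lines repeat the event and append the exit status.
        event, _, status = rest.rpartition(" ")
        if not status.isdigit():
            continue
        completed.append((event, int(status)))
        if event in pending:
            pending.remove(event)
    return completed, pending


# Sample lines in the hacmp.out format (assumption: a real log has trace
# lines in between, which the regular expression simply skips).
EVENT_SAMPLE = """\
Apr 10 11:00:31 EVENT START: node_down ZHQZ_A
Apr 10 11:00:32 EVENT COMPLETED: node_down ZHQZ_A 0
Apr 10 11:00:34 EVENT START: node_down_complete ZHQZ_A
Apr 10 11:00:35 EVENT COMPLETED: node_down_complete ZHQZ_A 0
Apr 10 11:20:47 EVENT START: network_down -1 net_ether_01
Apr 10 11:20:47 EVENT COMPLETED: network_down -1 net_ether_01 0
Apr 10 11:20:47 EVENT START: network_down_complete -1 net_ether_01
"""

if __name__ == "__main__":
    done, waiting = summarize_events(EVENT_SAMPLE.splitlines())
    for event, status in done:
        print(f"{event}: exit {status}")
    print("still running or cut off:", waiting)
```

An exit status of 0 only says the event script finished. It does not show that a resource group was acquired, so the event summaries in hacmp.out still need to be read for lines such as "Error encountered with group".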

#3  Posted 2008-04-14 09:55
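Stopping HACMP services with smitty clstop in takeover mode does move the resources, and moving the resource group through C-SPOC also works without problems. I tested the serial link with cat /etc/hosts > /dev/tty0 and cat < /dev/tty0, and the serial ports on both sides are fine. It looks like a split-brain situation.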

#4  Posted 2008-04-14 09:59
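The machines are model 55Q with the latest microcode, 338. It feels like an RSCT problem. RSCT is at 2.4.5.0 and HACMP at 5.3.0.5. I am not sure whether RSCT needs to be upgraded to 2.4.7.0 or to 2.4.8.0.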

#5  Posted 2008-04-14 10:53
Apr 10 11:00:31 EVENT START: node_down ZHQZ_A

:node_down[63] [[ high == high ]]
:node_down[63] version=1.50.1.3
:node_down[64] HA_DIR=es
:node_down[66] NODENAME=ZHQZ_A
:node_down[66] export NODENAME
:node_down[67] PARAM=''
:node_down[67] export PARAM
:node_down[69] UPDATESTATDFILE=/usr/es/sbin/cluster/etc/updatestatd
:node_down[72] : This will be the exit status seen by the Cluster Manager.
:node_down[73] : If STATUS is not 0, the Cluster Manager will enter reconfiguration
:node_down[74] : All lower-level scripts should pass status back to the caller.
:node_down[75] : This will allow a Resource Groups to be processed individually,
:node_down[76] : independent of the status of another resource group.
:node_down[78] STATUS=0
:node_down[78] typeset -i STATUS
:node_down[80] EMULATE=REAL
:node_down[82] set -u
:node_down[84] (( 1 < 1 ))
:node_down[89] rm -f /tmp/.RPCLOCKDSTOPPED
:node_down[90] rm -f /usr/es/sbin/cluster/etc/updatestatd
:node_down[92] [[ '' == forced ]]
:node_down[112] UPDATESTATD=0
:node_down[113] export UPDATESTATD
:node_down[116] : If RG_DEPENDENCIES was set to true by the cluster manager,
:node_down[117] : then every resource group action is taken via rg_move events.
:node_down[119] [[ FALSE == FALSE ]]
:node_down[122] : Set the RESOURCE_GROUPS environment variable with the names
:node_down[123] : of all Resource Groups participating in this event, and export
:node_down[124] : them to all successive scripts.
:node_down[126] set -a
:node_down[127] clsetenvgrp ZHQZ_A node_down
:clsetenvgrp[50] [[ high = high ]]
:clsetenvgrp[50] version=1.16
:clsetenvgrp[52] usingVer=clSetenvgrp
:clsetenvgrp[57] clSetenvgrp ZHQZ_A node_down
executing clSetenvgrp
clSetenvgrp completed successfully
:clsetenvgrp[58] exit 0
:node_down[127] eval FORCEDOWN_GROUPS='""' RESOURCE_GROUPS='""' HOMELESS_GROUPS='""' HOMELESS_FOLLOWER_GROUPS='""' ERRSTATE_GROUPS='""' PRINCIPAL_ACTIONS='""' ASSOCIATE_ACTIONS='""' AUXILLIARY_ACTIONS='""'
:node_down[1] FORCEDOWN_GROUPS=''
:node_down[1] RESOURCE_GROUPS=''
:node_down[1] HOMELESS_GROUPS=''
:node_down[1] HOMELESS_FOLLOWER_GROUPS=''
:node_down[1] ERRSTATE_GROUPS=''
:node_down[1] PRINCIPAL_ACTIONS=''
:node_down[1] ASSOCIATE_ACTIONS=''
:node_down[1] AUXILLIARY_ACTIONS=''
:node_down[128] RC=0
:node_down[129] set +a
:node_down[130] (( 0 != 0 ))
:node_down[135] : Process_Resources for parallel-processed resource groups
:node_down[136] : If RG_DEPENDENCIES is true, then this call is responsible for
:node_down[137] : starting the necessary rg_move events.
:node_down[139] process_resources
:process_resources[2230] [[ high = high ]]
:process_resources[2230] version=1.84.1.11
:process_resources[2231] :process_resources[2231] cl_get_path
HA_DIR=es
:process_resources[2233] STATUS=0
:process_resources[2234] sddsrv_off=FALSE
:process_resources[2236] [ ! -n  ]
:process_resources[2238] EMULATE=REAL
:process_resources[2243] cut -c1-2
:process_resources[2243] oslevel -r
:process_resources[2243] [[ 53 > 52 ]]
:process_resources[2245] FORCED=-F
:process_resources[2250] true
:process_resources[2252] set -a
:process_resources[2255] clRGPA
:clRGPA[49] [[ high = high ]]
:clRGPA[49] version=1.16
:clRGPA[51] usingVer=clrgpa
:clRGPA[56] clrgpa
:clRGPA[57] exit 0
:process_resources[2255] eval JOB_TYPE=ERROR RESOURCE_GROUPS="zhqzrg"
:process_resources[2255] JOB_TYPE=ERROR RESOURCE_GROUPS=zhqzrg
:process_resources[2257] RC=0
:process_resources[2258] set +a
:process_resources[2260] [ 0 -ne 0 ]
:process_resources[2510] set_resource_group_state ERROR
:process_resources[3] STAT=0
zhqzrg:process_resources[6] export GROUPNAME
zhqzrg:process_resources[7] [ ERROR != DOWN ]
zhqzrg:process_resources[9] [ REAL = EMUL ]
zhqzrg:process_resources[14] clchdaemons -d clstrmgr_scripts -t resource_locator -n ZHQZ_B -o zhqzrg -v ERROR
zhqzrg:process_resources[15] [ 0 -ne 0 ]
zhqzrg:process_resources[26] [ ERROR = ACQUIRING ]
zhqzrg:process_resources[31] [ ERROR = RELEASING ]
zhqzrg:process_resources[36] [ ERROR = UP ]
zhqzrg:process_resources[41] [ ERROR = DOWN ]
zhqzrg:process_resources[46] [ ERROR = ERROR ]
zhqzrg:process_resources[48] cl_RMupdate rg_error zhqzrg process_resources
Reference string: Thu.Apr.10.11:00:32.BEIDT.2008.process_resources.zhqzrg.ref
zhqzrg:process_resources[49] continue
zhqzrg:process_resources[80] return 0
zhqzrg:process_resources[2250] true
zhqzrg:process_resources[2252] set -a
zhqzrg:process_resources[2255] clRGPA
zhqzrg:clRGPA[49] [[ high = high ]]
zhqzrg:clRGPA[49] version=1.16
zhqzrg:clRGPA[51] usingVer=clrgpa
zhqzrg:clRGPA[56] clrgpa
zhqzrg:clRGPA[57] exit 0
zhqzrg:process_resources[2255] eval JOB_TYPE=NONE
zhqzrg:process_resources[2255] JOB_TYPE=NONE
zhqzrg:process_resources[2257] RC=0
zhqzrg:process_resources[2258] set +a
zhqzrg:process_resources[2260] [ 0 -ne 0 ]
zhqzrg:process_resources[2547] break
zhqzrg:process_resources[2558] [[ FALSE = TRUE ]]
zhqzrg:process_resources[2564] exit 0
:node_down[145] : if the rpc statd got updated during process_resources, we do not have to
:node_down[146] : update it again.
:node_down[148] [[ -f /usr/es/sbin/cluster/etc/updatestatd ]]
:node_down[155] : For each participating resource group, serially process the resources
:node_down[189] [[ REAL == EMUL ]]
:node_down[194] [[ -f /tmp/.RPCLOCKDSTOPPED ]]
:node_down[222] : Process_Resources for SSA fencing
:node_down[224] process_resources FENCE
:process_resources[2230] [[ high = high ]]
:process_resources[2230] version=1.84.1.11
:process_resources[2231] :process_resources[2231] cl_get_path
HA_DIR=es
:process_resources[2233] STATUS=0
:process_resources[2234] sddsrv_off=FALSE
:process_resources[2236] [ ! -n  ]
:process_resources[2238] EMULATE=REAL
:process_resources[2243] cut -c1-2
:process_resources[2243] oslevel -r
:process_resources[2243] [[ 53 > 52 ]]
:process_resources[2245] FORCED=-F
:process_resources[2250] true
:process_resources[2252] set -a
:process_resources[2255] clRGPA FENCE
:clRGPA[49] [[ high = high ]]
:clRGPA[49] version=1.16
:clRGPA[51] usingVer=clrgpa
:clRGPA[56] clrgpa FENCE
:clRGPA[57] exit 0
:process_resources[2255] eval JOB_TYPE=NONE
:process_resources[2255] JOB_TYPE=NONE
:process_resources[2257] RC=0
:process_resources[2258] set +a
:process_resources[2260] [ 0 -ne 0 ]
:process_resources[2547] break
:process_resources[2558] [[ FALSE = TRUE ]]
:process_resources[2564] exit 0
:node_down[232] : Check to see if this node is going down. If so, perform clean up
:node_down[233] : independent of resource groups owned by this node
:node_down[235] [[ ZHQZ_A == ZHQZ_B ]]
:node_down[283] : Perform any fencing necessary for concurrent volume groups
:node_down[285] cl_fence_vg ZHQZ_A
:cl_fence_vg[443] [[ high == high ]]
:cl_fence_vg[443] version=1.15
:cl_fence_vg[445] HA_DIR=es
:cl_fence_vg[447] export All_DHB_disks
:cl_fence_vg[449] [[ -z ZHQZ_B ]]
:cl_fence_vg[458] : Accept a formal parameter of 'name of node that failed' if none were set
:cl_fence_vg[459] : in the environment
:cl_fence_vg[461] EVENTNODE=ZHQZ_A
:cl_fence_vg[463] [[ -z ZHQZ_A ]]
:cl_fence_vg[472] : An explicit volume group list can be passed. Pick up any such
:cl_fence_vg[474] shift
:cl_fence_vg[475] vg_list=''
:cl_fence_vg[477] common_groups=''
:cl_fence_vg[478] common_vgs=''
:cl_fence_vg[480] [[ -z '' ]]
:cl_fence_vg[483] : Find all the concurrent resource groups that contain both ZHQZ_A and ZHQZ_B
:cl_fence_vg[483] sed -n '/group = /s/^.* "\(.*\)".*/\1/p'
:cl_fence_vg[483] odmget -q 'startup_pref = OAAN' HACMPgroup
:cl_fence_vg[515] : Look at each of the resource groups in turn to determine what concurrent
:cl_fence_vg[516] : volume groups the local node ZHQZ_B share access with
:cl_fence_vg[517] : ZHQZ_A
:cl_fence_vg[543] : Process the list of common volume groups,
:node_down[288] exit 0
Apr 10 11:00:32 EVENT COMPLETED: node_down ZHQZ_A 0

                        HACMP Event Summary
Event: node_down ZHQZ_A
Start time: Thu Apr 10 11:00:31 2008

End time: Thu Apr 10 11:00:34 2008

Action:                Resource:                        Script Name:
----------------------------------------------------------------------------
Error encountered with group:        zhqzrg        process_resources
Search on: Thu.Apr.10.11:00:32.BEIDT.2008.process_resources.zhqzrg.ref
----------------------------------------------------------------------------

Apr 10 11:00:34 EVENT START: node_down_complete ZHQZ_A

:node_down_complete[80] [[ high = high ]]
:node_down_complete[80] version=1.2.3.46
:node_down_complete[81] :node_down_complete[81] cl_get_path
HA_DIR=es
:node_down_complete[83] export NODENAME=ZHQZ_A
:node_down_complete[84] export PARAM=
:node_down_complete[86] VSD_PROG=/usr/lpp/csd/bin/hacmp_vsd_down2
:node_down_complete[87] HPS_PROG=/usr/es/sbin/cluster/events/utils/cl_HPS_init
:node_down_complete[96] STATUS=0
:node_down_complete[98] [ ! -n  ]
:node_down_complete[100] EMULATE=REAL
:node_down_complete[103] set -u
:node_down_complete[105] [ 1 -lt 1 ]
:node_down_complete[111] [[  = forced ]]
:node_down_complete[133] [[ FALSE = FALSE ]]
:node_down_complete[141] set -a
:node_down_complete[142] clsetenvgrp ZHQZ_A node_down_complete
:clsetenvgrp[50] [[ high = high ]]
:clsetenvgrp[50] version=1.16
:clsetenvgrp[52] usingVer=clSetenvgrp
:clsetenvgrp[57] clSetenvgrp ZHQZ_A node_down_complete
executing clSetenvgrp
clSetenvgrp completed successfully
:clsetenvgrp[58] exit 0
:node_down_complete[142] eval FORCEDOWN_GROUPS="" RESOURCE_GROUPS="" HOMELESS_GROUPS="" HOMELESS_FOLLOWER_GROUPS="" ERRSTATE_GROUPS="" PRINCIPAL_ACTIONS="" ASSOCIATE_ACTIONS="" AUXILLIARY_ACTIONS=""
:node_down_complete[142] FORCEDOWN_GROUPS= RESOURCE_GROUPS= HOMELESS_GROUPS= HOMELESS_FOLLOWER_GROUPS= ERRSTATE_GROUPS= PRINCIPAL_ACTIONS= ASSOCIATE_ACTIONS= AUXILLIARY_ACTIONS=
:node_down_complete[143] RC=0
:node_down_complete[144] set +a
:node_down_complete[146] [ 0 -ne 0 ]
:node_down_complete[157] [[ FALSE = FALSE ]]
:node_down_complete[159] process_resources
:process_resources[2230] [[ high = high ]]
:process_resources[2230] version=1.84.1.11
:process_resources[2231] :process_resources[2231] cl_get_path
HA_DIR=es
:process_resources[2233] STATUS=0
:process_resources[2234] sddsrv_off=FALSE
:process_resources[2236] [ ! -n  ]
:process_resources[2238] EMULATE=REAL
:process_resources[2243] cut -c1-2
:process_resources[2243] oslevel -r
:process_resources[2243] [[ 53 > 52 ]]
:process_resources[2245] FORCED=-F
:process_resources[2250] true
:process_resources[2252] set -a
:process_resources[2255] clRGPA
:clRGPA[49] [[ high = high ]]
:clRGPA[49] version=1.16
:clRGPA[51] usingVer=clrgpa
:clRGPA[56] clrgpa
:clRGPA[57] exit 0
:process_resources[2255] eval JOB_TYPE=ERROR RESOURCE_GROUPS="zhqzrg"
:process_resources[2255] JOB_TYPE=ERROR RESOURCE_GROUPS=zhqzrg
:process_resources[2257] RC=0
:process_resources[2258] set +a
:process_resources[2260] [ 0 -ne 0 ]
:process_resources[2510] set_resource_group_state ERROR
:process_resources[3] STAT=0
zhqzrg:process_resources[6] export GROUPNAME
zhqzrg:process_resources[7] [ ERROR != DOWN ]
zhqzrg:process_resources[9] [ REAL = EMUL ]
zhqzrg:process_resources[14] clchdaemons -d clstrmgr_scripts -t resource_locator -n ZHQZ_B -o zhqzrg -v ERROR
zhqzrg:process_resources[15] [ 0 -ne 0 ]
zhqzrg:process_resources[26] [ ERROR = ACQUIRING ]
zhqzrg:process_resources[31] [ ERROR = RELEASING ]
zhqzrg:process_resources[36] [ ERROR = UP ]
zhqzrg:process_resources[41] [ ERROR = DOWN ]
zhqzrg:process_resources[46] [ ERROR = ERROR ]
zhqzrg:process_resources[48] cl_RMupdate rg_error zhqzrg process_resources
Reference string: Thu.Apr.10.11:00:34.BEIDT.2008.process_resources.zhqzrg.ref
zhqzrg:process_resources[49] continue
zhqzrg:process_resources[80] return 0
zhqzrg:process_resources[2250] true
zhqzrg:process_resources[2252] set -a
zhqzrg:process_resources[2255] clRGPA
zhqzrg:clRGPA[49] [[ high = high ]]
zhqzrg:clRGPA[49] version=1.16
zhqzrg:clRGPA[51] usingVer=clrgpa
zhqzrg:clRGPA[56] clrgpa
zhqzrg:clRGPA[57] exit 0
zhqzrg:process_resources[2255] eval JOB_TYPE=NONE
zhqzrg:process_resources[2255] JOB_TYPE=NONE
zhqzrg:process_resources[2257] RC=0
zhqzrg:process_resources[2258] set +a
zhqzrg:process_resources[2260] [ 0 -ne 0 ]
zhqzrg:process_resources[2547] break
zhqzrg:process_resources[2558] [[ FALSE = TRUE ]]
zhqzrg:process_resources[2564] exit 0
:node_down_complete[160] [ 0 -ne 0 ]
:node_down_complete[170] [ -f /usr/lpp/csd/bin/hacmp_vsd_down2 ]
:node_down_complete[189] :node_down_complete[189] odmget -qnodename = ZHQZ_B HACMPadapter
:node_down_complete[189] grep hps
:node_down_complete[189] grep type
SP_SWITCH=
:node_down_complete[191] :node_down_complete[191] lscfg -v
:node_down_complete[191] awk { print $4 }
:node_down_complete[191] grep css
:node_down_complete[191] LANG=C
SWITCH_TYPE=
:node_down_complete[192] :node_down_complete[192] lscfg -v
:node_down_complete[192] awk { print $4 }
:node_down_complete[192] grep sn
:node_down_complete[192] LANG=C
FED_TYPE=
:node_down_complete[199] [ -n  -a -f /usr/es/sbin/cluster/events/utils/cl_HPS_init -a -z  ]
:node_down_complete[240] LOCALCOMP=N
:node_down_complete[244] [[ FALSE = FALSE ]]
:node_down_complete[282] [ ZHQZ_A = ZHQZ_B ]
:node_down_complete[334] exit 0
Apr 10 11:00:35 EVENT COMPLETED: node_down_complete ZHQZ_A 0

                        HACMP Event Summary
Event: node_down_complete ZHQZ_A
Start time: Thu Apr 10 11:00:34 2008

End time: Thu Apr 10 11:00:37 2008

Action:                Resource:                        Script Name:
----------------------------------------------------------------------------
Error encountered with group:        zhqzrg        process_resources
Search on: Thu.Apr.10.11:00:34.BEIDT.2008.process_resources.zhqzrg.ref
--------------------------------------------------------------------------
I looked at hacmp.out and found the errors above. This is the hacmp.out output from the halt -q test. Could someone please help me work out what is causing this problem?
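
In the log above, each event summary pairs an "Error encountered with group" line with a "Search on:" string, and the same string appears in the trace after "Reference string:". The sketch below, in Python, collects those pairs and locates the matching trace line. It assumes the summary layout shown in this post, and the names `find_group_errors` and `find_reference_lines` are hypothetical helpers, not HACMP utilities.

```python
import re

# "Error encountered with group:   <group>   <script>" from an event summary.
ERROR_RE = re.compile(
    r"^Error encountered with group:\s+(?P<group>\S+)\s+(?P<script>\S+)$"
)
# "Search on: <reference string>" on the line that follows.
SEARCH_RE = re.compile(r"^Search on:\s+(?P<ref>\S+)$")


def find_group_errors(lines):
    """Return dicts with group, script and the reference string to search for."""
    errors = []
    for line in lines:
        text = line.strip()
        match = ERROR_RE.match(text)
        if match:
            errors.append({
                "group": match.group("group"),
                "script": match.group("script"),
                "ref": None,
            })
            continue
        match = SEARCH_RE.match(text)
        # Attach the reference to the most recent error that lacks one.
        if match and errors and errors[-1]["ref"] is None:
            errors[-1]["ref"] = match.group("ref")
    return errors


def find_reference_lines(lines, ref):
    """1-based line numbers where the trace prints 'Reference string: <ref>'."""
    wanted = "Reference string: " + ref
    return [n for n, line in enumerate(lines, start=1) if line.strip() == wanted]


# Sample lines copied from the hacmp.out excerpt in this post.
SUMMARY_SAMPLE = """\
zhqzrg:process_resources[48] cl_RMupdate rg_error zhqzrg process_resources
Reference string: Thu.Apr.10.11:00:32.BEIDT.2008.process_resources.zhqzrg.ref
zhqzrg:process_resources[49] continue
Error encountered with group:        zhqzrg        process_resources
Search on: Thu.Apr.10.11:00:32.BEIDT.2008.process_resources.zhqzrg.ref
"""

if __name__ == "__main__":
    log_lines = SUMMARY_SAMPLE.splitlines()
    for err in find_group_errors(log_lines):
        where = find_reference_lines(log_lines, err["ref"])
        print(err["group"], err["script"], "trace line(s):", where)
```

In this excerpt the reference leads back to the point where process_resources sets zhqzrg to the ERROR state after clRGPA returns JOB_TYPE=ERROR, so that is where the trace should be read first.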

#6  Posted 2008-04-14 11:03
Apr 10 11:19:28 EVENT START: fail_interface ZHQZ_A 192.1.2.93

:fail_interface[57] [[ high = high ]]
:fail_interface[57] version=1.7
:fail_interface[58] :fail_interface[58] cl_get_path
HA_DIR=es
:fail_interface[60] [ 2 -ne 2 ]
:fail_interface[66] NODENAME=ZHQZ_A
:fail_interface[67] ADDR=192.1.2.93
:fail_interface[69] set -u
:fail_interface[71] :fail_interface[71] dspmsg scripts.cat 8062 Interface 192.1.2.93 has failed on node ZHQZ_A.\n 192.1.2.93 ZHQZ_A
MSG=Interface 192.1.2.93 has failed on node ZHQZ_A.
:fail_interface[72] echo Interface 192.1.2.93 has failed on node ZHQZ_A.
:fail_interface[72] 1> /dev/console
:fail_interface[74] [[ ZHQZ_A = ZHQZ_B ]]
:fail_interface[135] exit 0
Apr 10 11:19:28 EVENT COMPLETED: fail_interface ZHQZ_A 192.1.2.93 0

                        HACMP Event Summary
Event: fail_interface ZHQZ_A 192.1.2.93
Start time: Thu Apr 10 11:19:28 2008

End time: Thu Apr 10 11:19:28 2008

Action:                Resource:                        Script Name:
----------------------------------------------------------------------------
No resources changed as a result of this event
----------------------------------------------------------------------------

Apr 10 11:19:30 EVENT START: fail_interface ZHQZ_B 192.1.2.91

:fail_interface[57] [[ high = high ]]
:fail_interface[57] version=1.7
:fail_interface[58] :fail_interface[58] cl_get_path
HA_DIR=es
:fail_interface[60] [ 2 -ne 2 ]
:fail_interface[66] NODENAME=ZHQZ_B
:fail_interface[67] ADDR=192.1.2.91
:fail_interface[69] set -u
:fail_interface[71] :fail_interface[71] dspmsg scripts.cat 8062 Interface 192.1.2.91 has failed on node ZHQZ_B.\n 192.1.2.91 ZHQZ_B
MSG=Interface 192.1.2.91 has failed on node ZHQZ_B.
:fail_interface[72] echo Interface 192.1.2.91 has failed on node ZHQZ_B.
:fail_interface[72] 1> /dev/console
:fail_interface[74] [[ ZHQZ_B = ZHQZ_B ]]
:fail_interface[78] :fail_interface[78] cut -d: -f3
:fail_interface[78] cllsif -Scn 192.1.2.91
NETWORK=net_ether_01
:fail_interface[81] :fail_interface[81] odmget -qname=net_ether_01 HACMPnetwork
:fail_interface[81] sed s/"//g
:fail_interface[81] awk $1 == "alias" {print $3}
ALIASING=1
:fail_interface[81] [[ 1 = 1 ]]
:fail_interface[84] saveNSORDER=UNDEFINED
:fail_interface[85] NSORDER=local
:fail_interface[85] export NSORDER
:fail_interface[86] netstat -in
Name  Mtu   Network     Address              Ipkts Ierrs    Opkts Oerrs  Coll
en0   1500  link#2      0.1a.64.ad.b.12       2058     0     1912     4     0
en0   1500  192.1.2     192.1.2.91            2058     0     1912     4     0
en2   1500  link#3      0.1a.64.a8.3d.70      2420     0     1491     3     0
en2   1500  192.1.1     192.1.1.91            2420     0     1491     3     0
lo0   16896 link#1                            4373     0     4420     0     0
lo0   16896 127         127.0.0.1             4373     0     4420     0     0
lo0   16896 ::1                               4373     0     4420     0     0
:fail_interface[87] netstat -rnC
Routing tables
Destination      Gateway           Flags     Wt  Policy  If   Cost Config_Cost

Route tree for Protocol Family 2 (Internet):
127/8            127.0.0.1         U          1    -    lo0       0    0
192.1.1.0        192.1.1.91        UHSb       1    -    en2       0    0 =>
192.1.1/24       192.1.1.91        U          1    -    en2       0    0
192.1.1.91       127.0.0.1         UGHS       1    -    lo0       0    0
192.1.1.255      192.1.1.91        UHSb       1    -    en2       0    0
192.1.2.0        192.1.2.91        UHSb       1    -    en0       0    0 =>
192.1.2/24       192.1.2.91        U          1    -    en0       0    0
192.1.2.91       127.0.0.1         UGHS       1   RR   lo0       0    0 =>
192.1.2.91       192.1.2.93        UH         1   -"-   en0       0    0
192.1.2.255      192.1.2.91        UHSb       1    -    en0       0    0

Route tree for Protocol Family 24 (Internet v6):
::1              ::1               UH         1    -    lo0       0    0
:fail_interface[88] cl_configure_persistent_address fail_boot -i 192.1.2.91 -n net_ether_01
:cl_configure_persistent_address[797] [[ high = high ]]
:cl_configure_persistent_address[797] version=1.23.1.2
:cl_configure_persistent_address[798] :cl_configure_persistent_address[798] cl_get_path
HA_DIR=es
:cl_configure_persistent_address[800] :cl_configure_persistent_address[800] get_local_nodename
:get_local_nodename[40] [[ high = high ]]
:get_local_nodename[40] version=1.2.1.16
:get_local_nodename[41] :get_local_nodename[41] cl_get_path
HA_DIR=es
:get_local_nodename[43] AIXODMDIR=/etc/objrepos
:get_local_nodename[44] HAODMDIR=/etc/es/objrepos
:get_local_nodename[48] export ODMDIR=/etc/es/objrepos
:get_local_nodename[50] :get_local_nodename[50] /usr/es/sbin/cluster/utilities/cllsclstr -N
nodename=ZHQZ_B
:get_local_nodename[52] :get_local_nodename[52] cut -d: -f1
:get_local_nodename[52] cllsnode -cS
NODENAME=ZHQZ_A
ZHQZ_B
:get_local_nodename[56] [[ ZHQZ_A = ZHQZ_B ]]
:get_local_nodename[56] [[ ZHQZ_B = ZHQZ_B ]]
:get_local_nodename[59] print ZHQZ_B
:get_local_nodename[60] exit 0
LOCALNODENAME=ZHQZ_B
:cl_configure_persistent_address[802] NETWORK=
:cl_configure_persistent_address[803] ALIVE_IF=
:cl_configure_persistent_address[804] FAILED_IF=
:cl_configure_persistent_address[805] FAILED_ADDRESS=
:cl_configure_persistent_address[806] CHECK_HA_ALIVE=1
:cl_configure_persistent_address[807] RESTORE_ROUTES=/usr/es/sbin/cluster/.pers_restore_routes
:cl_configure_persistent_address[809] ACTION=fail_boot
:cl_configure_persistent_address[810] shift
:cl_configure_persistent_address[812] getopt n:a:f:i:d -i 192.1.2.91 -n net_ether_01
:cl_configure_persistent_address[812] set -- -i 192.1.2.91 -n net_ether_01 --
:cl_configure_persistent_address[814] [[ 0 != 0 ]]
:cl_configure_persistent_address[814] [[ fail_boot =  ]]
:cl_configure_persistent_address[819] [[ -i != -- ]]
:cl_configure_persistent_address[835] FAILED_ADDRESS=192.1.2.91
:cl_configure_persistent_address[836] shift
:cl_configure_persistent_address[836] shift
:cl_configure_persistent_address[836] [[ -n != -- ]]
:cl_configure_persistent_address[822] NETWORK=net_ether_01
:cl_configure_persistent_address[823] shift
:cl_configure_persistent_address[823] shift
:cl_configure_persistent_address[823] [[ -- != -- ]]
:cl_configure_persistent_address[851] shift
:cl_configure_persistent_address[853] set -u
:cl_configure_persistent_address[857] [[ fail_boot = up ]]
:cl_configure_persistent_address[857] [[ fail_boot = swap ]]
:cl_configure_persistent_address[857] [[ fail_boot = fail_boot ]]
:cl_configure_persistent_address[857] [[ 192.1.2.91 =  ]]
:cl_configure_persistent_address[857] [[ net_ether_01 =  ]]
:cl_configure_persistent_address[996] :cl_configure_persistent_address[996] awk {print $1}
:cl_configure_persistent_address[996] clgetif -a 192.1.2.91
:cl_configure_persistent_address[996] 2> /dev/null
IF=en0
:cl_configure_persistent_address[997] :cl_configure_persistent_address[997] cut -d: -f3
:cl_configure_persistent_address[997] cllsif -Scn 192.1.2.91
NETWORK=net_ether_01
:cl_configure_persistent_address[997] isAliasingNetwork net_ether_01
:cl_configure_persistent_address[3] set -u
:cl_configure_persistent_address[5] NETWORK=net_ether_01
:cl_configure_persistent_address[7] odmget -qname=net_ether_01 HACMPnetwork
:cl_configure_persistent_address[7] sed s/"//g
:cl_configure_persistent_address[7] awk $1 == "alias" {print $3}
:cl_configure_persistent_address[7] print 1
:cl_configure_persistent_address[997] [[ 1 != 1 ]]
:cl_configure_persistent_address[1007] :cl_configure_persistent_address[1007] awk -F: $2 == "persistent" && $3 == "net_ether_01" {print $1}
:cl_configure_persistent_address[1007] cllsif -Scpi ZHQZ_B
PERSISTENT=
:cl_configure_persistent_address[1007] [[  =  ]]
:cl_configure_persistent_address[1010] exit 0
:fail_interface[92] :fail_interface[92] clgetif -n 192.1.2.91
:fail_interface[92] LANG=C
NETMASK=255.255.255.0
:fail_interface[93] :fail_interface[93] clgetif -a 192.1.2.91
:fail_interface[93] LANG=C
IF1=en0
:fail_interface[94] BOOT1=192.1.2.91
:fail_interface[96] :fail_interface[96] awk -F: -v net=net_ether_01 -v if1=en0 ($2=="boot" && $3==net && $9!=if1) {printf("%s\n",$7)}
:fail_interface[96] cllsif -cSi ZHQZ_B
BOOT2=192.1.1.91
:fail_interface[96] [[ -n 192.1.1.91 ]]
:fail_interface[102] :fail_interface[102] awk -v boot1=192.1.2.91 (NR > 4 && $1!="default" && $2==boot1 && $3=="U") \
                {printf("%s %s",$1,$2)}
:fail_interface[102] netstat -rn
BROUTE=192.1.2/24 192.1.2.91
:fail_interface[102] [[ -n 192.1.2/24 192.1.2.91 ]]
:fail_interface[102] clgetnet 192.1.2.91 255.255.255.0
:fail_interface[102] clgetnet 192.1.1.91 255.255.255.0
:fail_interface[102] [[ 192.1.2.0 = 192.1.1.0 ]]
:fail_interface[102] [[ UNDEFINED != UNDEFINED ]]
:fail_interface[130] export NSORDER=
:fail_interface[135] exit 0
Apr 10 11:19:30 EVENT COMPLETED: fail_interface ZHQZ_B 192.1.2.91 0

                        HACMP Event Summary
Event: fail_interface ZHQZ_B 192.1.2.91
Start time: Thu Apr 10 11:19:30 2008

End time: Thu Apr 10 11:19:30 2008

Action:                Resource:                        Script Name:
----------------------------------------------------------------------------
No resources changed as a result of this event
----------------------------------------------------------------------------

Apr 10 11:20:47 EVENT START: network_down -1 net_ether_01

:network_down[62] [[ high = high ]]
:network_down[62] version=1.23
:network_down[63] :network_down[63] cl_get_path
HA_DIR=es
:network_down[65] [ 2 -ne 2 ]
:network_down[77] :network_down[77] cl_rrmethods2call net_cleanup
:cl_rrmethods2call[49] [[ high = high ]]
:cl_rrmethods2call[49] version=1.11.1.1
:cl_rrmethods2call[50] :cl_rrmethods2call[50] cl_get_path
HA_DIR=es
:cl_rrmethods2call[63] :cl_rrmethods2call[63] odmget -qname=net_ether_01 HACMPnetwork
:cl_rrmethods2call[63] egrep nimname
:cl_rrmethods2call[63] sed s/"//g
:cl_rrmethods2call[63] awk {print $3}
RRNET=ether
:cl_rrmethods2call[63] [[ ether = Geo_Primary ]]
:cl_rrmethods2call[63] [[ ether = XD_data ]]
:cl_rrmethods2call[76] :cl_rrmethods2call[76] odmget -qtype=2 HACMPrresmethods
:cl_rrmethods2call[76] egrep net_cleanup =
:cl_rrmethods2call[76] sed s/"//g
:cl_rrmethods2call[76] awk {print $3}
RRMETHODS=
:cl_rrmethods2call[78] echo  
:cl_rrmethods2call[79] exit 0
METHODS=
:network_down[91] set -u
:network_down[104] exit 0
Apr 10 11:20:47 EVENT COMPLETED: network_down -1 net_ether_01 0

                        HACMP Event Summary
Event: network_down -1 net_ether_01
Start time: Thu Apr 10 11:20:47 2008

End time: Thu Apr 10 11:20:47 2008

Action:                Resource:                        Script Name:
----------------------------------------------------------------------------
No resources changed as a result of this event
----------------------------------------------------------------------------

Apr 10 11:20:47 EVENT START: network_down_complete -1 net_ether_01

:network_down_complete[61] [[ high = high ]]
:network_down_complete[61] version=1.1.1.13
:network_down_complete[62] :network_down_complete[62] cl_get_path
HA_DIR=es
:network_down_complete[64] [ ! -n  ]
:network_down_complete[66] EMULATE=REAL
:network_down_complete[69] [ 2 -ne 2 ]
:network_down_complete[75] set -u
:network_down_complete[81] STATUS=0
:network_down_complete[85] odmget HACMPnode
:network_down_complete[85] grep name =
:network_down_complete[85] sort
:network_down_complete[85] uniq
:network_down_complete[85] wc -l
:network_down_complete[85] [ 2 -eq 2 ]
:network_down_complete[87] :network_down_complete[87] odmget HACMPgroup
:network_down_complete[87] grep group =
:network_down_complete[87] sed s/"//g
:network_down_complete[87] awk {print $3}
RESOURCE_GROUPS=zhqzrg
:network_down_complete[91] :network_down_complete[91] odmget -q group=zhqzrg AND name=EXPORT_FILESYSTEM HACMPresource
:network_down_complete[91] grep value
:network_down_complete[91] sed s/"//g
:network_down_complete[91] awk {print $3}
EXPORTLIST=
:network_down_complete[92] [ -n  ]
:network_down_complete[114] cl_hb_alias_network net_ether_01 add
:cl_hb_alias_network[57] [[ high = high ]]
:cl_hb_alias_network[57] version=1.4
:cl_hb_alias_network[58] :cl_hb_alias_network[58] cl_get_path
HA_DIR=es
:cl_hb_alias_network[60] NETWORK=net_ether_01
:cl_hb_alias_network[61] ACTION=add
:cl_hb_alias_network[64] [[ 2 != 2 ]]
:cl_hb_alias_network[70] [[ add != add ]]
:cl_hb_alias_network[76] set -u
:cl_hb_alias_network[78] cl_echo 33 Starting execution of /usr/es/sbin/cluster/utilities/cl_hb_alias_network with parameters net_ether_01 add\n /usr/es/sbin/cluster/utilities/cl_hb_alias_network net_ether_01 add
:cl_echo[49] version=1.13
:cl_echo[98] HACMP_OUT_FILE=/tmp/hacmp.out
Apr 10 2008 11:20:47 Starting execution of /usr/es/sbin/cluster/utilities/cl_hb_alias_network with parameters net_ether_01 add
:cl_hb_alias_network[79] date
Thu Apr 10 11:20:47 BEIDT 2008
:cl_hb_alias_network[81] :cl_hb_alias_network[81] get_local_nodename
:get_local_nodename[40] [[ high = high ]]
:get_local_nodename[40] version=1.2.1.16
:get_local_nodename[41] :get_local_nodename[41] cl_get_path
HA_DIR=es
:get_local_nodename[43] AIXODMDIR=/etc/objrepos
:get_local_nodename[44] HAODMDIR=/etc/es/objrepos
:get_local_nodename[48] export ODMDIR=/etc/es/objrepos
:get_local_nodename[50] :get_local_nodename[50] /usr/es/sbin/cluster/utilities/cllsclstr -N
nodename=ZHQZ_B
:get_local_nodename[52] :get_local_nodename[52] cut -d: -f1
:get_local_nodename[52] cllsnode -cS
NODENAME=ZHQZ_A
ZHQZ_B
:get_local_nodename[56] [[ ZHQZ_A = ZHQZ_B ]]
:get_local_nodename[56] [[ ZHQZ_B = ZHQZ_B ]]
:get_local_nodename[59] print ZHQZ_B
:get_local_nodename[60] exit 0
LOCALNODENAME=ZHQZ_B
:cl_hb_alias_network[82] STATUS=0
:cl_hb_alias_network[85] cllsnw -Scn net_ether_01
:cl_hb_alias_network[85] grep -q hb_over_alias
:cl_hb_alias_network[85] cut -d: -f4
:cl_hb_alias_network[85] exit 0
:network_down_complete[120] exit 0
Apr 10 11:20:47 EVENT COMPLETED: network_down_complete -1 net_ether_01 0

                        HACMP Event Summary
Event: network_down_complete -1 net_ether_01
Start time: Thu Apr 10 11:20:47 2008

End time: Thu Apr 10 11:20:47 2008

Action:                Resource:                        Script Name:
----------------------------------------------------------------------------
No resources changed as a result of this event
----------------------------------------------------------------------------
This is the hacmp.out output from the case where both network adapters on the node holding the resource group went down.

#7 | Posted 2008-04-14 11:14
The node only went down; there is no up event. Your hacmp.out output is incomplete.
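
One way to show the full event sequence without pasting the whole trace is to pull out only the event boundary lines. The sketch below assumes the log lives at /tmp/hacmp.out (the HACMP_OUT_FILE value shown in the trace above) and that every event is bracketed by `EVENT START:` and `EVENT COMPLETED:` lines as in the pasted output; the function name `hacmp_events` is made up for illustration.

```shell
#!/bin/sh
# hacmp_events: print only the event boundary lines of a hacmp.out file, in order.
# $1: path to the log; defaults to /tmp/hacmp.out (assumed from HACMP_OUT_FILE in the trace).
hacmp_events() {
    log="${1:-/tmp/hacmp.out}"
    # Boundary lines look like:
    #   Apr 10 11:20:47 EVENT START: network_down -1 net_ether_01
    #   Apr 10 11:20:47 EVENT COMPLETED: network_down -1 net_ether_01 0
    # The last field of a COMPLETED line is the event script's exit status.
    grep -E 'EVENT (START|COMPLETED):' "$log"
}
```

Running `hacmp_events /tmp/hacmp.out` on both nodes and comparing the two lists should make it easier to see which events ran on the surviving node after the failure, and whether any event finished with a nonzero status.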

#8 | Posted 2008-04-14 11:28
The problem is that when the node goes down, the other node does not take over. I only pasted selected parts of the hacmp.out output.

#9 | Posted 2008-04-14 11:36
When your node went down, the resource group was processed and ended up in the error state. Please post your resource group configuration.
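
A minimal sketch for collecting that information, assuming the `clRGinfo` (resource group state per node) and `clshowres` (resource group contents) utilities are installed under /usr/es/sbin/cluster/utilities, the directory that appears in the trace above. The wrapper function `show_rg_info` is made up for illustration, and the script only reports a tool as missing rather than guessing at another location.

```shell
#!/bin/sh
# show_rg_info: run the resource group utilities and label their output.
# $1: utilities directory; defaults to the path seen in the hacmp.out trace (an assumption
#     for your installation).
show_rg_info() {
    util_dir="${1:-/usr/es/sbin/cluster/utilities}"
    for tool in clRGinfo clshowres; do
        if [ -x "$util_dir/$tool" ]; then
            echo "== $tool =="
            "$util_dir/$tool"
        else
            # Do not guess another path; just say the tool was not found here.
            echo "== $tool: not found in $util_dir =="
        fi
    done
}
```

Calling `show_rg_info` with no argument on each node and posting the output would show both the current state of zhqzrg and what is configured in it.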

#10 | Posted 2008-04-14 11:43
The resource group is just a standard cascading resource group, with only a single service IP added for testing.