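I recently ran into a strange problem and would appreciate help analyzing it.
Two SUN V490s form a VCS CFS cluster. Each machine has dual HBAs, and one HBA from each machine was placed together in one zone connected to the storage.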
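One HBA on node A failed. Node A did not report any HBA error; instead it kept reporting problems with the LLT heartbeat to node B,
while it was node B that reported one of node A's HBAs as offline.
Strangest of all, both node A and node B then hung: commands responded extremely slowly, Oracle's sqlplus could not run, and production services were affected.
VCS did not fail over. Only after the fiber cable was pulled from node A's failed HBA did both nodes return to normal (again without any VCS failover).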
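In addition, node B's HBA, which sits in the same zone as node A's failed HBA, changed from N-port to NL-port (on the switch, its port changed from F-port to L-port).
We asked Oracle, Brocade and Symantec, and none of them could explain what happened. Has anyone run into this before?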
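The questions now come down to two points:
1. Why does node B also hang when an HBA on node A fails, and why does Veritas not fail over?
2. Why does node A's failed HBA cause node B's HBA to drop to NL-port? Brocade says node B's HBA must also be faulty; Oracle says it happened because node B's HBA is in the same zone as node A's failed HBA, i.e. that the Brocade switch did it...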
node A log:
Jan 9 13:09:00 yms7prd1 llt: [ID 140958 kern.notice] LLT INFO V-14-1-10205 link 1 (ce3) node 0 in trouble
Jan 9 13:09:00 yms7prd1 llt: [ID 860062 kern.notice] LLT INFO V-14-1-10024 link 1 (ce3) node 0 active
Jan 9 13:09:24 yms7prd1 llt: [ID 678236 kern.notice] LLT INFO V-14-1-10035 timer not called for 119 ticks
Jan 9 13:09:32 yms7prd1 llt: [ID 678236 kern.notice] LLT INFO V-14-1-10035 timer not called for 143 ticks
Jan 9 13:09:52 yms7prd1 llt: [ID 678236 kern.notice] LLT INFO V-14-1-10035 timer not called for 139 ticks
Jan 9 13:10:06 yms7prd1 llt: [ID 678236 kern.notice] LLT INFO V-14-1-10035 timer not called for 194 ticks
Jan 9 13:11:39 yms7prd1 llt: [ID 678236 kern.notice] LLT INFO V-14-1-10035 timer not called for 156 ticks
Jan 9 13:12:13 yms7prd1 llt: [ID 678236 kern.notice] LLT INFO V-14-1-10035 timer not called for 203 ticks
Jan 9 13:12:57 yms7prd1 llt: [ID 678236 kern.notice] LLT INFO V-14-1-10035 timer not called for 111 ticks
Jan 9 13:14:51 yms7prd1 llt: [ID 678236 kern.notice] LLT INFO V-14-1-10035 timer not called for 102 ticks
Jan 9 13:18:40 yms7prd1 llt: [ID 678236 kern.notice] LLT INFO V-14-1-10035 timer not called for 125 ticks
Jan 9 13:19:31 yms7prd1 llt: [ID 140958 kern.notice] LLT INFO V-14-1-10205 link 1 (ce3) node 0 in trouble
Jan 9 13:19:32 yms7prd1 llt: [ID 860062 kern.notice] LLT INFO V-14-1-10024 link 1 (ce3) node 0 active
Jan 9 13:19:54 yms7prd1 llt: [ID 678236 kern.notice] LLT INFO V-14-1-10035 timer not called for 107 ticks
Jan 9 13:20:10 yms7prd1 llt: [ID 678236 kern.notice] LLT INFO V-14-1-10035 timer not called for 166 ticks
Jan 9 13:23:24 yms7prd1 llt: [ID 678236 kern.notice] LLT INFO V-14-1-10035 timer not called for 104 ticks
Jan 9 13:24:30 yms7prd1 llt: [ID 678236 kern.notice] LLT INFO V-14-1-10035 timer not called for 119 ticks
Jan 9 13:25:24 yms7prd1 llt: [ID 678236 kern.notice] LLT INFO V-14-1-10035 timer not called for 110 ticks
Jan 9 13:25:58 yms7prd1 qlc: [ID 630585 kern.info] NOTICE: Qlogic qlc(0): Link OFFLINE ////////////////////// cable pulled at this point
node B log (the WWN highlighted in red in the original post, PWWN=210000e08b87d831, belongs to node A's failed HBA):
Jan 9 13:07:26 ddm7prd1 fctl: [ID 517869 kern.warning] WARNING: fp(3)::GPN_ID for D_ID=10a00 failed
Jan 9 13:07:26 ddm7prd1 fctl: [ID 517869 kern.warning] WARNING: fp(3)::N_x Port with D_ID=10a00, PWWN=210000e08b87d831 disappeared from fabric
Jan 9 13:09:00 ddm7prd1 llt: [ID 140958 kern.notice] LLT INFO V-14-1-10205 link 1 (ce3) node 1 in trouble
Jan 9 13:09:00 ddm7prd1 llt: [ID 860062 kern.notice] LLT INFO V-14-1-10024 link 1 (ce3) node 1 active
Jan 9 13:19:32 ddm7prd1 llt: [ID 140958 kern.notice] LLT INFO V-14-1-10205 link 1 (ce3) node 1 in trouble
Jan 9 13:19:32 ddm7prd1 llt: [ID 860062 kern.notice] LLT INFO V-14-1-10024 link 1 (ce3) node 1 active
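The two excerpts are easier to compare on a single timeline. Below is a minimal Python sketch (not a Veritas, Sun or Brocade tool) that merges syslog lines from both nodes, sorts them by time, and converts LLT's "timer not called for N ticks" counts into seconds. It assumes the Solaris default clock rate of 100 ticks per second (hz=100; it would be 1000 with hires_tick=1), and because syslog omits the year, a placeholder year is added only so the timestamps can be parsed and sorted. The function names and the SAMPLE lines are illustrative only.

```python
import re
from datetime import datetime

# Assumption: Solaris default clock rate hz=100, i.e. one tick = 10 ms.
HZ = 100
# Assumption: syslog lines carry no year; this placeholder only enables parsing/sorting.
PLACEHOLDER_YEAR = 2000

# Matches "Jan 9 13:09:00 host tag: message"
LINE_RE = re.compile(r"^(\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (\S+) (\w+): (.*)$")
TIMER_RE = re.compile(r"timer not called for (\d+) ticks")


def parse(lines):
    """Parse syslog lines into (time, host, tag, message) tuples sorted by time."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip blank lines and hand-written annotations
        ts = datetime.strptime(f"{PLACEHOLDER_YEAR} {m.group(1)}", "%Y %b %d %H:%M:%S")
        events.append((ts, m.group(2), m.group(3), m.group(4)))
    # Stable sort: lines with equal timestamps keep their input order.
    events.sort(key=lambda e: e[0])
    return events


def timer_stalls(events, hz=HZ):
    """Return (time, host, seconds) for each LLT 'timer not called' message."""
    stalls = []
    for ts, host, tag, msg in events:
        m = TIMER_RE.search(msg)
        if tag == "llt" and m:
            stalls.append((ts, host, int(m.group(1)) / hz))
    return stalls


# A few lines copied from the two logs above, deliberately out of order.
SAMPLE = """\
Jan 9 13:09:00 yms7prd1 llt: [ID 140958 kern.notice] LLT INFO V-14-1-10205 link 1 (ce3) node 0 in trouble
Jan 9 13:09:24 yms7prd1 llt: [ID 678236 kern.notice] LLT INFO V-14-1-10035 timer not called for 119 ticks
Jan 9 13:12:13 yms7prd1 llt: [ID 678236 kern.notice] LLT INFO V-14-1-10035 timer not called for 203 ticks
Jan 9 13:07:26 ddm7prd1 fctl: [ID 517869 kern.warning] WARNING: fp(3)::GPN_ID for D_ID=10a00 failed
Jan 9 13:25:58 yms7prd1 qlc: [ID 630585 kern.info] NOTICE: Qlogic qlc(0): Link OFFLINE
""".splitlines()

if __name__ == "__main__":
    events = parse(SAMPLE)
    for ts, host, tag, msg in events:
        print(f"{ts:%H:%M:%S} {host} {tag}: {msg[:60]}")
    for ts, host, secs in timer_stalls(events):
        print(f"{ts:%H:%M:%S} {host}: LLT timer stalled ~{secs:.2f} s")
```

On the excerpts above, node B's fctl message about node A's HBA (13:07:26) comes before node A's first LLT complaint (13:09:00), and the reported stalls of 102 to 203 ticks correspond to roughly 1.0 to 2.0 seconds under the hz=100 assumption.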