HP-UX has to be rebooted every so often. MP log below!

#1, posted 2009-02-25 17:02
Hi all,

My HP rx4640 (operating system version B.11.23) has to be rebooted every so often.

The MP log is as follows:

113   SFW  0   2  0x5680029800E00BF0 0000000000801000 MC_PCI_BUS_REQUESTOR_ID
                                                      04 Jan 2009 06:37:00
114   SFW  0   2  0x5680029900E00C10 00000000FED28000 MC_PCI_BUS_RESPONDER_ID
                                                      04 Jan 2009 06:37:01
115   SFW  0   2  0x5680029A00E00C30 000000003F97E008 MC_PCI_BUS_TARGET_ID
                                                      04 Jan 2009 06:37:01
116   SFW  0  *3  0x7680010700E00C50 0000000000000000 OS_MCA_NOT_REGISTERED
                                                      04 Jan 2009 06:37:01
117   BMC      2  0x204960590E020C70 FFFF027000120300 Type-02 127002 1208322
                                                      04 Jan 2009 06:37:02
118   SFW      2  0xC149605911020C80 FFFF000A001D0300 Type-02 1d0a00 1903104
                                                      04 Jan 2009 06:37:05
119   SFW  0   2  0x5480006300E00C90 0000000000000000 BOOT_START
                                                      04 Jan 2009 06:37:05
120   SFW      2  0xC149605911020CB0 FFFF000A001D0300 Type-02 1d0a00 1903104
                                                      04 Jan 2009 06:37:05
121   SFW  1   2  0x5480006301E00CC0 0000000000000000 BOOT_START
                                                      04 Jan 2009 06:37:05
122   BMC      2  0x2049605916020CE0 FFFF0103FDC00300 Type-02 c00301 12583681
                                                      04 Jan 2009 06:37:10

I checked syslog.log and found nothing unusual. Could anyone help analyze this?
Thanks!
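
A minimal sketch for turning an MP log listing like the one above into records that can be filtered by alert level. The column layout (index, source, optional CPU number, alert level with an optional `*` flag, event code, data, keyword) is inferred only from the lines pasted in this post, and `parse_mp_log` is a hypothetical helper name, not an HP tool.

```python
import re

# Layout inferred from the sample lines in this thread only (an assumption):
# index, source, optional CPU number, alert level (optionally prefixed by '*'),
# 64-bit event code, 64-bit data field, keyword or free text.
EVENT_RE = re.compile(
    r"^\s*(?P<index>\d+)\s+(?P<source>[A-Z]+)\s+(?:(?P<cpu>\d+)\s+)?"
    r"(?P<flag>\*)?(?P<alert>\d+)\s+"
    r"0x(?P<code>[0-9A-Fa-f]{16})\s+(?P<data>[0-9A-Fa-f]{16})\s+(?P<keyword>.+?)\s*$"
)
# Timestamp continuation line, e.g. "04 Jan 2009 06:37:01".
TIME_RE = re.compile(r"^\s*(\d{2} [A-Za-z]{3} \d{4} \d{2}:\d{2}:\d{2})\s*$")


def parse_mp_log(text):
    """Return a list of dicts, one per event line; the timestamp comes from the next line."""
    events = []
    for line in text.splitlines():
        m = EVENT_RE.match(line)
        if m:
            events.append({
                "index": int(m.group("index")),
                "source": m.group("source"),
                "cpu": int(m.group("cpu")) if m.group("cpu") is not None else None,
                "alert": int(m.group("alert")),
                "flagged": m.group("flag") is not None,
                "code": m.group("code").upper(),
                "data": m.group("data").upper(),
                "keyword": m.group("keyword"),
                "timestamp": None,
            })
            continue
        t = TIME_RE.match(line)
        if t and events:
            # Attach the timestamp to the event line that precedes it.
            events[-1]["timestamp"] = t.group(1)
    return events


# Three entries copied from the log in this post.
SAMPLE = """\
115   SFW  0   2  0x5680029A00E00C30 000000003F97E008 MC_PCI_BUS_TARGET_ID
                                                      04 Jan 2009 06:37:01
116   SFW  0  *3  0x7680010700E00C50 0000000000000000 OS_MCA_NOT_REGISTERED
                                                      04 Jan 2009 06:37:01
117   BMC      2  0x204960590E020C70 FFFF027000120300 Type-02 127002 1208322
                                                      04 Jan 2009 06:37:02
"""

if __name__ == "__main__":
    for event in parse_mp_log(SAMPLE):
        if event["alert"] >= 3:
            print(event["index"], event["timestamp"], event["keyword"])
```

Filtering on `alert >= 3` picks out entry 116 (OS_MCA_NOT_REGISTERED), the only line in the pasted listing that carries the `*` flag.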

#2, posted 2009-02-25 17:50
I don't have a decode tool for the MP log, so I can't read it.
Take a look at /var/opt/resmon/log/event.log and the MCA logs under /var/tombstones.
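
One way to see whether anything was written to those locations around the time of a reboot is to list the most recently modified entries. This is only a sketch: it assumes a Python interpreter is available on the host (or that the directories have been copied elsewhere), and `newest_entries` is a hypothetical helper name. The two paths are the ones named in the post.

```python
import os
import time


def newest_entries(directory, limit=5):
    """Return up to `limit` (mtime, path) pairs from `directory`, newest first."""
    try:
        names = os.listdir(directory)
    except OSError:
        return []  # directory missing or unreadable on this host
    entries = []
    for name in names:
        path = os.path.join(directory, name)
        try:
            entries.append((os.path.getmtime(path), path))
        except OSError:
            continue  # entry vanished or cannot be stat'ed
    entries.sort(reverse=True)
    return entries[:limit]


if __name__ == "__main__":
    # Paths taken from the post above.
    for directory in ("/var/tombstones", "/var/opt/resmon/log"):
        for mtime, path in newest_entries(directory):
            print(time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(mtime)), path)
```

Comparing these modification times with the reboot times is a quick check on whether an MCA record was saved for a given crash.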

#3, posted 2009-02-25 17:53
Here is the decode of the alert level 3 code:
Event Code: 7680010700E00C50 0000000000000000

Record Type         = E0h
Reporting Entity ID = System Firmware - cpu 0
Event ID            = #263

...........................................................

Keyword             = OS_MCA_NOT_REGISTERED     

Description:

The OS_MCA vector has not been registered

Cause / Action:

The OS has not registered an OS_MCA vector.

Recommendation:

None, the OS has failed to register the vector or has chosen not to.

___________________________________________________________

Alert Level         = 3  - Warning or non-critical
Data Type           = 22 - Implementation dependant                    
Data                = 00 00 00 00 00 00 00 00
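
For readers without the decode tool, the sketch below pulls three fields out of the 64-bit event code. The bit positions are NOT taken from HP documentation: they are inferred by comparing the codes in post #1 with their alert-level column and with the decode above, so treat them as an assumption that happens to fit these samples. They are applied only to records whose type byte is E0h; the Type-02 entries in the log do not fit the same pattern. `split_event_code` is a hypothetical helper name.

```python
def split_event_code(code_hex):
    """Split an event code (hex string, with or without 0x) into fields.

    Field positions are inferred from the samples in this thread only:
      bits 16-23 : record type (E0h for the firmware events, 02h for Type-02)
      bits 32-47 : event ID          (E0h records only)
      bits 61-63 : alert level       (E0h records only)
    """
    code = int(code_hex, 16)
    record_type = (code >> 16) & 0xFF
    fields = {"record_type": record_type}
    if record_type == 0xE0:
        fields["alert_level"] = (code >> 61) & 0x7
        fields["event_id"] = (code >> 32) & 0xFFFF
    return fields


if __name__ == "__main__":
    for sample in ("7680010700E00C50", "0x5480006300E00C90", "204960590E020C70"):
        print(sample, split_event_code(sample))
```

For 7680010700E00C50 this yields record type E0h, event ID 263 and alert level 3, which agrees with the decode shown in this post.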

#4, posted 2009-02-26 10:11

Reply to #2 (htg407)

I didn't quite understand what you meant.
Sorry!

#5, posted 2009-03-05 09:45
MCA: a hardware problem. It could be a card or the I/O board. Strip the system down to a minimal configuration and test.

#6, posted 2009-03-06 17:42
I wouldn't recommend pulling hardware at random. Collect information first.
Is there an HPMC? What do syslog, OLDsyslog and the event log show?

#7, posted 2009-03-07 11:13
When the machine went down, the laptop attached to the serial port showed the following messages:
shutting down machine. please wait
Halting czfs01 to preserve data integrity
Reason:CMGMSD daemon failed
shutdown complete.
sync'ing disks (0 buffers to flush):
0 buffers not flushed
0 buffers still dirty

#8, posted 2009-03-07 11:19
We have HP MC.
We have Oracle RAC.
Thu Feb 26 04:19:49 2009
Errors in file /oracle/9.2.0.2/rdbms/log/czczj1_ora_3272.trc:
ORA-29702: error occurred in Cluster Group Service operation
Thu Feb 26 04:19:49 2009
LMON: terminating instance due to error 29702
Thu Feb 26 07:12:55 2009
Starting ORACLE instance (normal)

The above is the error section of alert.log; every error reported is 29702.
The corresponding .trc file follows:

/oracle/9.2.0.2/rdbms/log/czczj1_ora_3272.trc
Oracle9i Enterprise Edition Release 9.2.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP and Oracle Data Mining options
JServer Release 9.2.0.2.0 - Production
ORACLE_HOME = /oracle/9.2.0.2
System name:        HP-UX
Node name:        czfs01
Release:        B.11.23
Version:        U
Machine:        ia64
Instance name: czczj1
Redo thread mounted by this instance: 0 <none>
Oracle process number: 0
3272

Ioctl ASYNC_CONFIG error, errno = 1
/oracle/9.2.0.2/rdbms/log/czczj1_ora_3272.trc
Oracle9i Enterprise Edition Release 9.2.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP and Oracle Data Mining options
JServer Release 9.2.0.2.0 - Production
ORACLE_HOME = /oracle/9.2.0.2
System name:        HP-UX
Node name:        czfs01
Release:        B.11.23
Version:        U
Machine:        ia64
Instance name: czczj1
Redo thread mounted by this instance: 0 <none>
Oracle process number: 0
3272

Ioctl ASYNC_CONFIG error, errno = 1
/oracle/9.2.0.2/rdbms/log/czczj1_ora_3272.trc
Oracle9i Enterprise Edition Release 9.2.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP and Oracle Data Mining options
JServer Release 9.2.0.2.0 - Production
ORACLE_HOME = /oracle/9.2.0.2
System name:        HP-UX
Node name:        czfs01
Release:        B.11.23
Version:        U
Machine:        ia64
Instance name: czczj1
Redo thread mounted by this instance: 0 <none>
Oracle process number: 0
3272

Ioctl ASYNC_CONFIG error, errno = 1
/oracle/9.2.0.2/rdbms/log/czczj1_ora_3272.trc
Oracle9i Enterprise Edition Release 9.2.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP and Oracle Data Mining options
JServer Release 9.2.0.2.0 - Production
ORACLE_HOME = /oracle/9.2.0.2
System name:        HP-UX
Node name:        czfs01
Release:        B.11.23
Version:        U
Machine:        ia64
Instance name: czczj1
Redo thread mounted by this instance: 0 <none>
Oracle process number: 0
3272

Ioctl ASYNC_CONFIG error, errno = 1
/oracle/9.2.0.2/rdbms/log/czczj1_ora_3272.trc
Oracle9i Enterprise Edition Release 9.2.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP and Oracle Data Mining options
JServer Release 9.2.0.2.0 - Production
ORACLE_HOME = /oracle/9.2.0.2
System name:        HP-UX
Node name:        czfs01
Release:        B.11.23
Version:        U
Machine:        ia64
Instance name: czczj1
Redo thread mounted by this instance: 0 <none>
Oracle process number: 0
3272

Ioctl ASYNC_CONFIG error, errno = 1
*** SESSION ID:(3.1) 2009-02-23 08:50:28.198
Batch msg size = 2048
Batching factor: enqueue replay 47, ack 53
Batching factor: cache replay 29 size per lock 64
kjxggin: receive buffer size = 32768
kjxgmin: SKGXN ver (2 0 Hewlett-Packard SKGXN 2.0)
*** 2009-02-23 08:50:33.945
kjxgmrcfg: Reconfiguration started, reason 1
kjxgmcs: Setting state to 0 0.
*** 2009-02-23 08:50:33.945
     Name Service frozen
kjxgmcs: Setting state to 0 1.
kjfcpiora: publish my weight 59303
kjxgmps: proposing substate 2
kjxgmcs: Setting state to 1 2.
     Performed the unique instance identification check
kjxgmps: proposing substate 3
kjxgmcs: Setting state to 1 3.
     Name Service recovery started
     Deleted all dead-instance name entries
kjxgmps: proposing substate 4
kjxgmcs: Setting state to 1 4.
     Multicasted all local name entries for publish
     Replayed all pending requests
kjxgmps: proposing substate 5
kjxgmcs: Setting state to 1 5.
     Name Service normal
     Name Service recovery done
*** 2009-02-23 08:50:34.308
kjxgmps: proposing substate 6
kjxgmcs: Setting state to 1 6.
*** 2009-02-23 08:50:34.510
*** 2009-02-23 08:50:34.510
Reconfiguration started
Synchronization timeout interval: 600 sec
List of nodes: 0,
Global Resource Directory frozen
node 0
release 9 2 0 2
* kjshashcfg: I'm the only node in the cluster (node 0)
Active Sendback Threshold = 50 %
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Resources and enqueues cleaned out
Resources remastered 0
0 GCS shadows traversed, 0 cancelled, 0 closed
0 GCS resources traversed, 0 cancelled
set master node info
Submitted all remote-enqueue requests
Update rdomain variables
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
*** 2009-02-23 08:50:34.694
0 GCS shadows traversed, 0 replayed, 0 unopened
Submitted all GCS cache requests
0 write requests issued in 0 GCS resources
0 PIs marked suspect, 0 flush PI msgs
*** 2009-02-23 08:50:34.773
Reconfiguration complete
Post SMON to start 1st pass IR
*** 2009-02-23 08:50:42.474
kjxgrtmc2: Member 0 thread 1 mounted
*** 2009-02-23 08:51:19.547
kjxgmpoll reconfig bitmap: 0 1
*** 2009-02-23 08:51:19.547
kjxgmrcfg: Reconfiguration started, reason 1
kjxgmcs: Setting state to 1 0.
*** 2009-02-23 08:51:19.590
     Name Service frozen
kjxgmcs: Setting state to 1 1.
*** 2009-02-23 08:51:19.761
Obtained RR update lock for sequence 1, RR seq 1
*** 2009-02-23 08:51:19.829
Voting results, upd 0, seq 2, bitmap: 0 1
kjxgmps: proposing substate 2
kjxgmcs: Setting state to 2 2.
     Performed the unique instance identification check
kjxgmps: proposing substate 3
kjxgmcs: Setting state to 2 3.
     Name Service recovery started
     Deleted all dead-instance name entries
kjxgmps: proposing substate 4
kjxgmcs: Setting state to 2 4.
     Multicasted all local name entries for publish
     Replayed all pending requests
kjxgmps: proposing substate 5
kjxgmcs: Setting state to 2 5.
     Name Service normal
     Name Service recovery done
*** 2009-02-23 08:51:19.836
kjxgmps: proposing substate 6
kjxgmcs: Setting state to 2 6.
*** 2009-02-23 08:51:19.936
*** 2009-02-23 08:51:19.936
Reconfiguration started
Synchronization timeout interval: 600 sec
List of nodes: 0,1,
Global Resource Directory frozen
node 0
node 1
release 9 2 0 2
* kjdrqrnums: node 1 resnum could not be queried (ret 7).
res_master_weight for node 0 is 59303
res_master_weight for node 1 is 59303
Total master weight = 118606
Dead  inst
Join  inst 1
Exist inst 0
Active Sendback Threshold = 50 %
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Resources and enqueues cleaned out
Resources remastered 820
1834 GCS shadows traversed, 0 cancelled, 0 closed
1834 GCS resources traversed, 0 cancelled
64036 GCS resources on freelist, 65233 on array, 65233 allocated
set master node info
Submitted all remote-enqueue requests
kjfcrfg: Number of mesgs sent to node 1 = 288
Update rdomain variables
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
*** 2009-02-23 08:51:20.323
1834 GCS shadows traversed, 637 replayed, 0 unopened
Submitted all GCS cache requests
0 write requests issued in 1197 GCS resources
0 PIs marked suspect, 0 flush PI msgs
* kjdrqrnums: node 1 resnum could not be queried (ret 7).
*** 2009-02-23 08:51:20.537
Reconfiguration complete
Post SMON to start 1st pass IR
*** 2009-02-25 16:59:38.934
kjxgmpoll reconfig bitmap: 0
*** 2009-02-25 16:59:38.953
kjxgmrcfg: Reconfiguration started, reason 1
kjxgmcs: Setting state to 2 0.
*** 2009-02-25 16:59:39.065
     Name Service frozen
kjxgmcs: Setting state to 2 1.
*** 2009-02-25 16:59:39.274
Obtained RR update lock for sequence 2, RR seq 2
*** 2009-02-25 16:59:40.872
Voting results, upd 0, seq 3, bitmap: 0
kjxgmps: proposing substate 2
kjxgmcs: Setting state to 3 2.
     Performed the unique instance identification check
kjxgmps: proposing substate 3
kjxgmcs: Setting state to 3 3.
     Name Service recovery started
     Deleted all dead-instance name entries
kjxgmps: proposing substate 4
kjxgmcs: Setting state to 3 4.
     Multicasted all local name entries for publish
     Replayed all pending requests
kjxgmps: proposing substate 5
kjxgmcs: Setting state to 3 5.
     Name Service normal
     Name Service recovery done
*** 2009-02-25 16:59:40.893
kjxgmps: proposing substate 6
kjxgmcs: Setting state to 3 6.
kjfmact: call ksimdic on instance (1)
*** 2009-02-25 16:59:40.929
*** 2009-02-25 16:59:40.935
Reconfiguration started
Synchronization timeout interval: 600 sec
List of nodes: 0,
Global Resource Directory frozen
node 0
* kjshashcfg: I'm the only node in the cluster (node 0)
Active Sendback Threshold = 50 %
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Resources and enqueues cleaned out
Resources remastered 11060
63803 GCS shadows traversed, 1 cancelled, 9507 closed
27102 GCS resources traversed, 0 cancelled
38131 GCS resources on freelist, 65233 on array, 65233 allocated
set master node info
Submitted all remote-enqueue requests
Update rdomain variables
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
*** 2009-02-25 16:59:41.594
63803 GCS shadows traversed, 0 replayed, 9508 unopened
Submitted all GCS cache requests
0 write requests issued in 54295 GCS resources
0 PIs marked suspect, 0 flush PI msgs
*** 2009-02-25 16:59:42.008
Reconfiguration complete
Post SMON to start 1st pass IR
*** 2009-02-26 04:19:48.565
kjxggpoll: received an error event from DBALL_DB
Return code from kjxggpoll: 10
error 29702 detected in background process
ORA-29702: error occurred in Cluster Group Service operation
ksuitm: waiting for [5] seconds before killing DIAG
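
The trace above records each cluster reconfiguration together with a "List of nodes:" line, so the membership history can be read out mechanically. The following is a small sketch that pairs every such line with the most recent `***` timestamp. The patterns are based only on the lines pasted in this post, and `membership_timeline` is a hypothetical helper name.

```python
import re

# "*** 2009-02-23 08:50:34.510" style timestamp lines from the trace.
STAMP_RE = re.compile(r"^\*\*\* (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s*$")
# "List of nodes: 0,1," style membership lines from the trace.
NODES_RE = re.compile(r"^List of nodes:\s*([\d,\s]*)$")


def membership_timeline(trace_text):
    """Return [(timestamp, [node, ...]), ...] in the order found in the trace."""
    timeline = []
    last_stamp = None
    for line in trace_text.splitlines():
        s = STAMP_RE.match(line)
        if s:
            last_stamp = s.group(1)
            continue
        n = NODES_RE.match(line)
        if n:
            nodes = [int(x) for x in n.group(1).split(",") if x.strip()]
            timeline.append((last_stamp, nodes))
    return timeline


# Lines copied from the trace in this post (other lines omitted).
SAMPLE = """\
*** 2009-02-23 08:50:34.510
Reconfiguration started
List of nodes: 0,
*** 2009-02-23 08:51:19.936
Reconfiguration started
List of nodes: 0,1,
*** 2009-02-25 16:59:40.935
Reconfiguration started
List of nodes: 0,
"""

if __name__ == "__main__":
    for stamp, nodes in membership_timeline(SAMPLE):
        print(stamp, nodes)
```

Run against the full trace, this shows node 1 joining at 2009-02-23 08:51:19 and dropping out of the node list at 2009-02-25 16:59:40, before the ORA-29702 on node 0 at 2009-02-26 04:19:48.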