HP-UX server has to be rebooted every so often. MP log below!

Post #1, posted 2009-02-25 17:02

Hi all,

My HP RX4640 (OS version B.11.23) has to be rebooted every so often.

The MP log is as follows:

113 SFW  0 2  0x5680029800E00BF0 0000000000801000 MC_PCI_BUS_REQUESTOR_ID
                                                   04 Jan 2009 06:37:00
114 SFW  0 2  0x5680029900E00C10 00000000FED28000 MC_PCI_BUS_RESPONDER_ID
                                                   04 Jan 2009 06:37:01
115 SFW  0 2  0x5680029A00E00C30 000000003F97E008 MC_PCI_BUS_TARGET_ID
                                                   04 Jan 2009 06:37:01
116 SFW  0  *3  0x7680010700E00C50 0000000000000000 OS_MCA_NOT_REGISTERED
                                                   04 Jan 2009 06:37:01
117 BMC    2  0x204960590E020C70 FFFF027000120300 Type-02 127002 1208322
                                                   04 Jan 2009 06:37:02
118 SFW    2  0xC149605911020C80 FFFF000A001D0300 Type-02 1d0a00 1903104
                                                   04 Jan 2009 06:37:05
119 SFW  0 2  0x5480006300E00C90 0000000000000000 BOOT_START
                                                   04 Jan 2009 06:37:05
120 SFW    2  0xC149605911020CB0 FFFF000A001D0300 Type-02 1d0a00 1903104
                                                   04 Jan 2009 06:37:05
121 SFW  1 2  0x5480006301E00CC0 0000000000000000 BOOT_START
                                                   04 Jan 2009 06:37:05
122 BMC    2  0x2049605916020CE0 FFFF0103FDC00300 Type-02 c00301 12583681
                                                   04 Jan 2009 06:37:10

I checked syslog.log and found nothing unusual. Could someone please help me analyze this?
Thanks!

Post #2, posted 2009-02-25 17:50

I have no decode tool for the MP log, so I can't read it.
Take a look at /var/opt/resmon/log/event.log and the MCA logs under /var/tombstones.


zhanghaitao-neu

Post #3, posted 2009-02-25 17:53

Here is the decode of the alert level 3 code:
Event Code: 7680010700E00C50 0000000000000000

Record Type       = E0h
Reporting Entity ID = System Firmware - cpu 0
Event ID          = #263

...........................................................

Keyword          = OS_MCA_NOT_REGISTERED

Description:

The OS_MCA vector has not been registered

Cause / Action:

The OS has not registered an OS_MCA vector.

Recommendation:

None, the OS has failed to register the vector or has chosen not to.

___________________________________________________________

Alert Level       = 3  - Warning or non-critical
Data Type          = 22 - Implementation dependant
Data             = 00 00 00 00 00 00 00 00
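
For anyone without a decode tool, here is a rough Python sketch that splits an event code into the fields shown above. The byte layout is an assumption inferred only from the sample lines in this thread (alert level in the top 3 bits of the first byte, data type in its low 5 bits, event ID in bytes 2 and 3, CPU in byte 4, record type in byte 5). It is not taken from HP documentation, and it only fits the E0h records; the Type-02 entries in the log do not follow this pattern. `decode_e0_event` is a made-up helper name.

```python
def decode_e0_event(code_hex: str) -> dict:
    """Split a 64-bit MP event code into fields.

    The layout is inferred from the log lines in this thread only,
    so treat every field position as an assumption.
    """
    raw = int(code_hex, 16).to_bytes(8, "big")
    if raw[5] != 0xE0:
        raise ValueError(f"not an E0 record: type {raw[5]:02X}h")
    return {
        "alert_level": raw[0] >> 5,          # top 3 bits of byte 0
        "data_type": raw[0] & 0x1F,          # low 5 bits of byte 0
        "event_id": (raw[2] << 8) | raw[3],  # bytes 2-3
        "cpu": raw[4],                       # reporting CPU
        "record_type": raw[5],               # E0h for these records
    }


if __name__ == "__main__":
    for code in ("7680010700E00C50", "0x5480006301E00CC0"):
        print(code, decode_e0_event(code))
```

For 7680010700E00C50 this yields alert level 3, data type 22 and event ID 263, which agrees with the decode above.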

Post #4, posted 2009-02-26 10:11

Reply to post #2 by htg407

I didn't quite follow what you mean.
Embarrassing!

Post #5, posted 2009-03-05 09:45

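An MCA means a hardware problem. It could be a card or the I/O board; strip the system down to a minimal configuration and test.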

Post #6, posted 2009-03-06 17:42

I wouldn't recommend pulling hardware at random; collect information first.
Is there an HPMC? Check syslog, OLDsyslog and the event log.

Post #7, posted 2009-03-07 11:13

When the machine goes down, the following messages show up on a laptop attached to the serial port:
shutting down machine. please wait
Halting czfs01 to preserve data integrity
Reason: CMGMSD daemon failed
shutdown complete.
sync'ing disks (0 buffers to flush):
0 buffers not flushed
0 buffers still dirty

Post #8, posted 2009-03-07 11:19

There is HP MC.
There is Oracle RAC too.
Thu Feb 26 04:19:49 2009
Errors in file /oracle/9.2.0.2/rdbms/log/czczj1_ora_3272.trc:
ORA-29702: error occurred in Cluster Group Service operation
Thu Feb 26 04:19:49 2009
LMON: terminating instance due to error 29702
Thu Feb 26 07:12:55 2009
Starting ORACLE instance (normal)

This is the error section of alert.log; every error reported is 29702.
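
To see how often and when the error occurs, something like the following Python sketch could pull the timestamps out of alert.log. It assumes the layout of the excerpt above, where a timestamp line such as "Thu Feb 26 04:19:49 2009" precedes the messages it applies to, and it assumes an English (C) locale for the day and month names. `ora_errors` is a made-up helper name.

```python
import re
from datetime import datetime

# Timestamp lines look like "Thu Feb 26 04:19:49 2009" in the excerpt above.
TIMESTAMP = re.compile(
    r"^[A-Z][a-z]{2} [A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} \d{4}$"
)


def ora_errors(lines, code="ORA-29702"):
    """Return (timestamp, line) pairs for alert.log lines mentioning `code`."""
    current = None
    hits = []
    for line in lines:
        line = line.rstrip("\n")
        if TIMESTAMP.match(line):
            # Remember the most recent timestamp line.
            current = datetime.strptime(line, "%a %b %d %H:%M:%S %Y")
        elif code in line:
            hits.append((current, line))
    return hits


if __name__ == "__main__":
    sample = [
        "Thu Feb 26 04:19:49 2009",
        "Errors in file /oracle/9.2.0.2/rdbms/log/czczj1_ora_3272.trc:",
        "ORA-29702: error occurred in Cluster Group Service operation",
        "Thu Feb 26 04:19:49 2009",
        "LMON: terminating instance due to error 29702",
    ]
    for when, text in ora_errors(sample):
        print(when, text)
```

The timestamps it returns can then be compared with the reboot times and with the MP log entries.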
The corresponding .trc file is attached below:

/oracle/9.2.0.2/rdbms/log/czczj1_ora_3272.trc
Oracle9i Enterprise Edition Release 9.2.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP and Oracle Data Mining options
JServer Release 9.2.0.2.0 - Production
ORACLE_HOME = /oracle/9.2.0.2
System name: HP-UX
Node name: czfs01
Release: B.11.23
Version: U
Machine: ia64
Instance name: czczj1
Redo thread mounted by this instance: 0 <none>
Oracle process number: 0
3272

Ioctl ASYNC_CONFIG error, errno = 1
*** SESSION ID:(3.1) 2009-02-23 08:50:28.198
Batch msg size = 2048
Batching factor: enqueue replay 47, ack 53
Batching factor: cache replay 29 size per lock 64
kjxggin: receive buffer size = 32768
kjxgmin: SKGXN ver (2 0 Hewlett-Packard SKGXN 2.0)
*** 2009-02-23 08:50:33.945
kjxgmrcfg: Reconfiguration started, reason 1
kjxgmcs: Setting state to 0 0.
*** 2009-02-23 08:50:33.945
   Name Service frozen
kjxgmcs: Setting state to 0 1.
kjfcpiora: publish my weight 59303
kjxgmps: proposing substate 2
kjxgmcs: Setting state to 1 2.
   Performed the unique instance identification check
kjxgmps: proposing substate 3
kjxgmcs: Setting state to 1 3.
   Name Service recovery started
   Deleted all dead-instance name entries
kjxgmps: proposing substate 4
kjxgmcs: Setting state to 1 4.
   Multicasted all local name entries for publish
   Replayed all pending requests
kjxgmps: proposing substate 5
kjxgmcs: Setting state to 1 5.
   Name Service normal
   Name Service recovery done
*** 2009-02-23 08:50:34.308
kjxgmps: proposing substate 6
kjxgmcs: Setting state to 1 6.
*** 2009-02-23 08:50:34.510
*** 2009-02-23 08:50:34.510
Reconfiguration started
Synchronization timeout interval: 600 sec
List of nodes: 0,
Global Resource Directory frozen
node 0
release 9 2 0 2
* kjshashcfg: I'm the only node in the cluster (node 0)
Active Sendback Threshold = 50 %
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Resources and enqueues cleaned out
Resources remastered 0
0 GCS shadows traversed, 0 cancelled, 0 closed
0 GCS resources traversed, 0 cancelled
set master node info
Submitted all remote-enqueue requests
Update rdomain variables
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
*** 2009-02-23 08:50:34.694
0 GCS shadows traversed, 0 replayed, 0 unopened
Submitted all GCS cache requests
0 write requests issued in 0 GCS resources
0 PIs marked suspect, 0 flush PI msgs
*** 2009-02-23 08:50:34.773
Reconfiguration complete
Post SMON to start 1st pass IR
*** 2009-02-23 08:50:42.474
kjxgrtmc2: Member 0 thread 1 mounted
*** 2009-02-23 08:51:19.547
kjxgmpoll reconfig bitmap: 0 1
*** 2009-02-23 08:51:19.547
kjxgmrcfg: Reconfiguration started, reason 1
kjxgmcs: Setting state to 1 0.
*** 2009-02-23 08:51:19.590
   Name Service frozen
kjxgmcs: Setting state to 1 1.
*** 2009-02-23 08:51:19.761
Obtained RR update lock for sequence 1, RR seq 1
*** 2009-02-23 08:51:19.829
Voting results, upd 0, seq 2, bitmap: 0 1
kjxgmps: proposing substate 2
kjxgmcs: Setting state to 2 2.
   Performed the unique instance identification check
kjxgmps: proposing substate 3
kjxgmcs: Setting state to 2 3.
   Name Service recovery started
   Deleted all dead-instance name entries
kjxgmps: proposing substate 4
kjxgmcs: Setting state to 2 4.
   Multicasted all local name entries for publish
   Replayed all pending requests
kjxgmps: proposing substate 5
kjxgmcs: Setting state to 2 5.
   Name Service normal
   Name Service recovery done
*** 2009-02-23 08:51:19.836
kjxgmps: proposing substate 6
kjxgmcs: Setting state to 2 6.
*** 2009-02-23 08:51:19.936
*** 2009-02-23 08:51:19.936
Reconfiguration started
Synchronization timeout interval: 600 sec
List of nodes: 0,1,
Global Resource Directory frozen
node 0
node 1
release 9 2 0 2
* kjdrqrnums: node 1 resnum could not be queried (ret 7).
res_master_weight for node 0 is 59303
res_master_weight for node 1 is 59303
Total master weight = 118606
Dead  inst
Join  inst 1
Exist inst 0
Active Sendback Threshold = 50 %
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Resources and enqueues cleaned out
Resources remastered 820
1834 GCS shadows traversed, 0 cancelled, 0 closed
1834 GCS resources traversed, 0 cancelled
64036 GCS resources on freelist, 65233 on array, 65233 allocated
set master node info
Submitted all remote-enqueue requests
kjfcrfg: Number of mesgs sent to node 1 = 288
Update rdomain variables
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
*** 2009-02-23 08:51:20.323
1834 GCS shadows traversed, 637 replayed, 0 unopened
Submitted all GCS cache requests
0 write requests issued in 1197 GCS resources
0 PIs marked suspect, 0 flush PI msgs
* kjdrqrnums: node 1 resnum could not be queried (ret 7).
*** 2009-02-23 08:51:20.537
Reconfiguration complete
Post SMON to start 1st pass IR
*** 2009-02-25 16:59:38.934
kjxgmpoll reconfig bitmap: 0
*** 2009-02-25 16:59:38.953
kjxgmrcfg: Reconfiguration started, reason 1
kjxgmcs: Setting state to 2 0.
*** 2009-02-25 16:59:39.065
   Name Service frozen
kjxgmcs: Setting state to 2 1.
*** 2009-02-25 16:59:39.274
Obtained RR update lock for sequence 2, RR seq 2
*** 2009-02-25 16:59:40.872
Voting results, upd 0, seq 3, bitmap: 0
kjxgmps: proposing substate 2
kjxgmcs: Setting state to 3 2.
   Performed the unique instance identification check
kjxgmps: proposing substate 3
kjxgmcs: Setting state to 3 3.
   Name Service recovery started
   Deleted all dead-instance name entries
kjxgmps: proposing substate 4
kjxgmcs: Setting state to 3 4.
   Multicasted all local name entries for publish
   Replayed all pending requests
kjxgmps: proposing substate 5
kjxgmcs: Setting state to 3 5.
   Name Service normal
   Name Service recovery done
*** 2009-02-25 16:59:40.893
kjxgmps: proposing substate 6
kjxgmcs: Setting state to 3 6.
kjfmact: call ksimdic on instance (1)
*** 2009-02-25 16:59:40.929
*** 2009-02-25 16:59:40.935
Reconfiguration started
Synchronization timeout interval: 600 sec
List of nodes: 0,
Global Resource Directory frozen
node 0
* kjshashcfg: I'm the only node in the cluster (node 0)
Active Sendback Threshold = 50 %
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Resources and enqueues cleaned out
Resources remastered 11060
63803 GCS shadows traversed, 1 cancelled, 9507 closed
27102 GCS resources traversed, 0 cancelled
38131 GCS resources on freelist, 65233 on array, 65233 allocated
set master node info
Submitted all remote-enqueue requests
Update rdomain variables
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
*** 2009-02-25 16:59:41.594
63803 GCS shadows traversed, 0 replayed, 9508 unopened
Submitted all GCS cache requests
0 write requests issued in 54295 GCS resources
0 PIs marked suspect, 0 flush PI msgs
*** 2009-02-25 16:59:42.008
Reconfiguration complete
Post SMON to start 1st pass IR
*** 2009-02-26 04:19:48.565
kjxggpoll: received an error event from DBALL_DB
Return code from kjxggpoll: 10
error 29702 detected in background process
ORA-29702: error occurred in Cluster Group Service operation
ksuitm: waiting for [5] seconds before killing DIAG
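
The trace is long, so a condensed timeline of cluster membership may be easier to read. The Python sketch below relies only on three line shapes that appear in the trace above: the "*** <timestamp>" markers, the "List of nodes:" lines, and lines starting with "ORA-". `membership_timeline` is a made-up helper name, and this is a reading aid rather than a full trace parser.

```python
import re

# "*** 2009-02-23 08:51:19.936" style markers from the trace above.
STAMP = re.compile(r"^\*\*\* (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})$")
# "List of nodes: 0,1," lines written during each reconfiguration.
NODES = re.compile(r"^List of nodes: ([\d,]+)$")


def membership_timeline(lines):
    """Return (timestamp, detail) pairs for node lists and ORA- errors."""
    stamp = None
    events = []
    for line in lines:
        line = line.strip()
        m = STAMP.match(line)
        if m:
            stamp = m.group(1)  # most recent timestamp marker
            continue
        m = NODES.match(line)
        if m:
            nodes = [int(n) for n in m.group(1).split(",") if n]
            events.append((stamp, f"nodes {nodes}"))
        elif line.startswith("ORA-"):
            events.append((stamp, line))
    return events


if __name__ == "__main__":
    # The path is the trace file named in this post.
    with open("/oracle/9.2.0.2/rdbms/log/czczj1_ora_3272.trc") as trace:
        for when, detail in membership_timeline(trace):
            print(when, detail)
```

Applied to the trace as posted, this shows node 1 joining at 2009-02-23 08:51:19.936, the node list dropping back to node 0 alone at 2009-02-25 16:59:40.935, and ORA-29702 at 2009-02-26 04:19:48.565.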
