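First, the environment:
Hardware: two IBM P630 servers in a cluster, SSA 7133 disk array, configured as RAID 1
Database: Informix in HDR mode
Software: some application software is installed
Cluster: IBM HACMP 4.5, hot standby
When I got to the machine room this morning, both servers were down and the LCD showed code 9411. After a restart, machine 1 returned to normal.
It looks like the two machines did not go down at the same time.
Here is the filesystem information as well:
First machine: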
# df -k
Filesystem      1024-blocks     Free %Used  Iused %Iused Mounted on
/dev/hd4             131072   100316   24%   1493     3% /
/dev/hd2            1703936  1001948   42%  22099     6% /usr
/dev/hd9var          262144    57556   79%    539     1% /var
/dev/hd3             524288   489816    7%    304     1% /tmp
/dev/hd1             131072   126852    4%     18     1% /home
/proc                     -        -     -      -      - /proc
/dev/hd10opt         131072   119988    9%    339     2% /opt
/dev/lvinformix     1572864   366884   77%   2332     1% /opt/informix
/dev/lvdbtemp       1048576  1014420    4%     18     1% /opt/informix/temp
Second machine:
# df -k
Filesystem      1024-blocks     Free %Used  Iused %Iused Mounted on
/dev/hd4             131072    48612   63%   1610     3% /
/dev/hd2            1572864   874688   45%  22136     6% /usr
/dev/hd9var          262144    69496   74%    512     1% /var
/dev/hd3             524288   195160   63%    197     1% /tmp
/dev/hd1             131072   126852    4%     18     1% /home
/proc                     -        -     -      -      - /proc
/dev/hd10opt         131072   119988    9%    338     2% /opt
/dev/lvinformix     1572864   517764   68%   2329     1% /opt/informix
/dev/lvdbtemp       1048576  1015616    4%     16     1% /opt/informix/temp
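Something still looks off here: is the /var filesystem a bit small, and should it be enlarged?
Also, both machines have 2 GB of memory but only 512 MB of paging space. Should the paging space be increased?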
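To make the "is /var too small" question easier to check, here is a minimal sketch that filters `df -k` output for filesystems at or above a usage threshold. It assumes the AIX column layout shown in the listings above (%Used in column 4, mount point in column 7). The function name `fs_over_threshold` and the default threshold of 75% are illustrative choices, not AIX recommendations.

```shell
# Print "mountpoint %Used" for every filesystem at or above a threshold.
# Reads `df -k` output on stdin; assumes the AIX column order
# Filesystem / 1024-blocks / Free / %Used / Iused / %Iused / Mounted on.
# $1 = threshold in percent (hypothetical default: 75).
fs_over_threshold() {
    awk -v limit="${1:-75}" '
        NR > 1 && $4 ~ /%$/ {        # skip the header and "-" rows such as /proc
            used = $4
            sub(/%/, "", used)       # "79%" -> "79"
            if (used + 0 >= limit) print $7, $4
        }'
}
```

On the machine itself this would be run as `df -k | fs_over_threshold 75`. With the first machine's listing it reports /var (79%) and /opt/informix (77%).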
The error logs show the following:
First machine:
# errpt |more
IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
C092AFE4 1212093805 I O ctcasd ctcasd Daemon Started
A6DF45AA 1212093805 I O RMCdaemon The daemon is started.
C0AA5338 1212093605 U S SYSDUMP SYSTEM DUMP
BFE4C025 1212093505 P H sysplanar0 UNDETERMINED ERROR
9D035E4D 1212002105 P S SYSVMM DATA STORAGE INTERRUPT, PROCESSOR
9DBCFDEE 1212093705 T O errdemon ERROR LOGGING TURNED ON
Looking at this, my guess is that
9D035E4D 1212002105 P S SYSVMM DATA STORAGE INTERRUPT, PROCESSOR
is what caused the server to go down.
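The TIMESTAMP column in the `errpt` summary is in MMDDhhmmYY form, so the entries are hard to compare by eye and the listing is not in strict time order. The sketch below decodes the timestamps and sorts the entries chronologically. The function names `errpt_ts` and `errpt_timeline` are made up for this example, and the code assumes every two-digit year belongs to 20YY.

```shell
# Decode one errpt timestamp (MMDDhhmmYY) into "YYYY-MM-DD hh:mm".
# Assumption: the two-digit year is 20YY.
errpt_ts() {
    printf '%s\n' "$1" | awk '{
        printf "20%s-%s-%s %s:%s\n",
            substr($1, 9, 2), substr($1, 1, 2), substr($1, 3, 2),
            substr($1, 5, 2), substr($1, 7, 2)
    }'
}

# Read an `errpt` summary listing on stdin and print
# "time identifier resource" sorted oldest first.
errpt_timeline() {
    awk 'length($2) == 10 && $2 ~ /^[0-9]+$/ {   # skips the header line
        t = $2
        printf "20%s-%s-%s %s:%s %s %s\n",
            substr(t, 9, 2), substr(t, 1, 2), substr(t, 3, 2),
            substr(t, 5, 2), substr(t, 7, 2), $1, $5
    }' | LC_ALL=C sort
}
```

For example, `errpt_ts 1212002105` gives `2005-12-12 00:21`, which agrees with the Date/Time field of the detailed 9D035E4D report. Running `errpt | errpt_timeline` on each node puts the entries in time order, which makes it easier to see which error preceded each SYSTEM DUMP entry.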
# errpt -aj 9D035E4D |more
---------------------------------------------------------------------------
LABEL: DSI_PROC
IDENTIFIER: 9D035E4D
Date/Time: Mon Dec 12 00:21:43 BEIS
Sequence Number: 6469
Machine Id: 005E899C4C00
Node Id: host1
Class: S
Type: PERM
Resource Name: SYSVMM
Description
DATA STORAGE INTERRUPT, PROCESSOR
Probable Causes
SOFTWARE PROGRAM
Failure Causes
SOFTWARE PROGRAM
Recommended Actions
IF PROBLEM PERSISTS THEN DO THE FOLLOWING
CONTACT APPROPRIATE SERVICE REPRESENTATIVE
Detail Data
DATA STORAGE INTERRUPT STATUS REGISTER
0A00 0000
SEGMENT REGISTER, SEGREG
0000 0000
DATA STORAGE INTERRUPT ADDRESS REGISTER
0000 0004
EXVAL
0000 0086
Second machine:
# errpt |more
IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
C092AFE4 1212094305 I O ctcasd ctcasd Daemon Started
A6DF45AA 1212094305 I O RMCdaemon The daemon is started.
C0AA5338 1212094105 U S SYSDUMP SYSTEM DUMP
BFE4C025 1212093505 P H sysplanar0 UNDETERMINED ERROR
83F4B3CB 1212061005 P O SYSPFS UNABLE TO ALLOCATE SPACE IN KERNEL HEAP
9DBCFDEE 1212094205 T O errdemon ERROR LOGGING TURNED ON
7975092C 1212060905 T O SYSPFS ALLOCATED KERNEL HEAP SPACE AFTER DELAY
E18E984F 1212053505 P S SRC SOFTWARE PROGRAM ERROR
Detailed information:
# errpt -aj 7975092C |more
---------------------------------------------------------------------------
LABEL: JFS_KERNHEAP_DELAY
IDENTIFIER: 7975092C
Date/Time: Mon Dec 12 06:09:56 BEIS
Sequence Number: 6192
Machine Id: 005E8ADC4C00
Node Id: host2
Class: O
Type: TEMP
Resource Name: SYSPFS
Description
ALLOCATED KERNEL HEAP SPACE AFTER DELAY
Failure Causes
SOFTWARE PROGRAM
Recommended Actions
IF PROBLEM PERSISTS, CONTACT APPROPRIATE SERVICE REPRESENTATIVE
Detail Data
ADDITIONAL INFORMATION
delay=4, (0 secs)
# errpt -aj 83F4B3CB |more
---------------------------------------------------------------------------
LABEL: JFS_KERNHEAP_LOW
IDENTIFIER: 83F4B3CB
Date/Time: Mon Dec 12 06:10:06 BEIS
Sequence Number: 6194
Machine Id: 005E8ADC4C00
Node Id: host2
Class: O
Type: PERM
Resource Name: SYSPFS
Description
UNABLE TO ALLOCATE SPACE IN KERNEL HEAP
Failure Causes
SOFTWARE PROGRAM
Recommended Actions
IF PROBLEM PERSISTS, CONTACT APPROPRIATE SERVICE REPRESENTATIVE
The current plan is to expand /var to 512 MB on both machines and the paging space to 2 GB, matching physical memory, and then watch the results.
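As a rough sketch of that plan, the helper below only prints the commands it would use and runs nothing. It relies on two facts about the AIX commands: `chfs -a size=` takes 512-byte blocks, and `chps -s` takes a number of logical partitions to add. Everything else is an assumption: the function name `plan_resize`, the default paging space name `hd6`, and the 32 MB physical partition size used in the example. The real values should be checked with `lsps -a` and `lsvg rootvg`.

```shell
# Print (do not execute) the commands for the planned resize.
# $1 = target /var size in MB
# $2 = current paging space in MB
# $3 = target paging space in MB
# $4 = physical partition size in MB (assumption: check with lsvg rootvg)
plan_resize() {
    var_mb=$1; cur_ps_mb=$2; target_ps_mb=$3; pp_mb=$4
    # chfs expects the new size in 512-byte blocks: 1 MB = 2048 blocks
    var_blocks=$(( var_mb * 2048 ))
    # chps -s adds whole logical partitions, so round up
    add_lps=$(( (target_ps_mb - cur_ps_mb + pp_mb - 1) / pp_mb ))
    echo "chfs -a size=$var_blocks /var"
    echo "chps -s $add_lps hd6"    # hd6: assumed paging space name
}
```

With the sizes from this post and an assumed 32 MB partition size, `plan_resize 512 512 2048 32` prints `chfs -a size=1048576 /var` and `chps -s 48 hd6`.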
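I would also appreciate it if the experts here could look at the error reports above and help pinpoint the cause, so that the problem can be fixed for good. Many thanks!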