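The SMF service configuration on one of a customer's RAC nodes was damaged, so the node could not boot normally.
1. Boot the system from the ok prompt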
{2} ok
{2} ok boot
Resetting ...
Enabling system bus....... Done
Initializing CPUs......... Done
Initializing boot memory.. Done
Initializing OpenBoot
Probing system devices
Probing I/O buses
Probing system devices
Probing I/O buses
Sun Fire V490, No Keyboard
Copyright 2007 Sun Microsystems, Inc. All rights reserved.
OpenBoot 4.22.34, 8192 MB memory installed, Serial #72019670.
Ethernet address 0:14:4f:4a:ee:d6, Host ID: 844aeed6.
Rebooting with command: boot
Boot device: /pci@9,600000/SUNW,qlc@2/fp@0,0/disk@w2100001862805323,0:a File and args:
SunOS Release 5.10 Version Generic_118833-36 64-bit
Copyright 1983-2006 Sun Microsystems, Inc. All rights reserved.
Use is subject to license terms.
Hostname: hhzdb2
Nov 3 10:39:39 svc.startd[8]: svc:/system/cluster/scmountdev:default: Method "/usr/cluster/lib/svc/method/scmountdev start" failed with exit status 1.
Nov 3 10:39:39 svc.startd[8]: svc:/system/cluster/scmountdev:default: Method "/usr/cluster/lib/svc/method/scmountdev start" failed with exit status 1.
Nov 3 10:39:39 svc.startd[8]: svc:/system/cluster/scmountdev:default: Method "/usr/cluster/lib/svc/method/scmountdev start" failed with exit status 1.
Nov 3 10:39:39 svc.startd[8]: system/cluster/scmountdev:default failed: transitioned to maintenance (see 'svcs -xv' for details)
Nov 3 10:39:43 svc.startd[8]: svc:/application/management/seaport:default: Method "/usr/lib/sma_snmp/setseaport" failed with exit status 1.
Nov 3 10:39:43 svc.startd[8]: svc:/application/management/seaport:default: Method "/usr/lib/sma_snmp/setseaport" failed with exit status 1.
Nov 3 10:39:43 svc.startd[8]: svc:/application/management/seaport:default: Method "/usr/lib/sma_snmp/setseaport" failed with exit status 1.
Nov 3 10:39:43 svc.startd[8]: application/management/seaport:default failed: transitioned to maintenance (see 'svcs -xv' for details)
Nov 3 10:39:45 svc.startd[8]: svc:/system/consadm:default: Method "/lib/svc/method/svc-consadm" failed with exit status 1.
Nov 3 10:39:45 svc.startd[8]: svc:/system/consadm:default: Method "/lib/svc/method/svc-consadm" failed with exit status 1.
Nov 3 10:39:45 svc.startd[8]: svc:/system/consadm:default: Method "/lib/svc/method/svc-consadm" failed with exit status 1.
……
Nov 3 10:39:46 svc.startd[8]: system/cluster/bootcluster:default failed: transitioned to maintenance (see 'svcs -xv' for details)
Nov 3 10:39:47 svc.startd[8]: svc:/system/cvc:default: Method "/lib/svc/method/svc-cvcd" failed with exit status 96.
Nov 3 10:39:47 svc.startd[8]: system/cvc:default misconfigured: transitioned to maintenance (see 'svcs -xv' for details)
checking ufs filesystems
/dev/md/rdsk/d240: is logging.
……
Nov 3 10:40:02 inetd[370]: Property exec for method inetd_start of instance svc:/network/rpc/rusers:default is invalid
Nov 3 10:40:02 inetd[370]: Invalid configuration for instance svc:/network/rpc/rusers:default, placing in maintenance
Nov 3 10:40:02 inetd[370]: Property exec for method inetd_start of instance svc:/network/rpc/spray:default is invalid
Nov 3 10:40:12 hhzdb2 svc.startd[8]: application/graphical-login/cde-login:default failed: transitioned to maintenance (see 'svcs -xv' for details)
Nov 3 10:40:12 hhzdb2 svc.startd[8]: svc:/application/management/common-agent-container-1:default: Method "/usr/lib/cacao/lib/tools/scripts/cacao_smf start default" failed with exit status 1.
Nov 3 10:40:12 hhzdb2 svc.startd[8]: application/management/common-agent-container-1:default failed: transitioned to maintenance (see 'svcs -xv' for details)
Nov 3 10:40:12 hhzdb2 last message repeated 2 times
Nov 3 10:40:12 hhzdb2 svc.startd[8]: application/management/common-agent-container-1:default failed: transitioned to maintenance (see 'svcs -xv' for details)
Nov 3 10:40:12 hhzdb2 svc.startd[8]: svc:/system/webconsole:console: Method "/lib/svc/method/svc-webconsole start" failed with exit status 95.
Nov 3 10:40:12 hhzdb2 svc.startd[8]: system/webconsole:console failed fatally: transitioned to maintenance (see 'svcs -xv' for details)
Nov 3 10:40:12 hhzdb2 svc.startd[8]: system/webconsole:console failed fatally: transitioned to maintenance (see 'svcs -xv' for details)
Nov 3 10:40:12 hhzdb2 svc.startd[8]: svc:/system/basicreg:default: Method "/usr/sbin/sconadm register -c -m autoreg" failed with exit status 1.
Nov 3 10:40:12 hhzdb2 svc.startd[8]: system/basicreg:default failed: transitioned to maintenance (see 'svcs -xv' for details)
Nov 3 10:40:12 hhzdb2 last message repeated 2 times
Nov 3 10:40:12 hhzdb2 svc.startd[8]: system/basicreg:default failed: transitioned to maintenance (see 'svcs -xv' for details)
INIT: Command is respawning too rapidly. Check for possible errors.
id: h1 "/etc/init.d/init.evmd run >/dev/null 2>&1 </dev/null"
INIT: hhzdb2 console login: Command is respawning too rapidly. Check for possible errors.
id: h2 "/etc/init.d/init.cssd fatal >/dev/null 2>&1 </dev/null"
INIT: Command is respawning too rapidly. Check for possible errors.
id: h3 "/etc/init.d/init.crsd run >/dev/null 2>&1 </dev/null"
hhzdb2 console login: root
Password:
Nov 3 10:40:30 hhzdb2 login: ROOT LOGIN /dev/console
Last login: Tue Nov 3 09:18:05 on console
-sh: /bin/cat: not found
-sh: /bin/mail: not found
Sourcing //.profile-EIS.....
root@hhzdb2 #
Diagnosis showed that a large number of Solaris services failed to start and went into maintenance, so manual intervention was needed.
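To get a quick list of the affected services from a saved copy of the console output, the maintenance transitions can be pulled out with awk. This is only a sketch: the function name `maintenance_services` and the log path in the usage comment are my own assumptions, not part of the original session. On the live node, `svcs -xv` (the command the log messages themselves point to) is the authoritative view.

```shell
# maintenance_services LOGFILE
# Print the unique service instances that svc.startd reported as
# "transitioned to maintenance" in a saved console or syslog file.
# Hypothetical helper, not part of Solaris.
maintenance_services() {
    awk '/transitioned to maintenance/ {
        # The instance name is the field right after "svc.startd[PID]:".
        for (i = 1; i < NF; i++)
            if ($i ~ /^svc\.startd\[/) { print $(i + 1); break }
    }' "$1" | sort -u
}

# Usage (the path is an assumption; use wherever the output was saved):
#   maintenance_services /var/adm/messages
```

Lines written by inetd are skipped on purpose, because they use a different message layout.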
III. Fault recovery
1. First attempt to restore the SMF service configuration
root@hhzdb2 # /lib/svc/bin/restore_repository
See http://sun.com/msg/SMF-8000-MY for more information on the use of
this script to restore backup copies of the smf(5) repository.
If there are any problems which need human intervention, this script will
give instructions and then exit back to your shell.
Note that upon full completion of this script, the system will be rebooted
using reboot(1M), which will interrupt any active services.
/lib/svc/bin/restore_repository: /bin/sed: not found
/lib/svc/bin/restore_repository: /bin/ls: not found
There are no available backups of /etc/svc/repository.db.
The only available repository is "-seed-". Note that restoring the seed
will lose all customizations, including those made by the system during
the installation and/or upgrade process.
Enter -seed- to restore from the seed, or -quit- to exit:
/lib/svc/bin/restore_repository: test: argument expected
root@hhzdb2 # ls /bin | grep ls
/bin: No such file or directory
root@hhzdb2 #
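On this first attempt, the restore script could not find any backups of the SMF repository. The errors show why: /bin/sed and /bin/ls were not found because the /bin directory itself was missing. On Solaris 10, /bin is a symbolic link to /usr/bin, so it can be recreated by hand.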
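Since the script died on `/bin/sed` and `/bin/ls`, the helpers it needs could be checked before running it. A minimal sketch: the function name `missing_tools` is hypothetical, and the tool list in the usage comment comes only from the two errors in the transcript above, not from a full reading of the script.

```shell
# missing_tools PATH...
# Print every given path that is not an executable file.
# No output means all of them were found.
missing_tools() {
    for tool in "$@"; do
        [ -x "$tool" ] || echo "$tool"
    done
}

# Usage on the broken node (list taken from the errors above):
#   missing_tools /bin/sed /bin/ls
```

With /bin missing, both paths would be reported, which points at the directory rather than at the individual binaries.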
2. Recreate the /bin link manually
root@hhzdb2 # ln -s /usr/bin /bin
root@hhzdb2 # ls -atl | grep bin
lrwxrwxrwx 1 root root 8 Nov 3 10:52 bin -> /usr/bin
drwxr-xr-x 7 root bin 5632 Dec 4 2007 lib
drwxr-xr-x 2 root sys 1024 Dec 1 2007 sbin
root@hhzdb2 #
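The link step above could be wrapped in a small guard so the link is only created when nothing already sits at that path. This is a sketch under my own assumptions; `ensure_link` is a hypothetical name, and the tests are limited to `-h`, `-d` and `-f` so that an older Bourne shell should accept them.

```shell
# ensure_link TARGET LINK
# Create the symbolic link LINK -> TARGET only if LINK does not exist yet.
# An existing link, directory or file is left untouched.
ensure_link() {
    target=$1
    link=$2
    if [ -h "$link" ] || [ -d "$link" ] || [ -f "$link" ]; then
        echo "exists: $link"
        return 0
    fi
    ln -s "$target" "$link" && echo "created: $link -> $target"
}

# Usage matching the step above:
#   ensure_link /usr/bin /bin
```

The guard matters because running `ln -s /usr/bin /bin` a second time, once /bin exists, would create a stray link inside /usr/bin instead of failing.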
3. Restore the SMF configuration and reboot the host
root@hhzdb2 # /lib/svc/bin/restore_repository
See http://sun.com/msg/SMF-8000-MY for more information on the use of
this script to restore backup copies of the smf(5) repository.
If there are any problems which need human intervention, this script will
give instructions and then exit back to your shell.
Note that upon full completion of this script, the system will be rebooted
using reboot(1M), which will interrupt any active services.
The following backups of /etc/svc/repository.db exist, from
oldest to newest:
manifest_import-20071130_235016
manifest_import-20071201_001428
manifest_import-20071201_012445
boot-20090816_131855
boot-20090831_094035
boot-20091103_091605
boot-20091103_103938
The backups are named based on their type and the time what they were taken.
Backups beginning with "boot" are made before the first change is made to
the repository after system boot. Backups beginning with "manifest_import"
are made after svc:/system/manifest-import:default finishes its processing.
The time of backup is given in YYYYMMDD_HHMMSS format.
Please enter either a specific backup repository from the above list to
restore it, or one of the following choices:
CHOICE ACTION
---------------- ----------------------------------------------
boot restore the most recent post-boot backup
manifest_import restore the most recent manifest_import backup
-seed- restore the initial starting repository (All
customizations will be lost, including those
made by the install/upgrade process.)
-quit- cancel script and quit
Enter response [boot]: boot-20090816_131855
After confirmation, the following steps will be taken:
svc.startd(1M) and svc.configd(1M) will be quiesced, if running.
/etc/svc/repository.db
-- renamed --> /etc/svc/repository.db_old_20091103_105407
/etc/svc/repository-boot-20090816_131855
-- copied --> /etc/svc/repository.db
and the system will be rebooted with reboot(1M).
Proceed [yes/no]? yes
Quiescing svc.startd(1M) and svc.configd(1M): done.
/etc/svc/repository.db
-- renamed --> /etc/svc/repository.db_old_20091103_105407
/etc/svc/repository-boot-20090816_131855
-- copied --> /etc/svc/repository.db
The backup repository has been successfully restored.
Rebooting in 5 seconds.
Nov 3 10:54:19 hhzdb2 reboot: rebooted by root
Nov 3 10:54:19 hhzdb2 rpcbind: rpcbind terminating on signal.
Nov 3 10:54:19 hhzdb2 syslogd: going down on signal 15
Nov 3 10:54:19 hhzdb2 rpcbind: rpcbind terminating on signal.
syncing file systems... done
rebooting...
Resetting ...
Software Reset
Enabling system bus....... Done
Initializing CPUs......... Done
Initializing boot memory.. Done
Initializing OpenBoot
Probing system devices
Probing I/O buses
Probing system devices
Probing I/O buses
Sun Fire V490, No Keyboard
Copyright 2007 Sun Microsystems, Inc. All rights reserved.
OpenBoot 4.22.34, 8192 MB memory installed, Serial #72019670.
Ethernet address 0:14:4f:4a:ee:d6, Host ID: 844aeed6.
Rebooting with command: boot
Boot device: /pci@9,600000/SUNW,qlc@2/fp@0,0/disk@w2100001862805323,0:a File and args:
SunOS Release 5.10 Version Generic_118833-36 64-bit
Copyright 1983-2006 Sun Microsystems, Inc. All rights reserved.
Use is subject to license terms.
Hostname: hhzdb2
Nov 3 10:55:26 /usr/lib/snmp/snmpdx: can't open the file
Nov 3 10:55:26 /usr/lib/snmp/snmpdx: can't open the file
Loading smf(5) service descriptions: 1/1
Booting as part of a cluster
NOTICE: CMM: Node hhzdb1 (nodeid = 1) with votecount = 1 added.
NOTICE: CMM: Node hhzdb2 (nodeid = 2) with votecount = 1 added.
NOTICE: CMM: Quorum device 1 (/dev/did/rdsk/d4s2) added; votecount = 1, bitmask of nodes with configured paths = 0x3.
NOTICE: clcomm: Adapter ce3 constructed
NOTICE: clcomm: Path hhzdb2:ce3 - hhzdb1:ce3 being constructed
NOTICE: clcomm: Adapter ce2 constructed
NOTICE: clcomm: Path hhzdb2:ce2 - hhzdb1:ce2 being constructed
NOTICE: CMM: Node hhzdb2: attempting to join cluster.
NOTICE: clcomm: Path hhzdb2:ce3 - hhzdb1:ce3 being initiated
NOTICE: clcomm: Path hhzdb2:ce2 - hhzdb1:ce2 being initiated
NOTICE: CMM: Node hhzdb1 (nodeid: 1, incarnation #: 1253030763) has become reachable.
NOTICE: clcomm: Path hhzdb2:ce2 - hhzdb1:ce2 online
NOTICE: CMM: Cluster has reached quorum.
NOTICE: CMM: Node hhzdb1 (nodeid = 1) is up; new incarnation number = 1253030763.
NOTICE: CMM: Node hhzdb2 (nodeid = 2) is up; new incarnation number = 1257216936.
NOTICE: CMM: Cluster members: hhzdb1 hhzdb2.
NOTICE: CMM: node reconfiguration #2 completed.
NOTICE: CMM: Node hhzdb2: joined cluster.
ip: joining multicasts failed (1 on clprivnet0 - will use link layer broadcasts for multicast
checking ufs filesystems
NOTICE: clcomm: Path hhzdb2:ce3 - hhzdb1:ce3 online
/dev/md/rdsk/d240: is logging.
hhzdb2 console login: obtaining access to all attached disks
Nov 3 10:55:48 hhzdb2 sendmail[448]: My unqualified host name (hhzdb2) unknown; sleeping for retry
Nov 3 10:55:48 hhzdb2 sendmail[449]: My unqualified host name (hhzdb2) unknown; sleeping for retry
Nov 3 10:55:49 hhzdb2 Cluster.Framework: stdout: resetting scsi buses shared with non-cluster nodes
Nov 3 10:55:54 hhzdb2 xntpd[565]: xntpd 3-5.93e+sun 03/08/29 16:23:05 (1.4)
Nov 3 10:55:54 hhzdb2 xntpd[565]: tickadj = 5, tick = 10000, tvu_maxslew = 495, est. hz = 100
Nov 3 10:55:54 hhzdb2 xntpd[565]: using kernel phase-lock loop 0041, drift correction 0.00000
Nov 3 10:55:55 hhzdb2 xntpd[565]: using kernel phase-lock loop 0041, drift correction 43.32100
starting NetWorker daemons:
nsrexecd
nsrd
Nov 3 10:56:23 hhzdb2 root: Sun StorEdge(TM) Enterprise Backup server: (notice) started
Nov 3 10:56:24 hhzdb2 root: S99sneep:root: Chassis Serial not available from system eeprom
Nov 3 10:56:24 hhzdb2 root: S99sneep:root: Repair Chassis Serial with /opt/SUNWsneep/bin/sneep
Nov 3 10:56:31 hhzdb2 Cluster.scdpmd: The status of device: /dev/did/rdsk/d6s0 is set to MONITORED
Nov 3 10:56:31 hhzdb2 Cluster.scdpmd: The state of the path to device: /dev/did/rdsk/d6s0 has changed to OK
Nov 3 10:56:31 hhzdb2 Cluster.scdpmd: The status of device: /dev/did/rdsk/d7s0 is set to MONITORED
Nov 3 10:56:31 hhzdb2 Cluster.scdpmd: The state of the path to device: /dev/did/rdsk/d7s0 has changed to OK
Nov 3 10:56:31 hhzdb2 Cluster.scdpmd: The status of device: /dev/did/rdsk/d4s0 is set to MONITORED
Nov 3 10:56:31 hhzdb2 Cluster.scdpmd: The state of the path to device: /dev/did/rdsk/d4s0 has changed to OK
Nov 3 10:56:33 hhzdb2 root: Oracle Cluster Ready Services starting up automatically.
Nov 3 10:56:33 hhzdb2 Cluster.RGM.rgmd: CMM: Node hhzdb1 (nodeid: 1, incarnation #: 1253030901) has become reachable.
Nov 3 10:56:33 hhzdb2 Cluster.RGM.rgmd: CMM: Cluster has reached quorum.
Nov 3 10:56:33 hhzdb2 Cluster.RGM.rgmd: CMM: Node hhzdb1 (nodeid = 1) is up; new incarnation number = 1253030901.
Nov 3 10:56:33 hhzdb2 Cluster.RGM.rgmd: CMM: Node hhzdb2 (nodeid = 2) is up; new incarnation number = 1257216993.
Nov 3 10:56:33 hhzdb2 root: Oracle Cluster Ready Services waiting for SunCluster and UDLM to start.
Nov 3 10:56:34 hhzdb2 Cluster.RGM.rgmd: launching method <bin/rac_framework_boot> for resource <rac-framework-rs>, resource group <rac-rg>, timeout <900> seconds
Nov 3 10:56:40 hhzdb2 Cluster.RGM.rgmd: method <bin/rac_framework_boot> completed successfully for resource <rac-framework-rs>, resource group <rac-rg>, time used: 0% of timeout <900 seconds>
Nov 3 10:56:40 hhzdb2 Cluster.RGM.rgmd: launching method <bin/rac_framework_start> for resource <rac-framework-rs>, resource group <rac-rg>, timeout <600> seconds
Nov 3 10:56:40 hhzdb2 Cluster.OPS.UCMMD: CMM: Node hhzdb1 (nodeid: 1, incarnation #: 1253030963) has become reachable.
Nov 3 10:56:42 hhzdb2 ID[SUNWudlm.udlm]: Unix DLM version (2) and SUN Unix DLM library version (1): compatible.
Nov 3 10:56:42 hhzdb2 Cluster.OPS.UCMMD: CMM: Cluster has reached quorum.
Nov 3 10:56:42 hhzdb2 Cluster.OPS.UCMMD: CMM: Node hhzdb1 (nodeid = 1) is up; new incarnation number = 1253030963.
Nov 3 10:56:42 hhzdb2 Cluster.OPS.UCMMD: CMM: Node hhzdb2 (nodeid = 2) is up; new incarnation number = 1257217000.
Nov 3 10:56:52 hhzdb2 Cluster.RGM.rgmd: method <bin/rac_framework_start> completed successfully for resource <rac-framework-rs>, resource group <rac-rg>, time used: 2% of timeout <600 seconds>
Nov 3 10:56:52 hhzdb2 Cluster.RGM.rgmd: launching method <bin/rac_framework_monitor_start> for resource <rac-framework-rs>, resource group <rac-rg>, timeout <3600> seconds
Nov 3 10:56:52 hhzdb2 Cluster.RGM.rgmd: launching method <bin/rac_svm_start> for resource <rac-svm-rs>, resource group <rac-rg>, timeout <600> seconds
Nov 3 10:56:52 hhzdb2 Cluster.RGM.rgmd: launching method <bin/rac_udlm_start> for resource <rac-udlm-rs>, resource group <rac-rg>, timeout <600> seconds
Nov 3 10:56:52 hhzdb2 Cluster.RGM.rgmd: method <bin/rac_framework_monitor_start> completed successfully for resource <rac-framework-rs>, resource group <rac-rg>, time used: 0% of timeout <3600 seconds>
Nov 3 10:56:53 hhzdb2 Cluster.RGM.rgmd: method <bin/rac_svm_start> completed successfully for resource <rac-svm-rs>, resource group <rac-rg>, time used: 0% of timeout <600 seconds>
Nov 3 10:56:53 hhzdb2 Cluster.RGM.rgmd: launching method <bin/rac_svm_monitor_start> for resource <rac-svm-rs>, resource group <rac-rg>, timeout <600> seconds
Nov 3 10:56:53 hhzdb2 Cluster.RGM.rgmd: method <bin/rac_udlm_start> completed successfully for resource <rac-udlm-rs>, resource group <rac-rg>, time used: 0% of timeout <600 seconds>
Nov 3 10:56:53 hhzdb2 Cluster.RGM.rgmd: launching method <bin/rac_udlm_monitor_start> for resource <rac-udlm-rs>, resource group <rac-rg>, timeout <600> seconds
Nov 3 10:56:53 hhzdb2 Cluster.RGM.rgmd: method <bin/rac_svm_monitor_start> completed successfully for resource <rac-svm-rs>, resource group <rac-rg>, time used: 0% of timeout <600 seconds>
Nov 3 10:56:53 hhzdb2 Cluster.RGM.rgmd: method <bin/rac_udlm_monitor_start> completed successfully for resource <rac-udlm-rs>, resource group <rac-rg>, time used: 0% of timeout <600 seconds>
hhzdb2 console login: root
Password:
Nov 3 10:57:14 hhzdb2 login: ROOT LOGIN /dev/console
Last login: Tue Nov 3 10:45:52 on console
Sun Microsystems Inc. SunOS 5.10 Generic January 2005
You have new mail.
Sourcing //.profile-EIS.....
root@hhzdb2 #
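A note on picking a backup in step 3: the script output says the names carry the backup type and a YYYYMMDD_HHMMSS timestamp. When comparing candidates, it can help to decode a name into something readable. The helper below is a sketch; `describe_backup` is a hypothetical name, and it only reformats the names exactly as they appear in the list above.

```shell
# describe_backup NAME
# Turn a repository backup name such as "boot-20090816_131855"
# into "<type> YYYY-MM-DD HH:MM:SS".
describe_backup() {
    printf '%s\n' "$1" | sed \
        's/^\(.*\)-\([0-9]\{4\}\)\([0-9]\{2\}\)\([0-9]\{2\}\)_\([0-9]\{2\}\)\([0-9]\{2\}\)\([0-9]\{2\}\)$/\1 \2-\3-\4 \5:\6:\7/'
}

# Usage:
#   describe_backup boot-20090816_131855
#   describe_backup manifest_import-20071130_235016
```

Decoded this way, the two newest boot backups in the list (20091103_091605 and 20091103_103938) both date from the day of the fault, and the later one matches the failed boot at 10:39. The session restored the older boot-20090816_131855 backup instead; the post does not say why that one was preferred over boot-20090831_094035.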