[AIX] Sharing experience: recovering a machine that would not boot!

#1  Posted 2003-10-23 00:51
When an LV is used as a raw device, did you consider changing the permissions and owner of the r (character) device file, OP?

#2  Posted 2003-10-11 00:29


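This happened a few days ago.

Environment:
Hardware: two F85 servers + 7133 T40
Software: Oracle 8.1.7 OPS + HACMP 4.4 ES
OS: AIX 4.3.3 ML09, two concurrent VGs
The system is several years old and no system backup had been taken for a long time!

HACMP 4.4 patch level on both machines:
lslpp -l | grep cluster     highest level is 4.4.1.16
OS maintenance level:
instfix -i | grep ML        up to ML09 (an upgrade is needed)

HACMP on the two machines is configured with three resource groups:
sj85srv1
sj85srv2
sj85zcls

sj85srv1
node relationship                                 cascading
participating node names/default node priority    sj85_1  sj85_2
service ip label                                  sj85_1srv
application servers                               appsrv1

sj85srv2
node relationship                                 cascading
participating node names/default node priority    sj85_2  sj85_1
service ip label                                  sj85_2srv
application servers                               appsrv2

sj85zcls
node relationship                                 concurrent
participating node names/default node priority    sj85_1  sj85_2
service ip label                                  (empty)
application servers                               (empty)

As you can see, two of the resource groups handle IP address takeover and the third handles the volume groups.

A few days ago I had to add a data file. My plan was to add the LV directly through HACMP:
smitty hacmp ---> cluster system management
which should synchronize both nodes automatically. Afterwards the LV looked normal on one node, but on the other node
lsvg -l zdatavg1
showed ?? in the TYPE column, where it should have shown jfs. So I synchronized by hand:
synclvodm -v zdatavg1 lvname
After that the status was normal and the database could see the data file from both nodes. No data had been written to the data file at that point. A few days later a colleague said the data file could not be used from both nodes at the same time. I suspected this way of adding the LV was the problem and that we would probably have to fall back to the old method: create it on one node, then importvg on the other. I was about to leave on a business trip, so I asked a colleague to look into it and left.

When I came back a few days later nothing had changed. I thought the OS patch level might be too low and that a later upgrade would fix it, but we had no installation media at hand, so the priority was to get the database working. The plan:
1. Shut down the database
2. Shut down HA
3. varyonvg (not in -c mode)
4. Create the LV
5. varyoffvg
6. importvg on the other node, then varyon

Following this plan, step 1 was OK. At step 2, HA would not stop on one of the machines. I ran smitty clstop on each node separately, trying in order:
1. clstop graceful
2. clstop force
3. clstop force twice in a row, which usually does it

After trying 1, 2 and 3, HA still would not stop. I could not think of anything better, so I rebooted the machines. One came up; the other would not boot. After device detection and the white screen it showed 0518 and hung while mounting the file systems.

Idea:
Is it related to the other machine? Would a few more reboots get past it?

Tried both; neither worked.

Idea:
Repair the file systems from CD and replace the /etc/filesystems file.

The detailed steps are in the appendix. It still did not work! This was frustrating: the machine would not boot and there was no earlier backup. A colleague then suggested restoring the other machine's system image onto it. That seemed feasible, so we did it!

On the other machine: mksysb
Restore on this machine from tape. Before that, to avoid an IP address conflict, the machine's IP address was changed first with smitty chinet.
The restore went OK. Plan:
1. Change the hostname
2. Change the IP address
3. Synchronize HA
4. Test whether Oracle works
5. Stop HA
6. varyonvg normally
7. Create the LV
8. varyoffvg
9. importvg
10. Start the database and create a table as a test
11. lsnrctl start

Following this plan solved the problem, but not at the root: having to do this every time is not realistic. In a few days I will upgrade the system patches and see whether that helps!


Appendix:
1. /etc/hosts
10.64.60.3     sj85_2srv  sj85_2
10.10.10.2     sj85_2std
10.64.60.5     sj85_2boot

10.64.60.2     sj85_1srv  sj85_1
10.10.10.1     sj85_1std
10.64.60.4     sj85_1boot

2. Steps for resolving LED code 0518


Repairing File Systems with fsck in AIX V4 and V5 (LED 517 or 51
This document covers the use of the fsck (file system check) command in Maintenance mode to repair inconsistencies in file systems. The procedure described is useful when file system corruption in the primary root file systems is suspected or, in many cases, to correct an IPL hang at LED value 517, 518, or LED value 555.

This document applies to AIX V4 and V5.


--------------------------------------------------------------------------------

Recovery procedure
Boot your system into a limited function maintenance shell (Service, or Maintenance mode) from AIX bootable media to perform file system checks on your root file systems.
Please refer to your system user's or installation and service guide for specific IPL procedures related to your type and model of hardware. You can also refer to the document titled "Booting in Service Mode," available at http://techsupport.services.ibm.com/server/aix.techTips.

With bootable media of the same version and level as the system, boot the system.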
The bootable media can be any ONE of the following:
        Bootable CD-ROM
        NON_AUTOINSTALL mksysb
        Bootable Install Tape
Follow the screen prompts to the following menu:

   Welcome to Base Operating System
   Installation and Maintenance

Choose Start Maintenance Mode for System Recovery (Option 3).
The next screen displays the Maintenance menu.


Choose Access a Root Volume Group (Option 1).
The next screen displays a warning that indicates you will not be able to return to the Base OS menu without rebooting.


Choose 0 to continue.
The next screen displays information about all volume groups on the system.


Select the root volume group by number.

Choose Access this volume group and start a shell before mounting file systems (Option 2).
If you get errors from the preceding option, do not continue with the rest of this procedure. Correct the problem causing the error. If you need assistance correcting the problem causing the error, contact one of the following:

        local branch office
        your point of sale
        your AIX support center
If no errors occur, proceed with the following steps.


Run the following commands to check and repair file systems.
NOTE: The -y option gives fsck permission to repair file system corruption when necessary. This flag can be used to avoid having to manually answer multiple confirmation prompts; however, use of this flag can cause permanent, unnecessary data loss in some situations.

        fsck /dev/hd4
        fsck /dev/hd2
        fsck /dev/hd3
        fsck /dev/hd9var
        fsck /dev/hd1

To format the default jfslog for the rootvg Journaled File System (JFS) file systems, run the following command:
        /usr/sbin/logform /dev/hd8

Answer yes when asked if you want to destroy the log.
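
The fsck and logform commands listed above can be wrapped in a small loop. The following is only a sketch: the `run` helper, the `DRY_RUN` variable and the `check_rootvg` function name are made up for illustration and are not part of the IBM document. By default it prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch only: a dry-run wrapper around the fsck/logform steps quoted above.
# With DRY_RUN=1 (the default) commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Check the five default rootvg file systems, then reformat the JFS log.
# -y is deliberately not passed to fsck; see the NOTE about data loss.
check_rootvg() {
    for lv in hd4 hd2 hd3 hd9var hd1; do
        run fsck "/dev/$lv"
    done
    # logform asks for confirmation before it destroys the log.
    run /usr/sbin/logform /dev/hd8
}

check_rootvg
```

Setting DRY_RUN=0 in the maintenance shell would run the commands for real, one file system at a time, with fsck still prompting before each repair.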

If your system is hanging at LED 517 or 518 during a Normal mode boot, it is possible the /etc/filesystems file is corrupt or missing. To temporarily replace the disk-based /etc/filesystems file, run the following commands:
        mount /dev/hd4 /mnt
        mv /mnt/etc/filesystems /mnt/etc/filesystems.[MMDDYY]
        cp /etc/filesystems /mnt/etc/filesystems
        umount /mnt

MMDDYY represents the current two-digit representation of the Month, Day and Year, respectively.

Type exit to exit from the shell. The file systems should automatically mount after you type exit. If you receive error messages, reboot into a limited function maintenance shell again to attempt to address the failure causes.

If you have user-created file systems in the rootvg volume group, run fsck on them now. Enter:
        fsck /dev/[LVname]

LVname is the name of your user-defined logical volume.

If you used the preceding procedure to temporarily replace the /etc/filesystems file, and you have user-created file systems in the rootvg volume group, you must also run the following command:
        imfs -l /dev/[LVname]

If you have file systems in a volume group other than rootvg, run fsck on them now. Enter:
        varyonvg [VGname]
        fsck /dev/[LVname]

VGname is the name of your user-defined volume group.

If you used the preceding procedure to temporarily replace the /etc/filesystems file, also run the following command:
        imfs [VGname]

The preceding commands can be repeated for each user-defined volume group on the system.

If your system was hanging at LED 517 or 518 and you are unable to activate non-rootvg volume groups in Service mode, you can manually edit the /etc/filesystems file and add the appropriate entries.
The file /etc/filesystems.MMDDYY saved in the preceding steps may be used as a reference if it is readable.
However, the imfs method is preferred since it uses information stored in the logical volume control block to re-populate the /etc/filesystems file.

If your system has a mode select key, turn it to the Normal position.

Reboot the system into Normal mode using the following command:
        sync;sync;sync;reboot

If your system still halts at the LED 517 or 518 display, in many cases, it is faster and more cost-effective to reinstall from a recent system backup.
Attempting to isolate the cause of the problem can be very time-consuming and often results in the determination that a reinstall is required to correct the problem anyway.
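
The manual plan from the main post (normal varyonvg, create the LV, varyoffvg, then importvg on the other node) could be scripted roughly as below. This is a dry-run sketch, not the poster's actual commands: the LV name, its size, the hdisk name and the owner are placeholders, and the exportvg step is an assumption, since the post does not say how the stale VG definition on the second node was handled. The chown line follows the suggestion in reply #1 about the raw device file.

```shell
#!/bin/sh
# Dry-run sketch of the "create on one node, import on the other" plan.
# Commands are only printed unless DRY_RUN=0 is set.
DRY_RUN=${DRY_RUN:-1}
VG=${VG:-zdatavg1}           # VG name taken from the post
LV=${LV:-oradata_lv}         # hypothetical LV name
LPS=${LPS:-16}               # hypothetical size in logical partitions
PV=${PV:-hdisk2}             # hypothetical disk of $VG on the second node
OWNER=${OWNER:-oracle:dba}   # hypothetical owner of the raw device

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# First node, after the database and HACMP have been stopped.
create_lv_on_first_node() {
    run varyonvg "$VG"                 # normal varyon, not concurrent (-c)
    run mklv -y "$LV" "$VG" "$LPS"     # default LV type is jfs
    run varyoffvg "$VG"
}

# Second node: pick up the new LV definition.
import_on_second_node() {
    # Assumption: the old definition is dropped first; the post only
    # says "importvg".
    run exportvg "$VG"
    run importvg -y "$VG" "$PV"
    # Raw device ownership, as asked about in reply #1.
    run chown "$OWNER" "/dev/r$LV"
}

echo "# first node"
create_lv_on_first_node
echo "# second node"
import_on_second_node
```

Printing the plan first makes it easy to review the exact VG, LV and disk names with a colleague before anything touches the shared disks.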

#3  Posted 2003-10-11 00:30
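Steps for changing an IP address under HACMP
1. Identify the network adapter and the IP address to change
2. Stop HACMP
3. Update the corresponding IP address in /etc/hosts
4. smitty chinet: select the adapter and change its address
5. Change the IP address of the corresponding adapter in the HACMP configuration
6. A reboot may be needed. Done!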
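
Step 3 (editing /etc/hosts) is the one step here that is easy to script. Below is a minimal sketch, assuming the hosts file has the usual "address name [alias ...]" layout; the function name `set_host_ip` and the new address used in the usage note are made up for illustration. It keeps a .bak copy and rewrites only the line that carries the given label.

```shell
#!/bin/sh
# Sketch: change the address of one label in a hosts-style file.
# Usage: set_host_ip FILE LABEL NEW_IP
set_host_ip() {
    hosts_file=$1
    label=$2
    new_ip=$3
    tmp="$hosts_file.tmp.$$"

    # Keep a copy of the file as it was before the change.
    cp "$hosts_file" "$hosts_file.bak" || return 1

    awk -v label="$label" -v ip="$new_ip" '
        /^[ \t]*#/ { print; next }    # leave comment lines untouched
        {
            # Fields 2..NF are the host name and its aliases.
            for (i = 2; i <= NF; i++) {
                if ($i == label) { $1 = ip; break }
            }
            print
        }
    ' "$hosts_file.bak" > "$tmp" || return 1

    # Overwrite in place so ownership and permissions are kept.
    cat "$tmp" > "$hosts_file" && rm -f "$tmp"
}
```

For example, `set_host_ip /etc/hosts sj85_2std 10.10.10.12` would move the standby label from the post to a new (here invented) address; the HACMP adapter definition still has to be changed separately in step 5.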

#4  Posted 2003-10-11 00:44

Replacing a hard disk

A disk in one machine was reporting errors and had to be replaced.
#errpt |pg
pdisk4  0928190003    p h   disk operation error
#lsdev -Cc pdisk   to confirm the slot; the warning LED was not lit
14-08-3070-05-P
or use diag

smitty ssaraid
hdisk2   72G  raid10   good
hdisk5   36G  raid1   degraded    (not completely failed; only the protection level is reduced)

Check the RAID-1 state: change/show attribute of an ssa raid ---> ssa0 ---> hdisk5  degraded raid1

state  degraded
primary disks       BlankReservedoz
second disks       pdisk5
percent result       not rebuilding

The primary disk should be pdisk4, which is no longer working. Plan:
1. Replace the disk
2. Delete the device definition   rmdev -dl pdisk4
3. Refresh    cfgmgr
4. swap member of an ssaraid array
5. rebuilding

ok!
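
Steps 2 and 3 of the plan above are plain commands and can be sketched as a small function. The name `redefine_pdisk`, the `run` helper and the final lsdev check are additions for illustration only; the swap and rebuild in steps 4 and 5 still happen in smitty ssaraid. The sketch prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry-run sketch of steps 2 and 3: drop the failed pdisk definition and
# rediscover the replacement disk.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Usage: redefine_pdisk pdiskN
redefine_pdisk() {
    pdisk=$1
    # Refuse anything that is not a pdisk name, e.g. an hdisk array.
    case "$pdisk" in
        pdisk[0-9]*) ;;
        *) echo "usage: redefine_pdisk pdiskN" >&2; return 1 ;;
    esac
    run rmdev -dl "$pdisk"    # step 2: delete the old device definition
    run cfgmgr                # step 3: configure the replacement disk
    run lsdev -Cc pdisk       # added check: list pdisks afterwards
}

redefine_pdisk pdisk4
```

The name check is there because rmdev -dl on the wrong device (for instance the hdisk5 array itself) would be much harder to undo than a rejected argument.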

#5  Posted 2003-10-11 00:49

Adding disks, expanding capacity

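Steps for adding disks
1. Plan the RAID layout and decide whether the SSA loop needs to change
2. Install the disks
3. cfgmgr
4. Change the disk use: smitty ssaraid ---> change use of an ssa raid array ---> ssa1 ---> set the use to array candidate disk; four disks were added: 10, 11, 12, 13
5. Build the RAID: smitty ssaraid ---> add an ssa raid ---> raid1 ---> primary, secondary; setting enable use of hot spare to no makes this faster
6. Create the VG   smitty mkvg
7. On the other machine   cfgmgr
8. importvg

About step 4: when a disk is first added to the system its use is system disk. We change it to a usable free disk, and once it is added to the RAID array it becomes a member disk.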

#6  Posted 2003-10-11 10:42
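GOOD WORK

It has both the resolution steps and the reasoning, so people with similar environments can use it as a reference. I don't know AIX...

I changed your title a little. I think it reads better this way, so I took the liberty. It also follows the forum conventions now. It is a good post, after all...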

#7  Posted 2003-10-16 18:18

Upgrading the disk array adapter (SSA)

When expanding capacity, upgrade the SSA adapter microcode. The disks were very new and the array adapter was old, so I worried it might not recognize them. After cfgmgr the disks were recognized, and I had not planned to upgrade (it requires a reboot), but a senior engineer I consulted said the upgrade is a must: data has been lost in the past. In other words, nothing goes wrong when you create things, but data is inexplicably lost once the application is running. Rather scary!
1. Upload the SSA adapter microcode
2. Check the current microcode level. There are usually two adapters, sometimes one; check every one
[qth85_2][/]#lscfg -vl ssa0
  DEVICE            LOCATION          DESCRIPTION

  ssa0              27-08             IBM SSA 160 SerialRAID Adapter
                                      (14109100)

        Part Number................. 27H1204
        FRU Number.................. 34L5388
        Serial Number...............S1145311
        EC Level....................    E28793
        Manufacturer................IBM053
        ROS Level and ID............B300    0000
        Loadable Microcode Level....05
        Device Driver Level.........00
        Displayable Message.........SSA-ADAPTER
        Device Specific.(Z0)........SDRAM=128
        Device Specific.(Z1)........CACHE=32
        Device Specific.(Z2)........UID=0000006094BE1546
        Device Specific.(YL)........P1-I8/Q1
B300

The ROS Level and ID line is the one to look at.
3. Take a system backup with mksysb
4. The upgrade may go wrong, so first set all software already installed on the system to the commit state, so that if the upgrade fails we can return to the current point in time. During installation, set the install state to apply, and
SAVE replaced files? must be set to yes, otherwise the update cannot be rolled back.
Before installing, set PREVIEW only to yes for a trial run, and install for real once that succeeds!
5. After a successful install, reboot the machine! (required)
6. Check that the microcode upgrade succeeded   lscfg -vl ssa0    from B300 to BD00

Appendix:
1. View the state of installed software
smitty install ---> List Software and Related Information ---> List Software and Related Information ---> List Software and Related Information
2. Set all installed software to the commit state
smitty install ---> Software Maintenance and Utilities ---> Commit Applied Software Updates (Remove Saved Files) --->
  SOFTWARE name                                      [all]
  PREVIEW only? (commit operation will NOT occur)    yes
  COMMIT requisites?                                 no
  EXTEND file systems if space needed?               yes
  DETAILED output?                                   no
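
The before/after check in steps 2 and 6 comes down to reading one line of the lscfg listing. As a rough sketch, the level can be pulled out with sed; the function names `ros_level` and `check_ros_level` are invented for this example, and the parsing assumes the "ROS Level and ID....<level>" layout shown in the listing above.

```shell
#!/bin/sh
# Sketch: read the ROS (microcode) level from `lscfg -vl ssaN` output
# supplied on standard input.
ros_level() {
    sed -n 's/^[[:space:]]*ROS Level and ID\.*[[:space:]]*\([^[:space:]]*\).*$/\1/p'
}

# Compare the level found on standard input with the expected one.
# Usage: lscfg -vl ssa0 | check_ros_level BD00
check_ros_level() {
    expected=$1
    actual=$(ros_level)
    if [ "$actual" = "$expected" ]; then
        echo "microcode level $actual: ok"
    else
        echo "microcode level $actual: expected $expected"
        return 1
    fi
}

# Sample line copied from the lscfg listing in the post.
sample='        ROS Level and ID............B300    0000'
printf '%s\n' "$sample" | ros_level    # prints B300
```

With two adapters, the same check would be repeated for each one, for example `lscfg -vl ssa1 | check_ros_level BD00` after the reboot in step 5.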

#8  Posted 2003-10-16 18:20

Checking system serial numbers

Host
uname -a
Disks
diag ---> ssa service aids ---> set service mode
Note that the first two characters may be left out of what is displayed.

#9  Posted 2003-10-17 12:52
No need to thank me; doing maintenance grunt work is my job.

I don't understand it...

But I support it. Bump.

#10  Posted 2003-10-19 12:51
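Anyone who knows AIX, please take a look.

Thanks to hlj97tel for sharing this experience; I hope it is useful to other members.

I am about to mark this as a featured post, so please give some feedback on the content...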