Got a case: a user dropped a database table of a few hundred MB. (Why it was dropped is not for us to ask.)
Nothing unusual, so the standard approach: first do an incomplete (point-in-time) restore to an auxiliary instance, then export the table and import it back.

1. Environment
Informix database, NetBackup 5.1, backups taken with ON-Bar (onbar).
1 master server (server6)
Several media servers (SSO, shared tape library; server1 and server4, mentioned below, are each both a client and a media server)

The user accidentally dropped a table of a few hundred MB from the db_glxt database on server4.

The plan is to use server1 as the auxiliary server and restore server4's instance onto it. Both server1 and server4 are production machines.
server4: servername ol_aa_yingye, database db_glxt
server1: servername ol_aa_sett, database db_sett

2. Prepare the Informix environment on server1
1) Create the chunks:
Create the files in the /data directory:
touch rlv_phychunk
touch rlv_logchunk01
touch rlv_logchunk02
touch rlv_logchunk03
touch rlv_logchunk04
touch rlv_logchunk05
touch rlv_logchunk06
...
touch rlv_datachunk15
touch rlv_datachunk16
touch rlv_datachunk17
touch rlv_rootchunk
Create symlinks (so the chunk paths match the original instance, whose chunks live under /informix/dbs_glxt/):
ln -s /data/rlv_phychunk /informix/dbs_glxt/phychk
ln -s /data/rlv_logchunk01 /informix/dbs_glxt/logchk1
ln -s /data/rlv_logchunk02 /informix/dbs_glxt/logchk2
ln -s /data/rlv_logchunk03 /informix/dbs_glxt/logchk3
ln -s /data/rlv_logchunk04 /informix/dbs_glxt/logchk4
ln -s /data/rlv_logchunk05 /informix/dbs_glxt/logchk5
ln -s /data/rlv_logchunk06 /informix/dbs_glxt/logchk6
...
ln -s /data/rlv_datachunk15 /informix/dbs_glxt/datachk15
ln -s /data/rlv_datachunk16 /informix/dbs_glxt/datachk16
ln -s /data/rlv_datachunk17 /informix/dbs_glxt/datachk17
ln -s /data/rlv_rootchunk /informix/dbs_glxt/rootchk

2) Copy the related files from the source instance:
server4:/informix/etc/ixbar.135 to server1:/informix/ixbar.135
server4:/informix/etc/oncfg_ol_aa_yingye.135 to server1:/informix/etc/oncfg_ol_aa_yingye.135
server4:/informix/etc/onconfig.glxt_p to server1:/informix/etc/onconfig.glxt_p

3. Prepare NetBackup:
1) To remove restrictions for all clients (so that server1 may restore backups that belong to server4), create the following file on the NetBackup master server:
/usr/openv/netbackup/db/altnames/No.Restrictions
2) Checked how the previous night's backup went.

4. su - informix, then export the following environment variables:
export ONCONFIG=onconfig.glxt_p
export INFORMIXSERVER=ol_aa_yingye
export INFXBSA_CLIENT=server4

5. Run the restore:
onbar -r -t '2008-12-30 8:0:0'
It failed, heh.
It got as far as mounting the media; along the way the status code was first 59, then 25.
148769 Restore Done 25 server1
148770 Restore Done 25 (this one actually reported 59 first, and only 25 at the end) server1

(NetBackup status code: 25
Message: cannot connect on socket
Explanation: A process timed out while connecting to another process for a particular operation. This can happen when a process tries to connect to the NetBackup request daemon (bprd) or the database manager daemon (bpdbm) and that daemon is not running. (On Windows, these daemons are the NetBackup Request Manager and NetBackup Database Manager services.) It can also happen when the network or server is heavily loaded and response times are long, or when the NetBackup evaluation license key has expired.
However, the most common cause of this error is a host name resolution problem.
NetBackup status code: 59
Message: access to the client was not allowed
Explanation: The master server or a media server tried to access the client, but the client does not recognize that server as a valid server.)

Both look like bp.conf problems, so I checked bp.conf on server1:
SERVER = server6
SERVER = server1
CLIENT_NAME = server1
CLIENT_READ_TIMEOUT = 7200
INFORMIX_HOME = /informix
MEDIA_UNMOUNT_DELAY = 200
MEDIA_REQUEST_DELAY = 30
REQUIRED_INTERFACE = server1

It looked fine. So on server1 I created /usr/openv/netbackup/logs/bpcd to turn on bpcd debug logging, ran the job once more, and spotted something suspicious in the log:
15:12:37.501 [25938] <2> bpcd main: setup_sockopts complete
15:12:37.512 [25938] <2> bpcd peer_hostname: Connection from host server4 (10.64.0.33) port 898
15:12:37.514 [25938] <2> bpcd valid_server: comparing server6 and server4
15:12:37.516 [25938] <2> bpcd valid_server: comparing server1 and server4
15:12:37.518 [25938] <16> bpcd valid_server: server4 is not a server
15:12:37.518 [25938] <16> bpcd valid_server: server4 is not a media server
15:12:37.519 [25938] <2> bpcd main: output socket port number = 628
15:12:37.519 [25938] <2> bpcd peer_hostname: Connection from host server4 (10.64.0.33) port 898
15:12:37.519 [25938] <2> bpcd main: Peer hostname is server4
15:12:37.519 [25938] <2> bpcd main: Got socket for output 5, lport = 866
15:12:37.520 [25938] <2> bpcd main: Connected on output socket
15:12:37.520 [25938] <2> bpcd main: Duplicated socket on stderr
Earlier in the log there is a section that checks server1 and server6, and both pass as valid_server, but here it says "server4 is not a server".

Suspicion: although the restore runs on server1, the backup was originally written with server4 as the media server, so the restore also has to use server4 as the media server.
Evidence for this appears further down (the server column of the bpdbjobs output).

Added one line to bp.conf on server1:
SERVER = server4

Ran onbar -r -t '2008-12-30 8:0:0' again: no problems this time, and it started reading data.

6. Monitor the restore progress until the restore completes:
bpdbjobs output:
148769 Restore Done 25 server1
148770 Restore Done 25 server1
148771 Restore Done 0 server1
148772 Restore Done 0 server1
148773 Restore Done 0 server1
148774 Restore Active server1

server1#[/usr/openv/netbackup/logs/bpcd]bpdbjobs -jobid 148771 -all_columns
148771,2,3,0,,,server1,server4,1230622460,0000000352,1230622812,,1,,74112,0,,100,24198,informix,,,,,informix,server6,,,,,,1,/ol_aa_yingye/rootdbs/0,1,24198,,,1230622460,0000000352,1230622812,0,the requested operation was successfully completed,12,12/30/08 15:34:20 - begin Restore operation,12/30/08 15:34:21 - 1 images required,12/30/08 15:34:21 - media F805L2 required,12/30/08 15:34:27 - started process bptm (15302),12/30/08 15:34:27 - mounting F805L2,12/30/08 15:34:30 - connected,12/30/08 15:35:24 - mounted; mount time: 000:00:57,12/30/08 15:35:25 - positioning F805L2 to file 13,12/30/08 15:36:32 - positioned; position time: 000:01:07,12/30/08 15:36:32 - begin reading,12/30/08 15:40:11 - end reading; read time: 000:03:39,12/30/08 15:40:12 - end Restore operation; operation time: 000:05:52,74112,0,,63,,,,,,,,server4,,,,,,,,,,,

server1#[/usr/openv/netbackup/logs/bpcd]bpdbjobs -jobid 148772 -all_columns
148772,2,3,0,,,server1,server4,1230622818,0000000022,1230622840,,1,,96,0,,100,27150,informix,,,,,informix,server6,,,,,,1,/ol_aa_yingye/logdbs/0,1,27150,,,1230622818,0000000022,1230622840,0,the requested operation was successfully completed,10,12/30/08 15:40:18 - begin Restore operation,12/30/08 15:40:20 - 1 images required,12/30/08 15:40:20 - media F805L2 required,12/30/08 15:40:34 - mounted,12/30/08 15:40:34 - positioning F805L2 to file 14,12/30/08 15:40:34 - positioned; position time: 000:00:00,12/30/08 15:40:34 - begin reading,12/30/08 15:40:38 - connected,12/30/08 15:40:38 - end reading; read time: 000:00:04,12/30/08 15:40:39 - end Restore operation; operation time: 000:00:21,96,0,,2823,,,,,,,,server4,,,,,,,,,,,

server1#[/usr/openv/netbackup/logs/bpcd]bpdbjobs -jobid 148773 -all_columns
148773,2,3,0,,,server1,server4,1230622893,0000000016,1230622909,,1,,96,0,,100,27519,informix,,,,,informix,server6,,,,,,1,/ol_aa_yingye/phydbs/0,1,27519,,,1230622893,0000000016,1230622909,0,the requested operation was successfully completed,10,12/30/08 15:41:33 - begin Restore operation,12/30/08 15:41:34 - 1 images required,12/30/08 15:41:34 - media F805L2 required,12/30/08 15:41:42 - mounted,12/30/08 15:41:42 - positioning F805L2 to file 15,12/30/08 15:41:42 - positioned; position time: 000:00:00,12/30/08 15:41:42 - begin reading,12/30/08 15:41:43 - connected,12/30/08 15:41:44 - end reading; read time: 000:00:02,12/30/08 15:41:49 - end Restore operation; operation time: 000:00:16,96,0,,2823,,,,,,,,server4,,,,,,,,,,,

server1#[/usr/openv/netbackup/logs/bpcd]bpdbjobs -jobid 148774 -all_columns
148774,2,1,,,,server1,server4,1230622926,0000007274,0000000000,,1,0,35192320,0,/ol_aa_yingye/datadbs1/0,0,27696,informix,,,,,informix,server6,,,,0,1,1,/ol_aa_yingye/datadbs1/0,1,27696,,,1230622926,0000007274,0000000000,,,11,12/30/08 15:42:06 - begin Restore operation,12/30/08 15:42:07 - 1 images required,12/30/08 15:42:07 - media F805L2 required,12/30/08 15:42:07 - media F815L2 required,12/30/08 15:42:20 - started process bptm (15585),12/30/08 15:42:20 - mounting F805L2,12/30/08 15:42:22 - connected,12/30/08 15:43:33 - mounted; mount time: 000:01:13,12/30/08 15:43:33 - positioning F805L2 to file 16,12/30/08 15:44:49 - positioned; position time: 000:01:16,12/30/08 15:44:49 - begin reading,35192320,0,,9532,,,,,,,,server4,,,,0,0,,,,,1,

So: client = server1, server = server4, master server = server6, and each storage space (rootdbs, logdbs, phydbs, datadbs1) gets its own restore job.
Once the restore finished, the NBU side of the work was done; exporting and importing the table with dbaccess is the user's job.

[ Last edited by 天涯明月刀 on 2009-1-23 15:58 ]
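The chunk preparation in step 2 1) is a lot of repetitive touch / ln -s typing. Below is a minimal shell sketch of the same thing, under stated assumptions: the link names follow the pattern visible in the post (rlv_logchunk01 becomes logchk1, rlv_datachunk15 becomes datachk15, rlv_rootchunk becomes rootchk), and the helper names link_name_for, make_chunk and setup_chunks are made up for illustration. The chunks hidden behind the post's "..." are deliberately not guessed; add them to CHUNKS yourself.

```shell
# Sketch only: recreate empty cooked-file chunks under DATA_DIR and symlink them
# to the paths the source instance used. Defaults are the paths from the post.
DATA_DIR=${DATA_DIR:-/data}
LINK_DIR=${LINK_DIR:-/informix/dbs_glxt}

# Chunk files named in the post. The post elides part of the list with "...";
# those names are NOT filled in here and must be added by hand.
CHUNKS="rlv_rootchunk rlv_phychunk
rlv_logchunk01 rlv_logchunk02 rlv_logchunk03
rlv_logchunk04 rlv_logchunk05 rlv_logchunk06
rlv_datachunk15 rlv_datachunk16 rlv_datachunk17"

# Hypothetical naming rule inferred from the post's examples:
# rlv_<kind>chunk<NN>  ->  <kind>chk<N>, with leading zeros dropped.
link_name_for() (
    base=${1#rlv_}                                          # e.g. logchunk01
    kind=${base%%chunk*}                                    # e.g. log
    num=$(printf '%s\n' "${base#*chunk}" | sed 's/^0*//')   # e.g. 1 (empty for rootchunk)
    printf '%schk%s\n' "$kind" "$num"
)

# Create one chunk file and its symlink. Plain ln -s (no -f) fails instead of
# silently replacing an existing path, which matters on a production box.
make_chunk() (
    file="$DATA_DIR/$1"
    link="$LINK_DIR/$(link_name_for "$1")"
    touch "$file" &&
    chmod 660 "$file" &&    # Informix chunks are normally informix:informix, mode 660
    ln -s "$file" "$link"
)

# Create every chunk in CHUNKS; stop at the first failure.
setup_chunks() {
    mkdir -p "$DATA_DIR" "$LINK_DIR" || return 1
    for c in $CHUNKS; do
        make_chunk "$c" || return 1
    done
}
```

Source the file and call setup_chunks as the informix user (or chown the files to informix:informix as root afterwards). Because make_chunk refuses to overwrite an existing link, rerunning it after a partial run reports the first name that already exists rather than clobbering it.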
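Steps 2 2) and 4 can be folded into a small pre-flight check, so onbar is not started with a missing boot file or with the wrong instance still in the environment. This is a hypothetical helper: the function names preflight and run_restore are invented, the file paths are exactly the ones listed in step 2 2), and defaulting INFORMIXDIR to /informix is an assumption based on INFORMIX_HOME in the bp.conf shown in step 5. The variables are assigned unconditionally on the assumption that the informix login on server1 may already point at its own instance (ol_aa_sett).

```shell
# Hypothetical pre-flight helper for the restore on server1 (run as informix).
INFORMIXDIR=${INFORMIXDIR:-/informix}
# Assigned unconditionally so values inherited from server1's own instance
# (ol_aa_sett) cannot leak into the restore.
ONCONFIG=onconfig.glxt_p
INFORMIXSERVER=ol_aa_yingye
INFXBSA_CLIENT=server4      # as in step 4: restore the backup images of client server4
export INFORMIXDIR ONCONFIG INFORMIXSERVER INFXBSA_CLIENT

# Report every file copied in step 2 2) that is not in place; fail if any is missing.
preflight() {
    rc=0
    for f in "$INFORMIXDIR/ixbar.135" \
             "$INFORMIXDIR/etc/oncfg_$INFORMIXSERVER.135" \
             "$INFORMIXDIR/etc/$ONCONFIG"; do
        if [ ! -f "$f" ]; then
            echo "missing: $f" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Point-in-time restore as in step 5; the time defaults to the post's value.
run_restore() {
    preflight && onbar -r -t "${1:-2008-12-30 8:0:0}"
}
```

The check only confirms that the files exist. It cannot tell whether the onconfig's chunk paths match the symlinks from step 2 1).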
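The diagnosis in step 5 (find the host that bpcd rejected, then check whether the client's bp.conf lists it as a SERVER) can be sketched as follows. The function names are hypothetical. The only format assumptions are the ones visible in the post: the bpcd debug line "valid_server: server4 is not a server" and the "SERVER = host" lines in bp.conf. The sketch only reports; adding the SERVER line stays a manual edit.

```shell
# Print each host that bpcd refused, from debug-log lines such as
# "... bpcd valid_server: server4 is not a server".
rejected_servers() {
    sed -n 's/.*valid_server: \([^ ]*\) is not a server.*/\1/p' "$1" | sort -u
}

# Succeed if bp.conf contains a "SERVER = <host>" entry for the host.
# The host name is used inside a regular expression, so a dot matches any char.
bpconf_has_server() {
    grep -Eq "^[[:space:]]*SERVER[[:space:]]*=[[:space:]]*$2[[:space:]]*\$" "$1"
}

# usage: check_servers <bp.conf> <bpcd debug log>
check_servers() {
    for h in $(rejected_servers "$2"); do
        if bpconf_has_server "$1" "$h"; then
            echo "$h: listed in $1"
        else
            echo "$h: missing, add 'SERVER = $h' to $1"
        fi
    done
}
```

On server1 it would be run against /usr/openv/netbackup/bp.conf and the day's log file under /usr/openv/netbackup/logs/bpcd. With the post's log and bp.conf it reports server4 as missing, which is the line the author then added.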
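The raw bpdbjobs -all_columns lines in step 6 are hard to read. A rough awk summary follows. The field positions are read off the post's own output: field 7 is server1 (the client), field 8 is server4 (the media server), field 26 is server6 (the master), and field 10 is the elapsed time in seconds (352 for job 148771, matching "operation time: 000:05:52"). Treat these positions as an assumption and check them against the bpdbjobs reference for your NetBackup version. The state mapping (3 = done, 1 = active) likewise only covers the two values that appear in the post.

```shell
# Summarise `bpdbjobs -all_columns` output (from files or stdin), one line per job.
# Field numbers are inferred from the output in the post:
#   $1 job id, $3 state, $4 status, $7 client, $8 media server,
#   $10 elapsed seconds, $26 master server, $33 first entry in the file list.
summarize_jobs() {
    awk -F, '{
        state = ($3 == 3) ? "done" : (($3 == 1) ? "active" : ("state=" $3))
        status = ($4 == "") ? "-" : $4
        printf "%s %s status=%s client=%s server=%s master=%s elapsed=%ds file=%s\n",
            $1, state, status, $7, $8, $26, $10, $33
    }' "$@"
}
```

For example, `bpdbjobs -jobid 148771 -all_columns | summarize_jobs` condenses the first long line above to one line showing client=server1, server=server4 and the storage space /ol_aa_yingye/rootdbs/0.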