Chinaunix

Title: NBU reports error 24 socket write failed, unable to back up

Author: wowerwo    Time: 2009-09-22 15:00
NetBackup 4.5, server side: tape backups fail with STATUS 24: socket write failed
The server runs on NT and the client runs on Solaris.

The problem report shows the following message:
Unable to write progress log </usr/openv/netbackup/logs/user_ops/dbext/logs/…………>

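The log cannot be written. What could be the cause? After NBU started failing I also noticed another problem: when the client was working normally, the generated logs had GROUP SYBASE, but the logs generated now have GROUP STAFF. What should I do? Any help would be appreciated.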

The bptm log contains the following:
10:19:04.453 [528.450] <2> bptm: INITIATING (VERBOSE = 0): -count -cmd -rt 6 -rn 0 -stunit backup-8mm2-robot-tl8-0 -den 17 -mt 2 -masterversion 451000
10:19:04.453 [528.450] <2> bptm: EXITING with status 0 <----------
10:19:04.500 [557.548] <2> bptm: INITIATING (VERBOSE = 0): -delete_expired
10:19:04.500 [557.548] <2> bptm: EXITING with status 0 <----------
10:19:17.984 [557.548] <2> bptm: INITIATING (VERBOSE = 0): -count -cmd -rt 6 -rn 0 -stunit backup-8mm2-robot-tl8-0 -den 17 -mt 2 -masterversion 451000
10:19:17.984 [557.548] <2> bptm: EXITING with status 0 <----------
10:19:18.093 [375.558] <2> bptm: INITIATING (VERBOSE = 0): -U
10:19:18.093 [375.558] <2> bptm: EXITING with status 0 <----------
10:19:20.203 [548.536] <2> bptm: INITIATING (VERBOSE = 0): -count -cmd -rt 6 -rn 0 -stunit backup-8mm2-robot-tl8-0 -den 17 -mt 2 -masterversion 451000
10:19:20.203 [548.536] <2> bptm: EXITING with status 0 <----------
10:24:30.328 [564.532] <2> bptm: INITIATING (VERBOSE = 0): -count -cmd -rt 6 -rn 0 -stunit backup-8mm2-robot-tl8-0 -den 17 -mt 2 -masterversion 451000
10:24:30.328 [564.532] <2> bptm: EXITING with status 0 <----------
10:29:04.750 [536.533] <2> bptm: INITIATING (VERBOSE = 0): -count -cmd -rt 6 -rn 0 -stunit backup-8mm2-robot-tl8-0 -den 17 -mt 2 -masterversion 451000
10:29:04.750 [536.533] <2> bptm: EXITING with status 0 <----------
10:29:04.796 [573.564] <2> bptm: INITIATING (VERBOSE = 0): -delete_expired
10:29:04.796 [573.564] <2> bptm: EXITING with status 0 <----------

[ Last edited by wowerwo on 2009-9-22 15:01 ]
Author: westlife521    Time: 2009-09-23 16:19
Exact Error Message
EXIT STATUS 24: socket write failed

Details:
Overview:
The transmission control protocol (TCP) network parameter tcp_ip_abort_interval may cause this error if it has been tuned incorrectly.

The tcp_ip_abort_interval is the total retransmission timeout value for a TCP connection in milliseconds. For a given TCP connection, if TCP has been retransmitting for tcp_ip_abort_interval period of time and it has not received any acknowledgment from the other endpoint during this period, TCP closes this connection. By default, the tcp_ip_abort_interval parameter is 480000 milliseconds (8 minutes).

Troubleshooting:
To obtain the current tcp_ip_abort_interval parameter value, the following command can be run.  This is an operating system command and will be found in one of the system directories, depending on the platform.  For example, /usr/sbin/ndd can be found on Solaris systems.

# ndd -get /dev/tcp tcp_ip_abort_interval

When tuning the tcp_ip_abort_interval, the following TCP network parameter values must also be taken into consideration:
tcp_rexmit_interval_initial: The initial retransmission timeout (RTO) value for a TCP connection in milliseconds. The default value is 3000 milliseconds (3 seconds).
tcp_rexmit_interval_min: The minimum retransmission timeout (RTO) value in milliseconds. The default value is 400 milliseconds.
tcp_rexmit_interval_max: The maximum retransmission timeout value (RTO) in milliseconds. The default value is 60000 milliseconds (60 seconds).
To obtain the above current TCP parameter values, the following commands can be run:
# ndd -get /dev/tcp tcp_rexmit_interval_initial
# ndd -get /dev/tcp tcp_rexmit_interval_min
# ndd -get /dev/tcp tcp_rexmit_interval_max
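
The four queries above can be collected in one pass. The following is a minimal sketch, assuming it is run as root on the Solaris client; the function name show_tcp_timers is hypothetical and only for illustration, while the ndd invocation is the one quoted above.

```shell
#!/bin/sh
# Print the four TCP timer parameters discussed above, one per line.
# ndd is Solaris-specific; elsewhere this only reports that ndd is missing.

show_tcp_timers() {
    for p in tcp_ip_abort_interval \
             tcp_rexmit_interval_initial \
             tcp_rexmit_interval_min \
             tcp_rexmit_interval_max
    do
        # Same command as in the text, applied to each parameter in turn.
        printf '%s = %s\n' "$p" "$(ndd -get /dev/tcp "$p")"
    done
}

if command -v ndd >/dev/null 2>&1; then
    show_tcp_timers
else
    echo "ndd not found; run this on the Solaris client"
fi
```

Printing the name next to each value makes it easier to compare the output with the defaults listed above.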

Log Files:  N/A

Resolution:
If the tcp_ip_abort_interval timer value is reduced to a value less than the tcp_rexmit_interval_max timer value or any other tcp_rexmit variable (shown above) then connections can get aborted. This is due to the tcp_ip_abort_interval timer expiring before the tcp_rexmit_interval_max (or other tcp_rexmit variable) timer is reached. When the tcp_ip_abort_interval timer value is reached, the TCP connection is closed (RESET signal).

The TCP connection reset will appear in the bpbkar log file as an "Errno = 32: Broken pipe" error message. This error message will then be followed by an "Exit status = 24: socket write failed" error message.

If the tcp_ip_abort_interval parameter value must be reduced, the value should be at least four times greater than the tcp_rexmit_interval_max parameter value, as recommended by Sun Microsystems. In addition, Sun Microsystems recommends that the tcp_rexmit_interval_max value be at least eight times the value of tcp_rexmit_interval_min.
It is important to note that the inetd process needs to be restarted after modifying these parameters. If this does not occur, the current tcp_rexmit parameter values will be retained.
The Sun Microsystems default TCP parameter values are adequate for the majority of servers and applications currently in use.
The default TCP parameter values should not be modified without adequate research and should follow Sun Microsystems recommendations.
Author: wowerwo    Time: 2009-09-23 16:59
I checked these; they are all the same.
Author: wowerwo    Time: 2009-09-23 17:00
# ndd -get /dev/tcp tcp_ip_abort_interval
48000
# ndd -get /dev/tcp tcp_rexmit_interval_initial
3000
# ndd -get /dev/tcp tcp_rexmit_interval_min
400
# ndd -get /dev/tcp tcp_rexmit_interval_max
60000
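
The values posted above can be compared against the two ratios quoted in the previous reply (abort interval at least four times tcp_rexmit_interval_max, and tcp_rexmit_interval_max at least eight times tcp_rexmit_interval_min). The following is a minimal sketch; check_tcp_timers is a hypothetical helper, and it assumes the 48000 shown above is the actual output rather than a typo for the quoted default of 480000.

```shell
#!/bin/sh
# check_tcp_timers ABORT MIN MAX   (all values in milliseconds)
# Hypothetical helper: applies the two ratio rules quoted in this thread.
check_tcp_timers() {
    abort=$1; min=$2; max=$3
    rc=0
    if [ "$abort" -lt $((max * 4)) ]; then
        echo "WARN: tcp_ip_abort_interval $abort is below 4 x tcp_rexmit_interval_max ($((max * 4)))"
        rc=1
    fi
    if [ "$max" -lt $((min * 8)) ]; then
        echo "WARN: tcp_rexmit_interval_max $max is below 8 x tcp_rexmit_interval_min ($((min * 8)))"
        rc=1
    fi
    if [ "$rc" -eq 0 ]; then
        echo "OK: timer ratios satisfy the quoted recommendations"
    fi
    return "$rc"
}

# Values as posted above: abort interval, rexmit min, rexmit max.
check_tcp_timers 48000 400 60000 || true
```

With the posted numbers, the abort interval (48000) is not only below four times tcp_rexmit_interval_max (240000) but below tcp_rexmit_interval_max itself (60000), which is the condition the quoted note describes as able to abort connections. Whether that is the cause here is not confirmed in this thread.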
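Author: 无牙    Time: 2009-09-23 21:55
Try changing the permissions on the log directory to 777.
Author: wowerwo    Time: 2009-09-25 13:58
Originally posted by 无牙 on 2009-9-23 21:55
Try changing the permissions on the log directory to 777.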


Sorry, how do I change it?

# ls -l
total 4
drwxrwxrwx   2 sybase   sybase      2048 Sep 25 04:52 logs

[ Last edited by wowerwo on 2009-9-25 14:00 ]
Author: 无牙    Time: 2009-09-25 16:18
You need to change the permissions of /usr/openv/netbackup/logs/user_ops/dbext/logs.

chmod -R 777 /usr/openv/netbackup/logs/user_ops
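
Before or after applying the chmod, it may help to see which entries actually have the wrong group, since the original post mentions logs now being created with group STAFF instead of SYBASE. The following is a minimal sketch; list_wrong_group is a hypothetical helper, and the group name sybase is taken from the ls output earlier in the thread.

```shell
#!/bin/sh
# Directory and group are taken from this thread; override via environment if needed.
LOGDIR=${LOGDIR:-/usr/openv/netbackup/logs/user_ops}
EXPECTED_GROUP=${EXPECTED_GROUP:-sybase}

# list_wrong_group DIR GROUP
# Hypothetical helper: long-lists every entry under DIR whose group is not GROUP.
list_wrong_group() {
    find "$1" ! -group "$2" -exec ls -ld {} \;
}

if [ -d "$LOGDIR" ]; then
    ls -ld "$LOGDIR"                                # mode, owner and group of the top directory
    list_wrong_group "$LOGDIR" "$EXPECTED_GROUP"    # entries not in the expected group
else
    echo "$LOGDIR not found on this host"
fi
```

If this lists files in group staff, restoring the group with chgrp -R may be a narrower change than mode 777. Whether either change clears status 24 is not confirmed in this thread.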



