cron执行scp脚本只能scp 9个文件,这是怎么回事?
本帖最后由 abcfy2 于 2013-05-19 00:04 编辑写了个备份mysql并scp远程备份的脚本,大致过程如下
#!/bin/bash
list="database1 database2 database3 ... database n"
for database in $list
do
mysql -u root -ppasswd $database > $database.sql
scp $database.sql 192.168.1.2:
done
单独执行没有任何问题,放在crontab执行发现只有9个文件scp过去了,最后几个全都收到了ssh认证失败的错误,这个是怎么回事?
ssh_exchange_identification: Connection closed by remote host^M
lost connection
database n 为啥多个空格
回复 2# wenhq
细节就不要在意了,只是描述一个大致的算法而已。实际中是用show databases;之后赋值给变量再for循环的 回复 2# wenhq
几乎已经可以确定是crontab后台执行引起的问题了,./mysql_backup.sh & 这样执行的话就收到了ssh_exchange的报错。
难道后台执行有什么玄机吗? 看了下远程服务器的auth.log日志,发现了这样的问题。
直接执行./mysql_backup.sh,日志中是这样的:
May 18 23:25:26 ok988com sshd: pam_sm_authenticate: username =
May 18 23:25:26 ok988com sshd: Passphrase file wrapped
May 18 23:25:27 ok988com sshd: pam_unix(sshd:session): session opened for user ok988 by (uid=0)
May 18 23:25:27 ok988com sshd: pam_unix(sshd:session): session closed for user ok988
May 18 23:25:34 ok988com sshd: pam_sm_authenticate: Called
May 18 23:25:34 ok988com sshd: pam_sm_authenticate: username =
May 18 23:25:34 ok988com sshd: Passphrase file wrapped
May 18 23:25:35 ok988com sshd: pam_unix(sshd:session): session opened for user ok988 by (uid=0)
May 18 23:25:35 ok988com sshd: pam_unix(sshd:session): session closed for user ok988
May 18 23:25:41 ok988com sshd: pam_sm_authenticate: Called
May 18 23:25:41 ok988com sshd: pam_sm_authenticate: username =
May 18 23:25:41 ok988com sshd: Passphrase file wrapped
May 18 23:25:42 ok988com sshd: pam_unix(sshd:session): session opened for user ok988 by (uid=0)
May 18 23:25:42 ok988com sshd: pam_unix(sshd:session): session closed for user ok988
May 18 23:25:48 ok988com sshd: pam_sm_authenticate: Called
May 18 23:25:48 ok988com sshd: pam_sm_authenticate: username =
May 18 23:25:48 ok988com sshd: Passphrase file wrapped
May 18 23:25:49 ok988com sshd: pam_unix(sshd:session): session opened for user ok988 by (uid=0)
May 18 23:25:49 ok988com sshd: pam_unix(sshd:session): session closed for user ok988
May 18 23:25:54 ok988com sshd: pam_sm_authenticate: Called
May 18 23:25:54 ok988com sshd: pam_sm_authenticate: username =
……
以下省略。
可以看到sshd的进程很有规律性,scp一个文件的时候打开ssh通道,传输完毕之后关闭这个通道,然后继续打开下一个ssh通道,scp传输下一个文件
crontab执行这个脚本的时候,auth.log是这样的。
May 17 04:00:08 ok988com sshd: pam_sm_authenticate: Called
May 17 04:00:08 ok988com sshd: pam_sm_authenticate: username =
May 17 04:00:08 ok988com sshd: Passphrase file wrapped
May 17 04:00:09 ok988com sshd: Accepted password for ok988 from 擦掉IP port 48714 ssh2
May 17 04:00:09 ok988com sshd: pam_unix(sshd:session): session opened for user ok988 by (uid=0)
May 17 04:00:09 ok988com sshd: Received disconnect from 擦掉IP: 11: disconnected by user
May 17 04:00:09 ok988com sshd: pam_unix(sshd:session): session closed for user ok988
May 17 04:00:15 ok988com sshd: pam_sm_authenticate: Called
May 17 04:00:15 ok988com sshd: pam_sm_authenticate: username =
May 17 04:00:15 ok988com sshd: Passphrase file wrapped
May 17 04:00:16 ok988com sshd: Accepted password for ok988 from 擦掉IP port 48718 ssh2
May 17 04:00:16 ok988com sshd: pam_unix(sshd:session): session opened for user ok988 by (uid=0)
May 17 04:00:16 ok988com sshd: Received disconnect from 擦掉IP: 11: disconnected by user
May 17 04:00:16 ok988com sshd: pam_unix(sshd:session): session closed for user ok988
May 17 04:00:24 ok988com sshd: pam_sm_authenticate: Called
May 17 04:00:24 ok988com sshd: pam_sm_authenticate: username =
May 17 04:00:24 ok988com sshd: Passphrase file wrapped
May 17 04:00:25 ok988com sshd: Accepted password for ok988 from 擦掉IP port 48720 ssh2
May 17 04:00:25 ok988com sshd: pam_unix(sshd:session): session opened for user ok988 by (uid=0)
May 17 04:00:25 ok988com sshd: Received disconnect from 擦掉IP: 11: disconnected by user
May 17 04:00:25 ok988com sshd: pam_unix(sshd:session): session closed for user ok988
May 17 04:00:31 ok988com sshd: pam_sm_authenticate: Called
May 17 04:00:31 ok988com sshd: pam_sm_authenticate: username =
May 17 04:00:31 ok988com sshd: Passphrase file wrapped
May 17 04:00:32 ok988com sshd: Accepted password for ok988 from 擦掉IP port 48726 ssh2
May 17 04:00:32 ok988com sshd: pam_unix(sshd:session): session opened for user ok988 by (uid=0)
May 17 04:00:32 ok988com sshd: Received disconnect from 擦掉IP: 11: disconnected by user
May 17 04:00:32 ok988com sshd: pam_unix(sshd:session): session closed for user ok988
May 17 04:00:38 ok988com sshd: pam_sm_authenticate: Called
May 17 04:00:38 ok988com sshd: pam_sm_authenticate: username =
May 17 04:00:38 ok988com sshd: Passphrase file wrapped
May 17 04:00:39 ok988com sshd: Accepted password for ok988 from 擦掉IP port 48727 ssh2
May 17 04:00:39 ok988com sshd: pam_unix(sshd:session): session opened for user ok988 by (uid=0)
May 17 04:00:39 ok988com sshd: Received disconnect from 擦掉IP: 11: disconnected by user
May 17 04:00:39 ok988com sshd: pam_unix(sshd:session): session closed for user ok988
May 17 04:00:44 ok988com sshd: pam_sm_authenticate: Called
May 17 04:00:44 ok988com sshd: pam_sm_authenticate: username =
May 17 04:00:44 ok988com sshd: Passphrase file wrapped
依然很有规律,但是确实只有9个sshd。后来不知道怎么就想到了多线程的问题,shell中实现多线程的方式是后台执行,隐隐约约记得好像crontab就是后台执行的,再一翻man手册,mansshd_config,发现有一个MaxStartup参数,定义了最大允许的远程ssh的数量,默认是10,刚好9个文件传输通道加上1个ssh登录认证通道,这不就刚好是10个么?
然后果断选择后台执行脚本./mysql_backup.sh,果然重现了这个问题,看来后台执行真的跟前台执行有很大的不同,后台执行的时候scp并没有把自己的ssh通道关闭(但是为什么会看到Received disconnect from 擦掉IP: 11: disconnected by user?百思不得其解),导致ssh通道占满了默认的10个,后面就再也没有ssh认证信息了,全部被拒绝了。 本帖最后由 wenhq 于 2013-05-19 08:04 编辑
client close connect 是 Tcpip 结束会话的第一步。
你可以用scp -v 看一下。并使用tcpdump 抓包看看,前台和后台的区别。
注在后台运行hou,手动ssh 登陆scp dest server 能上去么? 回复 6# wenhq
后台是可以上去的,不然也不会成功传输9个文件了。
发现后台跟前台运行在目标服务器上的auth.log显示是不一样的,还是不太明白为什么后台执行的时候对方的auth.log明明显示到了收到disconnect的消息,但是却没有真正断开SSH通道? 回复 7# abcfy2
TCP/IP断开连接是需要4次挥手。你看到client段发送断开连接请求,但是如果服务器端没有回应说我也发送完数据。那么这个连接是没有断开的。
你netstat -talntp|grep 22 输出看下,服务器端的状态. 本帖最后由 abcfy2 于 2013-05-19 10:13 编辑
回复 8# wenhq
这次真是奇怪了,用for循环netstat -anltp,发现这次的SSH端口是一次开放一个,断掉之后继续下一次循环,继续开启ssh端口,和前端执行完全一样,并没有重现那个问题
算了不折腾了,把scp放到for循环之外吧,每次循环的时候把文件名加入到文件列表中,执行一次scp就行了,这样就没有后顾之忧了。
回复 9# abcfy2
crontab 里的?