Chinaunix

Computing node suddenly lost network connection

Post #1, posted 2010-03-26 09:36
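Compute nodes in our scientific computing cluster often lose their network connection without warning. Does anyone know what causes this?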

kernel: bnx2: eth0 NIC Copper Link is Down
After a cluster reboot, /var/log/messages contains:

May 15 18:37:05 uranus mountd[4593]: Caught signal 15, un-registering and exiting.
May 15 18:37:17 uranus rpc.statd[4088]: Caught signal 15, un-registering and exiting.
May 15 18:39:47 uranus kernel: Memory for crash kernel (0x0 to 0x0) notwithin permissible range
May 15 18:39:54 uranus sshd[4143]: error: Bind to port 22 on 0.0.0.0 failed: Address already in use.
May 15 18:39:55 uranus xinetd[4155]: /etc/xinetd.d/RCS is not a regular file. It is being skipped.
May 15 18:40:00 uranus automount[3976]: lookup_mount: lookup(file): key "mysql" not found in map
May 15 18:40:00 uranus automount[3976]: lookup_mount: lookup(file): key "mysql" not found in map
May 15 18:40:12 uranus smartd[4824]: Problem creating device name scan list
May 15 18:50:34 compute-0-0.local rpc.statd[2609]: Caught signal 15, un-registering and exiting.
May 15 18:50:34 compute-0-2.local rpc.statd[2605]: Caught signal 15, un-registering and exiting.
May 15 18:50:34 compute-0-3.local rpc.statd[2595]: Caught signal 15, un-registering and exiting.
May 15 18:50:34 compute-0-1.local rpc.statd[2600]: Caught signal 15, un-registering and exiting.
May 15 18:53:50 compute-0-3.local kernel: Memory for crash kernel (0x0 to 0x0) notwithin permissible range
May 15 18:53:50 compute-0-3.local kernel: ata_piix 0000:00:1f.2: no available legacy port
May 15 18:53:50 compute-0-1.local kernel: Memory for crash kernel (0x0 to 0x0) notwithin permissible range
May 15 18:53:50 compute-0-2.local kernel: Memory for crash kernel (0x0 to 0x0) notwithin permissible range
May 15 18:53:51 compute-0-0.local kernel: Memory for crash kernel (0x0 to 0x0) notwithin permissible range
May 15 18:53:58 compute-0-3.local sshd[2827]: error: Bind to port 22 on 0.0.0.0 failed: Address already in use.
May 15 18:53:59 compute-0-2.local sshd[2850]: error: Bind to port 22 on 0.0.0.0 failed: Address already in use.
May 15 18:53:59 compute-0-1.local sshd[2852]: error: Bind to port 22 on 0.0.0.0 failed: Address already in use.
May 15 18:54:00 compute-0-0.local sshd[2848]: error: Bind to port 22 on 0.0.0.0 failed: Address already in use.
May 15 18:54:08 compute-0-3.local smartd[3073]: Problem creating device name scan list
May 15 18:54:09 compute-0-2.local smartd[3096]: Problem creating device name scan list
May 15 18:54:10 compute-0-1.local smartd[3100]: Problem creating device name scan list
May 15 18:54:11 compute-0-0.local smartd[3096]: Problem creating device name scan list
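To tell which nodes are actually losing link (as opposed to just going through the reboot messages above), it can help to pull the bnx2 link-down messages out of /var/log/messages and count them per host. Below is a minimal Python sketch, assuming the classic sysklogd line layout seen in these logs ("Mon DD HH:MM:SS host process[pid]: message", with the [pid] part optional). `parse_line` and `link_down_by_host` are hypothetical helper names, not part of any existing tool.

```python
import re
from collections import Counter
from typing import Iterable, NamedTuple, Optional

# Assumed sysklogd layout: "Mon DD HH:MM:SS host process[pid]: message".
# The "[pid]" part is optional (e.g. "kernel:" and "syslogd:" have none).
SYSLOG_RE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<proc>[^:\[\s]+)(?:\[(?P<pid>\d+)\])?: (?P<msg>.*)$"
)


class SyslogEntry(NamedTuple):
    month: str
    day: int
    time: str
    host: str
    proc: str
    pid: Optional[int]
    msg: str


def parse_line(line: str) -> Optional[SyslogEntry]:
    """Split one syslog line into fields; return None if it does not match."""
    m = SYSLOG_RE.match(line.strip())
    if m is None:
        return None
    pid = m.group("pid")
    return SyslogEntry(
        m.group("month"), int(m.group("day")), m.group("time"),
        m.group("host"), m.group("proc"),
        int(pid) if pid else None, m.group("msg"),
    )


def link_down_by_host(lines: Iterable[str]) -> Counter:
    """Count kernel 'Link is Down' messages (e.g. from bnx2) per host."""
    counts: Counter = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry and entry.proc == "kernel" and "Link is Down" in entry.msg:
            counts[entry.host] += 1
    return counts
```

On the head node this could be run as `link_down_by_host(open("/var/log/messages", errors="replace"))`, which returns a Counter keyed by hostname; lines that do not match the assumed layout are simply skipped.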

Post #2, posted 2010-03-26 09:47
Mar 25 11:01:05 play8dz kernel: bnx2: eth0 NIC Copper Link is Down
Mar 25 11:35:02 compute-0-0.local rpc.statd[4101]: Caught signal 15, un-registering and exiting.
Mar 25 11:35:12 compute-0-1.local automount[4002]: umount_autofs_indirect: ask umount returned busy /home
Mar 25 11:35:17 compute-0-1.local rpc.statd[3757]: Caught signal 15, un-registering and exiting.
Mar 25 11:38:18 compute-0-0.local kernel: Memory for crash kernel (0x0 to 0x0) notwithin permissible range
Mar 25 11:38:23 compute-0-0.local sshd[4133]: error: Bind to port 22 on 0.0.0.0 failed: Address already in use.
Mar 25 11:38:25 compute-0-0.local smartd[4314]: Problem creating device name scan list
Mar 25 11:45:20 compute-0-1.local kernel: Memory for crash kernel (0x0 to 0x0) notwithin permissible range
Mar 25 11:45:24 compute-0-1.local sshd[4357]: error: Bind to port 22 on 0.0.0.0 failed: Address already in use.
Mar 25 11:45:26 compute-0-1.local smartd[4540]: Problem creating device name scan list

Post #3, posted 2010-03-26 09:48
[root@compute-0-19 ~]# more /var/log/messages
Mar 23 15:37:20 compute-0-19 kernel: Memory for crash kernel (0x0 to 0x0) notwithin permissible range
Mar 23 15:37:20 compute-0-19 kernel: bnx2: eth0 NIC Copper Link is Down
Mar 23 15:37:23 compute-0-19 sshd[4356]: error: Bind to port 22 on 0.0.0.0 failed: Address already in use.
Mar 23 15:37:28 compute-0-19 ntpdate[4386]: no server suitable for synchronization found
Mar 23 15:37:29 compute-0-19 smartd[4539]: Problem creating device name scan list
Mar 23 15:48:40 compute-0-19 rockscommand[5849]: unknown roll name "%"
Mar 23 15:48:53 compute-0-19 rockscommand[5850]: unknown roll name "%"
Mar 25 11:02:10 compute-0-19 kernel: bnx2: eth0 NIC Copper Link is Down
Mar 25 15:53:38 compute-0-19 kernel: bnx2: eth0 NIC Copper Link is Down
Mar 25 15:58:28 compute-0-19 syslogd: sendto: Network is unreachable
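compute-0-19 logs repeated `bnx2: eth0 NIC Copper Link is Down` events, and the `syslogd: sendto: Network is unreachable` line suggests that forwarding log messages also fails while the link is down. One way to timestamp link flaps on the node itself is to poll the interface's carrier state. Below is a minimal Python sketch, assuming the node exposes the standard sysfs attribute `/sys/class/net/eth0/carrier` ("1" = link up, "0" = link down; reading it can fail while the interface is administratively down). `read_carrier` and `watch_carrier` are hypothetical names, and the `sleep` parameter exists only so the loop can be tested without real delays.

```python
import time
from pathlib import Path
from typing import Callable, Iterator, Optional, Tuple


def read_carrier(path: Path) -> Optional[bool]:
    """True if the carrier file reads "1", False otherwise, None if unreadable."""
    try:
        return path.read_text().strip() == "1"
    except OSError:
        # Missing file, or the read fails (e.g. interface admin-down).
        return None


def watch_carrier(
    path: Path,
    interval: float = 1.0,
    max_polls: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Tuple[float, Optional[bool]]]:
    """Poll `path` and yield (unix_time, state) whenever the state changes.

    The first reading is always yielded. Runs forever if max_polls is None.
    """
    last: object = object()  # sentinel that never equals a real state
    polls = 0
    while max_polls is None or polls < max_polls:
        state = read_carrier(path)
        if state != last:
            yield time.time(), state
            last = state
        polls += 1
        sleep(interval)
```

On an affected node, something like `for ts, up in watch_carrier(Path("/sys/class/net/eth0/carrier")): print(time.ctime(ts), up, flush=True)` would print one line per transition, which can then be lined up against the `Link is Down` timestamps in /var/log/messages.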