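Compute nodes in our scientific computing cluster frequently lose their network connection without warning. Does anyone know what could be causing this? The kernel reports: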
kernel: bnx2: eth0 NIC Copper Link is Down
After a cluster boot, /var/log/messages contains:
May 15 18:37:05 uranus mountd[4593]: Caught signal 15, un-registering and exiting.
May 15 18:37:17 uranus rpc.statd[4088]: Caught signal 15, un-registering and exiting.
May 15 18:39:47 uranus kernel: Memory for crash kernel (0x0 to 0x0) notwithin permissible range
May 15 18:39:54 uranus sshd[4143]: error: Bind to port 22 on 0.0.0.0 failed: Address already in use.
May 15 18:39:55 uranus xinetd[4155]: /etc/xinetd.d/RCS is not a regular file. It is being skipped.
May 15 18:40:00 uranus automount[3976]: lookup_mount: lookup(file): key "mysql" not found in map
May 15 18:40:00 uranus automount[3976]: lookup_mount: lookup(file): key "mysql" not found in map
May 15 18:40:12 uranus smartd[4824]: Problem creating device name scan list
May 15 18:50:34 compute-0-0.local rpc.statd[2609]: Caught signal 15, un-registering and exiting.
May 15 18:50:34 compute-0-2.local rpc.statd[2605]: Caught signal 15, un-registering and exiting.
May 15 18:50:34 compute-0-3.local rpc.statd[2595]: Caught signal 15, un-registering and exiting.
May 15 18:50:34 compute-0-1.local rpc.statd[2600]: Caught signal 15, un-registering and exiting.
May 15 18:53:50 compute-0-3.local kernel: Memory for crash kernel (0x0 to 0x0) notwithin permissible range
May 15 18:53:50 compute-0-3.local kernel: ata_piix 0000:00:1f.2: no available legacy port
May 15 18:53:50 compute-0-1.local kernel: Memory for crash kernel (0x0 to 0x0) notwithin permissible range
May 15 18:53:50 compute-0-2.local kernel: Memory for crash kernel (0x0 to 0x0) notwithin permissible range
May 15 18:53:51 compute-0-0.local kernel: Memory for crash kernel (0x0 to 0x0) notwithin permissible range
May 15 18:53:58 compute-0-3.local sshd[2827]: error: Bind to port 22 on 0.0.0.0 failed: Address already in use.
May 15 18:53:59 compute-0-2.local sshd[2850]: error: Bind to port 22 on 0.0.0.0 failed: Address already in use.
May 15 18:53:59 compute-0-1.local sshd[2852]: error: Bind to port 22 on 0.0.0.0 failed: Address already in use.
May 15 18:54:00 compute-0-0.local sshd[2848]: error: Bind to port 22 on 0.0.0.0 failed: Address already in use.
May 15 18:54:08 compute-0-3.local smartd[3073]: Problem creating device name scan list
May 15 18:54:09 compute-0-2.local smartd[3096]: Problem creating device name scan list
May 15 18:54:10 compute-0-1.local smartd[3100]: Problem creating device name scan list
May 15 18:54:11 compute-0-0.local smartd[3096]: Problem creating device name scan list
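
None of the boot messages above mention the link itself, so it may help to pull only the NIC link events out of the full log and see which nodes drop, and when. Below is a rough sketch of how that could be done, assuming the log uses the same classic syslog layout as the lines above (timestamp, host, tag, message). The function names `link_events` and `down_counts` are made up for illustration.

```python
import re
from collections import Counter

# Classic syslog layout, as in the excerpt: "May 15 18:53:50 host tag[pid]: message"
SYSLOG_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+"
    r"(?P<tag>[^:\[\s]+)(?:\[\d+\])?:\s+"
    r"(?P<msg>.*)$"
)

# Kernel messages of the form "... Link is Down" / "... Link is Up, ..."
LINK_RE = re.compile(r"Link is (Up|Down)", re.IGNORECASE)


def link_events(lines):
    """Yield (timestamp, host, message) for kernel NIC link up/down messages."""
    for line in lines:
        m = SYSLOG_RE.match(line.strip())
        if m and m.group("tag") == "kernel" and LINK_RE.search(m.group("msg")):
            yield m.group("ts"), m.group("host"), m.group("msg")


def down_counts(lines):
    """Count 'Link is Down' events per host."""
    return Counter(
        host
        for _, host, msg in link_events(lines)
        if "link is down" in msg.lower()
    )
```

To use it, pass an open file, for example `down_counts(open("/var/log/messages"))`, and print the result. If the drops cluster on one node, the cable, switch port, or NIC on that node would be the first things to check. If they hit every node at the same time, the switch is the more likely suspect.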