A baffling timeout problem with multithreaded recv/recvfrom on Linux

Post 1 | Posted 2010-06-10 09:40
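1. I have run into a baffling problem during development. The program collects information about the hosts on a target subnet from multiple threads.
   Every recvfrom and recv call has a timeout set. In normal runs there is no problem, but under stress testing recvfrom and recv stay blocked even after the timeout has expired and never return.
   Stranger still, the same program runs fine on Windows and never hangs; the problem only shows up on Linux.
2. Stack traces follow.
recv hang: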
Thread 5 (Thread -154604624 (LWP 19922)):
#0  0x00bbe7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x001fe168 in pthread_join () from /lib/tls/libpthread.so.0
#2  0xf6fa50b3 in ThreadOp::nodeSearchCallBack () from /home/caox/common_component/lib/libhrm_manager_net.so
#3  0x001fd1d5 in start_thread () from /lib/tls/libpthread.so.0
#4  0x00c9c2da in clone () from /lib/tls/libc.so.6

Thread 4 (Thread -165094480 (LWP 19923)):
#0  0x00bbe7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x00201a24 in sem_wait@GLIBC_2.0 () from /lib/tls/libpthread.so.0
#2  0xf6fef4ec in ?? () from /home/caox/common_component/lib/libhrm_manager_net.so
#3  0xf6fa48a4 in ThreadOp::threadJoinOp () from /home/caox/common_component/lib/libhrm_manager_net.so
#4  0x001fd1d5 in start_thread () from /lib/tls/libpthread.so.0
#5  0x00c9c2da in clone () from /lib/tls/libc.so.6

Thread 3 (Thread -177476688 (LWP 2062):
#0  0x00bbe7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x001fe168 in pthread_join () from /lib/tls/libpthread.so.0
#2  0xf6fa7be6 in ThreadOp::runNodeCallBack () from /home/caox/common_component/lib/libhrm_manager_net.so
#3  0x001fd1d5 in start_thread () from /lib/tls/libpthread.so.0
#4  0x00c9c2da in clone () from /lib/tls/libc.so.6

Thread 2 (Thread -177882192 (LWP 2063):
#0  0x00bbe7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x002027e8 in recv () from /lib/tls/libpthread.so.0
#2  0xf6f33465 in TELNET::recv_msg () from /home/caox/common_component/lib/libhrm_manager_net.so
#3  0xf6f3467b in TELNET::communicate () from /home/caox/common_component/lib/libhrm_manager_net.so
#4  0xf6f35528 in TELNET::getName_Via_Telnet () from /home/caox/common_component/lib/libhrm_manager_net.so
#5  0xf6f32a33 in TELNET::executeProtocol () from /home/caox/common_component/lib/libhrm_manager_net.so
#6  0xf6fa519e in ThreadOp::runProtocolCallBack () from /home/caox/common_component/lib/libhrm_manager_net.so
#7  0x001fd1d5 in start_thread () from /lib/tls/libpthread.so.0
#8  0x00c9c2da in clone () from /lib/tls/libc.so.6

Thread 1 (Thread -154601792 (LWP 19921)):
#0  0x00bbe7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x001fe168 in pthread_join () from /lib/tls/libpthread.so.0
#2  0xf6fa540a in ThreadOp::nodeSearch () from /home/caox/common_component/lib/libhrm_manager_net.so
#3  0xf6fa010c in NodeSearch::startSearch () from /home/caox/common_component/lib/libhrm_manager_net.so
#4  0xf6fa1b26 in NodeSearch::searchIPv4 () from /home/caox/common_component/lib/libhrm_manager_net.so
#5  0xf6fa2dd9 in NodeSearch::search () from /home/caox/common_component/lib/libhrm_manager_net.so
#6  0x080607b6 in TopologyDiscovery::Discovery ()
#7  0x08052238 in DiscoveryMain::run ()
#8  0x080526c8 in main ()
3. recvfrom hang:

Thread 6 (Thread -154604624 (LWP 24912)):
#0  0x00bbe7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x00202868 in recvfrom () from /lib/tls/libpthread.so.0
#2  0xf6f492b8 in ArpMonitor::monitorARP () from /home/caox/common_component/lib/libhrm_manager_net.so
#3  0x001fd1d5 in start_thread () from /lib/tls/libpthread.so.0
#4  0x00c9c2da in clone () from /lib/tls/libc.so.6

Thread 5 (Thread -165094480 (LWP 24913)):
#0  0x00bbe7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x001fe168 in pthread_join () from /lib/tls/libpthread.so.0
#2  0xf6fa50a7 in ThreadOp::nodeSearchCallBack () from /home/caox/common_component/lib/libhrm_manager_net.so
#3  0x001fd1d5 in start_thread () from /lib/tls/libpthread.so.0
#4  0x00c9c2da in clone () from /lib/tls/libc.so.6

Thread 4 (Thread -175584336 (LWP 24914)):
#0  0x00bbe7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x00201a24 in sem_wait@GLIBC_2.0 () from /lib/tls/libpthread.so.0
#2  0xf6fef4e4 in ?? () from /home/caox/common_component/lib/libhrm_manager_net.so
#3  0xf6fa4898 in ThreadOp::threadJoinOp () from /home/caox/common_component/lib/libhrm_manager_net.so
#4  0x001fd1d5 in start_thread () from /lib/tls/libpthread.so.0
#5  0x00c9c2da in clone () from /lib/tls/libc.so.6

Thread 3 (Thread -197637200 (LWP 26000)):
#0  0x00bbe7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x001fe168 in pthread_join () from /lib/tls/libpthread.so.0
#2  0xf6fa7bda in ThreadOp::runNodeCallBack () from /home/caox/common_component/lib/libhrm_manager_net.so
#3  0x001fd1d5 in start_thread () from /lib/tls/libpthread.so.0
#4  0x00c9c2da in clone () from /lib/tls/libc.so.6

Thread 2 (Thread -198177872 (LWP 26007)):
#0  0x00bbe7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x00202868 in recvfrom () from /lib/tls/libpthread.so.0
#2  0xf6f2df70 in ARP::ping_linux () from /home/caox/common_component/lib/libhrm_manager_net.so
#3  0xf6f2d3d5 in ARP::ping () from /home/caox/common_component/lib/libhrm_manager_net.so
#4  0xf6f2cc2c in ARP::executeProtocol () from /home/caox/common_component/lib/libhrm_manager_net.so
#5  0xf6fa5192 in ThreadOp::runProtocolCallBack () from /home/caox/common_component/lib/libhrm_manager_net.so
#6  0x001fd1d5 in start_thread () from /lib/tls/libpthread.so.0
#7  0x00c9c2da in clone () from /lib/tls/libc.so.6

Thread 1 (Thread -154601792 (LWP 24911)):
#0  0x00bbe7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x001fe168 in pthread_join () from /lib/tls/libpthread.so.0
#2  0xf6fa53fe in ThreadOp::nodeSearch () from /home/caox/common_component/lib/libhrm_manager_net.so
#3  0xf6fa0100 in NodeSearch::startSearch () from /home/caox/common_component/lib/libhrm_manager_net.so
#4  0xf6fa1b1a in NodeSearch::searchIPv4 () from /home/caox/common_component/lib/libhrm_manager_net.so
#5  0xf6fa2dcd in NodeSearch::search () from /home/caox/common_component/lib/libhrm_manager_net.so
#6  0x080607c6 in TopologyDiscovery::Discovery ()
#7  0x08052247 in DiscoveryMain::run ()
#8  0x080526d8 in main ()
(gdb)


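4. My colleagues and I then tried the following: we switched every socket to non-blocking mode and used select to implement the timeout. Even so, recvfrom and recv still sometimes hang after the timeout has expired.
5. Sometimes it even hangs inside getaddrinfo...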
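
The select-based timeout in item 4 can be sketched as follows. This is a minimal illustration in C, not the poster's code: `recv_deadline`, `set_nonblocking`, and `now_ms` are hypothetical helpers, and `poll` is used in place of `select` because `select` cannot safely handle descriptor numbers at or above `FD_SETSIZE`, which a stress test with many open sockets might reach. The sketch keeps one overall deadline, retries on `EINTR`, and treats `EAGAIN` after a readiness report as a reason to wait again rather than as data.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>

/* Milliseconds from a monotonic clock, unaffected by wall-clock changes. */
static long long now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Put the descriptor into non-blocking mode; returns 0 on success. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/*
 * Hypothetical helper: receive with one overall deadline.
 * Returns the byte count from recv(), or -1 with errno set
 * (ETIMEDOUT when the deadline passes without data).
 */
static ssize_t recv_deadline(int fd, void *buf, size_t len, int timeout_ms)
{
    assert(timeout_ms >= 0);
    long long deadline = now_ms() + timeout_ms;

    for (;;) {
        long long left = deadline - now_ms();
        if (left <= 0) {
            errno = ETIMEDOUT;
            return -1;
        }

        struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
        int rc = poll(&pfd, 1, (int)left);
        if (rc < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal: recompute and retry */
            return -1;
        }
        if (rc == 0)
            continue;       /* poll timed out; the deadline check ends the loop */

        ssize_t n = recv(fd, buf, len, 0);
        if (n >= 0)
            return n;
        if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
            continue;       /* readiness was spurious: wait again */
        return -1;
    }
}
```

On a non-blocking descriptor `recv` itself cannot sleep, so if gdb still shows a thread blocked inside `recv`, it may be worth confirming that `O_NONBLOCK` was really applied to that particular descriptor.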

Post 2 | Posted 2010-06-10 10:05
Sorry, I forgot to mention the Linux platforms: FC3 and FC9 both show similar problems. I hope the experts here can help discuss this.

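Post 3 | Posted 2010-06-10 10:08
Cross-platform sockets are hard to get right, or maybe I am just not skilled enough. I wrote a communication library that is meant to work on both Windows and Linux.
It works fine on Linux but has lots of problems on Windows.
Non-blocking recv works fine on Linux, so I suggest the OP start by checking the code.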

Post 4 | Posted 2010-06-10 10:16
I have walked through the code N times and still cannot spot anything myself... this is painful...

Post 5 | Posted 2010-07-27 15:28
Other than suspecting that the timeout setup in your code is wrong, I cannot think of any other cause.
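
One way to check the timeout setup is to set `SO_RCVTIMEO` and read it back. The following is a minimal sketch in C, assuming the timeouts are set with `setsockopt`; the thread does not show how the original code sets them, and `set_recv_timeout_ms` is a hypothetical helper. On Linux the option value is a `struct timeval`, whereas Winsock expects a `DWORD` of milliseconds, so code ported from Windows that passes an integer can fail here and leave the socket with no timeout at all if the return value is ignored.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/*
 * Hypothetical helper: set a receive timeout on Linux and verify
 * that the kernel accepted it. Returns 0 on success, -1 with errno
 * set on failure.
 */
static int set_recv_timeout_ms(int fd, long timeout_ms)
{
    assert(timeout_ms > 0);   /* zero would mean "block forever" */

    struct timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    /* The option value is a struct timeval here, not an integer. */
    if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv) != 0)
        return -1;

    /* Read the value back instead of assuming it was applied. */
    struct timeval got;
    socklen_t len = sizeof got;
    if (getsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &got, &len) != 0)
        return -1;

    /* A zero timeval means no timeout is in effect. */
    if (got.tv_sec == 0 && got.tv_usec == 0) {
        errno = EINVAL;
        return -1;
    }
    return 0;
}
```

With the timeout in place, a blocking `recv` that expires returns -1 with `errno` set to `EAGAIN` or `EWOULDBLOCK`, so the caller has to check for those values explicitly.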