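My development board runs five applications: one server and four clients. The four client processes talk to the server over sockets, and the server maintains an sqlite3 database. I measure CPU usage with a tool that reads "/proc/stat" once per second. At 100% CPU usage the system still works and the applications are fully functional, although that may only be because I have not tested for long; in any case nothing goes wrong in a two-hour run, the system just feels very slow. Once all five processes are running, CPU usage is 100%. If I kill any one client, CPU usage immediately drops below 30%, and if I restart that client, it goes straight back to 100%. I profiled the server process with gprof while the CPU was at 100% and got the following data: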
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  Ts/call  Ts/call  name
100.00      6.10     6.10                             __libc_check_standard_fds
  0.00      6.10     0.00 33577915     0.00     0.00  getApp
  0.00      6.10     0.00 16788588     0.00     0.00  LEAVE_CRITICAL
  0.00      6.10     0.00 16788587     0.00     0.00  ENTER_CRITICAL
......
It looks like __libc_check_standard_fds is the function using the most CPU.
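For context on the 100% figure: a once-per-second CPU percentage can be derived from the aggregate "cpu" line of /proc/stat roughly as follows. This is only a minimal sketch, not the actual tool I use. The names cpu_sample, parse_cpu_line, read_cpu_sample and cpu_usage_percent are made up for illustration, and counting iowait as idle time is an assumption.

```c
#include <stdio.h>
#include <string.h>

/* Cumulative tick counters taken from the aggregate "cpu" line.
   (Hypothetical type, for illustration only.) */
struct cpu_sample {
    unsigned long long busy;   /* everything except idle and iowait */
    unsigned long long total;  /* sum of all fields that were parsed */
};

/* Parse one "cpu user nice system idle [iowait irq softirq steal]" line.
   Returns 0 on success, -1 if the line is not the aggregate cpu line. */
int parse_cpu_line(const char *line, struct cpu_sample *out)
{
    unsigned long long v[8] = {0, 0, 0, 0, 0, 0, 0, 0};
    unsigned long long total = 0;
    int n, i;

    if (strncmp(line, "cpu ", 4) != 0)
        return -1;
    n = sscanf(line + 4, "%llu %llu %llu %llu %llu %llu %llu %llu",
               &v[0], &v[1], &v[2], &v[3], &v[4], &v[5], &v[6], &v[7]);
    if (n < 4)
        return -1;              /* need at least user, nice, system, idle */
    for (i = 0; i < n; i++)
        total += v[i];
    out->total = total;
    out->busy = total - (v[3] + v[4]);  /* assumption: iowait counts as idle */
    return 0;
}

/* Read the first line of /proc/stat and parse it. */
int read_cpu_sample(struct cpu_sample *out)
{
    char line[256];
    FILE *fp = fopen("/proc/stat", "r");

    if (fp == NULL)
        return -1;
    if (fgets(line, sizeof line, fp) == NULL) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    return parse_cpu_line(line, out);
}

/* Percentage of busy ticks between two samples. */
double cpu_usage_percent(const struct cpu_sample *prev,
                         const struct cpu_sample *cur)
{
    unsigned long long dt = cur->total - prev->total;

    if (dt == 0)
        return 0.0;
    return 100.0 * (double)(cur->busy - prev->busy) / (double)dt;
}
```

Calling read_cpu_sample() twice with sleep(1) in between and passing both samples to cpu_usage_percent() gives a system-wide figure. It does not say which process is responsible, so a per-process view (for example from top) is still needed to see whether the server or the clients are the ones spinning.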
My development environment:
Board: S3C2410 + Linux 2.6.18
Compiler: gcc-3.4.6
C library: glibc-2.3.6
I later downloaded the glibc-2.3.6 source code and found that LIBC_START_MAIN() calls __libc_check_standard_fds():
...
#ifndef SHARED
  /* Some security at this point. Prevent starting a SUID binary where
     the standard file descriptors are not opened. We have to do this
     only for statically linked applications since otherwise the dynamic
     loader did the work already. */
  if (__builtin_expect (__libc_enable_secure, 0))
    __libc_check_standard_fds ();
#endif
...
And this is the body of __libc_check_standard_fds():
void __libc_check_standard_fds (void)
{
  /* This is really paranoid but some people actually are. If /dev/null
     should happen to be a symlink to somewhere else and not the device
     commonly known as "/dev/null" we bail out. We can detect this with
     the O_NOFOLLOW flag for open() but only on some system. */
#ifndef O_NOFOLLOW
# define O_NOFOLLOW 0
#endif
  /* Check all three standard file descriptors. */
  check_one_fd (STDIN_FILENO, O_RDONLY | O_NOFOLLOW);
  check_one_fd (STDOUT_FILENO, O_RDWR | O_NOFOLLOW);
  check_one_fd (STDERR_FILENO, O_RDWR | O_NOFOLLOW);
}
Looking further into check_one_fd, I found that it contains an infinite loop, while (1):
/* Should other OSes (e.g., Hurd) have different versions which can
   be written in a better way? */
static void
check_one_fd (int fd, int mode)
{
  /* Note that fcntl() with this parameter is not a cancellation point. */
  if (__builtin_expect (__libc_fcntl (fd, F_GETFD), 0) == -1
      && errno == EBADF)
    {
      struct stat64 st;
      /* Something is wrong with this descriptor, it's probably not
         opened. Open /dev/null so that the SUID program we are
         about to start does not accidently use this descriptor. */
      int nullfd = open_not_cancel (_PATH_DEVNULL, mode, 0);
      /* We are very paranoid here. With all means we try to ensure
         that we are actually opening the /dev/null device and nothing
         else.
         Note that the following code assumes that STDIN_FILENO,
         STDOUT_FILENO, STDERR_FILENO are the three lowest file
         decsriptor numbers, in this order. */
      if (__builtin_expect (nullfd != fd, 0)
          || __builtin_expect (__fxstat64 (_STAT_VER, fd, &st), 0) != 0
          || __builtin_expect (S_ISCHR (st.st_mode), 1) == 0
#if defined DEV_NULL_MAJOR && defined DEV_NULL_MINOR
          || st.st_rdev != makedev (DEV_NULL_MAJOR, DEV_NULL_MINOR)
#endif
          )
        /* We cannot even give an error message here since it would
           run into the same problems. */
        while (1)
          /* Try for ever and ever. */
          ABORT_INSTRUCTION;
    }
}
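ABORT_INSTRUCTION is defined as empty.
So it looks like something goes wrong with the standard file descriptors: the fcntl system call fails on descriptors 0, 1 and 2. I kept tracing further
...
and at that point I gave up, haha.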
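One way to confirm or rule out the descriptor theory without touching glibc is to run the same test that check_one_fd performs (fcntl with F_GETFD, looking for EBADF) from application code. A minimal sketch follows; fd_is_open and count_closed_standard_fds are hypothetical helper names of my own, not glibc functions.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Same probe as check_one_fd: F_GETFD fails with EBADF when the
   descriptor is not open.
   Returns 1 if open, 0 if closed, -1 on any other error. */
int fd_is_open(int fd)
{
    if (fcntl(fd, F_GETFD) != -1)
        return 1;
    return (errno == EBADF) ? 0 : -1;
}

/* Count how many of stdin, stdout and stderr are currently closed. */
int count_closed_standard_fds(void)
{
    int fds[3] = { STDIN_FILENO, STDOUT_FILENO, STDERR_FILENO };
    int closed = 0;
    int i;

    for (i = 0; i < 3; i++)
        if (fd_is_open(fds[i]) == 0)
            closed++;
    return closed;
}
```

Logging count_closed_standard_fds() at the top of main() in the server and in each client would show whether any of them is started with 0, 1 or 2 closed. If it always reports 0, the branch containing the while (1) loop is never entered, and the gprof attribution deserves a second look.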
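One workaround I thought of is to sleep for a second or a few milliseconds inside the while (1) loop, which means modifying the glibc source. That would seem to treat the symptom rather than the cause, I don't know whether it would introduce other security problems, and it is a lot of work because the toolchain and environment would have to be rebuilt...
More importantly, I really want to know why this happens, and whether my analysis and tracing are actually correct.
I also took a rough look at SUID, but I really don't know where to start. I don't understand SUID/SGID programming well, and I need to solve this problem urgently.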
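On the SUID side, the quoted startup code only calls __libc_check_standard_fds() when __libc_enable_secure is non-zero. As far as I understand it (this is an assumption that should be verified against the glibc source), that flag reflects a process whose real and effective user or group IDs differ, which is the case for a SUID/SGID binary. A minimal sketch to check this from the application; ids_differ and running_with_elevated_ids are hypothetical names.

```c
#include <sys/types.h>
#include <unistd.h>

/* Pure comparison, split out so it can be checked in isolation. */
int ids_differ(uid_t ruid, uid_t euid, gid_t rgid, gid_t egid)
{
    return (ruid != euid) || (rgid != egid);
}

/* Returns 1 if this process runs with real and effective IDs that
   differ (typical for a SUID/SGID binary), 0 otherwise.
   Assumption: this approximates the condition behind
   __libc_enable_secure; it is not glibc's own code. */
int running_with_elevated_ids(void)
{
    return ids_differ(getuid(), geteuid(), getgid(), getegid());
}
```

If running_with_elevated_ids() returns 0 in the server, the secure-mode branch in LIBC_START_MAIN should not be taken at all under that assumption, which would again suggest that the time is being spent somewhere other than __libc_check_standard_fds.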
Anyway, feel free to chime in and share your thoughts.