select and thread pools

#1 · posted 2006-04-15 13:52

I'm embarrassed to keep asking such basic questions.
I built a concurrent socket server: for every accepted connection it forks a new process.
But people tell me that's a poor design, because if 100 clients connect at the same time it could kill the program.
So they want me to use a thread pool that caps the number of connections at some maximum; anything over that limit waits.
But I've never done threads; I only know how to use fork.
They're all Java people and say they can't help me, so I have to figure it out myself.
I've never used select either. When I was reading, the book said:

int select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);

nfds: the number of file descriptors that select monitors. It depends on which descriptors the process has open, and is normally set to the highest-numbered descriptor among those being monitored, plus one.

So, can select completely replace the thread-pool technique?
Would it be all right if I used select instead of a thread pool?
I hope you can give me some pointers. Thanks.

#2 · posted 2006-04-15 14:54

You can refer to Unix™ Systems Programming: Communication, Concurrency, and Threads, Section 22.2, "Server Architectures". That section analyzes "process-per-request", "thread-per-request", and building a "worker pool", among other designs.

The electronic copy I have is in English.

Below is Section 22.2:

22.2 Server Architectures
Chapter 18 introduced three models of client-server communication: the serial-server (Example 18.2), the parent-server (Example 18.3), and the threaded-server (Example 18.6), respectively. Because the parent-server strategy creates a new child process to handle each client request, it is sometimes called process-per-request. Similarly, the threaded-server strategy creates a separate thread to handle each incoming request, so it is often called the thread-per-request strategy.

An alternative strategy is to create processes or threads to form a worker pool before accepting requests. The workers block at a synchronization point, waiting for requests to arrive. An arriving request activates one thread or process while the rest remain blocked. Worker pools eliminate creation overhead, but may incur extra synchronization costs. Also, performance is critically tied to the size of the pool. Flexible implementations may dynamically adjust the number of threads or processes in the pool to maintain system balance.

Example 22.1
In the simplest worker-pool implementation, each worker thread or process blocks on the accept function, similar to a simple serial server.

for (  ;  ; )   {
   accept request
   process request
}

Although POSIX specifies that accept be thread-safe, not all operating systems currently support thread safety. Alternatively, workers can block on a lock that provides exclusive access to accept, as the next example shows.

Example 22.2
The following worker-pool implementation places the accept function in a protected critical section so that only one worker thread or process blocks on accept at a time. The remaining workers block at the lock or are processing a request.

for (  ;  ; )  {
   obtain lock (semaphore or mutex)
      accept request
   release lock
   process request
}

POSIX provides semaphores for interprocess synchronization and mutex locks for synchronization within a process.
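
A minimal C sketch of this design (an illustration, not the book's code): NUM_WORKERS threads share one listening socket, and a mutex serializes the calls to accept. Here listenfd is assumed to have been set up elsewhere with socket/bind/listen, and handle_request is a hypothetical per-connection handler.

#include <pthread.h>
#include <sys/socket.h>
#include <unistd.h>

#define NUM_WORKERS 8

extern int listenfd;                        /* assumed: socket/bind/listen already done */
extern void handle_request(int fd);         /* hypothetical request handler */

static pthread_mutex_t accept_lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    for ( ; ; ) {
        pthread_mutex_lock(&accept_lock);            /* obtain lock */
        int commfd = accept(listenfd, NULL, NULL);   /* accept request */
        pthread_mutex_unlock(&accept_lock);          /* release lock */
        if (commfd < 0)
            continue;                                /* transient error: retry */
        handle_request(commfd);                      /* process request */
        close(commfd);
    }
    return NULL;
}

void start_pool(void)                       /* create the whole pool up front */
{
    pthread_t tids[NUM_WORKERS];
    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_create(&tids[i], NULL, worker, NULL);
    /* a real server would keep the thread handles and join or monitor them */
}

On a system whose accept is thread-safe, the lock can simply be omitted and each worker blocks directly in accept, which gives the Example 22.1 variant.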

Exercise 22.3
If a server uses N workers, how many simultaneous requests can it process? What is the maximum number of simultaneous client connections?

Answer:

The server can process N requests simultaneously. However, additional client connections can be queued by the network subsystem. The backlog parameter of the listen function provides a hint to the network subsystem on the maximum number of client requests to queue. Some systems multiply this hint by a fudge factor. If the network subsystem sets its maximum backlog value to B, a maximum of N + B clients can be connected to the server at any one time, although only N clients may be processed at any one time.

Another worker-pool approach for threaded servers uses a standard producer-consumer configuration in which the workers block on a bounded buffer. A master thread blocks on accept while waiting for a connection. The accept function returns a communication file descriptor. Acting as the producer, the master thread places the communication file descriptor for the client connection in the bounded buffer. The worker threads are consumers that remove file descriptors and complete the client communication.

The buffer implementation of the worker pool introduces some interesting measurement issues and additional parameters. If connection requests come in bursts and service time is short, buffering can smooth out responses by accepting more connections ahead than would be provided by the underlying network subsystem. On the other hand, if service time is long, accepted connections languish in the buffer, possibly triggering timeouts at the clients. The number of additional connections that can be accepted ahead depends on the buffer size and the order of the statements synchronizing communication between the master producer and the worker consumers.
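
A C sketch of this producer-consumer arrangement, using counting semaphores for slots and items (again an illustration under the same assumptions about listenfd and handle_request; error handling is omitted):

#include <pthread.h>
#include <semaphore.h>
#include <sys/socket.h>
#include <unistd.h>

#define BUFSIZE 16                          /* M: capacity of the fd buffer */

extern int listenfd;                        /* assumed: already listening */
extern void handle_request(int fd);         /* hypothetical handler */

static int buf[BUFSIZE];                    /* bounded buffer of descriptors */
static int in, out;                         /* producer / consumer positions */
static sem_t slots, items;                  /* init: slots = BUFSIZE, items = 0 */
static pthread_mutex_t buflock = PTHREAD_MUTEX_INITIALIZER;

static void *master(void *arg)              /* the producer */
{
    for ( ; ; ) {
        sem_wait(&slots);                           /* obtain a slot */
        int fd = accept(listenfd, NULL, NULL);      /* accept connection */
        pthread_mutex_lock(&buflock);
        buf[in] = fd;                               /* copy descriptor to slot */
        in = (in + 1) % BUFSIZE;
        pthread_mutex_unlock(&buflock);
        sem_post(&items);                           /* signal item */
    }
    return NULL;
}

static void *worker(void *arg)              /* a consumer */
{
    for ( ; ; ) {
        sem_wait(&items);                           /* obtain an item */
        pthread_mutex_lock(&buflock);
        int fd = buf[out];
        out = (out + 1) % BUFSIZE;
        pthread_mutex_unlock(&buflock);
        handle_request(fd);                         /* process the communication */
        close(fd);
        sem_post(&slots);                           /* signal slot */
    }
    return NULL;
}

Before starting the threads, the semaphores are initialized once with sem_init(&slots, 0, BUFSIZE) and sem_init(&items, 0, 0). Note that this sketch follows the ordering examined in Exercise 22.4 below: the master obtains a slot before calling accept, and each worker releases its slot only after processing.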

Exercise 22.4
How many connections ahead can be accepted for a buffer of size M with a master and N workers organized as follows?

Master:
   for (  ;  ;  ) {
      obtain a slot
      accept connection
      copy the file descriptor to slot
      signal item
    }

Worker:
   for (  ;  ; ) {
      obtain an item (the file descriptor)
      process the communication
      signal slot
   }

Answer:

If N >= M, then each worker holds a slot while processing the request, and the master cannot accept any connections ahead. For N < M the master can process M – N connections ahead.

Exercise 22.5
How does the following strategy differ from that of Exercise 22.4? How many connections ahead can be accepted for a buffer of size M with a master and N workers organized as follows?

Master:
   for (  ;  ;  ) {
      accept connection
      obtain a slot
      copy the file descriptor to slot
      signal item
   }

Worker:
   for (  ;  ;  ) {
      obtain an item (a file descriptor)
      signal slot
      process the communication
   }

Answer:

The strategy here differs from that of Exercise 22.4 in two respects. First, the master accepts a connection before getting a slot. Second, each worker thread immediately releases the slot (signal slot) after copying the communication file descriptor. In this case, the master can accept up to M+1 connections ahead.

Exercise 22.6
In what way do system parameters affect the number of connections that are made before the server accepts them?

Answer:

The backlog parameter set by listen determines how many connections the network subsystem queues. The TCP flow control mechanisms limit the amount that the client can send before the server calls accept for that connection. The backlog parameter is typically set to 100 or more for a busy server, in contrast to the old default value of 5 [115].
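
For illustration, requesting a larger backlog is a one-line choice when the listening socket is created (the value 128 here is just an example):

if (listen(listenfd, 128) == -1)    /* hint: queue up to 128 pending connections */
    perror("listen");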

Exercise 22.7
What a priori advantages and disadvantages do worker-pool implementations have over thread-per-request implementations?

Answer:

For short requests, the overhead of thread creation and buffer allocation can be significant in thread-per-request implementations. Also, these implementations do not degrade gracefully when the number of simultaneous connections exceeds system capacity—these implementations usually just keep accepting additional connections, which can result in system failure or thrashing. Worker-pool implementations save the overhead of thread creation. By setting the worker-pool size appropriately, a system administrator can prevent thrashing and crashing that might occur during busy times or during a denial-of-service attack. Unfortunately, if the worker-pool size is too low, the server will not run to full capacity. Hence, good worker-pool deployments need the support of performance measurements.

Exercise 22.8
Can the buffer-pool approach be implemented with a pool of child processes?

Answer:

The communication file descriptors are small integer values that specify position in the file descriptor table. These integers only have meaning in the context of the same process, so a buffer-pool implementation with child processes would not be possible.

In thread-per-request architectures, the master thread blocks on accept and creates a thread to handle each request. While the size of the pool limits the number of concurrent threads competing for resources in worker pool approaches, thread-per-request designs are prone to overallocation if not carefully monitored.

Exercise 22.9
What is a process-per-request strategy and how might it be implemented?

Answer:

A process-per-request strategy is analogous to a thread-per-request strategy. The server accepts a request and forks a child (rather than creating a thread) to handle it. Since the main thread does not fork a child to handle the communication until the communication file descriptor is available, the child inherits a copy of the file descriptor table in which the communication file descriptor is valid.
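
A minimal process-per-request accept loop in C (a sketch under the same assumptions as above; zombie reaping via SIGCHLD and error handling are omitted for brevity):

#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

extern int listenfd;                        /* assumed: already listening */
extern void handle_request(int fd);         /* hypothetical handler */

void serve_forever(void)
{
    for ( ; ; ) {
        int commfd = accept(listenfd, NULL, NULL);
        if (commfd < 0)
            continue;
        pid_t pid = fork();
        if (pid == 0) {                     /* child inherits a valid commfd */
            close(listenfd);                /* the child does not accept */
            handle_request(commfd);
            close(commfd);
            _exit(0);
        }
        close(commfd);                      /* parent keeps only listenfd */
    }
}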

The designs thus far have focused on the communication file descriptor as the principal resource. However, heavily used web servers are often limited by their disks, I/O subsystems and memory caches. Once a thread receives a communication file descriptor and is charged with handling the request, it must locate the resource on disk. This process may require a chain of disk accesses.

Example 22.10
The client request to retrieve /usp/exercises/home.html may require several disk accesses by the OS file subsystem. First, the file subsystem locates the inode corresponding to usp by reading the contents of the web server's root directory and parsing the information to find usp. Once the file subsystem has retrieved the inode for usp, it reads and parses data blocks from usp to locate exercises. The process continues until the file subsystem has retrieved the actual data for home.html. To eliminate some of these disk accesses, the operating system may cache inodes indexed by pathname.

To avoid extensive disk accesses to locate a resource, servers often cache the inode numbers of the most popular resources. Such a cache might be effectively managed by a single thread or be controlled by a monitor.

Disk accesses are usually performed through the I/O subsystem of the operating system. The operating system provides caching and prefetching of blocks. To eliminate the inefficiency of extra copying and blocking through the I/O subsystem, web servers sometimes cache their most popular pages in memory or in a disk area that bypasses the operating system file subsystem.

[ Last edited by westgarden on 2006-4-15 14:56 ]

#3 · posted 2006-04-15 15:00

Wow, I can't understand a single sentence of that.

#4 · posted 2006-04-15 15:35

"So they want me to use a thread pool that caps the number of connections at some maximum; anything over that limit waits."

My feeling is that even if you use select, you would still need condition variables or semaphores from the threading toolkit for this. You could use select plus multithreading, or a thread-pool technique.
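
For reference, a minimal single-threaded select loop looks roughly like this (a sketch only: it assumes a listening descriptor listenfd is already set up and simply echoes client data back; error handling is abbreviated):

#include <sys/types.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

extern int listenfd;                        /* assumed: already listening */

void select_loop(void)
{
    fd_set allset;
    FD_ZERO(&allset);
    FD_SET(listenfd, &allset);
    int maxfd = listenfd;

    for ( ; ; ) {
        fd_set rset = allset;               /* select modifies its argument */
        if (select(maxfd + 1, &rset, NULL, NULL, NULL) < 0)
            continue;

        if (FD_ISSET(listenfd, &rset)) {    /* a new connection arrived */
            int fd = accept(listenfd, NULL, NULL);
            if (fd >= 0) {
                FD_SET(fd, &allset);
                if (fd > maxfd)
                    maxfd = fd;
            }
        }

        for (int fd = 0; fd <= maxfd; fd++) {   /* data from existing clients */
            if (fd == listenfd || !FD_ISSET(fd, &rset))
                continue;
            char data[1024];
            ssize_t n = read(fd, data, sizeof data);
            if (n <= 0) {                   /* EOF or error: drop the client */
                close(fd);
                FD_CLR(fd, &allset);
            } else {
                write(fd, data, n);         /* e.g. echo it back */
            }
        }
    }
}

This handles many clients in one thread, but a single slow request stalls all of them, which is why you still want the limiting machinery above when requests take real work.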

#5 · posted 2006-04-15 16:22

So you're saying that select alone cannot replace a thread pool? Then I should build a thread pool after all?

But I don't really know how to implement one. It seems you keep an array holding the requests currently being processed; when a request finishes, it leaves and frees up its slot. Then every incoming request checks this array: if there is a free slot, it goes in; if not, it waits. Is my understanding right?

But there is still something I don't get: with a thread pool, could data be lost? My client just sends one string and leaves. How would a client ever be blocked waiting? The client doesn't care whether the server reads the data or not; it does one send in my direction and says bye-bye. Wouldn't that ruin everything?

#6 · posted 2006-04-16 10:53

Originally posted by westgarden on 2006-4-15 14:54:
"You can refer to Unix™ Systems Programming: Communication, Concurrency, and Threads, Section 22.2, 'Server Architectures'. That section analyzes 'process-per-request', 'thread-per-request', and ..."

Buddy, where can I find an electronic copy of this book? Please post a link. Thanks.

#7 · posted 2006-04-16 13:00

Search the web for "Linux环境下的通用线程池设计" (a general-purpose thread pool design for Linux); you should find a .doc article that explains it in pretty good detail.

#8 · posted 2006-04-16 15:40

Originally posted by mingjwan on 2006-4-16 10:53:
"Buddy, where can I find an electronic copy of this book? Please post a link. Thanks."

There is no Chinese electronic edition, only the English one.

Unix™ Systems Programming: Communication, Concurrency, and Threads
By Kay A. Robbins, Steven Robbins
The two authors are university professors, and they are a married couple.
The book is rated 4.5 stars on amazon.com; it is a very good textbook and technical reference.

It is rich in examples and code. Some people find the book long-winded on this point, but for beginners that "long-windedness" is exactly what is needed.

The following 4 eMule files (you can search for them yourself) are available:

eBook.Prentice.Hall.PTR.-.Unix.Systems.Programming.Second.Edition.ShareReactor.chm (2.51 MB)

Unix Systems Programming Communication Concurrency and Threads.chm (2.64 MB)

Prentice Hall - Unix Systems Programming - Communication Concurrency And Threads 2Nd Ed 2003.pdf (10.41 MB)

Prentice Hall - Unix Systems Programming Second.Edition.pdf (4.8 MB)

#9 · posted 2006-04-17 16:00

Reply to lishengxu (post #1)

You're not slow, brother; you've just had a few months less training than everyone else.
select and thread pools are basic tools of concurrent programming. select is covered in great detail in Stevens' UNIX Network Programming. For thread pools, besides the book recommended above, there is also Computer Systems: A Programmer's Perspective, which covers the basic principles and a code skeleton.
Using fork for concurrency is relatively resource-hungry; you can clearly feel it at around a hundred processes, whereas starting 1k pthreads is no problem at all.

#10 · posted 2006-04-18 19:21

Note: the author has been banned or deleted; the content was automatically hidden.