- 论坛徽章:
- 1
|
write(), thread safety, and POSIX
[Posted April 18, 2006 by corbet]
Dan Bonachea recently reported a problem. It seems that he has a program where multiple threads are simultaneously writing to the same file descriptor. Occasionally, some of that output disappears - overwritten by other threads. Random loss of output data is not generally considered to be a desirable sort of behavior, and, says Dan, POSIX requires that write() calls be thread-safe. So he would like to see this behavior fixed.
Andrew Morton quickly pointed out the source of this behavior. Consider how write() is currently implemented:
asmlinkage ssize_t sys_write(unsigned int fd, const char __user *buf,
size_t count)
{
struct file *file;
ssize_t ret = -EBADF;
int fput_needed;
file = fget_light(fd, &fput_needed);
if (file) {
loff_t pos = file_pos_read(file);
ret = vfs_write(file, buf, count, &pos);
file_pos_write(file, pos);
fput_light(file, fput_needed);
}
return ret;
}
There is no locking around this function, so it is possible for two (or more) threads performing simultaneous writes to obtain the same value for pos. They will each then write their data to the same file position, and the thread which writes last wins.
Putting some sort of lock (using the inode lock, perhaps) around the entire function would solve the problem and make write() calls thread-safe. The cost of this solution would be high, however: an extra layer of locking when almost no application actually needs it. Serializing write() operations in this way would also rule out simultaneous writes to the same file - a capability which can be useful to some applications.
So some developers have questioned whether this behavior should be fixed at all. It is not something which causes problems for over 99.9% of applications, and, for those which need to be able to perform this sort of simultaneous write, there are other options available. These include user-space locking or using the O_APPEND option. So, it is asked, why add unnecessary overhead to the kernel?
Linus responds that it is a "quality of implementation" issue, and that if there is a low-cost way of getting the system to behave the way users would like, it might as well be done. His proposal is to apply a lock to the file position in particular. His patch adds a f_pos_lock mutex to the file structure and uses that lock to serialize uses of and changes to the file position. This change will have the effect of serializing calls to write(), while leaving other forms (asynchronous I/O, pwrite()) unserialized.
The patch has not drawn a lot of comments, and it has not been merged as of this writing. Its ultimate fate will probably depend on whether avoiding races in this obscure case is truly seen to be worth the additional cost imposed on all users.
不知道这个问题后面怎么处理的。 |
|