Chinaunix

Title: Is the Linux write function thread-safe?

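Author: alwaysR9  Time: 2015-09-10 17:45
I ran two experiments:

In the first experiment, I created a local file and had 5 threads write to it. Content written earlier was overwritten by content written later; after I put a lock around write, the result was correct. This seems to confirm that write is not thread-safe.

In the second experiment, I created a client TCP socket and had 5 threads write to it. The server read the data and printed it, and the printed output matched what the client sent, with nothing abnormal. This seems to show that writing to a TCP socket is thread-safe.

My question is:
If write is not thread-safe, why does writing to a TCP socket work correctly? Is it because the system locks socket operations?

The experiment code is as follows: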

#include <unistd.h>
#include <errno.h>
#include <pthread.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <sys/types.h>
#include <sys/select.h>
#include <sys/stat.h>
#include <arpa/inet.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BUFF_SIZE 1024

pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

struct ThreadArg
{
    int id;
    int fd;
};

/* Each thread writes "thread_<id>\n" to the shared fd 5 times. */
void*
proc(void* arg)
{
    struct ThreadArg* p_arg = (struct ThreadArg*) arg;
    char msg[BUFF_SIZE];
    int n_msg;
    n_msg = snprintf(msg, BUFF_SIZE, "thread_%d\n", p_arg->id);
    int i;
    for (i = 0; i < 5; ++ i)
    {
        //pthread_mutex_lock(& mutex);   /* uncomment to serialize the writes */
        if (write(p_arg->fd, msg, n_msg) < 0)
            perror("write failed");      /* perror() does not expand %d */
        //pthread_mutex_unlock(& mutex);
    }
    return NULL;
}

/* Connect to port 9999 on the given IPv4 address; returns the fd or -1. */
int
open_socket(const char* ip)
{
    int connfd;
    struct sockaddr_in serv_addr;
    if ( (connfd = socket(AF_INET, SOCK_STREAM, 0)) == -1)
        return -1;
    memset(&serv_addr, 0, sizeof(serv_addr));
    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(9999);
    if (inet_pton(AF_INET, ip, &serv_addr.sin_addr) != 1 ||
        connect(connfd, (struct sockaddr*) &serv_addr, sizeof(serv_addr)) == -1)
    {
        close(connfd);   /* don't leak the socket on failure */
        return -1;
    }
    return connfd;
}

int
open_file(const char* file_name)
{
    return open(file_name, O_RDWR | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
}

int
main(int argc, char** argv)
{
    int fd;
    pthread_t tids[5];
    struct ThreadArg targ[5];
    //if ( (fd = open_socket("127.0.0.1")) < 0) // experiment 2
    //    exit(1);
    if ( (fd = open_file("data")) < 0) // experiment 1
        exit(1);
    /* start child threads; all of them share the same fd */
    int i;
    for (i = 0; i < 5; ++ i)
    {
        targ[i].id = i;
        targ[i].fd = fd;
        pthread_create(tids+i, NULL, proc, targ+i);
    }
    for (i = 0; i < 5; ++ i)
        pthread_join(tids[i], NULL);
    close(fd);
    exit(0);
}


The server program needed for experiment 2:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <sys/epoll.h>
#include <fcntl.h>
#include <errno.h>

#define MAX_EVENTS 1024
#define BUFF_SIZE 1024

void err_quit(const char* msg) {
    printf("%s, error code = %d\n", msg, errno);
    exit(1);
}

void err_sys(const char* msg) {
    printf("%s, error code = %d\n", msg, errno);
}

int create_and_bind(int port_no) {
    int listen_fd;
    struct sockaddr_in serv_addr;
    if ( (listen_fd = socket(AF_INET, SOCK_STREAM, 0)) < 0)
        return -1;
    memset(&serv_addr, 0, sizeof(serv_addr));
    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = htonl(INADDR_ANY);
    serv_addr.sin_port = htons(port_no);
    if ( bind(listen_fd, (struct sockaddr*) &serv_addr, sizeof(serv_addr)) < 0) {
        close(listen_fd);
        return -1;
    }
    return listen_fd;
}

/* Read the connection in chunks of n_msg bytes and print them.
 * n_msg must match the client's message length ("thread_N\n" is 9 bytes).
 * TCP is a byte stream, so read() may still return fewer bytes than asked. */
int communicate(const int fd)
{
    int n_msg = 9;
    char msg[BUFF_SIZE];
    int count = 0;
    ssize_t n_read;
    while ( (n_read = read(fd, msg, n_msg)) > 0)
    {
        msg[n_read] = 0;   /* terminate after the bytes actually read */
        printf("%s", msg);
        ++ count;
    }
    printf("msg number = %d\n", count);
    if (n_read < 0)
        return -1;
    return 0;
}

int main(int argc, char** argv) {
    int listen_fd;
    int conn_fd;
    /* create and bind listening socket */
    listen_fd = create_and_bind(9999);
    if (listen_fd < 0)
        err_quit("create and bind listening socket failed!");
    /* listening */
    if (listen(listen_fd, 100) < 0)
        err_quit("listen failed!");
    while (1) {
        if ( (conn_fd = accept(listen_fd, NULL, NULL)) < 0) {
            err_sys("accept connection socket failed!");
            continue;
        }
        if (communicate(conn_fd) < 0)
            perror("read socket fail");
        close(conn_fd);   /* close on both success and failure */
    }
    close(listen_fd);
    exit(0);
}


Author: folklore  Time: 2015-09-10 18:04
All system calls are thread-safe.

Author: bskay  Time: 2015-09-10 18:19
Looks like the wrong approach. Multithreading shouldn't really be used for I/O; it is for running CPU-bound tasks concurrently.

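Author: hellioncu  Time: 2015-09-10 22:30
You could say write is thread-safe, but what the final result looks like depends on how the "file" is implemented.

When write is used on a regular file at explicitly specified, non-overlapping offsets, the final result is deterministic.
When write is used on a TCP socket and each call writes tens of KB, the result may well be different.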
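The "explicit, non-overlapping offsets" case above can be sketched with pwrite(), which takes the offset as an argument and never reads or updates the shared file offset. This is a minimal sketch, assuming each thread owns one fixed-size record slot; `write_disjoint`, `writer` and `REC_LEN` are hypothetical names chosen for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define N_THREADS 5
#define REC_LEN 9 /* strlen("thread_N\n") for a single-digit N */

struct rec_arg { int fd; int id; };

/* Each thread owns the byte range [id*REC_LEN, (id+1)*REC_LEN). */
static void *writer(void *p)
{
    struct rec_arg *a = p;
    char rec[REC_LEN + 1];                 /* +1 for snprintf's terminator */
    snprintf(rec, sizeof rec, "thread_%d\n", a->id);
    /* pwrite() does not touch the shared offset, so there is nothing to race on. */
    if (pwrite(a->fd, rec, REC_LEN, (off_t)a->id * REC_LEN) != REC_LEN)
        perror("pwrite");
    return NULL;
}

/* Creates path, lets N_THREADS threads fill their own slots, returns the fd
 * (caller closes it) or -1 on error. */
int write_disjoint(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    pthread_t tids[N_THREADS];
    struct rec_arg args[N_THREADS];
    int created = 0;
    for (; created < N_THREADS; created++) {
        args[created].fd = fd;
        args[created].id = created;
        if (pthread_create(&tids[created], NULL, writer, &args[created]) != 0)
            break;
    }
    for (int i = 0; i < created; i++)
        pthread_join(tids[i], NULL);
    if (created < N_THREADS) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Whatever order the threads run in, the file ends up as `thread_0\n` through `thread_4\n` in slot order, because the offsets are fixed in advance rather than taken from the shared file position.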

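Author: sculida  Time: 2015-09-11 00:28
Last edited by sculida at 2015-09-15 23:29

Actually, write to a socket fd isn't safe either. For example, change your msg to thread_%d0123456789001234567890012345678900123456789001234567890
and have each thread write it 100 times. Run it yourself and see whether the text gets mixed up as well. I got the following output: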
---------------------------------------
thread_1 123456789012345678901234567890123456890
thread_1 123456789012345678901234567890123456890
thread_1 123456789012345678901234567890123456890
thread_1 123456789012345678901234567890123456890
23thread_1 123456789012345678901234567890123456890
thread_1 123456789012345678901234567890123456890
---------------------------------
On threads and I/O, you can use pwrite and pread; see section 12.10, "Threads and I/O", of Advanced Programming in the UNIX Environment.
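The pread half of that advice can be sketched as several threads reading different chunks of one fd at the same time. This is a minimal sketch, assuming fixed-size chunks; `parallel_read`, `reader`, `N_READERS` and `CHUNK` are hypothetical names. Because pread() takes an explicit offset, the readers need no lock and the fd's own offset is left unchanged.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <pthread.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define N_READERS 4
#define CHUNK 8

struct chunk_arg { int fd; int idx; char buf[CHUNK]; ssize_t n; };

/* Reader idx fetches bytes [idx*CHUNK, (idx+1)*CHUNK) with pread(). */
static void *reader(void *p)
{
    struct chunk_arg *a = p;
    a->n = pread(a->fd, a->buf, CHUNK, (off_t)a->idx * CHUNK);
    return NULL;
}

/* Reads the first N_READERS*CHUNK bytes of fd into out, one thread per chunk.
 * Returns the number of bytes read, or -1 on error. */
ssize_t parallel_read(int fd, char out[N_READERS * CHUNK])
{
    pthread_t tids[N_READERS];
    struct chunk_arg args[N_READERS];
    int created = 0;
    for (; created < N_READERS; created++) {
        args[created].fd = fd;
        args[created].idx = created;
        args[created].n = -1;
        if (pthread_create(&tids[created], NULL, reader, &args[created]) != 0)
            break;
    }
    for (int i = 0; i < created; i++)      /* join every started thread first */
        pthread_join(tids[i], NULL);
    if (created < N_READERS)
        return -1;
    ssize_t total = 0;
    for (int i = 0; i < N_READERS; i++) {
        if (args[i].n < 0)
            return -1;
        memcpy(out + (size_t)i * CHUNK, args[i].buf, (size_t)args[i].n);
        total += args[i].n;
    }
    return total;
}
```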

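Author: alwaysR9  Time: 2015-09-11 15:40
Last edited by alwaysR9 at 2015-09-11 15:45

Reply to #5 sculida

I retested write on a socket: 5 threads, each writing 2 KB per call to the socket, 100 times.
The data received by the server was still in the correct order.
Please check your modified program again, or test more carefully; maybe something went wrong when you checked the output. The value of the n_msg variable on the server side must be set correctly; otherwise read returns chunks whose length doesn't match the messages the client sent, and the output looks as if it were out of order.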
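The point about n_msg comes from TCP being a byte stream: one read() may return part of a message or parts of two. A common remedy is a loop that keeps reading until exactly n bytes have arrived. This is a minimal sketch; `readn` is a hypothetical helper name used here for illustration.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Read exactly n bytes from fd unless EOF or an error comes first.
 * Returns the number of bytes read (less than n only at EOF) or -1 on error. */
ssize_t readn(int fd, void *buf, size_t n)
{
    char *p = buf;
    size_t left = n;
    while (left > 0) {
        ssize_t r = read(fd, p, left);
        if (r < 0) {
            if (errno == EINTR)
                continue;              /* interrupted by a signal: retry */
            return -1;
        }
        if (r == 0)
            break;                     /* EOF: the peer closed the connection */
        p += r;
        left -= (size_t)r;
    }
    return (ssize_t)(n - left);
}
```

With a fixed message length, the server's loop can call `readn(fd, msg, n_msg)` instead of a bare `read()`, so each printed chunk corresponds to one whole message as long as the sender kept messages contiguous.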

Author: cokeboL  Time: 2015-09-11 16:46
Reply to #6 alwaysR9

With that little data and that few iterations, the chance of catching it is very small...

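Author: cokeboL  Time: 2015-09-11 16:52
Just looking at the principle: with blocking I/O, a single write may write fewer bytes than requested. The system call traps into the kernel, and multiple threads compete for one fd. Say thread A writes 10 bytes and its system call returns; if the next write to be scheduled belongs to another thread rather than to A, which hasn't finished, the output can get mixed up.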
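The scenario above (a short write, then another thread's write slipping in before the remainder) can be avoided by looping until the whole buffer is written and holding a lock across the entire loop. This is a minimal sketch, assuming one process-wide mutex is acceptable; `write_all_locked` and `write_lock` are hypothetical names, and a per-fd lock would serialize less.

```c
#include <errno.h>
#include <pthread.h>
#include <sys/types.h>
#include <unistd.h>

static pthread_mutex_t write_lock = PTHREAD_MUTEX_INITIALIZER;

/* Write all n bytes, retrying after short writes and EINTR.
 * The mutex is held for the whole loop, so no other thread using this
 * helper can write between two partial writes of the same message.
 * Returns n on success, -1 on error. */
ssize_t write_all_locked(int fd, const void *buf, size_t n)
{
    const char *p = buf;
    size_t left = n;
    pthread_mutex_lock(&write_lock);
    while (left > 0) {
        ssize_t w = write(fd, p, left);
        if (w < 0 && errno == EINTR)
            continue;
        if (w <= 0)
            break;
        p += w;
        left -= (size_t)w;
    }
    pthread_mutex_unlock(&write_lock);
    return left == 0 ? (ssize_t)n : -1;
}
```

The lock only helps if every writer to that fd goes through the helper; a thread calling write() directly can still interleave.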

Author: alwaysR9  Time: 2015-09-11 19:33
Reply to #8 cokeboL

I'll go look at the source code; only the source can tell whether write takes a lock.

Author: folklore  Time: 2015-09-11 20:54
Reply to #8 cokeboL

write is atomic. The system has to guarantee these semantics, otherwise the system would be unusable.

Author: myworkstation  Time: 2015-09-11 23:30
Let me explain this from the POSIX point of view.

First, POSIX tells us which system calls are thread-safe: "A function that may be safely invoked concurrently by multiple threads. Each function defined in the System Interfaces volume of POSIX.1-2008 is thread-safe unless explicitly stated otherwise." By this definition, common system calls are thread-safe unless stated otherwise.

As for write specifically, its specification does not address thread safety directly, but it does emphasize that multiple writes can overwrite each other's data. What write does is specified as follows: "On a regular file or other file capable of seeking, the actual writing of data shall proceed from the position in the file indicated by the file offset associated with fildes. Before successful return from write(), the file offset shall be incremented by the number of bytes actually written. On a regular file, if the position of the last byte written is greater than or equal to the length of the file, the length of the file shall be set to this position plus one." From this, write is not an atomic operation: a successful write involves both seeking and writing, so in principle write is not thread-safe.

Furthermore, the specification of write says: "This volume of POSIX.1-2008 does not specify behavior of concurrent writes to a file from multiple processes. Applications should use some form of concurrency control." Clearly, multiple processes operating on the same file need synchronization, but nothing further is specified for multiple threads, so one can only infer that write is not thread-safe (because it is not atomic).

As for writing to a socket, that is certainly thread-safe, because of this provision: "If fildes refers to a socket, write() shall be equivalent to send() with no flags set.", and send is thread-safe.

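Author: alwaysR9  Time: 2015-09-12 09:49
Reply to #11 myworkstation

Thanks for such a detailed explanation; it is consistent with my experimental results.
I'm still a newbie and haven't read the manuals carefully enough; when I have time I should read the man pages properly.

I'd also like to ask: do I need to download the Linux source in order to read it?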

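Author: cokeboL  Time: 2015-09-12 12:19
Reply to #10 folklore

I hadn't looked into this carefully before. I thought that write and send, like read and recv, could return less than the requested count in a single call on a socket, so every send I've written loops. It seems that's unnecessary.

After reading #11 things seem a bit clearer: is it that regular files have an offset while sockets don't, so writes to regular files aren't necessarily atomic but writes to sockets are? And are the blocking and non-blocking cases the same?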

Author: giantchen  Time: 2015-09-12 12:26
[ Last edited by giantchen at 2015-09-11 21:37 ]

The POSIX standard explicitly states that write() is thread-safe.

pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_09

2.9.1 Thread-Safety

All functions defined by this volume of POSIX.1-2008 shall be thread-safe, except that the following functions need not be thread-safe.

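write is not in the list of functions that follows, so by the standard it is thread-safe.

However, according to the Linux manual, write only conforms to the standard in kernels 3.14 and later.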

man7.org/linux/man-pages/man2/write.2.html

BUGS

   According to POSIX.1-2008/SUSv4 Section XSI 2.9.7 ("Thread
   Interactions with Regular File Operations"):

         All of the following functions shall be atomic with respect to
         each other in the effects specified in POSIX.1-2008 when they
         operate on regular files or symbolic links: ...

   Among the APIs subsequently listed are write() and writev(2).  And
   among the effects that should be atomic across threads (and
   processes) are updates of the file offset.  However, on Linux before
   version 3.14, this was not the case: if two processes that share an
   open file description (see open(2)) perform a write() (or writev(2))
   at the same time, then the I/O operations were not atomic with
   respect updating the file offset, with the result that the blocks of
   data output by the two processes might (incorrectly) overlap.  This
   problem was fixed in Linux 3.14.
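To check whether a given kernel shows the overlapping-offset behavior described in that BUGS note, one can hammer a single shared fd from several threads and compare the final file size with the number of bytes written. This is a minimal sketch, assuming a regular file on a local Linux filesystem; `shared_offset_test`, `hammer` and the sizes are hypothetical choices. On kernels with the 3.14 fix the size should equal N_THREADS * N_WRITES * REC; on older kernels it may come out smaller because some records landed at the same offset.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define N_THREADS 8
#define N_WRITES 1000
#define REC 16

static int shared_fd;   /* one open file description, hence one shared offset */

/* Each thread writes N_WRITES records of REC bytes with plain write(). */
static void *hammer(void *p)
{
    char rec[REC];
    memset(rec, 'A' + (int)(intptr_t)p, REC - 1);
    rec[REC - 1] = '\n';
    for (int i = 0; i < N_WRITES; i++)
        if (write(shared_fd, rec, REC) != REC)
            break;
    return NULL;
}

/* Returns the final file size, or -1 on error. */
off_t shared_offset_test(const char *path)
{
    shared_fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (shared_fd < 0)
        return -1;
    pthread_t t[N_THREADS];
    int created = 0;
    for (; created < N_THREADS; created++)
        if (pthread_create(&t[created], NULL, hammer, (void *)(intptr_t)created) != 0)
            break;
    for (int i = 0; i < created; i++)
        pthread_join(t[i], NULL);
    struct stat st;
    int rc = fstat(shared_fd, &st);
    close(shared_fd);
    return (rc == 0 && created == N_THREADS) ? st.st_size : -1;
}
```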

Author: folklore  Time: 2015-09-12 14:10
Reply to #13 cokeboL

Whether seek + write is atomic (obviously two separate calls can never be atomic as a pair) has nothing to do with whether write itself is atomic.
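When code really does need the two-call form, lseek() followed by write(), the usual way to make the pair behave as one step between threads is a mutex held across both calls. This is a minimal sketch; `seek_and_write` and `seek_write_lock` are hypothetical names, and the lock only orders threads that go through this helper.

```c
#include <fcntl.h>      /* open() and O_* flags for callers */
#include <pthread.h>
#include <sys/types.h>
#include <unistd.h>

static pthread_mutex_t seek_write_lock = PTHREAD_MUTEX_INITIALIZER;

/* Positions the shared offset at off and writes n bytes there. Without the
 * mutex, another thread sharing fd could move the offset between the
 * lseek() and the write(). Returns write()'s result, or -1 if lseek fails. */
ssize_t seek_and_write(int fd, off_t off, const void *buf, size_t n)
{
    ssize_t w = -1;
    pthread_mutex_lock(&seek_write_lock);
    if (lseek(fd, off, SEEK_SET) == off)
        w = write(fd, buf, n);
    pthread_mutex_unlock(&seek_write_lock);
    return w;
}
```

pwrite() achieves the same effect in a single call without a user-space lock, which is why it is usually preferred when the target offset is known.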

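Author: cokeboL  Time: 2015-09-12 19:40
Reply to #15 folklore

Two system calls are certainly nowhere near atomic. What I wanted to ask is: for a single write, is the number of bytes successfully written always equal to the count passed in?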

Author: shang2010  Time: 2015-09-12 22:43
It isn't safe. Wrap it in your own safe interface so that upper-layer business code can call it simply and easily.

Author: 呼啦哈拉  Time: 2015-09-14 09:22
Just passing by~~

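Author: folklore  Time: 2015-09-15 14:01
Reply to #16 cokeboL

write to a socket is also atomic, but it can't guarantee writing the full length you asked for (in theory this is true for file systems as well; it just basically never happens).

Atomic means that the call itself is atomic. For example, if two processes write to the same offset of the same file, atomicity guarantees that the file ends up with either process 1's data overwriting process 2's,
or process 2's overwriting process 1's, rather than a mixture of the two.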
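The same-offset example above can be tried with two threads that repeatedly overwrite the same range of a regular file: if each write is atomic with respect to the other, the final contents are entirely one thread's data. This is a minimal sketch, assuming a regular file on a local Linux filesystem; `same_offset_test`, `overwrite` and `LEN` are hypothetical names. It relies on the XSI 2.9.7 atomicity requirement for regular files, so a result of 0 (mixed contents) would indicate the filesystem does not honor it.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <pthread.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define LEN 4096

struct same_off_arg { int fd; char fill; };

/* Repeatedly overwrite the same LEN bytes at offset 0 with one letter. */
static void *overwrite(void *p)
{
    struct same_off_arg *a = p;
    char buf[LEN];
    memset(buf, a->fill, LEN);
    for (int i = 0; i < 100; i++)
        if (pwrite(a->fd, buf, LEN, 0) != LEN)
            break;
    return NULL;
}

/* Returns 1 if the final contents are all one letter, 0 if mixed, -1 on error. */
int same_offset_test(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    struct same_off_arg a = { fd, 'A' }, b = { fd, 'B' };
    pthread_t ta, tb;
    if (pthread_create(&ta, NULL, overwrite, &a) != 0) {
        close(fd);
        return -1;
    }
    if (pthread_create(&tb, NULL, overwrite, &b) != 0) {
        pthread_join(ta, NULL);
        close(fd);
        return -1;
    }
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    char buf[LEN];
    ssize_t n = pread(fd, buf, LEN, 0);
    close(fd);
    if (n != LEN)
        return -1;
    for (int i = 1; i < LEN; i++)
        if (buf[i] != buf[0])
            return 0;
    return 1;
}
```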

Author: cokeboL  Time: 2015-09-15 14:47
Last edited by cokeboL at 2015-09-15 14:47

Reply to #20 folklore

tks, so my looping write isn't overkill after all.

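Author: sculida  Time: 2015-09-15 23:24
Reply to #6 alwaysR9
I created 26 threads, each writing a line of 4095 'A's (or 'B's, 'C's, ...) plus a newline to the socket, and found one line with only 4032 'L's.
The server code is as follows: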

#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <pthread.h>
#include <string.h>
#include <unistd.h>

#define PORT 9998
#define MAX_PTHREAD 26
#define NUMCHAR 4096   /* 4095 letters + '\n' per write */
#define LOOP 1000

static int workfd;
void *pthreadDo(void *arg);

int main() {
    printf("begin socket\n");
    int sockfd=socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in sockaddr={0};
    sockaddr.sin_family=AF_INET;
    sockaddr.sin_port=htons(PORT);
    inet_pton(AF_INET, "127.0.0.1", &(sockaddr.sin_addr));
    printf("begin bind\n");
    bind(sockfd, (const struct sockaddr*)&sockaddr, sizeof(struct sockaddr_in));
    printf("begin listen\n");
    listen(sockfd, SOMAXCONN);
    printf("begin accept\n");
    workfd=accept(sockfd, NULL, NULL);
    printf("get workfd %d\n", workfd);
    int i=0;
    pthread_t ntid;
    for (i=0; i<MAX_PTHREAD; i++) {
        /* note: &i is shared with the main thread, which keeps changing it */
        pthread_create(&ntid, NULL, pthreadDo,&i);
    }
    while (1) {}   /* keep the process alive while the threads write */
    return 0;
}

/* Each thread writes LOOP lines of 4095 copies of one letter to workfd. */
void *pthreadDo(void *arg) {
    char c=*((int*)arg) + 'A';
    char str[NUMCHAR]={0};
    memset(str, c, NUMCHAR-1);
    str[NUMCHAR-1]='\n';
    int i=0;
    for (i=0; i<LOOP; i++) {
        write(workfd, str, NUMCHAR);
    }
    return NULL;
}


The client is simply telnet 127.0.0.1 9998>a.txt
Then I used grep to find that line (the one with only 4032 'L's): grep -n "[L]\{4032\}" a.txt|grep -v "[L]\{4033\}"
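The grep above finds one specific line; to count every damaged line in a.txt at once, a small checker can flag any line that is not exactly 4095 copies of a single letter. This is a minimal sketch; `count_bad_lines` and `EXPECTED_LEN` are hypothetical names, it skips '\r' characters, and it will also count telnet's own banner lines ("Trying ...", "Connected to ...", "Escape character ...") as bad.

```c
#include <stdio.h>

#define EXPECTED_LEN 4095  /* letters per line in the experiment above */

/* Counts lines that are not exactly EXPECTED_LEN copies of one character
 * ('\n' and any '\r' are not counted). Returns -1 if path can't be opened. */
long count_bad_lines(const char *path)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    long bad = 0, len = 0;
    int first = EOF, mixed = 0, c;
    while ((c = getc(f)) != EOF) {
        if (c == '\r')
            continue;                  /* ignore CR in case the output has CR/LF */
        if (c == '\n') {
            if (len != EXPECTED_LEN || mixed)
                bad++;
            len = 0;
            first = EOF;
            mixed = 0;
            continue;
        }
        if (first == EOF)
            first = c;
        else if (c != first)
            mixed = 1;
        len++;
    }
    if (len > 0)
        bad++;                         /* unterminated last line */
    fclose(f);
    return bad;
}
```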

Author: alwaysR9  Time: 2015-09-16 00:02
Last edited by alwaysR9 at 2015-09-16 15:17

This post is void: the function below is not the source of write; write is actually implemented through the sys_write system call.

The write source I found:

int write(struct file *filp, void *data, size_t size) {
    int rc;
    if (!filp) return -EINVAL;
    if (!data && size > 0) return -EINVAL;
    if (filp->flags == O_RDONLY) return -EACCES;
    if (!filp->fs->ops->write) return -ENOSYS;
    if (lock_fs(filp->fs, FSOP_WRITE) < 0) return -ETIMEOUT;
    if (filp->flags & O_TEXT) {
        rc = write_translated(filp, data, size);
    } else {
        rc = filp->fs->ops->write(filp, data, size, filp->pos);
        if (rc > 0) filp->pos += rc;
    }
    unlock_fs(filp->fs, FSOP_WRITE);
    return rc;
}


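1. write takes a lock, lock_fs(), but I don't know whether it does the same job as pthread_mutex.
2. Depending on filp->flags, write takes one of two paths; I'm not sure what file attribute O_TEXT denotes.
3. I couldn't find the definition of filp->fs->ops->write(filp, data, size, filp->pos); I'll look again tomorrow.

Definition of write_translated():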

static int write_translated(struct file *filp, void *data, size_t size) {
    char *buf;
    char *p, *q;
    int rc;
    int lfcnt;
    int bytes;
    char lfbuf[LFBUFSIZ];
    // Translate LF to CR/LF on output
    buf = (char *) data;
    p = buf;
    bytes = lfcnt = 0;
    while ((unsigned) (p - buf) < size) {
        // Fill the buffer, except maybe last char
        q = lfbuf;
        while ((unsigned) (q - lfbuf) < LFBUFSIZ - 1 && (unsigned) (p - buf) < size) {
            char ch = *p++;
            if (ch == LF) {
                lfcnt++;
                *q++ = CR;
            }
            *q++ = ch;
        }
        // Write the buffer and update total
        rc = filp->fs->ops->write(filp, lfbuf, q - lfbuf, filp->pos);
        if (rc > 0) filp->pos += rc;
        if (rc < 0) return rc;
        bytes += rc;
        if (rc < q - lfbuf) break;
    }
    return bytes - lfcnt;
}


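Author: giantchen  Time: 2015-09-16 01:32
Reply to #23 alwaysR9

The thread safety (atomicity) that POSIX defines may not be the same thing as the thread safety (atomicity) you have in mind, especially when a short write occurs.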

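Author: alwaysR9  Time: 2015-09-16 16:16
Last edited by alwaysR9 at 2015-09-16 16:35

Reply to #22 sculida

Thanks for your program. With it I found the problem; it seems multithreaded write to a socket is not safe after all.

Line 2486 contains only 52 'G's, while each line should normally contain 4095 visible characters.
Of the 26000 output lines, 115 have problems: they contain either more or fewer than 4095 characters.
P.S. Your program has a bug: pthread_create(&tid, NULL, proc, &i); the last argument should not be passed by address. Passing the address makes the main thread and the child threads share the variable i, and i keeps changing in the main thread without any synchronization.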
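One common way to fix the bug pointed out above is to pass the index by value, packed into the pointer argument, so each thread gets its own copy at creation time. This is a minimal sketch; `start_workers`, `worker` and `letters` are hypothetical names, and the worker only records its letter instead of writing to a socket.

```c
#include <pthread.h>
#include <stdint.h>

#define MAX_PTHREAD 26

static char letters[MAX_PTHREAD];

/* The index travels inside the pointer value, so it cannot change
 * after pthread_create() returns. */
static void *worker(void *arg)
{
    int idx = (int)(intptr_t)arg;
    letters[idx] = (char)('A' + idx);
    return NULL;
}

/* Starts MAX_PTHREAD workers, each with its own index, and joins them.
 * Returns 0 on success, -1 if a thread could not be created. */
int start_workers(void)
{
    pthread_t tids[MAX_PTHREAD];
    int created = 0, rc = 0;
    for (; created < MAX_PTHREAD; created++)
        if (pthread_create(&tids[created], NULL, worker, (void *)(intptr_t)created) != 0) {
            rc = -1;
            break;
        }
    for (int i = 0; i < created; i++)
        pthread_join(tids[i], NULL);
    return rc;
}
```

An equivalent alternative is an `int ids[MAX_PTHREAD]` array in the main thread, passing `&ids[i]`, since each element then belongs to exactly one thread.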

Author: windoze  Time: 2015-09-16 16:51
write is thread-safe, but the thing write writes to isn't necessarily.

Author: 何必抱怨  Time: 2015-09-18 17:35
I tried experiment 1 here and there didn't seem to be any overwriting.

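Author: Museless  Time: 2015-09-19 01:12
After write runs and before it returns, the fd's file offset is advanced. Writing and advancing the offset are not one atomic operation; that is why it isn't thread-safe. You can use pwrite, which makes positioning plus writing thread-safe. Sockets are a different design.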

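Author: alwaysR9  Time: 2015-09-19 11:24

Museless posted at 2015-09-19 01:12
After write runs and before it returns, the fd's file offset is advanced. Writing and advancing the offset are not one atomic operation; that is why it isn't thread-safe. You can ...

Here is the sys_write source, kernel version 4.2: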

SYSCALL_DEFINE3(write, unsigned int, fd, const char __user *, buf, size_t, count)
{
    struct fd f = fdget_pos(fd);
    ssize_t ret = -EBADF;

    if (f.file) {
        loff_t pos = file_pos_read(f.file);        // get the file position
        ret = vfs_write(f.file, buf, count, &pos); // write starting at that position
        if (ret >= 0)                              // the next 3 lines update the file position
            file_pos_write(f.file, pos);
        fdput_pos(f);
    }

    return ret;
}


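From the source you can see that in sys_write the three operations (getting the file position, writing the file, updating the file position) are not atomic as a whole.

Below is the source of vfs_write; note these two functions: file_start_write() and file_end_write()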

ssize_t vfs_write(struct file *file, const char __user *buf, size_t count, loff_t *pos)
{
    ssize_t ret;

    if (!(file->f_mode & FMODE_WRITE))
        return -EBADF;
    if (!(file->f_mode & FMODE_CAN_WRITE))
        return -EINVAL;
    if (unlikely(!access_ok(VERIFY_READ, buf, count)))
        return -EFAULT;

    ret = rw_verify_area(WRITE, file, pos, count);
    if (ret >= 0) {
        count = ret;
        file_start_write(file);                    // my guess: this takes a lock
        ret = __vfs_write(file, buf, count, pos);  // write starting at the file position
        if (ret > 0) {
            fsnotify_modify(file);
            add_wchar(current, ret);
        }
        inc_syscw(current);
        file_end_write(file);                      // my guess: this releases the lock
    }

    return ret;
}


Finally, the source of file_end_write:

static inline void file_end_write(struct file *file)
{
    if (!S_ISREG(file_inode(file)->i_mode))  // not a regular file: return immediately
        return;
    __sb_end_write(file_inode(file)->i_sb, SB_FREEZE_WRITE);  // implemented in the next function
}


void __sb_end_write(struct super_block *sb, int level)
{
    percpu_up_read(sb->s_writers.rw_sem + level-1);  // implemented in the next function
}


void percpu_up_read(struct percpu_rw_semaphore *brw)
{
    // rwsem in the name means "read-write semaphore"; from the name, this presumably releases the lock
    rwsem_release(&brw->rw_sem.dep_map, 1, _RET_IP_);

    if (likely(update_fast_ctr(brw, -1)))
        return;

    /* false-positive is possible but harmless */
    if (atomic_dec_and_test(&brw->slow_read_ctr))
        wake_up_all(&brw->write_waitq);  // ** wait queue
}


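My understanding of the source can be described as a flow:

sys_write (file):
1. Get the file position
2. Write starting at the file position
2.1 Lock if it is a regular file; no lock for other files
2.2 Write the file
2.3 Unlock if it is a regular file; no unlock for other files
3. Update the file position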


I'm not sure my reading of the source is correct; criticism is welcome.

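Author: alwaysR9  Time: 2015-09-19 11:29

何必抱怨 posted at 2015-09-18 17:35
I tried experiment 1 here and there didn't seem to be any overwriting.

On my laptop, the first experiment shows the overwriting every time I run it. Maybe that's because my machine is slow; the CPU is only 2.4 GHz.
What's your machine's configuration?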

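Author: irp  Time: 2015-09-20 08:03
Thread safety of write() means that when multiple threads execute write, the shared variables that write() accesses are protected by synchronization. The data being written is outside this scope; the file's current offset is one of the protected shared variables.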

Author: 何必抱怨  Time: 2015-09-21 09:18
Reply to #30 alwaysR9
Ubuntu 15.04, i5.

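Author: alwaysR9  Time: 2015-09-21 09:26

irp posted at 2015-09-20 08:03
Thread safety of write() means that when multiple threads execute write, the shared variables that write() accesses are protected by synchronization; the data being written is not ...

I don't think the file offset is protected. Here is the sys_write source: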

SYSCALL_DEFINE3(write, unsigned int, fd, const char __user *, buf, size_t, count)
{
    struct fd f = fdget_pos(fd);
    ssize_t ret = -EBADF;

    if (f.file) {
        loff_t pos = file_pos_read(f.file);        // get the file offset
        ret = vfs_write(f.file, buf, count, &pos); // write the file
        if (ret >= 0)                              // the next 3 lines update the file offset
            file_pos_write(f.file, pos);
        fdput_pos(f);
    }

    return ret;
}


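Clearly, getting the offset, writing the file, and updating the offset are three steps that are not atomic as a whole.

Suppose threads A and B write the same file. Thread A gets the offset first and is then suspended; thread B runs, gets the same offset, writes some characters at that offset, and is suspended; thread A then writes at the same offset as B (overwriting the characters B wrote)...
This can happen, and my multithreaded file-writing experiment did show it. So I think write is not thread-safe. Is my understanding correct?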
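The interleaving described above can be replayed step by step on a toy model, with no real threads, to show exactly how B's data is lost and the offset ends up too small. This is a minimal sketch; `struct toy_file` and `replay_race` are hypothetical stand-ins for the kernel's struct file and sys_write, and the caller must make sure both strings fit in `data`.

```c
#include <string.h>

/* A toy "file": a byte array plus one offset shared by all writers. */
struct toy_file { char data[64]; long pos; };

/* Replays: A reads pos, B reads pos, B writes and updates pos, then A writes
 * at its stale position and updates pos. Returns the final offset.
 * Assumes both strings fit in data[] at the current position. */
long replay_race(struct toy_file *f, const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);
    long pos_a = f->pos;               /* A: file_pos_read(), then preempted */
    long pos_b = f->pos;               /* B: file_pos_read() sees the same value */
    memcpy(f->data + pos_b, b, lb);    /* B: vfs_write() */
    f->pos = pos_b + (long)lb;         /* B: file_pos_write() */
    memcpy(f->data + pos_a, a, la);    /* A: writes over B's bytes */
    f->pos = pos_a + (long)la;         /* A: file_pos_write() from a stale base */
    return f->pos;
}
```

Starting from an empty toy file, `replay_race(&f, "AAAA", "BBBB")` leaves `AAAA` at offset 0 and a final offset of 4 instead of 8: B's record has vanished, matching the overwriting seen in experiment 1.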

Author: irp  Time: 2015-09-21 10:12
write(), thread safety, and POSIX
[Posted April 18, 2006 by corbet]
Dan Bonachea recently reported a problem. It seems that he has a program where multiple threads are simultaneously writing to the same file descriptor. Occasionally, some of that output disappears - overwritten by other threads. Random loss of output data is not generally considered to be a desirable sort of behavior, and, says Dan, POSIX requires that write() calls be thread-safe. So he would like to see this behavior fixed.
Andrew Morton quickly pointed out the source of this behavior. Consider how write() is currently implemented:

asmlinkage ssize_t sys_write(unsigned int fd, const char __user *buf,
                             size_t count)
{
    struct file *file;
    ssize_t ret = -EBADF;
    int fput_needed;

    file = fget_light(fd, &fput_needed);
    if (file) {
        loff_t pos = file_pos_read(file);
        ret = vfs_write(file, buf, count, &pos);
        file_pos_write(file, pos);
        fput_light(file, fput_needed);
    }

    return ret;
}
There is no locking around this function, so it is possible for two (or more) threads performing simultaneous writes to obtain the same value for pos. They will each then write their data to the same file position, and the thread which writes last wins.

Putting some sort of lock (using the inode lock, perhaps) around the entire function would solve the problem and make write() calls thread-safe. The cost of this solution would be high, however: an extra layer of locking when almost no application actually needs it. Serializing write() operations in this way would also rule out simultaneous writes to the same file - a capability which can be useful to some applications.

So some developers have questioned whether this behavior should be fixed at all. It is not something which causes problems for over 99.9% of applications, and, for those which need to be able to perform this sort of simultaneous write, there are other options available. These include user-space locking or using the O_APPEND option. So, it is asked, why add unnecessary overhead to the kernel?

Linus responds that it is a "quality of implementation" issue, and that if there is a low-cost way of getting the system to behave the way users would like, it might as well be done. His proposal is to apply a lock to the file position in particular. His patch adds a f_pos_lock mutex to the file structure and uses that lock to serialize uses of and changes to the file position. This change will have the effect of serializing calls to write(), while leaving other forms (asynchronous I/O, pwrite()) unserialized.

The patch has not drawn a lot of comments, and it has not been merged as of this writing. Its ultimate fate will probably depend on whether avoiding races in this obscure case is truly seen to be worth the additional cost imposed on all users.

I don't know how this issue was handled later.
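Linus's proposal in the article (a mutex protecting the file position, held from reading it until the new value is stored) can be modeled in user space. This is a minimal sketch, not kernel code: `struct model_file`, `model_write`, `run_model` and the sizes are hypothetical, and `memcpy` stands in for the actual write. With the lock held across the read-write-update sequence, the final position equals the total bytes written and no record overlaps another.

```c
#include <pthread.h>
#include <string.h>

#define N_WRITERS 4
#define N_RECORDS 1000
#define REC_LEN 8

/* pos_lock plays the role of the proposed f_pos_lock mutex. */
struct model_file {
    pthread_mutex_t pos_lock;
    long pos;
    char data[N_WRITERS * N_RECORDS * REC_LEN];
};

static struct model_file mf = { PTHREAD_MUTEX_INITIALIZER, 0, {0} };

static void model_write(struct model_file *f, const char *buf, long n)
{
    pthread_mutex_lock(&f->pos_lock);       /* like fdget_pos() taking the lock */
    long pos = f->pos;                      /* file_pos_read() */
    memcpy(f->data + pos, buf, (size_t)n);  /* vfs_write() */
    f->pos = pos + n;                       /* file_pos_write() */
    pthread_mutex_unlock(&f->pos_lock);     /* like fdput_pos() releasing it */
}

static void *model_writer(void *arg)
{
    char rec[REC_LEN];
    memset(rec, *(char *)arg, REC_LEN);
    for (int i = 0; i < N_RECORDS; i++)
        model_write(&mf, rec, REC_LEN);
    return NULL;
}

/* Runs N_WRITERS threads against mf and returns the final position,
 * or -1 if a thread could not be created. */
long run_model(void)
{
    static char fills[N_WRITERS] = { 'A', 'B', 'C', 'D' };
    pthread_t t[N_WRITERS];
    int created = 0;
    for (; created < N_WRITERS; created++)
        if (pthread_create(&t[created], NULL, model_writer, &fills[created]) != 0)
            break;
    for (int i = 0; i < created; i++)
        pthread_join(t[i], NULL);
    return created == N_WRITERS ? mf.pos : -1;
}
```

Removing the lock/unlock pair turns this back into the racy sequence discussed in the thread, where the final position can come out smaller than the number of bytes written.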

Author: giantchen  Time: 2015-09-21 11:50
Reply to #34 irp

The man pages say it was fixed in 3.14. github.com/torvalds/linux/commit/9c225f2655e36a470c4f58dbbc99244c5fc7f2d4

Author: giantchen  Time: 2015-09-21 11:53
Reply to #33 alwaysR9

You clearly didn't notice that fdget_pos takes a lock and fdput_pos releases it. So write is atomic.

Author: alwaysR9  Time: 2015-09-21 17:05

giantchen posted at 2015-09-21 11:53
Reply to #33 alwaysR9

Yes, this function does take a lock. Source:

unsigned long __fdget_pos(unsigned int fd)
{
    unsigned long v = __fdget(fd);
    struct file *file = (struct file *)(v & ~3);

    if (file && (file->f_mode & FMODE_ATOMIC_POS)) {
        if (file_count(file) > 1) {
            v |= FDPUT_POS_UNLOCK;
            mutex_lock(&file->f_pos_lock);
        }
    }
    return v;
}


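The locking condition is (file->f_mode & FMODE_ATOMIC_POS). I'm not sure whether regular files and sockets meet this condition, and I couldn't find the function that obtains f_mode either.

If write does lock regular files, what causes the overwriting when multiple threads write to a file?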

Author: irp  Time: 2015-09-21 19:57
Last edited by irp at 2015-09-21 21:15

Set up a kernel debugging environment yourself.

Author: giantchen  Time: 2015-09-21 23:57
Reply to #37 alwaysR9

Your Linux version is too old. The man pages say it was fixed in 3.14.
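A quick way to see which side of 3.14 a machine is on is to read the release string from uname() and compare its major and minor numbers. This is a minimal sketch; `release_at_least` and `running_kernel_has_fpos_fix` are hypothetical names, and it only parses the leading "major.minor" of the release (distribution kernels may backport fixes, so the version is a hint, not proof).

```c
#include <stdio.h>
#include <sys/utsname.h>

/* Returns 1 if a release string such as "3.13.0-24-generic" is at or above
 * major.minor, 0 if below, -1 if it cannot be parsed. */
int release_at_least(const char *release, int major, int minor)
{
    int ma, mi;
    if (sscanf(release, "%d.%d", &ma, &mi) != 2)
        return -1;
    return (ma > major || (ma == major && mi >= minor)) ? 1 : 0;
}

/* Checks the running kernel against 3.14, the version the write(2) man page
 * gives for the file-offset fix. Returns 1, 0, or -1 on error. */
int running_kernel_has_fpos_fix(void)
{
    struct utsname u;
    if (uname(&u) != 0)
        return -1;
    return release_at_least(u.release, 3, 14);
}
```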

Author: 计算机科学  Time: 2016-01-04 00:21
write itself is thread-safe; with multiple threads you need to watch out for the current file position getting mixed up.

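Author: lxyscls  Time: 2016-01-04 10:16
1. Thread-safe: the data written by multiple threads is guaranteed to be written out, with nothing lost or overlapping;
2. But whether the written data ends up in the order you want is anybody's guess.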
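Both points can be seen with O_APPEND, one of the alternatives the LWN article mentions: each write() on a regular file opened with O_APPEND goes to the current end of file as one step, so lines from different threads do not overwrite each other, yet the order in which threads' lines appear is not specified. This is a minimal sketch, assuming a regular file on a local Linux filesystem; `append_test`, `appender` and the sizes are hypothetical.

```c
#include <fcntl.h>
#include <pthread.h>
#include <string.h>
#include <unistd.h>

#define N_APPENDERS 4
#define N_LINES 500
#define LINE_LEN 32

static int append_fd;

/* Each thread appends N_LINES lines of one letter; every line is a single write(). */
static void *appender(void *arg)
{
    char line[LINE_LEN];
    memset(line, *(char *)arg, LINE_LEN - 1);
    line[LINE_LEN - 1] = '\n';
    for (int i = 0; i < N_LINES; i++)
        if (write(append_fd, line, LINE_LEN) != LINE_LEN)
            break;
    return NULL;
}

/* Creates path with O_APPEND and runs the appenders. Returns 0 on success. */
int append_test(const char *path)
{
    static char fills[N_APPENDERS] = { 'a', 'b', 'c', 'd' };
    append_fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_APPEND, 0600);
    if (append_fd < 0)
        return -1;
    pthread_t t[N_APPENDERS];
    int created = 0;
    for (; created < N_APPENDERS; created++)
        if (pthread_create(&t[created], NULL, appender, &fills[created]) != 0)
            break;
    for (int i = 0; i < created; i++)
        pthread_join(t[i], NULL);
    int rc = close(append_fd);
    return (created == N_APPENDERS && rc == 0) ? 0 : -1;
}
```

Afterwards the file holds N_APPENDERS * N_LINES intact lines, but running it twice will generally give a different interleaving of 'a', 'b', 'c' and 'd' lines.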

Welcome to Chinaunix (http://bbs.chinaunix.net/)