#1, posted 2006-09-17 23:48

The original thread:

http://bbs.chinaunix.net/viewthr ... &extra=page%3D2

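The question under discussion is whether data from many processes writing at the same time can interleave, and whether a lock is needed.

Following the objections raised, I looked into it again.

Preliminary results: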

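1) If several writers write() to a pipe at the same time and each write is no larger than 4096 bytes (one page), Linux guarantees that the write is atomic: the data never interleaves, so no lock is needed. Larger writes do need a lock. The maximum size of an atomic write is defined by PIPE_BUF. This conclusion is reliable.

2) For writes to a regular file, each write of less than 4096 bytes is also atomic: the data does not interleave and no lock is needed. Test programs confirm this, but I have not found a reliable theoretical basis for it. Writes larger than 4096 bytes are not atomic. For a log file written one line at a time, each line is far shorter than 4096 bytes, so in the tests the writes came out atomic.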

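My view in the original thread, that every write is atomic regardless of the data size, was wrong.

This question needs further discussion, until the conclusion is fully correct and complete.

[ Last edited by 思一克 on 2006-9-18 08:38 ]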


#2 harly, posted 2006-09-18 00:13

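1) If several writers write() to a pipe at the same time and each write is 4096 bytes (one page), Linux guarantees that the write is atomic: the data never interleaves, so no lock is needed. Larger writes do need a lock. The maximum size of an atomic write is defined by PIPE_BUF. This conclusion is reliable.

2) For writes to a regular file, each write of less than 4096 bytes is also atomic: the data does not interleave and no lock is needed. Test programs confirm this, but I have not found a reliable theoretical basis for it. Writes larger than 4096 bytes are not atomic. For a log file written one line at a time, each line is far shorter than 4096 bytes, so in the tests the writes came out atomic.

I have doubts about the second point. I would expect a write to run to completion unless it is interrupted by a signal or something similar. Even if it is interrupted, and the call is not restarted, that write should simply fail rather than carry on and interleave with other writes. I have no theoretical basis for this either, and will keep following the question.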


#3 圆点坐标, posted 2006-09-18 09:08

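This is better analyzed from the file system's point of view. The number is 4096 because a page in Linux is 4096 bytes. A write larger than 4096 bytes produces several pages in the kernel. Each page carries flags such as dirty and lock, and as long as the lock flag has not been cleared, the contents of that page will not be modified.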


#4 sunlan (moderator), posted 2006-09-18 20:19

To check what can go wrong when several processes write concurrently, I ran a test. The environment was SCO OSR5.0.6 on a single CPU.

The test code:

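p.c
#include <stdio.h>
#include <unistd.h>   /* getpid() */

int main(void)
{
    FILE *fp;
    int i;

    /* Append mode: every process adds its lines to the end of aaa.txt. */
    fp = fopen("aaa.txt", "a");
    if (fp == NULL) {
        perror("fopen");
        return 1;
    }
    for (i = 0; i < 50000; i++) {
        /* stdio buffers the output and flushes it in larger chunks. */
        fprintf(fp, "pid=[%d] aaaaaaaaaaaaaaaaaaaaaaaaaa\n", (int)getpid());
    }
    fclose(fp);
    return 0;
}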


cc -o p p.c
The program uses fprintf instead of calling write directly, because log output is mostly formatted data and is rarely written with a bare write.

The test script, p.sh:
p&
p&
p&
p&
p&

Five processes write to aaa.txt at the same time.

Most records in aaa.txt have the following format:
pid=[1900]  aaaaaaaaaaaaaaaaaaaaaaaaaa

But a few records came out like this:
pid=[1900]  aaaaaaaaaaaaaaaaaaaaaaaaaa
pid=[1900pid=[1901]  aaaaaaaaaaaaaaaaaaaaaaaaaa
pid=[1901]  aaaaaaaaaaaaaaaaaaaaaaaaaa

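This seems to happen at a process switch.

Because the test ran on a single CPU, it does not truly reproduce concurrent writes by several processes. If anyone has a multi-CPU machine at hand, please try it and see what happens.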
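
Spotting a few damaged records by eye among 250000 lines (5 processes, 50000 lines each) is tedious, so a small checker can count them instead. This is a sketch under assumptions: the names `line_is_intact` and `count_damaged_lines` are hypothetical, and the expected format is taken from the fprintf call in p.c, allowing one or more spaces after `]` because the sample output shows two.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define PAYLOAD "aaaaaaaaaaaaaaaaaaaaaaaaaa"   /* same payload as in p.c */

/* Return 1 if `line` looks like "pid=[<digits>] <PAYLOAD>\n", else 0. */
static int line_is_intact(const char *line)
{
    const char *p = line;

    if (strncmp(p, "pid=[", 5) != 0) return 0;
    p += 5;
    if (!isdigit((unsigned char)*p)) return 0;
    while (isdigit((unsigned char)*p)) p++;
    if (*p++ != ']') return 0;
    if (*p != ' ') return 0;
    while (*p == ' ') p++;              /* accept one or more spaces */
    return strcmp(p, PAYLOAD "\n") == 0;
}

/* Count damaged records in the file; returns -1 if it cannot be opened. */
long count_damaged_lines(const char *path)
{
    char line[256];
    long bad = 0;
    FILE *fp = fopen(path, "r");

    if (fp == NULL) return -1;
    while (fgets(line, sizeof line, fp) != NULL)
        if (!line_is_intact(line)) bad++;
    fclose(fp);
    return bad;
}
```

After running p.sh, `count_damaged_lines("aaa.txt")` gives the number of lines that do not match the expected record format. A record that was split in two shows up as more than one damaged line, so the count is an upper bound on the number of collisions.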


#5 coldwarm, posted 2006-09-18 21:09

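fprintf should not be used for this; it cannot verify anything. Its I/O is buffered by default, so it does not behave like write.
If the stream's buffer is turned off with setbuf, fprintf should be turned into a series of write calls, each writing one byte.

read and write are not atomic in some cases, depending on the object they operate on (for example writes larger than PIPE_BUF, or operations over NFS).

I think the key is whether any buffering takes place between the user buffer and the system buffer. If there is buffering, the operation cannot be atomic.

On "log output is mostly formatted data and is rarely written with a bare write":
format with snprintf and then write the result with write; that works just as well.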


#6 思一克, posted 2006-09-19 08:38

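Use sprintf to format into a buffer,

then write the whole buffer with a single write, and the write is atomic. Keep the buffer length within 4096 bytes (plenty for one log line).

I will test again and see.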
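
The approach described here might look like the sketch below. The helper names `log_open` and `log_line` and the `LOG_LINE_MAX` limit are assumptions for illustration. It uses vsnprintf rather than sprintf so that an over-long line is rejected instead of overflowing the buffer, and it opens the file with O_APPEND so that each write() is positioned at the current end of the file. Whether concurrent appends to a regular file can still interleave is exactly the open question of this thread, so treat this as the pattern under test, not as a guarantee.

```c
#include <fcntl.h>
#include <stdarg.h>
#include <stdio.h>
#include <unistd.h>

#define LOG_LINE_MAX 4096   /* per-line limit suggested in the thread */

/* Open a log file so that every write() goes to the current end of file. */
int log_open(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
}

/*
 * Format one line into a local buffer, then pass it to the kernel with a
 * single write(). Returns 0 on success, -1 if the line does not fit or
 * the write fails or is short.
 */
int log_line(int fd, const char *fmt, ...)
{
    char buf[LOG_LINE_MAX];
    va_list ap;
    int len;

    va_start(ap, fmt);
    len = vsnprintf(buf, sizeof buf - 1, fmt, ap);  /* keep room for '\n' */
    va_end(ap);

    if (len < 0 || (size_t)len >= sizeof buf - 1)
        return -1;              /* formatting error or line too long */
    buf[len++] = '\n';

    /* One write() call per log line; no stdio buffering in between. */
    return write(fd, buf, (size_t)len) == (ssize_t)len ? 0 : -1;
}
```

A caller would open the file once with `log_open("aaa.txt")` and then call, for example, `log_line(fd, "pid=[%d] %s", (int)getpid(), "message")` for each record.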


#7 linuxiang, posted 2006-09-19 14:09

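It was a small question asked a long time ago, and I did not expect moderator 思一克 to be so persistent about it. I really admire that.
On the second question, I think it depends on the file system: a write of one page or less is atomic (a Linux page is usually 4096 bytes), while a write larger than one page may not be.
I am not clear on the details myself and hope an expert can explain them in depth.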


#8 sunlan (moderator), posted 2006-09-19 16:38

Below is part of the description of write from The Open Group Base Specifications Issue 6:

If the value of nbyte is greater than {SSIZE_MAX}, the result is implementation-defined.

After a write() to a regular file has successfully returned:

Any successful read() from each byte position in the file that was modified by that write shall return the data specified by the write() for that position until such byte positions are again modified.

Any subsequent successful write() to the same byte position in the file shall overwrite that file data.

Write requests to a pipe or FIFO shall be handled in the same way as a regular file with the following exceptions:

There is no file offset associated with a pipe, hence each write request shall append to the end of the pipe.

Write requests of {PIPE_BUF} bytes or less shall not be interleaved with data from other processes doing writes on the same pipe. Writes of greater than {PIPE_BUF} bytes may have data interleaved, on arbitrary boundaries, with writes by other processes, whether or not the O_NONBLOCK flag of the file status flags is set.

If the O_NONBLOCK flag is clear, a write request may cause the thread to block, but on normal completion it shall return nbyte.

If the O_NONBLOCK flag is set, write() requests shall be handled differently, in the following ways:

The write() function shall not block the thread.

A write request for {PIPE_BUF} or fewer bytes shall have the following effect: if there is sufficient space available in the pipe, write() shall transfer all the data and return the number of bytes requested. Otherwise, write() shall transfer no data and return -1 with errno set to [EAGAIN].

A write request for more than {PIPE_BUF} bytes shall cause one of the following:

When at least one byte can be written, transfer what it can and return the number of bytes written. When all data previously written to the pipe is read, it shall transfer at least {PIPE_BUF} bytes.

When no data can be written, transfer no data, and return -1 with errno set to [EAGAIN].

When attempting to write to a file descriptor (other than a pipe or FIFO) that supports non-blocking writes and cannot accept the data immediately:

If the O_NONBLOCK flag is clear, write() shall block the calling thread until the data can be accepted.

If the O_NONBLOCK flag is set, write() shall not block the thread. If some data can be written without blocking the thread, write() shall write what it can and return the number of bytes written. Otherwise, it shall return -1 and set errno to [EAGAIN].

Upon successful completion, where nbyte is greater than 0, write() shall mark for update the st_ctime and st_mtime fields of the file, and if the file is a regular file, the S_ISUID and S_ISGID bits of the file mode may be cleared.

For regular files, no data transfer shall occur past the offset maximum established in the open file description associated with fildes.


#9 redac, posted 2007-02-25 16:05

Still following this question!


#10 Edengundam, posted 2007-02-25 19:53

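Originally posted by 思一克 on 2006-9-17 23:48:
The original thread:

http://bbs.chinaunix.net/viewthr ... &extra=page%3D2

The question under discussion is whether data from many processes writing at the same time can interleave, and whether a lock is needed.

Following the objections raised, I looked into it again.

Preliminary results:

1) ...

The description sunlan quoted agrees completely with the moderator's first conclusion.

For regular files, however, that description says nothing explicit. It only has: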

Any successful read() from each byte position in the file that was modified by that write shall return the data specified by the write() for that position until such byte positions are again modified.


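Suppose process A prepares a 100 MB buffer and calls write once. While A is inside that write, process B tries to write to the same file. Both opened the file in append mode. If B's write can return successfully in the meantime, then FLOCK is needed. Assume B writes just one byte.

When the write system call writes to a file, it calls a kernel routine to obtain a buffer managed by the kernel. In old UNIX this routine was called getblk.
A call to getblk apparently cannot be interrupted. In other words, a write of more data than one buffer holds makes the process call getblk several times.

For the sake of fairness, I think the write system call can be "preempted", that is, interrupted. But once write has entered getblk and blocked all interrupts below that level, no process scheduling takes place. Outside functions of that kind, write can be "preempted" by other processes.

So problems can arise when more than one buffer's worth of data is written.
If write could not be preempted, a malicious user could allocate a large buffer and keep calling write to "block" other users from normal use.

To settle this, one should look at the implementation of the write system call... I did not bother to look it up; perhaps the moderator could take a look. I have no source code at hand...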
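
For writes too large to rely on any atomicity, the FLOCK approach mentioned in this post could be sketched as follows. The names `open_for_append`, `write_all` and `locked_append` are hypothetical. flock() is a BSD and Linux call rather than part of POSIX, its locks are advisory, and they belong to the open file description. The sketch therefore assumes that every cooperating process opens the file itself (a descriptor inherited across fork shares one lock) and that all writers take the lock.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>   /* flock(); BSD/Linux, not part of POSIX */
#include <unistd.h>

/* Each process should call this itself so that it gets its own lock. */
int open_for_append(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
}

/* Write the whole buffer, retrying after short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t w = write(fd, buf, len);
        if (w < 0) {
            if (errno == EINTR) continue;   /* interrupted before any data */
            return -1;
        }
        buf += w;                           /* short write: send the rest */
        len -= (size_t)w;
    }
    return 0;
}

/*
 * Append len bytes while holding an exclusive advisory lock, so that
 * other writers using the same protocol cannot interleave their data.
 * Returns 0 on success, -1 on error.
 */
int locked_append(int fd, const char *buf, size_t len)
{
    int rc;

    while (flock(fd, LOCK_EX) < 0) {
        if (errno != EINTR) return -1;
    }
    rc = write_all(fd, buf, len);
    flock(fd, LOCK_UN);
    return rc;
}
```

In the scenario above, both A (100 MB) and B (one byte) would call `locked_append`. B then waits until A's data is complete, even if the kernel splits A's request into several partial writes.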


Revisiting the atomicity of read and write and whether multiple processes need FLOCK: a correction to the view in my original post