A distributed file system project on Unix for mass storage such as mail, search, and network drives
in Coda Lists
On Mon, Aug 15, 2005 at 01:31:43PM -0400, Kris Maglione wrote:
> There's nothing stopping Coda (in theory. I haven't seen the code
> relating to this) from implementing both partial and full file caching.
> Whether it be a knob between two modes of caching, a switch to require
> the fetching of all blocks (with unneeded ones at a lower priority, put
> off until essential data is retrieved), or just a program like hoard
> deciding what files need to be cached fully, and doing so. I'm not
> saying that this should or will be implemented, but it is possible, in
> theory. For Coda and AFS.
Actually, there are many reasons not to have block-level caching in Coda.
- VM deadlocks
Because we have a userspace cache manager, we can get into a
situation where we are told to write out dirty data, but doing so
causes us to request one or more memory pages from the kernel,
either because we allocate memory or simply because we are paging in
some of the application/library code. The kernel might then decide
to give us pages that require write-back of more dirty state to the
userspace daemon. We would have to push venus into the kernel, which
is what AFS did, but AFS isn't dealing with a lot of the same
complexities, such as replication and reintegration.
- Code complexity
It is already a hard enough problem to do optimistic replication and
reintegration with whole files. The last thing I need right now is
additional complexity, where we suddenly have to reason about
situations in which we only have parts of a locally modified file,
which might already have been partially reintegrated but then
overwritten on the server by another client, and decide how to
commit, revert or merge these local changes into the global
replica(s), as well as how to effectively maintain the required data
structures. The current RVM limitations are on the number of file
objects and are not dependent on file size: as far as the client is
concerned, you can cache 100 zero-length files with the same
overhead as 100 files that are 1GB in size.
- Network performance
It is more efficient to fetch a large file at once than to request
individual blocks. Available network bandwidth keeps increasing, but
latency is bounded by the laws of physics, so the 60ms round trip
from coast to coast will remain. Requesting 1000 individual 4KB
blocks will therefore always cost at least 60 seconds, while
fetching the same 4MB as a single file keeps getting cheaper over
time (a back-of-the-envelope calculation follows after this list).
- Local performance
Handling upcalls is quite expensive: there are at least 2 context
switches, and possibly some swapping/paging, involved in getting the
request up to the cache manager and the response back to the
application. Doing this on individual read and write operations
would make the system a lot less responsive (see the second sketch
after this list).
- Consistency model
It is really easy to explain Coda's consistency model with respect
to other clients: you fetch a copy of the file when it is opened,
and it is written back to the servers when it is closed (if it was
modified). Now try to do the same if the client uses block-level
caching. The picture quickly becomes very blurry, and Transarc AFS
actually had (has?) a serious bug in this area that leads to
unexpected data loss when people assume it still provides AFS
semantics. Also, once a system provides block-level access, people
start to expect the file system to provide something close to UNIX
semantics, which is really not a very usable model for any
distributed filesystem. (The last sketch after this list illustrates
the open/close model.)
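
A rough way to see the network-performance point, as a sketch only: the 60ms
round trip and the 4KB/4MB figures come from the text above, while the sample
link speeds are purely illustrative assumptions.

    # Latency vs. bandwidth for the 4MB example above (Python sketch).
    # Assumptions: 60 ms RTT, one request per 4 KB block, blocks fetched
    # sequentially, and a few arbitrary sample link speeds.
    RTT = 0.060                  # seconds, coast-to-coast round trip
    FILE_SIZE = 4 * 1024 * 1024  # 4 MB file
    BLOCK = 4 * 1024             # 4 KB blocks
    BLOCKS = FILE_SIZE // BLOCK  # 1024 blocks

    for mbit in (1, 10, 100, 1000):
        bandwidth = mbit * 1_000_000 / 8          # bytes per second
        transfer = FILE_SIZE / bandwidth          # time on the wire
        block_by_block = BLOCKS * RTT + transfer  # every block pays a round trip
        whole_file = RTT + transfer               # one round trip, then stream
        print(f"{mbit:5} Mbit/s: block-by-block {block_by_block:6.1f}s, "
              f"whole file {whole_file:6.1f}s")

Whatever the link speed, the block-by-block fetch never drops below the ~61
seconds of accumulated round trips, while the whole-file fetch keeps getting
cheaper as bandwidth grows.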
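The same kind of estimate applies to the local-performance point. The
per-upcall cost below is an assumed number used only for illustration; the
argument depends only on paying that cost once per open/close instead of once
per read or write.

    # Toy model of upcall overhead (assumed figures, illustration only).
    UPCALL_COST = 0.000050       # assumed 50 us per upcall: context switches, copying
    FILE_SIZE = 4 * 1024 * 1024  # same 4 MB file
    BLOCK = 4 * 1024             # application reads in 4 KB chunks
    reads = FILE_SIZE // BLOCK

    # Whole-file caching: one upcall on open (fetch), one on close (store);
    # the reads in between are served locally from the cached container file.
    whole_file_overhead = 2 * UPCALL_COST

    # Block-level caching: in the worst case every read misses and needs an upcall.
    block_level_overhead = reads * UPCALL_COST

    print(f"whole-file : {whole_file_overhead * 1000:.3f} ms of upcall overhead")
    print(f"block-level: {block_level_overhead * 1000:.3f} ms of upcall overhead")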
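To make the consistency-model point concrete, here is a minimal sketch of the
open/close (session) semantics described above. It is not Coda code; the
WholeFileClient class and the dict standing in for the server are hypothetical.

    # Minimal sketch of fetch-on-open / write-back-on-close semantics.
    class WholeFileClient:
        def __init__(self, server):
            self.server = server   # shared {path: bytes} store standing in for the file server
            self.cache = {}        # locally cached whole-file copies
            self.dirty = set()

        def open(self, path):
            # Fetch the whole file when it is opened.
            self.cache[path] = self.server.get(path, b"")
            return path

        def write(self, path, data):
            # Reads and writes only touch the local copy.
            self.cache[path] = data
            self.dirty.add(path)

        def close(self, path):
            # Write the file back to the servers when it is closed, if modified.
            if path in self.dirty:
                self.server[path] = self.cache[path]
                self.dirty.discard(path)

    server = {"/coda/file": b"old"}
    a, b = WholeFileClient(server), WholeFileClient(server)
    fa = a.open("/coda/file")    # A sees "old"
    fb = b.open("/coda/file")    # B also sees "old"
    a.write(fa, b"new")
    a.close(fa)                  # only now does the server see "new"
    print(server["/coda/file"])  # b"new"; B keeps its "old" copy until it reopens

With block-level caching there is no single close moment at which other
clients can expect to see a consistent version of the file, which is where the
picture becomes blurry.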
Jan