Distributed file system projects on Unix for mass storage such as mail, search and network drives
1) http://zgp.org/linux-elitists/20040101205016.E5998@shaitan.lightconsulting.com.html
3. Elastic Quota File System (EQFS) Proposal
23 Jun 2004 - 30 Jun 2004 (46 posts) Archive Link: "Elastic Quota File System (EQFS)"
People: Amit Gud, Olaf Dabrunz, Mark Cooke
Amit Gud said:
Recently I have been developing an Elastic Quota File System (EQFS). This
file system works on a simple concept: if you're not using your space, give
it to others and let them use it, but with the guarantee that you get it
back when you need it!
Here I'm talking about disk quotas. In any typical network, e.g.
SourceForge, each user is given a fixed quota: 100 MB in the case of
SourceForge. 100 MB is far more than some projects need and too little for
others. EQFS tries to solve this problem by exploiting the users' usage
behavior at runtime. That is, quota a user doesn't need is given to users
who do need it, with a 100% guarantee that the original user can reclaim
it at any time.
Before getting into implementation details I would like to hear public
opinion about this system. All EQFS tries to do is maximize disk space
usage: space that would otherwise be wasted when a user doesn't really need
his full allocation is put to use. At the same time it helps avoid starving
users who need more space. It also frees the administrator from the problem
of variable quota needs, as EQFS itself adjusts to what users need.
Mark Watts asked how it would be possible to "guarantee" that the user would
get the space back when they wanted it. Amit expanded:
Ok, this is what I propose:
Let's say there are just 2 users with 100 megs of individual quota each.
User A is using 20 megs and user B is running out of his quota. B could
delete some files himself to make free space for storing other files. What
I propose is that instead of deleting those files, he declares them
elastic.
The moment he makes those files elastic, that much space is added to his
quota. Here Mark Cooke's equation applies with some modifications. Let $N$
be the number of users, $Q_i$ the allocated quota of the $i$th user, $U_i$
the individual disk usage of the $i$th user ($U_i \le Q_i$), and $D$ the
disk threshold, i.e. the amount of disk space the admin wants to allow the
users to use, with $D \ge \sum_{i=0}^{N-1} Q_i$.
Total usage of all the users (here A and B) must _at any time_ not exceed
$D$, i.e. $\sum_{i=0}^{N-1} U_i \le D$.
The point to note here is that we do not care how much quota the admin has
allocated to an individual user; we are more interested in the usage
pattern the users follow. E.g. if user B wants an additional 25 megs of
space, he picks 25 megs of his files and 'marks' them elastic. His quota is
now increased to 125 megs and he can add 25 more megs of files, while user
A's allocated quota is left unaffected. Applying the above equation, total
usage is now A: 20 megs + B: 125 megs = 145 megs, which is within $D$ (say,
200 megs). This should be fine for the system, since usage is within
bounds.
Now what happens if $\sum_{i=0}^{N-1} U_i > D$? This can happen when user A
tries to reclaim his space: if user A adds, say, 70 more megs of files,
total usage becomes A: 90 megs + B: 125 megs = 215 megs, which exceeds $D$.
The moment total usage crosses $D$, 'action' will be taken on the elastic
files. Here the elastic files belong to user B, so only those will be
affected and user A's data will be untouched; in that sense the whole
process is completely transparent to user A. The user specifies what action
should be taken when making the files elastic: he can opt to have a file
deleted, compressed, or moved to some place (a backup) where he knows he
has write access. The corresponding action is taken until total usage is
back within the threshold.
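A minimal sketch of that enforcement step, assuming elastic files are processed in the order they are listed, that compression shrinks a file by a fixed hypothetical ratio of 0.5, and that "move" simply takes the file off this disk (the copy to the backup location is not modelled). `ElasticFile`, `Action` and `enforce` are illustrative names, not part of any real EQFS code.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    DELETE = "delete"
    COMPRESS = "compress"
    MOVE = "move"  # to a backup location the owner can write to (not modelled)


@dataclass
class ElasticFile:
    owner: str
    name: str
    size: int       # MB currently occupied on this disk
    action: Action  # chosen by the owner when marking the file elastic


def enforce(usage: dict[str, int], elastic: list[ElasticFile],
            disk_threshold: int, compress_ratio: float = 0.5) -> list[str]:
    """Act on elastic files until total usage is back within the threshold.

    `usage` maps each user to U_i in MB and is updated in place. If the
    elastic files run out first, usage may still be above the threshold.
    """
    log = []
    for f in list(elastic):  # iterate over a snapshot; `elastic` may shrink
        if sum(usage.values()) <= disk_threshold:
            break
        if f.action is Action.COMPRESS:
            freed = f.size - int(f.size * compress_ratio)
            f.size -= freed  # the file stays on disk, smaller
        else:                # DELETE or MOVE: the file leaves this disk
            freed = f.size
            f.size = 0
            elastic.remove(f)
        usage[f.owner] -= freed
        log.append(f"{f.action.value} {f.owner}:{f.name} (freed {freed} MB)")
    return log


# A has just written 70 MB more: 90 + 125 = 215 MB > D = 200 MB.
usage = {"A": 90, "B": 125}
elastic = [
    ElasticFile("B", "old-logs.tar", 10, Action.DELETE),   # hypothetical files
    ElasticFile("B", "dataset.csv", 15, Action.COMPRESS),
]
log = enforce(usage, elastic, disk_threshold=200)
for line in log:
    print(line)
print(usage)  # {'A': 90, 'B': 107}
```

Only B's elastic files are touched and A's usage stays at 90 MB, which is the transparency property described above.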
Will this work? We are relying on the 'free' space (i.e.
$D - \sum_{i=0}^{N-1} U_i$) for the users to benefit. The chance of this
free space being large increases with the number of users $N$. Here we are
talking about 2 users, but think of 10000+ users: they will probably never
all use up _all_ of their allocated disk space at the same time. This user
behavior can be exploited.
EQFS is best suited to mail servers. E.g. I could make my whole
linux-kernel mailing list archive elastic. As long as $\sum U_i \le D$ I get
to keep all the messages; whenever $\sum U_i > D$, the messages with the
latest dates will be 'acted' upon.
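The selection rule for the mail case can be sketched as follows. It follows the post literally (latest dates are acted upon first); `Message` and `pick_victims` are hypothetical names, and sizes are in KB purely for illustration. Reversing the sort order would give an oldest-first policy instead.

```python
from datetime import date
from typing import NamedTuple


class Message(NamedTuple):
    msg_id: str
    received: date
    size_kb: int


def pick_victims(elastic_msgs: list[Message], excess_kb: int) -> list[Message]:
    """Pick elastic messages to act on, latest date first, until excess_kb is covered."""
    victims: list[Message] = []
    freed = 0
    for msg in sorted(elastic_msgs, key=lambda m: m.received, reverse=True):
        if freed >= excess_kb:
            break
        victims.append(msg)
        freed += msg.size_kb
    return victims


inbox = [
    Message("m1", date(2004, 6, 1), 40),
    Message("m2", date(2004, 6, 20), 30),
    Message("m3", date(2004, 6, 25), 50),
]
print([m.msg_id for m in pick_victims(inbox, excess_kb=60)])  # ['m3', 'm2']
```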
For variable quota needs, the admin can allocate different quotas to
different users, but this gets tiresome when $N$ is large. With EQFS, he
can allocate a fixed quota to each user (old and new), set a value for $D$
and relax: users will automatically get the quota they need. One may ask
why this can't be done by just setting a value for $D$, checking it against
$\sum U_i$ and not allocating individual quotas at all. But when $\sum U_i$
crosses $D$, whose files do we act on? Moreover, with both individual
quotas and $D$ we give users 'controlled' flexibility, just like an
elastic band: it can be stretched, but not beyond a certain range.
What happens when a user tries to eat up all the free ($D - \sum U_i$)
space? The answer is implementation dependent, because you need to decide
whether a user should be allowed to make a file elastic when
$\sum U_i = D$. I think that by answering 'yes' we eliminate some users'
mischief of eating up all the free space.
Olaf Dabrunz replied:
+ having files disappear at the discretion of the filesystem seems to be
bad behaviour: either I need a file, in which case I do not want it to
just disappear, or I do not need it, in which case I can delete it myself.
Since my idea of which files I need and which I do not need changes
over time, I believe it is far better that I can decide which files I
need whenever other constraints (e.g. a full quota) make this decision
necessary. Also, I can then try to convince someone to increase my quota.
+ moving the file to some other place (backup) does not seem to be a
viable option:
o If the backup media is always accessible, then why can't the user
store the "elastic" files there immediately?
-> advantages:
# the user knows where his file is
# applications that remember the path to a file will be able to
access it
o If the backup media will only be accessible after manually
inserting it into some drive, this amounts to sending an e-mail to
the backup admin and then having a list of backup files passed to the
backup software.
But now getting the file back involves a considerable amount of
manual and administrative work. And it involves bugging the backup
admin, who now becomes the bottleneck of your EQFS.
So this comes down to the effective handling of backup procedures, the
effective administration of fixed quotas, and the centralization of data.
If you have many users, it is also likely that more of them are
interested in big data files. So you need to help these people organize
themselves, e.g. by helping them create mailing lists or web pages, or by
letting them install servers that make the data centrally available
through some interface they can use to select parts of the data.
I would rather suggest that if the file does not fit within a given quota,
the user should apply for more quota and give reasons for that.
I believe that flexible or "elastic" allocation of resources is a good
idea in general, but it only works if you have cheap and easy ways to
control both allocation and deallocation. In the case of CBQ (class-based
queueing) in networks, for example, this works, since bandwidth can be
allocated and deallocated easily and quickly.
But for filesystem space this requires something like a "slower (= less
expensive), bigger, always accessible" third level of storage in the "RAM,
disk, ..." hierarchy. And then you would need an easy or even transparent
way to access files on this third level storage. And you need to make sure
that, although you obviously *need* the data for something, you still can
afford to increase retrieval times by several orders of magnitude at the
discretion of the filesystem.
But usually all this can be done by scripts as well.
Still, there is a scenario and a combination of features for such a
filesystem that IMHO would make it useful:
+ Provide allocation of overquota as you described it.
+ Let the filesystem move (parts of) the "elastic" files to some
third-level backing-store on an as-needed basis. This provides you with
a not-so-cheap (but cheaper than manual handling) resource management
facility.
Now you can use the third-level storage as a backing store for hard-drive
space, analogous to what swap space provides for RAM. And you can "swap
in" parts of files from there and cache them on the hard drive. So
"elastic" files are actually files that are "swappable" to backing store.
This assumes that the "elastic" files have a "working set" in the same
sense as RAM-based data does, i.e. swap operations need to be invoked
only relatively seldom.
If this is not the case, your site/customer needs to consider buying more
hard drive space (and maybe also RAM).
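The swap analogy can be illustrated with a small read cache: blocks of elastic files live on the slow third-level store, and recently used blocks are kept on the hard drive. This is a sketch under assumptions: the backing store is modelled as a dict, only reads are handled (a real design would also write dirty blocks back), and least-recently-used eviction is a swapped-in choice, since the post does not name a replacement policy. `ElasticFileCache` and `read_block` are hypothetical names.

```python
from collections import OrderedDict


class ElasticFileCache:
    """Swap-style cache: the hard drive holds recently used blocks of elastic files.

    The backing store is modelled as a dict from (file, block_no) to bytes.
    Only reads are handled; dirty-block write-back is not modelled.
    """

    def __init__(self, backing_store: dict, capacity_blocks: int) -> None:
        self.backing = backing_store
        self.capacity = capacity_blocks
        self.cache: OrderedDict = OrderedDict()  # least recently used first
        self.swap_ins = 0

    def read_block(self, file: str, block_no: int) -> bytes:
        key = (file, block_no)
        if key in self.cache:
            self.cache.move_to_end(key)     # hit: now most recently used
            return self.cache[key]
        data = self.backing[key]            # miss: slow "swap in" from backing store
        self.swap_ins += 1
        self.cache[key] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used block
        return data


store = {("big.iso", n): bytes([n]) * 4 for n in range(4)}
cache = ElasticFileCache(store, capacity_blocks=2)
for n in [0, 1, 0, 2, 0, 1]:
    cache.read_block("big.iso", n)
print(cache.swap_ins)  # 4
```

When the access pattern mostly stays within the cached blocks (the "working set" condition above), swap-ins are rare; when it does not, nearly every read goes to the slow store, which is the point at which buying more disk becomes the better option.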
The tradeoff for the user now is:
+ do not have the big file(s) OR
+ have them and be able to use them in a random-access fashion from any
application, but maybe only with a (quite) slow access time, but
without additional administrative/manual hassle
Maybe this is a good tradeoff for a significant number of users. Maybe
there are sites/customers that have the required backing store (or would
consider buying into this). I do not know. Find a sponsor, do some field
research and give it a try.