Reply to nntp's post (#2)
Thanks, NNTP. I'm posting some advice other people gave me here for everyone's
reference; I've learned a lot from your suggestions.
I'm not sure exactly what PBS is, but from what you describe it would
run on top of an openMosix cluster. OpenMosix creates what is called a
Single System Image. This way your jobs do not have to know anything
about the cluster and do not have to be submitted to a scheduler; you
just run and forget. The cluster will automatically shift the load
around to get the best CPU usage per job. A scheduler, on the other hand,
requires either that the jobs understand how to do parallel work, or that
the submitter pre-split the job to make use of the cluster. That is
a bit of a simplified explanation, but I think it should give you a good
idea of the difference.
================================================================
While having similar aims, the way batch systems like the grid engine
achieve them is quite different from the openMosix approach. Roughly
speaking, batch systems start jobs on free nodes by "ssh-ing" to the
given node (or do something a bit more clever but still somehow
equivalent). This is also the reason why they have to bring their own
job management tools (e.g. qstat, qsub, etc.) -- jobs just have to be
ordinary processes on the target node. This also limits the kind of jobs
you can use with batch systems, as they're usually unable to execute
interactive or X applications.
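To make the "pre-splitting" that a batch submitter has to do concrete, here is a
minimal, hypothetical sketch. The file names and the summing job are made up for
illustration, and the qsub line appears only as a comment (its exact flags vary
by site):

```shell
#!/bin/sh
# Pre-splitting a job the way a batch-system submitter must: the work is
# divided up front into independent chunks, one per node.

seq 1 100 > input.txt            # the whole workload
split -l 25 input.txt chunk.     # pre-split into 4 pieces, one per node

# With a batch system each chunk would become its own queued job, e.g.:
#   for c in chunk.*; do qsub -v FILE=$c sumchunk.sh; done
# Locally we can stand in for the four "nodes" with background processes:
for c in chunk.*; do
    awk '{ s += $1 } END { print s }' "$c" > "$c.out" &
done
wait

# Combining the partial results is the submitter's job again, not the
# scheduler's:
cat chunk.*.out | awk '{ s += $1 } END { print s }'   # prints 5050
```

The point is that all of this splitting and re-joining logic lives outside the
scheduler -- exactly the burden the reply above describes.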
Contrary to that, openMosix operates at a much lower level. It is a
kernel patch that allows processes running locally on a node to be
migrated to another node transparently during runtime. The last
part is quite important, as it has some interesting consequences:
* If there is a load imbalance among the cluster nodes, it can be
equalized much more smoothly by simply migrating some jobs to the idling
machines within seconds. Batch systems can only equalize load by
starting new jobs, which isn't as elegant and, more importantly, will
fail if the queue is empty.
* You don't have to use special job management tools, as openMosix
can migrate nearly every process on your node (OK, there are some
limitations: no multithreading, no shared memory, but for oM's use
case this is a weak limitation).
* oM also works with interactive and X applications. For instance,
if you have a graphical fractal generator which is creating high
load on your login machine, oM could easily migrate it to an idling
machine without you noticing it.
So, oM is, despite some limitations, way more elegant. Give it a try.
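The "run and forget" side of this can be sketched as follows. The process below
is started as an ordinary local process with no scheduler involved; on a real
oM cluster the kernel could migrate it away mid-run (the monitoring tool
`mosmon` and the manual `migrate` utility mentioned in the oM documentation
would let you watch or nudge that -- treat those names as an assumption here):

```shell
#!/bin/sh
# A CPU-bound job, started like any other process -- no qsub, no job file.
# Under openMosix no changes are needed for it to be migratable.
sh -c 'i=0; while [ $i -lt 100000 ]; do i=$((i+1)); done; echo done' &
pid=$!

# From our point of view it is just a local PID...
kill -0 "$pid" && echo "job $pid is running"

# ...and we collect it exactly as if it had never left the node.
wait "$pid"
```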
================================================================
By using the openMosix kernel alongside other clustering apps, a more
generalised Beowulf-style cluster can be built to cater for all types of
use.
I have used PBS and found it tricky to set up jobs to run quickly across
nodes, but that does NOT mean you cannot use PBS alongside openMosix.
If a job you schedule for a particular node is openMosix-friendly, then
openMosix could cause that particular job to migrate onto a faster free
node, and if your particular job spawns sub-processes that are also
openMosix-friendly, then each one of those processes could in fact migrate
in order to get 100% CPU usage from all the openMosix nodes in your network.
i.e.
if PBS spawns 10 openMosix-friendly processes on 1 node on the network,
openMosix would migrate each of those processes to a different node.
If one node is then used for something else, then openMosix could migrate
the process again to find the maximum CPU use for that process.
Without openMosix, PBS would only allow you to set the same 10 processes to
run across 10 nodes, and they would stay where they run.
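The 10-process shape described above can be sketched as a tiny script. What
makes the children oM-friendly is that they are plain forked processes with no
threads and no shared memory; the squaring "work" and the result file names are
stand-ins for a real computation:

```shell
#!/bin/sh
# One PBS job that forks 10 openMosix-friendly workers. On an oM cluster
# each child could migrate to a different node on its own; here they
# simply run locally.
for i in $(seq 1 10); do
    ( echo "worker $i: $(expr $i \* $i)" > "result.$i" ) &
done
wait                       # the parent (the PBS job) collects all 10

cat result.1 result.10     # "worker 1: 1" and "worker 10: 100"
```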
Quote from Andreas Schäfer:
So, oM is, despite some limitations, way more elegant. Give it a try.
*Yes, and use it alongside PBS and other clustering apps.*
I also use DSH from within cron...
My cron scheduler runs on a designated master node; jobs are set using
crontab -e from any node, and DSH is used to run them.
e.g.:
0 * * * * dsh -c -m 192.168.1.20 -m 192.168.1.21 /home/mydir/myscript.sh
would cause the script 'myscript.sh' to run hourly on nodes 192.168.1.20 and
192.168.1.21 concurrently.
(Please note that /home and /var/spool/cron/ are available over NFS.)
If myscript.sh contains oM-friendly processes, then those too will migrate
across the network to other nodes.
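A hypothetical myscript.sh in that style might look like this. The paths and
the two steps are invented stand-ins, not the poster's actual script; the key
property is that each step runs as its own forked process, so on an oM node
any of them may migrate away mid-run:

```shell
#!/bin/sh
# Hypothetical /home/mydir/myscript.sh for the cron+dsh setup above.
WORK=${WORK:-/tmp/myjob}     # stand-in; the original kept data on NFS /home
mkdir -p "$WORK"

seq 1 50   > "$WORK/a"
seq 51 100 > "$WORK/b"

gzip -c "$WORK/a" > "$WORK/a.gz" &            # step 1: oM-friendly process
sort -n -r "$WORK/b" -o "$WORK/b.sorted" &    # step 2: oM-friendly process
wait                                          # both steps finished

echo "myscript.sh finished" >> "$WORK/log"
```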