- 论坛徽章:
- 0
|
2. Usage Examples and Syntax
============================
2.1 Basic Usage
---------------
Creating, modifying, using the cgroups can be done through the cgroup
virtual filesystem.
To mount a cgroup hierarchy with all available subsystems, type:
# mount -t cgroup xxx /dev/cgroup
The "xxx" is not interpreted by the cgroup code, but will appear in
/proc/mounts so may be any useful identifying string that you like.
Note: Some subsystems do not work without some user input first. For instance,
if cpusets are enabled the user will have to populate the cpus and mems files
for each new cgroup created before that group can be used.
To mount a cgroup hierarchy with just the cpuset and memory
subsystems, type:
# mount -t cgroup -o cpuset,memory hier1 /dev/cgroup
To change the set of subsystems bound to a mounted hierarchy, just
remount with different options:
# mount -o remount,cpuset,blkio hier1 /dev/cgroup
Now memory is removed from the hierarchy and blkio is added.
Note this will add blkio to the hierarchy but won't remove memory or
cpuset, because the new options are appended to the old ones:
# mount -o remount,blkio /dev/cgroup
To Specify a hierarchy's release_agent:
# mount -t cgroup -o cpuset,release_agent="/sbin/cpuset_release_agent" \
xxx /dev/cgroup
Note that specifying 'release_agent' more than once will return failure.
Note that changing the set of subsystems is currently only supported
when the hierarchy consists of a single (root) cgroup. Supporting
the ability to arbitrarily bind/unbind subsystems from an existing
cgroup hierarchy is intended to be implemented in the future.
Then under /dev/cgroup you can find a tree that corresponds to the
tree of the cgroups in the system. For instance, /dev/cgroup
is the cgroup that holds the whole system.
If you want to change the value of release_agent:
# echo "/sbin/new_release_agent" > /dev/cgroup/release_agent
It can also be changed via remount.
If you want to create a new cgroup under /dev/cgroup:
# cd /dev/cgroup
# mkdir my_cgroup
Now you want to do something with this cgroup.
# cd my_cgroup
In this directory you can find several files:
# ls
cgroup.procs notify_on_release tasks
(plus whatever files added by the attached subsystems)
Now attach your shell to this cgroup:
# /bin/echo $$ > tasks
You can also create cgroups inside your cgroup by using mkdir in this
directory.
# mkdir my_sub_cs
To remove a cgroup, just use rmdir:
# rmdir my_sub_cs
This will fail if the cgroup is in use (has cgroups inside, or
has processes attached, or is held alive by other subsystem-specific
reference).
2.2 Attaching processes
-----------------------
# /bin/echo PID > tasks
Note that it is PID, not PIDs. You can only attach ONE task at a time.
If you have several tasks to attach, you have to do it one after another:
# /bin/echo PID1 > tasks
# /bin/echo PID2 > tasks
...
# /bin/echo PIDn > tasks
You can attach the current shell task by echoing 0:
# echo 0 > tasks
Note: Since every task is always a member of exactly one cgroup in each
mounted hierarchy, to remove a task from its current cgroup you must
move it into a new cgroup (possibly the root cgroup) by writing to the
new cgroup's tasks file.
Note: If the ns cgroup is active, moving a process to another cgroup can
fail.
2.3 Mounting hierarchies by name
--------------------------------
Passing the name=<x> option when mounting a cgroups hierarchy
associates the given name with the hierarchy. This can be used when
mounting a pre-existing hierarchy, in order to refer to it by name
rather than by its set of active subsystems. Each hierarchy is either
nameless, or has a unique name.
The name should match [\w.-]+
When passing a name=<x> option for a new hierarchy, you need to
specify subsystems manually; the legacy behaviour of mounting all
subsystems when none are explicitly specified is not supported when
you give a subsystem a name.
The name of the subsystem appears as part of the hierarchy description
in /proc/mounts and /proc/<pid>/cgroups.
2.4 Notification API
--------------------
There is mechanism which allows to get notifications about changing
status of a cgroup.
To register new notification handler you need:
- create a file descriptor for event notification using eventfd(2);
- open a control file to be monitored (e.g. memory.usage_in_bytes);
- write "<event_fd> <control_fd> <args>" to cgroup.event_control.
Interpretation of args is defined by control file implementation;
eventfd will be woken up by control file implementation or when the
cgroup is removed.
To unregister notification handler just close eventfd.
NOTE: Support of notifications should be implemented for the control
file. See documentation for the subsystem.
3. Kernel API
=============
3.1 Overview
------------
Each kernel subsystem that wants to hook into the generic cgroup
system needs to create a cgroup_subsys object. This contains
various methods, which are callbacks from the cgroup system, along
with a subsystem id which will be assigned by the cgroup system.
Other fields in the cgroup_subsys object include:
- subsys_id: a unique array index for the subsystem, indicating which
entry in cgroup->subsys[] this subsystem should be managing.
- name: should be initialized to a unique subsystem name. Should be
no longer than MAX_CGROUP_TYPE_NAMELEN.
- early_init: indicate if the subsystem needs early initialization
at system boot.
Each cgroup object created by the system has an array of pointers,
indexed by subsystem id; this pointer is entirely managed by the
subsystem; the generic cgroup code will never touch this pointer.
3.2 Synchronization
-------------------
There is a global mutex, cgroup_mutex, used by the cgroup
system. This should be taken by anything that wants to modify a
cgroup. It may also be taken to prevent cgroups from being
modified, but more specific locks may be more appropriate in that
situation.
See kernel/cgroup.c for more details.
Subsystems can take/release the cgroup_mutex via the functions
cgroup_lock()/cgroup_unlock().
Accessing a task's cgroup pointer may be done in the following ways:
- while holding cgroup_mutex
- while holding the task's alloc_lock (via task_lock())
- inside an rcu_read_lock() section via rcu_dereference()
3.3 Subsystem API
-----------------
Each subsystem should:
- add an entry in linux/cgroup_subsys.h
- define a cgroup_subsys object called <name>_subsys
If a subsystem can be compiled as a module, it should also have in its
module initcall a call to cgroup_load_subsys(), and in its exitcall a
call to cgroup_unload_subsys(). It should also set its_subsys.module =
THIS_MODULE in its .c file.
Each subsystem may export the following methods. The only mandatory
methods are create/destroy. Any others that are null are presumed to
be successful no-ops.
struct cgroup_subsys_state *create(struct cgroup_subsys *ss,
struct cgroup *cgrp)
(cgroup_mutex held by caller)
Called to create a subsystem state object for a cgroup. The
subsystem should allocate its subsystem state object for the passed
cgroup, returning a pointer to the new object on success or a
negative error code. On success, the subsystem pointer should point to
a structure of type cgroup_subsys_state (typically embedded in a
larger subsystem-specific object), which will be initialized by the
cgroup system. Note that this will be called at initialization to
create the root subsystem state for this subsystem; this case can be
identified by the passed cgroup object having a NULL parent (since
it's the root of the hierarchy) and may be an appropriate place for
initialization code.
void destroy(struct cgroup_subsys *ss, struct cgroup *cgrp)
(cgroup_mutex held by caller)
The cgroup system is about to destroy the passed cgroup; the subsystem
should do any necessary cleanup and free its subsystem state
object. By the time this method is called, the cgroup has already been
unlinked from the file system and from the child list of its parent;
cgroup->parent is still valid. (Note - can also be called for a
newly-created cgroup if an error occurs after this subsystem's
create() method has been called for the new cgroup).
int pre_destroy(struct cgroup_subsys *ss, struct cgroup *cgrp);
Called before checking the reference count on each subsystem. This may
be useful for subsystems which have some extra references even if
there are not tasks in the cgroup. If pre_destroy() returns error code,
rmdir() will fail with it. From this behavior, pre_destroy() can be
called multiple times against a cgroup.
int can_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
struct task_struct *task, bool threadgroup)
(cgroup_mutex held by caller)
Called prior to moving a task into a cgroup; if the subsystem
returns an error, this will abort the attach operation. If a NULL
task is passed, then a successful result indicates that *any*
unspecified task can be moved into the cgroup. Note that this isn't
called on a fork. If this method returns 0 (success) then this should
remain valid while the caller holds cgroup_mutex and it is ensured that either
attach() or cancel_attach() will be called in future. If threadgroup is
true, then a successful result indicates that all threads in the given
thread's threadgroup can be moved together.
void cancel_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
struct task_struct *task, bool threadgroup)
(cgroup_mutex held by caller)
Called when a task attach operation has failed after can_attach() has succeeded.
A subsystem whose can_attach() has some side-effects should provide this
function, so that the subsystem can implement a rollback. If not, not necessary.
This will be called only about subsystems whose can_attach() operation have
succeeded.
void attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
struct cgroup *old_cgrp, struct task_struct *task,
bool threadgroup)
(cgroup_mutex held by caller)
Called after the task has been attached to the cgroup, to allow any
post-attachment activity that requires memory allocations or blocking.
If threadgroup is true, the subsystem should take care of all threads
in the specified thread's threadgroup. Currently does not support any
subsystem that might need the old_cgrp for every thread in the group.
void fork(struct cgroup_subsy *ss, struct task_struct *task)
Called when a task is forked into a cgroup.
void exit(struct cgroup_subsys *ss, struct task_struct *task)
Called during task exit.
int populate(struct cgroup_subsys *ss, struct cgroup *cgrp)
(cgroup_mutex held by caller)
Called after creation of a cgroup to allow a subsystem to populate
the cgroup directory with file entries. The subsystem should make
calls to cgroup_add_file() with objects of type cftype (see
include/linux/cgroup.h for details). Note that although this
method can return an error code, the error code is currently not
always handled well.
void post_clone(struct cgroup_subsys *ss, struct cgroup *cgrp)
(cgroup_mutex held by caller)
Called at the end of cgroup_clone() to do any parameter
initialization which might be required before a task could attach. For
example in cpusets, no task may attach before 'cpus' and 'mems' are set
up.
void bind(struct cgroup_subsys *ss, struct cgroup *root)
(cgroup_mutex and ss->hierarchy_mutex held by caller)
Called when a cgroup subsystem is rebound to a different hierarchy
and root cgroup. Currently this will only involve movement between
the default hierarchy (which never has sub-cgroups) and a hierarchy
that is being created/destroyed (and hence has no sub-cgroups).
4. Questions
============
Q: what's up with this '/bin/echo' ?
A: bash's builtin 'echo' command does not check calls to write() against
errors. If you use it in the cgroup file system, you won't be
able to tell whether a command succeeded or failed.
Q: When I attach processes, only the first of the line gets really attached !
A: We can only return one error code per call to write(). So you should also
put only ONE pid. |
|