解释
sys_shmget()
The entire call to sys_shmget() is protected by the global shared memory semaphore.
In the case where a new shared memory segment must be created, the newseg() function is called to create and initialize a new shared memory segment. The ID of the new segment is returned to the caller.
In the case where a key value is provided for an existing shared memory segment, the corresponding index in the shared memory descriptors array is looked up, and the parameters and permissions of the caller are verified before returning the shared memory segment ID. The look up operation and verification are performed while the global shared memory spinlock is held.
sys_shmctl()
IPC_INFO
A temporary shminfo64 buffer is loaded with system-wide shared memory parameters and is copied out to user space for access by the calling application.
SHM_INFO
The global shared memory semaphore and the global shared memory spinlock are held while gathering system-wide statistical information for shared memory. The shm_get_stat() function is called to calculate both the number of shared memory pages that are resident in memory and the number of shared memory pages that are swapped out. Other statistics include the total number of shared memory pages and the number of shared memory segments in use. The counts of swap_attempts and swap_successes are hard-coded to zero. These statistics are stored in a temporary shm_info buffer and copied out to user space for the calling application.
SHM_STAT, IPC_STAT
For SHM_STAT and IPC_STATA, a temporary buffer of type struct shmid64_ds is initialized, and the global shared memory spinlock is locked.
For the SHM_STAT case, the shared memory segment ID parameter is expected to be a straight index (i.e. 0 to n where n is the number of shared memory IDs in the system). After validating the index, ipc_buildid() is called (via shm_buildid()) to convert the index into a shared memory ID. In the passing case of SHM_STAT, the shared memory ID will be the return value. Note that this is an undocumented feature, but is maintained for the ipcs( program.
For the IPC_STAT case, the shared memory segment ID parameter is expected to be an ID that was generated by a call to shmget(). The ID is validated before proceeding. In the passing case of IPC_STAT, 0 will be the return value.
For both SHM_STAT and IPC_STAT, the access permissions of the caller are verified. The desired statistics are loaded into the temporary buffer and then copied out to the calling application.
SHM_LOCK, SHM_UNLOCK
After validating access permissions, the global shared memory spinlock is locked, and the shared memory segment ID is validated. For both SHM_LOCK and SHM_UNLOCK, shmem_lock() is called to perform the function. The parameters for shmem_lock() identify the function to be performed.
IPC_RMID
During IPC_RMID the global shared memory semaphore and the global shared memory spinlock are held throughout this function. The Shared Memory ID is validated, and then if there are no current attachments, shm_destroy() is called to destroy the shared memory segment. Otherwise, the SHM_DEST flag is set to mark it for destruction, and the IPC_PRIVATE flag is set to prevent other processes from being able to reference the shared memory ID.
IPC_SET
After validating the shared memory segment ID and the user access permissions, the uid, gid, and mode flags of the shared memory segment are updated with the user data. The shm_ctime field is also updated. These changes are made while holding the global shared memory semaphore and the global share memory spinlock.
sys_shmat()
sys_shmat() takes as parameters, a shared memory segment ID, an address at which the shared memory segment should be attached(shmaddr), and flags which will be described below.
If shmaddr is non-zero, and the SHM_RND flag is specified, then shmaddr is rounded down to a multiple of SHMLBA. If shmaddr is not a multiple of SHMLBA and SHM_RND is not specified, then EINVAL is returned.
The access permissions of the caller are validated and the shm_nattch field for the shared memory segment is incremented. Note that this increment guarantees that the attachment count is non-zero and prevents the shared memory segment from being destroyed during the process of attaching to the segment. These operations are performed while holding the global shared memory spinlock.
The do_mmap() function is called to create a virtual memory mapping to the shared memory segment pages. This is done while holding the mmap_sem semaphore of the current task. The MAP_SHARED flag is passed to do_mmap(). If an address was provided by the caller, then the MAP_FIXED flag is also passed to do_mmap(). Otherwise, do_mmap() will select the virtual address at which to map the shared memory segment.
NOTE shm_inc() will be invoked within the do_mmap() function call via the shm_file_operations structure. This function is called to set the PID, to set the current time, and to increment the number of attachments to this shared memory segment.
After the call to do_mmap(), the global shared memory semaphore and the global shared memory spinlock are both obtained. The attachment count is then decremented. The the net change to the attachment count is 1 for a call to shmat() because of the call to shm_inc(). If, after decrementing the attachment count, the resulting count is found to be zero, and if the segment is marked for destruction (SHM_DEST), then shm_destroy() is called to release the shared memory segment resources.
Finally, the virtual address at which the shared memory is mapped is returned to the caller at the user specified address. If an error code had been returned by do_mmap(), then this failure code is passed on as the return value for the system call.
sys_shmdt()
The global shared memory semaphore is held while performing sys_shmdt(). The mm_struct of the current process is searched for the vm_area_struct associated with the shared memory address. When it is found, do_munmap() is called to undo the virtual address mapping for the shared memory segment.
Note also that do_munmap() performs a call-back to shm_close(), which performs the shared-memory book keeping functions, and releases the shared memory segment resources if there are no other attachments.
struct shminfo64 {
unsigned long shmmax;
unsigned long shmmin;
unsigned long shmmni;
unsigned long shmseg;
unsigned long shmall;
unsigned long __unused1;
unsigned long __unused2;
unsigned long __unused3;
unsigned long __unused4;
};
struct shm_info {
int used_ids;
unsigned long shm_tot; /* total allocated shm */
unsigned long shm_rss; /* total resident shm */
unsigned long shm_swp; /* total swapped shm */
unsigned long swap_attempts;
unsigned long swap_successes;
};
struct shmid64_ds {
struct ipc64_perm shm_perm; /* operation perms */
size_t shm_segsz; /* size of segment (bytes) */
__kernel_time_t shm_atime; /* last attach time */
unsigned long __unused1;
__kernel_time_t shm_dtime; /* last detach time */
unsigned long __unused2;
__kernel_time_t shm_ctime; /* last change time */
unsigned long __unused3;
__kernel_pid_t shm_cpid; /* pid of creator */
__kernel_pid_t shm_lpid; /* pid of last operator */
unsigned long shm_nattch; /* no. of current attaches */
unsigned long __unused4;
unsigned long __unused5;
};
newseg()
The newseg() function is called when a new shared memory segment needs to be created. It acts on three parameters for the new segment the key, the flag, and the size. After validating that the size of the shared memory segment to be created is between SHMMIN and SHMMAX and that the total number of shared memory segments does not exceed SHMALL, it allocates a new shared memory segment descriptor. The shmem_file_setup() function is invoked later to create an unlinked file of type tmpfs. The returned file pointer is saved in the shm_file field of the associated shared memory segment descriptor. The files size is set to be the same as the size of the segment. The new shared memory segment descriptor is initialized and inserted into the global IPC shared memory descriptors array. The shared memory segment ID is created by shm_buildid() (via ipc_buildid()). This segment ID is saved in the id field of the shared memory segment descriptor, as well as in the i_ino field of the associated inode. In addition, the address of the shared memory operations defined in structure shm_file_operation is stored in the associated file. The value of the global variable shm_tot, which indicates the total number of shared memory segments system wide, is also increased to reflect this change. On success, the segment ID is returned to the caller application.
shm_get_stat()
shm_get_stat() cycles through all of the shared memory structures, and calculates the total number of memory pages in use by shared memory and the total number of shared memory pages that are swapped out. There is a file structure and an inode structure for each shared memory segment. Since the required data is obtained via the inode, the spinlock for each inode structure that is accessed is locked and unlocked in sequence.
shmem_lock()
shmem_lock() receives as parameters a pointer to the shared memory segment descriptor and a flag indicating lock vs. unlock.The locking state of the shared memory segment is stored in an associated inode. This state is compared with the desired locking state; shmem_lock() simply returns if they match.
While holding the semaphore of the associated inode, the locking state of the inode is set. The following list of items occur for each page in the shared memory segment:
find_lock_page() is called to lock the page (setting PG_locked) and to increment the reference count of the page. Incrementing the reference count assures that the shared memory segment remains locked in memory throughout this operation.
If the desired state is locked, then PG_locked is cleared, but the reference count remains incremented.
If the desired state is unlocked, then the reference count is decremented twice once for the current reference, and once for the existing reference which caused the page to remain locked in memory. Then PG_locked is cleared.
shm_destroy()
During shm_destroy() the total number of shared memory pages is adjusted to account for the removal of the shared memory segment. ipc_rmid() is called (via shm_rmid()) to remove the Shared Memory ID. shmem_lock is called to unlock the shared memory pages, effectively decrementing the reference counts to zero for each page. fput() is called to decrement the usage counter f_count for the associated file object, and if necessary, to release the file object resources. kfree() is called to free the shared memory segment descriptor.
shm_inc()
shm_inc() sets the PID, sets the current time, and increments the number of attachments for the given shared memory segment. These operations are performed while holding the global shared memory spinlock.
shm_close()
shm_close() updates the shm_lprid and the shm_dtim fields and decrements the number of attached shared memory segments. If there are no other attachments to the shared memory segment, then shm_destroy() is called to release the shared memory segment resources. These operations are all performed while holding both the global shared memory semaphore and the global shared memory spinlock.
shmem_file_setup()
The function shmem_file_setup() sets up an unlinked file living in the tmpfs file system with the given name and size. If there are enough systen memory resource for this file, it creates a new dentry under the mount root of tmpfs, and allocates a new file descriptor and a new inode object of tmpfs type. Then it associates the new dentry object with the new inode object by calling d_instantiate() and saves the address of the dentry object in the file descriptor. The i_size field of the inode object is set to be the file size and the i_nlink field is set to be 0 in order to mark the inode unlinked. Also, shmem_file_setup() stores the address of the shmem_file_operations structure in the f_op field, and initializes f_mode and f_vfsmnt fields of the file descriptor properly. The function shmem_truncate() is called to complete the initialization of the inode object. On success, shmem_file_setup() returns the new file descriptor.
5.4 Linux IPC Primitives
Generic Linux IPC Primitives used with Semaphores, Messages,and Shared Memory
The semaphores, messages, and shared memory mechanisms of Linux are built on a set of common primitives. These primitives are described in the sections below.
ipc_alloc()
If the memory allocation is greater than PAGE_SIZE, then vmalloc() is used to allocate memory. Otherwise, kmalloc() is called with GFP_KERNEL to allocate the memory.
ipc_addid()
When a new semaphore set, message queue, or shared memory segment is added, ipc_addid() first calls grow_ary() to insure that the size of the corresponding descriptor array is sufficiently large for the system maximum. The array of descriptors is searched for the first unused element. If an unused element is found, the count of descriptors which are in use is incremented. The kern_ipc_perm structure for the new resource descriptor is then initialized, and the array index for the new descriptor is returned. When ipc_addid() succeeds, it returns with the global spinlock for the given IPC type locked.
ipc_rmid()
ipc_rmid() removes the IPC descriptor from the the global descriptor array of the IPC type, updates the count of IDs which are in use, and adjusts the maximum ID in the corresponding descriptor array if necessary. A pointer to the IPC descriptor associated with given IPC ID is returned.
ipc_buildid()
ipc_buildid() creates a unique ID to be associated with each descriptor within a given IPC type. This ID is created at the time a new IPC element is added (e.g. a new shared memory segment or a new semaphore set). The IPC ID converts easily into the corresponding descriptor array index. Each IPC type maintains a sequence number which is incremented each time a descriptor is added. An ID is created by multiplying the sequence number with SEQ_MULTIPLIER and adding the product to the descriptor array index. The sequence number used in creating a particular IPC ID is then stored in the corresponding descriptor. The existence of the sequence number makes it possible to detect the use of a stale IPC ID.
ipc_checkid()
ipc_checkid() divides the given IPC ID by the SEQ_MULTIPLIER and compares the quotient with the seq value saved corresponding descriptor. If they are equal, then the IPC ID is considered to be valid and 1 is returned. Otherwise, 0 is returned.
grow_ary()
grow_ary() handles the possibility that the maximum (tunable) number of IDs for a given IPC type can be dynamically changed. It enforces the current maximum limit so that it is no greater than the permanent system limit (IPCMNI) and adjusts it down if necessary. It also insures that the existing descriptor array is large enough. If the existing array size is sufficiently large, then the current maximum limit is returned. Otherwise, a new larger array is allocated, the old array is copied into the new array, and the old array is freed. The corresponding global spinlock is held when updating the descriptor array for the given IPC type.
ipc_findkey()
ipc_findkey() searches through the descriptor array of the specified ipc_ids object, and searches for the specified key. Once found, the index of the corresponding descriptor is returned. If the key is not found, then -1 is returned.
ipcperms()
ipcperms() checks the user, group, and other permissions for access to the IPC resources. It returns 0 if permission is granted and -1 otherwise.
ipc_lock()
ipc_lock() takes an IPC ID as one of its parameters. It locks the global spinlock for the given IPC type, and returns a pointer to the descriptor corresponding to the specified IPC ID.
ipc_unlock()
ipc_unlock() releases the global spinlock for the indicated IPC type.
ipc_lockall()
ipc_lockall() locks the global spinlock for the given IPC mechanism (i.e. shared memory, semaphores, and messaging).
ipc_unlockall()
ipc_unlockall() unlocks the global spinlock for the given IPC mechanism (i.e. shared memory, semaphores, and messaging).
ipc_get()
ipc_get() takes a pointer to a particular IPC type (i.e. shared memory, semaphores, or message queues) and a descriptor ID, and returns a pointer to the corresponding IPC descriptor. Note that although the descriptors for each IPC type are of different data types, the common kern_ipc_perm structure type is embedded as the first entity in every case. The ipc_get() function returns this common data type. The expected model is that ipc_get() is called through a wrapper function (e.g. shm_get()) which casts the data type to the correct descriptor data type.
ipc_parse_version()
ipc_parse_version() removes the IPC_64 flag from the command if it is present and returns either IPC_64 or IPC_OLD.
Generic IPC Structures used with Semaphores,Messages, and Shared Memory
The semaphores, messages, and shared memory mechanisms all make use of the following common structures:
struct kern_ipc_perm
Each of the IPC descriptors has a data object of this type as the first element. This makes it possible to access any descriptor from any of the generic IPC functions using a pointer of this data type.
struct ipc_ids
The ipc_ids structure describes the common data for semaphores, message queues, and shared memory. There are three global instances of this data structure-- semid_ds, msgid_ds and shmid_ds-- for semaphores, messages and shared memory respectively. In each instance, the sem semaphore is used to protect access to the structure. The entries field points to an IPC descriptor array, and the ary spinlock protects access to this array. The seq field is a global sequence number which will be incremented when a new IPC resource is created.
struct ipc_ids {
int size;
int in_use;
int max_id;
unsigned short seq;
unsigned short seq_max;
struct semaphore sem;
spinlock_t ary;
struct ipc_id* entries;
};
struct ipc_id
An array of struct ipc_id exists in each instance of the ipc_ids structure. The array is dynamically allocated and may be replaced with larger array by grow_ary() as required. The array is sometimes referred to as the descriptor array, since the kern_ipc_perm data type is used as the common descriptor data type by the IPC generic functions.