Take charge of processor affinity

Why (three reasons) and how to use hard (versus soft) CPU affinity
Level: Intermediate
Eli Dow (emdow@us.ibm.com), Software Engineer, IBM Linux Test and Integration Center
29 Sep 2005
Knowing a little bit about how the Linux® 2.6 scheduler treats CPU affinity can help you design better userspace applications. Soft affinity means that processes do not frequently migrate between processors, whereas hard affinity means that processes run on processors you specify. This article describes current affinity mechanisms, explains why and how to use hard affinity, and provides sample code showing you how to use the available functionality.
Simply stated, CPU affinity is the tendency for a process to run on a given CPU as long as possible without being moved to some other processor. The Linux kernel process scheduler inherently enforces what is commonly referred to as soft CPU affinity, which means that processes generally do not migrate frequently between processors. This state is desirable because a process that stays on one processor avoids the overhead that each migration incurs.
The 2.6 Linux kernel also contains a mechanism that allows developers to programmatically enforce hard CPU affinity. This means your applications can explicitly specify which processor (or set of processors) a given process may run on.
What is Linux kernel hard affinity?
In the Linux kernel, every process has an associated data structure called the task_struct. This structure is important for a number of reasons, the most pertinent here being its cpus_allowed bitmask. The bitmask contains one bit for each of the n logical processors in the system: a system with four physical CPUs has a four-bit mask, and if those CPUs are hyperthread-enabled, the system presents eight logical processors and therefore an eight-bit mask.
If a given bit is set for a given process, that process may run on the associated CPU. Therefore, if a process is allowed to run on any CPU and allowed to migrate across processors as needed, the bitmask would be entirely 1s. This is, in fact, the default state for processes under Linux.
The Linux kernel API includes some methods to allow users to alter the bitmask or view the current bitmask:

  • sched_setaffinity() (for altering the bitmask)
  • sched_getaffinity() (for viewing the current bitmask)

Note that CPU affinity is inherited by child processes and threads, so you should place your calls to sched_setaffinity() accordingly.
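As a minimal sketch of these two calls (assuming a glibc where the cpu_set_t type and its macros are exposed from <sched.h> when _GNU_SOURCE is defined), a process can pin itself to logical CPU 0 and then read the mask back to confirm:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    cpu_set_t mask;

    /* Restrict the calling process (pid 0 means "self") to logical CPU 0. */
    CPU_ZERO(&mask);
    CPU_SET(0, &mask);
    if (sched_setaffinity(0, sizeof(mask), &mask) == -1) {
        perror("sched_setaffinity");
        exit(EXIT_FAILURE);
    }

    /* Read the mask back to confirm the change took effect. */
    if (sched_getaffinity(0, sizeof(mask), &mask) == -1) {
        perror("sched_getaffinity");
        exit(EXIT_FAILURE);
    }
    printf("Bound to CPU 0: %s\n", CPU_ISSET(0, &mask) ? "yes" : "no");
    return 0;
}

Restricting your own process normally requires no special privileges; changing another process's mask generally requires matching credentials or CAP_SYS_NICE.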


Why should you use hard affinity?
Normally the Linux kernel does a good job of scheduling processes to run where they should (that is, running on available processors and obtaining good overall performance). The kernel includes algorithms for detecting skewed workloads across CPUs, enabling process migration to less busy processors.
As a rule of thumb, you should simply use the default scheduler behaviors in your applications. However, you might want to alter these default behaviors to optimize performance. Let's look at three reasons for using hard affinity.
Reason 1. You have a hunch
Hunch-based scenarios come up so often in scientific and academic computing that they are no doubt applicable to public-sector computing as well. A common indicator is when you know intuitively that your application will need to consume a lot of computational time on multiprocessor machines.
Reason 2. You are testing complex applications
Testing complex software is another reason to be interested in the kernel affinity technology. Consider an application that requires linear scalability testing. Some products claim to perform better with a throw-more-hardware-at-it mantra.
Rather than just purchasing multiple machines (a machine for each processor configuration), you can:

  • Purchase a single multiprocessor machine
  • Incrementally allocate processors
  • Measure your transactions per second
  • Plot the resulting scalability

If your application truly does scale linearly with added CPUs, a plot of transactions per second versus number of CPUs should yield a roughly straight diagonal line (see the Amdahl's Law sidebar that follows, and the sketch below). Modeling behavior this way indicates whether your application uses the underlying hardware efficiently.
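One way to drive that measurement from inside the benchmark itself is to grow the affinity mask one CPU at a time between runs. The sketch below is illustrative only; run_transactions() is a hypothetical workload driver, not part of the article's download:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical workload driver: runs a fixed batch of transactions
   and returns the measured throughput in transactions per second.  */
extern double run_transactions(void);

int main(void)
{
    long ncpus = sysconf(_SC_NPROCESSORS_CONF);
    cpu_set_t mask;
    long n, cpu;

    for (n = 1; n <= ncpus; n++) {
        /* Allow this process (and anything it spawns) to use CPUs 0..n-1. */
        CPU_ZERO(&mask);
        for (cpu = 0; cpu < n; cpu++)
            CPU_SET(cpu, &mask);
        if (sched_setaffinity(0, sizeof(mask), &mask) == -1) {
            perror("sched_setaffinity");
            return 1;
        }
        printf("%ld CPU(s): %.1f transactions/sec\n", n, run_transactions());
    }
    return 0;
}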

Amdahl's Law
Amdahl's Law governs the speedup of using parallel processors on a problem versus using only one serial processor. Speedup is the time it takes a program to execute in serial (with one processor) divided by the time it takes to execute in parallel (with many processors):
     T(1)
S = ------
     T(j)
where T(j) is the time it takes to execute the program when using j processors.
Amdahl's Law states that this ideal, perfectly linear speedup generally will not happen in practice, although the closer you come to it, the better. In the general case, every program has some sequential component, and as problem sets grow, that sequential component eventually places an upper limit on how much parallel processors can shorten the solution time.
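For reference, the usual closed-form statement of that limit (standard textbook material, not spelled out in the original sidebar) writes the runtime as a serial fraction s plus a parallelizable fraction 1 - s; in LaTeX:

S(j) = \frac{T(1)}{T(j)} = \frac{1}{s + \frac{1 - s}{j}} \le \frac{1}{s}

With s = 0.1, for example, no number of processors can push the speedup past a factor of 10.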
Keeping the CPU cache hit rate high is especially important in this context. If a given process migrates, it loses the benefit of the CPU cache it leaves behind. In fact, if one CPU needs to cache and modify some specific piece of data for itself, all other CPUs invalidate any entry for that data from their own caches.
So, if multiple threads need the same data, it might make sense to bind them to a particular CPU to ensure they all have access to the cached data (or at least improve the odds of a cache hit). Otherwise, the threads might execute on different CPUs and constantly invalidate each other's cache entries.
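As a hedged illustration of that idea, the sketch below pins two POSIX threads that update a shared counter onto the same CPU using pthread_setaffinity_np(), a GNU extension declared in <pthread.h> when _GNU_SOURCE is defined; the worker() function and the counter are invented for the example (compile with gcc -pthread):

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

static long shared_counter;          /* data that both threads touch */

static void *worker(void *arg)
{
    int i;
    (void)arg;
    for (i = 0; i < 1000000; i++)
        __sync_fetch_and_add(&shared_counter, 1);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    cpu_set_t mask;

    /* Pin both threads to CPU 1 so they share that CPU's cache
       instead of bouncing the shared data between caches.       */
    CPU_ZERO(&mask);
    CPU_SET(1, &mask);

    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_setaffinity_np(t1, sizeof(mask), &mask);
    pthread_setaffinity_np(t2, sizeof(mask), &mask);

    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld\n", shared_counter);
    return 0;
}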
Reason 3. You are running time-sensitive, deterministic processes
A final reason to be interested in CPU affinity is real-time (time-sensitive) processing. For example, you might use hard affinity to dedicate one processor on an eight-way machine to a time-sensitive application, while allowing the other seven processors to handle all the normal scheduling needs of the system. This arrangement ensures that your long-running, time-sensitive application always gets to run, while the remaining applications are free to monopolize the other seven processors.
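A hedged sketch of that arrangement follows; the command-line handling is invented for illustration. The program binds itself to CPU 7 and, if given another pid as an argument, confines that process to CPUs 0-6. In practice every other process on the system (or the isolcpus boot option) would need similar treatment for CPU 7 to stay truly quiet.

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    cpu_set_t mine, theirs;
    long cpu, ncpus = sysconf(_SC_NPROCESSORS_CONF);

    /* This time-sensitive process gets logical CPU 7 to itself. */
    CPU_ZERO(&mine);
    CPU_SET(7, &mine);
    if (sched_setaffinity(0, sizeof(mine), &mine) == -1)
        perror("sched_setaffinity (self)");

    /* Confine one ordinary process (pid given on the command line)
       to CPUs 0-6; other processes would need the same treatment.  */
    if (argc > 1) {
        CPU_ZERO(&theirs);
        for (cpu = 0; cpu < ncpus && cpu < 7; cpu++)
            CPU_SET(cpu, &theirs);
        if (sched_setaffinity(atoi(argv[1]), sizeof(theirs), &theirs) == -1)
            perror("sched_setaffinity (other pid)");
    }

    /* ... long-running, time-sensitive work goes here ... */
    return 0;
}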
The full sample application in the next section shows how this all works.


How to code hard affinity
Let's devise a program to make a Linux system very busy. You can construct this program using the system calls mentioned previously along with some other APIs that indicate how many processors are on the system. In essence, the goal is to write a program that can make each processor in a system busy for a few seconds.
Download the sample application from the Download section below.
Listing 1. Keeping the processors busy
/* This method will create threads, then bind each to its own cpu. */
bool do_cpu_stress(int numthreads)
{
   int ret = TRUE;
   int created_thread = 0;
   /* We need a thread for each cpu we have... */
   while ( created_thread < numthreads - 1 )
   {
      int mypid = fork();
      if ( mypid == 0 )   /* Child: stop spawning and fall through to   */
         break;           /* the affinity-setting code that follows.    */
      /* Parent: keep looping until enough children have been created. */
      created_thread++;
   }
As you can see, the code simply creates a bunch of threads by forking. Each executes the remaining code in the method. Now let's have each thread set affinity to its own CPU.
Listing 2. Setting CPU affinity for each thread
   cpu_set_t mask;
   /* CPU_ZERO initializes all the bits in the mask to zero. */
   CPU_ZERO( &mask );
   /* CPU_SET sets only the bit corresponding to cpu. */
   CPU_SET( created_thread, &mask );
   /* sched_setaffinity returns 0 on success, -1 on failure. */
   if ( sched_setaffinity( 0, sizeof(mask), &mask ) == -1 )
   {
      printf("WARNING: Could not set CPU Affinity, continuing...\n");
   }
At this point in the program, each thread is bound to its own CPU. The call to sched_setaffinity sets the CPU affinity mask of the process denoted by pid; if pid is zero, the calling process is used.
The affinity mask is represented by the bitmask stored in mask. The least significant bit corresponds to the first logical processor number on the system, while the most significant bit corresponds to the last logical processor number on the system.
Each set bit corresponds to a legally schedulable CPU, while an unset bit corresponds to an illegally schedulable CPU. In other words, a process is bound to and will run only on processors whose corresponding bit is set. Usually, all bits in the mask are set. The CPU affinity of each of these threads is passed on to any children forked from them.
Note that you should not alter the bitmask directly. You should use the following macros instead. Though not all were used in our example, they are listed here in case you need them in your own program.
Listing 3. Macros to indirectly alter the bitmask
void CPU_ZERO (cpu_set_t *set)
This macro initializes the CPU set set to be the empty set.
void CPU_SET (int cpu, cpu_set_t *set)
This macro adds cpu to the CPU set set.
void CPU_CLR (int cpu, cpu_set_t *set)
This macro removes cpu from the CPU set set.
int CPU_ISSET (int cpu, const cpu_set_t *set)
This macro returns a nonzero value (true) if cpu is a member of the CPU set set, and zero (false) otherwise.
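To see the macros in action, here is a small sketch (not part of the downloadable sample) that retrieves the calling process's mask and prints every CPU it is allowed to run on:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    cpu_set_t mask;
    long cpu, ncpus = sysconf(_SC_NPROCESSORS_CONF);

    if (sched_getaffinity(0, sizeof(mask), &mask) == -1) {
        perror("sched_getaffinity");
        return 1;
    }
    printf("Allowed CPUs:");
    for (cpu = 0; cpu < ncpus; cpu++)
        if (CPU_ISSET(cpu, &mask))   /* test one bit of the mask */
            printf(" %ld", cpu);
    printf("\n");
    return 0;
}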
For our purposes, the sample code will go on to have each thread execute some computationally expensive operation.
Listing 4. Each thread executes a compute-intensive operation
    /* Now we have a single thread bound to each cpu on the system */
    int computation_res = do_cpu_expensive_op(41);
    /* Retrieve this thread's affinity mask (shown here for illustration). */
    cpu_set_t mycpuid;
    sched_getaffinity(0, sizeof(mycpuid), &mycpuid);
    if ( check_cpu_expensive_op(computation_res) )
    {
      printf("SUCCESS: Thread completed, and PASSED integrity check!\n");
      ret = TRUE;
    }
    else
    {
      printf("FAILURE: Thread failed integrity check!\n");
      ret = FALSE;
    }
   return ret;
}
There you have the basics of setting CPU affinity for 2.6 Linux kernels. Let's wrap this method call with a fancy main program that takes a user-specified parameter for how many CPUs to make busy. We can even use another method to determine the number of processors in the system:
int NUM_PROCS = sysconf(_SC_NPROCESSORS_CONF);
This call lets the program make sensible decisions about how many processors to keep busy, such as spinning all of them by default and accepting a user-specified count only when it falls within the range of processors actually present on the system.
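A sketch of such a wrapper follows; the argument handling and messages are illustrative and not necessarily identical to the code in thrasher.zip:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Mirror the listing's conventions for bool/TRUE/FALSE. */
typedef int bool;
#define TRUE  1
#define FALSE 0

extern bool do_cpu_stress(int numthreads);   /* from Listing 1 */

int main(int argc, char *argv[])
{
    int NUM_PROCS = sysconf(_SC_NPROCESSORS_CONF);
    int numthreads = NUM_PROCS;               /* default: spin every CPU */

    if (argc > 1) {
        numthreads = atoi(argv[1]);
        /* Accept only a count within the range of processors present. */
        if (numthreads < 1 || numthreads > NUM_PROCS) {
            fprintf(stderr, "Please specify between 1 and %d CPUs\n", NUM_PROCS);
            return EXIT_FAILURE;
        }
    }
    return do_cpu_stress(numthreads) ? EXIT_SUCCESS : EXIT_FAILURE;
}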


Running the sample application
When you run the sample application described above, you can use a variety of tools to see that the CPUs are busy. For simple testing, use the Linux command top. Press the "1" key while top is running to see a per-CPU breakdown of executing processes.


Conclusion
The sample application, although trivial, shows you the basics of hard affinity as implemented in the Linux kernel. (Any application using this code will no doubt do something much more interesting.) At any rate, with a basic understanding of the CPU affinity kernel API, you are in a position to squeeze every last drop of performance out of complicated applications.


Download
Sample app using CPU affinity kernel API: thrasher.zip (3 KB, via FTP)






About the author


Eli Dow is a Software Engineer in the IBM Linux Test and Integration Center in Poughkeepsie, NY. He holds a B.S. in Computer Science and Psychology and a Master's in Computer Science from Clarkson University. His interests include the GNOME desktop, human-computer interaction, and Linux systems programming. You can contact Eli at emdow@us.ibm.com.


This article comes from the ChinaUnix blog. To view the original, visit: http://blog.chinaunix.net/u/12757/showart_275539.html