DTrace: another powerful tool in Solaris 10. The article below is an introductory rundown of some of its features and concepts.

Posted 2006-05-20 00:08

Description:

A Quick Introduction to the Solaris(TM) Dynamic Tracing Framework (a.k.a. DTrace).
If you have ever tried to understand the behavior of a Solaris system, and I suspect that covers most of the readership, then a new technology is emerging that will change your life forever! DTrace is a major new subsystem that has been integrated into Solaris 10, and it changes the way in which we observe Solaris. DTrace is already of proven benefit to customers and to application and kernel developers, so if this includes you then read on ...
Document Body:

DTrace is the crystal ball you've always wanted. It gives you the ability to easily explore all of your system, from the top of the application stack to the bowels of the kernel, in an intuitive way. It allows you to generate concise answers to almost arbitrary questions, giving you the ability to home in on problem areas with extreme speed.
For starters, have you ever tried to (or would like to) answer one or more of the following random questions whilst observing a Solaris system:


  • Who is generating that user and system time?

  • Who is generating those system call rates?

  • What are those system calls?

  • What is being written to /dev/null?

  • Does application 'X' call malloc whilst in a signal handler?

  • Who is opening /etc/passwd? Tell me when they do it.

  • Does application 'X' call malloc for allocations over 100MB?

  • If the answer to the above is 'yes', give me a stacktrace to tell me who did it.

  • Can I see all calls (userland and kernel) that are made as a result of a process calling a given function?

  • How big is my run queue really and who is really waiting to run?

  • A system call returned a generic errno such as EPERM or ENXIO - what really decided to set this errno?

If the answer to the above is 'yes' then DTrace should become your new best friend. If it's 'no' then it's probably just owing to my inability to think up scenarios!
The main components of DTrace are the probe, the provider, the consumer and the D Programming Language.
As illustrated below, the dtrace(1M) command is the primary consumer of the DTrace framework. Consumers are user-level programs that ask for probes to be enabled and report the resulting data back to the user. They can be driven by D scripts (a.d and b.d in the figure) or run directly from the command line; intrstat(1M) is another example of a consumer. A consumer's request passes through the DTrace library, which includes the D compiler, and on into the kernel. Providers, such as sysinfo, are the kernel components that publish probes and perform the instrumentation. When an enabled probe is hit it fires, the requested data is recorded, and the data is passed back up the chain to the consumer that asked for it.
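[Figure: DTrace architecture. D scripts (a.d, b.d) and the consumers intrstat(1M) and dtrace(1M) sit at the top; providers such as sysinfo sit at the bottom.]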
A probe is a point of interest at which we can stop and perform a set of actions that will typically record some data for us. A probe is identified by a provider name, module, function and name, and is written as a 4-tuple:
       provider:module:function:name
The "provider" is the name of the DTrace provider that is publishing this probe (trigger point).
The "module" is the name of the module in which this probe is located.
The "function" is the name of the function within the module in which this probe is located.
The "name" is a name that gives an idea of the probe's semantic meaning (what it does).
Each probe is assigned a unique integer identifier (shown in the ID column of dtrace output). A probe that is tied to a specific program location, that is, to a module and function, is called an "anchored" probe. A probe that is not tied to any particular location is referred to as "unanchored."


  • An example of an anchored probe: fbt:pm:pm_open:entry

  • An example of an unanchored probe: dtrace:::BEGIN

  • Some anchored probes do not list a module, but list a function: syscall::forkall:entry

Since "forkall" is a unique system call name, its probe description does not need to name a module.
DTrace has its own language called 'D', which is formed from a large subset of the 'C' programming language plus some tracing-centric additions. (If you don't know 'C', don't panic, because 'D' is really easy: a few examples are all that's needed to get you on the way!) A 'D' program (we'll see some later) is essentially made up of probes, predicates and actions. The probe names the point of interest, as described above. The actions are what we do when the probe fires; typical actions include recording a timestamp, a stack trace or a function argument. A predicate lets us execute those actions conditionally, based upon a given set of conditions that hold at the time the probe is hit (or 'fired').
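As a minimal sketch of how probe, predicate and action fit together, the script below tackles one of the opening questions: who is opening /etc/passwd? It is not from the original article. It assumes the Solaris 10 system call names open and open64 (other releases may expose different probes), and the file name passwd.d is hypothetical.

```d
#!/usr/sbin/dtrace -qs

/*
 * Probe:     entry to the open and open64 system calls (assumed Solaris 10 names).
 * Predicate: only act when the path argument is exactly /etc/passwd.
 * Action:    report when it happened and which process did it.
 */
syscall::open:entry,
syscall::open64:entry
/ copyinstr(arg0) == "/etc/passwd" /
{
         printf("%Y %s (pid %d) opened /etc/passwd\n", walltimestamp, execname, pid);
}
```

Saved as passwd.d and made executable, this would be run as ./passwd.d and stopped with Ctrl-C. Note that the predicate compares the path exactly as the caller passed it, so an open of a relative path such as "passwd" from within /etc would not match.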
The dtrace(1M) command compiles your 'D' and injects it into the kernel, where it is executed by DTrace. The probes named in your program are enabled and the running kernel or application is instrumented. Note that all this is totally dynamic: probes are enabled only when you use them, and instrumented code is present only for the duration of your dtrace(1M) session. This means that there is zero probe effect for disabled probes, and the only tracing effect there is (which is minimal) is directly attributable to the enabled probes. The DTrace environment also checks your 'D' for run-time errors such as divide by zero and invalid dereferences, so errant or mischievous code cannot damage a live kernel or application. Tracing can therefore be done on live, production kernels.
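The run-time checking can be seen with a deliberately broken script. The sketch below follows the pattern used in the Solaris Dynamic Tracing Guide for the ERROR probe: it dereferences a NULL pointer on purpose, and DTrace reports the fault rather than letting it do any harm. The exit(0) call is my addition so that the script stops by itself.

```d
#!/usr/sbin/dtrace -qs

/* Deliberately dereference a NULL pointer as soon as tracing begins. */
dtrace:::BEGIN
{
         *(char *)NULL;
}

/* The ERROR probe fires when a run-time error occurs in another clause. */
dtrace:::ERROR
{
         printf("Hit an error!\n");
         exit(0);
}
```

The point of the example is that the invalid dereference is caught and reported by the DTrace environment, and the ERROR clause gets a chance to react to it.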
The number of probes that are available is staggering. On my impoverished Ultra 10 I have 40673 kernel-based probes (this will vary from system to system), and we can instrument down to instruction level in every userland application, which gives us an effectively unlimited number of probes!
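Userland instrumentation of this kind is done through the pid provider. As a sketch, assuming a target process chosen at run time and a hypothetical script name bigmalloc.d, the following addresses two of the opening questions: does an application call malloc for allocations over 100MB, and if so, who did it? The module field is left empty so that malloc is matched in whichever library provides it.

```d
#!/usr/sbin/dtrace -s

/* Fire on entry to malloc in the target process, but only for requests over 100MB. */
pid$target::malloc:entry
/ arg0 > 100 * 1024 * 1024 /
{
         /* arg0 is malloc's first argument, the requested size in bytes. */
         printf("%s requested %d bytes", execname, arg0);
         /* Record the userland stack trace that led to the call. */
         ustack();
}
```

The $target macro is filled in by dtrace(1M) when it is given a process with -p or a command with -c, for example ./bigmalloc.d -p 1234, where 1234 stands in for a real process ID.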
Well, all that said, the best thing to do now is to have a look at how a few simple things are done. First we'll look at an age-old problem, namely this sequence of questions:


  • What are all those system calls?

  • Who is generating them?

  • How are they generating them?

Right now in Solaris it is possible to answer these simple questions using a mix of tools comprising prex(1), tnfdump(1), truss(1), pstack(1), prun(1) and your favorite way of munging masses of data together into a coherent form, e.g. awk(1) or perl(1) ...
In DTrace we can do:
What are all these system calls?
# cat syscall.d
#!/usr/sbin/dtrace -qs
syscall:::entry
{
         @a[probefunc] = count();
}
We won't worry about the detail here but this enables the system call entry point probe for every system call (216 in total). We are using what is called an aggregation to record the name of the system call probe we fired and also to keep a count of the number of times we fired it. Running this for 5 seconds on my idle laptop we get:
# ./syscall.d
fstat64                                                           1
writev                                                            2
sigaction                                                         2
lwp_park                                                          3
pset                                                              3
setcontext                                                        3
gtime                                                             4
brk                                                               6
nanosleep                                                         6
xstat                                                             7
lstat64                                                          21
write                                                            28
read                                                             28
p_online                                                         42
poll                                                             76
ioctl                                                            85
gettimeofday                                                    101
sigprocmask                                                     378
From this we can see that the sigprocmask(2) call is the most prolific. We can then move on to ask:
Who is making the sigprocmask(2) calls?
# cat sigprocmask.d
#!/usr/sbin/dtrace -qs
syscall::sigprocmask:entry
{
         @a[execname] = count();
}
Here we only enable the sigprocmask(2) system call probe and we use an aggregation again to record the number of times that sigprocmask(2) was called and the name of the application that called it.
# ./sigprocmask.d
dtrace                                                            2
sendmail                                                          4
Xsun                                                            356
So, the chief protagonist is Xsun (my Xserver). So, what code in Xsun is at the root of calling sigprocmask(2)? Let's move on to:
How are we generating the sigprocmask(2) calls?
# cat Xsun.d
#!/usr/sbin/dtrace -qs
syscall::sigprocmask:entry
/ execname == "Xsun" /
{
         @a[ustack()] = count();
}
Along with a probe we can specify a predicate. As mentioned previously a predicate is merely a way of controlling the firing of a probe based upon given conditions. Here we say that we only want to execute the actions associated with the syscall::sigprocmask:entry probe if the executable that is firing it is 'Xsun'. Again we use an aggregation to keep count of the number of times a given stack trace in the userland Xsun process leads to a call to sigprocmask(2):
# ./Xsun.d
             libc.so.1`__sigprocmask+0x20
             libc.so.1`sigprocmask+0x2a
             ddxSUNWkbd.so.1`sunKbdBlockHandler+0x54
             Xsun`BlockHandler+0x42
             Xsun`WaitForSomething+0x5e8
             Xsun`Dispatch+0x77
             Xsun`main+0x4f9
             808eab3
              36
             libc.so.1`__sigprocmask+0x20
             libc.so.1`sigprocmask+0x2a
             ddxSUNWmouse.so.1`sunMouseWakeupHandler+0x77
             Xsun`NextWakeupHandler+0x2c
             ddxSUNWkbd.so.1`sunKbdWakeupHandler+0x2c
             Xsun`WakeupHandler+0x6f
             Xsun`WaitForSomething+0x783
             Xsun`Dispatch+0x77
             Xsun`main+0x4f9
             808eab3
              36
For the sake of brevity I've had to strip out some stack traces. What we can see, though, is that the X server's event loop is generating the sigprocmask(2) calls. Nothing earth-shattering in itself, but remarkable in the simplicity and speed with which that information was derived.
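The first two steps above can also be folded into a single script that stops by itself. The sketch below is not from the original article: it keys one aggregation on both the process name and the system call name, and it uses a tick-5sec probe from the profile provider to end the run after five seconds. The aggregation name calls is arbitrary.

```d
#!/usr/sbin/dtrace -qs

/* Count system calls by process name and call name in a single pass. */
syscall:::entry
{
         @calls[execname, probefunc] = count();
}

/* Stop after five seconds; the aggregation is printed when dtrace exits. */
profile:::tick-5sec
{
         exit(0);
}
```

Keying on two values gives one output line per (process, system call) pair, which answers "what" and "who" together at the cost of a longer listing.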
A couple of other quick and trivial (yet interesting) examples are:
How big are my write(2)'s?
It's sometimes useful to gain an understanding of the size of writes that an application issues. This becomes easy by using a type of aggregating function known as a quantization. Using this we can quickly generate a power of two frequency distribution of a given piece of data, i.e. the size of write(2) calls that Netscape Navigator(TM) is making in this case:
#!/usr/sbin/dtrace -qs
syscall::write:entry
/ execname == ".netscape.bin" /
{
  @a["Netscape Write Distribution"] = quantize(arg2);
}
The third argument to write(2) is the size of the data being written. (Remember that arguments are counted starting at zero, so arg2 represents the third argument). We get a nice picture from this that will look very familiar to lockstat users:
# ./netscape.d
Netscape Write Distribution                       
          value  ------------- Distribution ------------- count   
              0 |                                         0        
              1 |@                                        180      
              2 |                                         0        
              4 |                                         8        
              8 |                                         0        
             16 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@             2795     
             32 |@                                        185      
             64 |                                         60      
            128 |                                         84      
            256 |@                                        176      
            512 |@                                        103      
           1024 |@                                        97      
           2048 |@                                        136      
           4096 |                                         38      
           8192 |                                         6        
          16384 |                                         0        
Here we can see that a high proportion of Netscape Navigator's write(2) calls are 16 bytes in size. No wonder busy SunRay servers need to do a lot of IOPS!
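If it is not yet known which application to look at, the predicate can be dropped and the aggregation keyed on the process name instead. This is a sketch of that variation rather than anything from the original article, and the aggregation name sizes is arbitrary.

```d
#!/usr/sbin/dtrace -qs

/* One power-of-two histogram of write(2) sizes per process name. */
syscall::write:entry
{
         @sizes[execname] = quantize(arg2);
}
```

Stopping the script with Ctrl-C prints a separate distribution for every process that called write(2) while it was running.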
What is being written to /dev/null?
Being a nosey kind of person I always wondered about this. This was the first thing I ever wrote in D and it shows the power and simplicity of DTrace. All we do is probe on the driver routine that gets entered when we write(2) to the /dev/null device and take apart the data passed in on the fly:
# cat devnull.d
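#!/usr/sbin/dtrace -Cqs
/* The header name was lost in the original; <sys/mkdev.h> is assumed here because it defines L_MAXMIN. */
#include <sys/mkdev.h>
/* Predicate as in the original: device minor number 2 and args[2] == 1, i.e. a write to /dev/null. */
mmrw:entry
/ (args[0] & L_MAXMIN) == 2 && args[2] == 1 /
{
  /* args[1] is the uio structure describing the I/O; take its first iovec. */
  iov = args[1]->uio_iov;
  /* Copy the user buffer into DTrace's scratch space and print it as a string. */
  printf("%s", stringof(copyin((uintptr_t)iov->iov_base, iov->iov_len)));
}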
It's amazing what gets thrown down to /dev/null (just see how much ends up there when you look at some man pages ...). So many applications and scripts redirect their output there that errors we'd rather see are completely lost, until now that is. Just watch what goes there when Solaris boots! However, that's a story for another day.
The trivial examples given above are but a speck of dust on the tip of the iceberg of what DTrace can achieve. All that's needed is a basic understanding of DTrace, a rudimentary understanding of how applications and the kernel hang together, and your imagination.
For more information on DTrace, you can visit:
http://www.sun.com/bigadmin/content/dtrace/


This article comes from a ChinaUnix blog. To view the original, see: http://blog.chinaunix.net/u/18167/showart_115295.html