Tracing Parallel Execution with _px_trace. Part I [ID 444164.1]

Posted by TOMSYAN on 2011-12-23 03:09

<DIV>
<TABLE cellSpacing=6 cellPadding=0 width="100%" border=0>
<TBODY>
<TR>
<TD vAlign=top noWrap width="50%" colSpan=3>Modified 10-OCT-2011     Type BULLETIN     Status PUBLISHED</TD>
<TD vAlign=top> </TD></TR></TBODY></TABLE>

***Checked for relevance on 29-April-2010*** ***Checked for relevance on 10-Oct-2011*** Tracing Parallel Execution with _px_trace. Part I 
Introduction
Since 9.0.1 a new mechanism has been introduced to obtain Parallel Execution (PX) trace information. This trace mechanism is keyword and parameter based. The existing event-based tracing is still available, but for some areas in _px_trace like compilation there is no event. Because each component of PX controls its own tracing, getting an event-based trace usually means setting one or more events and then deciding at what level they should be set; this can be cumbersome and there may not be any consistency of trace format across the different events. Keyword-based tracing is simpler to set than event-based tracing and is consistent across PX components.
This note gives an overview of some parts of the new trace facility and can also be used to help diagnose PX problems. This note is written for people with some background and experience in PX.
This first part gives some examples of how to trace the compilation of a PX query. This helps to answer the question "Why does my query (not) run in parallel?". It also gives some examples of how to trace the allocation of PX servers (slaves) and some idea of why they might not be allocated.
There are four sections to this note :
<OL>
<LI><a href="https://support.oracle.com/CSP/main/article?cmd=show&type=NOT&id=444164.1#Important PX phrases" target="_blank">Important PX phrases </A>- explains the main terms used in this document.
<LI><a href="https://support.oracle.com/CSP/main/article?cmd=show&type=NOT&id=444164.1#Usage" target="_blank">Usage </A>- gives some examples with annotated trace illustrating various scenarios :
<UL>
<LI><a href="https://support.oracle.com/CSP/main/article?cmd=show&type=NOT&id=444164.1#Example1" target="_blank">Example#1 </A>select statement with one slave set, degree of parallelism from the dictionary.
<LI><a href="https://support.oracle.com/CSP/main/article?cmd=show&type=NOT&id=444164.1#Example2" target="_blank">Example#2 </A>select statement with two slave sets, degree of parallelism from the dictionary.
<LI><a href="https://support.oracle.com/CSP/main/article?cmd=show&type=NOT&id=444164.1#Example#3" target="_blank">Example#3 </A>requested Degree Of Parallelism (DOP) rounded down dynamically due to CPU load.
<LI><a href="https://support.oracle.com/CSP/main/article?cmd=show&type=NOT&id=444164.1#Example #4" target="_blank">Example#4 </A>select statement, too few slaves available to run query in parallel.
<LI><a href="https://support.oracle.com/CSP/main/article?cmd=show&type=NOT&id=444164.1#Example#5" target="_blank">Example#5 </A>Degree Of Parallelism taken from a hint.
<LI><a href="https://support.oracle.com/CSP/main/article?cmd=show&type=NOT&id=444164.1#Example#6" target="_blank">Example#6 </A>Join of two tables each with different DOP setting, which DOP do we choose ?
<LI><a href="https://support.oracle.com/CSP/main/article?cmd=show&type=NOT&id=444164.1#Example#7" target="_blank">Example#7 </A>Parallel execution in RAC. </LI></UL>
<LI><a href="https://support.oracle.com/CSP/main/article?cmd=show&type=NOT&id=444164.1#Current" target="_blank">Current SQL cursor</A> - how to find the SQL statement for which the trace information was written.
<LI><a href="https://support.oracle.com/CSP/main/article?cmd=show&type=NOT&id=444164.1#Reference" target="_blank">Reference</A> - gives a full list of the options. </LI></OL>
 <A name="Important PX phrases"></A>Important PX phrases
<OL>
<LI>QC: The query coordinator. This is the user session's process that coordinates the slaves and does all the work that runs serially.
<LI>Instance: All the active processes and the memory of the database that belong to one Oracle SID. A RAC system consists of more than one instance.
<LI>set/threads: Parallel query uses a divide and conquer strategy. If, for example, a large table needs to be scanned, multiple processes are used, each scanning a part of the table. The processes that work on a single step form a set, each process in the set is called a thread, and the number of threads is the DOP. For a complex query Oracle uses not just 1 set of processes but 2 or more. This is necessary to avoid skewed data.
<LI>DOP: The number of threads in 1 set.
<LI>slave: A synonym for a thread (a single process in a set).
<LI>height: The number of threads in a set (a synonym for DOP).
<LI>server/Svrs: A synonym for thread or slave. </LI></OL>
 <A name=Usage></A>Usage
All of these examples were generated with the following trace setting:
SQL> alter session set "_px_trace"="compilation","execution","messaging";
 Trace files for the QC are created in user_dump_dest, and those for the slaves in background_dump_dest. On some platforms the slave trace files are also written to user_dump_dest. On RAC systems, trace files are written on every node where a slave or the QC is started. The QC trace files are the interesting ones here.
<A name=Example1></A>Example #1
This example gives an overview of the most important functions in the area of compilation and allocation of parallel execution servers. Here's a simple query which will use one slave set with two slaves:
SQL> alter session set "_px_trace"="compilation","execution","messaging";
Create a large table for the next examples with:
SQL> CREATE TABLE bigemp AS SELECT * FROM all_objects;
SQL> alter table bigemp parallel 2;
SQL> select count(*) from bigemp;
------------ start of trace --------------
--- cut some lines---
kkfdapdml
      pgadep:2 pdml mode:0 PQ allowed DML allowed not autonomous => not allowed
      ---> These are checks whether PDML is enabled. These lines can be ignored.
--- cut some lines---
The table : 1166 ---> This is the table's object id from all_objects.
kkfdPaPrm ---> This shows the function name from the source code.
      DOP = 2 (computed from hint/dictionary)
      use dictionary DOP on table ---> Use the Degree Of Parallelism from the dictionary
kkfdtsc
      DOP = 2 (from kkfdPaPrm())
kkfdPiAlo
      DOP = 2 (kkfdpideg)
      Invalid Part. No. ---> Not a partitioned table
      No Pruning Info ---> Ditto
      Pararllel Operation Type = 2 (kkfdpiOpType)
      Flags = 41 (kkfdpiflg)
      min. tranx free list = 0 (kkfdpimgi)
kkfdPaPrm
      DOP = 1 (computed from hint/dictionary)
      use dictionary DOP on table
kkfdtsc
      serial - DOP = 1 (from kkfdPaPrm()) or 10383 level 1 not set
kxfrSysInfo
      DOP trace -- compute default DOP from system info
      # instance alive = 1 (kxfrsnins)
kxfrDefaultDOP
      DOP Trace -- compute default DOP
      # CPU = 1
      Threads/CPU = 2 ("parallel_threads_per_cpu")
      default DOP = 2 (# CPU * Threads/CPU)
      default DOP = 2 (DOP * # instance)
kxfrSysInfo
      system default DOP = 2 (from kxfrDefaultDOP()) ---> This is the computed system default DOP.
kxfralo
      DOP trace -- requested thread from best ref obj = 2 (from kxfrIsBestRef())
      ---> This is the DOP after all objects are checked. The DOP can still be adjusted in the next functions, because there are restrictions such as the adaptive load algorithm.
kxfralo
      threads requested = 2 (from kxfrComputeThread())
kxfralo
      adjusted no. threads = 2 (from kxfrAdjustDOP())
kxfralo
      about to allocate 2 slaves
kxfrAllocSlaves
      DOP trace -- call kxfpgsg to get 2 slaves
kxfpgsg
      num server requested = 2
kxfpiinfo
      inst
      1m ---> On this system we have: 1 instance
kxfpgsg
      getting 1 sets of 2 threads, client parallel query execution flg=0x30 ---> Get 1 slave set of 2 slaves
      Height=2, Affinity List Size=0, inst_total=1, coord=1
      Insts 1
      Threads 2
kxfpg1srv
      trying to get slave P000 on instance 1 ---> Let's go and grab the slaves (repeated for each slave). We also see the name of the slave process.
kxfpg1sg
      Got It. 1 so far.
kxfpg1srv
      trying to get slave P001 on instance 1
kxfpg1sg
      Got It. 2 so far.
kxfrAllocSlaves
      actual num slaves alloc'd = 2 (kxfpqcthr) ---> Then display the number of slaves allocated
kxfxcPutSession ---> This function is only visible in version 10g or higher. It shows the CBO parameters that are used to compile the PX query.
Send Compilation environment:
      optimizer_mode_hinted = false
      optimizer_features_hinted = 0.0.0
      parallel_execution_enabled = true
      parallel_query_forced_dop = 0
      parallel_dml_forced_dop = 0
      parallel_ddl_forced_degree = 0
      parallel_ddl_forced_instances = 0
      _query_rewrite_fudge = 90
      optimizer_features_enable = 10.2.0.3
......
------------end of trace --------------
For questions like "Why does my query run in parallel?", check the DOP setting in function kkfdPaPrm. Is there one with a DOP > 1? If so, this part of the trace also shows the object id from which the DOP setting was taken, and this id can be used to find the object name. You can then check whether there is a parallel setting in the data dictionary or a hint.
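The check described above can be partly automated. The following is a minimal sketch, not an Oracle tool: it assumes the kkfdPaPrm entries in the QC trace file have exactly the wording shown in this example, and the names `PAPRM`, `dop_decisions` and `SAMPLE` are made up for illustration.

```python
import re

# Matches one kkfdPaPrm decision: the DOP value and whether it came from a
# hint or from the dictionary. The wording is assumed to match the trace
# extract shown in this note.
PAPRM = re.compile(
    r"DOP = (\d+) \(computed from hint/dictionary\)\s+"
    r"use (hint|dictionary) DOP on table"
)

def dop_decisions(trace_text):
    """Return a list of (dop, source) tuples, one per kkfdPaPrm entry."""
    return [(int(dop), source) for dop, source in PAPRM.findall(trace_text)]

# Hypothetical input: a shortened copy of the trace lines from Example #1.
SAMPLE = """kkfdPaPrm
      DOP = 2 (computed from hint/dictionary)
      use dictionary DOP on table
kkfdtsc
      DOP = 2 (from kkfdPaPrm())
kkfdPaPrm
      DOP = 1 (computed from hint/dictionary)
      use dictionary DOP on table
"""

decisions = dop_decisions(SAMPLE)
# Entries with a DOP greater than 1 are the ones that can make the query parallel.
parallel_entries = [d for d in decisions if d[0] > 1]
print(decisions, parallel_entries)
```

Any entry with a DOP greater than 1 points at an object worth checking for a dictionary setting or a hint.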
<A name=Example2></A>Example #2
A slightly more complex query, which will use 2 slave sets of 2 slaves each and therefore require 4 slaves in total:
SQL> alter table bigemp parallel 2;
SQL> select ename, count(*) from bigemp group by ename;
To make it easier to read, only extracts from the trace file will be shown:
------------ start trace --------------
kkfdPaPrm
      DOP = 2 (computed from hint/dictionary)
      use dictionary DOP on table ---> From ALTER TABLE BIGEMP PARALLEL 2
--- cut some trace ---
kxfpiinfo
      inst
      1 ---> Instance 1
--- cut some trace ---
kxfpgsg
      getting 2 sets of 2 threads, client parallel query execution flg=0x30 ---> Try to get 2 slave sets, each of 2 slaves, i.e. 4 PQ slaves.
      Height=2, Affinity List Size=0, inst_total=1, coord=1
      Insts 1
      Threads 2
--- cut some trace ---
kxfpgsg
      got 2 threads on 1 instance(s), avg height=2
      Insts 1
      Svrs 4 ---> We got the 4 slaves.
----- end trace ---------
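The relation between sets, threads and total slaves in this example is simple arithmetic: sets multiplied by threads per set. A small sketch, assuming the kxfpgsg "getting N sets of M threads" wording shown in this note; the helper name `slaves_requested` is hypothetical.

```python
import re

# Wording assumed from the kxfpgsg lines shown in this note.
GETTING = re.compile(r"getting (\d+) sets of (\d+) threads")

def slaves_requested(line):
    """Return the total number of slaves a kxfpgsg request asks for, or None."""
    match = GETTING.search(line)
    if match is None:
        return None
    sets, threads = int(match.group(1)), int(match.group(2))
    # Each set has `threads` slaves (the DOP), so the total is the product.
    return sets * threads

line = "kxfpgsg       getting 2 sets of 2 threads, client parallel query execution flg=0x30"
print(slaves_requested(line))
```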
<A name="Example #3"></A>Example #3
SQL> alter table bigemp parallel 8;
SQL> select count(*) from bigemp;
In this example the <> was set to TRUE, which in turn forces <> to be TRUE. As a result, the requested degree of parallelism can be reduced dynamically based on the current machine load.
---- start ----
kxfrAllocSlaves
      DOP trace -- call kxfpgsg to get 8 slaves ---> Try for 8 slaves
kxfpAdaptDOP
      Requested=8 Granted=4 Target=4 Load=1 Default=2 users=0 sets=1 ---> Adaptive load algorithm rounds down to 4
kxfpgsg
      load adapt num servers requested to = 4 (from kxfpAdaptDOP())
---- End ----
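To spot such reductions in a long trace file, the kxfpAdaptDOP line can be split into its key=value pairs. This is only a sketch under the assumption that the line keeps the format shown above; `parse_adapt_dop` and `was_reduced` are illustrative names, not part of any Oracle utility.

```python
import re

def parse_adapt_dop(line):
    """Turn the key=value pairs of a kxfpAdaptDOP line into a dict of ints."""
    return {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)", line)}

def was_reduced(fields):
    """True when the adaptive algorithm granted fewer slaves than requested."""
    return fields["Granted"] < fields["Requested"]

line = "kxfpAdaptDOP       Requested=8 Granted=4 Target=4 Load=1 Default=2 users=0 sets=1"
fields = parse_adapt_dop(line)
print(fields["Requested"], fields["Granted"], was_reduced(fields))
```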
<A name=Example#4></A>Example #4
For this example:
- parallel_max_servers = 10
- parallel_min_percent = 0 (we can still parallelize even if we can't grab 100% of the requested slaves)
- Query requires 12 slaves
SQL> alter table bigemp parallel 12;
SQL> select count(*) from bigemp;
------ start --------
kxfpgsg
      getting 1 sets of 12 threads, client parallel query execution flg=0x30
      Height=12, Affinity List Size=0, inst_total=1, coord=1
      Insts 1
      Threads 12 ---> We try to get 12 slaves.
--- cut some lines---
kxfpgsg
      got 10 threads on 1 instance(s), avg height=10
      Insts 1
      Svrs 10 ---> We actually allocate 10 slaves.
----- end -------
If parallel_min_percent had been set to 100, all requested slaves would have to be allocated for the query to run. Because only 10 of the 12 are available, the user would see:
- ORA-12827: insufficient parallel query slaves available
and in the trace file:
kxfrAllocSlaves
      actual num slaves alloc'd = 10 (kxfpqcthr)
      DOP trace -- num slaves alloc'd < min required(min=%d) ---> We didn't find 100% of the requested slaves, return error.
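The parallel_min_percent rule from this example can be written down as a small check. This is a sketch of the rule as described in this note, not Oracle's implementation; in particular, how Oracle rounds the minimum internally is not shown in the trace, so the comparison below uses integer arithmetic and avoids rounding altogether. The function name `allocation_ok` is hypothetical.

```python
def allocation_ok(requested, allocated, parallel_min_percent):
    """Return True if enough slaves were allocated for the query to run.

    The query may run when the allocated slaves are at least
    parallel_min_percent percent of the requested slaves. Otherwise the
    user would see ORA-12827.
    """
    # allocated / requested >= percent / 100, rewritten without division.
    return allocated * 100 >= requested * parallel_min_percent

# Values from Example #4: 12 slaves requested, 10 allocated.
print(allocation_ok(12, 10, 0))    # parallel_min_percent = 0
print(allocation_ok(12, 10, 100))  # parallel_min_percent = 100
```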
<A name=Example#5></A>Example#5
In this example the trace illustrates how the degree of parallelism was derived from a query hint:
SQL> select /*+PARALLEL(BIGEMP,3) */ count(*) from bigemp;
------ start ------
kkfdPaPrm
      DOP = 3 (computed from hint/dictionary)
      use hint DOP on table ---> We used a hint to derive DOP
---- end -----
<A name=Example#6></A>Example#6
In this example two tables are joined, and each has a different DOP. Which DOP do we use?
SQL>alter table bigemp parallel 4;
SQL>alter table bigdept parallel 2;
SQL>select dname, avg(sal) from bigemp e, bigdept d where e.deptno=d.deptno group by dname;
---- start ----
kkfdPaPrm
      DOP = 4 (computed from hint/dictionary) ---> DOP for BIGEMP
      use dictionary DOP on table
--- cut some lines---
kkfdPaPrm
      DOP = 2 (computed from hint/dictionary) ---> DOP for BIGDEPT
      use dictionary DOP on table
--- cut some lines---
kxfralo
      DOP trace -- requested thread from best ref obj = 4 (from kxfrIsBestRef()) ---> Use BIGEMP's DOP
---- end ----
<A name=Example#7></A>Example#7
The following query was run on a 2 instance RAC database
(note: Parallel_automatic_tuning = TRUE).
SQL>alter table bigemp parallel 4;
SQL>select count(*) from bigemp;
---- start ----
kxfpiinfo
      inst
      1 2 ---> Instance# [ CPU_COUNT : PARALLEL_MAX_SERVERS ]
kxfpclinfo
      inst(load:user:pct:fact)aff
      1(4 :0 :100:50 ) 2(0:0:100:200)
      ---> Load information for each instance, so for instance #1 :
      ---> load=4
      ---> 0 PX users
      ---> User has no resource restriction, can use 100% of the system
      ---> Load factor represents how much more load the machine can accept; the bigger the number, the better.
kxfpAdaptDOP
      Requested=4 Granted=4 Target=16 Load=4 Default=8 users=0 sets=1
      ---> Target of 16 is the estimate of the best load for the cluster.
      ---> DOP of queries may be reduced to maintain this target as best we can.
kxfpsori
      Sorted: 2(0:200) 1(4:50)
      ---> Sort instances by load and factor.
      ---> Put "unit" more slaves on the instance that can best afford it.
      ---> More slaves will go to instance 2 as it has the lower load.
kxfpunit
      var=4 limit=8 use_aff=0 aff_num=0 unit=2 ---> Unit is a function of CPU_COUNT on the QC node.
kxfpgsg
      getting 1 sets of 4 threads, client parallel query execution flg=0x30 ---> Try for 1 slave set of 4 slaves
      Height=0, Affinity List Size=0, inst_total=2, coord=1 ---> We have two instances, QC is on instance #1.
      Insts 2 1 ---> Instances are 2 and 1
      Threads 4 0 ---> We will try to get 4 slaves on instance 2. The numbers are the # of slaves for the instances in the previous line, in the order they are printed.
kxfpg1srv
      trying to get slave P000 on instance 2 ---> Get slaves from instance #2
kxfpg1sg
      Got It. 1 so far.
kxfpg1srv
      trying to get slave P001 on instance 2
kxfpg1sg
      Got It. 2 so far.
--- cut ----
      got 4 threads on 1 instance(s), avg height=4 ---> We managed to get 4 slaves on instance #2. "Threads" here means the slaves in 1 slave set.
      Insts 2 1
      Svrs 4 0
---- end ----
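The per-instance load line printed by kxfpclinfo can be parsed and ordered in the same way as the kxfpsori output. This is a sketch only: the field order (load:user:pct:fact) is taken from the trace header above, while the sort key (lowest load first, then highest factor) is an assumption that merely reproduces the order seen in this one example. The names `parse_clinfo` and `sort_by_capacity` are hypothetical.

```python
import re

# One instance entry, e.g. "1(4 :0 :100:50 )". Spaces around values are tolerated.
INSTANCE = re.compile(
    r"(\d+)\(\s*(\d+)\s*:\s*(\d+)\s*:\s*(\d+)\s*:\s*(\d+)\s*\)"
)

def parse_clinfo(text):
    """Return one dict per instance found in a kxfpclinfo load line."""
    return [
        {"inst": int(i), "load": int(l), "users": int(u), "pct": int(p), "fact": int(f)}
        for i, l, u, p, f in INSTANCE.findall(text)
    ]

def sort_by_capacity(instances):
    """Assumed ordering: lowest load first, ties broken by highest factor."""
    return sorted(instances, key=lambda rec: (rec["load"], -rec["fact"]))

line = "inst(load:user:pct:fact)aff       1(4 :0 :100:50 ) 2(0:0:100:200)"
ordered = sort_by_capacity(parse_clinfo(line))
print([rec["inst"] for rec in ordered])
```

For the values in this example the result puts instance 2 ahead of instance 1, which agrees with the "Sorted:" line in the trace.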
<A name=Current></A>Current SQL cursor
How to find the SQL sent to the slaves:
In 9i this is not so easy, because every slave set gets its own SQL command, which is a part of the original query. For example:
kxfxcp1
      parsing at slave set 1:
       SELECT /*+ CIV_GB */ A1.C0,COUNT(SYS_OP_CSR(A1.C1,0)) FROM :Q2753000 A1 GROUP BY A1.C0
      TQ: 1 nbgra: 0 flg: 0x1
kxfxcp1
      parsing at slave set 2:
       SELECT /*+ PIV_GB */ A1.C0 C0,SYS_OP_MSR(COUNT(*)) C1 FROM (SELECT /*+ NO_EXPAND ROWID(A2) */ A2."DEPTNO" C0 FROM "SCOTT"."EMP" PX_GRANULE(0, BLOCK_RANGE, DYNAMIC) A2) A1 GROUP BY A1.C0
      TQ: 0 nbgra: 1 flg: 0x2002
In 10g you have to search for the function kxfxcp1.
kxfxcp1
      Sending parse to slave set 1:
       User sqllen sent from QC = 74
       ; select /*+ parallel(t,5) */ owner# , count(*) from test t group by owner#
In 10g all slaves in all sets work on the same SQL command, so every kxfxcp1 entry shows the same SQL command. It is no longer possible to see which part of the SQL the current set is working on.
<A name=Reference></A>Reference
This section describes the syntax of _px_trace. In practice only the subset documented above will be required.
There are up to three components that can be specified when starting the trace:
- Area in which tracing is required.
- Verbosity.
- Do we want timing information.
alter session set "_px_trace"=[area],[area],..,;
For example:
alter session set "_px_trace"=high,execution,medium,execution,time;
The possible values for each of these are:
Verbosity:
<UL>
<LI>high
<LI>medium
<LI>low </LI></UL>Area:
<UL>
<LI>scheduling - ( equivalent to some of event 10384 and some of 10390)
<LI>execution - (equivalent to some of event 10390)
<LI>granule - (equivalent to some of event 10390 and some of 10391)
<LI>messaging - (equivalent to event 10392 and event 10393)
<LI>buffer - (equivalent to event 10399)
<LI>compilation - ( no equivalent event)
<LI>all - all of the above
<LI>none - none of the above. </LI></UL>Timing
<UL>
<LI>time </LI></UL>Here are some examples :
In the init.ora file:
<BLOCKQUOTE>_px_trace="high","execution","medium","buffer","time"</BLOCKQUOTE>Or, more usually, at the session level:
<BLOCKQUOTE>SQL> alter session set "_px_trace"="high","execution","medium","buffer","time";</BLOCKQUOTE>To get the full trace:
<BLOCKQUOTE>SQL>alter session set "_px_trace"="all";</BLOCKQUOTE>Note : Put the verbosity level followed by the areas to be traced at that level of verbosity.
To switch off the trace :
<BLOCKQUOTE>SQL> alter session set "_px_trace"="none"; </BLOCKQUOTE></DIV>
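When the trace setting is generated by a script, it can help to validate the keywords before sending the statement. The sketch below only builds the statement text from the keywords listed in this Reference section; it does not connect to a database, and the function name `px_trace_statement` is made up for illustration.

```python
# Keywords copied from the Reference section of this note.
VERBOSITY = {"high", "medium", "low"}
AREAS = {
    "scheduling", "execution", "granule", "messaging",
    "buffer", "compilation", "all", "none",
}
TIMING = {"time"}

def px_trace_statement(*keywords):
    """Build an ALTER SESSION statement for _px_trace from known keywords.

    Keywords are emitted in the order given, so put each verbosity level
    before the areas it applies to, as the note above says.
    """
    known = VERBOSITY | AREAS | TIMING
    for keyword in keywords:
        if keyword not in known:
            raise ValueError(f"unknown _px_trace keyword: {keyword}")
    quoted = ",".join(f'"{keyword}"' for keyword in keywords)
    return f'alter session set "_px_trace"={quoted};'

print(px_trace_statement("compilation", "execution", "messaging"))
print(px_trace_statement("none"))
```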
