TOMSYAN posted on 2011-12-23 02:10

Dead Connection Detection (DCD) Explained [ID 151972.1]

<P>--------------------------------------------------------------------------------<BR>Modified 30-SEP-2011&nbsp;&nbsp;&nbsp; Type BULLETIN&nbsp;&nbsp;&nbsp; Status PUBLISHED</P>
<P>Checked for relevance on 30-SEP-2011</P>
<P>DEAD CONNECTION DETECTION<BR>=========================</P>
<P>OVERVIEW <BR>-------- <BR>&nbsp;<BR>Dead Connection Detection (DCD) is a feature of SQL*Net 2.1 and later, including<BR>Oracle Net8 and Oracle NET. DCD detects when a partner in a SQL*Net V2 client/server<BR>or server/server connection has terminated unexpectedly, and flags the dead session<BR>so PMON can release the resources associated with it.<BR>&nbsp;<BR>DCD is intended primarily for environments in which clients power down their <BR>systems without disconnecting from their Oracle sessions, a problem<BR>characteristic of networks with PC clients.</P>
<P>DCD is initiated on the server when a connection is established. At this <BR>time SQL*Net reads the SQL*Net parameter files and sets a timer to generate an <BR>alarm.&nbsp; The timer interval is set by providing a non-zero value in minutes for <BR>the SQLNET.EXPIRE_TIME parameter in the sqlnet.ora file. The Database and Listener<BR>need to be restarted after any DCD changes.</P>
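<P>As a sketch of how the interval is picked up, the hypothetical read_expire_time helper below extracts SQLNET.EXPIRE_TIME from the text of a server-side sqlnet.ora. It assumes simple one-line "NAME = value" entries, treats "#" as a comment marker, and lets a later entry override an earlier one; Oracle Net's own parameter parsing is more general than this.</P>

```python
import re
from typing import Optional

# Matches a one-line entry such as "SQLNET.EXPIRE_TIME = 10".
# Case-insensitive matching is an assumption of this sketch.
_EXPIRE_RE = re.compile(r"^\s*SQLNET\.EXPIRE_TIME\s*=\s*(\d+)\s*$", re.IGNORECASE)


def read_expire_time(sqlnet_ora_text: str) -> Optional[int]:
    """Return the DCD interval in minutes, or None if DCD is not enabled."""
    minutes = None
    for raw_line in sqlnet_ora_text.splitlines():
        line = raw_line.split("#", 1)[0]  # drop comments
        match = _EXPIRE_RE.match(line)
        if match:
            minutes = int(match.group(1))  # later entries win (assumption)
    # No entry, or an explicit 0, leaves the DCD timer disarmed.
    return minutes or None


sample = """
# server-side sqlnet.ora
SQLNET.EXPIRE_TIME = 10
"""
print(read_expire_time(sample))  # 10
```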
<P>When the timer expires, SQL*Net on the server sends a "probe" packet to the <BR>client. (In the case of a database link, the destination of the link<BR>constitutes the server side of the connection.)&nbsp; The probe is essentially an <BR>empty SQL*Net packet and does not represent any form of SQL*Net level data, <BR>but it creates data traffic on the underlying protocol. <BR>&nbsp;<BR>If the client end of the connection is still active, the probe is discarded <BR>and the timer mechanism is reset.&nbsp; If the client has terminated abnormally, <BR>the server will receive an error from the send call issued for the probe, and <BR>SQL*Net on the server will signal the operating system to release the <BR>connection's resources. <BR>&nbsp;<BR>On Unix servers, the sqlnet.ora file must be in either $TNS_ADMIN or <BR>$ORACLE_HOME/network/admin. Neither /etc nor /var/opt/oracle alone is valid. <BR>&nbsp;<BR>It should also be noted that in SQL*Net 2.1.x, an active orphan process <BR>(one processing a query, for example) will not be killed until the query <BR>completes. In SQL*Net 2.2, orphaned resources will be released regardless of <BR>activity.</P>
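<P>The probe decision described above can be sketched in Python. This is an illustration, not Oracle's code: the hypothetical send_probe helper writes the 10-byte empty packet that appears in the server trace dump later in this note and treats any error from the send call as a dead partner. A local socket pair stands in for the client/server connection; with a real TCP connection to a vanished host, the first send usually succeeds and the error only surfaces after the protocol-stack behavior discussed in the next section.</P>

```python
import socket

# Empty SQL*Net DATA packet, byte for byte as in this note's trace dump
# (length 0x000A = 10 bytes, packet type 6).
DCD_PROBE = bytes([0x00, 0x0A, 0x00, 0x00, 0x06, 0x00, 0x00, 0x00, 0x00, 0x00])


def send_probe(conn: socket.socket) -> bool:
    """Send one probe.

    True means "keep the session and re-arm the timer"; False means the
    send call failed and the session should be flagged as dead.
    """
    try:
        conn.sendall(DCD_PROBE)
    except OSError:  # e.g. BrokenPipeError or ConnectionResetError
        return False
    return True


server_end, client_end = socket.socketpair()
print(send_probe(server_end))  # True: the client end is still open
client_end.close()             # simulate the client going away
print(send_probe(server_end))  # False on Linux: the send fails with EPIPE
server_end.close()
```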
<P>This is a server feature only.&nbsp; The client may be running any supported <BR>SQL*Net V2 release.<BR>&nbsp;<BR>&nbsp;<BR>THE FUNCTION OF THE PROTOCOL STACK <BR>---------------------------------- <BR>&nbsp;<BR>While Dead Connection Detection is set at the SQL*Net level, it relies heavily<BR>on the underlying protocol stack for its successful execution. For example,<BR>you might set SQLNET.EXPIRE_TIME=1 in the sqlnet.ora file, but it is unlikely<BR>that an orphaned server process will be cleaned up immediately upon expiration<BR>of that interval. <BR>&nbsp;<BR>TCP/IP is a connection-oriented protocol, and as such it implements packet <BR>timeout and retransmission to guarantee reliable, in-order delivery of data <BR>packets. If a timely acknowledgement is not received in response to the probe <BR>packet, the TCP/IP stack will retransmit the packet some number of times before <BR>timing out. Only after TCP/IP gives up does SQL*Net receive notification that <BR>the probe failed.<BR>&nbsp;<BR>The time TCP/IP takes to time out depends on the TCP/IP stack, and timeouts of<BR>many minutes are entirely common.&nbsp; This has been an area of concern for many<BR>customers, because repeated retransmissions at the protocol layer can cause a<BR>significant lag between the expiration of the DCD interval and the time when<BR>the orphaned process is actually killed. <BR>&nbsp;<BR>The easiest way to determine whether the protocol stack is causing such a delay <BR>is to test different DCD intervals.</P>
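<P>To see why the lag can run to many minutes, the following sketch models a typical exponential-backoff retransmission scheme: the retransmission timeout (RTO) doubles after each unanswered attempt, is capped at a maximum, and the connection is dropped after a fixed number of retransmissions. This is a simplified model, not any vendor's exact algorithm, and tcp_give_up_seconds is a hypothetical name. The example values are Linux-style settings (a 200 ms starting RTO, a 120 s cap, and net.ipv4.tcp_retries2 = 15), chosen only for illustration.</P>

```python
def tcp_give_up_seconds(initial_rto: float, max_rto: float, retries: int) -> float:
    """Seconds from an unacknowledged send until TCP gives up (simplified model).

    The original transmission plus `retries` retransmissions each wait one RTO;
    the RTO doubles after every attempt but never exceeds `max_rto`.
    """
    total = 0.0
    rto = min(initial_rto, max_rto)
    for _ in range(retries + 1):
        total += rto
        rto = min(rto * 2, max_rto)
    return total


# Linux-style example values (assumption): 0.2 s start, 120 s cap, 15 retries.
stack_delay = tcp_give_up_seconds(0.2, 120.0, 15)
print(round(stack_delay, 1))  # 924.6 seconds, roughly 15.4 minutes
expire_time_minutes = 1       # SQLNET.EXPIRE_TIME = 1
# Rough time from the client vanishing to cleanup: DCD interval + stack delay.
print(round(expire_time_minutes + stack_delay / 60, 1))  # 16.4 minutes
```

<P>Under these assumptions the protocol stack, not the one-minute DCD interval, dominates the cleanup time, which is the effect the test in the next section is designed to expose.</P>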
<P><BR>TESTING THE PROTOCOL STACK <BR>--------------------------<BR>Set the SQLNET.EXPIRE_TIME parameter to 1 minute and note the time required to<BR>clean up an orphaned server process.&nbsp; Then set SQLNET.EXPIRE_TIME to 5 minutes<BR>and again observe the time required to clean up the shadow process. If the TCP/IP<BR>timeout is what delays the release of server resources, the cleanup time should<BR>increase by about 4 minutes (the difference between the two intervals), while the<BR>lag beyond the configured interval stays roughly the same. That constant lag is<BR>the time the protocol stack spends retransmitting before it gives up.</P>
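<P>The comparison can be written down as a small calculation. In this sketch, each measurement pairs an SQLNET.EXPIRE_TIME setting with the observed cleanup time, both in minutes; the hypothetical estimate_stack_delay helper subtracts the interval from each cleanup time and checks whether the remaining lag is roughly constant, which is what a fixed protocol-stack timeout would produce. The tolerance and the sample numbers are made up for illustration.</P>

```python
from typing import Dict, Tuple


def estimate_stack_delay(measurements: Dict[int, float],
                         tolerance: float = 0.5) -> Tuple[float, bool]:
    """Return (average lag beyond the DCD interval, whether the lag is constant).

    `measurements` maps SQLNET.EXPIRE_TIME (minutes) to the observed time
    (minutes) until the orphaned shadow process was cleaned up.
    """
    lags = [cleanup - expire for expire, cleanup in measurements.items()]
    mean_lag = sum(lags) / len(lags)
    constant = max(lags) - min(lags) <= tolerance
    return mean_lag, constant


# Illustrative numbers only: cleanup grew by 4 minutes when the interval did.
print(estimate_stack_delay({1: 10.5, 5: 14.5}))  # (9.5, True)
```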
<P>If the TCP/IP retransmission timeout is indeed the problem, the operating <BR>system kernel can be tuned to reduce the retransmission interval and the number <BR>of retransmissions (on many Unix platforms, the configuration parameters are <BR>defined in /usr/include/netinet/tcp_timer.h). <BR>&nbsp;<BR>Reducing the interval and the number of retransmissions may affect other system <BR>components: in effect you are shrinking the window allowed for connections to<BR>process data, which can cause connections to be dropped inadvertently during <BR>periods of heavy system load.&nbsp; Slower connections from remote sites may also<BR>be affected by this change.<BR>&nbsp;<BR>Kernel parameters that may affect retransmission include, but are not limited <BR>to, TCP_TTL, TCPTV_PERSMIN, TCPTV_MAX, and TCP_LINGERTIME. <BR>&nbsp;<BR>*** To avoid disrupting other system processes, it is important to contact the <BR>appropriate vendor for assistance in tuning the operating system kernel or <BR>protocol stack. *** <BR>&nbsp;<BR>&nbsp;<BR>MONITORING DEAD CONNECTION DETECTION <BR>------------------------------------ <BR>The best way to determine whether DCD is enabled and functioning properly is to <BR>generate a server trace and search the file for the DCD probe packet. To <BR>generate a server trace, set TRACE_LEVEL_SERVER=16 and <BR>TRACE_DIRECTORY_SERVER=&lt;path&gt; in sqlnet.ora on the server (make sure you edit<BR>the sqlnet.ora file the server actually reads).&nbsp; The resulting trace file will<BR>have a filename of svr_&lt;PID&gt;.trc and will be located in the specified directory. <BR>&nbsp;</P>
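<P>Once tracing is on, each shadow process writes its own svr_&lt;PID&gt;.trc file, so it helps to map PIDs to files before tailing the right one. The hypothetical find_server_traces helper below assumes exactly the file naming described above and takes the TRACE_DIRECTORY_SERVER path as its argument; if your release names trace files differently, adjust the pattern.</P>

```python
import re
from pathlib import Path
from typing import Dict

# Server trace files are assumed to be named svr_NNNN.trc, where NNNN is the
# operating-system PID of the shadow process.
_TRACE_NAME = re.compile(r"^svr_(\d+)\.trc$")


def find_server_traces(trace_dir: str) -> Dict[int, Path]:
    """Map shadow-process PID to its trace file in TRACE_DIRECTORY_SERVER."""
    traces: Dict[int, Path] = {}
    for path in Path(trace_dir).iterdir():
        match = _TRACE_NAME.match(path.name)
        if match and path.is_file():
            traces[int(match.group(1))] = path
    return traces
```

<P>Matching the returned PIDs against the shadow process of your test connection (for example, as reported by ps) identifies the trace to tail.</P>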
<P>Is DCD Enabled? <BR>--------------- <BR>For pre-Oracle8i versions, enable level 16 SQL*Net server tracing and search<BR>the resultant server trace file for an entry like the following: </P>
<P>&nbsp; osntns: Enabling dead connection detection (1 min) <BR>&nbsp;<BR>The timer interval listed should match the value of SQLNET.EXPIRE_TIME.</P>
<P>For Oracle8i onwards, you should see the following:</P>
<P>&nbsp; nstimini: entry <BR>&nbsp; nstimig: entry <BR>&nbsp; nstimig: normal exit <BR>&nbsp; nstimini: initializing NLTM in asynchronous mode <BR>&nbsp; nstimini: normal exit <BR>&nbsp; nstimstart: entry</P>
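<P>The two checks above amount to a text search, sketched here with a hypothetical dcd_status helper. It looks only for the trace lines quoted in this note: the pre-8i message, which carries the interval, and the 8i timer-initialization lines, which do not. Trace formats vary by release, so a negative result should be treated as inconclusive rather than as proof that DCD is off.</P>

```python
import re
from typing import Optional, Tuple

# Pre-8i message, e.g. "osntns: Enabling dead connection detection (1 min)".
_PRE_8I = re.compile(r"osntns: Enabling dead connection detection \((\d+) min\)")
# 8i-onwards lines showing the timer being initialized and started.
_TIMER_MARKERS = (
    "nstimini: initializing NLTM in asynchronous mode",
    "nstimstart: entry",
)


def dcd_status(trace_text: str) -> Tuple[bool, Optional[int]]:
    """Return (DCD evidence found, interval in minutes if the trace states it)."""
    match = _PRE_8I.search(trace_text)
    if match:
        return True, int(match.group(1))
    if all(marker in trace_text for marker in _TIMER_MARKERS):
        return True, None  # 8i-style trace: timer started, interval not printed
    return False, None


print(dcd_status("  osntns: Enabling dead connection detection (1 min)"))  # (True, 1)
```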
<P>&nbsp;<BR>Is DCD Working? <BR>---------------<BR>Search the server trace file for DCD probe packets. They will appear in the<BR>form of empty data packets, as follows: <BR>&nbsp;<BR>&nbsp; nstimexp: entry <BR>&nbsp; nstimexp: timer expired at 05-OCT-95 12:15:05 <BR>&nbsp; nsdo: entry <BR>&nbsp; nsdo: cid=0, opcode=67, *bl=0, *what=1, uflgs=0x2, cflgs=0x3 <BR>&nbsp; nsdo: nsctx: state=8, flg=0x621c, mvd=0 <BR>&nbsp; nsdo: gtn=93, gtc=93, ptn=10, ptc=2048 <BR>&nbsp; nsdoacts: entry <BR>&nbsp; nsdofls: entry <BR>&nbsp; nsdofls: DATA flags: 0x0 <BR>&nbsp; nsdofls: sending NSPTDA packet <BR>&nbsp; nspsend: entry <BR>&nbsp; nspsend: plen=10, type=6 <BR>&nbsp; nttwr: entry <BR>&nbsp; nttwr: socket 4 had bytes written=10 <BR>&nbsp; nttwr: exit <BR>&nbsp; nspsend: 10 bytes to transport <BR>&nbsp; nspsend:packet dump <BR>&nbsp; nspsend:00 0A 00 00 06 00 00 00&nbsp; |........| <BR>&nbsp; nspsend:00 00 00 00 00 00 00 00&nbsp; |........| <BR>&nbsp; nspsend: normal exit <BR>&nbsp; nsdofls: exit (0) <BR>&nbsp; nsdoacts: flushing transport <BR>&nbsp; nttctl: entry <BR>&nbsp; nsdoacts: normal exit <BR>&nbsp; nsdo: normal exit <BR>&nbsp; nstimexp: normal exit</P>
<P>The entry:</P>
<P>&nbsp; nspsend:00 0A 00 00 06 00 00 00&nbsp; |........| <BR>&nbsp; nspsend:00 00 00 00 00 00 00 00&nbsp; |........| <BR>&nbsp;<BR>represents the probe packet.&nbsp; Note that DCD packets are 10 bytes long when they<BR>are issued to the protocol stack; the dump prints 8 bytes per line, and the<BR>"plen=10" and "bytes written=10" entries confirm that only the first 10 bytes<BR>shown are sent. Once the protocol header and trailer bytes for the underlying<BR>protocols have been added, the packet could be approximately 70 bytes long.<BR>&nbsp;<BR>If DCD is enabled, you will see these probe packets written to the trace file<BR>when the timer expires.&nbsp; If the server is a UNIX system, it might be useful to<BR>establish a connection and tail the trace file: </P>
<P>&nbsp; tail -f svr_&lt;PID&gt;.trc </P>
<P>The time elapsed after each probe packet is written to the server trace should <BR>match the SQLNET.EXPIRE_TIME value.</P>
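<P>For traces that still contain the probe entries, comparing the spacing of probes with SQLNET.EXPIRE_TIME can be automated. The hypothetical probe_intervals_minutes helper below pulls the timestamps from the nstimexp lines, in the format shown in the sample trace (05-OCT-95 12:15:05), and returns the gaps between consecutive expiries. It assumes an English locale for month names, and the second trace line in the example is invented for illustration.</P>

```python
import re
from datetime import datetime
from typing import List

# Timestamp format as printed in the sample trace, e.g. "05-OCT-95 12:15:05".
_EXPIRY = re.compile(
    r"nstimexp: timer expired at (\d{2}-[A-Za-z]{3}-\d{2} \d{2}:\d{2}:\d{2})")


def probe_intervals_minutes(trace_text: str) -> List[float]:
    """Minutes between consecutive DCD timer expiries found in a trace."""
    stamps = [datetime.strptime(s, "%d-%b-%y %H:%M:%S")
              for s in _EXPIRY.findall(trace_text)]
    return [(later - earlier).total_seconds() / 60
            for earlier, later in zip(stamps, stamps[1:])]


trace = """
  nstimexp: timer expired at 05-OCT-95 12:15:05
  nstimexp: timer expired at 05-OCT-95 12:16:05
"""
print(probe_intervals_minutes(trace))  # [1.0], matching SQLNET.EXPIRE_TIME = 1
```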
<P>Note: from version 9.2.0.4.0 onwards, DCD probe packets are no longer traced in<BR>SQL*Net trace files; however, DCD packets can still be observed using other forms<BR>of tracing, such as network sniffer tracing.</P>
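<P>With a packet capture, the probe can be recognized by its size and contents. The sketch below checks a captured TCP payload against the 10 bytes shown in the trace dump earlier in this note. The field breakdown in the comments follows the commonly published TNS header layout (an 8-byte header followed by a 2-byte data-flags field for DATA packets); it is an assumption of this sketch rather than something this note documents, other releases may encode the header differently, and is_dcd_probe is a hypothetical name.</P>

```python
import struct

TNS_DATA = 6  # packet type 6: DATA (the "NSPTDA packet" in the trace)


def is_dcd_probe(payload: bytes) -> bool:
    """True if a captured TCP payload matches the empty DCD probe.

    Assumed layout (big-endian): bytes 0-1 packet length, 2-3 packet checksum,
    4 packet type, 5 reserved flags, 6-7 header checksum, 8-9 data flags.
    """
    if len(payload) != 10:
        return False
    length, _pkt_sum, ptype, _flags, _hdr_sum, data_flags = struct.unpack(
        ">HHBBHH", payload)
    return length == 10 and ptype == TNS_DATA and data_flags == 0


probe = bytes.fromhex("000a0000060000000000")  # bytes from the trace dump
print(is_dcd_probe(probe))  # True
```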
<P><BR>KNOWN PROBLEMS OR LIMITATIONS <BR>----------------------------- <BR>- Of the few reported problems, perhaps the most significant is DCD's poor <BR>performance on Windows NT.&nbsp; Dead connections are cleaned up only when the <BR>server is rebooted and the database is restarted.&nbsp; Exactly how well DCD works <BR>on NT depends on the client's protocol implementation. SQL*Net v2.3 has <BR>improved the performance over earlier releases. <BR>&nbsp;<BR>&nbsp; This has been logged as port-specific Bug#303578. </P>
<P>&nbsp;<BR>- On SCO Unix, a problem was reported in which server processes spin, consuming<BR>large amounts of CPU, once the DCD timer expires. The problem is due to improper<BR>signal handling and can be eliminated by disabling DCD.<BR>&nbsp;<BR>&nbsp; This is port-specific Bug#293264</P>
<P>- Orphaned resources are not released if only the client application is <BR>terminated. Only after the client PC has been rebooted does DCD release these <BR>resources. For example, if a Windows application is killed yet Windows remains<BR>running, the probe packet may be received and discarded as if the connection were<BR>still active.&nbsp; As it currently stands, it appears that DCD detects dead client<BR>machines, but not dead client processes.<BR>&nbsp;<BR>&nbsp; This is logged as generic Bug#280848. <BR>&nbsp;<BR>- The SQL*Net V2 implementation on MVS does not use the generic DCD mechanism,<BR>and therefore the SQLNET.EXPIRE_TIME parameter does not apply. The KEEPALIVE<BR>function of IBM's TCP/IP is used instead. This was implemented prior to the<BR>development of DCD. <BR>&nbsp;<BR>&nbsp; This is documented in port-specific Bug#301318. <BR>&nbsp;<BR>- DCD relies heavily on issuing probe packets during any phase of the connection.<BR>This is not possible with some half-duplex protocols. Hence, DCD is not enabled<BR>on protocols like APPC/LU6.2. <BR>&nbsp;<BR>&nbsp; This is not a bug but rather the intended design. <BR>&nbsp;<BR>- Local connections using BEQ protocol adapters are not supported with DCD.&nbsp; <BR>Local connections using the IPC protocol adapters are supported with DCD.</P>
<P>- On Windows NT, DCD fails after 16 connections.<BR>&nbsp;<BR>&nbsp; This is logged as Bug#1388806.</P>
<P>&nbsp;<BR>A FINAL NOTE...<BR>--------------<BR>On most operating systems (including more recent versions of Windows), if a <BR>process exits abnormally or is killed by an administrator, the OS still cleans <BR>up the resources associated with that process, including its network<BR>connections, and tells the server on the other end that the connection is<BR>closing. DCD remains useful when there are problems with the physical network<BR>(e.g. the Ethernet cable is unplugged from the machine) or when the OS kernel<BR>panics and crashes (e.g. a Windows blue screen) before it can close the network<BR>connections.&nbsp; DCD may also have a side benefit with certain load-balancing<BR>hardware that prematurely aborts connections it considers idle for too long:<BR>because DCD periodically sends a probe packet to the client, the connection<BR>does not appear idle.<BR>&nbsp;<BR>Under no circumstances should you rely 100% on Dead Connection Detection.&nbsp; <BR>It was developed to handle clients that have exited abnormally. Clients should<BR>always exit their applications gracefully, and it is the responsibility of the<BR>application developer to make this possible. DCD is intended only to clean up<BR>after abnormal events.<BR>&nbsp;<BR>DCD is much more resource-intensive than similar mechanisms at the protocol <BR>level, so depending on DCD to clean up all dead processes will put an undue <BR>load on the server. <BR>&nbsp;<BR>Clearly it is advantageous to exit applications cleanly in the first place.</P>
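<P>For comparison with the protocol-level mechanisms mentioned above (and with the KEEPALIVE approach used by the MVS port), the sketch below enables TCP keepalive on an ordinary Python socket. It illustrates the operating-system feature only; it is not how Oracle Net configures DCD, enable_keepalive is a hypothetical name, and the timing values are arbitrary examples. SO_KEEPALIVE is widely available, while the per-socket tuning options (TCP_KEEPIDLE, TCP_KEEPINTVL, TCP_KEEPCNT) are platform-specific, so the code sets them only when the socket module exposes them.</P>

```python
import socket


def enable_keepalive(sock: socket.socket, idle_s: int = 600,
                     interval_s: int = 60, probes: int = 5) -> None:
    """Ask the OS to probe an idle TCP connection (example values only)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Idle time before the first probe, gap between probes, and number of
    # unanswered probes before the connection is dropped (where supported).
    for name, value in (("TCP_KEEPIDLE", idle_s),
                        ("TCP_KEEPINTVL", interval_s),
                        ("TCP_KEEPCNT", probes)):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)


s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(s)
print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)  # True
s.close()
```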

digdeep126 posted on 2013-04-28 23:04

good article