- 论坛徽章:
- 0
|
The common log file format
The Common Logfile Format is used by numerous HTTP servers. This format consists of the following seven fields:
remotehost rfc931 authuser [date] "method URL" status bytes
It is parsable by a variety of tools. The common format contains different information than the native log file format. The HTTP version is logged, which is not logged in native log file format.
The native log file format
The native format is different for different major versions of Squid. For Squid-1.0 it is:
time elapsed remotehost code/status/peerstatus bytes method URL
For Squid-1.1, the information from the hierarchy.log was moved into access.log. The format is:
time elapsed remotehost code/status bytes method URL rfc931 peerstatus/peerhost type
For Squid-2 the columns stay the same, though the content within may change a little.
The native log file format logs more and different information than the common log file format: the request duration, some timeout information, the next upstream server address, and the content type.
There exist tools, which convert one file format into the other. Please mind that even though the log formats share most information, both formats contain information which is not part of the other format, and thus this part of the information is lost when converting. Especially converting back and forth is not possible without loss.
squid2common.pl is a conversion utility, which converts any of the squid log file formats into the old CERN proxy style output. There exist tools to analyse, evaluate and graph results from that format.
access.log native format in detail
It is recommended though to use Squid's native log format due to its greater amount of information made available for later analysis. The print format line for native access.log entries looks like this:
"%9d.%03d %6d %s %s/%03d %d %s %s %s %s%s/%s %s"
Therefore, an access.log entry usually consists of (at least) 10 columns separated by one ore more spaces:
time
A Unix timestamp as UTC seconds with a millisecond resolution. You can convert Unix timestamps into something more human readable using this short perl script:
#! /usr/bin/perl -p
s/^d+.d+/localtime $&/e;
duration
The elapsed time considers how many milliseconds the transaction busied the cache. It differs in interpretation between TCP and UDP:
For HTTP/1.0, this is basically the time between accept() and close().
For persistent connections, this ought to be the time between scheduling the reply and finishing sending it.
For ICP, this is the time between scheduling a reply and actually sending it.
Please note that the entries are logged after the reply finished being sent, not during the lifetime of the transaction.
client address
The IP address of the requesting instance, the client IP address. The client_netmask configuration option can distort the clients for data protection reasons, but it makes analysis more difficult. Often it is better to use one of the log file anonymizers.
Also, the log_fqdn configuration option may log the fully qualified domain name of the client instead of the dotted quad. The use of that option is discouraged due to its performance impact.
result codes
This column is made up of two entries separated by a slash. This column encodes the transaction result:
The cache result of the request contains information on the kind of request, how it was satisfied, or in what way it failed. Please refer to section Squid result codes for valid symbolic result codes.
Several codes from older versions are no longer available, were renamed, or split. Especially the ERR_ codes do not seem to appear in the log file any more. Also refer to section Squid result codes for details on the codes no longer available in Squid-2.
The NOVM versions and Squid-2 also rely on the Unix buffer cache, thus you will see less TCP_MEM_HITs than with a Squid-1. Basically, the NOVM feature relies on read() to obtain an object, but due to the kernel buffer cache, no disk activity is needed. Only small objects (below 8KByte) are kept in Squid's part of main memory.
The status part contains the HTTP result codes with some Squid specific extensions. Squid uses a subset of the RFC defined error codes for HTTP. Refer to section status codes for details of the status codes recognized by a Squid-2.
bytes
The size is the amount of data delivered to the client. Mind that this does not constitute the net object size, as headers are also counted. Also, failed requests may deliver an error page, the size of which is also logged here.
request method
The request method to obtain an object. Please refer to section request-methods for available methods. If you turned off log_icp_queries in your configuration, you will not see (and thus unable to analyse) ICP exchanges. The PURGE method is only available, if you have an ACL for ``method purge'' enabled in your configuration file.
URL
This column contains the URL requested. Please note that the log file may contain whitespaces for the URI. The default configuration for uri_whitespace denies whitespaces, though.
rfc931
The eigth column may contain the ident lookups for the requesting client. Since ident lookups have performance impact, the default configuration turns ident_loookups off. If turned off, or no ident information is available, a ``-'' will be logged.
hierarchy code
The hierarchy information consists of three items:
Any hierarchy tag may be prefixed with TIMEOUT_, if the timeout occurs waiting for all ICP replies to return from the neighbours. The timeout is either dynamic, if the icp_query_timeout was not set, or the time configured there has run up.
A code that explains how the request was handled, e.g. by forwarding it to a peer, or going straight to the source. Refer to section hier-codes for details on hierarchy codes and removed hierarchy codes.
The IP address or hostname where the request (if a miss) was forwarded. For requests sent to origin servers, this is the origin server's IP address. For requests sent to a neighbor cache, this is the neighbor's hostname. NOTE: older versions of Squid would put the origin server hostname here.
type
The content type of the object as seen in the HTTP reply header. Please note that ICP exchanges usually don't have any content type, and thus are logged ``-''. Also, some weird replies have content types ``:'' or even empty ones.
There may be two more columns in the access.log, if the (debug) option log_mime_headers is enabled In this case, the HTTP request headers are logged between a ``['' and a ``]'', and the HTTP reply headers are also logged between ``['' and ``]''. All control characters like CR and LF are URL-escaped, but spaces are not escaped! Parsers should watch out for this.
----
需要转换时间格式的朋友,请把红色字部分的代码,复制到一个perl脚本中去,并给出x权限,就可以简单的这样使用了:
#cat access.log | timeconv.pl | more |
|