#1 | Posted 2005-11-18 23:04

1131886859.359 891 66.249.72.113 TCP_MISS/302 505 GET http://www.xxxx.com/cgi-bin/china/news/news/elec200504131120.html - DIRECT/210.145.118.133 text/html

Is it the first field? I can't make sense of 1131886859.359.

linuxsky | #2 | Posted 2005-11-18 23:21

I don't know which command turns 1131886859.359 into a time I can actually read.

tidezcy | #3 | Posted 2005-11-19 09:06

Use sarg. Surely you are not reading the log files by hand every day? That would be exhausting.

dingjeff | #4 | Posted 2005-11-19 15:52

The first field is the time, recorded as a Unix timestamp (seconds since the epoch, in UTC).
The Squid FAQ provides a Perl script to convert it.

段誉 | Moderator | #5 | Posted 2005-11-19 21:44

The common log file format
The Common Logfile Format is used by numerous HTTP servers. This format consists of the following seven fields:

   remotehost rfc931 authuser [date] "method URL" status bytes

It is parsable by a variety of tools. The common format contains different information than the native log file format. The HTTP version is logged, which is not logged in native log file format.

The native log file format
The native format is different for different major versions of Squid. For Squid-1.0 it is:

      time elapsed remotehost code/status/peerstatus bytes method URL

For Squid-1.1, the information from the hierarchy.log was moved into access.log. The format is:

      time elapsed remotehost code/status bytes method URL rfc931 peerstatus/peerhost type

For Squid-2 the columns stay the same, though the content within may change a little.

The native log file format logs more and different information than the common log file format: the request duration, some timeout information, the next upstream server address, and the content type.

Tools exist that convert one file format into the other. Note that although the two formats share most information, each contains information the other lacks, and that part is lost on conversion. In particular, converting back and forth is not possible without loss.

squid2common.pl is a conversion utility, which converts any of the squid log file formats into the old CERN proxy style output. There exist tools to analyse, evaluate and graph results from that format.
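To make the loss on conversion concrete, here is a minimal Python sketch (not squid2common.pl itself) that maps one Squid-1.1/Squid-2 native line onto the seven common-format fields listed above. The function name native_to_common is hypothetical, the date layout is the conventional CLF one (the text above only says [date]), and the URL is assumed to contain no whitespace.

```python
import time

def native_to_common(line: str) -> str:
    """Map one native access.log line to the seven common-format fields."""
    fields = line.split()
    stamp, _elapsed, remotehost, code_status, size, method, url, rfc931 = fields[:8]
    # Only the HTTP status survives; the cache result code (e.g. TCP_MISS) is dropped.
    status = code_status.split("/")[1]
    # Assumed CLF date layout, always rendered in UTC here.
    date = time.strftime("%d/%b/%Y:%H:%M:%S +0000", time.gmtime(float(stamp)))
    # The native line carries no authuser, so "-" is written in its place.
    return f'{remotehost} {rfc931} - [{date}] "{method} {url}" {status} {size}'

SAMPLE = ("1131886859.359 891 66.249.72.113 TCP_MISS/302 505 GET "
          "http://www.xxxx.com/cgi-bin/china/news/news/elec200504131120.html "
          "- DIRECT/210.145.118.133 text/html")
print(native_to_common(SAMPLE))
```

The elapsed time, cache result, peer information and content type have no slot in the output, which is why the result cannot be converted back.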

access.log native format in detail
Using Squid's native log format is nevertheless recommended, because it makes more information available for later analysis. The print format line for native access.log entries looks like this:

"%9d.%03d %6d %s %s/%03d %d %s %s %s %s%s/%s %s"

Therefore, an access.log entry usually consists of (at least) 10 columns separated by one or more spaces:

time
A Unix timestamp as UTC seconds with a millisecond resolution. You can convert Unix timestamps into something more human readable using this short perl script:

      #! /usr/bin/perl -p
      # Replace the leading "seconds.milliseconds" stamp with local time.
      s/^\d+\.\d+/localtime $&/e;
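For readers without Perl at hand, the same conversion can be sketched in Python. This is an illustration rather than part of the FAQ: the function name convert_timestamp is hypothetical, and it prints UTC instead of local time so that the result is the same on every machine.

```python
from datetime import datetime, timezone

def convert_timestamp(line: str) -> str:
    """Replace the leading Unix timestamp of a native access.log line
    with a readable UTC date; other lines are returned unchanged."""
    stamp, sep, rest = line.partition(" ")
    try:
        seconds = float(stamp)
    except ValueError:
        return line  # no numeric timestamp at the start of the line
    readable = datetime.fromtimestamp(seconds, tz=timezone.utc)
    # The millisecond part is dropped from the printed form.
    return readable.strftime("%Y-%m-%d %H:%M:%S") + sep + rest

print(convert_timestamp("1131886859.359 891 66.249.72.113 TCP_MISS/302 505"))
# -> 2005-11-13 13:00:59 891 66.249.72.113 TCP_MISS/302 505
```

So the value asked about at the top of the thread, 1131886859.359, corresponds to 13 November 2005, 13:00:59 UTC.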

duration
The elapsed time is the number of milliseconds the transaction kept the cache busy. Its interpretation differs between TCP and UDP:

For HTTP/1.0, this is basically the time between accept() and close().
For persistent connections, this ought to be the time between scheduling the reply and finishing sending it.
For ICP, this is the time between scheduling a reply and actually sending it.

Please note that the entries are logged after the reply finished being sent, not during the lifetime of the transaction.

client address
The IP address of the requesting instance, that is, the client IP address. The client_netmask configuration option can mask client addresses for data protection reasons, but this makes analysis more difficult. Often it is better to use one of the log file anonymizers.

Also, the log_fqdn configuration option may log the fully qualified domain name of the client instead of the dotted quad. The use of that option is discouraged due to its performance impact.

result codes

This column is made up of two entries separated by a slash. This column encodes the transaction result:

The cache result of the request contains information on the kind of request, how it was satisfied, or in what way it failed. Please refer to section Squid result codes for valid symbolic result codes.
Several codes from older versions are no longer available, were renamed, or split. Especially the ERR_ codes do not seem to appear in the log file any more. Also refer to section Squid result codes for details on the codes no longer available in Squid-2.

The NOVM versions and Squid-2 also rely on the Unix buffer cache, so you will see fewer TCP_MEM_HITs than with Squid-1. Basically, the NOVM feature relies on read() to obtain an object, but thanks to the kernel buffer cache no disk activity is needed. Only small objects (below 8 KByte) are kept in Squid's part of main memory.

The status part contains the HTTP result codes with some Squid specific extensions. Squid uses a subset of the RFC defined error codes for HTTP. Refer to section status codes for details of the status codes recognized by a Squid-2.

bytes
The size is the amount of data delivered to the client. Mind that this does not constitute the net object size, as headers are also counted. Also, failed requests may deliver an error page, the size of which is also logged here.

request method
The request method used to obtain an object. Please refer to section request-methods for available methods. If you turned off log_icp_queries in your configuration, you will not see (and thus will be unable to analyse) ICP exchanges. The PURGE method is only available if you have an ACL for ``method purge'' enabled in your configuration file.

URL
This column contains the URL requested. Please note that URLs in the log file may contain whitespace. The default configuration for uri_whitespace denies whitespace, though.

rfc931
The eighth column may contain the ident lookups for the requesting client. Since ident lookups have a performance impact, the default configuration turns ident_lookups off. If turned off, or if no ident information is available, a ``-'' will be logged.

hierarchy code
The hierarchy information consists of three items:

Any hierarchy tag may be prefixed with TIMEOUT_ if a timeout occurs while waiting for all ICP replies to return from the neighbours. The timeout is either dynamic, if icp_query_timeout was not set, or the time configured there.
A code that explains how the request was handled, e.g. by forwarding it to a peer, or going straight to the source. Refer to section hier-codes for details on hierarchy codes and removed hierarchy codes.
The IP address or hostname where the request (if a miss) was forwarded. For requests sent to origin servers, this is the origin server's IP address. For requests sent to a neighbor cache, this is the neighbor's hostname. NOTE: older versions of Squid would put the origin server hostname here.
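The three items can be pulled apart with a few lines of Python. This is a sketch under assumptions: parse_hierarchy is a hypothetical helper, and the second input below is a made-up value used only to show the TIMEOUT_ prefix.

```python
def parse_hierarchy(field: str) -> dict:
    """Split a peerstatus/peerhost column into its three items."""
    code, _, host = field.partition("/")
    # A TIMEOUT_ prefix marks a timeout while waiting for ICP replies.
    timed_out = code.startswith("TIMEOUT_")
    if timed_out:
        code = code[len("TIMEOUT_"):]
    return {"timed_out": timed_out, "code": code, "host": host}

print(parse_hierarchy("DIRECT/210.145.118.133"))
# -> {'timed_out': False, 'code': 'DIRECT', 'host': '210.145.118.133'}
print(parse_hierarchy("TIMEOUT_DIRECT/210.145.118.133"))
# -> {'timed_out': True, 'code': 'DIRECT', 'host': '210.145.118.133'}
```

In the log line from the first post, DIRECT/210.145.118.133 therefore means the request went straight to the origin server at that address, with no timeout.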

type
The content type of the object as seen in the HTTP reply header. Please note that ICP exchanges usually don't have any content type, and thus are logged ``-''. Also, some weird replies have content types ``:'' or even empty ones.

There may be two more columns in the access.log if the (debug) option log_mime_headers is enabled. In this case, the HTTP request headers are logged between a ``['' and a ``]'', and the HTTP reply headers are also logged between ``['' and ``]''. All control characters like CR and LF are URL-escaped, but spaces are not escaped! Parsers should watch out for this.
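Putting the column descriptions together, a native line can be split into named fields roughly as follows. This is a sketch, not Squid's own parser: AccessEntry and parse_native are hypothetical names, and it assumes log_mime_headers is off, so that the last three columns can be counted from the right and any whitespace inside the URL stays in the URL field.

```python
from typing import NamedTuple

class AccessEntry(NamedTuple):
    timestamp: float   # Unix time in seconds, millisecond resolution
    elapsed_ms: int
    client: str
    cache_result: str  # e.g. TCP_MISS
    http_status: int
    size: int          # bytes delivered to the client, headers included
    method: str
    url: str
    ident: str         # rfc931 column, "-" if unavailable
    hierarchy: str     # peerstatus/peerhost
    content_type: str

def parse_native(line: str) -> AccessEntry:
    parts = line.split()
    if len(parts) < 10:
        raise ValueError("expected at least 10 columns")
    # Six fixed columns on the left, three on the right, URL in between.
    head, url_parts, tail = parts[:6], parts[6:-3], parts[-3:]
    cache_result, _, status = head[3].partition("/")
    return AccessEntry(float(head[0]), int(head[1]), head[2], cache_result,
                       int(status), int(head[4]), head[5],
                       " ".join(url_parts), tail[0], tail[1], tail[2])

SAMPLE = ("1131886859.359 891 66.249.72.113 TCP_MISS/302 505 GET "
          "http://www.xxxx.com/cgi-bin/china/news/news/elec200504131120.html "
          "- DIRECT/210.145.118.133 text/html")
entry = parse_native(SAMPLE)
print(entry.cache_result, entry.http_status, entry.elapsed_ms)
# -> TCP_MISS 302 891
```

Runs of whitespace inside a URL are collapsed to single spaces by this approach, which is acceptable for counting and filtering but not for exact reproduction of the request.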

----

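If you need to convert the time format, copy the two-line Perl script quoted under "time" above (highlighted in red in the original post) into a Perl script file, make it executable, and then use it like this (assuming it is saved as timeconv.pl in the current directory):
# cat access.log | ./timeconv.pl | more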