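Analyzing your logs can give you valuable information about how your organization uses email. Parsing logs can be a complex process, and the quality of the results depends heavily on the format of the log and on how much work is involved in pulling the relevant information out of the file. For example, if you look at the logs produced by the Postfix mail transfer agent (MTA), you can see that the information for each message is spread across several lines (see Listing 1).
Listing 1. Extracting information from a raw email log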
Nov 17 03:17:34 narcissus postfix/smtpd[14281]: connect from localhost[127.0.0.1]
Nov 17 03:17:34 narcissus postfix/smtpd[14281]: 4F4CB1109404: client=localhost[127.0.0.1]
Nov 17 03:17:34 narcissus postfix/cleanup[14278]: 4F4CB1109404: message-id=
Nov 17 03:17:34 narcissus postfix/qmgr[104]: 4F4CB1109404: from=, size=7632, nrcpt=1 (queue active)
Nov 17 03:17:34 narcissus postfix/smtpd[14281]: disconnect from localhost[127.0.0.1]
Nov 17 03:17:34 narcissus postfix/smtp[14279]: DBA5B11093FD: to=, relay=127.0.0.1[127.0.0.1], delay=11, status=sent (250 2.6.0 Ok, id=08640-07, from MTA([127.0.0.1]:10025): 250 Ok: queued as 4F4CB1109404)
Nov 17 03:17:34 narcissus postfix/qmgr[104]: DBA5B11093FD: removed
Nov 17 03:17:34 narcissus postfix/pipe[14283]: 4F4CB1109404: to=, relay=cyrus, delay=0, status=sent (gendarme.example.com)
Nov 17 03:17:34 narcissus postfix/qmgr[104]: 4F4CB1109404: removed
Nov 17 03:20:07 narcissus postfix/smtpd[14355]: connect from narcissus.example.com[192.168.0.110]
Nov 17 03:20:07 narcissus postfix/smtpd[14355]: disconnect from narcissus.example.com[192.168.0.110]
Nov 17 03:20:07 narcissus postfix/smtpd[14355]: connect from narcissus.example.com[192.168.0.110]
Nov 17 03:20:07 narcissus postfix/smtpd[14355]: disconnect from narcissus.example.com[192.168.0.110]
Nov 17 03:23:16 narcissus postfix/smtpd[14410]: connect from f048226119.adsl.alicedsl.de[78.48.226.119]
Nov 17 03:23:17 narcissus postfix/smtpd[14410]: 6CAAE1109461: client=f048226119.adsl.alicedsl.de[78.48.226.119]
Nov 17 03:23:17 narcissus postfix/cleanup[14411]: 6CAAE1109461: message-id=
Nov 17 03:23:17 narcissus postfix/qmgr[104]: 6CAAE1109461: from=, size=2051, nrcpt=1 (queue active)
Nov 17 03:23:18 narcissus postfix/smtpd[14410]: disconnect from f048226119.adsl.alicedsl.de[78.48.226.119]
Nov 17 03:23:30 narcissus postfix/smtpd[14414]: connect from localhost[127.0.0.1]
Nov 17 03:23:30 narcissus postfix/smtpd[14414]: 62E941109473: client=localhost[127.0.0.1]
Nov 17 03:23:30 narcissus postfix/cleanup[14411]: 62E941109473: message-id=
Nov 17 03:23:30 narcissus postfix/qmgr[104]: 62E941109473: from=, size=3220, nrcpt=1 (queue active)
Nov 17 03:23:30 narcissus postfix/smtpd[14414]: disconnect from localhost[127.0.0.1]
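Every entry in Listing 1 shares the same syslog layout: a timestamp, the host name, the program and its process ID, and then the Postfix message itself. A minimal Perl sketch for splitting one line into those parts might look like this; it assumes the traditional syslog timestamp format shown above, and the function name `parse_syslog_line` is purely illustrative.

```perl
use strict;
use warnings;

# Split one Postfix syslog line into timestamp, host, program,
# process ID and message text. Assumes the classic syslog layout
# "Mon DD HH:MM:SS host program[pid]: message".
sub parse_syslog_line {
    my ($line) = @_;
    return unless $line =~ m{
        ^(\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})  # timestamp, no year
        \s+(\S+)                                # host name
        \s+([\w/.-]+)\[(\d+)\]:                 # program[pid]:
        \s+(.*)$                                # rest of the line
    }x;
    return {
        timestamp => $1,
        host      => $2,
        program   => $3,
        pid       => $4,
        message   => $5,
    };
}

# Example: one line from Listing 1
my $rec = parse_syslog_line(
    'Nov 17 03:17:34 narcissus postfix/qmgr[104]: 4F4CB1109404: removed');
print join(' | ', @{$rec}{qw(timestamp host program pid message)}), "\n";
```

The `program` field is useful on its own: it tells you which Postfix component (smtpd, cleanup, qmgr, pipe) wrote the entry.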
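Fortunately, each email is given a unique reference number, so you can trace how every message was handled by the system. For example, Listing 2 shows the entries for the email with reference number 4F4CB1109404; you can use this number to gather the related information and decide what to extract.
Listing 2. Extracting the information for a single mail delivery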
Nov 17 03:17:34 narcissus postfix/smtpd[14281]: 4F4CB1109404: client=localhost[127.0.0.1]
Nov 17 03:17:34 narcissus postfix/cleanup[14278]: 4F4CB1109404: message-id=
Nov 17 03:17:34 narcissus postfix/qmgr[104]: 4F4CB1109404: from=, size=7632, nrcpt=1 (queue active)
Nov 17 03:17:34 narcissus postfix/pipe[14283]: 4F4CB1109404: to=, relay=cyrus, delay=0, status=sent (gendarme.example.com)
Nov 17 03:17:34 narcissus postfix/qmgr[104]: 4F4CB1109404: removed
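One way to gather the scattered entries for each message is to key every log line on its queue ID. The following sketch assumes short, upper-case hexadecimal queue IDs like those in the sample log; the 10-to-12-character length range is an assumption, so check the IDs your own Postfix installation writes. `group_by_queue_id` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Collect the log lines that belong to each message, keyed on the
# Postfix queue ID. Assumes short, upper-case hexadecimal queue IDs
# of 10 to 12 characters, as in the sample log.
sub group_by_queue_id {
    my (@lines) = @_;
    my %by_id;
    for my $line (@lines) {
        # connect/disconnect lines carry no queue ID, so skip them
        next unless $line =~ /:\s([0-9A-F]{10,12}):\s/;
        push @{ $by_id{$1} }, $line;
    }
    return \%by_id;
}

# Example: interleaved lines for two different messages
my @log = (
    'Nov 17 03:17:34 narcissus postfix/smtpd[14281]: disconnect from localhost[127.0.0.1]',
    'Nov 17 03:17:34 narcissus postfix/qmgr[104]: DBA5B11093FD: removed',
    'Nov 17 03:17:34 narcissus postfix/pipe[14283]: 4F4CB1109404: to=, relay=cyrus, delay=0, status=sent (gendarme.example.com)',
    'Nov 17 03:17:34 narcissus postfix/qmgr[104]: 4F4CB1109404: removed',
);
my $groups = group_by_queue_id(@log);
for my $id (sort keys %$groups) {
    printf "%s: %d line(s)\n", $id, scalar @{ $groups->{$id} };
}
```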
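From the extracted entries, you can see the different kinds of information the log provides, such as:
Date and time
Sender
Recipient
Message size
Message count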
By parsing the file, you can build meaningful statistics for combinations of these elements and get a clearer picture of how email is being used.
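As a sketch of combining these elements, the following function totals message counts and sizes for any combination of fields. It assumes the records have already been extracted into hashes with `date`, `from`, `to` and `size` keys; those key names, the helper `summarise` and the sample addresses are illustrative only (the sizes are taken from the sample log).

```perl
use strict;
use warnings;

# Total message counts and sizes for any combination of fields.
# Each record is assumed to be a hash reference with date, from,
# to and size keys, already extracted from the log.
sub summarise {
    my ($records, @fields) = @_;
    my %stats;
    for my $r (@$records) {
        # Build a grouping key such as "Nov 17 | alice@example.com"
        my $key = join ' | ', map { $r->{$_} } @fields;
        $stats{$key}{count}++;
        $stats{$key}{bytes} += $r->{size};
    }
    return \%stats;
}

# Hypothetical records; the sizes come from the sample log
my @records = (
    { date => 'Nov 17', from => 'alice@example.com', to => 'bob@example.com',  size => 7632 },
    { date => 'Nov 17', from => 'carol@example.net', to => 'bob@example.com',  size => 2051 },
    { date => 'Nov 17', from => 'alice@example.com', to => 'dave@example.com', size => 3220 },
);

# Summarise per day, then per day and sender
for my $fields (['date'], ['date', 'from']) {
    my $stats = summarise(\@records, @$fields);
    for my $key (sort keys %$stats) {
        printf "%-35s %5d %9d\n", $key, $stats->{$key}{count}, $stats->{$key}{bytes};
    }
}
```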
Parsing the log file for the relevant information
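To parse the content, you need to identify each individual email. This is possible because the MTA assigns every email a unique ID and includes it in the log output. For example, in the following line: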
Nov 17 03:17:34 narcissus postfix/pipe[14283]: 4F4CB1109404: to=, relay=cyrus, delay=0, status=sent (gendarme.example.com)
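the unique ID is the hexadecimal value 4F4CB1109404, and the same line also gives you the recipient's address. The sender of this email appears on another line that carries the same embedded ID: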
Nov 17 03:17:34 narcissus postfix/qmgr[104]: 4F4CB1109404: from=, size=7632, nrcpt=1 (queue active)
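These lines are not necessarily consecutive, because the MTA may be handling several emails at once and writes to the log as each stage of processing completes.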
The line above also shows the total size of the email (7632 bytes) and the number of recipients (one).
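The fields described above can be pulled out with a few regular expressions. In raw Postfix logs the addresses appear inside angle brackets, for example `from=<user@example.com>`, and an empty sender is logged as `from=<>`. The sketch below assumes that form; the address in the example and the helper name `extract_fields` are hypothetical.

```perl
use strict;
use warnings;

# Extract the queue ID, sender, size, recipient count and recipient
# from a Postfix log line. Postfix logs addresses in angle brackets,
# e.g. from=<user@example.com>; an empty sender appears as from=<>.
sub extract_fields {
    my ($line) = @_;
    my %f;
    $f{id} = $1 if $line =~ /:\s([0-9A-F]{10,12}):\s/;
    if ($line =~ /\bfrom=<([^>]*)>,\s*size=(\d+),\s*nrcpt=(\d+)/) {
        @f{qw(from size nrcpt)} = ($1, $2, $3);
    }
    # \b stops this from matching the orig_to= field
    $f{to} = $1 if $line =~ /\bto=<([^>]*)>/;
    return \%f;
}

# Example with a hypothetical sender address
my $f = extract_fields('Nov 17 03:17:34 narcissus postfix/qmgr[104]: '
    . '4F4CB1109404: from=<user@example.com>, size=7632, nrcpt=1 (queue active)');
printf "%s from %s: %d bytes, %d recipient(s)\n",
    $f->{id}, $f->{from}, $f->{size}, $f->{nrcpt};
```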
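Listing 3 shows a Perl script that collates this information and then prints summary statistics of the number of emails and their total size.
Listing 3. Parsing the log to obtain some useful statistics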
#!/usr/bin/perl
#
# Script to extract email statistics from log files.
# Time::ParseDate is used to parse the time into an epoch
# value, and DateTime is then used to reformat the date.
use strict;
use warnings;
use Time::ParseDate;
use DateTime;
# Parse the first file on the command line
open(MAIL, '<', $ARGV[0]) or die "Couldn't open $ARGV[0]: $!\n";
# Create a structure to hold the stats
my $mails = {};
# Parse each line of the file
while (<MAIL>)
{
chomp;
# Look for the 12-digit hex mail ID, and skip lines that
# don't carry one (connect/disconnect lines, for example)
next unless m/: ([0-9A-F]{12}):/;
my $mailid = $1;
# Extract the date and parse it into an Epoch value
if (m/^(\S+\s+\d+\s+\d{2}:\d{2}:\d{2})/)
{
$mails->{$mailid}->{date} = parsedate($1);
}
# Extract the sender address and email size
# (Postfix writes addresses as from=<address>)
if (m/$mailid: from=<(.*?)>, size=(\d+),/)
{
$mails->{$mailid}->{from} = $1;
$mails->{$mailid}->{size} = $2;
}
# Extract the recipient
if (m/$mailid: to=<(.*?)>/)
{
$mails->{$mailid}->{to} = $1;
}
}
close(MAIL);
# Compile together the stats by parsing the formatted
# information into another summary structure
my $mailstats = {};
foreach my $mailid (keys %{$mails})
{
# Don't create a summary entry if we don't have enough information
# (sender/recipient is empty)
if (!defined($mails->{$mailid}->{to}) ||
!defined($mails->{$mailid}->{from}) ||
$mails->{$mailid}->{to} !~ m/[a-z]/ ||
$mails->{$mailid}->{from} !~ m/[a-z]/)
{
next;
}
# Count the number of emails to each recipient
$mailstats->{$mails->{$mailid}->{to}}->{count}++;
# Sum up the email size to each recipient
$mailstats->{$mails->{$mailid}->{to}}->{size} +=
$mails->{$mailid}->{size};
# Count the number of emails from each sender
$mailstats->{$mails->{$mailid}->{from}}->{count}++;
# Sum up the email size from each sender
$mailstats->{$mails->{$mailid}->{from}}->{size} +=
$mails->{$mailid}->{size};
# Sum up the same information, but organized on a date by date basis
if (defined($mails->{$mailid}->{date}))
{
my $dt = DateTime->from_epoch(
epoch => $mails->{$mailid}->{date})->ymd('');
my $mailto = $mails->{$mailid}->{to};
my $mailfrom = $mails->{$mailid}->{from};
$mailstats->{$mailto}->{_date}->{$dt}->{count}++;
$mailstats->{$mailto}->{_date}->{$dt}->{size} +=
$mails->{$mailid}->{size};
$mailstats->{$mailfrom}->{_date}->{$dt}->{count}++;
$mailstats->{$mailfrom}->{_date}->{$dt}->{size} +=
$mails->{$mailid}->{size};
}
}
# Dump out the mail counts and mail sizes
# on a per-address basis
foreach my $address (sort keys %{$mailstats})
{
# Only show information for local email addresses
if ($address =~ m/\@.*example\.com$/)
{
printf("%-40s %5d %9d\n",
$address,
$mailstats->{$address}->{count},
$mailstats->{$address}->{size});
}
}
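Listing 3 relies on Time::ParseDate to turn the syslog timestamp into an epoch value. Syslog timestamps such as `Nov 17 03:17:34` carry no year, so the year has to come from somewhere else. If you would rather state it explicitly, a sketch using only the core Time::Local and POSIX modules could look like this; passing the year as a parameter is an assumption of this sketch, the helper names are hypothetical, and 2008 in the example is an arbitrary value.

```perl
use strict;
use warnings;
use Time::Local qw(timelocal);
use POSIX qw(strftime);

# Month abbreviations used in syslog timestamps, mapped to 0-11
my %MONTH;
@MONTH{qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)} = (0 .. 11);

# Convert a syslog timestamp ("Nov 17 03:17:34") to an epoch value.
# Syslog omits the year, so the caller must supply it.
sub syslog_to_epoch {
    my ($stamp, $year) = @_;
    my ($mon, $day, $h, $m, $s) =
        $stamp =~ /^(\w{3})\s+(\d{1,2})\s+(\d{2}):(\d{2}):(\d{2})/
        or return;
    return unless exists $MONTH{$mon};
    # timelocal() treats the time as local time, as syslog writes it
    return timelocal($s, $m, $h, $day, $MONTH{$mon}, $year);
}

# Reformat an epoch value as YYYYMMDD, the same layout as ymd('')
sub epoch_to_ymd {
    my ($epoch) = @_;
    return strftime('%Y%m%d', localtime($epoch));
}

my $epoch = syslog_to_epoch('Nov 17 03:17:34', 2008);  # year is arbitrary
print epoch_to_ymd($epoch), "\n";                      # prints 20081117
```

Supplying the year yourself matters mostly when you process logs that span a year boundary, where guessing the year from the current date can put December entries in the wrong year.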
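Running the script against the sample log file produces statistics showing the number and total size of the messages for each address in the specified domain (see Listing 4). What is printed is only a fraction of what the script collects: it also compiles per-day totals (the _date entries), which you could print to show the daily mail volume and get a better idea of the load on your mail server.
Listing 4. The generated statistics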
#! /usr/bin/perl
# Mail filter to file mail on a date basis
use Mail::IMAPClient;
use Date::Parse;
use Data::Dumper;
use strict;
use warnings;
# The IMAP Server
my $Server = 'imap.example.com';
# The Mailbox we want to filter
my $INBOX = "Sent-Mail";
# Open the server connection
my $IMAP = Mail::IMAPClient->new(Server => $Server,
User => 'user',
Password => 'password',)
or die "Couldn't connect to $Server: $@\n";
# Open the mailbox we want to filter
$IMAP->select($INBOX) or die "Couldn't select $INBOX";
# We want to filter every message, so obtain a list of every
# message by the message ID
my @msgids = $IMAP->search("ALL");
# Don't do anything if there's nothing to process
exit(0) if (scalar @msgids == 0);
# Now parse the message headers to determine
# the From, To, CC, Subject and Date of each message
my $parsed = $IMAP->parse_headers(
$IMAP->Range(@msgids),
"From",
"To",
"CC",
"Subject",
"Date",
);
# Set up some message counters
my $toprocess = scalar @msgids;
my $processed = 0;
my $counter = 0;
# Process each message
foreach my $msgid (keys %{$parsed})
{
$processed++;
# Extract the date, and build a new folder path
# The new path will split up emails first by
# year and then by month, all as subfolders
# of the current folder
my ($ss,$mm,$hh,$day,$month,$year,$zone) =
strptime($parsed->{$msgid}->{Date}->[0]);
# Try another date if the first one couldn't be identified
if (!defined($year))
{
($ss,$mm,$hh,$day,$month,$year,$zone) =
strptime($parsed->{$msgid}->{Date}->[1]);
}
# default to 2004 if we can't find a year
if (!defined($year))
{
$year = 2004;
}
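# Make some assumptions about the year.
# strptime() may return a two-digit year or a year counted
# from 1900, so normalize it to four digits: two-digit years
# below 90 are taken to be 2000+, the rest 1990+
if ($year < 100)
{
$year += ($year < 90) ? 2000 : 1900;
}
elsif ($year < 1900)
{
$year += 1900;
}
# strptime() numbers months from 0
$month = 0 unless defined($month);
# Build the destination folder name, for example
# Sent-Mail/2004/11, using the server's hierarchy separator
my $sep = $IMAP->separator || '/';
my $destfolder = join($sep, $INBOX, sprintf('%04d', $year),
sprintf('%02d', $month + 1));
# Fetch the complete original message
my $Message = $IMAP->message_string($msgid);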
# Try to change to the destination folder,
# or create it if we couldn't select the folder
my $selectstat = $IMAP->select($destfolder);
unless ($selectstat)
{
$IMAP->create($destfolder);
}
# Go back to the Inbox so that we select the right message
# next time round
$IMAP->select($INBOX);
# Add the original message to the new folder
my $AppendStatus = $IMAP -> append_string($destfolder,$Message);
# When you add a message to a folder, the message
# is marked as unread, so mark all the messages
# in the folder as read by reading them
$IMAP->select($destfolder);
my @unseenMIDs = $IMAP->unseen();
foreach my $MID (@unseenMIDs)
{
$IMAP->message_string($MID);
}
# Go back to the original folder, and delete the message
# if it was successfully moved
$IMAP->select($INBOX);
if ($AppendStatus)
{
$IMAP -> delete_message($msgid);
}
$counter++;
}
# Print out a summary of what we achieved
printf("Processed %5d out of %5d msgs\n", $counter, $toprocess);
# Make sure we clean out the folder where we deleted messages
# and then disconnect
$IMAP->expunge();
$IMAP->disconnect();
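This script has the user's login name and password written into it, but you can easily adapt it to accept them on the command line, so that users can run the script themselves whenever they need to.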
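A minimal sketch of reading the connection details from the command line with the core Getopt::Long module follows. The option names `--server`, `--user`, `--password` and `--folder`, and the helper `read_options`, are illustrative choices rather than part of the original script; the defaults mirror the values hard-coded above.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Read the IMAP connection details from an argument list instead of
# hard-coding them. Defaults mirror the values in the filter script.
sub read_options {
    my (@args) = @_;
    my %opt = (server => 'imap.example.com', folder => 'Sent-Mail');
    GetOptionsFromArray(\@args, \%opt,
        'server=s', 'user=s', 'password=s', 'folder=s')
        or die "usage: $0 --user NAME --password PASS [--server HOST] [--folder NAME]\n";
    die "--user and --password are required\n"
        unless defined $opt{user} && defined $opt{password};
    return \%opt;
}

# In the real script this would be read_options(@ARGV);
# a fixed argument list is used here for illustration.
my $opt = read_options('--user', 'alice', '--password', 'secret');
print "Connecting to $opt->{server} as $opt->{user}, filtering $opt->{folder}\n";
```

In the filter script you would then pass `$opt->{server}`, `$opt->{user}` and `$opt->{password}` to `Mail::IMAPClient->new` and use `$opt->{folder}` in place of `$INBOX`. Keep in mind that a password given on the command line may be visible to other users through the process list.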
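Running this script against a folder such as the Sent Mail folder organizes its messages sensibly, as you can see when you look at the folder again (see Listing 8).
Listing 8. The filtered mailbox has a more logical structure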