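I wrote an awk script to find duplicate fields. The goal is to find the fields that two files have in common, print them, and remove the duplicates. The relevant function is: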
find_same()
{
echo "start at `date | awk '{print $5}'`"
echo -n "now rebuilding input files..."
# De-duplicate the first field of each input file (count it, then keep only the key).
awk '{count[$1]++}END{for(number in count)print number","count[number] }' "$file1" | awk -F, '{print $1 > "find-final1.txt"}'
awk '{count[$1]++}END{for(number in count)print number","count[number] }' "$file2" | awk -F, '{print $1 > "find-final2.txt"}'
# Merge both de-duplicated lists into find-final2.txt.
cat find-final1.txt >> find-final2.txt
echo -ne "ok! \n analyze files..."
# A key counted more than once in the merged list was present in both files.
awk '{count[$1]++}END{for(number in count)print number","count[number] }' find-final2.txt | awk -F, '$2 > 1 {print $1 > "find-same.txt"}'
echo -ne "ok! \n output files..."
sort find-same.txt > "same_$file3"
echo -e "ok! \n output file is same_$file3"
# Remove the temporary files.
rm -f find-*.txt
echo "end at `date | awk '{print $5}'`"
read anything
......
}
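However, if the last line of file 1 also happens to appear in file 2, that line is missing from the output. The implementation itself should be fine, so why does the line fail to be output when the match is on the last line? I really can't figure it out.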
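One thing that may be worth checking, as a guess rather than a confirmed diagnosis: if the input files use Windows line endings (CRLF), every `$1` on a one-field line carries a trailing carriage return, except on a final line that has no line terminator at all. That last key would then compare as different from the same key elsewhere. The sketch below is a compact variant of the same idea that strips a trailing carriage return before comparing. The name `find_common` and the carriage-return handling are assumptions of this sketch and are not part of the function above.

```shell
# Hypothetical helper, not part of the original script.
# Prints, in sorted order, each first field that occurs in both files, once.
find_common() {
    awk '
        { sub(/\r$/, "") }                 # drop a trailing carriage return, if present
        NF == 0 { next }                   # ignore blank lines
        NR == FNR { seen[$1] = 1; next }   # first file: remember its keys
        ($1 in seen) && !printed[$1]++ { print $1 }   # second file: print shared keys once
    ' "$1" "$2" | sort
}
```

It could be called as `find_common "$file1" "$file2" > "same_$file3"`. To see whether the inputs really contain carriage returns or lack a final newline, `tail -c 20 "$file1" | od -c` shows the last bytes of the file.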
[ Last edited by galford433 on 2007-12-5 17:07 ]