@How do I remove duplicate lines?@
Suppose we have a file named file and want to remove the duplicate lines from it. What methods are available?
The contents of file are as follows:
my friends, xiaoying
my teacher, xiaoniu
my teacher, xiaoniu
my fuqin, father
my sister, wushiying
my sister, wushiying
my friends, xiaoying
my teacher, xiaoniu
my fuqin, father
my sister, wushiying
my friends, xiaoying
my fuqin, father
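Method 1: awk '{if ($0!=line) print;line=$0}' file
In practice, that means:
cat file |sort |awk '{if ($0!=line) print;line=$0}'
(The command only compares each line with the one before it, so the input has to be sorted first for this method to work.)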
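How it works:
awk reads the input one line at a time. On the first line, line is empty (line is an awk variable; as in the shell, it needs no declaration and is empty until a value is assigned), so it differs from $0 (which is "my friends, xiaoying") and the line is printed. line is then set to $0. awk reads the next line; now $0 equals line (both are "my friends, xiaoying"), so nothing is printed. When "my fuqin, father" is read, $0 differs from line (still "my friends, xiaoying"), so it is printed. The remaining lines are handled the same way.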
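To see why the sort step matters, here is a minimal sketch that runs the same awk command on unsorted and on sorted input. The temporary file and the variable names (tmp, unsorted_result, sorted_result) are made up for this illustration; the sample lines are taken from file above.

```shell
# Temporary sample file (hypothetical; a shortened copy of "file")
tmp=$(mktemp)
printf '%s\n' \
  'my teacher, xiaoniu' \
  'my friends, xiaoying' \
  'my teacher, xiaoniu' \
  'my friends, xiaoying' > "$tmp"

# Unsorted: no two identical lines are adjacent, so nothing is removed
unsorted_result=$(awk '{if ($0!=line) print; line=$0}' "$tmp")

# Sorted: identical lines become adjacent and are collapsed into one
sorted_result=$(sort "$tmp" | awk '{if ($0!=line) print; line=$0}')

printf 'unsorted:\n%s\n' "$unsorted_result"
printf 'sorted:\n%s\n' "$sorted_result"
rm -f "$tmp"
```

The unsorted run prints all four lines back unchanged, while the sorted run leaves one copy of each.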
Method 2 (the simplest):
# cat file| sort | uniq
my friends, xiaoying
my fuqin, father
my sister, wushiying
my teacher, xiaoniu
Method 3:
The file rmdup.sed contains:
#n rmdup.sed - ReMove DUPlicate consecutive lines
# read next line into pattern space (if not the last line)
$!N
# check if pattern space consists of two identical lines
s/^\(.*\)\n\1$/&/
# if yes, goto label RmLn, which will remove the first line in pattern space
t RmLn
# if not, print the first line (and remove it)
P
# garbage handling which simply deletes the first line in the pattern space
: RmLn
D
# cat file|sort |sed -f rmdup.sed
my friends, xiaoying
my fuqin, father
my sister, wushiying
my teacher, xiaoniu
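For comparison, the same N/P/D idea is often written as a one-liner. This is only a sketch of an equivalent form, not the script above: it prints the first line of the pattern space unless the two lines are identical, and like rmdup.sed it expects sorted input. The variable name deduped is made up for the example.

```shell
# Sorted sample input with adjacent duplicates, piped through the one-liner
deduped=$(printf '%s\n' \
  'my friends, xiaoying' \
  'my friends, xiaoying' \
  'my fuqin, father' \
  'my sister, wushiying' \
  'my sister, wushiying' \
  'my sister, wushiying' |
  sed '$!N; /^\(.*\)\n\1$/!P; D')

printf '%s\n' "$deduped"
```

Here /.../!P replaces the "t RmLn" jump: P is skipped when the pattern space holds two identical lines, and D then drops the first of them either way.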
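Thanks for sharing.
Last edited by Shell_HAT on 2011-12-28 15:14
Quote: cat file |sort |awk '{if ($0!=line) print;line=$0}' (the input has to be sorted first for this method to work)
Life is hard for awk beginners. Try this:
awk '!a[$0]++' urfile
Quote: Method 2 (the simplest): # cat file| sort | uniq
Actually, this is even simpler:
sort -u urfile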
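A rough sketch of what the awk one-liner from this reply does, next to sort -u. The array a counts how many times each line has been seen, and a line is printed only while its count is still zero, so no sorting is needed and the original order is kept. The temporary file and the names seen, short_form, long_form and sorted_form are made up for this illustration.

```shell
# Temporary sample file (hypothetical; a shortened copy of "file")
tmp=$(mktemp)
printf '%s\n' \
  'my friends, xiaoying' \
  'my teacher, xiaoniu' \
  'my teacher, xiaoniu' \
  'my fuqin, father' \
  'my friends, xiaoying' > "$tmp"

# Short form: print a line only the first time it is seen
short_form=$(awk '!a[$0]++' "$tmp")

# Spelled-out equivalent of the same test-then-increment logic
long_form=$(awk '{ if (seen[$0] == 0) print; seen[$0]++ }' "$tmp")

# sort -u also removes duplicates, but returns the lines in sorted order
sorted_form=$(sort -u "$tmp")

printf 'awk:\n%s\n' "$short_form"
printf 'sort -u:\n%s\n' "$sorted_form"
rm -f "$tmp"
```

The awk version keeps the first occurrence of each line in its original position; sort -u gives the same set of lines reordered.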