Chinaunix

Title: How do I delete the last few lines of a large file? (Efficiency first; the file is tens of GB)

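Author: justlooks Time: 2009-10-15 11:40
Title: How do I delete the last few lines of a large file? (Efficiency first; the file is tens of GB)
A question, please.

[ Last edited by justlooks on 2009-10-15 12:26 ]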

Author: sandermansxj Time: 2009-10-15 11:42
# Delete the last line of a file
sed '$d'

# Delete the last two lines of a file
sed 'N;$!P;$!D;$d'

Author: sandermansxj Time: 2009-10-15 11:42
# Delete the last 10 lines of a file
sed -e :a -e '$d;N;2,10ba' -e 'P;D' # method 1
sed -n -e :a -e '1,10!{P;N;D;};N;ba' # method 2

Author: lucash Time: 2009-10-15 11:48
Print everything except the last 100 lines:
head --lines=-100 log

Author: 寂寞烈火 Time: 2009-10-15 11:49

Originally posted by sandermansxj on 2009-10-15 11:42
# Delete the last 10 lines of a file
sed -e :a -e '$d;N;2,10ba' -e 'P;D' # method 1
sed -n -e :a -e '1,10!{P;N;D;};N;ba' # method 2

Classic~~

Author: justlooks Time: 2009-10-15 12:26
I need something efficient.

Author: justlooks Time: 2009-10-15 12:27
Title: Reply to #3, sandermansxj's post
This approach is rather inefficient for large files.

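Author: r2007 Time: 2009-10-15 16:23
This has been discussed before; try searching.
One approach is a command combination built mainly around dd, and there are also Python and Perl solutions.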

Author: lucash Time: 2009-10-15 16:38
Title: Reply to #6, justlooks's post
Mine doesn't work?

Author: lucash Time: 2009-10-15 16:38
There used to be one that used dd, and it was incredibly fast.

Author: lucash Time: 2009-10-15 16:45
Found it; this is the one:

dd of=urfile seek=1 bs=$(($(stat -c%s urfile)-$(tail -1 urfile|wc -c)))
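
Editor's note: as posted, the command gives dd no input file, so dd reads standard input and waits until it sees end-of-file. Below is a minimal sketch of the same idea, assuming GNU coreutils and a regular file: point `if` at /dev/null and let dd truncate the output at the seek offset. The function name `drop_tail_dd` and the `bs=1 seek=N` form are assumptions made for illustration; they are not from the original post.

```shell
#!/bin/sh
# Sketch: drop the last N lines of a file in place via dd's truncate-on-seek.
# Assumes GNU coreutils (stat -c, dd) and a regular, seekable file.
# drop_tail_dd is a hypothetical helper name, not from the thread.
drop_tail_dd() {
    file=$1
    n=${2:-1}
    total=$(stat -c %s "$file") || return 1
    # Byte length of the last N lines; only the tail of the file is piped.
    tail_bytes=$(tail -n "$n" "$file" | wc -c)
    keep=$((total - tail_bytes))
    # /dev/null supplies no data. dd seeks the output to $keep bytes and,
    # because conv=notrunc is not given, truncates the file at that offset.
    dd if=/dev/null of="$file" bs=1 seek="$keep" 2>/dev/null
}

# Usage (modifies the file in place, no backup is kept):
#   drop_tail_dd urfile 1
```

Using `bs=1` with a large `seek` keeps the block size small instead of asking dd for one block as large as the file.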

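Author: flw Time: 2009-10-15 17:04

Originally posted by lucash on 2009-10-15 16:45
Found it; this is the one:

dd of=urfile seek=1 bs=$(($(stat -c%s urfile)-$(tail -1 urfile|wc -c)))

Actually, this isn't the most efficient.

Using truncate should be even more efficient (if you don't need a backup).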

Author: sinic Time: 2009-10-15 17:13
Notice: this author has been banned or deleted; the content is automatically hidden.

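Author: MYSQLER Time: 2009-10-15 17:15

Originally posted by flw on 2009-10-15 17:04

Actually, this isn't the most efficient.

Using truncate should be even more efficient (if you don't need a backup).

How is this used? I only know that SQL has truncate.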

Author: haimming Time: 2009-10-15 19:06
Title: Reply to #12, flw's post
Could one of the experts explain?

Author: lucash Time: 2009-10-15 19:23
Title: Reply to #12, flw's post
Everyone is waiting for your answer. We don't know how to use truncate.
It isn't the same as the truncate in databases, is it?

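Author: cjaizss Time: 2009-10-15 20:31
I could write an efficient one in C, but I don't know how to write it in shell.
truncate can be used; the key is how to locate the last few lines, which in C means reading the file backwards from the end.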
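
Editor's note: for readers asking how truncate is used from the shell, here is a minimal sketch, assuming the `truncate` utility from GNU coreutils is installed (the posters may equally have meant the truncate(2) system call). `tail` measures the byte length of the last N lines and `truncate -s -BYTES` shrinks the file by that amount. GNU tail is generally able to seek from the end of a regular file instead of scanning it from the start, which is what would make this cheap, though that is worth verifying on your own system. The function name `drop_tail_truncate` is hypothetical.

```shell
#!/bin/sh
# Sketch: drop the last N lines of a file in place with truncate(1).
# Assumes GNU coreutils; drop_tail_truncate is a hypothetical helper name.
drop_tail_truncate() {
    file=$1
    n=${2:-1}
    # Byte length of the last N lines.
    tail_bytes=$(tail -n "$n" "$file" | wc -c)
    # A leading "-" in the size asks GNU truncate to reduce the file
    # by that many bytes rather than set an absolute size.
    truncate -s "-$tail_bytes" "$file"
}

# Usage (modifies the file in place, no backup is kept):
#   drop_tail_truncate big.log 10
```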

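Author: MYSQLER Time: 2009-10-15 20:38

Originally posted by cjaizss on 2009-10-15 20:31
I could write an efficient one in C, but I don't know how to write it in shell.
truncate can be used; the key is how to locate the last few lines, which in C means reading the file backwards from the end.

I'd like to see the C version too, and learn from the boss while I'm at it.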

Author: lucash Time: 2009-10-15 20:38
Title: Reply to #17, cjaizss's post
Go on, write one, so we can keep it around for when we need it.

Author: ruifox Time: 2009-10-15 20:45

new=`tail -10 file|wc -c|awk '{print $1}'`

## new is the number of bytes taken up by the last 10 lines

size=`ls -l file|awk '{printf "%d\n",$5-'$new'}'`

## size is the size of the whole file minus the last 10 lines

echo "dd if=file of=file.new bs=$size count=1"
eval "dd if=file of=file.new bs=$size count=1"

## count=1 means only one block is processed; the block size is size
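
Editor's note: with a file of tens of GB, `bs=$size count=1` asks dd for a single buffer as large as the file, and a single read may return fewer bytes than requested, so the copy can come up short. A hedged alternative that keeps the same "copy everything except the tail" idea is to let `head -c` stream the first `size` bytes. The function name `copy_without_tail` and the file names are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: copy a file to a new file, leaving out its last N lines.
# copy_without_tail is a hypothetical helper name, not from the thread.
copy_without_tail() {
    src=$1
    dst=$2
    n=${3:-10}
    total=$(wc -c < "$src")
    # Byte length of the last N lines.
    tail_bytes=$(tail -n "$n" "$src" | wc -c)
    keep=$((total - tail_bytes))
    # head -c streams the first $keep bytes using an ordinary-sized buffer.
    head -c "$keep" "$src" > "$dst"
}

# Usage (the original file is left untouched):
#   copy_without_tail file file.new 10
```

This still reads and writes nearly the whole file, so it trades speed for keeping the original intact.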

Author: r2007 Time: 2009-10-15 20:51
#!/usr/bin/perl

use strict;
use warnings;

# For the SEEK_* constants
use Fcntl qw(:seek);
use File::ReadBackwards;

my $LINES = 10; # Change to 125_000 or whatever
my $File = shift; # file passed in as argument

my $rbw = File::ReadBackwards->new($File) or die $!;

# Count backwards $LINES or the beginning of the file is hit
my $line_count = 0;
until( $rbw->eof || $line_count == $LINES ) {
    $rbw->readline;
    $line_count++;
}

# Get the real filehandle out of the File::ReadBackwards
my $fh = $rbw->get_handle;

# Chop off everything from that point on.
truncate($File, $rbw->tell);

Author: Shell_HAT Time: 2009-10-15 21:15
Title: Reply to #13, sinic's post
Here:
http://bbs.chinaunix.net/viewthread.php?tid=1459945

Author: blackold Time: 2009-10-15 22:10
This has already been discussed several times.

Author: flw Time: 2009-10-16 14:02

Originally posted by r2007 on 2009-10-15 20:51
#!/usr/bin/perl

use strict;
use warnings;

# For the SEEK_* constants
use Fcntl qw(:seek);
use File::ReadBackwards;

my $LINES = 10; # Change to 125_000 or whatever
my $File = shif ...

Yours is too long-winded.

perl -e '$f = shift; truncate( $f => (-s $f) - shift )' 1.pl $(tail -10 1.pl|wc -c)

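Author: flw Time: 2009-10-16 14:08

Originally posted by Shell_HAT on 2009-10-15 21:15
Here:
http://bbs.chinaunix.net/viewthread.php?tid=1459945

I was wrong! I was wrong!
r2007's approach is the better one after all;
dd without if is equivalent to truncate ...

Retreating in defeat ... tears streaming down my face ...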

Author: efhilt Time: 2009-10-17 08:01
Here's a simple one:

head -n -3

Author: r2007 Time: 2009-10-17 20:16

Originally posted by efhilt on 2009-10-17 08:01
Here's a simple one:

head -n -3

When efficiency is not a concern, that is a commonly used approach.
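
Editor's note: `head -n -3` only prints the result, so to change the file itself the output has to go to a temporary file that then replaces the original. A minimal sketch, assuming GNU head (negative line counts are a GNU extension) and GNU mktemp; the function name `drop_tail_head` is hypothetical. The whole file is rewritten, which is why it does not suit the tens-of-GB case.

```shell
#!/bin/sh
# Sketch: drop the last N lines by rewriting the file through head.
# Assumes GNU head and mktemp; drop_tail_head is a hypothetical helper name.
drop_tail_head() {
    file=$1
    n=${2:-3}
    # Create the temporary file next to the original so mv stays on one filesystem.
    tmp=$(mktemp "${file}.XXXXXX") || return 1
    if head -n "-$n" "$file" > "$tmp"; then
        # Note: the replacement takes the temporary file's permissions.
        mv "$tmp" "$file"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Usage:
#   drop_tail_head log 3
```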

Author: brucewoo Time: 2009-10-17 21:28
Learning from this.

Welcome to Chinaunix (http://bbs.chinaunix.net/)