Chinaunix

Title: Read an English article, set every word as a key with value 1, and build a hash

Author: 大山里出来的孩子 Time: 2016-08-04 17:28
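My initial approach was:
read the whole file, split it on whitespace into an array, then loop over the array to build the hash. But if the article is long, or there are several articles,
storing everything in an array first seems wasteful and inefficient. How can I improve this so that the hash is built directly while the file is being read, without a temporary array?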
text_in:
The U.N. Food and Agriculture Organization says it has less than half the funding it needs to help ensure food security in parts of South Sudan.
.......
(The rest is too long to paste here; assume the text is well-formed.)

The goal is to build the following hash %Words:
(
The => 1,
U.N. => 1,
Food => 1,
...
)

My earlier attempt was:
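my $content;

{
    # Slurp mode: with $/ undefined, <$IN1> returns the whole file at once.
    local $/ = undef;
    # Assumes the input file is named text_in, as in the sample above.
    open(my $IN1, '<', 'text_in') or die "Cannot open text_in: $!";
    $content = <$IN1>;
    close($IN1);
    #print "$content\n";
}

# split ' ' splits on runs of whitespace and skips leading whitespace,
# so no empty-string keys end up in the hash (split /\s/ can produce them).
my @words1 = split ' ', $content;
my %Words1 = map { $_ => 1 } @words1;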

Is it possible to skip the temporary array and build the hash directly? Would that be faster?

Author: sunzhiguolu Time: 2016-08-04 18:34

perl -anle '{$h{$_}++ for @F}END{$,=",";print keys %h}' f
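
For readers unfamiliar with the switches, the one-liner above can be unrolled roughly as follows. This is only a sketch of what -n, -a, -l and END amount to, not the exact code Perl generates; the sub name count_words and the in-memory sample text are made up for illustration. Note that -a autosplits every line into the array @F, and that ++ stores a count per word rather than a constant 1.

```perl
use strict;
use warnings;

# Hypothetical helper: mimics the body of `perl -anle '$h{$_}++ for @F'`
# for any filehandle and returns a reference to the resulting hash.
sub count_words {
    my ($fh) = @_;
    my %h;
    while (my $line = <$fh>) {      # -n: loop over the input line by line
        chomp $line;                # -l: strip the trailing newline
        my @F = split ' ', $line;   # -a: autosplit the line on whitespace into @F
        $h{$_}++ for @F;            # count each word (a count, not always 1)
    }
    return \%h;
}

# Demo on an in-memory filehandle instead of the file f (sample text is assumed).
my $text = "The U.N. Food and Agriculture Organization says it has\nless than half the funding it needs\n";
open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";
my $h = count_words($fh);
close($fh);

{
    local $, = ",";                 # as in the END block: separate keys with commas
    local $\ = "\n";                # -l also appends a newline on print
    print keys %$h;
}
```

The keys come out in no particular order, because Perl hashes are unordered.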

Author: 104359176 Time: 2016-08-04 22:06
Using local is just an easy way to slurp the whole text into a string. It has nothing to do with speed.

If you want it to be faster, use an array and uniq it.
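
A small sketch of what "use an array and uniq it" could look like, for comparison. The reply does not say which uniq is meant, so this assumes the common grep/%seen idiom; the names uniq_words and %seen are illustrative only. Recent versions of List::Util also export a uniq function that does the same job. Note that the idiom still relies on a hash internally to remember which words have been seen.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the distinct words of a string,
# keeping the order of first appearance.
sub uniq_words {
    my ($content) = @_;
    my %seen;
    # $seen{$_}++ is 0 (false) the first time a word shows up,
    # so grep keeps only first occurrences.
    return grep { !$seen{$_}++ } split ' ', $content;
}

my @unique = uniq_words("food security and food funding and food");
print "@unique\n";
```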

Author: jason680 Time: 2016-08-04 23:20
This post was last edited by jason680 on 2016-08-05 11:02

Reply to 1# 大山里出来的孩子

$ perl words.pl text_in
the half ensure of needs has Sudan. Food Agriculture to funding less in help says Organization it South than U.N. food parts security and The

$ cat words.pl
use strict;
use warnings;

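my %hWord;

# Read the input line by line, so the whole file is never held in memory.
while (<>) {
    chomp;
    # split with no arguments splits $_ on whitespace; each word becomes a key.
    $hWord{$_} = 1 for split;
}
print join(" ", keys %hWord), "\n";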
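
Whether avoiding the array is actually faster is best measured rather than guessed. The sketch below compares the slurp-and-map version from the first post with a line-by-line loop like the one above, using the core Benchmark module. The sub names, the sample text, and the iteration count are assumptions for illustration; no timings are claimed here, and real results depend on the input size and the Perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up sample input: part of the sentence from the first post, repeated 200 times.
my $text = join '', ("The U.N. Food and Agriculture Organization says it has less than half the funding it needs\n") x 200;

# Approach from the first post: slurp everything, split into an array, map to a hash.
sub slurp_map {
    open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";
    my $content = do { local $/; <$fh> };   # undefined $/ means read to end of file
    close($fh);
    my @words = split ' ', $content;
    my %h = map { $_ => 1 } @words;
    return \%h;
}

# Line-by-line approach: no named array, only the list that split returns per line.
sub line_by_line {
    open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";
    my %h;
    while (my $line = <$fh>) {
        $h{$_} = 1 for split ' ', $line;
    }
    close($fh);
    return \%h;
}

# Run each sub 500 times and print a comparison table.
cmpthese(500, {
    slurp_map    => \&slurp_map,
    line_by_line => \&line_by_line,
});
```

Both subs read from an in-memory filehandle so that disk I/O does not dominate the comparison; with real files the picture may differ.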

Author: 华小飞_Perl Time: 2016-08-04 23:22
Reply to 2# sunzhiguolu

Hats off to the master's one-liner!!!

Author: 大山里出来的孩子 Time: 2016-08-05 09:33
Thank you, master. I will give it a try.

Reply to 4# jason680

Author: 大山里出来的孩子 Time: 2016-08-05 09:35
The master still used an array, though.

Reply to 2# sunzhiguolu

Author: sunzhiguolu Time: 2016-08-05 12:32
Just because you cannot see something does not mean it is not there.
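
On the point about hidden arrays: -a fills @F, and split returns a temporary list even when no array is named. If the aim is to avoid building any per-line word list at all, one possible sketch is to pull words out one at a time with a global regex match in scalar context. The helper name words_to_hash and the sample text are hypothetical, and this is not necessarily faster than split; it would need measuring.

```perl
use strict;
use warnings;

# Hypothetical helper: builds the word hash from a filehandle without
# materialising a list of words.
sub words_to_hash {
    my ($fh) = @_;
    my %words;
    while (my $line = <$fh>) {
        # In scalar context, /g resumes after the previous match, so each
        # pass of the inner loop yields exactly one word in $1.
        while ($line =~ /(\S+)/g) {
            $words{$1} = 1;
        }
    }
    return \%words;
}

# Demo on an in-memory filehandle (sample text is assumed).
my $text = "The U.N. Food and Agriculture Organization says it has\nless than half the funding it needs\n";
open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";
my $words = words_to_hash($fh);
close($fh);

print join(" ", sort keys %$words), "\n";
```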

Welcome to Chinaunix (http://bbs.chinaunix.net/)