Chinaunix

标题: perl 如何高效统计01字符串(二进制串)中1的个数？ [打印本页]

作者: gongyonghui2 时间: 2013-04-10 09:49
标题: perl 如何高效统计01字符串(二进制串)中1的个数？
example: $_="01010101011101";
那么计算出的1的个数应该是8个。

想到利用/(1)/g或split来求1的个数。是不是有其他方法在效率上要高？

作者: 不能超过15字 时间: 2013-04-10 10:49
/(1)/g 怎么实现？新手求指导。。。

作者: jason680 时间: 2013-04-10 11:00
回复 1# gongyonghui2

How about this

$sCnt++ while(/1/g);

作者: gongyonghui2 时间: 2013-04-10 11:03
本帖最后由 gongyonghui2 于 2013-04-10 11:04 编辑

回复 3# jason680

my @lis=/(1)/g;
scalar @lis; 和这个是一个效果，没有变化，如何能不是用正则计算出有多少个1呢

作者: wenhq 时间: 2013-04-10 11:12
s/1//&&c++; print c;
也的用正则啊。。

作者: jason680 时间: 2013-04-10 11:15
本帖最后由 jason680 于 2013-04-10 11:17 编辑

回复 4# gongyonghui2

you need performance  ==>是不是有其他方法在效率上要高？

but they are different on performance
  $sCnt++ while(/1/g);

  my @lis=/(1)/g;
scalar @lis;

-------------------------
如何能不是用正则计算出有多少个1呢?

  substr, index

作者: jason680 时间: 2013-04-10 11:30
本帖最后由 jason680 于 2013-04-10 11:31 编辑

回复 4# gongyonghui2

get the same answer ...

# time perl -le '$_="010111" x 1000000;$sCnt++ while(/1/g);print $sCnt'
4000000

real 0m0.763s
user 0m0.747s
sys 0m0.016s

# time perl -le '$_="010111" x 1000000;@lis=/(1)/g;print scalar @lis'
4000000

real 0m5.726s
user 0m4.363s
sys 0m1.166s

作者: sammyjeep 时间: 2013-04-10 11:36
回复 7# jason680

把0替换掉取字符串长度会不会快一点，

作者: jason680 时间: 2013-04-10 11:39
回复 8# sammyjeep

how about your trying ...

作者: sammyjeep 时间: 2013-04-10 11:42
回复 9# jason680

有一点差别吧，不大。
time perl -le '$_="010111" x 1000000;$sCnt++ while(/1/g);print $sCnt'
4000000
0.756u 0.012s 0:00.76 100.0% 0+0k 0+0io 0pf+0w

time perl -le '$_="010111" x 1000000;@lis=/(1)/g;print scalar @lis'
4000000
7.388u 0.764s 0:08.15 99.8% 0+0k 0+0io 0pf+0w

time perl -le '$_="010111" x 1000000;s/0//g;print length'
4000000
0.468u 0.016s 0:00.48 97.9% 0+0k 0+0io 0pf+0w

作者: jason680 时间: 2013-04-10 11:51
回复 10# sammyjeep

different data and get different result

# time perl -le '$_="010000" x 10000000;$sCnt++ while(/1/g);print $sCnt'
10000000
real 0m1.994s

# time perl -le '$_="010000" x 10000000;s/0//g;print length'
10000000

real 0m20.301s

# time perl -le '$_="101111" x 10000000;$sCnt++ while(/1/g);print $sCnt'
50000000

real 0m9.172s

# time perl -le '$_="101111" x 10000000;s/0//g;print length'
50000000

real 0m4.382s

作者: xfoucs 时间: 2013-04-10 12:54
回复 5# wenhq

可能比正则的效率稍微高点

#!/usr/bin/env perl
use strict;
use warnings;
$_="010111"x1000000;
my $count=0;
while($_)
{
$count++ if chop;
}
print $count;

复制代码

##################
real 0m0.576s
user 0m0.480s
sys 0m0.010s

作者: 不能超过15字 时间: 2013-04-10 14:30

time perl -le '$_="010111" x 1000000;@list=split /0/,$_;$len=scalar @list;print $len'

复制代码

real 0m1.279s

作者: Perlvim 时间: 2013-04-10 16:35

use 5.014;
use Benchmark qw(timethese);
timethese(100000, {
'tr' => sub {
my $str = "10101010101010";
my $times = $str=~ tr/1/0/;
return $times;
},
'replace' => sub {
my $str = "10101010101010";
my $times = $str =~ s/1/0/g;
return $times;
},
}
);

复制代码

Benchmark: timing 1000000 iterations of replace, tr...
replace: 10 wallclock secs (10.90 usr + 0.00 sys = 10.90 CPU) @ 91776.80/s (n=1000000)
tr: 1 wallclock secs ( 1.13 usr + 0.00 sys = 1.13 CPU) @ 883392.23/s (n=1000000)

复制代码

作者: gongyonghui2 时间: 2013-04-10 16:57
回复 7# jason680

多谢校正

作者: gongyonghui2 时间: 2013-04-10 17:00
回复 14# Perlvim

多谢采用你的建议了

作者: 黑色阳光_cu 时间: 2013-04-10 17:09

$_= "01010101011101";
print unpack("%B*", pack("B*", $_)), "\n";

复制代码

作者: gongyonghui2 时间: 2013-04-10 19:34
回复 13# 不能超过15字

以0 split的是不对的，如01111001产生的元素为('',1111,1),那么长度为三。

作者: 不能超过15字 时间: 2013-04-10 22:33
回复 18# gongyonghui2
恩，是呢。。谢谢//学习了。。。

作者: rubyish 时间: 2013-04-11 08:08

perl -le '$_="010111" x 1000000;tr/0//d;print length'

复制代码

作者: 大米白面 时间: 2013-04-11 15:09
统计一个字节内的值为1的2进制位数的代码：
perl -E 'say unpack("%b8",pack("C",255))'

作者: 大米白面 时间: 2013-04-11 15:33
针对楼主的问题，效率最高的做法是：
perl -E 'say unpack("%B14",pack("B14","01010101011101"))'

作者: gongyonghui2 时间: 2013-04-11 23:19
总结了下大家的建议，pack效率虽然不错，但是还是没能超过tr，测试代码及结果如下：
use 5.014;

use Benchmark qw(timethese);

timethese(-2, {
'tr' => sub {
   my $str = "10101010101010"x10;
   my $times = $str=~ tr/1/0/;
   return $times;
},
'replace' => sub {
   my $str = "10101010101010"x10;
   my $times = $str =~ s/1/0/g;
   return $times;
},
'while'=>sub{
my $str = "10101010101010"x10;
   my $times =0;
   $times++ while $str =~/1/g;
   return $times;
},

'chop'=>sub{
my $str = "10101010101010"x10;
   my $times =0;
   while($str)
{
$times++ if chop $str;
}
   return $times;
},
   'pack'=>sub{
my $str = "10101010101010"x10;
   return unpack("%B*",pack("B*",$str));
},

  }
);

##结果
Benchmark: running chop, pack, replace, tr, while for at least 2 CPU seconds...
   chop:  2 wallclock secs ( 2.12 usr +  0.00 sys =  2.12 CPU) @ 66080.15/s (n=140156)
   pack:  3 wallclock secs ( 2.09 usr +  0.00 sys =  2.09 CPU) @ 757637.32/s (n=1583462)
replace:  2 wallclock secs ( 2.15 usr +  0.00 sys =  2.15 CPU) @ 34983.74/s (n=75320)
      tr:  2 wallclock secs ( 2.17 usr +  0.00 sys =  2.17 CPU) @ 1224101.01/s (n=2653851)
   while:  2 wallclock secs ( 2.17 usr +  0.00 sys =  2.17 CPU) @ 86977.40/s (n=188567)

欢迎光临 Chinaunix (http://bbs.chinaunix.net/)