Reply to #3 q1208c
00000000000000000000000000000000000000000000000000000000000000000 -----> this is a blank sequence
---------<data1>--------
start end
chr1 4 8
chr1 11 12
chr1 17 19
chr1 22 31
chr1 39 46
.........
---------<data2>-----
chr1 6 7
chr1 15 23
chr1 29 34
chr1 38 49
chr1 51 53
00000000000000000000000000000000000000000000000000000000000000000
data1 00011111001100001110011111111110000000111111110000000000000000000
data2 00000110000000111111111000001111110001111111111110111000000000000
A 00011111001100111111111111111111110001111111111110111000000000000
B 00011221001100112221122111112221110001222222221110111000000000000
C 00000220000000002220022000002220000000222222220000000000000000000
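The start and end information is marked on the blank sequence. A is what I did before: any covered position is recorded as 1, without considering overlaps. B is what I need now: each position has to be counted, and then a threshold applied. Taking a threshold of 1 in the example above, positions whose count does not exceed 1 are discarded, which gives sequence C. How do I do the counting? And if a count reaches 10 or more, won't the sequence get longer? Does that mean substr can no longer be used for this? Thanks.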
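# Collect the chr1 intervals; the input is assumed to be sorted by position.
while (<INFILE>) {
    @a = split(" ");
    if ($a[0] =~ m/chr1$/) {
        $chr1start[$count1] = $a[1];
        $chr1end[$count1]   = $a[2];
        $count1 += 1;
    }
}
my $wide = 100000;                  # window width
$dna1 = "0" x $chr1end[-1];         # blank sequence up to the last end position
foreach (0 .. $count1 - 1) {
    $L1 = $chr1end[$_] - $chr1start[$_] + 1;
    # Coordinates are 1-based (as in the example), substr offsets are 0-based.
    substr($dna1, $chr1start[$_] - 1, $L1) = "1" x $L1;
}
foreach $win (0 .. int($chr1end[-1] / $wide)) {
    $D[$win] = substr($dna1, $wide * $win, $wide);
    $C[$win] = ($D[$win] =~ tr/1//);        # count the 1s in this window
    print OUTFILE1 $C[$win] / $wide * 100;  # percentage of the window covered
    print OUTFILE1 "\n";
}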
The code above is how I originally wrote it.
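For the counting in B and the threshold in C, one possible approach is to keep one numeric counter per position in an array instead of one character per position in a string. A count of 10 or more then occupies a single array element, so the number of positions does not change. The following is only a minimal sketch under assumptions: the helper names `coverage_counts` and `apply_threshold` are hypothetical, the coordinates are taken to be 1-based and inclusive as in the example, the sequence length of 65 is assumed, and "threshold 1" is read as "keep only counts greater than 1", which is what sequence C shows.

```perl
use strict;
use warnings;

# Hypothetical helper: for every 1-based position, count how many
# intervals cover it. Each interval is a [start, end] pair, inclusive.
sub coverage_counts {
    my ($length, @intervals) = @_;
    my @count = (0) x $length;              # one numeric counter per position
    for my $iv (@intervals) {
        my ($start, $end) = @$iv;
        $start = 1       if $start < 1;       # clip to the sequence bounds
        $end   = $length if $end > $length;
        $count[$_ - 1]++ for $start .. $end;  # 1-based position, 0-based index
    }
    return @count;
}

# Hypothetical helper: set every count that does not exceed $threshold to 0.
sub apply_threshold {
    my ($threshold, @count) = @_;
    return map { $_ > $threshold ? $_ : 0 } @count;
}

# Intervals copied from data1 and data2 in the post (all on chr1).
my @intervals = (
    [4, 8], [11, 12], [17, 19], [22, 31], [39, 46],   # data1
    [6, 7], [15, 23], [29, 34], [38, 49], [51, 53],   # data2
);

my @b = coverage_counts(65, @intervals);   # 65 is an assumed sequence length
my @c = apply_threshold(1, @b);

# Joining without a separator is only readable while all counts are below 10;
# use join(' ', ...) or one value per line once counts can be larger.
print "B ", join('', @b), "\n";
print "C ", join('', @c), "\n";
```

The per-window statistic from the original code could then be computed on array slices, for example by counting the non-zero elements of `@c[$from .. $to]` with `grep`, instead of using `substr` and `tr`. For a whole chromosome an array of Perl scalars takes far more memory than a string, so this trade-off may matter for large inputs.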