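I have two files. File 1 has the following format: a line that starts with letters is an id, and the lines below it are the numbers (positions) to be processed for that id.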
NP_415088.1-1
4
11
44
46
72
134
NP_415089.1-1
31
74
83
NP_415560.1-1
4
6
45
68
92
113
137
NP_415561.1-1
14
72
75
77
85
87
NP_415562.1-1
6
30
51
53
71
72
81
84
97
98
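The file 1 format described above can be read into a dictionary that maps each id to its list of positions. This is a minimal Python sketch, assuming (as the description says) that an id line starts with a letter and every other non-blank line holds a single integer; the function name read_positions is hypothetical.

```python
def read_positions(lines):
    """Parse file 1: map each id line to the integer positions listed under it.

    Assumption: an id line starts with a letter; every other non-blank
    line holds one integer position belonging to the most recent id.
    """
    positions = {}  # dicts preserve insertion order, so ids stay in file order
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        if line[0].isalpha():
            current = line
            positions.setdefault(current, [])
        elif current is None:
            raise ValueError(f"position {line!r} appears before any id")
        else:
            positions[current].append(int(line))
    return positions
```

Because it accepts any iterable of lines, an open file handle can be passed directly, e.g. with open("file1.txt") as fh: positions = read_positions(fh) (file name hypothetical).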
File 2:
BLASTP 2.2.29+
Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro A.
Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of
protein database search programs", Nucleic Acids Res. 25:3389-3402.
Reference for composition-based statistics: Alejandro A. Schaffer,
L. Aravind, Thomas L. Madden, Sergei Shavirin, John L. Spouge, Yuri
I. Wolf, Eugene V. Koonin, and Stephen F. Altschul (2001),
"Improving the accuracy of PSI-BLAST protein database searches with
composition-based statistics and other refinements", Nucleic Acids
Res. 29:2994-3005.
Database: /public/home/mgb226/alook/Find_AA_STOP/re_AA_STOP/readthrough/all.fa
sta
185,250 sequences; 57,471,956 total letters
Query= NP_415088.1-1
Length=153
Score E
Sequences producing significant alignments: (Bits) Value
lcl|NC_000913.3_prot_YP_588440.1_550 [gene=rzoD] [protein=DLP12 ... 122 1e-35
lcl|CP011323.1_prot_SG47_0559_549 [gene=rzoD] [protein=DLP12 pro... 122 1e-35
lcl|CP011322.1_prot_SG46_0559_548 [gene=rzoD] [protein=DLP12 pro... 122 1e-35
lcl|CP006698.1_prot_N840_0565_549 [gene=rzoD] [protein=DLP12 pro... 122 1e-35
lcl|NC_002695.1_prot_NP_309651.1_1555 [gene=ECs1624] [protein=li... 118 6e-34
lcl|NC_000913.3_prot_YP_588452.1_1351 [gene=rzoR] [protein=Rac p... 115 9e-33
lcl|CP011323.1_prot_SG47_1388_1349 [gene=rzoR] [protein=Rac prop... 115 9e-33
lcl|CP011322.1_prot_SG46_1388_1348 [gene=rzoR] [protein=Rac prop... 115 9e-33
lcl|CP006698.1_prot_N840_1389_1364 [gene=rzoR] [protein=Rac prop... 115 9e-33
lcl|CP013029.1_prot_AKK22_02365_443 [gene=AKK22_02365] [protein=... 84.0 9e-21
>lcl|NC_000913.3_prot_YP_588440.1_550 [gene=rzoD] [protein=DLP12 prophage; putative lipoprotein] [protein_id=YP_588440.1]
[location=578327..578509]
Length=60
Score = 122 bits (307), Expect = 1e-35, Method: Compositional matrix adjust.
Identities = 60/60 (100%), Positives = 60/60 (100%), Gaps = 0/60 (0%)
Query 74 MRKLKMMLCVMMLPLVVVGCTSKQSVSQCVKPPRPPAWIMQPPPDWQTPLNGIISPSERG 133
MRKLKMMLCVMMLPLVVVGCTSKQSVSQCVKPPRPPAWIMQPPPDWQTPLNGIISPSERG
Sbjct 1 MRKLKMMLCVMMLPLVVVGCTSKQSVSQCVKPPRPPAWIMQPPPDWQTPLNGIISPSERG 60
>lcl|CP011322.1_prot_SG46_0559_548 [gene=rzoD] [protein=DLP12 prophage, putative lipoprotein] [protein_id=AKF62758.1]
[location=572012..572194]
Length=60
Score = 122 bits (307), Expect = 1e-35, Method: Compositional matrix adjust.
Identities = 60/60 (100%), Positives = 60/60 (100%), Gaps = 0/60 (0%)
Query 74 MRKLKMMLCVMMLPLVVVGCTSKQSVSQCVKPPRPPAWIMQPPPDWQTPLNGIISPSERG 133
MRKLKMMLCVMMLPLVVVGCTSKQSVSQCVKPPRPPAWIMQPPPDWQTPLNGIISPSERG
Sbjct 1 MRKLKMMLCVMMLPLVVVGCTSKQSVSQCVKPPRPPAWIMQPPPDWQTPLNGIISPSERG 60
>lcl|CP006698.1_prot_N840_0565_549 [gene=rzoD] [protein=DLP12 prophage; predicted lipoprotein] [protein_id=AGX32695.1]
[location=577376..577558]
Length=60
Score = 122 bits (307), Expect = 1e-35, Method: Compositional matrix adjust.
Identities = 60/60 (100%), Positives = 60/60 (100%), Gaps = 0/60 (0%)
Query 74 MRKLKMMLCVMMLPLVVVGCTSKQSVSQCVKPPRPPAWIMQPPPDWQTPLNGIISPSERG 133
MRKLKMMLCVMMLPLVVVGCTSKQSVSQCVKPPRPPAWIMQPPPDWQTPLNGIISPSERG
Sbjct 1 MRKLKMMLCVMMLPLVVVGCTSKQSVSQCVKPPRPPAWIMQPPPDWQTPLNGIISPSERG 60
>lcl|NC_000913.3_prot_YP_588452.1_1351 [gene=rzoR] [protein=Rac prophage; putative lipoprotein] [protein_id=YP_588452.1]
[location=1423400..1423585]
Length=61
Score = 115 bits (287), Expect = 9e-33, Method: Compositional matrix adjust.
Identities = 57/61 (93%), Positives = 57/61 (93%), Gaps = 0/61 (0%)
Query 74 MRKLKMMLCVMMLPLVVVGCTSKQSVSQCVKPPRPPAWIMQPPPDWQTPLNGIISPSERG 133
MRKLKMMLCVMMLPLVVVGCTSKQSVSQCVKPP PPAWIMQPPPDWQTPLNGIISPS
Sbjct 1 MRKLKMMLCVMMLPLVVVGCTSKQSVSQCVKPPPPPAWIMQPPPDWQTPLNGIISPSGND 60
Query 134 W 134
W
Sbjct 61 W 61
>lcl|CP011323.1_prot_SG47_1388_1349 [gene=rzoR] [protein=Rac prophage, putative lipoprotein] [protein_id=AKF67672.1]
[location=1415887..1416072]
Length=61
Score = 115 bits (287), Expect = 9e-33, Method: Compositional matrix adjust.
Identities = 57/61 (85%), Positives = 57/61 (93%), Gaps = 0/61 (0%)
Query 74 MRKLKMMLCVMMLPLVVVGCTSKQSVSQCVKPPRPPAWIMQPPPDWQTPLNGIISPSERG 133
MRKLKMMLCVMMLPLVVVGCTSKQSVSQCVKPP PPAWIMQPPPDWQTPLNGIISPS
Sbjct 1 MRKLKMMLCVMMLPLVVVGCTSKQSVSQCVKPPPPPAWIMQPPPDWQTPLNGIISPSGND 60
Query 134 W 134
W
Sbjct 61 W 61
File 1 and file 2 are excerpted above; the complete files are in the attachment:
实验数据.zip ("experimental data", 238.23 KB)
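First, use each id from file 1 to find its match in file 2; for example, NP_415088.1-1 matches the "Query= NP_415088.1-1" report in file 2.
Then match the numbers listed under that id in file 1 against the residue positions on the Query lines of file 2. For example: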
Query 74 MRKLKMMLCVMMLPLVVVGCTSKQSVSQCVKPPRPPAWIMQPPPDWQTPLNGIISPSERG 133
MRKLKMMLCVMMLPLVVVGCTSKQSVSQCVKPP PPAWIMQPPPDWQTPLNGIISPS
Sbjct 1 MRKLKMMLCVMMLPLVVVGCTSKQSVSQCVKPPPPPAWIMQPPPDWQTPLNGIISPSGND 60
Query 134 W 134
W
Sbjct 61 W 61
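Here, 74 means that the first residue, M, is at position 74, and the last residue on that line, G, is at position 133. Each letter occupies one position (the R after the M is 75, and so on). The numbers listed under this id in file 1 are: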
NP_415088.1-1
4
11
44
46
72
134
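The numbering rule above can be sketched in Python by walking the Query and Sbjct strings of one alignment block column by column, starting the counter at the number printed after "Query". This sketch assumes blastp output, where query coordinates increase from left to right, and treats a "-" in the Query string as a gap that consumes no position; the sample here has Gaps = 0, so the gap handling is an extrapolation. map_positions is a hypothetical helper name.

```python
def map_positions(qstart, qseq, sseq):
    """Map each query position in one alignment block to the aligned Sbjct residue.

    qstart is the number printed after "Query"; qseq and sseq are the
    aligned Query and Sbjct strings of the same block.
    """
    mapping = {}
    qpos = qstart
    for qres, sres in zip(qseq, sseq):
        if qres == "-":
            continue  # gap in the query: no query position is consumed
        mapping[qpos] = sres  # may be "-" if the subject has a gap here
        qpos += 1
    return mapping
```

For the "Query 134 W 134" block, map_positions(134, "W", "W") gives {134: "W"}; for the 74-133 block of the rzoR hit, position 107 maps the query R to the subject P.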
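Comparing them shows that only 134 falls on a residue, the W marked above, so the program should output the Sbjct residue aligned with it (also a W). The Identities value shown above must also be used as a filter:
only output alignments whose Identities percentage (the value in parentheses, e.g. (93%)) is greater than 90%; alignments below 90% can be ignored.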
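A minimal Python sketch of the Identities filter. It reads the percentage printed in parentheses, as the question specifies, rather than recomputing it from the fraction; this matters for the sample above, where the CP011323.1 rzoR hit prints (85%) next to 57/61. The regular expression assumes the BLAST+ text layout shown in file 2, the threshold is strict ("greater than 90%"), and the function names are hypothetical.

```python
import re

# Matches e.g. "Identities = 57/61 (93%)" and captures 57, 61 and 93.
IDENTITY_RE = re.compile(r"Identities\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)")


def identity_percent(line):
    """Return the percentage printed in parentheses on an Identities line, or None."""
    m = IDENTITY_RE.search(line)
    return int(m.group(3)) if m else None


def passes_filter(line, threshold=90):
    """True only if the printed identity is strictly greater than threshold."""
    pct = identity_percent(line)
    return pct is not None and pct > threshold
```

Because re.search returns the first match, the later "Positives = ..." field on the same line is never picked up by mistake.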
The number of such W residues must also be counted; here it is 1, so the expected output for the example above is:
NP_415088.1-1 W:1
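Putting the steps together, one possible end-to-end sketch in Python follows. Several details are assumptions not fixed by the question: counts are summed over every alignment (HSP) of a query that passes the filter; a Sbjct gap at a requested position is not counted; several letters are printed as space-separated LETTER:count pairs in alphabetical order; and ids with no qualifying residue are left out. The parser only looks at lines starting with "Query=", "Identities =", "Query" and "Sbjct", so it should cope with both the original BLAST+ column spacing and the flattened spacing in this post. All function names are hypothetical.

```python
import re
from collections import Counter

# Percentage printed in parentheses on an "Identities = a/b (NN%)" line.
IDENTITY_RE = re.compile(r"Identities\s*=\s*\d+/\d+\s*\((\d+)%\)")


def read_positions(lines):
    """File 1: a line starting with a letter is an id; later lines are positions."""
    positions, current = {}, None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line[0].isalpha():
            current = line
            positions.setdefault(current, [])
        elif current is not None:
            positions[current].append(int(line))
    return positions


def read_alignments(lines):
    """File 2: yield (query_id, identity_pct, {query_pos: sbjct_residue}) per HSP."""
    query = pct = mapping = pending = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("Query="):
            if mapping is not None:
                yield query, pct, mapping  # close the previous query's last HSP
            tokens = line[len("Query="):].split()
            query = tokens[0] if tokens else None
            mapping = pending = None
            continue
        m = IDENTITY_RE.search(line)
        if m:
            if mapping is not None:
                yield query, pct, mapping  # close the previous HSP
            pct, mapping, pending = int(m.group(1)), {}, None
            continue
        parts = line.split()
        if mapping is None or len(parts) != 4:
            continue  # headers, Score lines, match lines, etc.
        if parts[0] == "Query":
            pending = (int(parts[1]), parts[2])
        elif parts[0] == "Sbjct" and pending is not None:
            qpos, qseq = pending
            for qres, sres in zip(qseq, parts[2]):
                if qres == "-":
                    continue  # query gap: consumes no query position
                mapping[qpos] = sres
                qpos += 1
            pending = None
    if mapping is not None:
        yield query, pct, mapping  # last HSP in the file


def summarize(file1_lines, file2_lines, threshold=90):
    """Return output lines such as 'NP_415088.1-1 W:1'."""
    wanted = read_positions(file1_lines)
    counts = {qid: Counter() for qid in wanted}
    for qid, pct, mapping in read_alignments(file2_lines):
        if qid not in wanted or pct <= threshold:
            continue  # unknown id, or Identities not above the threshold
        for pos in wanted[qid]:
            res = mapping.get(pos)
            if res is not None and res != "-":
                counts[qid][res] += 1
    out = []
    for qid, counter in counts.items():
        if counter:
            fields = " ".join(f"{res}:{n}" for res, n in sorted(counter.items()))
            out.append(f"{qid} {fields}")
    return out
```

For real files, something like with open("file1.txt") as f1, open("file2.txt") as f2: print("\n".join(summarize(f1, f2))) would print one line per id (file names hypothetical). On the excerpt shown above it yields NP_415088.1-1 W:1: the rzoD hits stop at position 133, and the CP011323.1 rzoR hit is dropped by the 85% filter, leaving only the YP_588452.1 alignment.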