Attachment: test.zip (5.03 KB)
File 1 (文件1.txt):
Entrez_Gene_Id Tumor_Sample_Barcode
182 TCGA-02-0043-01A-01W
220 TCGA-02-0089-01A-01W
220 TCGA-02-0028-01A-01W
286 TCGA-02-0083-01A-01W
286 TCGA-02-0028-01A-01W
287 TCGA-02-0015-01A-01W-0318-08
287 TCGA-08-0354-01A-01W-0318-08
287 TCGA-02-0064-01A-01W-0206-08
301 TCGA-02-0028-01A-01W
310 TCGA-02-0083-01A-01W-0206-08
324 TCGA-02-0015-01A-01W-0318-08
472 TCGA-02-0010-01A-01W-0189-08
472 TCGA-02-0114-01A-01W-0206-08
473 TCGA-02-0114-01A-01W
477 TCGA-02-0083-01A-01W
529 TCGA-02-0083-01A-01W-0206-08
——————————————————————————————————
File 2 (文件2.txt):
Term Genes
hsa05200:Pathways in cancer 1436, 6469, 5579, 6772, 1956, 5578, 2033, 862, 3082, 1029, 2335, 867, 7472, 3320, 2956, 7048, 4233, 5159, 5601, 2064, 5728, 675, 2735, 5156, 1441, 3815, 324, 2322, 4193, 5925, 2737, 5727, 7157, 2260, 5290, 4089, 5294, 5295, 999
hsa04510:Focal adhesion 5579, 1956, 5578, 3082, 2335, 1289, 4233, 5159, 5601, 57144, 2064, 1281, 5728, 5156, 3371, 2321, 85366, 9564, 3791, 5170, 3690, 1301, 3611, 1793, 5290, 1277, 2909, 5294, 5295, 7057
hsa05218:Melanoma 5728, 5156, 1956, 3082, 1029, 4193, 5925, 7157, 2260, 5290, 5294, 5295, 4233, 5159, 999
hsa05213:Endometrial cancer 5290, 2064, 5728, 1956, 2309, 5170, 324, 3611, 5294, 5295, 7157, 999
hsa05214:Glioma 5728, 5579, 5156, 1956, 5578, 1029, 4193, 5925, 7157, 5290, 5294, 5295, 5159
hsa05215:Prostate cancer 5728, 2064, 5156, 1956, 2033, 5170, 4193, 5925, 7157, 2260, 5290, 3320, 5294, 5295, 5159
hsa05223:Non-small cell lung cancer 5290, 2064, 5579, 5578, 1956, 2309, 1029, 5170, 5925, 5294, 5295, 7157
hsa05212:Pancreatic cancer 5601, 2064, 675, 6772, 1956, 1029, 5925, 7157, 5290, 4089, 7048, 5294, 5295
hsa05210:Colorectal cancer 5601, 5156, 1956, 324, 7157, 5290, 4089, 2956, 7048, 5294, 5295, 4233, 5159
hsa04070:Phosphatidylinositol signaling system 5290, 5728, 5579, 5578, 3710, 79837, 5287, 8527, 5288, 5294, 5286, 5295
————————————————————————————————————————————————————————————————————
Goal:
Term Genes
hsa05200:Pathways in cancer TCGA-02-0084-01A-01W TCGA-08-0390-01A-01W TCGA-06-0185-01A-01W-0254-08 TCGA-06-0185-01A-01W TCGA-02-0010-01A-01W-0189-08 TCGA-02-0083-01A-01W-0206-08 TCGA-02-0028-01A-01W TCGA-02-0083-01A-01W-0206-08 TCGA-08-0390-01A-01W TCGA-02-0114-01A-01W TCGA-02-0114-01A-01W-0206-08 TCGA-02-0083-01A-01W-0206-08 TCGA-02-0099-01A-01W TCGA-02-0083-01A-01W TCGA-02-0043-01A-01W TCGA-02-0014-01A-01W-0189-08 TCGA-02-0010-01A-01W TCGA-02-0083-01A-01W TCGA-02-0014-01A-01W-0189-08 TCGA-02-0113-01A-01W TCGA-06-0213-01A-01W-0254-08 TCGA-02-0014-01A-01W-0189-08 TCGA-06-0133-01A-02W-0224-08 TCGA-08-0347-01A-01W-0318-08 TCGA-02-0014-01A-01W-0189-08 TCGA-02-0083-01A-01W-0206-08 TCGA-02-0015-01A-01W-0318-08 TCGA-06-0125-01A-01W TCGA-02-0085-01A-01W-0206-08 TCGA-06-0206-01A-01W-0254-08 TCGA-02-0083-01A-01W TCGA-06-0125-01A-01W TCGA-06-0188-01A-01W TCGA-08-0245-01A-01W-0318-08 TCGA-06-0201-01A-01W TCGA-02-0028-01A-01W-0189-08 TCGA-02-0114-01A-01W TCGA-02-0048-01A-01W TCGA-06-0210-01A-01W
hsa05218:Melanoma TCGA-06-0213-01A-01W-0254-08 TCGA-08-0347-01A-01W-0318-08 TCGA-02-0010-01A-01W-0189-08 TCGA-08-0390-01A-01W TCGA-02-0114-01A-01W TCGA-02-0085-01A-01W-0206-08 TCGA-06-0206-01A-01W-0254-08 TCGA-06-0188-01A-01W TCGA-08-0245-01A-01W-0318-08 TCGA-06-0201-01A-01W TCGA-02-0114-01A-01W TCGA-02-0048-01A-01W TCGA-02-0010-01A-01W TCGA-02-0083-01A-01W TCGA-06-0210-01A-01W
The code I wrote is as follows:
#!/usr/bin/perl
use strict;
use warnings;
open (IN1, $ARGV[0]) or die $!;
open (IN2, $ARGV[1]) or die $!;
open (OUT, ">$ARGV[2]") or die $!;
my %h;
while(<IN1>){
    chomp;
    my @m=split(/\t/,$_);
    $h{$m[0]}=$m[1];
}
while(<IN2>)
{
    chomp;
    my @a=split(/\t/,$_);
    my @m=split(", ",$a[1]);
    for (0..$#m){
        $m[$_]=$h{$m[$_]} if exists($h{$m[$_]});
    }
    print OUT "$a[0]\t@m\n";
}
————————————————————————————————
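Problems:
1. In 文件1.txt the first column contains repeated IDs, but the result has only one barcode for each ID.
For example: 286 TCGA-02-0083-01A-01W
286 TCGA-02-0028-01A-01W
The result maps 286 only to the first barcode; the second one does not appear.
2. The result contains duplicates.
For example, in hsa05200:Pathways in cancer this barcode appears twice:
TCGA-02-0014-01A-01W-0189-08
TCGA-02-0014-01A-01W-0189-08
How should this Perl script be written?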
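One possible approach, offered as a minimal sketch rather than a definitive answer: a plain assignment such as `$h{$m[0]}=$m[1]` can hold only one barcode per gene ID, so the sketch stores a list of barcodes per ID (a hash of arrays) and uses a per-term `%seen` hash to print each barcode only once per term. The subroutine names `read_gene_map`, `map_term_line` and `main` are hypothetical. The sketch assumes that 文件1.txt is whitespace-separated with a header row, that 文件2.txt separates the term from the gene list with a tab, and that gene IDs without a match should be left unchanged, as in the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read file 1 and build: gene ID -> array of barcodes (first-seen order).
# A barcode repeated for the same gene is stored only once.
sub read_gene_map {
    my ($fh) = @_;
    my (%barcodes_for, %seen);
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\r$//;                      # tolerate Windows line endings
        my ($gene, $barcode) = split ' ', $line;
        next unless defined $barcode;          # skip blank or short lines
        next if $gene eq 'Entrez_Gene_Id';     # skip the header row (assumed name)
        push @{ $barcodes_for{$gene} }, $barcode
            unless $seen{$gene}{$barcode}++;
    }
    return \%barcodes_for;
}

# Replace every gene ID on one line of file 2 with all of its barcodes.
# Each barcode is emitted at most once per term.
sub map_term_line {
    my ($line, $barcodes_for) = @_;
    my ($term, $genes) = split /\t/, $line, 2;
    return $line unless defined $genes;        # no tab: pass the line through
    $genes =~ s/^\s+|\s+$//g;
    my (%seen, @out);
    for my $gene (split /\s*,\s*/, $genes) {
        # Assumption: unmatched IDs are kept as they are.
        my @values = exists $barcodes_for->{$gene}
                   ? @{ $barcodes_for->{$gene} }
                   : ($gene);
        push @out, grep { !$seen{$_}++ } @values;
    }
    return "$term\t@out";
}

sub main {
    my ($file1, $file2, $out_file) = @_;
    open my $in1, '<', $file1    or die "Cannot open $file1: $!";
    open my $in2, '<', $file2    or die "Cannot open $file2: $!";
    open my $out, '>', $out_file or die "Cannot open $out_file: $!";
    my $barcodes_for = read_gene_map($in1);
    while (my $line = <$in2>) {
        chomp $line;
        $line =~ s/\r$//;
        print {$out} map_term_line($line, $barcodes_for), "\n";
    }
    close $out or die "Cannot close $out_file: $!";
}

# Run only when called with the three expected arguments.
main(@ARGV) if @ARGV == 3;
```

It would be invoked the same way as the original script, with the two input files and the output file as arguments. If duplicates should instead be removed across the whole output rather than per term, the `%seen` hash in `map_term_line` would need to live outside the subroutine.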