Chinaunix
Title:
How to derive file C from file A according to file B
Author:
wd_my
Time:
2019-02-15 15:30
File A: three columns. Column 1 is the sequence ID, column 2 is the sequence length, and column 3 is the sequence annotation.
Chr01_9654239_9660689 645 GAG_copia/GAG_tork/AP_copia/INT_tat/INT_17_6/INT_codi_d/INT_retrofit/INT_copia/INT_sire/INT_1731/INT_oryco/RT_hydra/RT_pCretro/RT_copia/RNaseH_codi_c/RNaseH_retrofit/RNaseH_oryco/RNaseH_copia/RNaseH_codi_d
Chr01_10815525_10825383 9859 RNaseH_gypsy/RNaseH_galadriel/RNaseH_gmr1/RNaseH_17_6/RNaseH_pyret/RNaseH_maggy/RNaseH_micropia_mdg3/RNaseH_v_clade/RNaseH_osvaldo/RNaseH_b_clade/RNaseH_TF/RNaseH_a_clade/RNaseH_pyggy/INT_osvaldo/INT_cer2_3/INT_crm/INT_reina
Chr01_11277413_11279136 1724 RNaseH_tork/RNaseH_retrofit/RNaseH_oryco/RNaseH_sire/RNaseH_1731/RNaseH_copia/RNaseH_codi_d/RNaseH_codi_c
Chr01_13549455_13553995 4541 GAG_sire/GAG_oryco/AP_oryco/AP_copia/INT_oryco/INT_hydra/INT_copia/INT_1731/INT_retrofit/INT_tork/INT_sire/INT_csrn1/INT_pseudovirus/INT_pCretro/INT_codi_d/RT_tork/RT_oryco/RT_pCretro/RT_1731/RT_hydra/RT_sire/RT_copia/RT_retrofit/RT_codi_d/RT_codi_c/RT_galea/RT_pseudovirus/RNaseH_copia/RNaseH_oryco/RNaseH_sire/RNaseH_retrofit/RNaseH_tork/RNaseH_1731/RNaseH_pCretro/RNaseH_codi_d/RNaseH_hydra
Chr01_13578874_13583841 4968 GAG_retrofit/AP_retrofit/INT_tork/INT_pseudovirus/INT_1731/INT_oryco/INT_copia/INT_retrofit/RT_codi_c/RT_1731/RT_pseudovirus/RT_oryco/RT_codi_d/RT_copia/RT_tork/RT_retrofit/RT_sire/RT_hydra/RT_pCretro/RNaseH_codi_I/RNaseH_pseudovirus/RNaseH_tork/RNaseH_codi_d/RNaseH_1731/RNaseH_hydra/RNaseH_sire/RNaseH_pCretro/RNaseH_oryco/RNaseH_retrofit/RNaseH_copia
Chr01_13708617_13713687 5071 GAG_reina/GAG_del/AP_galadriel/AP_del/AP_reina/INT_del/RT_alpharetroviridae/RT_betaretroviridae/RT_cavemovirus/RT_caulimovirus/RT_soymovirus/RT_tat/RT_gammaretroviridae/RT_TF/RT_spumaretroviridae/RT_csrn1/RT_b_clade/RT_a_clade/RT_epsilonretroviridae/RT_galadriel/RT_v_clade/RT_gmr1/RT_gypsy/RT_osvaldo/RT_maggy/RT_del/RT_17_6/RT_412_mdg1/RT_micropia_mdg3/RT_crm/RT_pyggy/RT_athila/RT_reina/RT_pyret/RT_badnavirus/RNaseH_csrn1/RNaseH_b_clade/RNaseH_caulimovirus/RNaseH_maggy/RNaseH_reina/RNaseH_del/RNaseH_athila/RNaseH_crm/RNaseH_galadriel/RNaseH_osvaldo/RNaseH_pyggy/RNaseH_TF/RNaseH_pyret/RNaseH_v_clade/RNaseH_gmr1/RNaseH_a_clade/RNaseH_17_6/RNaseH_gypsy/RNaseH_micropia_mdg3/INT_csrn1/INT_b_clade/INT_athila/INT_epsilonretroviridae/INT_pyggy/INT_pyret/INT_maggy/INT_micropia_mdg3/INT_tat/INT_spumaretroviridae/INT_gmr1/INT_osvaldo/INT_cer2_3/INT_412_mdg1/INT_gammaretroviridae/INT_v_clade/INT_crm/INT_TF/INT_reina
File B:
GAG/AP/RT/RNaseH/INT Gypsy
GAG/AP/INT/RT/RNaseH Copia
GAG/AP/RT/INT/RNaseH Bel
File C:
Chr01_9654239_9660689 645 Copia ### contains GAG_copia/AP_copia/INT_copia/RT_copia/RNaseH_copia, which follows the GAG/AP/INT/RT/RNaseH order
Chr01_13549455_13553995 4541 Copia ### contains GAG_oryco/AP_oryco/INT_oryco/RT_oryco/RNaseH_oryco, which follows the GAG/AP/INT/RT/RNaseH order
Chr01_13578874_13583841 4968 Copia ### contains GAG_retrofit/AP_retrofit/INT_retrofit/RT_retrofit/RNaseH_retrofit, which follows the GAG/AP/INT/RT/RNaseH order
Chr01_13708617_13713687 5071 Gypsy ### contains GAG_reina/AP_reina/RT_reina/RNaseH_reina/INT_reina (follows the GAG/AP/RT/RNaseH/INT order) and GAG_del/AP_del/INT_del/RT_del/RNaseH_del (follows the GAG/AP/INT/RT/RNaseH order), but GAG_reina comes first, so it is Gypsy
Question: how can I process file A according to file B to obtain file C?
Author:
expert1
Time:
2019-02-15 17:14
Making sense of this kind of biology-flavored shell question really is rather hard.
Author:
wd_my
Time:
2019-02-15 17:22
Reply to 2# expert1
It really is.
Author:
expert1
Time:
2019-02-16 11:47
It is not that the problem is so hard; what is hard is understanding what you people actually want.
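Author:
wd_my
Time:
2019-02-16 13:21
Reply to 4# expert1
Maybe I was in too much of a hurry and did not explain it clearly.
My goal is to classify the lines of file A according to file B. In each line of file A, the third column is a list of entries of the form xxx_xxx separated by /. The condition is: among entries whose part after the _ is the same, if the parts before the _ appear in the order given in the first column of a line in file B, then the line of file A belongs to the class in the second column of that line in file B. (The third column of a line in file A may satisfy the order of more than one line in file B; in that case, take whichever matches first.)
Whether or not this can be answered, thank you for replying.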
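The thread gives no solution, so here is only a minimal Python sketch of one possible reading of the rule above. The function names (parse_orders, classify, classify_lines) are made up for illustration. Two assumptions are baked in: the entries sharing one suffix must list their prefixes in exactly the order of a file B row (no extra or missing entries), and "matches first" means the suffix that appears earliest in the line wins, as in the GAG_reina example.

```python
def parse_orders(b_lines):
    """Parse file B lines such as 'GAG/AP/RT/RNaseH/INT Gypsy'.

    Returns a list of (order_tuple, class_name) pairs in file order.
    """
    orders = []
    for line in b_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        orders.append((tuple(fields[0].split("/")), fields[1]))
    return orders


def classify(annotation, orders):
    """Return the class for one annotation column, or None if nothing matches."""
    # Group prefixes (GAG, AP, ...) by suffix (copia, reina, ...), keeping
    # the order in which they appear. Only the FIRST underscore separates
    # prefix from suffix, so 'RNaseH_micropia_mdg3' has suffix 'micropia_mdg3'.
    prefixes_by_suffix = {}
    for entry in annotation.split("/"):
        entry = entry.strip()
        if "_" not in entry:
            continue  # empty piece from a trailing '/' or a stray token
        prefix, suffix = entry.split("_", 1)
        prefixes_by_suffix.setdefault(suffix, []).append(prefix)

    # dicts preserve insertion order, so suffixes are visited in the order
    # they first appear in the line (assumption: earliest suffix wins).
    for prefixes in prefixes_by_suffix.values():
        for order, class_name in orders:
            if tuple(prefixes) == order:  # assumption: exact match required
                return class_name
    return None


def classify_lines(a_lines, orders):
    """Build file C lines ('ID length class') for the file A lines that match."""
    result = []
    for line in a_lines:
        fields = line.split(None, 2)  # ID, length, rest of line as annotation
        if len(fields) < 3:
            continue
        seq_id, length, annotation = fields
        class_name = classify(annotation, orders)
        if class_name is not None:
            result.append(f"{seq_id} {length} {class_name}")
    return result
```

With the two files saved under hypothetical names A.txt and B.txt, something like classify_lines(open("A.txt"), parse_orders(open("B.txt"))) would return the lines of file C. Lines of file A with no matching suffix (such as the Chr01_10815525_10825383 and Chr01_11277413_11279136 rows) are simply left out, which is consistent with the example file C in the question.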