Suppose the path is: /TJPROJ1/XJ/WORK/auto_demultiplexed/01.HiseqX/161230_ST-E00251A
This directory contains 8 subdirectories:
161230_ST-E00251_0273_AHF3GHALXX.0
161230_ST-E00251_0273_AHF3GHALXX.1
161230_ST-E00251_0273_AHF3GHALXX.2
161230_ST-E00251_0273_AHF3GHALXX.3
161230_ST-E00251_0273_AHF3GHALXX.4
161230_ST-E00251_0273_AHF3GHALXX.5
161230_ST-E00251_0273_AHF3GHALXX.6
161230_ST-E00251_0273_AHF3GHALXX.7
Each of these 8 directories contains two files, RunInfo.xml and SampleSheet.csv. In the shell I use sed -i s/"159"/"157"/g RunInfo.xml to modify RunInfo.xml, and sed -ri 's/,([ATCG]{6})../,\1/g' SampleSheet.csv to modify SampleSheet.csv.
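Before editing any file in place, the two expressions can be previewed on a single line each. The following is only a sketch, assuming GNU sed (for `-r`); the function names `preview_runinfo` and `preview_samplesheet` are made up for illustration, and the two input lines are copied from the file listings in this post. The RunInfo expression is single-quoted here so that the double quotes reach sed and only the attribute value `"159"` is matched; typed without outer quotes, the shell strips the double quotes and sed receives `s/159/157/g`.

```shell
# Preview only: both functions read from stdin, so no file is modified.

# Hypothetical helper: replace the quoted attribute value "159" with "157".
preview_runinfo() {
    sed 's/"159"/"157"/g'
}

# Hypothetical helper: after a comma, keep 6 index bases and drop the next 2 characters.
preview_samplesheet() {
    sed -r 's/,([ATCG]{6})../,\1/g'
}

printf '%s\n' '<Read FirstCycle="152" LastCycle="159" IsIndexedRead="Y" />' | preview_runinfo
# prints: <Read FirstCycle="152" LastCycle="157" IsIndexedRead="Y" />

printf '%s\n' 'DHG10670,DHG10670,TCCGTCTA,,' | preview_samplesheet
# prints: DHG10670,DHG10670,TCCGTC,,
```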
The contents of RunInfo.xml are as follows:
<?xml version="1.0"?>
<RunInfo xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" Version="3">
<Run Id="161230_ST-E00251_0273_AHF3GHALXX" Number="273">
<Flowcell>HF3GHALXX</Flowcell>
<Instrument>ST-E00251</Instrument>
<Date>161230</Date>
<Reads>
<Read FirstCycle="1" LastCycle="150" IsIndexedRead="N" />
<Read FirstCycle="152" LastCycle="159" IsIndexedRead="Y" />
<Read FirstCycle="160" LastCycle="309" IsIndexedRead="N" />
The contents of SampleSheet.csv are as follows:
[Header],,,,
Investigator Name,Jason,,,
Project Name,Novogene,,,
Experiment Name,X_test,,,
Date,161230,,,
Workflow,GenerateFASTQ,,,
[Data],,,,
SampleID,SampleName,index,index2
DHG10670,DHG10670,TCCGTCTA,,
DHG10757,DHG10757,AGAGTCAA,,
DHG10760,DHG10760,AGTCACTA,,
DHG10672,DHG10672,TGAAGAGA,,
DHG10673,DHG10673,TGGAACAA,,
DHG10674,DHG10674,TGGCTTCA,,
DHG10675,DHG10675,TGGTGGTA,,
DHG10676,DHG10676,TTCACGCA,,
DHG10677,DHG10677,AACTCACC,,
DHG10748,DHG10748,AGTACAAG,,
DHG10756,DHG10756,ACTATGCA,,
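If the color-highlighted part of SampleSheet.csv is a single line, I just delete the part marked in red after it. If it is not, I pick a few directories, go into them, and modify these two files with the regular expressions above. For example, if I pick 161230_ST-E00251_0273_AHF3GHALXX.0 and 161230_ST-E00251_0273_AHF3GHALXX.1, then only the RunInfo.xml and SampleSheet.csv in those two directories should be modified. How should I do this? Thanks in advance, everyone!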
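One possible way to restrict the edits to the chosen subdirectories is to pass their numeric suffixes to a small shell function. This is a minimal sketch, not a definitive answer: the function name `fix_selected_runs` and its argument order are invented here, it assumes GNU sed, and it covers only the directory selection, not the check on the highlighted SampleSheet.csv lines. Keeping `.bak` copies is also an assumption added for safety, because `..` matches any two characters, so running the SampleSheet expression a second time on an already trimmed row would consume the two trailing commas as well.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fix_selected_runs BASE PREFIX SUFFIX...
# Edits RunInfo.xml and SampleSheet.csv only in BASE/PREFIX.SUFFIX
# for each SUFFIX given; every other subdirectory is left untouched.
fix_selected_runs() {
    local base=$1 prefix=$2
    shift 2
    local n dir
    for n in "$@"; do
        dir="$base/$prefix.$n"
        # Skip a selection that does not contain both expected files.
        if [ ! -f "$dir/RunInfo.xml" ] || [ ! -f "$dir/SampleSheet.csv" ]; then
            echo "skipping $dir: RunInfo.xml or SampleSheet.csv not found" >&2
            continue
        fi
        # -i.bak edits in place and keeps the previous version as <file>.bak.
        sed -i.bak 's/"159"/"157"/g' "$dir/RunInfo.xml"
        sed -r -i.bak 's/,([ATCG]{6})../,\1/g' "$dir/SampleSheet.csv"
    done
}

# Example call for subdirectories .0 and .1 (commented out so that
# pasting this block changes nothing by itself):
# fix_selected_runs /TJPROJ1/XJ/WORK/auto_demultiplexed/01.HiseqX/161230_ST-E00251A \
#     161230_ST-E00251_0273_AHF3GHALXX 0 1
```

Passing the suffixes as arguments keeps the selection explicit on the command line, so the same function works whether two or all eight subdirectories are chosen.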