Chinaunix
Title:
A question about computing values after matching
Author:
huang6894
Time:
2014-01-26 13:16
Given input like the following:
==============================================================
qname ngb.q
hostname compute-64-13.local
group ST_PG
owner zhouyang
project ngb_un
department defaultdepartment
jobname blast_4species_92.pep.sh
jobnumber 1745897
taskid undefined
account sge
priority 0
qsub_time Thu Jan 16 10:25:40 2014
start_time Fri Jan 17 08:00:56 2014
end_time Fri Jan 17 09:14:03 2014
granted_pe NONE
slots 1
failed 0
exit_status 0
ru_wallclock 4387
ru_utime 5517.813
ru_stime 11.114
ru_maxrss 118524
ru_ixrss 0
ru_ismrss 0
ru_idrss 0
ru_isrss 0
ru_minflt 1514322
ru_majflt 0
ru_nswap 0
ru_inblock 0
ru_oublock 0
ru_msgsnd 0
ru_msgrcv 0
ru_nsignals 0
ru_nvcsw 2739
ru_nivcsw 82069
cpu 5528.927
mem 1297.879
io 0.035
iow 0.000
maxvmem 3.426G
arid undefined
==============================================================
qname ngb.q
hostname compute-21-0.local
group ST_PG
owner zhouyang
project ngb_un
department defaultdepartment
jobname blast_4species_38.pep.sh
jobnumber 1745910
taskid undefined
account sge
priority 0
qsub_time Thu Jan 16 10:25:48 2014
start_time Fri Jan 17 08:10:24 2014
end_time Fri Jan 17 09:14:05 2014
granted_pe NONE
slots 1
failed 0
exit_status 0
ru_wallclock 3821
ru_utime 4757.413
ru_stime 7.035
ru_maxrss 128176
ru_ixrss 0
ru_ismrss 0
ru_idrss 0
ru_isrss 0
ru_minflt 1645788
ru_majflt 0
ru_nswap 0
ru_inblock 0
ru_oublock 0
ru_msgsnd 0
ru_msgrcv 0
ru_nsignals 0
ru_nvcsw 4639
ru_nivcsw 272407
cpu 4764.448
mem 1346.846
io 0.037
iow 0.000
maxvmem 375.039M
arid undefined
==============================================================
qname bc.q
hostname compute-23-40.local
group bc_aap
owner liuyili
project aaptest
department defaultdepartment
jobname soap_00002.sh
jobnumber 1794799
taskid undefined
account sge
priority 0
qsub_time Fri Jan 17 01:34:54 2014
start_time Fri Jan 17 01:34:58 2014
end_time Fri Jan 17 01:35:00 2014
granted_pe NONE
slots 1
failed 0
exit_status 0
ru_wallclock 2
ru_utime 0.141
ru_stime 0.084
ru_maxrss 1756
ru_ixrss 0
ru_ismrss 0
ru_idrss 0
ru_isrss 0
ru_minflt 25934
ru_majflt 0
ru_nswap 0
ru_inblock 0
ru_oublock 0
ru_msgsnd 0
ru_msgrcv 0
ru_nsignals 0
ru_nvcsw 461
ru_nivcsw 109
cpu 0.225
mem 0.001
io 0.006
iow 0.000
maxvmem 321.426M
arid undefined
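For text like this, I want to take owner and time as input. For example, with the input zhouyang, 30: find all records whose owner is zhouyang and whose qsub_time is within the last 30 days, extract jobname, jobnumber, qsub_time, start_time, end_time, cpu, mem, io and maxvmem, and compute the difference between start_time and end_time. Output: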
jobname jobnumber qsub_time start_time end_time cpu mem io maxvmem count (count = end_time minus start_time)
blast_4species_92.pep.sh 1745897 Thu Jan 16 10:25:40 2014 Fri Jan 17 08:00:56 2014 Fri Jan 17 09:14:03 2014 5528.927 1297.879 0.035 3.426G 01:13:07
blast_4species_38.pep.sh 1745910 Thu Jan 16 10:25:48 2014 Fri Jan 17 08:10:24 2014 Fri Jan 17 09:14:05 2014 4764.448 1346.846 0.037 375.039M 01:03:41
Finally, compute the totals of cpu, mem, io, maxvmem and count over these jobs. The final output should be:
jobname jobnumber qsub_time start_time end_time cpu mem io maxvmem count (count = end_time minus start_time)
blast_4species_92.pep.sh 1745897 Thu Jan 16 10:25:40 2014 Fri Jan 17 08:00:56 2014 Fri Jan 17 09:14:03 2014 5528.927 1297.879 0.035 3.426G 01:13:07
blast_4species_38.pep.sh 1745910 Thu Jan 16 10:25:48 2014 Fri Jan 17 08:10:24 2014 Fri Jan 17 09:14:05 2014 4764.448 1346.846 0.037 375.039M 01:03:41
===>count all:
zhouyang cpu:10293.375 mem:2644.725 io:0.072 maxvmem:3883.263M count:02:16:48
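The maxvmem total only makes sense once every value is in the same unit. A minimal sketch of that normalization, assuming K/M/G suffixes use a factor of 1024 (consistent with how the expected total above counts 3.426G) and that a value with no suffix is in bytes; sum_maxvmem_mb is a hypothetical name:

```shell
#!/usr/bin/env bash
# Sketch: sum maxvmem values read one per line on stdin, reporting megabytes.
# Assumptions: K/M/G suffixes use a factor of 1024, and a value with no
# suffix is in bytes. sum_maxvmem_mb is a hypothetical name.
sum_maxvmem_mb() {
    awk 'NF {
        n = $1 + 0                         # numeric prefix, e.g. 3.426
        u = substr($1, length($1), 1)      # last character, e.g. G
        if (u == "G") n *= 1024
        else if (u == "K") n /= 1024
        else if (u != "M") n /= 1024 * 1024
        total += n
    } END { printf "%.3fM\n", total }'
}
```

For the two zhouyang records in the sample input (3.426G and 375.039M), printf '3.426G\n375.039M\n' | sum_maxvmem_mb prints 3883.263M.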
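The difficult parts:
1. Matching records: getting a record's fields based on the content of a given line;
2. Time conversion and arithmetic (checking qsub_time against the window, computing end_time minus start_time);
3. Merging the results.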
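For point 2, GNU date can parse qacct timestamps such as "Fri Jan 17 08:00:56 2014" directly with -d. A small sketch of the time helpers, assuming GNU coreutils date; the names to_epoch, hms, within_days and duration are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: time helpers for qacct records. Assumes GNU date, which accepts
# timestamps such as "Fri Jan 17 08:00:56 2014" with -d.
# All function names here are hypothetical.

# Seconds since the Unix epoch for a timestamp string (local time zone).
to_epoch() {
    date -d "$1" +%s
}

# Format a number of seconds as HH:MM:SS (hours are not wrapped at 24).
hms() {
    local s=$1
    printf '%02d:%02d:%02d\n' $((s / 3600)) $((s % 3600 / 60)) $((s % 60))
}

# Succeed if timestamp $1 is at most $2 whole days before now
# (integer days, the same rule as a -le comparison on days).
within_days() {
    local ts now
    ts=$(to_epoch "$1") || return 1
    now=$(date +%s)
    [ $(( (now - ts) / 86400 )) -le "$2" ]
}

# Elapsed time from start_time ($1) to end_time ($2) as HH:MM:SS.
duration() {
    hms $(( $(to_epoch "$2") - $(to_epoch "$1") ))
}
```

For the first sample record, duration "Fri Jan 17 08:00:56 2014" "Fri Jan 17 09:14:03 2014" prints 01:13:07. Using printf for the zero padding avoids adding a leading 0 by hand.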
Author:
huang6894
Time:
2014-01-27 09:45
Last edited by huang6894 at 2014-01-27 09:53
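Regarding point 1 (matching a record and getting its fields based on the content of a given line):
Could it be done like this: keep a flag that is set to 1 when a qname line is reached and to 0 at the arid line; check qsub_time, and if it is further back from now than time, set the flag to 0; then collect the record's fields only while the flag is 1?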
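The flag idea works, with one caveat: qsub_time sits in the middle of a record, so whether to keep a record can only be decided once all of it has been read, that is at the arid line. A minimal awk sketch of the record-collecting part, assuming every record begins with a qname line and ends with an arid line as in the sample; records_for_owner is a hypothetical name:

```shell
#!/usr/bin/env bash
# Sketch of the flag approach: the flag goes on at "qname" and off at "arid".
# While it is on, each "key value" line is stored in the array f; the record
# is judged only at "arid", once all of its fields (including qsub_time)
# have been seen. records_for_owner is a hypothetical name.
records_for_owner() {
    local owner=$1 file=$2
    awk -v owner="$owner" '
        $1 == "qname" { inrec = 1; split("", f) }   # new record: clear fields
        inrec {
            val = $0
            sub(/^[^ \t]+[ \t]+/, "", val)          # drop the key, keep the value
            sub(/[ \t]+$/, "", val)                 # trim trailing blanks
            f[$1] = val
        }
        $1 == "arid" && inrec {                     # end of record: decide
            inrec = 0
            if (f["owner"] == owner)
                print f["jobname"] "\t" f["jobnumber"] "\t" f["qsub_time"]
        }
    ' "$file"
}
```

records_for_owner zhouyang file.txt prints jobname, jobnumber and qsub_time, tab-separated, for each of that owner's records; a qsub_time check would go into the arid rule next to the owner test.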
Author:
xiaoshichao143
Time:
2014-01-27 11:04
Last edited by xiaoshichao143 at 2014-01-27 12:41
#!/bin/bash
if [[ $# -ne 2 ]];then
echo "Usage: bash $0 OWNER INTERVAL_TIME"
exit 1
fi
INPUT_FILE="file.txt"
TEMP_FILE="temp.txt"
OWNER="$1"
TIME_INTERVAL_INPUT="$2"
NUM_OF_EQUAL="$(($(grep '=' ${INPUT_FILE} | awk '! a[$0]++' | wc -c)-1))"
function format() {
TIME_H=$(eval echo \$$1)
TIME_M=$(eval echo \$$2)
TIME_S=$(eval echo \$$3)
if [[ ${#TIME_H} -eq 1 ]];then
eval $1="0\$$1"
fi
if [[ ${#TIME_M} -eq 1 ]];then
eval $2="0\$$2"
fi
if [[ ${#TIME_S} -eq 1 ]];then
eval $3="0\$$3"
fi
}
echo "jobname|jobnumber|qsub_time|start_time|end_time|cpu|mem|io|maxvmem|count" | awk -F'|' '{color=30;for(i=1;i<=NF;i++){color++;if(color==38)color=31;printf "\033[0;%s;1m%s\033[0m|",color,$i};print ""}' | sed 's/|$//'
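# Join records into one line each, turn 2+ spaces into "|", and keep only records whose owner field is exactly ${OWNER}
sed ':a;$!N;s/\n/ /g;ta' ${INPUT_FILE} | sed "s/=\{${NUM_OF_EQUAL}\} /\n/g" | sed '/^$/d' | sed 's/ \{2,\}/|/g' | grep "owner|${OWNER} " > ${OWNER}.txt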
cat ${OWNER}.txt | while read line
do
TIME_NOW=$(date +%s)
TIME_QSUB=$(date +%s -d "$(echo ${line} | sed 's/.*qsub_time|\(.*\) start_time.*/\1/')")
# Whole days since qsub_time; the test below reads TIME_INTERVAL_GET
TIME_INTERVAL_GET=$(((${TIME_NOW}-${TIME_QSUB})/86400))
if [[ ${TIME_INTERVAL_GET} -le ${TIME_INTERVAL_INPUT} ]];then
jobname="$(echo $line | sed 's/.*jobname|\(.*\) jobnumber.*/\1/')"
jobnumber="$(echo $line | sed 's/.*jobnumber|\(.*\) taskid.*/\1/')"
qsub_time="$(echo $line | sed 's/.*qsub_time|\(.*\) start_time.*/\1/')"
start_time="$(echo $line | sed 's/.*start_time|\(.*\) end_time.*/\1/')"
end_time="$(echo $line | sed 's/.*end_time|\(.*\) granted_pe.*/\1/')"
cpu="$(echo $line | sed 's/.*cpu|\(.*\) mem.*io.*/\1/')"
mem="$(echo $line | sed 's/.*mem|\(.*\) io.*iow.*/\1/')"
io="$(echo $line | sed 's/.*io|\(.*\) iow.*/\1/')"
maxvmem="$(echo $line | sed 's/.*maxvmem|\(.*\) arid.*/\1/')"
TIME_INTERVAL_BETWEEN_END_AND_START=$(($(date +%s -d "$end_time")-$(date +%s -d "$start_time")))
TIME_HOUR="00"
TIME_MINUTE="00"
if [[ ${TIME_INTERVAL_BETWEEN_END_AND_START} -lt 60 ]];then
TIME_SECOND="${TIME_INTERVAL_BETWEEN_END_AND_START}"
elif [[ ${TIME_INTERVAL_BETWEEN_END_AND_START} -lt 3600 ]];then
TIME_MINUTE="$((${TIME_INTERVAL_BETWEEN_END_AND_START}/60))"
TIME_SECOND="$((${TIME_INTERVAL_BETWEEN_END_AND_START}%60))"
else
TIME_HOUR="$((${TIME_INTERVAL_BETWEEN_END_AND_START}/(60*60)))"
TIME_MINUTE="$((${TIME_INTERVAL_BETWEEN_END_AND_START}%(60*60)/60))"
TIME_SECOND="$((${TIME_INTERVAL_BETWEEN_END_AND_START}%60))"
fi
format TIME_HOUR TIME_MINUTE TIME_SECOND
count="${TIME_HOUR}:${TIME_MINUTE}:${TIME_SECOND}"
echo "$jobname|$jobnumber|$qsub_time|$start_time|$end_time|$cpu|$mem|$io|$maxvmem|$count" | awk -F'|' '{color=30;for(i=1;i<=NF;i++){color++;if(color==38)color=31;printf "\033[0;%s;1m%s\033[0m|",color,$i};print ""}' | sed 's/|$//'
if echo $maxvmem | grep -q 'G';then
maxvmem=$(echo "scale=4;$(echo $maxvmem | tr -d 'G')*1024" | bc)
fi
echo "$cpu|$mem|$io|${maxvmem}|$count" >> ${TEMP_FILE}
fi
done
echo -e "\n===>count all:"
TOTAL_CPU=$(awk -F'|' '{sum += $1} END{print sum}' ${TEMP_FILE})
TOTAL_MEM=$(awk -F'|' '{sum += $2} END{print sum}' ${TEMP_FILE})
TOTAL_IO=$(awk -F'|' '{sum += $3} END{print sum}' ${TEMP_FILE})
TOTAL_VMEM=$(awk -F'|' '{sum += $4} END{print sum}' ${TEMP_FILE})
TOTAL_HOUR=$(awk -F'[|:]' '{sum += $5} END{print sum}' ${TEMP_FILE})
TOTAL_MINUTE=$(awk -F'[|:]' '{sum += $6} END{print sum}' ${TEMP_FILE})
TOTAL_SECOND=$(awk -F'[|:]' '{sum += $7} END{print sum}' ${TEMP_FILE})
SECOND=$((${TOTAL_SECOND}%60))
MINUTE=$(((${TOTAL_MINUTE}+${TOTAL_SECOND}/60)%60))
HOUR=$((${TOTAL_HOUR}+(${TOTAL_MINUTE}+${TOTAL_SECOND}/60)/60))
format HOUR MINUTE SECOND
echo -e "$1\tcpu:${TOTAL_CPU}\tmem:${TOTAL_MEM}\tio:${TOTAL_IO}\tmaxvmem:${TOTAL_VMEM}M\tcount:${HOUR}:${MINUTE}:${SECOND}"
rm -rf ${OWNER}.txt ${TEMP_FILE}
Author:
huang6894
Time:
2014-01-27 11:18
Reply to #3 xiaoshichao143
Thanks, thank you! I think I roughly get it now. Thanks.
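Author:
huang6894
Time:
2014-01-27 11:43
Reply to #3 xiaoshichao143
One more question, if I may: is there any way to optimize this? On a 404 MB file it still had not produced any output after 30 minutes... sorry to bother you.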
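Most of the time likely goes into process creation: for every record the loop starts more than a dozen sed, date and awk processes, and the sed ':a;$!N;s/\n/ /g;ta' that joins the whole file into one line rescans a growing pattern space on every input line, so its cost grows roughly with the square of the file size. A single awk pass avoids both. This is a sketch, not tested on real qacct output, assuming: each record runs from a qname line to an arid line; timestamps look like "Fri Jan 17 08:00:56 2014"; maxvmem carries a K, M or G suffix (a bare number is taken as bytes); and since only time differences matter, wall-clock times are turned into seconds with a days-from-civil calculation instead of calling date. qacct_report is a hypothetical name:

```shell
#!/usr/bin/env bash
# Single-pass sketch: one awk process handles parsing, filtering and totals.
# Assumptions: records run from a "qname" line to an "arid" line; timestamps
# look like "Fri Jan 17 08:00:56 2014"; maxvmem has a K/M/G suffix (none is
# taken as bytes). qacct_report is a hypothetical name.
qacct_report() {
    local owner=$1 days=$2 file=$3
    awk -v owner="$owner" -v days="$days" \
        -v nowts="$(date '+%Y %m %d %H %M %S')" '
    # Seconds on a local wall-clock scale; only differences are used,
    # so no time-zone conversion is needed (DST shifts are ignored).
    function secs(y, mo, d, h, mi, s,   era, yoe, doy, doe) {
        y += 0; mo += 0
        y -= (mo <= 2)
        era = int(y / 400); yoe = y - era * 400
        doy = int((153 * (mo > 2 ? mo - 3 : mo + 9) + 2) / 5) + d - 1
        doe = yoe * 365 + int(yoe / 4) - int(yoe / 100) + doy
        return ((era * 146097 + doe) * 24 + h) * 3600 + mi * 60 + s
    }
    function ts2secs(ts,   p, mo) {          # e.g. "Fri Jan 17 08:00:56 2014"
        split(ts, p, /[ :]+/)
        mo = (index("JanFebMarAprMayJunJulAugSepOctNovDec", p[2]) + 2) / 3
        return secs(p[7], mo, p[3], p[4], p[5], p[6])
    }
    function hms(s) {
        return sprintf("%02d:%02d:%02d", int(s / 3600), int(s % 3600 / 60), s % 60)
    }
    function mb(v,   n, u) {                 # maxvmem -> megabytes
        n = v + 0; u = substr(v, length(v), 1)
        if (u == "G") n *= 1024
        else if (u == "K") n /= 1024
        else if (u != "M") n /= 1048576
        return n
    }
    BEGIN {
        OFS = "\t"; days += 0
        split(nowts, q, " "); now = secs(q[1], q[2], q[3], q[4], q[5], q[6])
        print "jobname", "jobnumber", "qsub_time", "start_time", "end_time",
              "cpu", "mem", "io", "maxvmem", "count"
    }
    $1 == "qname" { inrec = 1; split("", f) }
    inrec { v = $0; sub(/^[^ \t]+[ \t]+/, "", v); sub(/[ \t]+$/, "", v); f[$1] = v }
    $1 == "arid" && inrec {
        inrec = 0
        if (f["owner"] != owner) next
        if (int((now - ts2secs(f["qsub_time"])) / 86400) > days) next
        dur = ts2secs(f["end_time"]) - ts2secs(f["start_time"])
        print f["jobname"], f["jobnumber"], f["qsub_time"], f["start_time"],
              f["end_time"], f["cpu"], f["mem"], f["io"], f["maxvmem"], hms(dur)
        cpu += f["cpu"]; mem += f["mem"]; io += f["io"]
        vmem += mb(f["maxvmem"]); total += dur
    }
    END {
        print ""; print "===>count all:"
        printf "%s\tcpu:%.3f\tmem:%.3f\tio:%.3f\tmaxvmem:%.3fM\tcount:%s\n",
               owner, cpu, mem, io, vmem, hms(total)
    }' "$file"
}
```

qacct_report zhouyang 30 file.txt prints a tab-separated table with the requested columns followed by the "===>count all:" line. Each record is handled as it streams past, so the file is read once and memory use stays flat; if the timestamps come in another format, ts2secs is the only part to adjust.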
Author:
xiaoshichao143
Time:
2014-01-27 16:26
Last edited by xiaoshichao143 at 2014-01-27 17:10
Reply to #5 huang6894
#!/bin/bash
if [[ $# -ne 2 ]];then
echo "Usage: bash $0 OWNER INTERVAL_TIME"
exit 1
fi
INPUT_FILE="file.txt"
TOTAL_FILE="total.txt"
# Owner whose records are searched for
OWNER="$1"
# Time window in days: only records submitted within this many days are kept
TIME_INTERVAL_INPUT="$2"
# Number of "=" characters in the separator line (assumes the file starts with it)
NUM_OF_EQUAL="$(($(head -n 1 ${INPUT_FILE} | wc -c)-1))"
# Zero-pad hours, minutes and seconds to two digits
function format() {
TIME_H=$(eval echo \$$1)
TIME_M=$(eval echo \$$2)
TIME_S=$(eval echo \$$3)
if [[ ${#TIME_H} -eq 1 ]];then
eval $1="0\$$1"
fi
if [[ ${#TIME_M} -eq 1 ]];then
eval $2="0\$$2"
fi
if [[ ${#TIME_S} -eq 1 ]];then
eval $3="0\$$3"
fi
}
echo "jobname|jobnumber|qsub_time|start_time|end_time|cpu|mem|io|maxvmem|count" | awk -F'|' '{color=30;for(i=1;i<=NF;i++){color++;if(color==38)color=31;printf "\033[0;%s;1m%s\033[0m|",color,$i};print ""}' | sed 's/|$//'
# Remove temporary files left by a previous run
rm -f ${OWNER}.txt ${INPUT_FILE}.* ${TOTAL_FILE}
# Work on a backup copy of the input file
cp ${INPUT_FILE}{,.bak}
# Keep only the separator line and the needed fields (11 lines per record)
sed -ri '/^(=|owner|job|qsub|start|end|cpu|mem|io[^w]|max)/!d' ${INPUT_FILE}.bak
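# Split the large file into chunks; 11*11*11*6 = 7986 lines is a multiple of the 11 lines per filtered record, so records are not cut in half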
split -l $((11*11*11*6)) -a 4 ${INPUT_FILE}.bak ${INPUT_FILE}.
# Process each chunk (split names them file.txt.aaaa, file.txt.aaab, ...)
for file in ${INPUT_FILE}.a*
do
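# Join each chunk into one line per record, turn 2+ spaces into "|",
# and keep only records whose owner field is exactly ${OWNER}
sed ':a;$!N;s/\n/ /g;ta' ${file} | sed "s/=\{${NUM_OF_EQUAL}\} /\n/g" | sed '/^$/d' \
| sed 's/ \{2,\}/|/g' | grep "owner|${OWNER} " > ${OWNER}.txt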
cat ${OWNER}.txt | while read line
do
# Seconds from 1970-01-01 to now
TIME_NOW=$(date +%s)
# Seconds from 1970-01-01 to qsub_time
TIME_QSUB=$(date +%s -d "$(echo ${line} | sed 's/.*qsub_time|\(.*\) start_time.*/\1/')")
# Whole days between qsub_time and now
TIME_INTERVAL_GET=$(((${TIME_NOW}-${TIME_QSUB})/86400))
# Keep the record if it was submitted no more than TIME_INTERVAL_INPUT days ago
if [[ ${TIME_INTERVAL_GET} -le ${TIME_INTERVAL_INPUT} ]];then
jobname="$(echo $line | sed 's/.*jobname|\(.*\) jobnumber.*/\1/')"
jobnumber="$(echo $line | sed 's/.*jobnumber|\(.*\) qsub_time.*/\1/')"
qsub_time="$(echo $line | sed 's/.*qsub_time|\(.*\) start_time.*/\1/')"
start_time="$(echo $line | sed 's/.*start_time|\(.*\) end_time.*/\1/')"
end_time="$(echo $line | sed 's/.*end_time|\(.*\) cpu.*/\1/')"
cpu="$(echo $line | sed 's/.*cpu|\(.*\) mem.*io.*/\1/')"
mem="$(echo $line | sed 's/.*mem|\(.*\) io.*maxvmem.*/\1/')"
io="$(echo $line | sed 's/.*io|\(.*\) maxvmem.*/\1/')"
maxvmem="$(echo $line | sed 's/.*maxvmem|\(.*\)/\1/')"
#maxvmem="$(echo $line | sed 's/.*maxvmem|\(.*\) arid.*/\1/')"
TIME_INTERVAL_BETWEEN_END_AND_START=$(($(date +%s -d "$end_time")-$(date +%s -d "$start_time")))
TIME_HOUR="00"
TIME_MINUTE="00"
if [[ ${TIME_INTERVAL_BETWEEN_END_AND_START} -lt 60 ]];then
TIME_SECOND="${TIME_INTERVAL_BETWEEN_END_AND_START}"
elif [[ ${TIME_INTERVAL_BETWEEN_END_AND_START} -lt 3600 ]];then
TIME_MINUTE="$((${TIME_INTERVAL_BETWEEN_END_AND_START}/60))"
TIME_SECOND="$((${TIME_INTERVAL_BETWEEN_END_AND_START}%60))"
else
TIME_HOUR="$((${TIME_INTERVAL_BETWEEN_END_AND_START}/(60*60)))"
TIME_MINUTE="$((${TIME_INTERVAL_BETWEEN_END_AND_START}%(60*60)/60))"
TIME_SECOND="$((${TIME_INTERVAL_BETWEEN_END_AND_START}%60))"
fi
format TIME_HOUR TIME_MINUTE TIME_SECOND
count="${TIME_HOUR}:${TIME_MINUTE}:${TIME_SECOND}"
echo "$jobname|$jobnumber|$qsub_time|$start_time|$end_time|$cpu|$mem|$io|$maxvmem|$count" | awk -F'|' '{color=30;for(i=1;i<=NF;i++){color++;if(color==38)color=31;printf "\033[0;%s;1m%s\033[0m|",color,$i};print ""}' | sed 's/|$//'
if echo $maxvmem | grep -q 'G';then
maxvmem=$(echo "scale=4;$(echo $maxvmem | tr -d 'G')*1024" | bc)
fi
echo "$cpu|$mem|$io|${maxvmem%M}|$count" >> ${TOTAL_FILE}
fi
done
done
echo -e "\n===>count all:"
TOTAL_CPU=$(awk -F'|' '{sum += $1} END{print sum}' ${TOTAL_FILE})
TOTAL_MEM=$(awk -F'|' '{sum += $2} END{print sum}' ${TOTAL_FILE})
TOTAL_IO=$(awk -F'|' '{sum += $3} END{print sum}' ${TOTAL_FILE})
TOTAL_VMEM=$(awk -F'|' '{sum += $4} END{print sum}' ${TOTAL_FILE})
TOTAL_HOUR=$(awk -F'[|:]' '{sum += $5} END{print sum}' ${TOTAL_FILE})
TOTAL_MINUTE=$(awk -F'[|:]' '{sum += $6} END{print sum}' ${TOTAL_FILE})
TOTAL_SECOND=$(awk -F'[|:]' '{sum += $7} END{print sum}' ${TOTAL_FILE})
SECOND=$((${TOTAL_SECOND}%60))
MINUTE=$(((${TOTAL_MINUTE}+${TOTAL_SECOND}/60)%60))
HOUR=$((${TOTAL_HOUR}+(${TOTAL_MINUTE}+${TOTAL_SECOND}/60)/60))
format HOUR MINUTE SECOND
echo -e "$1\tcpu:${TOTAL_CPU}\tmem:${TOTAL_MEM}\tio:${TOTAL_IO}\tmaxvmem:${TOTAL_VMEM}M\tcount:${HOUR}:${MINUTE}:${SECOND}"
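The chunk size 11*11*11*6 = 7986 is 11 × 726, and after the sed -ri filter each record is 11 lines (the separator plus owner, jobname, jobnumber, qsub_time, start_time, end_time, cpu, mem, io and maxvmem), so every chunk should end on a record boundary. That only holds if no record is missing one of those fields. A quick check, sketched in awk; check_record_lines is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Sketch: verify that every record in the filtered file has the expected
# number of lines (separator line included), so a split -l chunk size that
# is a multiple of it never cuts a record in half.
# check_record_lines is a hypothetical name.
check_record_lines() {
    local file=$1 expected=$2
    awk -v want="$expected" '
        /^=+$/ {                              # separator: close previous record
            if (n && n != want) { bad++; print "record before line " NR " has " n " lines" }
            n = 1
            next
        }
        { n++ }
        END {
            if (n && n != want) { bad++; print "last record has " n " lines" }
            exit (bad > 0)                    # non-zero status if any record differs
        }
    ' "$file"
}
```

For example, check_record_lines file.txt.bak 11 || echo "record sizes differ; split -l may cut records" can be run right after the sed -ri step.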
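Author:
xiaoshichao143
Time:
2014-01-27 16:30
Reply to #5 huang6894
The file is too large to handle in one go, so I split it into chunks. It is still slow, but it does produce results. I'm a newbie too; there are surely better ways.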
Author:
huang6894
Time:
2014-01-27 17:05
Reply to #7 xiaoshichao143
Really, thank you so much.
Author:
huang6894
Time:
2014-02-08 16:18
Bumping this old thread. Could someone more experienced please help?
Welcome to Chinaunix (http://bbs.chinaunix.net/)