I think pandas is a good fit for this data. I've only just started learning it, but here is a script I just wrote for you:
root@lp:~/jw/python/data# cat query_count.py
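# query.txt: input table with Query_id and Subject_annotation columns
import pandas as pd
import re

def get_info(x):
    # Return the first "[...]" span of an annotation
    return re.findall(r'\[.*\]', x)[0]

name = 'query.txt'
# Columns are separated by two or more whitespace characters
a = pd.read_csv(name, sep=r'\s\s+', engine='python')
b = pd.DataFrame()
b['id'] = a.Query_id
b['name'] = a.Subject_annotation.apply(get_info)
# Join id and bracketed tag, then keep each (id, tag) pair only once
b['iname'] = b['id'] + b['name']
c = b.iname.unique()
d = pd.Series(c)
# Strip the tag back off and count the distinct tags per id
e = d.apply(lambda x: x.split('[')[0]).value_counts()
# Show ids with at most 5 distinct tags
print(e[e <= 5])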
root@lp:~/jw/python/data# python query_count.py
VVC24064 3
dtype: int64
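For comparison, here is a minimal sketch of the same count using `groupby` and `nunique` instead of joining and re-splitting strings. The sample rows and the `load` / `count_distinct_tags` helpers are hypothetical, made up only so the example runs without query.txt. It assumes the same layout the script expects: columns separated by two or more spaces, with the tag in square brackets.

```python
import io

import pandas as pd

# Hypothetical sample data (not the real query.txt), same layout as the script expects.
SAMPLE = """Query_id  Subject_annotation
VVC24064  protein A [Vitis vinifera]
VVC24064  protein B [Vitis vinifera]
VVC24064  protein C [Populus trichocarpa]
VVC24064  protein D [Ricinus communis]
VVC10001  protein [species 1]
VVC10001  protein [species 2]
VVC10001  protein [species 3]
VVC10001  protein [species 4]
VVC10001  protein [species 5]
VVC10001  protein [species 6]
"""


def load(text):
    # Same parsing as the script: split columns on runs of 2+ whitespace characters.
    return pd.read_csv(io.StringIO(text), sep=r"\s\s+", engine="python")


def count_distinct_tags(df, max_count=5):
    # Pull the first (greedy) "[...]" span from each annotation; NaN if there is none.
    tags = df["Subject_annotation"].str.extract(r"(\[.*\])", expand=False)
    # Number of distinct tags per Query_id (NaN tags are not counted).
    counts = tags.groupby(df["Query_id"]).nunique()
    # Keep only ids with at most max_count distinct tags.
    return counts[counts <= max_count]


if __name__ == "__main__":
    print(count_distinct_tags(load(SAMPLE)))
```

With this sample, VVC24064 has 3 distinct tags and is kept, while VVC10001 has 6 and is filtered out. One behavioural difference: `str.extract` yields NaN for an annotation without brackets and `nunique` skips it, whereas `get_info` in the script would raise an IndexError on such a row.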