- 论坛徽章:
- 0
|
本帖最后由 nmweizi 于 2012-06-03 14:02 编辑
不知道用python写算不算,代码写的比较丑陋,请大家多原谅- header = True
- for i in f.readlines():
- #d = []
- i = i.strip()
- if len(i) > 0 and len(i) <50:
- if header:
- d = []
-
- header = False
- d.append(i)
- #print d
- #else:
- if len(i) > 50 :
- #print d
- header = True
- if i[0]<>'#':
-
- i1 = re.sub('\s{2,}|\s+\(|\)\s+','|',i)
- i2 = re.sub('\)\s*\||\(aged','',i1)
-
- i3 = re.sub('\|\s*\(','|',i2)
- i4 = re.sub('\|\s*c\s*\|',' (c) |',i3)
-
- if len(d) ==2:
- d.insert(0,d1)
- else:
- d1 =d[0]
- g.write(i4+'|'+'|'.join(d) +'\n')
-
- g.close()
- f.close()
- import datetime
- def date_converter(date_string):
- return datetime.datetime.strptime(date_string, '%Y-%m-%d').date()
- df = read_csv('e:/temp/name3.txt',parse_dates=True,sep='\|',names=['#', 'Pos.', 'Player','Date of Birth', 'Age','Club','group','gj','jl'],converters={3:date_converter})
- df.head()
复制代码 Out[43]:
# Pos. Player Date of Birth Age Club group gj jl sr
0 1 GK Wojciech Szczesny 1990-04-18 22 Arsenal Group A Poland Head coach: Franciszek Smuda (Poland) 4-18
1 2 DF Sebastian Boenisch 1987-02-01 25 Werder Bremen Group A Poland Head coach: Franciszek Smuda (Poland) 2-1
2 3 DF Grzegorz Wojtkowiak 1984-01-26 28 Lech Poznan Group A Poland Head coach: Franciszek Smuda (Poland) 1-26
3 4 DF Damien Perquis 1984-10-04 27 Sochaux Group A Poland Head coach: Franciszek Smuda (Poland) 10-4
4 5 MF Dariusz Dudka 1983-12-09 28 Auxerre Group A Poland Head coach: Franciszek Smuda (Poland) 12-9
1.年纪最小的人
- df.min(axis=0)
- df.Age.min() #这个也可以
复制代码 (Age) 18
2.年纪最大的人
- df.max(axis=0)
- df.Age.max() #这个也可以
复制代码 (Age) 38
3.哪一天出生的人最多- df.groupby(['Date of Birth']).count().sort('Date of Birth', ascending = False)['#']
复制代码 Date of Birth
1982-05-01 2
1987-01-27 2
1981-10-03 2
1986-09-08 2
1981-10-29 2
1984-06-28 2
1987-01-22 2
1987-07-28 2
1989-05-31 2
1983-07-07 2
1984-12-13 2
1985-08-20 2
1983-05-02 1
如果只比较天- nian = lambda x: x.day
- df['sr'] = df['Date of Birth'].map(nian)
- df.groupby(['sr']).count().sort('Date of Birth', ascending = False)['#']
复制代码 sr
13 18
20 18
27 17
28 16
9 16
4 15
26 14
12 14
3 13
8 13
19 13
6 12
10 12
17 12
21 12
按月日计算- nian = lambda x: str(x.month)+'-'+str(x.day)
- df['sr'] = df['Date of Birth'].map(nian)
- df.groupby(['sr']).count().sort('Date of Birth', ascending = False)['#']
复制代码 Out[29]:
sr
6-9 4
3-30 4
7-7 4
10-27 3
7-28 3
5-31 3
9-8 3
4.和你生日相同的人,如果有的话找出来- nian = lambda x: x ==datetime(1983,12,9).date()
- df['sr'] = df['Date of Birth'].map(nian)
- print df[df['sr']]
复制代码 # Pos. Player Date of Birth Age Club group gj jl rq sr
4 5 MF Dariusz Dudka 1983-12-09 28 Auxerre Group A Poland Head coach: Franciszek Smuda (Poland) True
如果不算年的话- nian = lambda x: x.month ==12 and x.day==9
- df['sr'] = df['Date of Birth'].map(nian)
- print df[df['sr']]
复制代码 # Pos. Player Date of Birth Age Club group gj jl sr
4 5 MF Dariusz Dudka 1983-12-09 28 Auxerre Group A Poland Head coach: Franciszek Smuda (Poland) True
276 1 GK Maksym Koval 1992-12-09 19 Dynamo Kyiv Group D Ukraine Head coach: Oleh Blokhin (Ukraine) True
5.给定一个球员名,列出国籍与教练,处在哪个小组- print df[df['Player']=='Simon Cox'][['gj','jl','group']]
复制代码 gj jl group
249 Republic of Ireland Head coach: Giovanni Trapattoni(Italy) Group C
6.哪个俱乐部的人最多- df.groupby(['Club']).count().sort('Club', ascending = False)['#']
复制代码 Out[145]:
Club
Bayern Munich 13
Real Madrid 11
Dynamo Kyiv 10
Chelsea 9
Zenit Saint Petersburg 8
Arsenal 8
Barcelona 8
Shakhtar Donetsk 8
Borussia Dortmund 8
Manchester City 8
Manchester United 7
Juventus 7
CSKA Moscow 7
Liverpool 7
Olympiacos 6
7.队长里打什么位置的最多- f = lambda x: '(c)' in x
- df['dz'] = df['Player'].map(f)
- df.groupby(['dz','Pos.']).count().sort('Pos.', ascending = False)['Pos.']
复制代码 dz Pos.
False MF 124
DF 112
FW 71
GK 45
True MF 6
FW 5
GK 3
DF 2
8.比你年纪大的有多少人- nian = lambda x: x >datetime(1983,12,9).date()
- df['sr'] = df['Date of Birth'].map(nian)
- df[df['sr']].count()
复制代码 # 222
9.列出365天里所有人都不过生日的日期- from datetime import datetime
- from pandas.core.datetools import *
- start = datetime(2012, 1, 1)
- end = datetime(2012, 12, 31)
- rng = DateRange(start, end, offset=BDay())
- nian = lambda x: datetime(2012,x.month,x.day)
- df['rq'] =df['Date of Birth'].map(nian)
- (rng-df['rq']).map(lambda x: [str(x.month)+'-'+str(x.day)])
复制代码 Out[158]:
array([['1-1'], ['1-3'], ['1-7'], ['1-20'], ['1-21'], ['1-29'], ['1-30'],
['1-31'], ['2-6'], ['2-7'], ['2-8'], ['2-10'], ['2-17'], ['2-19'],
['2-22'], ['2-27'], ['3-1'], ['3-3'], ['3-5'], ['3-8'], ['3-14'],
['3-15'], ['3-22'], ['3-23'], ['3-24'], ['3-25'], ['3-31'], ['4-1'],
['4-11'], ['4-12'], ['4-14'], ['4-15'], ['4-17'], ['4-23'],
['4-24'], ['4-26'], ['5-3'], ['5-6'], ['5-10'], ['5-12'], ['5-14'],
['5-18'], ['5-25'], ['5-28'], ['6-1'], ['6-2'], ['6-5'], ['6-7'],
['6-11'], ['6-14'], ['6-16'], ['6-18'], ['6-23'], ['6-24'],
['6-26'], ['6-30'], ['7-1'], ['7-2'], ['7-4'], ['7-5'], ['7-13'],
['7-18'], ['7-19'], ['7-23'], ['7-25'], ['7-27'], ['7-29'],
['7-30'], ['7-31'], ['8-5'], ['8-7'], ['8-9'], ['8-10'], ['8-22'],
['8-25'], ['8-26'], ['8-29'], ['9-1'], ['9-7'], ['9-11'], ['9-16'],
['9-17'], ['9-23'], ['9-24'], ['9-25'], ['10-1'], ['10-2'],
['10-8'], ['10-11'], ['10-14'], ['10-17'], ['10-19'], ['10-21'],
['10-26'], ['10-30'], ['10-31'], ['11-2'], ['11-3'], ['11-7'],
['11-9'], ['11-13'], ['11-14'], ['11-16'], ['11-18'], ['11-19'],
['11-20'], ['11-22'], ['11-23'], ['11-24'], ['11-28'], ['11-29'],
['12-1'], ['12-4'], ['12-5'], ['12-10'], ['12-18'], ['12-22'],
['12-23'], ['12-25'], ['12-29'], ['12-31']], dtype=object)
10.打各个位置的人一共有多少- df.groupby('Pos.').count().sort('Pos.', ascending = False)['Pos.']
复制代码 Out[418]:
Pos.
MF 130
DF 114
FW 76
GK 48 |
|