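I have a season of basketball score data, and for each game I want to find how many days of rest each team had since its previous game this season.
Sample frame: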
import datetime
import pandas as pd
testDateFrame = pd.DataFrame({'HomeTeam': ['HOU', 'CHI', 'DAL', 'HOU'],
                              'AwayTeam': ['CHI', 'DAL', 'CHI', 'DAL'],
                              'HomeGameNum': [1, 2, 2, 2],
                              'AwayGameNum': [1, 1, 3, 3],
                              'Date': [datetime.date(2014, 3, 11), datetime.date(2014, 3, 12),
                                       datetime.date(2014, 3, 14), datetime.date(2014, 3, 15)]})
The output I want looks like this:
AwayGameNum  AwayTeam  Date        HomeGameNum  HomeTeam  AwayRest  HomeRest
1            CHI       2014-03-11  1            HOU       nan       nan
1            DAL       2014-03-12  2            CHI       nan       0
3            CHI       2014-03-14  2            DAL       1         1
3            DAL       2014-03-15  2            HOU       0         3
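where AwayRest and HomeRest are the number of days between the AwayTeam's (respectively the HomeTeam's) previous game and this game, minus 1. A team's first game has no previous game, so its value is nan.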
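To make the "minus 1" convention concrete: back-to-back games count as 0 rest days. A tiny check with the standard `datetime` module, using a hypothetical helper `rest_days` and dates taken from the sample frame:

```python
import datetime

def rest_days(prev_game, this_game):
    # Hypothetical helper: whole days strictly between two games,
    # so games on consecutive days give 0.
    return (this_game - prev_game).days - 1

# HOU plays on 2014-03-11 and next on 2014-03-15 (HomeRest = 3 in the last row).
hou = rest_days(datetime.date(2014, 3, 11), datetime.date(2014, 3, 15))
# CHI plays on 2014-03-11 and again on 2014-03-12 (HomeRest = 0 in the second row).
chi = rest_days(datetime.date(2014, 3, 11), datetime.date(2014, 3, 12))
print(hou, chi)  # prints "3 0"
```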
Best answer
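I would adjust your data layout slightly so that it fits Hadley Wickham's definition of tidy data. That makes the calculation much simpler. Get rid of the AwayTeam and HomeTeam columns and make a single Team column instead. Then create a boolean column (HomeTeam) that says whether that team was the home team.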
Note: I didn't change AwayGameNum and HomeGameNum, so those numbers don't match your desired output. But the approach will work.
In [34]: df
Out[34]:
   AwayGameNum  Team        Date  HomeGameNum  HomeTeam
0            1   CHI  2014-03-11            1     False
1            1   HOU  2014-03-11            1      True
2            1   DAL  2014-03-12            2     False
3            1   CHI  2014-03-12            2      True
4            3   CHI  2014-03-14            2     False
5            3   DAL  2014-03-14            2      True
6            3   DAL  2014-03-15            2     False
7            3   HOU  2014-03-15            2      True
[8 rows x 5 columns]
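In [62]: rest = df.groupby(['Team'])['Date'].diff() - datetime.timedelta(1)  # gap to the same team's previous row minus 1 day; assumes rows are in date order
In [63]: df['HomeRest'] = rest[df.HomeTeam]   # only home rows are assigned; the other rows become NaT
In [64]: df['AwayRest'] = rest[~df.HomeTeam]  # likewise for away rows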
In [65]: df
Out[65]:
   AwayGameNum  Team        Date  HomeGameNum  HomeTeam  HomeRest  AwayRest
0            1   CHI  2014-03-11            1     False       NaT       NaT
1            1   HOU  2014-03-11            1      True       NaT       NaT
2            1   DAL  2014-03-12            2     False       NaT       NaT
3            1   CHI  2014-03-12            2      True    0 days       NaT
4            3   CHI  2014-03-14            2     False       NaT    1 days
5            3   DAL  2014-03-14            2      True    1 days       NaT
6            3   DAL  2014-03-15            2     False       NaT    0 days
7            3   HOU  2014-03-15            2      True    3 days       NaT
[8 rows x 7 columns]
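If you need the result back in the one-row-per-game layout from the question, you can pivot the per-team rest values back onto the original rows. The following is a minimal sketch under a few assumptions. The names `games`, `per_team`, `Side` and `game` are hypothetical, and rest is reported as a float number of days (NaN for a first game) rather than as `Timedelta`. It also sorts by `Date` before the `diff`, because `groupby(...).diff()` compares rows in their current order. In the answer's `df` the rows already happen to be in date order.

```python
import datetime
import pandas as pd

games = pd.DataFrame({'HomeTeam': ['HOU', 'CHI', 'DAL', 'HOU'],
                      'AwayTeam': ['CHI', 'DAL', 'CHI', 'DAL'],
                      'HomeGameNum': [1, 2, 2, 2],
                      'AwayGameNum': [1, 1, 3, 3],
                      'Date': [datetime.date(2014, 3, 11), datetime.date(2014, 3, 12),
                               datetime.date(2014, 3, 14), datetime.date(2014, 3, 15)]})
games['Date'] = pd.to_datetime(games['Date'])

# Long format: one row per (game, team); 'game' keeps the original row label.
per_team = (pd.concat([
                games[['Date', 'AwayTeam']].rename(columns={'AwayTeam': 'Team'}).assign(Side='Away'),
                games[['Date', 'HomeTeam']].rename(columns={'HomeTeam': 'Team'}).assign(Side='Home'),
            ])
            .rename_axis('game')
            .reset_index()
            # diff() works in row order, so put each team's games in date order first.
            .sort_values('Date', kind='mergesort'))

# Rest = gap to the same team's previous game minus one day, as a float (NaN for a first game).
per_team['Rest'] = (per_team.groupby('Team')['Date'].diff() - pd.Timedelta(days=1)).dt.days

# Back to one row per game: one column per side, aligned on the original row label.
rest = per_team.pivot(index='game', columns='Side', values='Rest')
games['AwayRest'] = rest['Away']
games['HomeRest'] = rest['Home']
print(games)
```

With the sample data this gives AwayRest = nan, nan, 1, 0 and HomeRest = nan, 0, 1, 3, which matches the desired output in the question.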