I'm new to pandas.
I have a DataFrame that looks like this (only much larger):
        Horses RaceDate  Position
1     RedHorse   1/2/00         2
2    BlueHorse   1/2/00         6
3  YellowHorse   1/2/00         7
4     RedHorse  15/1/00         3
I want to add a column holding each horse's previous result, so my DataFrame might look like:
        Horses RaceDate  Position PrevPosition
1     RedHorse   1/2/00         2            3
2    BlueHorse   1/2/00         6            -
3  YellowHorse   1/2/00         7            -
4     RedHorse  15/1/00         3            -
I tried the following:
def prevRuns(horseName, raceDate):
    horseDf = df.loc[df['Horse'] == horseName]
    currentRace = horseDf.index[horseDf['RaceDate'] == raceDate]
    if len(horseDf.index) >= currentRace:
        return horseDf.at[currentRace+1,'Position']
    else:
        return 0

df['prevRun'] = df['Horse'].apply(prevRuns, raceDate = df['RaceDate'])
But it doesn't work; it raises:
ValueError: Can only compare identically-labeled Series objects
Why doesn't this work?
Is there a more elegant way to achieve what I want?
Best answer
You can use groupby + shift:
import pandas as pd

# convert dates to datetime (day comes first) and sort newest first
df['RaceDate'] = pd.to_datetime(df['RaceDate'], dayfirst=True)
df = df.sort_values('RaceDate', ascending=False)

# within each horse, the next row down is the previous race
df['PrevPosition'] = df.groupby('Horses')['Position'].shift(-1)

print(df)
        Horses   RaceDate  Position  PrevPosition
1     RedHorse 2000-02-01         2           3.0
2    BlueHorse 2000-02-01         6           NaN
3  YellowHorse 2000-02-01         7           NaN
4     RedHorse 2000-01-15         3           NaN
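As for why the original attempt raised: apply() passes the extra keyword through unchanged, so raceDate is the entire RaceDate column on every call, and comparing a one-horse subset against the full column fails because their indexes differ. Here is a minimal, self-contained sketch of both points, rebuilt from the four sample rows in the question. It assumes the column is named Horses as in the sample table (the attempted code used 'Horse'), and the names subset and error_message are only illustrative.

```python
import pandas as pd

# Sample data copied from the question (the real frame is much larger).
df = pd.DataFrame(
    {
        "Horses": ["RedHorse", "BlueHorse", "YellowHorse", "RedHorse"],
        "RaceDate": ["1/2/00", "1/2/00", "1/2/00", "15/1/00"],
        "Position": [2, 6, 7, 3],
    },
    index=[1, 2, 3, 4],
)

# Reproduce the failing comparison: a subset for one horse (index 1 and 4)
# against the whole column (index 1 to 4). The labels differ, so pandas
# refuses to compare them element-wise.
subset = df.loc[df["Horses"] == "RedHorse", "RaceDate"]
try:
    subset == df["RaceDate"]
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# The groupby + shift approach from the answer.
df["RaceDate"] = pd.to_datetime(df["RaceDate"], dayfirst=True)
df = df.sort_values("RaceDate", ascending=False)

# Rows are newest first, so shift(-1) pulls each horse's next-older result
# up onto the current row; horses with no earlier race get NaN.
df["PrevPosition"] = df.groupby("Horses")["Position"].shift(-1)
print(df)
```

Because of the NaN values, PrevPosition comes back as float. If you sort ascending instead, shift(1) gives the same result.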