我试图用一个内部连接来连接两个pandas数据帧。
my_df = pd.merge(df1, df2, how = 'inner', left_on = ['date'], right_on = ['myDate'])
但是,我得到以下错误:
KeyError: 'myDate' TypeError: an integer is required
我相信在日期加入是有效的,但是我不能让这个简单的加入工作?
DF2是使用以下命令创建的
df2 = idf.groupby(lambda x: (x.year,x.month,x.day)).mean()
有人能建议一下吗?谢谢。
df1
type object
id object
date object
value float64
type id date value
0 CAR PSTAT001 15/07/15 42
1 BIKE PSTAT001 16/07/15 42
2 BIKE PSTAT001 17/07/15 42
3 BIKE PSTAT004 18/07/15 42
4 BIKE PSTAT001 19/07/15 32
df2
myDate object
val1 float64
val2 float64
val3 float64
myDate val1 val2 val3
0 (2015,7,13) 1074 1871.666667 2800.777778
1 (2015,7,14) 347.958333 809.416667 1308.458333
2 (2015,7,15) 202.625 597.375 1008.666667
3 (2015,7,16) 494.958333 1192 1886.916667
DF1.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 3040 entries, 0 to 3039
Data columns (total 4 columns):
type 3040 non-null object
id 3040 non-null object
date 3040 non-null object
value 3040 non-null float64
dtypes: float64(1), object(3)
memory usage: 118.8+ KB
DF2.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 16 entries, 0 to 15
Data columns (total 4 columns):
myDate 16 non-null object
val1 16 non-null float64
val2 16 non-null float64
val3 16 non-null float64
dtypes: float64(3), object(1)
memory usage: 640.0+ bytes
最佳答案
您的日期列不是datetime数据类型,df1
看起来像一个str
,而另一个是一个tuple
,因此您需要先转换这些数据,然后合并才能工作:
In [75]:
df1['date'] = pd.to_datetime(df1['date'])
df1.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 5 entries, 0 to 4
Data columns (total 4 columns):
type 5 non-null object
id 5 non-null object
date 5 non-null datetime64[ns]
value 5 non-null int64
dtypes: datetime64[ns](1), int64(1), object(2)
memory usage: 200.0+ bytes
In [76]:
import datetime as dt
df2['myDate'] = df2['myDate'].apply(lambda x: dt.datetime(x[0], x[1], x[2]))
df2.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 4 entries, 0 to 3
Data columns (total 4 columns):
myDate 4 non-null datetime64[ns]
val1 4 non-null float64
val2 4 non-null float64
val3 4 non-null float64
dtypes: datetime64[ns](1), float64(3)
memory usage: 160.0 bytes
In [78]:
my_df= pd.merge(df1, df2, how = 'inner', left_on = ['date'], right_on = ['myDate'])
my_df
Out[78]:
type id date value myDate val1 val2 \
0 CAR PSTAT001 2015-07-15 42 2015-07-15 202.625000 597.375
1 BIKE PSTAT001 2015-07-16 42 2015-07-16 494.958333 1192.000
val3
0 1008.666667
1 1886.916667