I am trying to join two pandas DataFrames with an inner join.

my_df = pd.merge(df1, df2, how = 'inner', left_on = ['date'], right_on = ['myDate'])

However, I get the following error:
KeyError: 'myDate' TypeError: an integer is required

I believe joining on the date is valid, so why can't I get this simple join to work?
df2 was created with the following command:
df2 = idf.groupby(lambda x: (x.year,x.month,x.day)).mean()

Can anyone advise? Thanks.
df1
type    object
id      object
date    object
value   float64

    type    id          date       value
0   CAR     PSTAT001    15/07/15    42
1   BIKE    PSTAT001    16/07/15    42
2   BIKE    PSTAT001    17/07/15    42
3   BIKE    PSTAT004    18/07/15    42
4   BIKE    PSTAT001    19/07/15    32

df2
myDate  object
val1    float64
val2    float64
val3    float64

    myDate     val1         val2           val3
0   (2015,7,13) 1074        1871.666667    2800.777778
1   (2015,7,14) 347.958333  809.416667     1308.458333
2   (2015,7,15) 202.625     597.375        1008.666667
3   (2015,7,16) 494.958333  1192           1886.916667

df1.info()
<class  'pandas.core.frame.DataFrame'>
Int64Index: 3040    entries,    0   to  3039
Data    columns (total  4   columns):
type    3040    non-null    object
id      3040    non-null    object
date    3040    non-null    object
value   3040    non-null    float64
dtypes: float64(1), object(3)
memory  usage:  118.8+  KB

df2.info()
<class  'pandas.core.frame.DataFrame'>
Int64Index: 16  entries,    0   to  15
Data    columns (total  4   columns):
myDate  16  non-null    object
val1    16  non-null    float64
val2    16  non-null    float64
val3    16  non-null    float64
dtypes: float64(3), object(1)
memory  usage:  640.0+  bytes

Best answer

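Neither of your date columns has a datetime dtype: the one in df1 looks like it holds str values, while the one in df2 holds tuples. Equal dates therefore never compare equal across the two frames. You need to convert both columns first; after that the merge works, as the session that follows shows.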
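One way to confirm the mismatch before converting anything is to look at the Python type of the values stored in each key column, since an `object` dtype alone does not say what is inside. The snippet below is a minimal sketch: the two small frames are hypothetical stand-ins for the ones in the question, and `element_types` is a helper name introduced here for illustration only.

```python
import pandas as pd

# Hypothetical miniature versions of the two frames from the question.
df1 = pd.DataFrame({"date": ["15/07/15", "16/07/15"], "value": [42.0, 42.0]})
df2 = pd.DataFrame(
    {"myDate": [(2015, 7, 15), (2015, 7, 16)], "val1": [202.625, 494.958333]}
)


def element_types(series):
    """Return the set of Python type names of the values in a column."""
    return {type(value).__name__ for value in series}


# Both columns are generic containers, but they hold different kinds of value.
print("df1['date']  :", element_types(df1["date"]))
print("df2['myDate']:", element_types(df2["myDate"]))
```

If the two sets differ, as they do here, the join keys cannot match until both columns are converted to a common type.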

In [75]:
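# The strings are day/month/two-digit-year; an explicit format avoids day and month being swapped
df1['date'] = pd.to_datetime(df1['date'], format='%d/%m/%y')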
df1.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 5 entries, 0 to 4
Data columns (total 4 columns):
type     5 non-null object
id       5 non-null object
date     5 non-null datetime64[ns]
value    5 non-null int64
dtypes: datetime64[ns](1), int64(1), object(2)
memory usage: 200.0+ bytes

In [76]:
import datetime as dt
df2['myDate'] = df2['myDate'].apply(lambda x: dt.datetime(x[0], x[1], x[2]))
df2.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 4 entries, 0 to 3
Data columns (total 4 columns):
myDate    4 non-null datetime64[ns]
val1      4 non-null float64
val2      4 non-null float64
val3      4 non-null float64
dtypes: datetime64[ns](1), float64(3)
memory usage: 160.0 bytes

In [78]:
my_df = pd.merge(df1, df2, how = 'inner', left_on = ['date'], right_on = ['myDate'])
my_df

Out[78]:
   type        id       date  value     myDate        val1      val2  \
0   CAR  PSTAT001 2015-07-15     42 2015-07-15  202.625000   597.375
1  BIKE  PSTAT001 2015-07-16     42 2015-07-16  494.958333  1192.000

          val3
0  1008.666667
1  1886.916667
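As an alternative to converting the tuples afterwards, the grouping step could produce datetime keys in the first place. The following is a sketch under assumptions rather than the method used above: it assumes `idf` has a `DatetimeIndex` (which the `x.year`, `x.month`, `x.day` lambda suggests), and the sample values are invented for illustration. It groups on the index truncated to midnight with `DatetimeIndex.normalize()`, so `myDate` is already datetime when the merge runs.

```python
import pandas as pd

# Hypothetical stand-in for `idf`: readings indexed by timestamp.
idf = pd.DataFrame(
    {"val1": [100.0, 300.0, 50.0]},
    index=pd.to_datetime(
        ["2015-07-15 08:00", "2015-07-15 20:00", "2015-07-16 09:30"]
    ),
)

# Group on the timestamp truncated to midnight, so the keys stay datetime
# values instead of becoming (year, month, day) tuples.
daily = idf.groupby(idf.index.normalize()).mean()
df2 = daily.rename_axis("myDate").reset_index()

# Hypothetical stand-in for df1, with day/month/two-digit-year strings.
df1 = pd.DataFrame(
    {
        "type": ["CAR", "BIKE", "BIKE"],
        "date": ["15/07/15", "16/07/15", "17/07/15"],
        "value": [42.0, 42.0, 42.0],
    }
)
df1["date"] = pd.to_datetime(df1["date"], format="%d/%m/%y")

# Both key columns are now datetime, so the inner join finds the shared days.
my_df = pd.merge(df1, df2, how="inner", left_on="date", right_on="myDate")
print(my_df)
```

Only the two days present in both frames survive the inner join; the 17/07/15 row of `df1` is dropped because `df2` has no matching date.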
