Updating a DataFrame based on another DataFrame

Question

Given DataFrame df:

    Id Sex  Group  Time  Time!
0  21   M      2  2.31    NaN
1   2   F      2  2.29    NaN

and update:

    Id Sex  Group  Time
0  21   M      2  2.36
1   2   F      2  2.09
2   3   F      1  1.79

I want to match on Id, Sex and Group, and either update Time! with the Time value (from the update df) if there is a match, or insert a new record otherwise.

Here is how I do it:

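import numpy as np
import pandas as pd

df = df.set_index(['Id', 'Sex', 'Group'])
update = update.set_index(['Id', 'Sex', 'Group'])

for i, row in update.iterrows():
    if i in df.index:  # update
        df.loc[i, 'Time!'] = row['Time']  # .loc replaces the removed .ix
    else:              # insert new record
        cols = update.columns.values
        row = np.array(row).reshape(1, len(row))
        idx = pd.MultiIndex.from_tuples([i], names=df.index.names)
        _ = pd.DataFrame(row, index=idx, columns=cols)
        df = pd.concat([df, _])  # pd.concat replaces the removed DataFrame.append

print(df)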

              Time  Time!
Id Sex Group
21 M   2      2.31   2.36
2  F   2      2.29   2.09
3  F   1      1.79    NaN

The code seems to work, and the result matches what I wanted, as shown above. However, I have noticed it behaving incorrectly on a big data set, with the conditional

if i in df.index:
    ...
else:
    ...

working obviously wrong: it would proceed to the else branch when it shouldn't, and vice versa. I guess the MultiIndex may somehow be the cause.

So my question is: do you know any other way, or a more robust version of mine, to update one df based on another df?
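The row-by-row membership test in the question can also be done for all keys at once, which makes it easier to inspect which update rows are treated as matches. The following is only a sketch using the sample data from the question; it does not identify the cause of the misbehaviour on the large data set, which the question leaves open. The names is_match, to_update and to_insert are made up for illustration.

```python
import pandas as pd

keys = ["Id", "Sex", "Group"]

# Sample data copied from the question.
df = pd.DataFrame({
    "Id": [21, 2], "Sex": ["M", "F"], "Group": [2, 2],
    "Time": [2.31, 2.29], "Time!": [float("nan"), float("nan")],
}).set_index(keys)
update = pd.DataFrame({
    "Id": [21, 2, 3], "Sex": ["M", "F", "F"], "Group": [2, 2, 1],
    "Time": [2.36, 2.09, 1.79],
}).set_index(keys)

# Boolean array: True where an update key already exists in df's index.
is_match = update.index.isin(df.index)

to_update = update[is_match]    # rows whose key is already in df
to_insert = update[~is_match]   # rows whose key is new

print(to_update)
print(to_insert)
```

If the split looks wrong on real data, comparing df.index.dtypes with update.index.dtypes (or the dtypes of the key columns before set_index) is one thing worth checking, since keys of different types do not compare equal.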

Recommended answer

I think I would do this with a merge, and then update the columns with a where. First remove the Time column from up:

In [11]: times = up.pop('Time')  # up = the update DataFrame

In [12]: df1 = df.merge(up, how='outer')

In [13]: df1
Out[13]:
   Id Sex  Group  Time  Time!
0  21   M      2  2.31    NaN
1   2   F      2  2.29    NaN
2   3   F      1   NaN    NaN

Set Time! from the update where Time is not NaN, and fill Time from the update where it is NaN:

In [14]: df1['Time!'] = df1['Time'].where(df1['Time'].isnull(), times)

In [15]: df1['Time'] = df1['Time'].where(df1['Time'].notnull(), times)

In [16]: df1
Out[16]:
   Id Sex  Group  Time  Time!
0  21   M      2  2.31   2.36
1   2   F      2  2.29   2.09
2   3   F      1  1.79    NaN
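
The where calls above line up times with df1 by row label, so they depend on the merged rows coming out in the same order as the update frame. A variant that does not depend on row order is sketched below. It is not the answer's method but an alternative under stated assumptions: the update's Time column is renamed to a hypothetical Time_new so it travels through the merge, and indicator=True makes pandas add a _merge column marking where each row came from.

```python
import pandas as pd

keys = ["Id", "Sex", "Group"]

# Sample data copied from the question.
df = pd.DataFrame({
    "Id": [21, 2], "Sex": ["M", "F"], "Group": [2, 2],
    "Time": [2.31, 2.29], "Time!": [float("nan"), float("nan")],
})
update = pd.DataFrame({
    "Id": [21, 2, 3], "Sex": ["M", "F", "F"], "Group": [2, 2, 1],
    "Time": [2.36, 2.09, 1.79],
})

# "Time_new" is an illustrative name; it keeps the incoming values
# separate from df's own "Time" column during the merge.
merged = df.merge(
    update.rename(columns={"Time": "Time_new"}),
    on=keys, how="outer", indicator=True,
)

matched = merged["_merge"] == "both"        # key exists in both frames
new_row = merged["_merge"] == "right_only"  # key exists only in update

# Existing records get the new value in Time!; new records get it in Time.
merged.loc[matched, "Time!"] = merged.loc[matched, "Time_new"]
merged.loc[new_row, "Time"] = merged.loc[new_row, "Time_new"]

result = merged.drop(columns=["Time_new", "_merge"])
print(result)
```

The row order of result may differ from the output shown above, because an outer merge can sort the keys; the values per key are what matter here.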
