Problem description
Given DataFrame df:
   Id Sex  Group  Time  Time!
0  21   M      2  2.31    NaN
1   2   F      2  2.29    NaN
and update:
   Id Sex  Group  Time
0  21   M      2  2.36
1   2   F      2  2.09
2   3   F      1  1.79
I want to match on Id, Sex and Group, and either update Time! with the Time value (from the update df) if there is a match, or insert a new record if there is none.
Here is how I do it:
import numpy as np
import pandas as pd

df = df.set_index(['Id', 'Sex', 'Group'])
update = update.set_index(['Id', 'Sex', 'Group'])

for i, row in update.iterrows():
    if i in df.index:  # update
        df.loc[i, 'Time!'] = row['Time']
    else:  # insert new record
        cols = update.columns.values
        row = np.array(row).reshape(1, len(row))
        idx = pd.MultiIndex.from_tuples([i], names=df.index.names)
        _ = pd.DataFrame(row, index=idx, columns=cols)
        df = pd.concat([df, _])

print(df)
              Time  Time!
Id Sex Group
21 M   2      2.31   2.36
2  F   2      2.29   2.09
3  F   1      1.79    NaN
The code seems to work, and the result matches the one above. However, I have noticed it misbehaving on a big data set, where the conditional
if i in df.index:
    ...
else:
    ...
works obviously wrong (it proceeds to else where it shouldn't, and vice versa; I guess the MultiIndex may somehow be the cause).
So my question is: do you know any other way, or a more robust version of mine, to update one df based on another df?
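One way to investigate a membership test that misfires is to compare the key dtypes of the two frames. The sketch below assumes, purely for illustration, that Id was read as strings in update; the question does not confirm that this is the cause, and the sample data is made up to show the effect.

```python
import pandas as pd

keys = ["Id", "Sex", "Group"]
df = pd.DataFrame({"Id": [21, 2], "Sex": ["M", "F"], "Group": [2, 2]})
# Assumption for illustration: Id arrives as strings in the update frame.
update = pd.DataFrame({"Id": ["21", "2"], "Sex": ["M", "F"], "Group": [2, 2]})

# Key columns whose dtypes differ between the two frames.
mismatched = [k for k in keys if df[k].dtype != update[k].dtype]

# With mismatched dtypes, no update key is found in df's MultiIndex.
df_index = df.set_index(keys).index
hits_before = [key in df_index for key in update.set_index(keys).index]

# Casting the key column to the dtype used in df restores the matches.
update["Id"] = update["Id"].astype(df["Id"].dtype)
hits_after = [key in df_index for key in update.set_index(keys).index]
```

If mismatched comes back empty on the real data, the dtypes are not the problem and the cause lies elsewhere.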
Recommended answer
I think I would do this with a merge, and then update the columns with a where. First remove the Time column from up:
In [11]: times = up.pop('Time') # up = the update DataFrame
In [12]: df1 = df.merge(up, how='outer')
In [13]: df1
Out[13]:
   Id Sex  Group  Time  Time!
0  21   M      2  2.31    NaN
1   2   F      2  2.29    NaN
2   3   F      1   NaN    NaN
Set Time! from the update where Time is not NaN (a matched record), and fill Time from the update where it is NaN (a new record):
In [14]: df1['Time!'] = df1['Time'].where(df1['Time'].isnull(), times)
In [15]: df1['Time'] = df1['Time'].where(df1['Time'].notnull(), times)
In [16]: df1
Out[16]:
   Id Sex  Group  Time  Time!
0  21   M      2  2.31   2.36
1   2   F      2  2.29   2.09
2   3   F      1  1.79    NaN
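The where calls above line up times with df1 by row label, so they rely on the merged rows coming out in the same order as up. A variation that avoids this, offered as a sketch rather than as the answer's own method, carries the update's Time through the merge under a temporary name and uses the indicator flag to tell matched rows from new ones. The column name Time_new and the variable names are made up for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Id": [21, 2], "Sex": ["M", "F"], "Group": [2, 2],
    "Time": [2.31, 2.29], "Time!": [np.nan, np.nan],
})
update = pd.DataFrame({
    "Id": [21, 2, 3], "Sex": ["M", "F", "F"], "Group": [2, 2, 1],
    "Time": [2.36, 2.09, 1.79],
})

keys = ["Id", "Sex", "Group"]

# Keep the update's Time as a separate (hypothetically named) column so each
# value stays attached to its own key, whatever the row order after merging.
merged = df.merge(
    update.rename(columns={"Time": "Time_new"}),
    on=keys, how="outer", indicator=True,
)

matched = merged["_merge"] == "both"          # key exists in both frames
is_new = merged["_merge"] == "right_only"     # key exists only in update

merged.loc[matched, "Time!"] = merged.loc[matched, "Time_new"]  # update
merged.loc[is_new, "Time"] = merged.loc[is_new, "Time_new"]     # insert

result = merged.drop(columns=["Time_new", "_merge"])
print(result)
```

The row order of result may differ from the output shown above, but each key carries the same Time and Time! values.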