Question
Given a multi-index DataFrame:
                 col_A  col_B
level_0 level_1
A       x          1.0    NaN
        y          NaN    1.0
        x          NaN    2.0
        y          2.0    NaN
How can I remove the NaNs from the df and duplicates from the multi-index to get:
                 col_A  col_B
level_0 level_1
A       x          1.0    2.0
        y          2.0    1.0
Here is an MWE:
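import pandas as pd
import numpy as np

# Two-level index with repeated (level_0, level_1) pairs: A/x, A/y, A/x, A/y
index = pd.MultiIndex.from_product([['A', 'A'], ['x', 'y']],
                                   names=['level_0', 'level_1'])

# Each row holds one value and one NaN (np.nan; the np.NaN alias is gone in NumPy 2.0)
data = [
    [1, np.nan],
    [np.nan, 1],
    [np.nan, 2],
    [2, np.nan],
]

df = pd.DataFrame(data=data, index=index, columns=['col_A', 'col_B'])
print(df)  # Python 3 print function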
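To see why a plain dropna() is not enough here, the following sketch rebuilds the MWE (assuming a recent pandas and NumPy) and checks two things: every row contains a NaN, so dropna() with its default how='any' empties the frame, and the index contains repeated (level_0, level_1) pairs that have to be merged rather than dropped. The variable name dropped is only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the MWE DataFrame (same values as above).
index = pd.MultiIndex.from_product([['A', 'A'], ['x', 'y']],
                                   names=['level_0', 'level_1'])
df = pd.DataFrame([[1, np.nan], [np.nan, 1], [np.nan, 2], [2, np.nan]],
                  index=index, columns=['col_A', 'col_B'])

# Every row contains exactly one NaN, so the default dropna(how='any')
# removes all four rows instead of merging them.
dropped = df.dropna()
print(dropped.empty)  # True

# The repeated index labels are what actually need to be collapsed.
print(df.index.duplicated().tolist())  # [False, False, True, True]
```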
Answer
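Use groupby on the index level names and take the first values. GroupBy.first() returns the first non-null value of each column within each group, so the NaNs are skipped and each duplicate (level_0, level_1) pair collapses into a single row.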
In [642]: df.groupby(level=df.index.names).first()
Out[642]:
                 col_A  col_B
level_0 level_1
A       x          1.0    2.0
        y          2.0    1.0
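A self-contained version of the answer, as a sketch assuming pandas 1.x or later: it rebuilds the MWE and collapses it with the same groupby/first call shown in the session above.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([['A', 'A'], ['x', 'y']],
                                   names=['level_0', 'level_1'])
df = pd.DataFrame([[1, np.nan], [np.nan, 1], [np.nan, 2], [2, np.nan]],
                  index=index, columns=['col_A', 'col_B'])

# Group on every index level; first() takes the first non-null value of
# each column within each (level_0, level_1) group.
result = df.groupby(level=df.index.names).first()
print(result)
```

Passing level=[0, 1] (level positions instead of names) should give the same grouping; using df.index.names simply avoids hard-coding how many levels there are.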
Note: After editing, I realized this is almost identical to Psidom's answer. A minor generic edit to level
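One detail worth knowing about first(): when a column has values on more than one duplicate row, it keeps the earliest non-null value and silently discards the rest. The small frame below is hypothetical data made up to illustrate this; it is not from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: (A, x) has a col_A value on BOTH duplicate rows.
index = pd.MultiIndex.from_tuples([('A', 'x'), ('A', 'x')],
                                  names=['level_0', 'level_1'])
df = pd.DataFrame({'col_A': [1.0, 5.0], 'col_B': [np.nan, 2.0]}, index=index)

merged = df.groupby(level=df.index.names).first()
print(merged)
# (A, x): col_A -> 1.0 (first non-null wins, 5.0 is discarded)
#         col_B -> 2.0 (the leading NaN is skipped)
```

If the later value should win instead, GroupBy.last() applies the same NaN-skipping rule from the other end of each group.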