问题描述
我的DataFrame包含MultiIndex列,如下所示:
I have DataFrame with MultiIndex columns that looks like this:
# sample data
col = pd.MultiIndex.from_arrays([['one', 'one', 'one', 'two', 'two', 'two'],
['a', 'b', 'c', 'a', 'b', 'c']])
data = pd.DataFrame(np.random.randn(4, 6), columns=col)
data
选择特定列的正确,简单方法是什么(例如 ['a','c']
,从第二级开始不是范围?
What is the proper, simple way of selecting only specific columns (e.g. ['a', 'c']
, not a range) from the second level?
目前我这样做:
import itertools
tuples = [i for i in itertools.product(['one', 'two'], ['a', 'c'])]
new_index = pd.MultiIndex.from_tuples(tuples)
print(new_index)
data.reindex_axis(new_index, axis=1)
然而,这并不是一个好的解决方案,因为我必须淘汰 itertools
,手工构建另一个MultiIndex然后重新索引(我的实际代码甚至更麻烦,因为列列表不是那么容易获取)。我很确定必须有一些 ix
或 xs
这样做的方式,但我尝试的所有内容都会导致错误。
It doesn't feel like a good solution, however, because I have to bust out itertools
, build another MultiIndex by hand and then reindex (and my actual code is even messier, since the column lists aren't so simple to fetch). I am pretty sure there has to be some ix
or xs
way of doing this, but everything I tried resulted in errors.
推荐答案
这不是很好,但也许:
>>> data
one two
a b c a b c
0 -0.927134 -1.204302 0.711426 0.854065 -0.608661 1.140052
1 -0.690745 0.517359 -0.631856 0.178464 -0.312543 -0.418541
2 1.086432 0.194193 0.808235 -0.418109 1.055057 1.886883
3 -0.373822 -0.012812 1.329105 1.774723 -2.229428 -0.617690
>>> data.ix[:,data.columns.get_level_values(1).isin({"a", "c"})]
one two
a c a c
0 -0.927134 0.711426 0.854065 1.140052
1 -0.690745 -0.631856 0.178464 -0.418541
2 1.086432 0.808235 -0.418109 1.886883
3 -0.373822 1.329105 1.774723 -0.617690
会起作用吗?
这篇关于如何从MultiIndex列的DataFrame中仅选择特定列?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!