本文介绍了如何从MultiIndex列的DataFrame中仅选择特定列?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我的DataFrame包含MultiIndex列,如下所示:

I have DataFrame with MultiIndex columns that looks like this:

# sample data
col = pd.MultiIndex.from_arrays([['one', 'one', 'one', 'two', 'two', 'two'],
                                ['a', 'b', 'c', 'a', 'b', 'c']])
data = pd.DataFrame(np.random.randn(4, 6), columns=col)
data

选择特定列的正确,简单方法是什么(例如 ['a','c'] ,从第二级开始不是范围?

What is the proper, simple way of selecting only specific columns (e.g. ['a', 'c'], not a range) from the second level?

目前我这样做:

import itertools
tuples = [i for i in itertools.product(['one', 'two'], ['a', 'c'])]
new_index = pd.MultiIndex.from_tuples(tuples)
print(new_index)
data.reindex_axis(new_index, axis=1)

然而,这并不是一个好的解决方案,因为我必须淘汰 itertools ,手工构建另一个MultiIndex然后重新索引(我的实际代码甚至更麻烦,因为列列表不是那么容易获取)。我很确定必须有一些 ix xs 这样做的方式,但我尝试的所有内容都会导致错误。

It doesn't feel like a good solution, however, because I have to bust out itertools, build another MultiIndex by hand and then reindex (and my actual code is even messier, since the column lists aren't so simple to fetch). I am pretty sure there has to be some ix or xs way of doing this, but everything I tried resulted in errors.

推荐答案

这不是很好,但也许:

>>> data
        one                           two                    
          a         b         c         a         b         c
0 -0.927134 -1.204302  0.711426  0.854065 -0.608661  1.140052
1 -0.690745  0.517359 -0.631856  0.178464 -0.312543 -0.418541
2  1.086432  0.194193  0.808235 -0.418109  1.055057  1.886883
3 -0.373822 -0.012812  1.329105  1.774723 -2.229428 -0.617690
>>> data.ix[:,data.columns.get_level_values(1).isin({"a", "c"})]
        one                 two          
          a         c         a         c
0 -0.927134  0.711426  0.854065  1.140052
1 -0.690745 -0.631856  0.178464 -0.418541
2  1.086432  0.808235 -0.418109  1.886883
3 -0.373822  1.329105  1.774723 -0.617690

会起作用吗?

这篇关于如何从MultiIndex列的DataFrame中仅选择特定列?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

09-23 22:08