Question

I have a utility function for creating a Pandas MultiIndex when I have two or more iterables and I want an index key for each unique pairing of the values in those iterables. It looks like this:

import pandas as pd
import itertools

def product_index(values, names=None):
    """Make a MultiIndex from the combinatorial product of the values."""
    iterable = itertools.product(*values)
    idx = pd.MultiIndex.from_tuples(list(iterable), names=names)
    return idx

and it can be used like this:

a = range(3)
b = list("ab")
product_index([a, b])

to create:

MultiIndex(levels=[[0, 1, 2], [u'a', u'b']],
           labels=[[0, 0, 1, 1, 2, 2], [0, 1, 0, 1, 0, 1]])

This works perfectly fine, but it seems like a common use case and I am surprised I had to implement it myself. So, my question is: what have I missed or misunderstood in the Pandas library itself that offers this functionality?

Edit to add: This function has been added to Pandas as MultiIndex.from_product for the 0.13.1 release.
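
For readers on pandas 0.13.1 or later, the built-in mentioned in the edit can be compared directly against the helper from the question. The following is only a minimal sketch: `product_index` is repeated here so the block is self-contained, and the level names `num` and `letter` as well as the variables `built_in` and `hand_rolled` are illustrative assumptions, not part of the original post.

```python
import itertools

import pandas as pd


def product_index(values, names=None):
    """Make a MultiIndex from the combinatorial product of the values."""
    tuples = list(itertools.product(*values))
    return pd.MultiIndex.from_tuples(tuples, names=names)


a = range(3)
b = list("ab")

# Illustrative level names; the original question passed no names.
level_names = ["num", "letter"]

# Built-in constructor referred to in the edit above.
built_in = pd.MultiIndex.from_product([a, b], names=level_names)

# Hand-written helper from the question, for comparison.
hand_rolled = product_index([a, b], names=level_names)

print(built_in.equals(hand_rolled))
print(list(built_in))
```

Both calls should yield the same six keys, so the helper is only needed on pandas versions that predate `from_product`.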

Recommended answer

This is a very similar construction (but it uses cartesian_product, which for larger arrays is faster than itertools.product):

In [1]: from pandas import MultiIndex

In [2]: from pandas.tools.util import cartesian_product

In [3]: MultiIndex.from_arrays(cartesian_product([range(3),list('ab')]))
Out[3]: 
MultiIndex(levels=[[0, 1, 2], [u'a', u'b']],
           labels=[[0, 0, 1, 1, 2, 2], [0, 1, 0, 1, 0, 1]])
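
The `pandas.tools.util` import path dates from the time of this answer and may not be available in current pandas releases. As a rough illustration of the array-based idea for the two-iterable case, here is a sketch built only on NumPy's `repeat` and `tile`. The helper name `cartesian_arrays` is hypothetical, and this is not the pandas implementation of `cartesian_product`.

```python
import numpy as np
import pandas as pd


def cartesian_arrays(x, y):
    """Hypothetical helper: return two aligned arrays covering every (x, y) pair."""
    x = np.asarray(x)
    y = np.asarray(y)
    # Repeat each x value once per y value, and cycle y once per x value.
    return [np.repeat(x, len(y)), np.tile(y, len(x))]


arrays = cartesian_arrays(range(3), list("ab"))
idx = pd.MultiIndex.from_arrays(arrays)

print(list(idx))
```

Because the pairs are assembled as whole arrays rather than one tuple at a time, this style tends to scale better for large inputs, which is the point the answer makes about `cartesian_product`.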

This could be added as a convenience method, maybe MultiIndex.from_iterables(...).

Please open an issue (and a PR if you'd like).

FYI, I very rarely construct a MultiIndex 'manually'; it is almost always easier to construct a frame and just call set_index.

In [10]: df = DataFrame(dict(A = np.arange(6), 
                             B = ['foo'] * 3 + ['bar'] * 3, 
                             C = np.ones(6)+np.arange(6)%2)
                       ).set_index(['C','B']).sortlevel()

In [11]: df
Out[11]: 
       A
C B     
1 bar  4
  foo  0
  foo  2
2 bar  3
  bar  5
  foo  1

[6 rows x 1 columns]
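
The session above uses `sortlevel`, which was later deprecated and removed from DataFrame. A sketch of the same frame-then-`set_index` approach for newer pandas, assuming `sort_index` as the replacement for `sortlevel`, could look like this:

```python
import numpy as np
import pandas as pd

# Same data as in the session above.
df = pd.DataFrame(
    {
        "A": np.arange(6),
        "B": ["foo"] * 3 + ["bar"] * 3,
        "C": np.ones(6) + np.arange(6) % 2,
    }
)

# Move C and B into the index, then sort so the MultiIndex is ordered.
df = df.set_index(["C", "B"]).sort_index()

print(df)
print(df.index.nlevels)
```

The resulting index has two levels, C and B, and is sorted lexicographically, which is what slicing on a MultiIndex generally expects.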