python - Exporting a Pandas DataFrame with a MultiIndex

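I have just discovered pandas and am impressed by its capabilities.
I am having trouble understanding how to work with a DataFrame that has a MultiIndex.

I have two questions:

(1) Exporting the DataFrame

Here is my problem, using this dataset: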

import pandas as pd
from io import StringIO  # Python 3; on Python 2 this was the StringIO module

# Rows are left unindented so no stray spaces end up in the values
d1 = StringIO(
    """Gender,Employed,Region,Degree
m,yes,east,ba
m,yes,north,ba
f,yes,south,ba
f,no,east,ba
f,no,east,bsc
m,no,north,bsc
m,yes,south,ma
f,yes,west,phd
m,no,west,phd
m,yes,west,phd
"""
)

df = pd.read_csv(d1)

# Frequency tables
tab1 = pd.crosstab(df.Gender, df.Region)
tab2 = pd.crosstab(df.Gender, [df.Region, df.Degree])
tab3 = pd.crosstab([df.Gender, df.Employed], [df.Region, df.Degree])

# Now we export the datasets
tab1.to_excel('H:/test_tab1.xlsx')  # OK
tab2.to_excel('H:/test_tab2.xlsx') # fails
tab3.to_excel('H:/test_tab3.xlsx') # fails

One workaround I could think of is to flatten the column names (the R way):

def NewColums(DFwithMultiIndex):
    # Join each column tuple into one string, e.g. ('east', 'ba') -> 'east-ba'
    NewCol = []
    for item in DFwithMultiIndex.columns:
        NewCol.append('-'.join(item))
    return NewCol

# New Columns
tab2.columns = NewColums(tab2)
tab3.columns = NewColums(tab3)

# New export
tab2.to_excel('H:/test_tab2.xlsx')  # OK
tab3.to_excel('H:/test_tab3.xlsx')  # OK

My question is: is there a more efficient way to do this in Pandas that I missed in the documentation?

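(2) Selecting columns

This new structure no longer allows selecting columns by a given variable, which is the main advantage of a hierarchical index in the first place. How can I select the columns whose names contain a given string (for example '-ba')?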

P.S. I have seen this related question, but I did not understand the suggested answers.

Best answer

At the moment this looks like a bug in to_excel. As a workaround, I suggest using to_csv, which does not seem to have this problem.

I have filed this as an issue on GitHub.
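The to_csv workaround could look roughly like the sketch below. It rebuilds tab2 from the question's data, writes it to an in-memory buffer (a file path would work the same way), and reads it back. The variable names raw, buf, csv_text and restored are illustrative only, and the read_csv arguments assume two column levels and a single named index column, as in tab2.

```python
import io
import pandas as pd

# Same sample data as in the question
raw = """Gender,Employed,Region,Degree
m,yes,east,ba
m,yes,north,ba
f,yes,south,ba
f,no,east,ba
f,no,east,bsc
m,no,north,bsc
m,yes,south,ma
f,yes,west,phd
m,no,west,phd
m,yes,west,phd
"""
df = pd.read_csv(io.StringIO(raw))
tab2 = pd.crosstab(df.Gender, [df.Region, df.Degree])

# Export: to_csv writes one header row per column level
buf = io.StringIO()
tab2.to_csv(buf)
csv_text = buf.getvalue()

# Import: two header rows become the column MultiIndex,
# and the first column becomes the row index again
restored = pd.read_csv(io.StringIO(csv_text), header=[0, 1], index_col=0)
```

Because the column levels survive the round trip, the columns do not have to be flattened first when CSV is an acceptable output format.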

To answer your second question, in case you really do need to use to_excel...

You can use filter to select only those columns that contain '-ba':

In [21]: list(filter(lambda x: '-ba' in x, tab2.columns))
Out[21]: ['east-ba', 'north-ba', 'south-ba']

In [22]: tab2[list(filter(lambda x: '-ba' in x, tab2.columns))]
Out[22]:
        east-ba  north-ba  south-ba
Gender
     f        1         0         1
     m        1         1         0
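As a complement, here is a hedged sketch of two other ways to make the same selection. Option A keeps the hierarchical columns and selects on the Degree level with DataFrame.xs. Option B flattens the names as in the question and uses the DataFrame.filter method with like=. The names raw, ba_by_level, flat and ba_by_name are made up for this illustration.

```python
import io
import pandas as pd

# Same sample data as in the question
raw = """Gender,Employed,Region,Degree
m,yes,east,ba
m,yes,north,ba
f,yes,south,ba
f,no,east,ba
f,no,east,bsc
m,no,north,bsc
m,yes,south,ma
f,yes,west,phd
m,no,west,phd
m,yes,west,phd
"""
df = pd.read_csv(io.StringIO(raw))
tab2 = pd.crosstab(df.Gender, [df.Region, df.Degree])

# Option A: keep the MultiIndex and take a cross-section on the Degree level.
# The selected level is dropped, so the remaining columns are the regions.
ba_by_level = tab2.xs("ba", axis=1, level="Degree")

# Option B: flatten the column tuples, then keep columns whose name contains "-ba"
flat = tab2.copy()
flat.columns = ["-".join(col) for col in flat.columns]
ba_by_name = flat.filter(like="-ba")
```

Option A avoids string matching altogether, so it cannot accidentally pick up a column whose name merely contains the substring.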

Regarding python - Exporting a Pandas DataFrame with a MultiIndex, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/14341584/