Given the DataFrame di:
import pandas as pd
import numpy as np
data = {
    "Event": ['Biathlon', 'Ski Jump', 'Slalom', 'Downhill'],
    "Award": ['Gold', 'Bronze', 'Gold', 'Silver'],
    "Points": ['100', '10', '100', '40']
}
d = pd.DataFrame(data)
di = d.set_index(["Award","Event"])
print(di)
                Points
Award  Event
Gold   Biathlon    100
Bronze Ski Jump     10
Gold   Slalom      100
Silver Downhill     40
Suppose I want to select all rows with a 'Gold' award in either 'Biathlon' or 'Slalom'. Why does the following fail?
di.loc[('Gold',['Biathlon','Slalom']),:]
According to the examples in the pandas documentation, this looks like it should work. I copied the example from the documentation below:
#example from http://pandas.pydata.org/pandas-docs/stable/advanced.html#using-slicers
def mklbl(prefix, n):
    return ["%s%s" % (prefix, i) for i in range(n)]

miindex = pd.MultiIndex.from_product([mklbl('A', 4),
                                      mklbl('B', 2),
                                      mklbl('C', 4),
                                      mklbl('D', 2)])
micolumns = pd.MultiIndex.from_tuples([('a', 'foo'), ('a', 'bar'),
                                       ('b', 'foo'), ('b', 'bah')],
                                      names=['lvl0', 'lvl1'])
dfmi = pd.DataFrame(np.arange(len(miindex) * len(micolumns)).reshape((len(miindex), len(micolumns))),
                    index=miindex,
                    columns=micolumns).sort_index().sort_index(axis=1)
dfmi.loc[(slice('A1','A3'),slice(None), ['C1','C3']),:]
#this also works
dfmi.loc[(['A1','A3'],['B0','B1'], ['C1','C3']),:]
Best answer
You need to sort the index first:
In [15]:
data = {
    "Event": ['Biathlon', 'Ski Jump', 'Slalom', 'Downhill'],
    "Award": ['Gold', 'Bronze', 'Gold', 'Silver'],
    "Points": ['100', '10', '100', '40']
}
d = pd.DataFrame(data)
di = d.set_index(["Award","Event"])
di = di.sort_index()
di
Out[15]:
                Points
Award  Event
Bronze Ski Jump     10
Gold   Biathlon    100
       Slalom      100
Silver Downhill     40
In [16]:
di.loc[('Gold',['Biathlon','Slalom']),:]
Out[16]:
              Points
Award Event
Gold  Biathlon   100
      Slalom     100
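As a self-contained sketch of the fix above: the snippet below rebuilds di, checks whether its MultiIndex is sorted before and after sort_index(), and then runs the same selection. The is_monotonic_increasing check and the pd.IndexSlice spelling are additions for illustration and are not part of the original answer. Whether the unsorted selection actually raises may depend on the pandas version, so the sketch does not rely on that.

```python
import pandas as pd

data = {
    "Event": ["Biathlon", "Ski Jump", "Slalom", "Downhill"],
    "Award": ["Gold", "Bronze", "Gold", "Silver"],
    "Points": ["100", "10", "100", "40"],
}
di = pd.DataFrame(data).set_index(["Award", "Event"])

# set_index keeps the original row order, so the MultiIndex is not sorted yet.
sorted_before = di.index.is_monotonic_increasing

# Sorting the index puts the rows in lexicographic order of (Award, Event).
di_sorted = di.sort_index()
sorted_after = di_sorted.index.is_monotonic_increasing

# Same selection as in the answer: 'Gold' on level 0, a list of labels on level 1.
selected = di_sorted.loc[("Gold", ["Biathlon", "Slalom"]), :]

# pd.IndexSlice builds the same tuple and can be easier to read (illustrative alternative).
idx = pd.IndexSlice
selected_alt = di_sorted.loc[idx["Gold", ["Biathlon", "Slalom"]], :]

print(sorted_before, sorted_after)
print(selected)
```

Assigning the result of sort_index() back to a variable matters, because sort_index() returns a new DataFrame by default rather than sorting in place.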
Source: a similar question on Stack Overflow about selecting a subset of observations from a pandas DataFrame using a list of labels: https://stackoverflow.com/questions/35710423/