The training data looks like this:
p,x,s,n,t,p,f,c,n,k,e,e,s,s,w,w,p,w,o,p,k,s,u
e,x,s,y,t,a,f,c,b,k,e,c,s,s,w,w,p,w,o,p,n,n,g
e,b,s,w,t,l,f,c,b,n,e,c,s,s,w,w,p,w,o,p,n,n,m
p,x,y,w,t,p,f,c,n,n,e,e,s,s,w,w,p,w,o,p,k,s,u
e,x,s,g,f,n,f,w,b,k,t,e,s,s,w,w,p,w,o,e,n,a,g
e,x,y,y,t,a,f,c,b,n,e,c,s,s,w,w,p,w,o,p,k,n,g
e,b,s,w,t,a,f,c,b,g,e,c,s,s,w,w,p,w,o,p,k,n,m
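The first column is the label saying whether the mushroom is edible (e: edible, p: poisonous).
I want to split this data into two parts by edibility.
My code is as follows: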
mushdf = pd.read_csv('agaricus-lepiota.data')  # load the mushroom data
mushdf.columns = ['edible?','cap-shape','cap-surface','cap-color','bruises?','odor',
                  'gill-attachment','gill-spacing','gill-size','gill-color',
                  'stalk-shape','stalk-root','stalk-surface-above-ring','stalk-surface-below-ring',
                  'stalk-color-above-ring','stalk-color-below-ring','veil-type','veil-color',
                  'ring-number','ring-type','spore-print-color','population','habitat']
print(mushdf)
mushdic = {key: mushdf for (key, mushdf) in mushdf.groupby('edible?')}
for key in mushdic:
    print(f'mushdic[{key}]')
    print(mushdic[key])
    print('-'*50)
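The problem is that when I delete the mushdf.columns assignment (lines 2 to 6), this code works. But when I run it with the mushdf.columns assignment in place, the terminal returns an error message. The same approach with another column also works. For example,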
mushdic = {key: mushdf for (key, mushdf) in mushdf.groupby('bruises?')}
runs fine. I have no idea why. This is the error:
Traceback (most recent call last):
  File "e:\Visual Studio Project\LiMing\vs2017_python\.vscode\helloworld.py", line 11, in <module>
    mushdic = {key: mushdf for (key, mushdf) in mushdf.groupby('edible?')}
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\pandas\core\generic.py", line 7894, in groupby
    **kwargs
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\pandas\core\groupby\groupby.py", line 2522, in groupby
    return klass(obj, by, **kwds)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\pandas\core\groupby\groupby.py", line 391, in __init__
    mutated=self.mutated,
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\pandas\core\groupby\grouper.py", line 621, in _get_grouper
    raise KeyError(gpr)
KeyError: 'edible?'
The terminal process terminated with exit code: 1
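Best answer
By default, pandas.read_csv assumes that the first row of the csv file is the header. Since your csv file has no header row, you need to say so when importing. You should also pass the column names at that point: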
mushdf = pd.read_csv('agaricus-lepiota.data', header=None, names=[
    'edible?','cap-shape','cap-surface','cap-color','bruises?','odor',
    'gill-attachment','gill-spacing','gill-size','gill-color',
    'stalk-shape','stalk-root','stalk-surface-above-ring','stalk-surface-below-ring',
    'stalk-color-above-ring','stalk-color-below-ring','veil-type','veil-color',
    'ring-number','ring-type','spore-print-color','population','habitat'])
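
To see what the header default does, here is a minimal sketch that reads the first three sample rows from the question out of an in-memory string instead of the real file. The use of io.StringIO and the variable names raw, columns, default_df, named_df and groups are assumptions made for illustration; they are not part of the original code.

```python
import io

import pandas as pd

# First three rows of the question's sample data; note there is no header row.
raw = (
    "p,x,s,n,t,p,f,c,n,k,e,e,s,s,w,w,p,w,o,p,k,s,u\n"
    "e,x,s,y,t,a,f,c,b,k,e,c,s,s,w,w,p,w,o,p,n,n,g\n"
    "e,b,s,w,t,l,f,c,b,n,e,c,s,s,w,w,p,w,o,p,n,n,m\n"
)

columns = [
    'edible?', 'cap-shape', 'cap-surface', 'cap-color', 'bruises?', 'odor',
    'gill-attachment', 'gill-spacing', 'gill-size', 'gill-color',
    'stalk-shape', 'stalk-root', 'stalk-surface-above-ring', 'stalk-surface-below-ring',
    'stalk-color-above-ring', 'stalk-color-below-ring', 'veil-type', 'veil-color',
    'ring-number', 'ring-type', 'spore-print-color', 'population', 'habitat',
]

# Default behaviour: the first data row is consumed as the header,
# so one observation is silently lost.
default_df = pd.read_csv(io.StringIO(raw))

# With header=None and names=..., every row is kept as data.
named_df = pd.read_csv(io.StringIO(raw), header=None, names=columns)

# Split into one DataFrame per label, as in the question.
groups = {label: group for label, group in named_df.groupby('edible?')}

print(len(default_df), len(named_df))
for label, group in groups.items():
    print(label, len(group))
```

With the default call, the first mushroom ends up as column labels rather than as a row, which is why passing header=None matters even when the columns are renamed afterwards.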