我有一堆数据存储在vals中。指数是单调的,但不是连续的。我试图对数据的直方图进行一些分析,因此创建了以下结构:

hist = pd.DataFrame(vals)

hist['bins'] = pd.cut(vals, 100)

这是从一个实验仪器中获取的数据,我知道有些bins只有1或2个计数,我正试图去除它们。我试着使用groupby如下并得到以下错误(完整的回溯包含在注释末尾):
hist.groupby('bins').describe()

AttributeError: 'Categorical' object has no attribute 'flags'

但是,当我执行以下操作时,错误不会出现,我会得到预期的结果:
In[]:  hist.index = hist.bins
In[]:  hist['bins'] = hist.index
In[]:  desc = hist.groupby('bins').describe()
In[]:  desc.index.names = ['bins', 'describe']

Out[]: **describe with MultiIndex for rows.**

如果我不包括第二行,我仍然会得到一个hist['bins'] = hist.index,据我所知,回溯是相同的。
有人能解释一下AttributeError: 'Categorical' object has no attribute 'flags'是什么吗?为什么只有当我将flags设置为index然后用存储在bins中的版本替换bins时,它们才会起作用?
我的最终目标是删除计数小于等于6的容器的数据。如果有人比我追求的方式更容易解决问题,我也会很感激。
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-11-f606a051f2e4> in <module>()
----> 1 hist.groupby('bins').describe()

C:\Users\balterma\AppData\Local\Enthought\Canopy\App\appdata\canopy-1.4.1.1975.win-x86_64\lib\site-packages\IPython\core\displayhook.pyc in __call__(self, result)
    245             self.start_displayhook()
    246             self.write_output_prompt()
--> 247             format_dict, md_dict = self.compute_format_data(result)
    248             self.write_format_data(format_dict, md_dict)
    249             self.update_user_ns(result)

C:\Users\balterma\AppData\Local\Enthought\Canopy\App\appdata\canopy-1.4.1.1975.win-x86_64\lib\site-packages\IPython\core\displayhook.pyc in compute_format_data(self, result)
    155
    156         """
--> 157         return self.shell.display_formatter.format(result)
    158
    159     def write_format_data(self, format_dict, md_dict=None):

C:\Users\balterma\AppData\Local\Enthought\Canopy\App\appdata\canopy-1.4.1.1975.win-x86_64\lib\site-packages\IPython\core\formatters.pyc in format(self, obj, include, exclude)
    150             md = None
    151             try:
--> 152                 data = formatter(obj)
    153             except:
    154                 # FIXME: log the exception

C:\Users\balterma\AppData\Local\Enthought\Canopy\App\appdata\canopy-1.4.1.1975.win-x86_64\lib\site-packages\IPython\core\formatters.pyc in __call__(self, obj)
    479                 type_pprinters=self.type_printers,
    480                 deferred_pprinters=self.deferred_printers)
--> 481             printer.pretty(obj)
    482             printer.flush()
    483             return stream.getvalue()

C:\Users\balterma\AppData\Local\Enthought\Canopy\App\appdata\canopy-1.4.1.1975.win-x86_64\lib\site-packages\IPython\lib\pretty.pyc in pretty(self, obj)
    360                             if callable(meth):
    361                                 return meth(obj, self, cycle)
--> 362             return _default_pprint(obj, self, cycle)
    363         finally:
    364             self.end_group()

C:\Users\balterma\AppData\Local\Enthought\Canopy\App\appdata\canopy-1.4.1.1975.win-x86_64\lib\site-packages\IPython\lib\pretty.pyc in _default_pprint(obj, p, cycle)
    480     if getattr(klass, '__repr__', None) not in _baseclass_reprs:
    481         # A user-provided repr.
--> 482         p.text(repr(obj))
    483         return
    484     p.begin_group(1, '<')

C:\Users\balterma\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\core\base.pyc in __repr__(self)
     62         Yields Bytestring in Py2, Unicode String in py3.
     63         """
---> 64         return str(self)
     65
     66

C:\Users\balterma\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\core\base.pyc in __str__(self)
     42         if compat.PY3:
     43             return self.__unicode__()
---> 44         return self.__bytes__()
     45
     46     def __bytes__(self):

C:\Users\balterma\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\core\base.pyc in __bytes__(self)
     54
     55         encoding = get_option("display.encoding")
---> 56         return self.__unicode__().encode(encoding, 'replace')
     57
     58     def __repr__(self):

C:\Users\balterma\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\core\frame.pyc in __unicode__(self)
    507             width = None
    508         self.to_string(buf=buf, max_rows=max_rows, max_cols=max_cols,
--> 509                        line_width=width, show_dimensions=show_dimensions)
    510
    511         return buf.getvalue()

C:\Users\balterma\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\core\frame.pyc in to_string(self, buf, columns, col_space, colSpace, header, index, na_rep, formatters, float_format, sparsify, index_names, justify, line_width, max_rows, max_cols, show_dimensions)
   1340                                            max_rows=max_rows,
   1341                                            max_cols=max_cols,
-> 1342                                            show_dimensions=show_dimensions)
   1343         formatter.to_string()
   1344

C:\Users\balterma\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\core\format.pyc in __init__(self, frame, buf, columns, col_space, header, index, na_rep, formatters, justify, float_format, sparsify, index_names, line_width, max_rows, max_cols, show_dimensions, **kwds)
    345             self.columns = frame.columns
    346
--> 347         self._chk_truncate()
    348
    349     def _chk_truncate(self):

C:\Users\balterma\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\core\format.pyc in _chk_truncate(self)
    410             else:
    411                 row_num = max_rows_adj // 2
--> 412                 frame = concat((frame.iloc[:row_num, :], frame.iloc[-row_num:, :]))
    413             self.tr_row_num = row_num
    414

C:\Users\balterma\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\tools\merge.pyc in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, copy)
    752                        keys=keys, levels=levels, names=names,
    753                        verify_integrity=verify_integrity,
--> 754                        copy=copy)
    755     return op.get_result()
    756

C:\Users\balterma\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\tools\merge.pyc in __init__(self, objs, axis, join, join_axes, keys, levels, names, ignore_index, verify_integrity, copy)
    884         self.copy = copy
    885
--> 886         self.new_axes = self._get_new_axes()
    887
    888     def get_result(self):

C:\Users\balterma\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\tools\merge.pyc in _get_new_axes(self)
    957                 new_axes[i] = ax
    958
--> 959         new_axes[self.axis] = self._get_concat_axis()
    960         return new_axes
    961

C:\Users\balterma\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\tools\merge.pyc in _get_concat_axis(self)
   1009
   1010         if self.keys is None:
-> 1011             concat_axis = _concat_indexes(indexes)
   1012         else:
   1013             concat_axis = _make_concat_multiindex(indexes, self.keys,

C:\Users\balterma\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\tools\merge.pyc in _concat_indexes(indexes)
   1027
   1028 def _concat_indexes(indexes):
-> 1029     return indexes[0].append(indexes[1:])
   1030
   1031

C:\Users\balterma\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\core\index.pyc in append(self, other)
   4603             arrays = []
   4604             for i in range(self.nlevels):
-> 4605                 label = self.get_level_values(i)
   4606                 appended = [o.get_level_values(i) for o in other]
   4607                 arrays.append(label.append(appended))

C:\Users\balterma\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\core\index.pyc in get_level_values(self, level)
   4239         unique = self.levels[num]  # .values
   4240         labels = self.labels[num]
-> 4241         filled = com.take_1d(unique.values, labels, fill_value=unique._na_value)
   4242         values = unique._simple_new(filled, self.names[num],
   4243                                     freq=getattr(unique, 'freq', None),

C:\Users\balterma\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\core\common.pyc in take_nd(arr, indexer, axis, out, fill_value, mask_info, allow_fill)
    829         out_shape[axis] = len(indexer)
    830         out_shape = tuple(out_shape)
--> 831         if arr.flags.f_contiguous and axis == arr.ndim - 1:
    832             # minor tweak that can make an order-of-magnitude difference
    833             # for dataframes initialized directly from 2-d ndarrays

AttributeError: 'Categorical' object has no attribute 'flags'

最佳答案

这看起来是一个带有Categorical数据的错误,将在0.17.0版中更正(问题here)。
同时,您可以将类别强制转换为object数据类型-这是分配给索引并返回时发生的情况。

df['bins'] = df['bins'].astype(str)

关于python - Pandas groupby和describe标志AttributeError,我们在Stack Overflow上找到一个类似的问题:https://stackoverflow.com/questions/31999862/

10-15 02:03