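I have a sample dataset that looks like this:
I want to build a time series from it, so I pivot the data so that each time stamp becomes a column header. My script is as follows: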
#!/usr/bin/python
import pandas as pd
import os
from os.path import basename

def generate_timeSeries(fileToProcess):
    df = pd.read_csv(fileToProcess)
    timestamps = df.pivot_table('C_Number',['A_Id', 'P_Id'], 'Time Stamp')
    return timestamps

def main():
    folder_path = "Input/"
    for files in os.listdir(folder_path):
        print "processing",files
        file_to_open = os.path.join(folder_path, files)
        unicoded_file = unicode(file_to_open).encode('utf8')
        TimeSeries_dataframe = generate_timeSeries(unicoded_file)
        TimeSeries_dataframe.to_csv('Output/%s_timeseries.csv' % os.path.splitext(files)[0], sep=',', encoding='utf-8')

if __name__ == "__main__":
    main()
When I try to run the script, I get the following error:
pandas.core.groupby.DataError: No numeric types to aggregate
Here is the full traceback:
Traceback (most recent call last):
  File "Error_AuthorTimeSeries.py", line 43, in <module>
    main()
  File "Error_AuthorTimeSeries.py", line 33, in main
    TimeSeries_dataframe = generate_timeSeries(unicoded_file)
  File "Error_AuthorTimeSeries.py", line 16, in generate_timeSeries
    timestamps = df.pivot_table('C_Number',['A_ID', 'P_ID'], 'Time Stamp')
  File "/usr/lib/python2.7/dist-packages/pandas/tools/pivot.py", line 104, in pivot_table
    agged = grouped.agg(aggfunc)
  File "/usr/lib/python2.7/dist-packages/pandas/core/groupby.py", line 437, in agg
    return self.aggregate(func, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/pandas/core/groupby.py", line 1994, in aggregate
    return getattr(self, arg)(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/pandas/core/groupby.py", line 452, in mean
    return self._cython_agg_general('mean')
  File "/usr/lib/python2.7/dist-packages/pandas/core/groupby.py", line 1917, in _cython_agg_general
    new_blocks = self._cython_agg_blocks(how, numeric_only=numeric_only)
  File "/usr/lib/python2.7/dist-packages/pandas/core/groupby.py", line 1964, in _cython_agg_blocks
    raise DataError('No numeric types to aggregate')
pandas.core.groupby.DataError: No numeric types to aggregate
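A note on the traceback above: the default aggregation of pivot_table is the mean, so this error usually means the value column was read as text rather than as numbers. The sketch below is one way to check for that and to list the offending cells. It assumes Python 3 and a recent pandas, and the sample values are hypothetical stand-ins for the real column.

```python
import pandas as pd

# Hypothetical stand-in for the value column as read from the CSV:
# one cell holds a no-break space instead of a number.
col = pd.Series(["163", "143", "\xa0", "51"])

# A single non-numeric cell is enough to make the whole column non-numeric.
column_is_numeric = pd.api.types.is_numeric_dtype(col)

# Cells that cannot be parsed as numbers become NaN under errors="coerce",
# which lets us select and inspect exactly those cells.
unparseable = col[pd.to_numeric(col, errors="coerce").isna()]

print(column_is_numeric)
print(unparseable.tolist())
```

If the second print shows any values, those are the cells to clean before pivoting.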
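P.S.: This question is a near-duplicate of 1, 2 and 3. However, none of them gave a satisfactory answer to my problem.
I tried the fill_value and astype methods, without success.
Edit:
I tried to find what causes the error by adding the following line (based on a suggestion):
pd.unique(df['C_number'].values)
and got the following result: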
['163' '143' '51' '43' '34' '24' '20' '15' '14' '12' '11' '10' '9' '8' '7'
'6' '5' '4' '3' '2' '1' '\xc2\xa0' '145' '35' '16' '164' '146' '36' '21'
'165' '148' '37' '171' '154' '52' '44' '22' '17' '13' '158' '160' '147'
'161']
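For context, '\xc2\xa0' is the UTF-8 byte sequence for U+00A0, the no-break space that web pages and spreadsheets often use for an "empty" cell. The short check below uses only the standard library and is written for Python 3, where bytes and text are separate types; the variable names are illustrative.

```python
import unicodedata

# The two bytes seen in the output above, written as a Python 3 bytes literal.
raw = b"\xc2\xa0"

# Decoding them as UTF-8 yields a single character, U+00A0.
text = raw.decode("utf-8")
name = unicodedata.name(text)

# str.strip() treats U+00A0 as whitespace, so the cell is effectively blank.
is_blank = text.strip() == ""

print(repr(text), name, is_blank)
```

In other words, the odd value is a blank cell rather than a corrupted number, so it can be treated like any other missing value.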
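So, despite repeatedly encoding as UTF-8, I believe '\xc2\xa0' is the culprit. I therefore added the following two lines to the function generate_timeSeries():
df.loc[df['Cited By Numbers']=='\xc2\xa0', 'Cited By Numbers' ] = '0'
df['Cited By Numbers'] = df['Cited By Numbers'].astype(int)
Although this works as a temporary fix for files that contain '\xc2\xa0', it breaks for files that do not contain these characters, producing the following traceback:
Traceback (most recent call last):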
  File "imeSeries.py", line 66, in <module>
    main()
  File "TimeSeries.py", line 56, in main
    TimeSeries_dataframe = generate_timeSeries(unicoded_file)
  File "TimeSeries.py", line 23, in generate_timeSeries
    df.loc[df['C_Numbers']=='\xc2\xa0', 'C_Numbers' ] = '0'
  File "/usr/lib/python2.7/dist-packages/pandas/core/ops.py", line 563, in wrapper
    res = na_op(values, other)
  File "/usr/lib/python2.7/dist-packages/pandas/core/ops.py", line 532, in na_op
    raise TypeError("invalid type comparison")
TypeError: invalid type comparison
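A likely explanation for this second error: when a file contains no stray characters, read_csv already infers an integer column, and comparing integers against a string is what fails. A cleaning step that does not depend on the column's starting type avoids the problem. The helper below is a minimal sketch, assuming Python 3 and a recent pandas. The name clean_count_column is hypothetical, and mapping blank cells to 0 mirrors the workaround above rather than being the only reasonable choice.

```python
import pandas as pd

def clean_count_column(series):
    """Return the column as integers, treating blank cells (including NBSP) as 0."""
    # Convert to text first so numeric and text columns share one code path;
    # strip() also removes no-break spaces.
    as_text = series.astype(str).str.strip()
    # Anything that still is not a number becomes NaN instead of raising.
    numeric = pd.to_numeric(as_text, errors="coerce")
    return numeric.fillna(0).astype(int)

# Hypothetical columns: one with a no-break space, one already numeric.
with_nbsp = pd.Series(["163", "\xa0", "51"])
already_numeric = pd.Series([163, 143, 51])

print(clean_count_column(with_nbsp).tolist())
print(clean_count_column(already_numeric).tolist())
```

The same function handles both kinds of file, so no per-file check for '\xc2\xa0' is needed.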
What is the correct way to solve this?
Any help is much appreciated.
Best answer
I managed to solve this by adding the following line to the original script.
df = df.convert_objects(convert_numeric=True)
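Note that convert_objects was deprecated in pandas 0.17 and removed in pandas 1.0; pd.to_numeric with errors="coerce" is its replacement for this use. The following is a minimal sketch of the same fix on current pandas under Python 3. The sample rows are hypothetical, and the column names follow the script in the question.

```python
import pandas as pd

# Hypothetical sample rows; one C_Number cell holds a no-break space.
df = pd.DataFrame({
    "A_Id": [1, 1, 2, 2],
    "P_Id": [10, 10, 20, 20],
    "Time Stamp": ["2015-01", "2015-02", "2015-01", "2015-02"],
    "C_Number": ["163", "\xa0", "51", "43"],
})

# Unparseable cells become NaN, so the column ends up numeric (float).
df["C_Number"] = pd.to_numeric(df["C_Number"], errors="coerce")

# Same pivot as in the question, written with keyword arguments.
timestamps = df.pivot_table(
    values="C_Number", index=["A_Id", "P_Id"], columns="Time Stamp"
)
print(timestamps)
```

Cells that were blank in the input show up as NaN in the pivoted table; call fillna(0) on the column before pivoting if zeros are preferred.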
About python - How does a Python Pandas pivot table handle '\xc2\xa0'?, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/33867408/