ValueError: negative dimensions are not allowed when using pandas pivot_table
Problem description
I am trying to make item-item collaborative recommendation code. My full dataset can be found here. I want the users to become rows, items to become columns, and ratings to be the values.
My code is as follows:
import pandas as pd
import numpy as np
file = pd.read_csv("data.csv", names=['user', 'item', 'rating', 'timestamp'])
table = pd.pivot_table(file, values='rating', index=['user'], columns=['item'])
My data looks like this:
             user        item  rating   timestamp
0  A2EFCYXHNK06IS  5555991584       5   978480000
1  A1WR23ER5HMAA9  5555991584       5   953424000
2  A2IR4Q0GPAFJKW  5555991584       4  1393545600
3  A2V0KUVAB9HSYO  5555991584       4   966124800
4  A1J0GL9HCA7ELW  5555991584       5  1007683200
The error is:
Traceback (most recent call last):
  File "D:\python\reco.py", line 9, in <module>
    table=pd.pivot_table(file,values='rating',index=['user'],columns=['item'])
  File "C:\python35\lib\site-packages\pandas\tools\pivot.py", line 133, in pivot_table
    table = agged.unstack(to_unstack)
  File "C:\python35\lib\site-packages\pandas\core\frame.py", line 4047, in unstack
    return unstack(self, level, fill_value)
  File "C:\python35\lib\site-packages\pandas\core\reshape.py", line 402, in unstack
    return _unstack_multiple(obj, level)
  File "C:\python35\lib\site-packages\pandas\core\reshape.py", line 297, in _unstack_multiple
    unstacked = dummy.unstack('__placeholder__')
  File "C:\python35\lib\site-packages\pandas\core\frame.py", line 4047, in unstack
    return unstack(self, level, fill_value)
  File "C:\python35\lib\site-packages\pandas\core\reshape.py", line 406, in unstack
    return _unstack_frame(obj, level, fill_value=fill_value)
  File "C:\python35\lib\site-packages\pandas\core\reshape.py", line 449, in _unstack_frame
    fill_value=fill_value)
  File "C:\python35\lib\site-packages\pandas\core\reshape.py", line 103, in __init__
    self._make_selectors()
  File "C:\python35\lib\site-packages\pandas\core\reshape.py", line 137, in _make_selectors
    mask = np.zeros(np.prod(self.full_shape), dtype=bool)
ValueError: negative dimensions are not allowed
Recommended answer
I cannot guarantee that this will complete (I got tired of waiting for it to compute), but here's a way to create a sparse dataframe that hopefully should minimize memory and help.
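import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

# Note: this targets an older pandas. astype('category', categories=...),
# pd.SparseDataFrame and pd.SparseSeries are not available in pandas 1.0 and later.
file = pd.read_csv("data.csv", names=['user', 'item', 'rating', 'timestamp'])

# Sorted unique users and items define the row and column order.
user_u = list(sorted(file.user.unique()))
item_u = list(sorted(file.item.unique()))

# Map each user and item to its integer position in those lists.
row = file.user.astype('category', categories=user_u).cat.codes
col = file.item.astype('category', categories=item_u).cat.codes

data = file['rating'].tolist()
sparse_matrix = csr_matrix((data, (row, col)), shape=(len(user_u), len(item_u)))

# Build the sparse frame one row at a time.
df = pd.SparseDataFrame([pd.SparseSeries(sparse_matrix[i].toarray().ravel(), fill_value=0)
                         for i in np.arange(sparse_matrix.shape[0])],
                        index=user_u, columns=item_u, default_fill_value=0)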
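For readers on a recent pandas, here is a minimal sketch of the same idea using current APIs. It is not the answerer's code: it assumes pandas 0.25 or later (for `DataFrame.sparse.from_spmatrix`) and SciPy, and the small `ratings` frame with its values is made up so the example runs without `data.csv`. One behavioural difference to keep in mind: `csr_matrix` sums duplicate (user, item) pairs, whereas `pivot_table` averages them by default.

```python
import pandas as pd
from scipy.sparse import csr_matrix

# Toy stand-in for the ratings file (hypothetical values, same column names as above).
ratings = pd.DataFrame({
    "user": ["u1", "u2", "u1", "u3"],
    "item": ["i1", "i1", "i2", "i3"],
    "rating": [5, 4, 3, 2],
})

# Encode users and items as integer codes; sorted categories fix the row/column order.
user_cat = pd.Categorical(ratings["user"], categories=sorted(ratings["user"].unique()))
item_cat = pd.Categorical(ratings["item"], categories=sorted(ratings["item"].unique()))

# Build the user x item matrix without materialising the dense grid.
sparse_matrix = csr_matrix(
    (ratings["rating"].to_numpy(), (user_cat.codes, item_cat.codes)),
    shape=(len(user_cat.categories), len(item_cat.categories)),
)

# Wrap the matrix in a DataFrame whose columns are stored sparsely.
df = pd.DataFrame.sparse.from_spmatrix(
    sparse_matrix, index=user_cat.categories, columns=item_cat.categories
)
print(df)
```

Missing ratings appear as 0 here, matching the `fill_value=0` choice in the original snippet; if 0 could be a real rating in your data, a different encoding would be needed.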
See the linked reference for more options.
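As for the error itself, a likely explanation (an inference from the last traceback frame, not something stated in the question) is integer overflow: `np.prod(self.full_shape)` multiplies the number of distinct users by the number of distinct items, and when that product is computed in a 32-bit integer it can wrap around to a negative value, which `np.zeros` then rejects. The sketch below reproduces the effect with made-up counts; the explicit `dtype=np.int32` stands in for a platform whose default NumPy integer is 32-bit.

```python
import numpy as np

# Hypothetical counts of distinct users and items (not taken from the dataset).
n_users, n_items = 50_000, 50_000
shape = np.array([n_users, n_items], dtype=np.int32)

# With a 32-bit accumulator the product wraps around to a negative number.
cells_int32 = np.prod(shape, dtype=np.int32)

# With a 64-bit accumulator the true number of cells is visible.
cells_int64 = np.prod(shape, dtype=np.int64)
print(cells_int32, cells_int64)

# Asking NumPy for an array of negative length raises the same ValueError.
try:
    np.zeros(cells_int32, dtype=bool)
except ValueError as exc:
    message = str(exc)
    print(message)
```

Either way, a dense user-by-item table of this size has billions of cells, which is why the answer above reaches for a sparse representation instead.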