pandas.cut with non-unique labels

Problem description

I'm trying to bin data and assign a float value to each row based on its bin. I thought pandas.cut was the tool for this, but apparently it requires the bin labels to be unique.

values = [0.6, 0.5, 0.5, 0.6, 0.8, 0.9]
bins = [0, 2, 5, 10, 15, 25, 200]
binned = pd.cut(original_table[field], bins, labels=values)

>>> ValueError: Categorical categories must be unique

My data (original_table) is very large and doing anything iteratively is quite slow, which is why cut was an appealing tool. Is there a workaround to make pd.cut work for this?

Recommended answer

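Found a workaround: pass labels=False so that pd.cut returns each row's integer bin index, then use those indices to look up the float values in a NumPy array. This assumes every value falls inside the bins; anything outside (0, 200] comes back as NaN, which cannot be used as an index.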

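import numpy as np
import pandas as pd
values = [0.6, 0.5, 0.5, 0.6, 0.8, 0.9]
bins = [0, 2, 5, 10, 15, 25, 200]
# labels=False yields integer bin indices, used here to index into values
binned = np.array(values)[pd.cut(original_table[field], bins, labels=False)]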
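
To make the lookup concrete, here is a minimal self-contained sketch of the same idea. The small DataFrame and its "field" column are hypothetical stand-ins for original_table[field], and the NaN handling for out-of-range rows is an addition that the original answer does not include.

```python
import numpy as np
import pandas as pd

values = np.array([0.6, 0.5, 0.5, 0.6, 0.8, 0.9])
bins = [0, 2, 5, 10, 15, 25, 200]

# Hypothetical sample data standing in for original_table[field];
# the last value (250.0) deliberately falls outside the bins.
original_table = pd.DataFrame(
    {"field": [1.0, 3.0, 7.0, 12.0, 20.0, 150.0, 250.0]}
)

# labels=False returns the integer position of each row's bin;
# rows that fall outside every bin come back as NaN.
codes = pd.cut(original_table["field"], bins, labels=False)

# Replace NaN with a dummy index (0) so the lookup is valid,
# then mask those rows back to NaN in the result.
idx = codes.fillna(0).astype(int).to_numpy()
binned = np.where(codes.notna().to_numpy(), values[idx], np.nan)

print(binned)
```

The lookup stays fully vectorized, so it should scale to a large table without any Python-level loop. In pandas 1.1 and later, pd.cut also accepts ordered=False, which permits duplicate labels; the result is then a categorical rather than a float array, so it would still need converting.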
