Question
I discretized a column in my dataframe using pandas.cut with bins created by IntervalIndex.from_tuples.
The cut works as intended, but the categories are shown as the tuples I specified in the IntervalIndex. Is there any way to rename the categories to different labels, e.g. (Small, Medium, Large)?
Example:
import pandas as pd
bins = pd.IntervalIndex.from_tuples([(0, 1), (2, 3), (4, 5)])
pd.cut([0, 0.5, 1.5, 2.5, 4.5], bins)
The resulting categories will be:
[NaN, (0, 1], NaN, (2, 3], (4, 5]]
Categories (3, interval[int64]): [(0, 1] < (2, 3] < (4, 5]]
I am trying to change [(0, 1] < (2, 3] < (4, 5]] into something like 1, 2, 3 or small, medium, large.
Sadly, the labels argument of pd.cut is ignored when the bins are an IntervalIndex.
Thanks!
Update:
Thanks to @SergeyBushmanov, I noticed that this issue only exists when trying to change the category labels inside a dataframe (which is what I am trying to do). Updated example:
In [1]: df = pd.DataFrame([0, 0.5, 1.5, 2.5, 4.5], columns = ['col1'])
In [2]: bins = pd.IntervalIndex.from_tuples([(0, 1), (2, 3), (4, 5)])
In [3]: df['col1'] = pd.cut(df['col1'], bins)
In [4]: df['col1'].categories = ['small','med','large']
In [5]: df['col1']
Out [5]:
0 NaN
1 (0, 1]
2 NaN
3 (2, 3]
4 (4, 5]
Name: col1, dtype: category
Categories (3, interval[int64]): [(0, 1] < (2, 3] < (4, 5]]
Recommended answer
Suppose we have some data:
bins = pd.IntervalIndex.from_tuples([(0, 1), (2, 3), (4, 5)])
x = pd.cut([0, 0.5, 1.5, 2.5, 4.5], bins)
You may try re-assigning the categories, like:
In [7]: x.categories = [1,2,3]
In [8]: x
Out[8]:
[NaN, 1, NaN, 2, 3]
Categories (3, int64): [1 < 2 < 3]
Or:
In [9]: x.categories = ["small", "medium", "big"]
In [10]: x
Out[10]:
[NaN, small, NaN, medium, big]
Categories (3, object): [small < medium < big]
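Assigning to .categories directly depends on older pandas behavior. As far as I know, that setter was deprecated in pandas 1.5 and removed in 2.0. A minimal sketch of the same renaming with Categorical.rename_categories follows, assuming a reasonably recent pandas; the names labels, renamed, mapping and renamed_by_mapping are only illustrative.

```python
import pandas as pd

bins = pd.IntervalIndex.from_tuples([(0, 1), (2, 3), (4, 5)])
x = pd.cut([0, 0.5, 1.5, 2.5, 4.5], bins)

# rename_categories returns a new Categorical; the codes and the
# ordering of the categories are left unchanged.
labels = ["small", "medium", "big"]
renamed = x.rename_categories(labels)

# A dict-like mapping ties each interval to its label explicitly,
# which avoids relying on positional order.
mapping = dict(zip(x.categories, labels))
renamed_by_mapping = x.rename_categories(mapping)

print(renamed)
```

The list form renames by position, so it must have exactly as many entries as there are bins. The mapping form may be easier to read when there are many intervals.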
Update:
df = pd.DataFrame([0, 0.5, 1.5, 2.5, 4.5], columns = ['col1'])
bins = pd.IntervalIndex.from_tuples([(0, 1), (2, 3), (4, 5)])
x = pd.cut(df["col1"].to_list(),bins)
x.categories = [1,2,3]
df['col1'] = x
df.col1
0 NaN
1 1
2 NaN
3 2
4 3
Name: col1, dtype: category
Categories (3, int64): [1 < 2 < 3]
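For a dataframe column, the renaming can also be done without a detour through a list, using the .cat accessor on the categorical Series. This is a sketch that assumes a pandas version where Series.cat.rename_categories is available (it has been for a long time, as far as I know). It reuses the column name col1 from the example above.

```python
import pandas as pd

df = pd.DataFrame([0, 0.5, 1.5, 2.5, 4.5], columns=["col1"])
bins = pd.IntervalIndex.from_tuples([(0, 1), (2, 3), (4, 5)])

# pd.cut on a Series returns a categorical Series; .cat.rename_categories
# returns a new Series, so the result has to be assigned back to the column.
df["col1"] = pd.cut(df["col1"], bins).cat.rename_categories([1, 2, 3])

print(df["col1"])
```

Assigning the result back is the important step. Setting an attribute on df['col1'] only touches a temporary object, which may be why the attempt in the question left the column unchanged.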