Problem description
Is there any way to rename values based on another variable? Here I have two columns, one of which is ID and the other is Fruits. I was wondering whether it would be possible to uniquely identify repeated fruits within each ID.
ID Fruits
1 Apple
1 Banana
1 Orange
1 Banana
2 Apple
2 Orange
2 Orange
3 Apple
3 Apple
3 Orange
I hope to achieve something like this:
ID Fruits
1 Apple
1 Banana
1 Orange
1 Banana1
2 Apple
2 Orange
2 Orange1
3 Apple
3 Apple1
3 Orange
Recommended answer
Setup
import pandas as pd

df = pd.DataFrame({
'id': [1,1,1,1,2,2,2,3,3,3],
'fruit': ['Apple', 'Banana', 'Orange', 'Banana', 'Apple', 'Orange', 'Orange', 'Apple', 'Apple', 'Orange']
})
Option 1
Use cumcount with replace and string concatenation (I use a regex pattern that matches only a lone zero, so this answer also supports more than 9 duplicates per group):
df['fruit'] = df.fruit + df.groupby(
['id', 'fruit']).cumcount().astype(str).replace(
r'^0$', '', regex=True
)
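To see why the pattern is anchored, here is a minimal sketch on made-up data (the `demo` frame and the variable names are assumptions for illustration, not part of the original answer). It compares the anchored pattern with an unanchored one when a group has more than 9 duplicates:

```python
import pandas as pd

# Hypothetical data: one group with 11 identical values, so the counter reaches 10.
demo = pd.DataFrame({'id': [1] * 11, 'fruit': ['Apple'] * 11})

# Running count within each (id, fruit) group, converted to strings "0".."10".
counts = demo.groupby(['id', 'fruit']).cumcount().astype(str)

# Anchored pattern: only the exact string "0" is blanked, so "10" stays intact.
anchored = demo.fruit + counts.replace(r'^0$', '', regex=True)

# Unanchored pattern, for contrast: every "0" character is removed, so "10" becomes "1".
unanchored = demo.fruit + counts.replace(r'0', '', regex=True)

print(anchored.tolist())
print(unanchored.tolist())
```

With the unanchored pattern the last row collides with the second one (both become Apple1), which is the problem the anchors avoid.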
Option 2
Store the groupby result and use boolean indexing with fillna (I personally prefer this approach):
s = df.groupby(['id', 'fruit']).cumcount()
df['fruit'] = (df.fruit + s[s>0].astype(str)).fillna(df.fruit)
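The intermediate values make this approach easier to follow. The sketch below rebuilds the same data and splits the one-liner into steps; the names `suffix`, `combined` and `result` are introduced here only for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    'fruit': ['Apple', 'Banana', 'Orange', 'Banana', 'Apple',
              'Orange', 'Orange', 'Apple', 'Apple', 'Orange'],
})

# Position of each row within its (id, fruit) group: 0 for a first occurrence.
s = df.groupby(['id', 'fruit']).cumcount()
print(s.tolist())  # [0, 0, 0, 1, 0, 0, 1, 0, 1, 0]

# Boolean indexing keeps only the repeated rows (index labels 3, 6 and 8).
suffix = s[s > 0].astype(str)

# Addition aligns on the index; rows missing from `suffix` become NaN.
combined = df.fruit + suffix

# fillna restores the original value for rows that received no suffix.
result = combined.fillna(df.fruit)
print(result.tolist())
```

Because first occurrences never receive a suffix, no string replacement is needed here, which may be part of why this variant reads more directly.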
Both result in:
id fruit
0 1 Apple
1 1 Banana
2 1 Orange
3 1 Banana1
4 2 Apple
5 2 Orange
6 2 Orange1
7 3 Apple
8 3 Apple1
9 3 Orange