Problem description
I have a pandas DataFrame, df, like:
name | grade | grade_type
---------------------------
sarah | B | letter
alice | A | letter
eliza | C | letter
beth | 76 | numeral
jones | 90 | numeral
All values in df are strings, including the numbers. I want to convert the numeric grade values into letters, based on checking the grade_type column, to get:
name | grade | grade_type
---------------------------
sarah | B | letter
alice | A | letter
eliza | C | letter
beth | B | numeral
jones | A | numeral
For completeness, the numeral-to-letter grade conversions are:
A: grade > 80
B: 70 < grade <= 80
C: 60 < grade <= 70
Why doesn't this work?
for index, row in df.iterrows():
    if row.grade_type == "numeral":
        grade_val = int(row.grade.values[0])
        if grade_val > 80:
            row.grade = "A"  # This assignment doesn't update row.grade!
        elif...
The alternative is using df.apply(...lambda:...), but I'm not too sure how to pull that off, since we have to check the grade_type column before deciding whether or not to update the grade value.
Recommended answer
Your DataFrame doesn't update because the rows returned by iterrows() are generally copies rather than views, so the assignment changes the copy and not df itself.
You can use the index returned by iterrows() and modify the DataFrame directly:
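for index, row in df.iterrows():
    # Only numeral rows can be parsed as integers
    if row.grade_type == 'numeral':
        # row.grade is already a scalar string, so no .values[0] is needed
        grade_val = int(row.grade)
        if grade_val > 80:
            # Write to df itself, not to the row copy
            df.loc[index, 'grade'] = 'A'
        ...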
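For reference, here is a minimal runnable sketch of the loop above with the branches filled in from the conversion table in the question. The sample data mirrors the question's table. One assumption is mine: the question gives no letter for scores of 60 or below, so this sketch leaves those values unchanged.

```python
import pandas as pd

# Sample data from the question; every value is a string.
df = pd.DataFrame({
    "name": ["sarah", "alice", "eliza", "beth", "jones"],
    "grade": ["B", "A", "C", "76", "90"],
    "grade_type": ["letter", "letter", "letter", "numeral", "numeral"],
})

for index, row in df.iterrows():
    if row["grade_type"] != "numeral":
        continue  # letter grades are left untouched
    grade_val = int(row["grade"])
    # Assign through df.loc so the change lands in df, not in the row copy.
    if grade_val > 80:
        df.loc[index, "grade"] = "A"
    elif grade_val > 70:
        df.loc[index, "grade"] = "B"
    elif grade_val > 60:
        df.loc[index, "grade"] = "C"
    # Assumption: scores of 60 or below have no letter in the question,
    # so they are left as-is.

print(df)
```

Note that grade_type is not changed by this loop, which matches the desired output in the question.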
Or, as you said, you can use df.apply() and pass it a custom function:
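def get_grades(x):
    # Letter grades pass through unchanged
    if x['grade_type'] == 'letter':
        return x['grade']
    # Grades are stored as strings, so convert before comparing
    grade_val = int(x['grade'])
    if grade_val > 80:
        return "A"
    ...
df['grade'] = df.apply(get_grades, axis=1)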
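A complete version of the apply() approach might look like the sketch below. The thresholds come from the question's conversion table; returning the original value for scores of 60 or below is my assumption, since the question does not define a letter for them.

```python
import pandas as pd

def get_grades(row):
    """Return the letter grade for one row (a Series, because axis=1)."""
    if row["grade_type"] == "letter":
        return row["grade"]
    grade_val = int(row["grade"])  # grades are stored as strings
    if grade_val > 80:
        return "A"
    if grade_val > 70:
        return "B"
    if grade_val > 60:
        return "C"
    # Assumption: no letter is defined for 60 or below, so keep the value.
    return row["grade"]

df = pd.DataFrame({
    "name": ["sarah", "alice", "eliza", "beth", "jones"],
    "grade": ["B", "A", "C", "76", "90"],
    "grade_type": ["letter", "letter", "letter", "numeral", "numeral"],
})

# axis=1 passes each row to get_grades; the result replaces the column.
df["grade"] = df.apply(get_grades, axis=1)
print(df)
```

Because apply() returns a new Series that is assigned back to the column, there is no copy-versus-view problem here.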
You can also use if/else in your lambda to check whether x['grade_type'] is 'numeral', as follows. Use whichever version looks easier to read.
def get_grades(grade_val):
    # Grades are stored as strings, so convert before comparing
    grade_val = int(grade_val)
    if grade_val > 80:
        return "A"
    ...
df['grade'] = df.apply(lambda x: get_grades(x['grade'])
                       if x['grade_type'] == 'numeral' else x['grade'], axis=1)
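If the row-by-row approaches feel slow on a large frame, a vectorized variant is also possible. This is a different technique from the ones in the answer, not a restatement of it: it selects the numeral rows with a boolean mask and bins them with pd.cut. The bin edges are taken from the question's conversion table; scores of 60 or below fall outside every bin and would become NaN, which is a behavior of this sketch and not something the question specifies.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["sarah", "alice", "eliza", "beth", "jones"],
    "grade": ["B", "A", "C", "76", "90"],
    "grade_type": ["letter", "letter", "letter", "numeral", "numeral"],
})

# Boolean mask that picks out only the rows holding numeric grades.
mask = df["grade_type"] == "numeral"
scores = df.loc[mask, "grade"].astype(int)

# pd.cut uses right-closed intervals by default:
# (60, 70] -> C, (70, 80] -> B, (80, inf] -> A
letters = pd.cut(scores, bins=[60, 70, 80, float("inf")], labels=["C", "B", "A"])

# Convert from categorical back to plain objects before writing into df.
df.loc[mask, "grade"] = letters.astype(object)
print(df)
```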