Question
Can anybody explain why loc is used in Python pandas, with examples like the one shown below?
for i in range(0, 2):
    for j in range(0, 3):
        df.loc[(df.Age.isnull()) & (df.Gender == i) & (df.Pclass == j+1),
               'AgeFill'] = median_ages[i,j]
Recommended answer
Using .loc is recommended here because the alternative, selecting the rows with df.Age.isnull(), df.Gender == i and df.Pclass == j+1 and then selecting the column in a separate step, may return either a view of the data frame or a copy. pandas then cannot guarantee that the assignment reaches the original data frame.
If you don't use .loc, the row conditions and the column selection are applied as separate indexing operations, one after another, which leads to a problem called chained indexing. When you use .loc, however, the row conditions and the column are passed in a single step, so pandas assigns directly into the original data frame.
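The difference can be seen in a minimal sketch. The toy data below is hypothetical (only the column names come from the question), and the chained form shown is the row-then-column variant, which to my understanding leaves the original data frame untouched because the boolean selection produces a new object:

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical toy data; only the column names come from the question.
df = pd.DataFrame({
    "Age": [22.0, np.nan, 30.0, np.nan],
    "Gender": [0, 0, 1, 1],
    "Pclass": [1, 1, 2, 2],
})
df["AgeFill"] = df["Age"]

mask = df.Age.isnull() & (df.Gender == 0) & (df.Pclass == 1)

# Chained indexing: df[mask] builds a new object first, and the
# assignment lands on that temporary object instead of df.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the chained-assignment warning
    df[mask]["AgeFill"] = 25.0
chained_result = df.loc[1, "AgeFill"]  # value in df after the chained attempt

# Single .loc call: rows and column are resolved together on df itself.
df.loc[mask, "AgeFill"] = 25.0
loc_result = df.loc[1, "AgeFill"]  # value in df after the .loc assignment
```

Comparing chained_result with loc_result shows which of the two assignments actually reached df.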
The simple answer is that while you can often get away with not using .loc and simply typing (for example)
df['Age_fill'][(df.Age.isnull()) & (df.Gender == i) & (df.Pclass == j+1)] \
    = median_ages[i,j]
you'll typically get a SettingWithCopyWarning (the exact warning depends on the pandas version), and your code will be a little messier for it.
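For comparison, here is a self-contained sketch of the same fill written with .loc. The data is hypothetical, and the way median_ages is built is an assumption, since the question does not show how that array was computed:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the real data set is not shown in the question.
df = pd.DataFrame({
    "Age": [20.0, 40.0, np.nan, 10.0, np.nan, 60.0],
    "Gender": [0, 0, 0, 1, 1, 1],
    "Pclass": [1, 1, 1, 3, 3, 3],
})

# Assumed layout: median_ages[i, j] is the median age for
# Gender == i and Pclass == j + 1 (NaN where a group has no rows).
median_ages = np.zeros((2, 3))
for i in range(2):
    for j in range(3):
        group = df.loc[(df.Gender == i) & (df.Pclass == j + 1), "Age"]
        median_ages[i, j] = group.median()

# Copy Age, then overwrite only the missing entries, group by group.
df["AgeFill"] = df["Age"]
for i in range(2):
    for j in range(3):
        missing = df.Age.isnull() & (df.Gender == i) & (df.Pclass == j + 1)
        df.loc[missing, "AgeFill"] = median_ages[i, j]
```

Naming the mask (missing) before passing it to .loc keeps each assignment on one readable line.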
In my experience, .loc took me a while to get my head around, and updating my code was a bit annoying. But it's really simple and intuitive: df.loc[row_indexer, col_indexer].
For more information see the pandas documentation on Indexing and Selecting Data.