Question
This has been discussed before, but with conflicting answers:
What I'd like to know is:
- Why is inplace = False the default behavior?
- When is it good to change it? (Well, I'm allowed to change it, so I guess there's a reason.)
- Is this a safety issue? That is, can an operation fail or misbehave due to inplace = True?
- Can I know in advance whether a certain inplace = True operation will "really" be carried out in-place?
- Many Pandas operations have an inplace parameter, always defaulting to False, meaning the original DataFrame is untouched and the operation returns a new DF.
- When setting inplace = True, the operation might work on the original DF, but it might still work on a copy behind the scenes and just reassign the reference when done.
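A minimal sketch of the two calling styles described above, using a small made-up DataFrame (the data and variable names are illustrative assumptions). It shows only the observable effect on the original object, not whether Pandas makes a copy internally.

```python
import pandas as pd

# Hypothetical example data: one row contains a missing value.
df = pd.DataFrame({"a": [1.0, None, 3.0], "b": ["x", "y", "z"]})

# Default (inplace=False): the original is untouched and a new DataFrame is returned.
cleaned = df.dropna()
rows_before = len(df)        # still counts the row with the missing value
rows_cleaned = len(cleaned)  # the returned object has that row removed

# inplace=True: the object bound to `df` itself is modified.
df.dropna(inplace=True)
rows_after = len(df)

print(rows_before, rows_cleaned, rows_after)
```

Either style ends with the same data; the difference is whether the caller has to bind the result to a name.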
- Can be both faster and less memory hogging (the first link shows reset_index() runs twice as fast and uses half the peak memory!).
- Allows chained/functional syntax: df.dropna().rename().sum()... which is nice, and offers a chance for lazy evaluation or a more efficient re-ordering (though I don't think Pandas is doing this).
- When using inplace = True on an object which is potentially a slice/view of an underlying DF, Pandas has to do a SettingWithCopy check, which is expensive. inplace = False avoids this.
- Consistent and predictable behavior behind the scenes.
So, putting the copy-vs-view issue aside, it seems more performant to always use inplace = True, unless specifically writing a chained statement. But that's not the default Pandas opts for, so what am I missing?
Recommended answer
Yes, it is. Not just harmful, but quite harmful. This GitHub issue proposes that the inplace argument be deprecated API-wide sometime in the near future. In a nutshell, here is everything wrong with the inplace argument:
- inplace, contrary to what the name implies, often does not prevent copies from being created, and (almost) never offers any performance benefits
- inplace does not work with method chaining
- inplace can lead to the dreaded SettingWithCopyWarning when called on a DataFrame column, and may sometimes fail to update the column in-place
The pain points above are all common pitfalls for beginners, so removing this option will simplify the API greatly.
Let's take a look at the points above in more depth.
Performance
It is a common misconception that using inplace=True will lead to more efficient or optimized code. In general, there are no performance benefits to using inplace=True. Most in-place and out-of-place versions of a method create a copy of the data anyway, with the in-place version automatically assigning the copy back. The copy cannot be avoided.
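One way to check the performance claim on your own machine is to time both variants directly. The sketch below is an assumption-laden illustration, not a rigorous benchmark: the helper name time_reset_index, the data size, and the repeat count are all made up for this example, and results will vary with the Pandas version and hardware.

```python
import time

import pandas as pd

# Hypothetical data; the size is arbitrary and chosen only for illustration.
base = pd.DataFrame({"a": range(100_000), "b": range(100_000)}).set_index("a")


def time_reset_index(inplace, repeats=20):
    """Return (total seconds, last result) for reset_index over `repeats` runs."""
    total = 0.0
    result = None
    for _ in range(repeats):
        tmp = base.copy()  # fresh starting point, excluded from the timing
        start = time.perf_counter()
        if inplace:
            tmp.reset_index(inplace=True)
            result = tmp
        else:
            result = tmp.reset_index()
        total += time.perf_counter() - start
    return total, result


t_out, res_out = time_reset_index(inplace=False)
t_in, res_in = time_reset_index(inplace=True)
print(f"out-of-place: {t_out:.4f}s  in-place: {t_in:.4f}s")
```

Both variants produce the same DataFrame, so any difference in the printed timings reflects only how the operation was invoked.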
Method Chaining
inplace=True also hinders method chaining. Contrast
result = df.some_function1().reset_index().some_function2()
as opposed to
temp = df.some_function1()
temp.reset_index(inplace=True)
result = temp.some_function2()
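The same contrast with real methods in place of the some_function placeholders; the data and column names are made up for illustration. Both forms are a sketch of the idea and compute the same result.

```python
import pandas as pd

# Hypothetical data for illustration; one key is missing.
df = pd.DataFrame({"k": ["p", "q", "p", None], "v": [1, 2, 3, 4]})

# Chained: each call returns a new object that feeds the next call.
chained = df.dropna().groupby("k").sum().reset_index()

# Stepwise with inplace=True: the chain has to be broken into statements.
temp = df.dropna()
temp = temp.groupby("k").sum()
temp.reset_index(inplace=True)
stepwise = temp

print(chained.equals(stepwise))
```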
Unintended Pitfalls
The final caveat to keep in mind is that calling inplace=True can trigger the SettingWithCopyWarning:
import pandas as pd

df = pd.DataFrame({'a': [3, 2, 1], 'b': ['x', 'y', 'z']})
df2 = df[df['a'] > 1]
df2['b'].replace({'x': 'abc'}, inplace=True)
# SettingWithCopyWarning:
# A value is trying to be set on a copy of a slice from a DataFrame
This can cause unexpected behavior.
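One common way to sidestep this pitfall, sketched here as a suggestion rather than as part of the original answer, is to take an explicit copy of the filtered frame and assign the result back instead of passing inplace=True:

```python
import pandas as pd

df = pd.DataFrame({'a': [3, 2, 1], 'b': ['x', 'y', 'z']})

# Take an explicit copy so df2 is unambiguously its own object.
df2 = df[df['a'] > 1].copy()

# Assign the replaced column back instead of relying on inplace=True.
df2['b'] = df2['b'].replace({'x': 'abc'})

print(df2)
```

With this form it is clear which object is updated: df2 gets the replaced values and the original df is left as it was.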