Problem description
Suppose I have this:
>>> import pandas
>>> x = pandas.DataFrame([[1.0, 2.0, 3.0], [3, 4, 5]], columns=["A", "B", "C"])
>>> print(x)
   A  B  C
0  1  2  3
1  3  4  5
Now I want to normalize x by row, that is, divide each row by its sum. As described in this question, this can be achieved with x = x.div(x.sum(axis=1), axis=0). However, this creates a new DataFrame. If my DataFrame is large, creating this new DataFrame can consume a lot of memory, even though I immediately assign it to the original name.
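For reference, a minimal sketch of the out-of-place version described above. The names row_sums, normalized and is_new_object are only illustrative; the point is that div returns a second DataFrame and leaves x untouched.

```python
import pandas as pd

x = pd.DataFrame([[1.0, 2.0, 3.0], [3, 4, 5]], columns=["A", "B", "C"])

# Sum across the columns: one value per row, indexed like x's rows.
row_sums = x.sum(axis=1)

# axis=0 aligns row_sums with the row index, so each row is divided by its own sum.
normalized = x.div(row_sums, axis=0)

# div returns a new DataFrame; x itself still holds the original values.
is_new_object = normalized is not x
```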
Is there an efficient way to perform this operation in place? I want something like x.idiv() that provides the axis option of div but updates x in place. For this specific case I need division, but it would also be nice to have similar in-place versions of all the basic operations.
(I can update it in place by iterating over it row-wise and assigning each normalized row back into the original, but this is slow, and I'm looking for a more efficient solution.)
Recommended answer
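You can do this directly in numpy (without creating a copy). This relies on x.values returning a view of the DataFrame's underlying data, which is the case when all columns share a single dtype, as they do here: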
In [11]: x1 = x.values.T

In [12]: x1
Out[12]:
array([[ 1.,  3.],
       [ 2.,  4.],
       [ 3.,  5.]])

In [13]: x1 /= x1.sum(0)

In [14]: x
Out[14]:
          A         B         C
0  0.166667  0.333333  0.500000
1  0.250000  0.333333  0.416667
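The same idea can be sketched without the transpose by letting numpy broadcast a column of row sums and write the result back into the same buffer. This is only a sketch under assumptions: normalize_rows_inplace is a hypothetical helper, not a pandas or numpy function, and whether x.values is a writable view depends on the pandas version and on the frame having a single dtype. With Copy-on-Write enabled, pandas hands out read-only arrays, so the code falls back to the copying div in that case.

```python
import numpy as np
import pandas as pd


def normalize_rows_inplace(arr):
    """Hypothetical helper: divide each row of a 2-D float array by its sum, in place."""
    # keepdims=True gives shape (n_rows, 1), so it broadcasts across the columns.
    row_sums = arr.sum(axis=1, keepdims=True)
    # out=arr stores the quotient in arr's own memory instead of a new array.
    np.divide(arr, row_sums, out=arr)
    return arr


x = pd.DataFrame([[1.0, 2.0, 3.0], [3, 4, 5]], columns=["A", "B", "C"])
values = x.values

# Assumption check: only mutate if the array is writable and really is a view
# of the DataFrame's data rather than a fresh copy.
if values.flags.writeable and np.shares_memory(values, x.values):
    normalize_rows_inplace(values)
    updated_in_place = True
else:
    # Fallback: the ordinary out-of-place division.
    x = x.div(x.sum(axis=1), axis=0)
    updated_in_place = False
```

Either branch leaves x with rows that sum to 1; updated_in_place records whether the copy was actually avoided on the installed pandas version.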
Maybe div should have an in-place flag?