Problem description
I have a pandas data frame with multiple columns. I want to create a new column weighted_sum from the values in each row and another column-vector data frame, weight.
weighted_sum should have the following value:
row[weighted_sum] = row[col0]*weight[0] + row[col1]*weight[1] + row[col2]*weight[2] + ...
I found the function sum(axis=1), but it doesn't let me multiply with weight.
weight looks like this:
          0
col1    0.5
col2    0.3
col3    0.2
df looks like this:
col1    col2    col3
1.0     2.2     3.5
6.1     0.4     1.2
df*weight returns a dataframe full of NaN values.

Edit: I changed things a bit. The problem is that you're multiplying a frame with a frame of a different size with a different row index. Here's the solution:
# assumes: from pandas import DataFrame, Series
In [121]: df = DataFrame([[1, 2.2, 3.5], [6.1, 0.4, 1.2]], columns=list('abc'))
In [122]: weight = DataFrame(Series([0.5, 0.3, 0.2], index=list('abc'), name=0))
In [123]: df
Out[123]:
      a     b     c
0  1.00  2.20  3.50
1  6.10  0.40  1.20
In [124]: weight
Out[124]:
      0
a  0.50
b  0.30
c  0.20
In [125]: df * weight
Out[125]:
     0    a    b    c
0  nan  nan  nan  nan
1  nan  nan  nan  nan
a  nan  nan  nan  nan
b  nan  nan  nan  nan
c  nan  nan  nan  nan
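The all-NaN result comes from label alignment: arithmetic between two DataFrames matches both row labels and column labels, and these two frames share none. The following is a minimal sketch of that behaviour using the same data as the session; the names product, all_nan and scaled are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame([[1, 2.2, 3.5], [6.1, 0.4, 1.2]], columns=list("abc"))
weight = pd.DataFrame(pd.Series([0.5, 0.3, 0.2], index=list("abc"), name=0))

# DataFrame * DataFrame aligns on BOTH row labels and column labels.
# df has rows 0, 1 and columns a, b, c; weight has rows a, b, c and column 0.
# No (row, column) pair exists in both frames, so every cell is NaN.
product = df * weight
all_nan = bool(product.isna().all().all())
print(product.shape, all_nan)

# DataFrame * Series aligns the Series index with the frame's columns,
# so each column is scaled by its own weight.
scaled = df * weight[0]
print(scaled)
```

Selecting the single column with weight[0] turns the weights into a Series, which is why the multiplication then lines up with the columns of df.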
You can either access the column:
In [126]: df * weight[0]
Out[126]:
      a     b     c
0  0.50  0.66  0.70
1  3.05  0.12  0.24
In [128]: (df * weight[0]).sum(1)
Out[128]:
0    1.86
1    3.41
dtype: float64
Or use dot to get back another DataFrame:
In [127]: df.dot(weight)
Out[127]:
      0
0  1.86
1  3.41
To bring it all together:
In [130]: df['weighted_sum'] = df.dot(weight)
In [131]: df
Out[131]:
      a     b     c  weighted_sum
0  1.00  2.20  3.50          1.86
1  6.10  0.40  1.20          3.41
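For readers who want to run this outside an IPython session, here is a self-contained sketch of both approaches with explicit imports. It uses the same data as above; the variable names by_mul_sum and by_dot are illustrative, and passing the Series weight[0] to dot (rather than the one-column DataFrame) is a small variation that returns a Series.

```python
import pandas as pd

df = pd.DataFrame([[1, 2.2, 3.5], [6.1, 0.4, 1.2]], columns=list("abc"))
weight = pd.DataFrame(pd.Series([0.5, 0.3, 0.2], index=list("abc"), name=0))

# Option 1: multiply by the weight column (a Series), then sum across each row.
by_mul_sum = (df * weight[0]).sum(axis=1)

# Option 2: matrix product. Passing the Series gives a Series back;
# passing the one-column DataFrame gives a one-column DataFrame back.
by_dot = df.dot(weight[0])

# Both results are computed before the new column is added,
# so weighted_sum is not fed back into the calculation.
df["weighted_sum"] = by_dot
print(df)
```

Both options give the same values for this data, so the choice mostly comes down to speed and how missing values should be treated.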
Here are the timeits of each method, using a larger DataFrame.
# assumes: from numpy.random import randn
In [145]: df = DataFrame(randn(10000000, 3), columns=list('abc'))
In [146]: weight = DataFrame(Series([0.5, 0.3, 0.2], index=list('abc'), name=0))
In [147]: timeit df.dot(weight)
10 loops, best of 3: 57.5 ms per loop
In [148]: timeit (df * weight[0]).sum(1)
10 loops, best of 3: 125 ms per loop
For a wide DataFrame:
In [162]: df = DataFrame(randn(10000, 1000))
In [163]: weight = DataFrame(randn(1000, 1))
In [164]: timeit df.dot(weight)
100 loops, best of 3: 5.14 ms per loop
In [165]: timeit (df * weight[0]).sum(1)
10 loops, best of 3: 41.8 ms per loop
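The timings above depend on the machine and the pandas version, so they are worth re-measuring locally. The sketch below is one way to do that with the standard-library timeit module instead of the IPython magic. The frame size, the seed, and the repeat count are arbitrary choices made so the script finishes quickly; they are not taken from the original answer.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
# Much smaller than the 10,000,000-row frame in the session so it runs quickly.
df = pd.DataFrame(rng.standard_normal((100_000, 3)), columns=list("abc"))
weight = pd.DataFrame(pd.Series([0.5, 0.3, 0.2], index=list("abc"), name=0))

loops = 10
t_dot = timeit.timeit(lambda: df.dot(weight), number=loops)
t_mul = timeit.timeit(lambda: (df * weight[0]).sum(axis=1), number=loops)
print(f"dot:              {t_dot / loops * 1e3:.2f} ms per loop")
print(f"multiply-and-sum: {t_mul / loops * 1e3:.2f} ms per loop")

# Sanity check that both methods agree on this NaN-free data.
same = np.allclose(df.dot(weight)[0], (df * weight[0]).sum(axis=1))
print(same)
```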
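So, dot is faster and more readable.
NOTE: If any of your data contain NaNs, then you should not use dot; you should use the multiply-and-sum method. dot cannot handle NaNs since it is just a thin wrapper around numpy.dot() (which doesn't handle NaNs).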
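To make the NaN caveat concrete, here is a small sketch comparing the two methods on a frame with one missing value. The data is made up for illustration, and weight is kept as a plain Series for brevity. Note that the multiply-and-sum result treats the missing cell as contributing nothing, which may or may not be what you want.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the same frame as above with one value replaced by NaN.
df = pd.DataFrame([[1, np.nan, 3.5], [6.1, 0.4, 1.2]], columns=list("abc"))
weight = pd.Series([0.5, 0.3, 0.2], index=list("abc"))

# dot propagates the NaN, so the whole first row's result is NaN.
via_dot = df.dot(weight)

# sum skips NaN by default (skipna=True), so the missing cell contributes 0.
via_sum = (df * weight).sum(axis=1)

print(via_dot)
print(via_sum)
```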