Problem description
I have a pandas dataframe df of the following shape: (763, 65)
I use the following code to create 4 new columns:
def myFunc(row):
    # code to get some result from another dataframe
    return result1, result2, result3, result4

df[['col1', 'col2', 'col3', 'col4']] = df.apply(myFunc, axis=1)
The shape of the dataframe which is returned in myFunc is (1, 4). The code runs into the following error:
ValueError: Shape of passed values is (763, 4), indices imply (763, 65)
I know that df has 65 columns and that the data returned from myFunc only has 4 columns. However, I only want to create the 4 new columns (that is, col1, col2, etc.), so in my opinion the code is correct when myFunc returns only 4 columns. What am I doing wrong?
Recommended answer
Demo:
In [40]: df = pd.DataFrame({'a':[1,2,3]})
In [41]: df
Out[41]:
   a
0  1
1  2
2  3
In [42]: def myFunc(row):
    ...:     # code to get some result from another dataframe
    ...:     # NOTE: trick is to return pd.Series()
    ...:     return pd.Series([1,2,3,4]) * row['a']
    ...:
In [44]: df[['col1', 'col2', 'col3','col4']] = df.apply(myFunc, axis=1)
In [45]: df
Out[45]:
   a  col1  col2  col3  col4
0  1     1     2     3     4
1  2     2     4     6     8
2  3     3     6     9    12
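The demo works because, when the function passed to apply(..., axis=1) returns a pd.Series, pandas builds a DataFrame with one column per Series element, which can then be assigned to four column names. If the function has to keep returning a plain tuple, one alternative is the result_type="expand" argument of DataFrame.apply. The following is a minimal sketch under that assumption; my_func and its multipliers are hypothetical stand-ins for the real lookup, not the asker's code.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

def my_func(row):
    # Hypothetical stand-in for the real lookup: returns a plain tuple of 4 values.
    return row["a"], row["a"] * 2, row["a"] * 3, row["a"] * 4

# result_type="expand" turns each list-like result into separate columns,
# giving a DataFrame with columns 0..3 instead of a Series of tuples.
expanded = df.apply(my_func, axis=1, result_type="expand")
expanded.columns = ["col1", "col2", "col3", "col4"]

# Attach the new columns to the original frame by index.
result = df.join(expanded)
print(result)
```

Naming the expanded columns explicitly and joining on the index avoids relying on how multi-column assignment aligns the two frames.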
Disclaimer: try to avoid using .apply(..., axis=1), as it is a for loop under the hood, i.e. it is not vectorized and will run much slower than vectorized Pandas/NumPy ufuncs.
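For the toy calculation in the demo only (each row's a multiplied by 1, 2, 3, 4), the same four columns can be produced without apply. This is a sketch of what a vectorized version might look like for that specific case; the real myFunc is unknown, so the factors array and the column names below are assumptions carried over from the demo.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

# Assumed multipliers, taken from the demo's pd.Series([1, 2, 3, 4]).
factors = np.array([1, 2, 3, 4])
new_cols = ["col1", "col2", "col3", "col4"]

# Broadcasting a (n, 1) column against a (4,) vector gives an (n, 4) array
# in one step, with no Python-level loop over rows.
values = df["a"].to_numpy()[:, None] * factors

result = df.join(pd.DataFrame(values, columns=new_cols, index=df.index))
print(result)
```

Whether the actual computation can be rewritten this way depends on what myFunc looks up in the other dataframe; a merge or a join is often the vectorized counterpart of a per-row lookup.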
PS: if you provide details of what you are trying to calculate in the myFunc function, we could try to find a vectorized solution...