pandas apply: returning multiple columns per row instead of a list
Question
I am having trouble getting pandas to return multiple columns when using apply.
Example:
import pandas as pd
import numpy as np
np.random.seed(1)
df = pd.DataFrame(index=range(2), columns=['a', 'b'])
df.loc[0] = [np.array((1,2,3))], 1
df.loc[1] = [np.array((4,5,6))], 1
df
             a  b
0  [[1, 2, 3]]  1
1  [[4, 5, 6]]  1
df2 = np.random.randint(1,9, size=(3,2))
df2
array([[4, 6],
       [8, 1],
       [1, 2]])
def example(x):
    return np.transpose(df2) @ x[0]
df3 = df['a'].apply(example)
df3
0 [23, 14]
1 [62, 41]
I want df3 to have two columns, with one element per column in each row, rather than a single column that holds both elements in each row.
So I want something like this:
df3Wanted
   col1  col2
0    23    14
1    62    41
Does anybody know how to fix this?
Recommended answer
A couple of changes are required to achieve this:
Update the function as follows, so that it wraps its result in a list:
def example(x):
    return [np.transpose(df2) @ x[0]]
Then, with df3 recomputed using the updated function, perform the following operation on df3:
wantedDF3 = pd.concat(df3.apply(pd.DataFrame, columns=['col1','col2']).tolist())
print(wantedDF3)
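This gives the desired output (the values differ from those in the question, presumably because df2 held different random values when this was run):
   col1  col2
0    40    12
0    97    33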
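For reference, here is a self-contained sketch of the approach above. It makes two assumptions that are not in the original answer: df2 is hard-coded to the values printed in the question, so the result does not depend on the random seed, and the input is built as a plain Series of arrays rather than the question's nested column. It also passes ignore_index=True to pd.concat, which is an addition; without it every row keeps the label 0, as in the output above.

```python
import numpy as np
import pandas as pd

# Assumption: df2 is fixed to the values printed in the question.
df2 = np.array([[4, 6], [8, 1], [1, 2]])

# Assumption: one array per row, simplified from the question's nested column.
s = pd.Series([np.array([1, 2, 3]), np.array([4, 5, 6])], name='a')

def example(x):
    # Wrap the result in a list so pd.DataFrame treats it as a single row.
    return [df2.T @ x]

df3 = s.apply(example)

# Build one single-row DataFrame per element, then stack them.
frames = df3.apply(pd.DataFrame, columns=['col1', 'col2']).tolist()
wantedDF3 = pd.concat(frames, ignore_index=True)  # row labels 0, 1 instead of 0, 0
print(wantedDF3)
```

With these inputs the result matches the table requested in the question (23, 14 and 62, 41).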
Another way to do the same thing, which avoids memory errors: keep your example function and df3 as they are (same as in the question), and then use the code below to generate wantedDF3.
col1df = pd.DataFrame(df3.apply(lambda x: x[0]).values, columns=['col1'])
col2df = pd.DataFrame(df3.apply(lambda x: x[1]).values, columns=['col2'])
wantedDF3 = col1df.join(col2df)
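A more compact variant, offered as a sketch rather than as part of the original answer: since each element of df3 is a length-2 array, the Series can be turned into a list of rows and handed to the DataFrame constructor in one step. As assumptions for reproducibility, df2 is hard-coded to the values printed in the question and the input is a plain Series of arrays.

```python
import numpy as np
import pandas as pd

# Assumption: df2 is fixed to the values printed in the question.
df2 = np.array([[4, 6], [8, 1], [1, 2]])

# Assumption: one array per row, simplified from the question's nested column.
s = pd.Series([np.array([1, 2, 3]), np.array([4, 5, 6])], name='a')

def example(x):
    # Same computation as in the question: a length-2 array per row.
    return df2.T @ x

df3 = s.apply(example)  # Series whose elements are length-2 arrays

# Each array becomes one row; the original index is preserved.
wantedDF3 = pd.DataFrame(df3.tolist(), index=df3.index, columns=['col1', 'col2'])
print(wantedDF3)
```

This builds the frame in a single constructor call instead of extracting each column with a separate apply.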