Pandas DataFrame: apply a function to multiple columns and output multiple columns


Question

I have been scouring SO for the best way to apply a function that takes multiple separate Pandas DataFrame columns and outputs multiple new columns in that same DataFrame. Let's say I have the following:

def apply_func_to_df(df):
    df[['new_A', 'new_B']] = df.apply(lambda x: transform_func(x['A'], x['B'], x['C']), axis=1)

def transform_func(value_A, value_B, value_C):
    # do some processing and transformation and stuff
    return new_value_A, new_value_B

I am trying to apply this function, as shown above, to the whole DataFrame df in order to output 2 new columns. More generally, the use case is a function that takes n DataFrame columns and writes m new columns to the same DataFrame.

The following are things I have been looking at (with varying degrees of success):

- Create a Pandas Series for the function call, then append it to the existing DataFrame.
- Zip the output columns (but some issues come up in my current implementation).
- Rewrite the basic function transform_func to explicitly expect a row with fields A, B, C as follows, then apply it to the df:

def transform_func_mod(df_row):
    # do something with df_row['A'], df_row['B'], df_row['C']
    return new_value_A, new_value_B
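
For reference, the zip approach from the list above could look roughly like the sketch below. The body of `transform_func` and the sample data are made up for illustration; only the function's shape (three values in, two values out) comes from the question.

```python
import pandas as pd

def transform_func(value_A, value_B, value_C):
    # Hypothetical body: the question does not say what the real processing is.
    return value_A + value_B, value_C * 2

# Made-up sample data.
df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30], 'C': [100, 200, 300]})

# Call the function once per row, collecting one (new_A, new_B) tuple per row.
results = [transform_func(a, b, c) for a, b, c in zip(df['A'], df['B'], df['C'])]

# Transpose the list of row tuples into one sequence per output column.
df['new_A'], df['new_B'] = zip(*results)
```

This keeps the original three-argument signature and avoids building a Series per row, at the cost of naming the input columns in the `zip` call.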

I would like a very general and Pythonic way to accomplish this task, while taking performance into account (both memory and time). I would appreciate any input, as my unfamiliarity with Pandas has made this a struggle.

Recommended answer

Write your transform_func the following way:

- it should have one parameter: the current row,
- it can read individual columns from the current row and make any use of them,
- the returned object should be a Series with:
  - values: whatever you want to return,
  - index: the target column names.

Example: assuming that all 3 columns are of string type, concatenate columns A and B, and append a fixed string to C:
```
import pandas as pd

def transform_func(row):
    # The Series index supplies the names of the new columns
    return pd.Series([row.A + row.B, row.C + '_xx'], index=['new_A', 'new_B'])
```
To get only the new values, apply this function to each row:
```
df.apply(transform_func, axis=1)
```
Note that the resulting DataFrame retains the index of the original rows (we will make use of this in a moment).
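
A small end-to-end run of this step might look like the following. The sample data and the row labels `r1`, `r2` are invented for illustration; the non-default index is there only to make it visible that the result keeps the original row labels.

```python
import pandas as pd

def transform_func(row):
    # One row in, one labelled Series out; the labels become column names.
    return pd.Series([row['A'] + row['B'], row['C'] + '_xx'],
                     index=['new_A', 'new_B'])

# Made-up sample data with a non-default index.
df = pd.DataFrame(
    {'A': ['a1', 'a2'], 'B': ['b1', 'b2'], 'C': ['c1', 'c2']},
    index=['r1', 'r2'],
)

# axis=1 passes each row to transform_func; the returned Series are
# stacked into a new DataFrame with columns new_A and new_B.
new_cols = df.apply(transform_func, axis=1)
print(new_cols)
```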
Or, if you want to add these new columns to your DataFrame, join your df with the result of the above application and save the join result back under the original df:
```
df = df.join(df.apply(transform_func, axis=1))
```
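
If you would rather keep the three-argument, tuple-returning signature from the question, one alternative (not part of the answer above) is to pass `result_type='expand'` to `DataFrame.apply`, which spreads list-like return values into separate columns. The sketch below assumes made-up string data and an illustrative function body, then joins the result on the shared index in the same way.

```python
import pandas as pd

def transform_func(value_A, value_B, value_C):
    # Illustrative body only; it mirrors the string example in this answer.
    return value_A + value_B, value_C + '_xx'

# Made-up sample data.
df = pd.DataFrame({'A': ['a1', 'a2'], 'B': ['b1', 'b2'], 'C': ['c1', 'c2']})

# result_type='expand' turns each returned tuple into one row of a new
# DataFrame whose columns are numbered 0, 1, ...
expanded = df.apply(
    lambda row: transform_func(row['A'], row['B'], row['C']),
    axis=1,
    result_type='expand',
)
expanded.columns = ['new_A', 'new_B']

# The expanded frame shares df's index, so join aligns the rows.
df = df.join(expanded)
```

Note that `join` raises an error if the new column names already exist in df, so rerunning the last line on an already extended df needs the old columns dropped first.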
Edit following the comment as of 03:36:34Z
Using zip is probably the slowest option. A row-based function should be quicker, and it is a more intuitive construction, although the ranking is worth measuring on your own data rather than assuming. Probably the quickest way is to write 2 vectorized expressions, one for each output column. In this case, something like:
```
df['new_A'] = df.A + df.B
df['new_B'] = df.C + '_xx'
```
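
To check that the two vectorized expressions reproduce the row-based function, a quick comparison on made-up data might look like this. The variable names `row_based`, `vectorized`, and `same` are introduced here only for the illustration.

```python
import pandas as pd

# Made-up sample data.
df = pd.DataFrame({'A': ['a1', 'a2'], 'B': ['b1', 'b2'], 'C': ['c1', 'c2']})

def transform_func(row):
    return pd.Series([row['A'] + row['B'], row['C'] + '_xx'],
                     index=['new_A', 'new_B'])

# Row-based result: one Python function call per row.
row_based = df.apply(transform_func, axis=1)

# Vectorized result: one whole-column expression per output column.
vectorized = pd.DataFrame({
    'new_A': df['A'] + df['B'],
    'new_B': df['C'] + '_xx',
})

# Element-wise comparison; both frames share the same index and columns.
same = bool((row_based == vectorized).all().all())
print(same)
```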
In general, though, the question is whether a row-based function can be expressed as vectorized expressions (as I did above). If it cannot, fall back to applying a row-based function.
To compare the speed of each solution, use %timeit.
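
Outside IPython, where `%timeit` is not available, the standard-library `timeit` module can do the same job. The sketch below times three variants on made-up data; the helper names `row_based`, `zipped`, and `vectorized` and the row count are assumptions chosen for the illustration, and the absolute numbers will depend on your machine, data, and pandas version.

```python
import timeit
import pandas as pd

# Made-up data: 1,000 rows of short strings.
n = 1000
df = pd.DataFrame({
    'A': [f'a{i}' for i in range(n)],
    'B': [f'b{i}' for i in range(n)],
    'C': [f'c{i}' for i in range(n)],
})

def row_based(frame):
    # apply with a function that returns a labelled Series per row
    return frame.apply(
        lambda row: pd.Series([row['A'] + row['B'], row['C'] + '_xx'],
                              index=['new_A', 'new_B']),
        axis=1,
    )

def zipped(frame):
    # plain Python loop over the zipped input columns
    pairs = [(a + b, c + '_xx')
             for a, b, c in zip(frame['A'], frame['B'], frame['C'])]
    return pd.DataFrame(pairs, columns=['new_A', 'new_B'], index=frame.index)

def vectorized(frame):
    # one whole-column expression per output column
    return pd.DataFrame({'new_A': frame['A'] + frame['B'],
                         'new_B': frame['C'] + '_xx'})

timings = {}
for func in (row_based, zipped, vectorized):
    # Best of 3 single runs of each variant.
    timings[func.__name__] = min(
        timeit.repeat(lambda: func(df), number=1, repeat=3)
    )

for name, seconds in timings.items():
    print(f'{name}: {seconds:.4f} s')
```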
  