pandas DataFrame.assign arguments

Problem description

Question

How can assign be used to return a copy of the original DataFrame with multiple new columns added?

Desired result

>>> import pandas as pd
>>> df = pd.DataFrame({'A': range(1, 5), 'B': range(11, 15)})
>>> df.assign({'C': df.A.apply(lambda x: x ** 2), 'D': df.B * 2})
   A   B   C   D
0  1  11   1  22
1  2  12   4  24
2  3  13   9  26
3  4  14  16  28

Attempt

The example above results in:

ValueError: Wrong number of items passed 2, placement implies 1.
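In current pandas releases the signature is still assign(self, **kwargs), so a dict passed positionally has no parameter to bind to and the call is rejected before any column is computed. A quick check, assuming a recent pandas install (the exact message differs between versions, so only the exception type is inspected):

```python
import pandas as pd

df = pd.DataFrame({'A': range(1, 5), 'B': range(11, 15)})

try:
    # assign only takes **kwargs, so a positional dict cannot be bound to anything.
    df.assign({'C': df.A ** 2, 'D': df.B * 2})
    raised = None
except TypeError as exc:
    raised = exc

print(type(raised).__name__)
```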

Background

The assign function in pandas returns a copy of the DataFrame with the newly assigned column added, e.g.

df = df.assign(C=df.B * 2)
>>> df
   A   B   C
0  1  11  22
1  2  12  24
2  3  13  26
3  4  14  28
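One way to confirm the "copy" part is to keep a reference to the input and compare column lists afterwards. A small sketch; the names original and result are illustrative only:

```python
import pandas as pd

original = pd.DataFrame({'A': range(1, 5), 'B': range(11, 15)})

# assign builds and returns a new DataFrame; the caller's object is left unchanged.
result = original.assign(C=original.B * 2)

print(list(original.columns))  # ['A', 'B']
print(list(result.columns))    # ['A', 'B', 'C']
print(result['C'].tolist())    # [22, 24, 26, 28]
```

This is why the example above rebinds the result with df = df.assign(...).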

The 0.19.2 documentation for this function implies that more than one column can be added to the dataframe.

Furthermore:

keywords are the column names.

The source code for the function states that it accepts a dictionary:

def assign(self, **kwargs):
    """
    .. versionadded:: 0.16.0
    Parameters
    ----------
    kwargs : keyword, value pairs
        keywords are the column names. If the values are callable, they are computed 
        on the DataFrame and assigned to the new columns. If the values are not callable, 
        (e.g. a Series, scalar, or array), they are simply assigned.

    Notes
    -----
    Since ``kwargs`` is a dictionary, the order of your
    arguments may not be preserved. To make things predictable,
    the columns are inserted in alphabetical order, at the end of
    your DataFrame. Assigning multiple columns within the same
    ``assign`` is possible, but you cannot reference other columns
    created within the same ``assign`` call.
    """

    data = self.copy()

    # do all calculations first...
    results = {}
    for k, v in kwargs.items():

        if callable(v):
            results[k] = v(data)
        else:
            results[k] = v

    # ... and then assign
    for k, v in sorted(results.items()):
        data[k] = v

    return data
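To see what that logic implies, the two loops can be lifted into a standalone function. assign_like_0_19 below is a hypothetical helper that mirrors the quoted 0.19.2 code, not a pandas API. It shows the alphabetical insertion order, and why a value cannot refer to a column created in the same call: every value is evaluated against the copy before anything is assigned.

```python
import pandas as pd


def assign_like_0_19(df, **kwargs):
    """Hypothetical helper mirroring the quoted 0.19.2 assign logic."""
    data = df.copy()
    # First pass: evaluate every value (callables are called on the copy).
    results = {k: (v(data) if callable(v) else v) for k, v in kwargs.items()}
    # Second pass: insert the new columns in alphabetical order of their names.
    for k, v in sorted(results.items()):
        data[k] = v
    return data


df = pd.DataFrame({'A': range(1, 5), 'B': range(11, 15)})

# Z is passed before C, but the columns come out sorted: C, then Z.
sorted_cols = assign_like_0_19(df, Z=0, C=lambda d: d.A ** 2)
print(list(sorted_cols.columns))  # ['A', 'B', 'C', 'Z']

# D's callable looks for C, which does not exist in the copy yet.
try:
    assign_like_0_19(df, C=lambda d: d.A ** 2, D=lambda d: d['C'] + 1)
    dependent_error = None
except KeyError as exc:
    dependent_error = exc
```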

Recommended answer

You can create multiple columns by supplying each new column as a keyword argument:

df = df.assign(C=df['A']**2, D=df.B*2)
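The keyword form also accepts callables, which the quoted docstring says are computed on the DataFrame. A sketch of the same two columns written as lambdas; this mainly pays off in method chains, where the intermediate DataFrame has no variable name to refer to:

```python
import pandas as pd

out = (
    pd.DataFrame({'A': range(1, 5), 'B': range(11, 15)})
    # Each lambda receives the DataFrame that assign is called on.
    .assign(C=lambda d: d['A'] ** 2, D=lambda d: d['B'] * 2)
)
print(out)
```

This yields the same A, B, C and D values as the non-callable version.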

I got your example dictionary to work by unpacking the dictionary as keyword arguments using **:

df = df.assign(**{'C': df.A.apply(lambda x: x ** 2), 'D': df.B * 2})

It seems like assign should be able to take a dictionary, but the source code you posted (def assign(self, **kwargs)) only accepts keyword arguments, so a dictionary currently has to be unpacked with ** first.

Resulting output:

   A   B   C   D
0  1  11   1  22
1  2  12   4  24
2  3  13   9  26
3  4  14  16  28
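Unpacking a dict is also the natural route when the dict is built programmatically or the column names are not valid Python identifiers (the names 'A squared', 'B x2' and 'total' below are made up for illustration). The sketch assumes pandas 0.23 or later on Python 3.6 or later, where assign keeps keyword order and lets a later callable use a column created earlier in the same call, lifting the restriction described in the quoted 0.19.2 docstring:

```python
import pandas as pd

df = pd.DataFrame({'A': range(1, 5), 'B': range(11, 15)})

# Keys with spaces cannot be written as plain keyword arguments,
# but they can be passed by unpacking a dict with **.
new_cols = {
    'A squared': lambda d: d['A'] ** 2,
    'B x2': lambda d: d['B'] * 2,
    # Assumes pandas >= 0.23: this callable sees the two columns defined above.
    'total': lambda d: d['A squared'] + d['B x2'],
}

out = df.assign(**new_cols)
print(list(out.columns))     # ['A', 'B', 'A squared', 'B x2', 'total']
print(out['total'].tolist())  # [23, 28, 35, 44]
```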
