Dynamically accessing pandas DataFrame columns


Question

Consider this simple example:

import pandas as pd

df = pd.DataFrame({'one' : [1,2,3],
                   'two' : [1,0,0]})

df 
Out[9]: 
   one  two
0    1    1
1    2    0
2    3    0

I want to write a function that takes a dataframe df and a column mycol as inputs.

This works:

df.groupby('one').two.sum()
Out[10]: 
one
1    1
2    0
3    0
Name: two, dtype: int64

This also works:

def okidoki(df,mycol):
    return df.groupby('one')[mycol].sum()

okidoki(df, 'two')
Out[11]: 
one
1    1
2    0
3    0
Name: two, dtype: int64

But this fails:

def megabug(df,mycol):
    return df.groupby('one').mycol.sum()

megabug(df, 'two')
 AttributeError: 'DataFrameGroupBy' object has no attribute 'mycol'

What is going wrong here?

I am worried that okidoki uses some chaining that might create subtle bugs (https://pandas.pydata.org/pandas-docs/stable/indexing.html#why-does-assignment-fail-when-using-chained-indexing).

How can I still keep the syntax groupby('one').mycol? Can the mycol string be converted into something that works that way? Thanks!

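Recommended answer

You pass a string as the second argument, while the attribute access .mycol looks for a column literally named "mycol". What you actually want is the column whose name is stored in mycol, which would amount to writing something like: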

df.'two'

This is invalid syntax. To access a column dynamically, you need the index notation, [...], because the dot/attribute accessor notation takes a literal name and cannot take a variable.
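As a minimal sketch of the bracket form, the function below takes both the grouping key and the column to aggregate as strings. The helper name sum_by and its key parameter are hypothetical and only serve as illustration; the frame is the one from the question.

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2, 3],
                   'two': [1, 0, 0]})

def sum_by(df, key, mycol):
    # Hypothetical helper: the column name arrives as a string,
    # so it is selected with bracket indexing rather than a dot.
    return df.groupby(key)[mycol].sum()

result = sum_by(df, 'one', 'two')
print(result)
```

The brackets take any expression that evaluates to a column name, which is why this form works where .mycol does not.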

Dynamic attribute access on its own is possible. For example, you can use getattr (but I don't recommend it; it's an antipattern):

In [674]: df
Out[674]: 
   one  two
0    1    1
1    2    0
2    3    0

In [675]: getattr(df, 'one')
Out[675]: 
0    1
1    2
2    3
Name: one, dtype: int64

Dynamically selecting a column by attribute from a groupby call can also be done, with something like:

In [677]: getattr(df.groupby('one'), mycol).sum() 
Out[677]: 
one
1    1
2    0
3    0
Name: two, dtype: int64

But don't do it. It is a horrid antipattern, and far less readable than df.groupby('one')[mycol].sum().
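Readability aside, one concrete way attribute-style lookup can go wrong is a column whose name matches an existing method. The sketch below assumes an illustrative frame df2 with a column named 'count', which is not part of the original question.

```python
import pandas as pd

# Illustrative frame (an assumption): the column name 'count'
# collides with the DataFrame.count method.
df2 = pd.DataFrame({'one': [1, 2, 3],
                    'count': [10, 20, 30]})

by_attr = getattr(df2, 'count')   # resolves to the bound method, not the column
by_key = df2['count']             # bracket indexing returns the column

print(callable(by_attr))          # True: a method came back
print(by_key.tolist())            # [10, 20, 30]

# The same collision happens on the groupby object.
grouped = df2.groupby('one')
grouped_attr = getattr(grouped, 'count')       # the groupby count method
grouped_sum = grouped['count'].sum()           # sums the 'count' column
```

Bracket indexing has no such ambiguity, since it only ever looks up column labels.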
