Problem Description
Consider this simple example:
import pandas as pd
df = pd.DataFrame({'one': [1, 2, 3],
                   'two': [1, 0, 0]})
df
Out[9]:
one two
0 1 1
1 2 0
2 3 0
I want to write a function that takes as inputs a dataframe df and a column mycol.
This works:
df.groupby('one').two.sum()
Out[10]:
one
1 1
2 0
3 0
Name: two, dtype: int64
This works too:
def okidoki(df, mycol):
    return df.groupby('one')[mycol].sum()
okidoki(df, 'two')
Out[11]:
one
1 1
2 0
3 0
Name: two, dtype: int64
But this fails:
def megabug(df, mycol):
    return df.groupby('one').mycol.sum()
megabug(df, 'two')
AttributeError: 'DataFrameGroupBy' object has no attribute 'mycol'
What is going on here?
I am worried that okidoki uses some chaining that might create subtle bugs (https://pandas.pydata.org/pandas-docs/stable/indexing.html#why-does-assignment-fail-when-using-chained-indexing).
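For context, the linked warning concerns assignment through chained indexing (for example df[a][b] = value), which may write into a temporary copy. okidoki only reads: groupby('one')[mycol].sum() builds a new Series. A minimal check, assuming the example frame above, that modifying the returned Series leaves df unchanged:

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2, 3], 'two': [1, 0, 0]})
before = df.copy()

def okidoki(df, mycol):
    # Read-only chain: select the column from the groupby, then aggregate
    return df.groupby('one')[mycol].sum()

result = okidoki(df, 'two')
result[:] = 99               # overwrite every value in the aggregated Series
print(df.equals(before))     # True: the original frame is untouched
```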
How can I still keep the syntax groupby('one').mycol? Can the mycol string be converted to something that works that way? Thanks!
Recommended Answer
You pass a string as the second argument. In effect, you're trying to do something like:
df.'two'
That is invalid syntax. If you're trying to access a column dynamically, you'll need to use index notation, [...], because dot/attribute accessor notation doesn't work for dynamic access.
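To make the difference concrete, here is a small sketch using the question's frame: bracket notation evaluates the variable mycol and selects the column 'two', while attribute notation looks for an attribute literally named mycol, which does not exist.

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2, 3], 'two': [1, 0, 0]})
mycol = 'two'
grouped = df.groupby('one')

# Brackets evaluate the expression inside them, so this selects column 'two'
by_index = grouped[mycol].sum()
print(by_index.tolist())          # [1, 0, 0]

# Dot access uses the literal name 'mycol', not the variable's value
print(hasattr(grouped, 'mycol'))  # False
```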
Dynamic access on its own is possible. For example, you can use getattr (but I don't recommend this; it's an antipattern):
In [674]: df
Out[674]:
one two
0 1 1
1 2 0
2 3 0
In [675]: getattr(df, 'one')
Out[675]:
0 1
1 2
2 3
Name: one, dtype: int64
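One concrete reason getattr is fragile: attribute lookup finds DataFrame methods before columns. The sketch below assumes a hypothetical column named 'count', which collides with the DataFrame.count method; brackets still return the column, but getattr returns the bound method.

```python
import pandas as pd

# Hypothetical frame whose column name shadows a DataFrame method
df = pd.DataFrame({'count': [5, 6]})
col = 'count'

via_brackets = df[col]          # the 'count' column, a Series
via_getattr = getattr(df, col)  # DataFrame.count, a bound method

print(type(via_brackets).__name__)  # Series
print(callable(via_getattr))        # True
```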
Dynamically selecting by attribute from a groupby object can also be done, with something like:
In [677]: getattr(df.groupby('one'), mycol).sum()
Out[677]:
one
1 1
2 0
3 0
Name: two, dtype: int64
But don't do it. It is a horrid antipattern, and much less readable than df.groupby('one')[mycol].sum().
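A sketch of the recommended bracket-based version as a reusable function; the name group_sum and its parameters are hypothetical, chosen for illustration. Because brackets accept any expression that evaluates to a column label, the same function also accepts a list of columns, which attribute access cannot express at all.

```python
import pandas as pd

def group_sum(df, by, cols):
    """Sum `cols` (one column label or a list of labels) within groups of `by`."""
    # Bracket selection works with any value that evaluates to column label(s)
    return df.groupby(by)[cols].sum()

df = pd.DataFrame({'one': [1, 2, 3], 'two': [1, 0, 0]})

print(group_sum(df, 'one', 'two').tolist())          # [1, 0, 0]
print(list(group_sum(df, 'one', ['two']).columns))   # ['two']
```

A single label returns a Series; a list returns a DataFrame.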