Question
I'm trying to understand this behavior: why does it happen, and if it was intentional, what was the motivation for doing it this way?
So I create a DataFrame:
import numpy as np
import pandas as pd
np.random.seed(0)
df = pd.DataFrame(np.random.random((4,2)))
          0         1
0  0.548814  0.715189
1  0.602763  0.544883
2  0.423655  0.645894
3  0.437587  0.891773
I can refer to the columns like this:
df.columns = ['a','b']
df.a
0
0 0.548814
1 0.602763
2 0.423655
3 0.437587
I can even make what I think is a new column:
df.third = pd.DataFrame(np.random.random((4,1)))
But df is still unchanged:
df
          a         b
0  0.548814  0.715189
1  0.602763  0.544883
2  0.423655  0.645894
3  0.437587  0.891773
However, df.third also exists (but I can't see it in my variable viewer in Spyder):
df.third
0
0 0.118274
1 0.639921
2 0.143353
3 0.944669
If I wanted to add a third column, I'd have to do the following:
df['third'] = pd.DataFrame(np.random.random((4,1)))
          a         b     third
0  0.548814  0.715189  0.568045
1  0.602763  0.544883  0.925597
2  0.423655  0.645894  0.071036
3  0.437587  0.891773  0.087129
So, my question is: what's going on when I do df.third versus df['third']?
Recommended answer
Because it added third as an attribute, you should stop accessing columns as attributes and always use df['third'] to avoid ambiguous behaviour.
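One way to see the ambiguity is a column whose name collides with a DataFrame method. The sketch below uses a made-up frame with a column called "count" (not from the question) purely for illustration:

```python
import pandas as pd

# Hypothetical frame: one column is deliberately named like a DataFrame method.
df = pd.DataFrame({"a": [1, 2, 3], "count": [10, 20, 30]})

by_key = df["count"]   # bracket access always means the column
by_attr = df.count     # attribute access finds the method first

print(type(by_key).__name__)  # Series
print(callable(by_attr))      # True: this is DataFrame.count, not the column
```

With brackets there is only one possible meaning, which is why it is the safer habit.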
You should get into the habit of always accessing and assigning columns using df[col_name], to avoid problems like:
df.mean = some_calc()
The problem here is that mean is a DataFrame method.
So you've then overwritten a method with some computed value.
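A minimal sketch of that failure mode, assuming a small made-up frame and the constant 42 standing in for some_calc():

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})

col_means = df.mean()   # works: mean is still the method here
df.mean = 42            # stand-in for some_calc(); shadows the method on this object

print(callable(df.mean))      # False: df.mean is now just the number 42
print("mean" in df.columns)   # False: no column was created either

# The method itself still lives on the class; only this instance is shadowed.
print(pd.DataFrame.mean(df).tolist())  # [1.5, 3.5]
```

Any later code that calls df.mean() on this object would fail, and nothing at the assignment site hints at why.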
The problem is that this was part of the design as a convenience, and the pandas for data analysis book and some early online video presentations showed it as a way of assigning a new column. The subtle errors can be so pervasive that, IMO, it really should be banned and removed.
Seriously, I can't stress this enough: stop referring to columns as attributes. It's a serious bugbear of mine, and unfortunately I still see lots of answers posted showing this usage.
You can see that no new column is added:
In [97]:
df.third = pd.DataFrame(np.random.random((4,1)))
df.columns
Out[97]:
Index(['a', 'b'], dtype='object')
You can see that third was added as an attribute:
In [98]:
df.__dict__
Out[98]:
{'_data': BlockManager
Items: Index(['a', 'b'], dtype='object')
Axis 1: Int64Index([0, 1, 2, 3], dtype='int64')
FloatBlock: slice(0, 2, 1), 2 x 4, dtype: float64,
'_iloc': <pandas.core.indexing._iLocIndexer at 0x7e73b00>,
'_item_cache': {},
'is_copy': None,
'third': 0
0 0.844821
1 0.286501
2 0.459170
3 0.243452}
You can see that you have Items, _data, Axis 1, etc., but then you also have 'third', which is an attribute.
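The same check can be made without reading the whole __dict__ dump. This is a rough sketch, assuming a reasonably recent pandas (the internal keys in __dict__ differ between versions, and newer versions emit a UserWarning on the attribute assignment, which is silenced here):

```python
import warnings

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.random((4, 2)), columns=["a", "b"])

with warnings.catch_warnings():
    # Recent pandas versions warn about this assignment; silence it for the demo.
    warnings.simplefilter("ignore")
    df.third = np.random.random(4)   # plain Python attribute, not a column

print("third" in df.columns)   # False
print("third" in vars(df))     # True

df["third"] = np.random.random(4)    # bracket assignment creates the column
print(list(df.columns))        # ['a', 'b', 'third']

df.a = 0.0   # attribute assignment does reach a column that already exists
print(df["a"].tolist())        # [0.0, 0.0, 0.0, 0.0]
```

The last two lines show the asymmetry: attribute assignment updates an existing column but cannot create a new one, which is one more reason to use brackets throughout.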