Problem description
Could somebody explain the difference between the following?
df2 = df1
df2 = df1.copy()
df3 = df1.copy(deep=False)
I tried all three options as follows:
df1 = pd.DataFrame([1,2,3,4,5])
df2 = df1
df3 = df1.copy()
df4 = df1.copy(deep=False)
df1 = pd.DataFrame([9,9,9])
This returned:
df1: [9,9,9]
df2: [1,2,3,4,5]
df3: [1,2,3,4,5]
df4: [1,2,3,4,5]
So I observe no difference in the output between .copy() and .copy(deep=False). Why?
I would expect one of the options '=', copy(), copy(deep=False) to return [9,9,9].
What am I missing?
Recommended answer
If you look at the object IDs of the various DataFrames you create, you can clearly see what is happening.
When you write df2 = df1, you create a variable named df2 and bind it to the object with id 4541269200. When you write df1 = pd.DataFrame([9,9,9]), you create a new object with id 4541271120 and bind it to the variable df1, but the object with id 4541269200, which was previously bound to df1, continues to live because df2 still refers to it. If no variables were bound to that object, Python would garbage collect it. The same reasoning applies to df3 and df4: they are separate objects created before the rebinding, so assigning a new DataFrame to the name df1 cannot change them.
In[33]: import pandas as pd
In[34]: df1 = pd.DataFrame([1,2,3,4,5])
In[35]: id(df1)
Out[35]: 4541269200
In[36]: df2 = df1
In[37]: id(df2)
Out[37]: 4541269200 # Same id as df1
In[38]: df3 = df1.copy()
In[39]: id(df3)
Out[39]: 4541269584 # New object, new id.
In[40]: df4 = df1.copy(deep=False)
In[41]: id(df4)
Out[41]: 4541269072 # New object, new id.
In[42]: df1 = pd.DataFrame([9, 9, 9])
In[43]: id(df1)
Out[43]: 4541271120 # New object created and bound to name 'df1'.
In[44]: id(df2)
Out[44]: 4541269200 # Old object's id not impacted.
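The session above rebinds the name df1, which is why none of the other variables change. As a minimal sketch of the case the question seems to expect, the snippet below modifies the original object in place instead. The variable names on the left-hand side (same_object, alias_value, and so on) are introduced here for illustration only.

```python
import pandas as pd

df1 = pd.DataFrame([1, 2, 3, 4, 5])
df2 = df1                    # second name for the same object
df3 = df1.copy()             # deep copy (deep=True is the default)
df4 = df1.copy(deep=False)   # shallow copy: new object, may share data

same_object = df2 is df1     # plain assignment copies nothing
new_objects = (df3 is not df1) and (df4 is not df1)

# Modify the original object in place instead of rebinding the name df1.
df1.iloc[0, 0] = 100

alias_value = int(df2.iloc[0, 0])     # 100: df2 is the same object as df1
deep_value = int(df3.iloc[0, 0])      # 1: the deep copy owns its data
shallow_value = int(df4.iloc[0, 0])   # depends on the pandas version, see note

print(same_object, new_objects, alias_value, deep_value, shallow_value)
```

What the shallow copy shows depends on the pandas version and settings. With the older default behaviour the shallow copy shares data with the original, so the change is visible through df4; with Copy-on-Write enabled (the default from pandas 3.0, as far as I know) the change is not propagated. That is why the last value carries no fixed expected result here.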
Added on 7/30/2018
Deep copying is not fully deep in pandas: copy(deep=True) copies the DataFrame's data, but Python objects stored inside it are not copied recursively, only the references to them. The devs consider putting mutable objects inside a DataFrame an antipattern. Consider the following:
In[10]: arr1 = [1, 2, 3]
In[11]: arr2 = [1, 2, 3, 4]
In[12]: df1 = pd.DataFrame([[arr1], [arr2]], columns=['A'])
In[13]: df1.applymap(id)
Out[13]:
A
0 4515714832
1 4515734952
In[14]: df2 = df1.copy(deep=True)
In[15]: df2.applymap(id)
Out[15]:
A
0 4515714832
1 4515734952
In[16]: df2.loc[0, 'A'].append(55)
In[17]: df2
Out[17]:
A
0 [1, 2, 3, 55]
1 [1, 2, 3, 4]
In[18]: df1
Out[18]:
A
0 [1, 2, 3, 55]
1 [1, 2, 3, 4]
If df2 were a true deep copy, the lists contained in it would have new ids. Because they do not, modifying a list inside df2 also affects the list inside df1: they are the same objects.
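If independent nested lists are really needed, one possible workaround is to copy each element explicitly with the standard library's copy.deepcopy. This is only a sketch under that assumption, not a pandas feature; the names df3 and shared are introduced here for illustration.

```python
import copy

import pandas as pd

arr1 = [1, 2, 3]
arr2 = [1, 2, 3, 4]
df1 = pd.DataFrame([[arr1], [arr2]], columns=['A'])

# copy(deep=True) copies the column, but each cell still refers to the same list.
df2 = df1.copy(deep=True)
shared = df2.loc[0, 'A'] is df1.loc[0, 'A']

# Workaround: deep-copy every element of the object column by hand.
df3 = df1.copy()
df3['A'] = df3['A'].map(copy.deepcopy)

# This list now belongs to df3 only, so df1 is left untouched.
df3.loc[0, 'A'].append(55)

print(shared)
print(df1.loc[0, 'A'])
print(df3.loc[0, 'A'])
```

This has to be repeated for every column that holds mutable objects, which is one more reason to avoid storing such objects in a DataFrame in the first place.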