Problem description
I've been experimenting with pd.Series and pd.DataFrame for a while and ran into a strange problem. Let's say I have the following pd.DataFrame:
import pandas as pd
df = pd.DataFrame({'col': [[1, 2, 3]]})
Notice that this DataFrame has a column containing a list. I want to modify a copy of this DataFrame and return the modified version, so that the initial one remains unchanged. For the sake of simplicity, let's say I want to append the integer 4 to the list in its cell.
I tried the following code:
def modify(df):
    dfc = df.copy(deep=True)
    dfc['col'].iloc[0].append(4)
    return dfc

modify(df)
print(df)
The problem is that, besides the new copy dfc, the initial DataFrame df is also modified. Why? What should I do to prevent the initial DataFrame from being modified? My pandas version is 0.25.0.
Recommended answer
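From the pandas docs for DataFrame.copy, in the Notes section: when deep=True, the data is copied, but actual Python objects are not copied recursively, only the references to them. This is in contrast to copy.deepcopy in the standard library, which recursively copies object data.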
This is referenced again in an issue on GitHub. So this function is working as the devs intend: mutable objects such as lists should not be embedded in DataFrames.
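To see what this means in practice, here is a small check (a sketch of my own, not taken from the docs or the GitHub issue). The names frames_differ and same_list are just illustrative. It compares object identity before anything is mutated:

```python
import pandas as pd

df = pd.DataFrame({'col': [[1, 2, 3]]})
dfc = df.copy(deep=True)

# The two frames are distinct objects...
frames_differ = dfc is not df

# ...but the cell in each frame still refers to the very same list object,
# because only the reference to the list was copied.
same_list = dfc['col'].iloc[0] is df['col'].iloc[0]

print(frames_differ, same_list)
```

If same_list comes out true, any in-place change to that list (such as append) will be visible through both frames.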
I couldn't find a way to get copy.deepcopy to work as intended on a DataFrame, but I did find a fairly awful workaround using pickle:
import pandas as pd
import pickle

df = pd.DataFrame({'col': [[1, 2, 3]]})

def modify(df):
    # Round-trip through pickle so the nested list is rebuilt as a new object
    dfc = pickle.loads(pickle.dumps(df))
    print(dfc['col'].iloc[0] is df['col'].iloc[0])  # Check if we've succeeded in deepcopying
    dfc['col'].iloc[0].append(4)
    print(dfc)
    return dfc

modify(df)
print(df)
Output:
False
            col
0  [1, 2, 3, 4]
         col
0  [1, 2, 3]
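If pickling the whole frame feels too heavy, another option might be to apply copy.deepcopy to each cell of the object columns rather than to the DataFrame itself. This is a different technique from the pickle workaround above, and deep_copy_frame is a hypothetical helper name, not a pandas function. The sketch assumes unique column names and that the mutable objects live in object-dtype columns:

```python
import copy

import pandas as pd


def deep_copy_frame(df):
    """Hypothetical helper: copy the frame, then deep-copy each object cell."""
    dfc = df.copy(deep=True)
    for name in dfc.columns:
        # Only object columns can hold Python objects such as lists.
        if dfc[name].dtype == object:
            # Replace every cell with an independent deep copy of itself.
            dfc[name] = dfc[name].map(copy.deepcopy)
    return dfc


df = pd.DataFrame({'col': [[1, 2, 3]]})
dfc = deep_copy_frame(df)
dfc['col'].iloc[0].append(4)

original_cell = df['col'].iloc[0]
copied_cell = dfc['col'].iloc[0]
print(original_cell, copied_cell)
```

Here the append should only affect the list held by dfc, since each cell was copied individually. The cost is a Python-level loop over every object cell, so this may be slow on large frames.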