I am working on the Walmart Kaggle competition and I'm trying to create a dummy column of the "FinelineNumber" column. For context, df.shape returns (647054, 7). I am trying to make a dummy column for df['FinelineNumber'], which has 5,196 unique values. The result should be a dataframe of shape (647054, 5196), which I then plan to concat to the original dataframe.
Nearly every time I run fineline_dummies = pd.get_dummies(df['FinelineNumber'], prefix='fl'), I get the following error message: The kernel appears to have died. It will restart automatically.
I am running Python 2.7 in Jupyter Notebook on a MacBook Pro with 16GB RAM.
Can someone explain why this is happening (and why it happens most of the time but not every time)? Is it a Jupyter Notebook bug or a pandas bug? I also thought it might have to do with not enough RAM, but I get the same error on a Microsoft Azure Machine Learning notebook with >100 GB of RAM. On Azure ML, the kernel dies every time, almost immediately.
It very much could be memory usage: a (647054, 5196) data frame has 3,362,092,584 elements, which at 8 bytes per element (the size of a pointer or a float64 on a 64-bit system) comes to roughly 27 GB (about 25 GiB) for the values alone. That already exceeds the 16GB of RAM on the MacBook Pro. On AzureML, while the VM has a large amount of memory, you're actually limited in how much memory you have available (currently 2GB, soon to be 4GB), and when you hit the limit the kernel typically dies. So it seems very likely that this is a memory usage issue.
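The size estimate above is simple arithmetic and can be checked directly. The 8-bytes-per-element case assumes a pointer or float64 per cell (older pandas releases returned float64 dummies); the 1-byte case is shown for comparison, assuming the uint8 or bool indicator columns that more recent pandas versions produce by default.

```python
# Back-of-the-envelope size of a dense (647054, 5196) dummy matrix.
rows, cols = 647054, 5196
elements = rows * cols
print("{:,} elements".format(elements))  # 3,362,092,584 elements

# Assumption: 8 bytes per element (a 64-bit object pointer or a float64 value).
bytes_8 = elements * 8
print("{:.1f} GB".format(bytes_8 / 1e9))       # 26.9 GB
print("{:.1f} GiB".format(bytes_8 / 2 ** 30))  # 25.0 GiB

# Assumption: 1 byte per element (uint8 or bool indicator columns).
bytes_1 = elements * 1
print("{:.1f} GB".format(bytes_1 / 1e9))       # 3.4 GB
```

Even the 1-byte case is above a 2GB or 4GB per-notebook limit, so a dense result is unlikely to fit on AzureML either way.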
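You might try switching to a sparse representation before doing any additional manipulations. Because each row has exactly one non-zero indicator, a sparse frame only needs to store about 647,054 values instead of roughly 3.36 billion, which should let pandas keep the data in memory. In older pandas this meant calling .to_sparse() on the data frame; that method was removed in pandas 1.0, and pd.get_dummies(df['FinelineNumber'], prefix='fl', sparse=True) builds the sparse columns directly without first materializing the dense result.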
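A small-scale comparison of dense and sparse dummies illustrates the memory difference. This is a sketch under stated assumptions: the random 'FinelineNumber' stand-in (10,000 rows, 500 codes) is hypothetical and far smaller than the real data so it runs quickly, and it assumes a pandas version that has both get_dummies(..., sparse=True) and the DataFrame.sparse accessor (0.25 or later).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df['FinelineNumber']: 10,000 rows drawn from 500 codes.
rng = np.random.default_rng(0)
fineline = pd.Series(rng.integers(0, 500, size=10_000), name="FinelineNumber")

# Dense indicators: every rows x categories cell is stored.
dense = pd.get_dummies(fineline, prefix="fl")

# sparse=True returns SparseDtype columns, so only the non-fill
# entries (exactly one per row here) are stored.
sparse = pd.get_dummies(fineline, prefix="fl", sparse=True)

dense_bytes = int(dense.memory_usage(deep=True).sum())
sparse_bytes = int(sparse.memory_usage(deep=True).sum())

# Density is the fraction of cells actually stored, here 1 / number of columns.
print(sparse.shape, sparse.sparse.density)
print("dense: {:,} bytes, sparse: {:,} bytes".format(dense_bytes, sparse_bytes))
```

The sparse frame can then be passed to pd.concat alongside the original columns; operations that force a dense conversion (for example sparse.sparse.to_dense()) would bring the memory cost back.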