This article covers how to pass a data frame from one Jupyter Notebook file to another.
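Problem description
I have 3 separate Jupyter notebook files, each dealing with its own data frame. In each notebook I clean and manipulate the data for that df. Is there a way to reference the cleaned, final data from a separate notebook?
My concern is that if I work on all 3 dfs in one notebook and then do more processing afterwards (merge/join), the notebook will be a mile long. I also don't want to rewrite a bunch of code just to get the data ready for use in my new notebook.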
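Solution
If you are using pandas data frames, one approach is to use pandas.DataFrame.to_csv() and pandas.read_csv() to save and load the cleaned data between each step.
- Notebook1 loads input1 and saves result1.
- Notebook2 loads result1 and saves result2.
- Notebook3 loads result2 and saves result3.
If this is your data: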
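import pandas as pd
raw_data = {'id': [10, 20, 30],
            'name': ['foo', 'bar', 'baz']}
# named input_df to avoid shadowing the built-in input()
input_df = pd.DataFrame(raw_data, columns=['id', 'name'])
# write it out so that notebook1 can read input.csv
input_df.to_csv('input.csv')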
Then in notebook1.ipynb, process it like this:
# load
df = pd.read_csv('input.csv', index_col=0)
# manipulate frame here
# ...
# save
df.to_csv('result1.csv')
...and repeat that process for each stage in the chain. For example, in notebook2.ipynb:
# load
df = pd.read_csv('result1.csv', index_col=0)
# manipulate frame here
# ...
# save
df.to_csv('result2.csv')
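The load cells above pass index_col=0 because to_csv() writes the frame's index as the first, unnamed column by default. The sketch below shows what happens with and without that argument. It writes to a temporary directory, which is an assumption made only so the example leaves no files behind; in the notebooks the file would sit next to the .ipynb files.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'id': [10, 20, 30], 'name': ['foo', 'bar', 'baz']})

# Temporary location (assumption for this sketch only).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'result1.csv')

# By default to_csv() writes the index as the first, unnamed column.
df.to_csv(path)

# index_col=0 turns that first column back into the index.
with_index = pd.read_csv(path, index_col=0)

# Without it, the old index comes back as an extra data column.
without_index = pd.read_csv(path)

print(list(with_index.columns))
print(list(without_index.columns))
```

An alternative is to call to_csv(path, index=False) when saving and drop index_col=0 when loading, as long as the index carries no information you need.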
At the end, your notebook collection will look like this:
- input.csv
- notebook1.ipynb
- notebook2.ipynb
- notebook3.ipynb
- result1.csv
- result2.csv
- result3.csv
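The question also mentions merging the cleaned frames in a separate notebook. The same save and load pattern can cover that case. The following is only a sketch: the file names clean_names.csv and clean_scores.csv, the score column, and the choice of an inner merge on id are all hypothetical and stand in for whatever the cleaning notebooks actually save.

```python
import os
import tempfile

import pandas as pd

# Temporary location (assumption for this sketch only).
tmp_dir = tempfile.mkdtemp()
names_path = os.path.join(tmp_dir, 'clean_names.csv')    # hypothetical file name
scores_path = os.path.join(tmp_dir, 'clean_scores.csv')  # hypothetical file name

# Stand-ins for what two cleaning notebooks would have saved.
pd.DataFrame({'id': [10, 20, 30],
              'name': ['foo', 'bar', 'baz']}).to_csv(names_path)
pd.DataFrame({'id': [10, 20, 30],
              'score': [1.5, 2.5, 3.5]}).to_csv(scores_path)

# In the combining notebook: load each cleaned frame, then merge.
names = pd.read_csv(names_path, index_col=0)
scores = pd.read_csv(scores_path, index_col=0)
merged = names.merge(scores, on='id', how='inner')

print(merged)
```

This keeps each cleaning notebook short and leaves the combining notebook with nothing but loading, merging, and whatever analysis follows.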
Documentation: pandas.DataFrame.to_csv(), pandas.read_csv()