Problem description
I would like to version control Jupyter Notebooks using Git. Unfortunately, by default, Git and Jupyter Notebooks do not play nicely together. An .ipynb file is a JSON file containing not only the Python code itself but also plenty of metadata (e.g., cell execution counts) and cell output.
Most existing solutions (e.g., "Using IPython notebooks under version control") rely on removing output and metadata from the notebook. This (i) still leaves the JSON file structure in the diff, which is a pain to read, and (ii) means that features such as output display on GitHub cannot be used, because the output is removed before committing.
My idea is the following: whenever I run git diff, Git automatically uses jupyter nbconvert --to python filename.ipynb to convert my *.ipynb source files to plain *.py Python files. It should then detect only changes that affect the code itself (not execution counts and output, as those are dropped by nbconvert) without actually removing them from the notebook, and it should make my diffs much more readable than they are for unconverted .ipynb files. I do not want the .py version of the file to be stored permanently; it should only be used for git diff. My understanding is that this should be possible by simply specifying nbconvert as the [diff] textconv driver, but I have not been able to get it to work.
I have created a file named ipynb2py in /usr/local/bin containing
#!/bin/bash
jupyter nbconvert --to python $1
I have added the following to my .gitconfig file
[diff "ipynb"]
textconv = ipynb2py
and the following to my .gitattributes file
*.ipynb diff=ipynb
to assign the ipynb textconv driver to all files with the .ipynb extension.
Now I would expect git diff to perform the conversion automatically every time I run it (I know this will slow things down substantially, but it is worth it to have a viable option for version-controlling notebooks) and then show a nice readable diff, based only on the difference between the notebook states after conversion.
When I run git diff, it first says [NbConvertApp] Converting notebook, which tells me that Git is triggering the conversion as expected. However, the conversion then fails with a long Python traceback, followed by fatal: unable to read files to diff.
Immediately before the fatal error message, I receive the following
nbformat.reader.NotJSONError: Notebook does not appear to be JSON: '\n# coding: utf-8\n\n# In[ ]:\n\nimport...
Of course, I suspected a problem with the way my ipynb2py script invokes nbconvert, but running ipynb2py notebook.ipynb in my repo works perfectly well, so that cannot be the reason.
What could be causing this error? What are the requirements for a valid textconv driver other than returning a text file?
git diff
[NbConvertApp] Converting notebook /var/folders/9t/p55_4b9971j4wwp14_45wy900000gn/T//lR5q08_notebook.ipynb to python
Traceback (most recent call last):
File "/Users/user/anaconda/lib/python3.6/site-packages/nbformat/reader.py", line 14, in parse_json
nb_dict = json.loads(s, **kwargs)
File "/Users/user/anaconda/lib/python3.6/json/__init__.py", line 354, in loads
return _default_decoder.decode(s)
File "/Users/user/anaconda/lib/python3.6/json/decoder.py", line 339, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/Users/user/anaconda/lib/python3.6/json/decoder.py", line 357, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 2 column 1 (char 1)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/user/anaconda/bin/jupyter-nbconvert", line 11, in <module>
load_entry_point('nbconvert==5.1.1', 'console_scripts', 'jupyter-nbconvert')()
File "/Users/user/anaconda/lib/python3.6/site-packages/jupyter_core/application.py", line 266, in launch_instance
return super(JupyterApp, cls).launch_instance(argv=argv, **kwargs)
File "/Users/user/anaconda/lib/python3.6/site-packages/traitlets/config/application.py", line 658, in launch_instance
app.start()
File "/Users/user/anaconda/lib/python3.6/site-packages/nbconvert/nbconvertapp.py", line 305, in start
self.convert_notebooks()
File "/Users/user/anaconda/lib/python3.6/site-packages/nbconvert/nbconvertapp.py", line 473, in convert_notebooks
self.convert_single_notebook(notebook_filename)
File "/Users/user/anaconda/lib/python3.6/site-packages/nbconvert/nbconvertapp.py", line 444, in convert_single_notebook
output, resources = self.export_single_notebook(notebook_filename, resources, input_buffer=input_buffer)
File "/Users/user/anaconda/lib/python3.6/site-packages/nbconvert/nbconvertapp.py", line 373, in export_single_notebook
output, resources = self.exporter.from_filename(notebook_filename, resources=resources)
File "/Users/user/anaconda/lib/python3.6/site-packages/nbconvert/exporters/exporter.py", line 171, in from_filename
return self.from_file(f, resources=resources, **kw)
File "/Users/user/anaconda/lib/python3.6/site-packages/nbconvert/exporters/exporter.py", line 189, in from_file
return self.from_notebook_node(nbformat.read(file_stream, as_version=4), resources=resources, **kw)
File "/Users/user/anaconda/lib/python3.6/site-packages/nbformat/__init__.py", line 141, in read
return reads(fp.read(), as_version, **kwargs)
File "/Users/user/anaconda/lib/python3.6/site-packages/nbformat/__init__.py", line 74, in reads
nb = reader.reads(s, **kwargs)
File "/Users/user/anaconda/lib/python3.6/site-packages/nbformat/reader.py", line 58, in reads
nb_dict = parse_json(s, **kwargs)
File "/Users/user/anaconda/lib/python3.6/site-packages/nbformat/reader.py", line 17, in parse_json
raise NotJSONError(("Notebook does not appear to be JSON: %r" % s)[:77] + "...")
nbformat.reader.NotJSONError: Notebook does not appear to be JSON: '\n# coding: utf-8\n\n# In[ ]:\n\nimport...
fatal: unable to read files to diff
Recommended answer
If you carefully read the gitattributes documentation (where the textconv config option is described), you will notice that the converter program must send its output to standard output:
Sometimes it is desirable to see the diff of a text-converted version of some binary files. For example, a word processor document can be converted to an ASCII text representation, and the diff of the text shown. Even though this conversion loses some information, the resulting diff is useful for human viewing (but cannot be applied directly).
The textconv config option is used to define a program for performing such a conversion. The program should take a single argument, the name of a file to convert, and produce the resulting text on stdout.
...
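To make that contract concrete, here is a minimal sketch of a program that satisfies it: one filename argument in, plain text on stdout out. It is not nbconvert and not part of the original answer; it is a hypothetical alternative driver (the name ipynb_code.py and the function notebook_to_text are made up for illustration) that uses only Python's standard json module and assumes the nbformat 4 layout, where the notebook has a top-level "cells" list.

```python
#!/usr/bin/env python3
"""Hypothetical textconv driver: print a notebook's code cells to stdout."""
import json
import sys


def notebook_to_text(path):
    """Return the source of every code cell in the notebook at `path`."""
    with open(path, encoding="utf-8") as handle:
        notebook = json.load(handle)

    chunks = []
    # Assumes nbformat 4, where cells live in a top-level "cells" list.
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # The source may be stored as one string or as a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        chunks.append(source.rstrip("\n"))
    # Outputs and execution counts are never read, so they cannot show up in the diff.
    return "\n\n".join(chunks) + "\n"


if __name__ == "__main__" and len(sys.argv) == 2:
    # The converted text goes to stdout; nothing is written to disk.
    sys.stdout.write(notebook_to_text(sys.argv[1]))
```

Saved as an executable file on the PATH, a script like this could be named in the textconv setting in place of ipynb2py. Unlike nbconvert, it drops markdown cells entirely, which may or may not be what you want in a diff.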
Therefore you must add the --stdout option to your conversion command:
ipynb2py
#!/bin/bash
jupyter nbconvert --to python --stdout "$1"
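The difference the --stdout flag makes can be imitated without Git or Jupyter. The sketch below is only a simulation under stated assumptions: the two tiny drivers are hypothetical stand-ins (they uppercase a text file instead of converting a notebook), and run_textconv merely mimics the "one filename argument, read stdout" contract rather than Git's actual invocation. One stand-in saves its result next to the input file, the other prints it.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical stand-in for a converter that saves its result to a file
# next to the input and prints nothing on stdout.
WRITES_FILE = (
    "import sys\n"
    "src = sys.argv[1]\n"
    "open(src + '.out', 'w').write(open(src).read().upper())\n"
)

# Hypothetical stand-in for the same converter with a --stdout style option.
WRITES_STDOUT = (
    "import sys\n"
    "sys.stdout.write(open(sys.argv[1]).read().upper())\n"
)


def run_textconv(driver_source, path):
    """Run a driver with a single filename argument and return what it printed."""
    result = subprocess.run(
        [sys.executable, "-c", driver_source, path],
        stdout=subprocess.PIPE,
        check=True,
        text=True,
    )
    return result.stdout


def demo():
    """Return the captured stdout of both stand-in drivers for the same input."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "input.txt")
        with open(path, "w") as handle:
            handle.write("hello\n")
        return run_textconv(WRITES_FILE, path), run_textconv(WRITES_STDOUT, path)


if __name__ == "__main__":
    to_file, to_stdout = demo()
    print(repr(to_file))    # text a stdout-reading caller gets from the first driver
    print(repr(to_stdout))  # text it gets from the second driver
```

The first driver does its work but hands nothing back to a caller that only reads stdout, which is the position Git is in when the conversion script lacks --stdout.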