How to use nbconvert as a git textconv driver to enable effective version control of Jupyter Notebooks

Question

I would like to version control Jupyter Notebooks using Git. Unfortunately, by default, Git and Jupyter Notebooks do not play nicely. An .ipynb file is a .json file containing not only the Python code itself but also plenty of metadata (e.g., cell execution counts) and cell output.

Most existing solutions (e.g., Using IPython notebooks under version control) rely on removing output and metadata from the notebook. This (i) still keeps the .json file structure when diffing, which is a pain to read, and (ii) means that features such as output display on GitHub cannot be used, because the output is removed before committing.

My idea is the following: whenever I run git diff, Git automatically uses jupyter nbconvert --to python filename.ipynb to convert my *.ipynb source files to plain *.py Python files. It should then only detect changes that affect the code itself (not execution counts and output, as those are dropped by nbconvert) without actually removing them from the notebook, and it should make my diffs much more readable than they are for unconverted .ipynb files. I do not want the .py version of the file to be stored permanently; it should only be used for git diff. My understanding is that this should be possible by simply specifying nbconvert as the [diff] textconv driver, but I have not been able to get it to work.

I have created a file named ipynb2py in /usr/local/bin containing

#!/bin/bash
jupyter nbconvert --to python $1

I have added the following to my .gitconfig file

[diff "ipynb"]
    textconv = ipynb2py

and the following to my .gitattributes file

*.ipynb diff=ipynb

to assign the ipynb textconv driver to all files of the .ipynb format.
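The same two settings can also be written from the command line and read back to confirm that Git sees them. This is only a sketch: it uses a throwaway repository created with mktemp (an assumption made here so that the real ~/.gitconfig is left alone) and stores the driver in that repository's local config rather than the global one.

```shell
#!/bin/bash
# Throwaway repository so nothing in the real configuration is touched.
repo=$(mktemp -d)
git init -q "$repo"

# Equivalent of the [diff "ipynb"] section, stored in this repository's own config.
git -C "$repo" config diff.ipynb.textconv ipynb2py

# Equivalent of the .gitattributes line.
printf '*.ipynb diff=ipynb\n' > "$repo/.gitattributes"

# Read both settings back.
driver=$(git -C "$repo" config --get diff.ipynb.textconv)
attr=$(git -C "$repo" check-attr diff -- notebook.ipynb)
echo "$driver"
echo "$attr"

rm -rf "$repo"
```

git check-attr only looks at the path name, so notebook.ipynb does not have to exist for the check to report which diff driver would be used.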

Now, I would expect git diff to automatically perform a conversion every time I run it (I know this will slow things down substantially, but it's worth it to have a viable option for VCing notebooks) and then show a nice readable diff, based only on the difference between notebook states after conversion.

When I do a git diff, it first says [NbConvertApp] Converting notebook, which tells me that Git is triggering the conversion as expected. However, the conversion fails after a long Python traceback ending in fatal: unable to read files to diff.

Immediately before the fatal error message, I receive the following

nbformat.reader.NotJSONError: Notebook does not appear to be JSON: '\n# coding: utf-8\n\n# In[ ]:\n\nimport...

Of course, I suspected that there was a problem with the way in which my ipynb2py script was invoking nbconvert, but running ipynb2py notebook.ipynb in my repo works perfectly well, so that cannot be the reason.

What could be causing this error? What are the requirements for a valid textconv driver other than returning a text file?

git diff
[NbConvertApp] Converting notebook /var/folders/9t/p55_4b9971j4wwp14_45wy900000gn/T//lR5q08_notebook.ipynb to python
Traceback (most recent call last):
  File "/Users/user/anaconda/lib/python3.6/site-packages/nbformat/reader.py", line 14, in parse_json
    nb_dict = json.loads(s, **kwargs)
  File "/Users/user/anaconda/lib/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/Users/user/anaconda/lib/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/Users/user/anaconda/lib/python3.6/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 2 column 1 (char 1)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/user/anaconda/bin/jupyter-nbconvert", line 11, in <module>
    load_entry_point('nbconvert==5.1.1', 'console_scripts', 'jupyter-nbconvert')()
  File "/Users/user/anaconda/lib/python3.6/site-packages/jupyter_core/application.py", line 266, in launch_instance
    return super(JupyterApp, cls).launch_instance(argv=argv, **kwargs)
  File "/Users/user/anaconda/lib/python3.6/site-packages/traitlets/config/application.py", line 658, in launch_instance
    app.start()
  File "/Users/user/anaconda/lib/python3.6/site-packages/nbconvert/nbconvertapp.py", line 305, in start
    self.convert_notebooks()
  File "/Users/user/anaconda/lib/python3.6/site-packages/nbconvert/nbconvertapp.py", line 473, in convert_notebooks
    self.convert_single_notebook(notebook_filename)
  File "/Users/user/anaconda/lib/python3.6/site-packages/nbconvert/nbconvertapp.py", line 444, in convert_single_notebook
    output, resources = self.export_single_notebook(notebook_filename, resources, input_buffer=input_buffer)
  File "/Users/user/anaconda/lib/python3.6/site-packages/nbconvert/nbconvertapp.py", line 373, in export_single_notebook
    output, resources = self.exporter.from_filename(notebook_filename, resources=resources)
  File "/Users/user/anaconda/lib/python3.6/site-packages/nbconvert/exporters/exporter.py", line 171, in from_filename
    return self.from_file(f, resources=resources, **kw)
  File "/Users/user/anaconda/lib/python3.6/site-packages/nbconvert/exporters/exporter.py", line 189, in from_file
    return self.from_notebook_node(nbformat.read(file_stream, as_version=4), resources=resources, **kw)
  File "/Users/user/anaconda/lib/python3.6/site-packages/nbformat/__init__.py", line 141, in read
    return reads(fp.read(), as_version, **kwargs)
  File "/Users/user/anaconda/lib/python3.6/site-packages/nbformat/__init__.py", line 74, in reads
    nb = reader.reads(s, **kwargs)
  File "/Users/user/anaconda/lib/python3.6/site-packages/nbformat/reader.py", line 58, in reads
    nb_dict = parse_json(s, **kwargs)
  File "/Users/user/anaconda/lib/python3.6/site-packages/nbformat/reader.py", line 17, in parse_json
    raise NotJSONError(("Notebook does not appear to be JSON: %r" % s)[:77] + "...")
nbformat.reader.NotJSONError: Notebook does not appear to be JSON: '\n# coding: utf-8\n\n# In[ ]:\n\nimport...
fatal: unable to read files to diff

Recommended answer

If you carefully read the documentation of gitattributes (where the textconv config option is described) you will notice that the converter program must send the output to standard output:

Sometimes it is desirable to see the diff of a text-converted version of some binary files. For example, a word processor document can be converted to an ASCII text representation, and the diff of the text shown. Even though this conversion loses some information, the resulting diff is useful for human viewing (but cannot be applied directly).

The textconv config option is used to define a program for performing such a conversion. The program should take a single argument, the name of a file to convert, and produce the resulting text on stdout.

...

Therefore you must add the --stdout option to your conversion command:

ipynb2py

#!/bin/bash
jupyter nbconvert --to python --stdout "$1"
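The stdout requirement can be seen without Jupyter installed. The sketch below uses two hypothetical stand-in converters (to_file_converter and to_stdout_converter, neither of which calls nbconvert; both simply upper-case their input with tr) and captures their standard output the way a caller reading stdout would. It is meant only to illustrate the contract quoted above, not to reproduce the exact nbconvert failure from the question.

```shell
#!/bin/bash
# Two hypothetical stand-ins for a converter; neither calls nbconvert.

# Writes the converted text to a side file and prints nothing,
# so a caller that reads stdout receives no text.
to_file_converter() {
    tr '[:lower:]' '[:upper:]' < "$1" > "$1.out"
}

# Prints the converted text on stdout, as the textconv contract expects.
to_stdout_converter() {
    tr '[:lower:]' '[:upper:]' < "$1"
}

workdir=$(mktemp -d)
printf 'import numpy\n' > "$workdir/input.txt"

# Capture stdout the way a caller reading the converter's output would.
file_result=$(to_file_converter "$workdir/input.txt")
stdout_result=$(to_stdout_converter "$workdir/input.txt")

echo "to_file_converter stdout:   '$file_result'"
echo "to_stdout_converter stdout: '$stdout_result'"

rm -rf "$workdir"
```

Only the second converter hands any text back to its caller. Quoting "$1", as in the corrected script, also keeps file names containing spaces intact.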

