Extracting input and output pickles from ipython / jupyter notebooks

Problem description

I'm trying to summarize a data analysis project that runs across many ipython / jupyter notebooks, each of which is fairly long. One thing that would help is knowing, at a minimum, which "input" pickles go into each notebook and which "output" pickles come out of it.

What's the cleanest/quickest/most efficient way to do this?

Solution

I'm not sure if this is the best way to do it, but it's at least one way...

def summarize_pickles(notebook_path):
    # Note: this targets the older IPython.nbformat API and the "worksheets" notebook layout.
    from IPython.nbformat import current as nbformat
    import re

    with open(notebook_path) as fh:
        nb = nbformat.reads_json(fh.read())

    list_of_input_pickles = []
    list_of_output_pickles = []

    for cell in nb["worksheets"][0]["cells"]:
        # Skip cells that aren't code, and code cells that never reference "pickle".
        if cell["cell_type"] != "code" or "pickle" not in cell["input"]:
            continue

        # In case there are multiple lines, iterate line by line.
        for line in cell["input"].splitlines():
            # Skip lines with no mention of "pickle" so the checks below run less often.
            if "pickle" not in line:
                continue

            if "pickle.dump" in line or ".to_pickle" in line:
                code_type = "output"
            elif "pickle.load" in line or ".read_pickle" in line:
                code_type = "input"
            else:
                continue   # Skips lines like "import cPickle as pickle".

            # Grab everything between quotes (single or double). See: http://stackoverflow.com/questions/171480/regex-grabbing-values-between-quotation-marks
            filename = re.findall(r'["\'](.*?)["\']', line)
            if not filename:
                continue   # e.g. pickle.load(fh): no quoted filename on this line.

            if code_type == "input":
                list_of_input_pickles.append(filename[0])
            elif code_type == "output":
                list_of_output_pickles.append(filename[0])

    pickles_dict = {"input_pickles": list_of_input_pickles,
                    "output_pickles": list_of_output_pickles}

    return pickles_dict
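
The function above reads the older notebook layout (a top-level "worksheets" list, with code stored under each cell's "input" key). Notebooks saved in format version 4 instead keep a top-level "cells" list and store code under "source", either as one string or as a list of lines. Below is a minimal sketch of the same line-by-line idea for that layout, using only the standard library. The names `pickles_in_source`, `summarize_notebook_v4` and `summarize_directory` are hypothetical helpers introduced here for illustration, not part of the original answer, and the matching is still a plain text heuristic: it only catches filenames written as quoted literals on the same line as the load/dump call.

```python
import json
import re
from pathlib import Path

# Same textual markers the answer checks for (a heuristic, not a parser).
OUTPUT_MARKERS = ("pickle.dump", ".to_pickle")
INPUT_MARKERS = ("pickle.load", ".read_pickle")
QUOTED = re.compile(r"""(["'])(.*?)\1""")  # a string in matching single or double quotes


def pickles_in_source(source):
    """Collect quoted filenames from pickle read/write lines in one cell's code."""
    found = {"input_pickles": [], "output_pickles": []}
    for line in source.splitlines():
        if any(marker in line for marker in OUTPUT_MARKERS):
            key = "output_pickles"
        elif any(marker in line for marker in INPUT_MARKERS):
            key = "input_pickles"
        else:
            continue
        match = QUOTED.search(line)
        if match:  # lines such as pickle.load(fh) carry no filename; skip them
            found[key].append(match.group(2))
    return found


def summarize_notebook_v4(notebook_path):
    """Summarize one notebook, assuming the version 4 JSON layout."""
    with open(notebook_path, encoding="utf-8") as fh:
        nb = json.load(fh)
    summary = {"input_pickles": [], "output_pickles": []}
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # "source" may be stored as a list of lines
            source = "".join(source)
        found = pickles_in_source(source)
        for key in summary:
            summary[key].extend(found[key])
    return summary


def summarize_directory(directory):
    """Map each *.ipynb file directly inside `directory` to its pickle summary."""
    return {str(path): summarize_notebook_v4(path)
            for path in sorted(Path(directory).glob("*.ipynb"))}
```

Calling `summarize_directory` on a project folder gives one dictionary per notebook, which can then be compared to see which notebook's output pickles feed another notebook's inputs. Filenames built from variables, or opened on a different line than the `pickle.load` / `pickle.dump` call, will be missed by this approach.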
