How to search for a keyword in a document

This post covers how to search a document for a keyword in Python and then check whether a second keyword appears within a set number of lines of the first.

Problem description

I want to search for a keyword in a document and then check whether that keyword is within 5 lines of another keyword. If it is, I want to print that line and the following 50 lines.

In this example, I am searching a document for the word "carrying", and I want to make sure that "carrying" is within 5 lines of the words "Financial Assets:". My code finds and prints the lines when I only search for "carrying", but when I also search for "Financial Assets:" it finds nothing, even though I know the phrase is in the document.

import urllib2

data = []

html = urllib2.urlopen("ftp://ftp.sec.gov/edgar/data/1001627/0000950116-97-001247.txt")
searchlines = html.readlines()
for m, line in enumerate(searchlines):
    line = line.lower()
    if "carrying" in line and "Financial Assets:" in searchlines[m-5:m+5]:
        for l in searchlines[m-5:m+50]:
            data.append(l)
print ''.join(data)

Any help would be greatly appreciated.

Recommended answer

Instead of

"Financial Assets:" in searchlines[m-5:m+5]

you need:

any("Financial Assets:" in line2 for line2 in searchlines[m-5:m+5])

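Your original code looks for a line whose entire content is exactly "Financial Assets:", instead of looking for that text as a substring of each line. The in operator on a list compares whole elements, and each element returned by readlines() is a full line that still ends with its newline, so the test never succeeds.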
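To make the fix concrete, here is a minimal sketch in Python 3 that works on any list of lines. The function name find_context, its parameters, and the sample lines are hypothetical and only for illustration. It also makes two assumptions that go beyond the answer: the start of the window is clamped at 0, because a negative start such as m-5 would make the slice count from the end of the list, and both keywords are compared case-insensitively, because the question lowercases each line before testing it.

```python
def find_context(lines, keyword, nearby, window=5, following=50):
    """Collect the matching line plus the lines after it for every line that
    contains `keyword` and has `nearby` within `window` lines of it."""
    keyword = keyword.lower()
    nearby = nearby.lower()
    blocks = []
    for m, line in enumerate(lines):
        if keyword not in line.lower():
            continue
        # Clamp the start: a negative index would count from the end of the list.
        start = max(0, m - window)
        neighbourhood = lines[start:m + window + 1]
        # Substring test against each nearby line, not whole-line equality.
        if any(nearby in other.lower() for other in neighbourhood):
            blocks.append(lines[m:m + following + 1])
    return blocks


# Made-up sample lines standing in for the downloaded filing.
sample = [
    "Notes to the statements\n",
    "Financial Assets: cash and equivalents\n",
    "The carrying amount approximates fair value.\n",
    "Unrelated closing line\n",
]

for block in find_context(sample, "carrying", "Financial Assets:"):
    print("".join(block), end="")
```

With real data, the list passed as lines would be the result of readlines() on the opened document, decoded to text first if it was read as bytes.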


08-30 07:48