Problem description
I want to search for a keyword in a document and then check whether that keyword is within 5 lines of another keyword. If it is, I want to print the line and the following 50 lines.
In this example, I am searching a document for the word "carrying", and I want to make sure that "carrying" is within 5 lines of the words "Financial Assets:". My code finds and prints the lines when I only search for "carrying", but when I add the search for "Financial Assets:" it does not find anything (even though I know it is there in the document).
import urllib2
data = []
html = urllib2.urlopen("ftp://ftp.sec.gov/edgar/data/1001627/0000950116-97-001247.txt")
searchlines = html.readlines()
for m, line in enumerate(searchlines):
    line = line.lower()
    if "carrying" in line and "Financial Assets:" in searchlines[m-5:m+5]:
        for l in searchlines[m-5:m+50]:
            data.append(l)
print ''.join(data)
Any help would be greatly appreciated.
Recommended answer
Instead of:
"Financial Assets:" in searchlines[m-5:m+5]
you need:
any("Financial Assets:" in line2 for line2 in searchlines[m-5:m+5])
Your original code checks whether some element of the list is exactly equal to "Financial Assets:" (the in operator on a list compares whole elements, and each line also carries its trailing newline), instead of looking for it as a substring in each line.
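The fix above can be put together into a small self-contained sketch. This is written for Python 3 and works on a plain list of lines rather than on the FTP download, so the function name find_context, its parameters, and the sample data are illustrative assumptions and not part of the original code. It also clamps the start of the slice, because a negative start such as m-5 for the first few lines makes the slice count from the end of the list, which usually yields an empty window.

```python
def find_context(lines, keyword, nearby, window=5, following=50):
    """Collect blocks of lines where `keyword` appears within `window`
    lines of a line containing `nearby`.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    blocks = []
    for m, line in enumerate(lines):
        # Case-insensitive match for the main keyword, as in the question.
        if keyword.lower() not in line.lower():
            continue
        # Clamp at 0 so a negative start does not wrap around to the end.
        start = max(0, m - window)
        neighbours = lines[start:m + window + 1]
        # Substring test on each neighbouring line (case-sensitive),
        # not membership of the whole string in the list.
        if any(nearby in other for other in neighbours):
            # The matching line plus the next `following` lines.
            blocks.append(lines[m:m + following + 1])
    return blocks


# Made-up sample data standing in for the downloaded document.
sample = [
    "header\n",
    "Financial Assets:  cash and equivalents\n",
    "some other text\n",
    "the carrying amount approximates fair value\n",
    "next line\n",
]

for block in find_context(sample, "carrying", "Financial Assets:", following=1):
    print("".join(block))
```

With the real document, the list returned by readlines() would take the place of sample; under Python 3 the lines read from a URL are bytes and would need decoding first.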