I am a second-year student working on text mining.

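Let me first describe the code in general terms. It takes a PDF and converts it into a doc.txt file, processes that data over a few hundred lines, and then stores every sentence of the text in a list named all_text (for future use). I also selected some of the sentences and stored them in a list named summary.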

The problem is in this last part.

The summary list looks like this:

summary=['Artificial Intelligence (AI) is a science and a set of computational technologies that are inspired by—but typically operate quite differently from—the ways people use their nervous systems and bodies to sense, learn, reason, and take action.','In reality, AI is already changing our daily lives, almost entirely in ways that improve human health, safety,and productivity.','AI is also changing how people interact with technology.']


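What I want is to read doc.txt sentence by sentence and, whenever a sentence is in the summary list, wrap that sentence in bold tags (<b>the sentence</b>), for every item in the summary list. Here is the small piece of code I tried. It did not really help, but here it is: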

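# lis is presumably the summary list and txt the contents of doc.txt
i = 0
while i < len(lis):
    if lis[i] in txt:
        # wrap every occurrence of this sentence in bold tags
        txt = txt.replace(lis[i], "<b>" + lis[i] + "</b>")
        print(lis[i])
    i += 1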


This code does not work the way I expected. It works for some short sentences, but for other sentences it fails and I don't know why. Could you please help me?

Best answer

You can use a list comprehension for this, for example:

summary = ['sentenceE','sentenceA']
text = ['sentenceA','sentenceB','sentenceC','sentenceD','sentenceE']
output = ['<b>'+i+'</b>' if (i in summary) else i for i in text]
print(output) #prints ['<b>sentenceA</b>', 'sentenceB', 'sentenceC', 'sentenceD', '<b>sentenceE</b>']
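
The comprehension above relies on exact string equality. One possible reason the longer sentences fail to match (an assumption, since the question does not show doc.txt) is that the PDF conversion left line breaks or doubled spaces inside them. A minimal sketch that compares sentences after collapsing whitespace follows; normalize and mark_summary are hypothetical helper names, not part of the original code:

```python
import re

def normalize(sentence):
    """Collapse runs of whitespace (including newlines) into single spaces."""
    return re.sub(r"\s+", " ", sentence).strip()

def mark_summary(text, summary):
    """Wrap each sentence of `text` whose normalized form is in `summary` in <b> tags."""
    wanted = {normalize(s) for s in summary}  # set gives fast membership tests
    return ["<b>" + s + "</b>" if normalize(s) in wanted else s for s in text]

# Illustrative data: the same sentence, but with different whitespace
summary = ["AI is also changing\nhow people interact with technology."]
text = ["Some other sentence.",
        "AI is also changing how people  interact with technology."]
output = mark_summary(text, summary)
print(output)
```

The original sentence is kept unchanged inside the tags; only the comparison uses the normalized form.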


Note that summary and text should both be lists of str.
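
If the text is still one long string (for example the contents of doc.txt), it has to be split into sentences first. The sketch below uses a deliberately naive regular expression that splits after '.', '!' or '?' followed by whitespace; split_sentences is a hypothetical helper, and the sample string is made up for illustration:

```python
import re

def split_sentences(raw):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", raw.strip())
    return [p for p in parts if p]

raw = "AI is a science. It is changing our lives! Is it also changing technology?"
text = split_sentences(raw)            # list of str, one item per sentence
summary = ["It is changing our lives!"]

output = ["<b>" + s + "</b>" if s in summary else s for s in text]
html = " ".join(output)                # join back into a single string
print(html)
```

In practice raw would come from something like open("doc.txt", encoding="utf-8").read(). This splitter will break on abbreviations such as "e.g.", so a proper sentence tokenizer may be a better choice for real documents.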

09-25 17:10