Tokenizing a paragraph into sentences and then into words in NLTK

Problem description

I am trying to feed an entire paragraph into my text processor, splitting it into sentences first and then into words.

I tried the following code, but it does not work:

    #text is the paragraph input
    sent_text = sent_tokenize(text)
    tokenized_text = word_tokenize(sent_text.split)
    tagged = nltk.pos_tag(tokenized_text)
    print(tagged)

However, this does not work and gives me errors. How do I tokenize a paragraph into sentences and then into words?

A sample paragraph:

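This thing seemed to overpower and astonish the little dark-brown dog, and wounded him to the heart. He sank down in despair at the child's feet. When the blow was repeated, together with an admonition in childish sentences, he turned over upon his back, and held his paws in a peculiar manner. At the same time with his ears and his eyes he offered a small prayer to the child.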

**WARNING:** This is just a random text from the internet; I do not own the above content.

Recommended answer

You probably intended to loop over sent_text:

    import nltk

    sent_text = nltk.sent_tokenize(text) # this gives us a list of sentences
    # now loop over each sentence and tokenize it separately
    for sentence in sent_text:
        tokenized_text = nltk.word_tokenize(sentence)
        tagged = nltk.pos_tag(tokenized_text)
        print(tagged)
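
The code in the question fails because sent_tokenize returns a list of sentence strings: a list has no split attribute, and word_tokenize expects a single string. The sketch below illustrates that shape using only the standard library. Note that naive_sent_tokenize and naive_word_tokenize are hypothetical regex-based stand-ins written for this example. They are not NLTK's tokenizers and will split less accurately (for instance on abbreviations).

```python
import re

def naive_sent_tokenize(text):
    """Hypothetical stand-in for nltk.sent_tokenize.

    Splits after '.', '!' or '?' when followed by whitespace.
    """
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def naive_word_tokenize(sentence):
    """Hypothetical stand-in for nltk.word_tokenize.

    Returns runs of word characters and single punctuation marks.
    """
    return re.findall(r"\w+|[^\w\s]", sentence)

text = "He sank down in despair. At the same time he offered a small prayer."

# Step 1: paragraph -> list of sentence strings
sentences = naive_sent_tokenize(text)

# A list has no .split attribute, so sent_text.split cannot work
print(hasattr(sentences, "split"))  # False

# Step 2: tokenize each sentence separately -> list of token lists
tokens_per_sentence = [naive_word_tokenize(s) for s in sentences]
for tokens in tokens_per_sentence:
    print(tokens)
```

With NLTK installed and its tokenizer data downloaded, the same list-of-lists result comes from swapping in nltk.sent_tokenize and nltk.word_tokenize, as in the answer above.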

