Problem description
I am trying to input an entire paragraph into my word processor to be split into sentences first and then into words.
I tried the following code, but it does not work:
```python
#text is the paragraph input
sent_text = sent_tokenize(text)
tokenized_text = word_tokenize(sent_text.split)
tagged = nltk.pos_tag(tokenized_text)
print(tagged)
```
However, this is not working and gives me errors. So how do I tokenize a paragraph into sentences and then into words?
An example paragraph:
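This thing seemed to overpower and astonish the little dark-brown dog, and wounded him to the heart. He sank down in despair at the child's feet. When the blow was repeated, together with an admonition in childish sentences, he turned over upon his back, and held his paws in a peculiar manner. At the same time with his ears and his eyes he offered a small prayer to the child.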
**WARNING:** This is just random text from the internet; I do not own the above content.
Recommended answer
You probably intended to loop over `sent_text`:
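```python
import nltk

# One-time setup: sent_tokenize/word_tokenize need the Punkt model and pos_tag needs
# a tagger model; fetch them with nltk.download() (resource names vary by NLTK version).

# text is the paragraph input, as in the question
sent_text = nltk.sent_tokenize(text)  # this gives us a list of sentences
# now loop over each sentence and tokenize it separately
for sentence in sent_text:
    tokenized_text = nltk.word_tokenize(sentence)  # list of word tokens for this sentence
    tagged = nltk.pos_tag(tokenized_text)  # list of (word, tag) tuples
    print(tagged)
```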
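The key point is the shape of the data: `sent_tokenize` returns a *list* of sentence strings, while `word_tokenize` expects a *single* string. That is why the question's `sent_text.split` fails (a Python list has no `split` attribute) and why the answer loops instead. The sketch below makes those two stages explicit without needing NLTK installed. The functions `naive_sent_tokenize`, `naive_word_tokenize` and `tokenize_paragraph` are hypothetical stand-ins built on simple regular expressions; they are far cruder than NLTK's tokenizers (no handling of abbreviations like "Mr.", for example) and are only meant to illustrate the list-of-lists result.

```python
import re
from typing import List

# Hypothetical stand-ins for nltk.sent_tokenize / nltk.word_tokenize.
# Much cruder than NLTK's tokenizers; they only illustrate the data shapes.


def naive_sent_tokenize(text: str) -> List[str]:
    """Stage 1: split after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]  # drop empty pieces (e.g. blank input)


def naive_word_tokenize(sentence: str) -> List[str]:
    """Stage 2: runs of word characters/apostrophes, plus each punctuation mark."""
    return re.findall(r"[\w']+|[^\w\s]", sentence)


def tokenize_paragraph(text: str) -> List[List[str]]:
    """Paragraph -> list of sentences -> one token list per sentence."""
    return [naive_word_tokenize(s) for s in naive_sent_tokenize(text)]


if __name__ == "__main__":
    text = "He sank down in despair at the child's feet. The blow was repeated."
    for tokens in tokenize_paragraph(text):
        print(tokens)
```

With real NLTK, the same nested structure comes from replacing the two helpers with `nltk.sent_tokenize` and `nltk.word_tokenize`; each inner list can then be passed to `nltk.pos_tag` exactly as in the loop above.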