Python: cut a string after the Xth sentence

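I have to cut a Unicode string that is actually an article (it contains sentences). I want to cut this article string after the Xth sentence in Python.

A good indicator that a sentence has ended is that it ends with a full stop (".") and the next word starts with a capital letter. For example: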

myarticle = "Hi, this is my first sentence. And this is my second. Yet this is my third."

How can this be achieved?

Thanks

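Best answer

Consider downloading the Natural Language Toolkit (NLTK). Its sentence tokenizer will not break on abbreviations such as "U.S.A.", and it will not fail to split sentences that end in "?!".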

>>> import nltk
>>> paragraph = u"Hi, this is my first sentence. And this is my second. Yet this is my third."
>>> sentences = nltk.sent_tokenize(paragraph)
>>> sentences
[u"Hi, this is my first sentence.", u"And this is my second.", u"Yet this is my third."]

Your code becomes more readable as well. To access the second sentence, use the indexing notation you are used to.

>>> sentences[1]
u"And this is my second."
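
The answer stops at indexing a single sentence, so here is a minimal sketch of the actual cut, under a few assumptions. The names `first_sentences`, `cut_after_sentence`, and `SENTENCE_END` are made up for illustration and are not part of NLTK. The first helper takes an already tokenized list (such as the `sentences` list above) and assumes the sentences were separated by a single space. The second uses only the standard library and implements the heuristic from the question (a full stop followed by whitespace and a capital letter), so it will wrongly split on things like "Mr. Smith" and it only recognizes ASCII capitals.

```python
import re

# Hypothetical helper: keep the first x sentences of a tokenized list.
# Assumes the original sentences were separated by a single space.
def first_sentences(sentences, x):
    return " ".join(sentences[:x])

# Heuristic from the question: a sentence ends at a full stop that is
# followed by whitespace and an (ASCII) capital letter.
SENTENCE_END = re.compile(r"\.(?=\s+[A-Z])")

# Hypothetical helper: cut the raw text right after its x-th sentence.
def cut_after_sentence(text, x):
    if x < 1:
        return ""
    for count, match in enumerate(SENTENCE_END.finditer(text), start=1):
        if count == x:
            # match.end() is the position just past the full stop
            return text[:match.end()]
    # Fewer than x sentence boundaries found: return the text unchanged.
    return text

myarticle = "Hi, this is my first sentence. And this is my second. Yet this is my third."
print(cut_after_sentence(myarticle, 2))
# prints: Hi, this is my first sentence. And this is my second.
```

With NLTK installed, `first_sentences(nltk.sent_tokenize(myarticle), 2)` should give the same result for this example, and the tokenizer is the safer choice for real articles.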

A similar question about cutting a string after the Xth sentence in Python can be found on Stack Overflow: https://stackoverflow.com/questions/3412316/