Problem description
I remember skimming the sentence segmentation section from the NLTK site a long time ago.
I use a crude text replacement of "period" "space" with "period" "manual line break" to achieve sentence segmentation, such as with a Microsoft Word replacement (". " -> ".^p") or a Chrome extension:
https://github.com/AhmadHassanAwan/Sentence-Segmentation
https://chrome.google.com/webstore/detail/sentence-segmenter/jfbhkblbhhigbgdnijncccdndhbflcha
This is instead of an NLP method like the Punkt tokenizer of NLTK.
I segment to help me more easily locate and reread sentences, which can sometimes help with reading comprehension.
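For illustration, the crude replacement described above could be sketched in Python roughly as follows. The function name `crude_split` and the sample sentence are made up for this example. It only mimics the find-and-replace approach and is not how Punkt works.

```python
def crude_split(text: str) -> str:
    """Replace every "period + space" with "period + line break"."""
    return text.replace(". ", ".\n")


# Made-up sample text; note that the abbreviation "Dr." is split too.
sample = "Dr. Smith arrived. He sat down."
print(crude_split(sample))
```

The abbreviation in the sample gets its own line, which is the kind of false boundary that a trained tokenizer such as Punkt is designed to avoid.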
What about independent clause boundary disambiguation, and independent clause segmentation? Are there any tools that attempt to do this?
Below is some example text. If an independent clause can be identified within a sentence, there’s a split. Starting from the end of a sentence, it moves left, and greedily splits:
E.g.
(I’m not sure if I split it properly.)
If there are no means to segment independent clauses, are there any search terms that I can use to further explore this topic?
Thanks.
To the best of my knowledge, there is no readily available tool that solves this exact problem. NLP systems usually do not try to identify the different types of sentences and clauses defined by English grammar. There is one paper published in EMNLP that provides an algorithm using the SBAR tag in parse trees to identify independent and dependent clauses in a sentence.
You should find section 3 of this paper useful. It talks about English language syntax in some detail, but I don't think the entire paper is relevant to your question.
Note that they used the Berkeley parser (demo available here), but you can obviously use any other constituency parsing tool (e.g. the Stanford parser, demo available here).
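To give a rough idea of what working with the SBAR tag looks like, here is a minimal sketch in plain Python. It is not the paper's algorithm. It only assumes that you already have a Penn-Treebank-style bracketed parse from some constituency parser, and it collects the text under each SBAR node (the subordinate clauses). The bracketed string below is hand-written for illustration, so a real parser's output may differ, and the helper names `parse_brackets`, `leaves` and `sbar_clauses` are hypothetical.

```python
from typing import List, Union

Tree = Union[str, list]  # a leaf word, or [label, child, child, ...]


def parse_brackets(s: str) -> Tree:
    """Parse a bracketed parse string into nested lists."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read() -> Tree:
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok != "(":
            return tok              # a plain word
        node = [tokens[pos]]        # the constituent label, e.g. "SBAR"
        pos += 1
        while tokens[pos] != ")":
            node.append(read())
        pos += 1                    # skip the closing ")"
        return node

    return read()


def leaves(tree: Tree) -> List[str]:
    """Return the words under a node, left to right."""
    if isinstance(tree, str):
        return [tree]
    words: List[str] = []
    for child in tree[1:]:
        words.extend(leaves(child))
    return words


def sbar_clauses(tree: Tree) -> List[str]:
    """Return the text under every SBAR node, outermost first."""
    if isinstance(tree, str):
        return []
    found: List[str] = []
    if tree[0] == "SBAR":
        found.append(" ".join(leaves(tree)))
    for child in tree[1:]:
        found.extend(sbar_clauses(child))
    return found


# Hand-written parse, for illustration only.
parse = ("(S (NP (PRP I)) (VP (VBD left) (SBAR (IN because) "
         "(S (NP (PRP it)) (VP (VBD rained))))) (. .))")
tree = parse_brackets(parse)
print(sbar_clauses(tree))
```

Whatever falls outside the SBAR spans is then a candidate for the independent clause, though deciding exactly where to split would still need the kind of rules the paper discusses.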