Checking English Grammar with NLTK

Question

I'm starting to use the NLTK library, and I want to check whether a sentence in English is correct or not.

Example:

"He see Bob" - incorrect

"He sees Bob" - correct

I read this, but it's quite hard for me. I need an easier example.

Recommended answer

Grammar checking is an active area of NLP research, so there isn't a 100% answer (maybe not even an 80% answer) at this time. The simplest approach (or at least a reasonable baseline) would be an n-gram language model: normalize the LM probabilities for utterance length and set a heuristic threshold that separates 'grammatical' from 'ungrammatical'.
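
A minimal sketch of that baseline in plain Python, assuming a tiny hand-made corpus, add-one smoothing, and a hand-picked threshold. The function names, the corpus, and the value -1.8 are illustrative assumptions, not part of any library; a real threshold would be tuned on held-out data.

```python
import math
from collections import Counter

def train_bigram_counts(sentences):
    """Count bigrams and their contexts over tokenized sentences."""
    contexts, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        contexts.update(padded[:-1])             # every token that has a successor
        bigrams.update(zip(padded, padded[1:]))
    # Words the model can predict, plus end-of-sentence and an unknown-word slot
    vocab = {w for s in sentences for w in s} | {"</s>", "<unk>"}
    return contexts, bigrams, vocab

def avg_log_prob(tokens, contexts, bigrams, vocab):
    """Length-normalized log-probability with add-one (Laplace) smoothing."""
    mapped = [t if t in vocab else "<unk>" for t in tokens]
    padded = ["<s>"] + mapped + ["</s>"]
    total = 0.0
    for prev, word in zip(padded, padded[1:]):
        p = (bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab))
        total += math.log(p)
    return total / (len(padded) - 1)             # divide by the number of bigrams

def looks_grammatical(tokens, model, threshold):
    """Heuristic decision: score at or above the threshold counts as grammatical."""
    return avg_log_prob(tokens, *model) >= threshold

# Toy training data (hypothetical); real use needs a large corpus
corpus = [
    "he sees bob".split(),
    "she sees alice".split(),
    "he sees the dog".split(),
    "bob sees the cat".split(),
]
model = train_bigram_counts(corpus)
THRESHOLD = -1.8  # hand-picked for this toy corpus only

for sentence in ("he sees bob", "he see bob"):
    print(sentence, looks_grammatical(sentence.split(), model, THRESHOLD))
```

Dividing by the number of bigrams is what keeps long sentences from being penalized just for being long. With a corpus this small the model mostly notices unseen words, so treat it as an illustration of the scoring and thresholding steps only.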

You could use Google's n-gram corpus, or train your own on in-domain data. You might be able to do that with NLTK; you definitely could with LingPipe, the SRI Language Modeling Toolkit, or OpenGRM.
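
For the NLTK route, one possible sketch uses the `nltk.lm` module to train a Laplace-smoothed bigram model on your own data. The four-sentence corpus and the `cross_entropy` helper are assumptions made for illustration; `lm.entropy` returns the average negative log2 score per n-gram, so it is already normalized for length and lower values mean the model finds the sentence more likely.

```python
from nltk.lm import Laplace
from nltk.lm.preprocessing import pad_both_ends, padded_everygram_pipeline
from nltk.util import bigrams

# Toy in-domain corpus (hypothetical), already tokenized
corpus = [
    "he sees bob".split(),
    "she sees alice".split(),
    "he sees the dog".split(),
    "bob sees the cat".split(),
]

ORDER = 2  # bigram model

# Build padded training n-grams and the vocabulary stream, then fit the model
train_ngrams, vocab_stream = padded_everygram_pipeline(ORDER, corpus)
lm = Laplace(ORDER)
lm.fit(train_ngrams, vocab_stream)

def cross_entropy(tokens):
    """Average negative log2 probability per bigram (helper name is illustrative)."""
    padded = list(pad_both_ends(tokens, n=ORDER))
    return lm.entropy(list(bigrams(padded)))

for sentence in ("he sees bob", "he see bob"):
    print(sentence, round(cross_entropy(sentence.split()), 3))
```

A threshold on this score would then be chosen by looking at scores for sentences you already know to be good or bad.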

That said, an n-gram model won't perform all that well. If it meets your needs, great, but if you want to do better, you'll have to train a machine-learning classifier. A grammaticality classifier would generally use features from syntactic and/or semantic processing (e.g. POS tags, dependency and constituency parses, etc.). You might look at some of the work from Joel Tetreault and the team he worked with at ETS, or Jennifer Foster and her team at Dublin.
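
To make the classifier idea concrete, here is a rough sketch that turns POS-tag bigrams into features and trains NLTK's `NaiveBayesClassifier` on them. Everything about the data is an assumption: the sentences are hand-tagged with Penn Treebank style tags (in practice a tagger such as `nltk.pos_tag` would supply them), the four training examples are invented, and `pos_bigram_features` is a hypothetical helper. Published systems use far richer features and far more data.

```python
from nltk.classify import NaiveBayesClassifier

def pos_bigram_features(tagged_sentence):
    """Map a list of (word, tag) pairs to binary POS-bigram features."""
    tags = [tag for _, tag in tagged_sentence]
    return {f"{a}_{b}": True for a, b in zip(tags, tags[1:])}

# Hand-tagged toy training set (hypothetical)
train = [
    ([("he", "PRP"), ("sees", "VBZ"), ("bob", "NNP")], "grammatical"),
    ([("she", "PRP"), ("sees", "VBZ"), ("alice", "NNP")], "grammatical"),
    ([("he", "PRP"), ("see", "VBP"), ("bob", "NNP")], "ungrammatical"),
    ([("she", "PRP"), ("see", "VBP"), ("alice", "NNP")], "ungrammatical"),
]

classifier = NaiveBayesClassifier.train(
    [(pos_bigram_features(sentence), label) for sentence, label in train]
)

# Classify two unseen sentences, again with hand-assigned tags
good = [("he", "PRP"), ("sees", "VBZ"), ("alice", "NNP")]
bad = [("she", "PRP"), ("see", "VBP"), ("bob", "NNP")]
print(classifier.classify(pos_bigram_features(good)))
print(classifier.classify(pos_bigram_features(bad)))
```

Because the features are tag pairs rather than words, the classifier can generalize to sentences whose words never appeared in training, which is the main advantage over a word-level n-gram score.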

Sorry there isn't an easy and straightforward answer...
