Problem description
I'm looking for a simple way to detect whether a short excerpt of text, a few sentences, is English or not. It seems to me that this problem is much easier than trying to detect an arbitrary language. Is there any software out there that can do this? I'm writing in Python and would prefer a Python library, but something else would be fine too. I tried Google, but then realized the TOS doesn't allow automated queries.
Recommended answer
I read about a method that uses trigrams.
You can go over the text and find the most frequent trigrams (three-letter sequences) within its words. If they match the most frequent trigrams in English words, the text is likely written in English.
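The idea above could be sketched in Python roughly as follows. This is only a minimal illustration, not the method used by any particular library: the function names are hypothetical, the trigram set is a small hand-picked list rather than one derived from a corpus, and the values of `top_n` and `threshold` are arbitrary assumptions that would need tuning on real data.

```python
import re
from collections import Counter

# Assumption: a small, hand-picked set of trigrams that are frequent in
# English words. A real detector would derive this from a large corpus.
COMMON_ENGLISH_TRIGRAMS = set(
    "the and ing ion tio ent ati for her ter hat tha ere ate his con res "
    "ver all ons nce men ith ted ers pro thi wit are ess not ive was ect "
    "rea com eve per int est".split()
)


def word_trigrams(text):
    """Yield lowercase three-letter sequences found inside each word."""
    for word in re.findall(r"[a-z]+", text.lower()):
        for i in range(len(word) - 2):
            yield word[i:i + 3]


def english_score(text, top_n=20):
    """Return the share of the text's most frequent trigrams that are
    also in the common-English set (0.0 if the text has no trigrams)."""
    counts = Counter(word_trigrams(text))
    top = [trigram for trigram, _ in counts.most_common(top_n)]
    if not top:
        return 0.0
    return sum(t in COMMON_ENGLISH_TRIGRAMS for t in top) / len(top)


def looks_english(text, threshold=0.3):
    """Guess whether the text is English. The threshold is an assumption."""
    return english_score(text) >= threshold
```

Calling `looks_english("the weather and the other thing")` returns `True` with these settings, since half of that text's distinct trigrams are in the set. Very short inputs give few trigrams, so the score is noisy for single words or short phrases.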
Take a look at this Ruby project:
https://github.com/feedbackmine/language_detector