Problem description
I'm doing a project for a college class I'm taking.
I'm using PHP to build a simple web app that classifies tweets as "positive" (or happy) and "negative" (or sad) based on a set of dictionaries. The algorithms I'm considering right now are a Naive Bayes classifier or a decision tree.
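For a sense of how little machinery the Naive Bayes option needs, here is a minimal sketch of a multinomial Naive Bayes classifier with add-one (Laplace) smoothing. It is written in Python for brevity, but it only uses counters and logarithms, so the same logic maps directly onto PHP associative arrays. It assumes each tweet has already been split into a list of words; the class and method names (`NaiveBayes`, `train`, `classify`) are hypothetical, and this is not WEKA's implementation.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words, with add-one smoothing."""

    def __init__(self):
        self.doc_counts = Counter()              # label -> number of training tweets
        self.word_counts = defaultdict(Counter)  # label -> word -> occurrences
        self.vocab = set()                       # every word seen in training

    def train(self, tokens, label):
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokens)
        self.vocab.update(tokens)

    def log_score(self, tokens, label):
        # log P(label) + sum of log P(word | label)
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        total_words = sum(self.word_counts[label].values())
        vocab_size = len(self.vocab)
        for word in tokens:
            if word not in self.vocab:
                continue  # ignore words never seen during training
            count = self.word_counts[label][word]  # Counter returns 0 if absent
            score += math.log((count + 1) / (total_words + vocab_size))
        return score

    def classify(self, tokens):
        # pick the label with the highest log score
        return max(self.doc_counts, key=lambda label: self.log_score(tokens, label))


nb = NaiveBayes()
nb.train(["love", "this", "great", "day"], "positive")
nb.train(["so", "happy", "great"], "positive")
nb.train(["hate", "this", "awful", "day"], "negative")
nb.train(["so", "sad", "awful"], "negative")
print(nb.classify(["great", "happy", "day"]))  # positive
```

Working in log space avoids floating-point underflow when a tweet has many words, and the smoothing keeps an unseen word/label pair from zeroing out the whole product.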
However, I can't find any PHP library that helps me do some serious language processing. Python has NLTK (http://www.nltk.org). Is there anything like that for PHP?
I'm planning to use WEKA as the back end of the web app (by calling Weka from the command line within PHP), but that doesn't seem very efficient.
Do you have any idea what I should use for this project? Or should I just switch to Python?
Thanks
Recommended answer
If you're going to be using a Naive Bayes classifier, you don't really need a whole ton of natural-language processing. All you'll need is an algorithm to stem the words in the tweets and, if you want, to remove stop words.
Stemming algorithms abound and aren't difficult to code. Removing stop words is just a matter of searching a hash map or something similar. I don't see a justification for switching your development platform to accommodate NLTK, although it is a very nice tool.
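To illustrate how small this preprocessing step can be, here is a sketch in Python (the same approach works with a PHP array used as a hash map). The stop-word list and the suffix rules are hypothetical toy examples: this is a crude suffix stripper, not the Porter stemmer or any other published algorithm, and it over-stems some words (for example "loving" becomes "lov").

```python
import re

# Tiny illustrative stop-word set; real lists have hundreds of entries.
STOP_WORDS = {"a", "an", "the", "is", "am", "are", "i", "this", "so", "to", "and", "it"}

# Toy suffix rules, tried in order (longer suffixes first).
SUFFIXES = ("ingly", "edly", "ing", "ed", "ly", "es", "s")


def stem(word):
    for suffix in SUFFIXES:
        # only strip if at least three characters remain, so "sing" survives
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(tweet):
    # lowercase and split into runs of letters/apostrophes
    tokens = re.findall(r"[a-z']+", tweet.lower())
    # set membership is an average O(1) hash lookup
    return [stem(t) for t in tokens if t not in STOP_WORDS]


print(preprocess("I am loving this sunny day, it is so relaxing!"))
# ['lov', 'sunny', 'day', 'relax']
```

The resulting token lists are exactly what a bag-of-words Naive Bayes classifier consumes; swapping in a published stemmer such as Porter's only changes the body of `stem`.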