I need your help determining the best approach for analyzing industry-specific sentences (i.e. movie reviews) for "positive" vs "negative". I've seen libraries such as OpenNLP before, but they're too low-level; they just give me the basic sentence composition. What I need is a higher-level structure:
- hopefully with wordlists
- hopefully trainable on my set of data
What you are looking for is commonly dubbed Sentiment Analysis. Typically, sentiment analysis is not able to handle delicate subtleties, like sarcasm or irony, but it fares pretty well if you throw a large set of data at it.
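Since the question asks about word lists, here is a minimal sketch of the simplest lexicon-based approach: count positive and negative words and compare. The `POSITIVE`/`NEGATIVE` sets and the function names are hypothetical and tiny, for illustration only; they are not taken from any published lexicon. The last example shows why sarcasm trips up a pure word count.

```python
import re

# Hypothetical, tiny word lists for illustration only; a real system would
# use a much larger lexicon or one derived from labelled reviews.
POSITIVE = {"great", "brilliant", "moving", "enjoyable", "superb"}
NEGATIVE = {"boring", "awful", "dull", "predictable", "waste"}


def lexicon_score(text: str) -> int:
    """Return (#positive words - #negative words) found in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)


def polarity(text: str) -> str:
    """Map the net word count to a coarse label."""
    score = lexicon_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    print(polarity("A brilliant, moving film."))                # positive
    print(polarity("Dull and predictable from start to end."))  # negative
    # Sarcasm fools a pure word count: "great" is counted as positive.
    print(polarity("Oh great, another two hours I will never get back."))  # positive
```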
Sentiment analysis usually needs quite a bit of pre-processing: at least tokenization, sentence boundary detection and part-of-speech tagging. Sometimes syntactic parsing can be important as well. Doing it properly is an entire branch of research in computational linguistics, and I wouldn't advise coming up with your own solution unless you take the time to study the field first.
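To make the first two pre-processing steps concrete, here is a deliberately naive regex-based sketch of sentence boundary detection and tokenization. The function names and the splitting rule are assumptions for illustration; real tools use trained models, and part-of-speech tagging needs a trained tagger (such as the one in OpenNLP), which is not shown here.

```python
import re
from typing import List

# Naive rule (an assumption for illustration): a sentence ends at ., ! or ?
# followed by whitespace and then an uppercase letter or a quote. It will
# mis-split abbreviations such as "Dr. Strangelove".
_SENT_BOUNDARY = re.compile(r'(?<=[.!?])\s+(?=["\'A-Z])')

# A token is a word (optionally with one internal apostrophe, e.g. "isn't")
# or a single punctuation mark.
_TOKEN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def split_sentences(text: str) -> List[str]:
    """Split text into sentences using the naive boundary rule above."""
    return [s for s in _SENT_BOUNDARY.split(text.strip()) if s]


def tokenize(sentence: str) -> List[str]:
    """Split one sentence into word and punctuation tokens."""
    return _TOKEN.findall(sentence)


if __name__ == "__main__":
    review = "The plot isn't new. Still, the acting is superb! Worth seeing?"
    for sentence in split_sentences(review):
        print(tokenize(sentence))
```

Even this toy version shows why the answer recommends existing tools: the abbreviation problem noted in the comment is exactly the kind of case a trained sentence detector handles better.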
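OpenNLP has some tools that help with sentiment analysis, but if you want something more serious, you should look at the LingPipe toolkit. It has some built-in SA functionality and a nice tutorial. You can train it on your own dataset, but don't expect that to be entirely trivial :-).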
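As an illustration of what "training on your own dataset" means, here is a minimal multinomial naive Bayes polarity classifier in plain Python. This is a swapped-in technique, not LingPipe's API; the class name `NaiveBayesPolarity` and the four reviews in `TRAIN` are hypothetical, and a real classifier would need thousands of labelled reviews.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def tokens(text: str) -> List[str]:
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesPolarity:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.doc_counts: Counter = Counter()  # label -> number of documents
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)  # label -> word counts
        self.vocab: set = set()

    def train(self, examples: Iterable[Tuple[str, str]]) -> None:
        for text, label in examples:
            self.doc_counts[label] += 1
            for tok in tokens(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)

    def classify(self, text: str) -> str:
        total_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # log P(label) + sum of smoothed log P(word | label)
            score = math.log(n_docs / total_docs)
            for tok in tokens(text):
                if tok in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[tok] + 1) / (total_words + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training data: (review text, label).
TRAIN = [
    ("a moving and brilliant film", "pos"),
    ("superb acting and a great script", "pos"),
    ("boring, predictable and far too long", "neg"),
    ("a dull waste of two hours", "neg"),
]

if __name__ == "__main__":
    clf = NaiveBayesPolarity()
    clf.train(TRAIN)
    print(clf.classify("brilliant acting"))  # pos
    print(clf.classify("dull and boring"))   # neg
```

Add-one smoothing keeps a word seen in only one class from driving the other class's probability to zero; words absent from the training vocabulary are simply skipped.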
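Googling the term will probably give you some resources too. If you have any more specific questions, just ask; I'm keeping a close eye on the nlp tag ;-)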