How do you find the subject of a sentence?

Problem description

I am new to NLP and have been researching which language toolkit I should use for the following. I would like to do one of two things, both of which accomplish the same goal:

I basically want to classify a text, usually a single sentence of about 15 words. I would like to determine whether the sentence is talking about a specific subject.

Is there a tool that, given a sentence, finds the subject of that sentence?

I am using PHP and Java, but the tool can be anything that runs on the Linux command line.

Thank you very much.

Recommended answer

The most basic way of doing this is to create a set of labeled training data and use it to train a classifier. How the classifier works is a more complicated issue; for spam filtering and many other tasks, just looking at word frequencies works pretty well.

Here is a basic example: http://openclassroom.stanford.edu/MainFolder/DocumentPage.php?course=MachineLearning&doc=exercises/ex6/ex6.html

It is trivial to write a Naive Bayes classifier; a package like MALLET will also have this, plus better machine learning methods. LingPipe will also have this sort of thing.

What you really should care about is the quality of your data and what your features are.
By quality of data I mean lots of data without too many borderline cases, and by features I mean whether you are choosing just words, combinations of words (word n-grams), dependency features, or something more complex. You need a way to create the feature data as well as a way to actually do the learning. In this sense LingPipe is a good choice, because it handles tokenization and similar preprocessing for you, so you do not have to write your own functions or cobble other tools together into your own feature generation code.

A guide to MALLET can be found here: http://courses.washington.edu/ling570/fei_fall10/11_15_Mallet.pdf
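
To make the word-frequency approach described in the answer concrete, here is a minimal sketch of a Naive Bayes sentence classifier in plain Python. It is not taken from MALLET or LingPipe; the class name NaiveBayes, the features helper, the use_bigrams switch, and the tiny TRAINING set are all hypothetical and exist only for illustration. It assumes whitespace tokenization and add-one smoothing, and a real system would need far more labeled sentences than this.

```python
import math
from collections import Counter, defaultdict


def features(sentence, use_bigrams=False):
    """Turn a sentence into features: lowercase words, optionally word bigrams."""
    words = sentence.lower().split()
    feats = list(words)
    if use_bigrams:
        feats += [a + "_" + b for a, b in zip(words, words[1:])]
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self, use_bigrams=False):
        self.use_bigrams = use_bigrams
        self.label_counts = Counter()                # sentences seen per label
        self.feature_counts = defaultdict(Counter)   # label -> feature -> count
        self.vocab = set()                           # every feature seen in training

    def train(self, labeled_sentences):
        for sentence, label in labeled_sentences:
            self.label_counts[label] += 1
            for feat in features(sentence, self.use_bigrams):
                self.feature_counts[label][feat] += 1
                self.vocab.add(feat)

    def classify(self, sentence):
        total_sentences = sum(self.label_counts.values())
        feats = features(sentence, self.use_bigrams)
        best_label, best_score = None, -math.inf
        for label, n_sentences in self.label_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_sentences / total_sentences)
            total = sum(self.feature_counts[label].values())
            for feat in feats:
                count = self.feature_counts[label][feat]
                score += math.log((count + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labeled training data: (sentence, subject label).
TRAINING = [
    ("the team won the football match last night", "sports"),
    ("the striker scored two goals in the final", "sports"),
    ("the coach praised the players after the game", "sports"),
    ("the new phone has a faster processor", "tech"),
    ("the software update fixes a security bug", "tech"),
    ("the laptop ships with more memory and storage", "tech"),
]

if __name__ == "__main__":
    model = NaiveBayes()
    model.train(TRAINING)
    print(model.classify("the players scored in the match"))
    print(model.classify("a security update for the phone"))
```

Passing use_bigrams=True adds word pairs as extra features, which is the simplest version of the n-gram features mentioned above. With a toolkit such as MALLET or LingPipe, the tokenization and feature extraction steps would be handled by the library rather than by hand-written helpers like these.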
08-12 12:22