How much text can Weka handle?

Problem description

I have a sentiment analysis task and I need to specify how much data (in my case, text) Weka can handle. I have a corpus of 2500 opinions that are already tagged. I know that it's a small corpus, but my thesis advisor is asking me to argue specifically how much data Weka can handle.

Solution

With Weka, the limits come from the learning algorithm you use and the amount of memory available for training. Most classifiers require the whole dataset to be loaded into memory for training, but there are also options for streaming data. See the Weka page on big data for more information.
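To make the difference between batch and streaming training concrete, here is a minimal sketch in plain Python. It is not Weka's API. It is a toy multinomial Naive Bayes whose `update` method folds in one labelled document at a time, which is the same general idea behind Weka's updateable classifiers such as NaiveBayesUpdateable. The class name `StreamingNaiveBayes`, the generator `labelled_stream`, and the four sample opinions are all made up for illustration.

```python
import math
from collections import Counter, defaultdict


class StreamingNaiveBayes:
    """Toy multinomial Naive Bayes that learns one document at a time."""

    def __init__(self):
        self.doc_counts = Counter()               # documents seen per label
        self.word_counts = defaultdict(Counter)   # word frequencies per label
        self.vocabulary = set()                   # every word seen so far

    def update(self, text, label):
        """Fold a single labelled document into the counts, then forget it."""
        words = text.lower().split()
        self.doc_counts[label] += 1
        self.word_counts[label].update(words)
        self.vocabulary.update(words)

    def predict(self, text):
        """Return the label with the highest log-probability (Laplace smoothing)."""
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)
            for word in words:
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocabulary)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def labelled_stream():
    """Hypothetical stand-in for reading opinions lazily from disk."""
    yield "great phone love the screen", "pos"
    yield "terrible battery awful support", "neg"
    yield "love the camera great value", "pos"
    yield "awful screen terrible value", "neg"


model = StreamingNaiveBayes()
for text, label in labelled_stream():
    model.update(text, label)   # only one document is in hand at a time

print(model.predict("great camera"))
print(model.predict("terrible battery"))
```

The model keeps only word counts, so memory grows with the vocabulary rather than with the number of documents. A batch learner, by contrast, needs every training instance in memory at once.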

For a dataset as small as yours, you will not have any problem. With any big data issue, though, you hit a point where you can no longer just script it on a single machine. Weka is no different, and there are ways of making it work once you get there. To my knowledge, there is no hard limit on the amount of data you can handle, given enough hardware resources, time, and ingenuity.
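If you need a number to put in a thesis, a back-of-the-envelope estimate like the following may help. It is only a rough sketch. It assumes each attribute value costs 8 bytes (Weka holds attribute values as doubles), assumes a sparse row costs one 4-byte index plus one 8-byte value per non-zero entry, and ignores per-object and JVM overhead. The vocabulary size and the average number of distinct words per opinion are hypothetical figures, not measurements from your corpus.

```python
BYTES_PER_DOUBLE = 8   # one attribute value stored as a double
BYTES_PER_INDEX = 4    # assumed cost of one int index in a sparse row


def dense_bytes(n_docs, n_attributes):
    """Rough size of a dense document-term matrix (every cell stored)."""
    return n_docs * n_attributes * BYTES_PER_DOUBLE


def sparse_bytes(n_docs, avg_nonzero_per_doc):
    """Rough size when only non-zero cells are stored as (index, value) pairs."""
    return n_docs * avg_nonzero_per_doc * (BYTES_PER_DOUBLE + BYTES_PER_INDEX)


n_docs = 2500              # corpus size from the question
vocabulary_size = 10_000   # hypothetical number of word attributes
avg_distinct_words = 50    # hypothetical distinct words per opinion

print(f"dense:  {dense_bytes(n_docs, vocabulary_size) / 1e6:.1f} MB")
print(f"sparse: {sparse_bytes(n_docs, avg_distinct_words) / 1e6:.1f} MB")
```

Under these assumptions the dense form comes to about 200 MB and the sparse form to about 1.5 MB, both well within what a single machine can hold. Plugging in a larger `n_docs` shows roughly where the available heap would become the limiting factor.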
