Problem description
I am going to use nltk.tokenize.word_tokenize on a cluster where my account is limited by a strict space quota. At home, I downloaded all the nltk resources via nltk.download(), but, as I found out, they take up ~2.5 GB.
This seems a bit overkill to me. Could you suggest the minimal (or nearly minimal) dependencies for nltk.tokenize.word_tokenize? So far I've seen nltk.download('punkt'), but I am not sure whether it is sufficient or how large it is. What exactly should I run to make it work?
Recommended answer
You are right. You need the Punkt Tokenizer Models. They take about 13 MB, and nltk.download('punkt') should do the trick.
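
A minimal sketch of what to run, assuming a standard NLTK installation (the sample sentence is illustrative):

    import nltk

    # Fetch only the Punkt tokenizer models (~13 MB) instead of the full
    # ~2.5 GB resource set; this is all word_tokenize needs.
    nltk.download('punkt')

    from nltk.tokenize import word_tokenize

    print(word_tokenize("This sentence only needs the punkt models."))
    # ['This', 'sentence', 'only', 'needs', 'the', 'punkt', 'models', '.']

On a quota-limited account you can also pass download_dir to nltk.download to control where the data lands, and make that location discoverable via the NLTK_DATA environment variable or by appending the directory to nltk.data.path.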