
How do I limit the number of CPUs spaCy uses?

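I want to extract parts of speech and named entities from a large set of sentences. Because of RAM limitations, I first use Python NLTK to split my documents into sentences. I then iterate over the sentences and use nlp.pipe() to do the extraction. However, when I do this, spaCy takes over my whole computer; it uses every available CPU. That is a problem because my computer is shared. How can I limit the number of CPUs spaCy uses? Here is my code so far: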

# require
from nltk import sent_tokenize
import spacy

# initialize
file = './walden.txt'
nlp  = spacy.load( 'en' )

# slurp up the given file
with open( file, 'r' ) as handle :
  text = handle.read()

# parse the text into sentences, and process each one
sentences = sent_tokenize( text )
for sentence in nlp.pipe( sentences, n_threads=1 ) :

  # process each token
  for token in sentence : print( "\t".join( [ token.text, token.lemma_, token.tag_ ] ) )

# done
quit()

Best answer

My answer to my own question is: "Call the operating system and use a Linux utility named taskset."

# require
import os

# limit ourselves to a few processors only
os.system( "taskset -pc 0-1 %d > /dev/null" % os.getpid() )

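This particular solution restricts the running process to cores #1 and #2 (CPUs 0 and 1 in taskset's zero-based numbering). It was good enough for me.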
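
For anyone who would rather not shell out, the same restriction can probably be applied from inside Python with os.sched_setaffinity, which is only available on Linux and a few other Unix platforms. The following is a minimal sketch under that assumption; limit_cpus is a hypothetical helper name, not part of spaCy or the original answer.

```python
import os


def limit_cpus(max_cpus):
    """Pin the calling process to at most `max_cpus` of its allowed CPUs.

    Hypothetical helper; relies on os.sched_setaffinity (Linux only).
    """
    if max_cpus < 1:
        raise ValueError("max_cpus must be at least 1")
    # CPUs this process may currently run on (pid 0 means "the caller").
    allowed = sorted(os.sched_getaffinity(0))
    # Keep only the lowest-numbered CPUs, up to the requested count.
    chosen = set(allowed[:max_cpus])
    os.sched_setaffinity(0, chosen)
    # Report the affinity mask that is now in effect.
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    print(limit_cpus(2))
```

It is safest to call this early, before loading spaCy, because threads started afterwards inherit the affinity mask while threads that are already running keep theirs.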

A similar question about spaCy, "How do I limit the number of CPUs spaCy uses?", can be found on Stack Overflow: https://stackoverflow.com/questions/50537146/
