python - TypeError: tokenize_lemmatize_spacy() missing 1 required positional argument: 'first_arg'

I get this error when trying to make my sklearn vectorizer return different values depending on a mode argument:

$ python features.py lemmatize_PS Gold.xlsx


Traceback (most recent call last):
  File "features.py", line 351, in <module>
    fea1, fea0, fe, fi, fo, fu, fo, fea2 = build_feature_matrix_S(sentences)
  File "features.py", line 100, in build_feature_matrix_S
    vectorizer_freq = CountVectorizer(tokenizer = tokenize_lemmatize_spacy(first_arg), binary=False, min_df=5, ngram_range=gram)
TypeError: tokenize_lemmatize_spacy() missing 1 required positional argument: 'first_arg'

The tokenize_lemmatize_spacy function looks like this:

def tokenize_lemmatize_spacy(texte, first_arg):
    texte = normalize(texte)
    mytokens = nlp(texte)

    if first_arg == 'lemmatize_only':
        # Lemmatizing each token and converting each token into lowercase
        mytokens = [word.lemma_.lower().strip() for word in mytokens if word.pos_ != "SPACE"]

    elif first_arg == 'lemmatize_PS':
        # Lemmatizing each token and converting each token into lowercase
        mytokens = [word.lemma_.lower().strip() for word in mytokens if word.pos_ != "SPACE" ]
        # Removing stop words and punctuations
        mytokens = [word for word in mytokens if word not in stopwords and word not in punctuations]

    else:
        raise Exception("Wrong feature type entered. Possible values:  'lemmatize_only', 'lemmatize_PS'")
    return mytokens

I tested tokenize_lemmatize_spacy on its own and it works, but now that I am trying to use it in another script I get the error shown above.

Best answer

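CountVectorizer's tokenizer parameter expects a callable, but you are calling the function and trying to pass its return value instead. The call itself fails because tokenize_lemmatize_spacy(first_arg) supplies only one of the two required arguments: the value is bound to texte, and first_arg is left missing.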
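To see why the traceback reports a missing argument rather than a bad tokenizer, here is a minimal sketch. It uses a hypothetical stand-in function, tokenize, which only splits on whitespace (no spaCy or sklearn involved). Calling it with a single positional argument raises the same kind of TypeError, and that happens before CountVectorizer is ever constructed.

```python
def tokenize(texte, first_arg):
    # Hypothetical stand-in for tokenize_lemmatize_spacy: whitespace split only.
    if first_arg != 'lemmatize_only':
        raise ValueError("unsupported mode in this sketch")
    return texte.lower().split()


error_message = None
try:
    # Mirrors tokenizer=tokenize_lemmatize_spacy(first_arg):
    # the single argument is bound to texte, so first_arg is missing.
    tokenize('lemmatize_only')
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```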

Use partial to bind first_arg ahead of time, so that what you pass is a callable taking only the text:

from functools import partial
vectorizer_freq = CountVectorizer(tokenizer=partial(tokenize_lemmatize_spacy,
                                                    first_arg='lemmatize_PS'),
                                  binary=False, min_df=5, ngram_range=gram)
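
As a rough illustration of what partial does here, the sketch below uses a simplified stand-in tokenizer. The names tokenize_stub, STOPWORDS and make_tokenizer are hypothetical, the stop-word set is made up, and the stub splits on whitespace instead of calling spaCy. It shows that the partial object is a callable taking just the text, which is what the tokenizer parameter wants, and gives a small closure as an equivalent alternative.

```python
from functools import partial

STOPWORDS = {"the", "a"}  # made-up stop-word set, for illustration only


def tokenize_stub(texte, first_arg):
    # Hypothetical stand-in for tokenize_lemmatize_spacy (no spaCy here).
    tokens = texte.lower().split()
    if first_arg == 'lemmatize_only':
        return tokens
    elif first_arg == 'lemmatize_PS':
        return [t for t in tokens if t not in STOPWORDS]
    raise ValueError("Possible values: 'lemmatize_only', 'lemmatize_PS'")


# partial fixes first_arg and returns a new callable that takes only texte.
tokenizer = partial(tokenize_stub, first_arg='lemmatize_PS')


def make_tokenizer(mode):
    # Alternative: a closure that captures the mode.
    def _tokenize(texte):
        return tokenize_stub(texte, mode)
    return _tokenize


print(tokenizer("The cat chased a mouse"))
print(make_tokenizer('lemmatize_only')("The cat chased a mouse"))
```

Either callable could be passed as tokenizer= to CountVectorizer. One practical difference: a partial of a module-level function can be pickled, whereas a nested closure or a lambda generally cannot, which matters if the fitted vectorizer is later saved with pickle.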