I get the following error when trying to pass different modes to the tokenizer of my sklearn vectorizer:

>>>  python features.py lemmatize_PS Gold.xlsx



Traceback (most recent call last):
  File "features.py", line 351, in <module>
    fea1, fea0, fe, fi, fo, fu, fo, fea2 = build_feature_matrix_S(sentences)
  File "features.py", line 100, in build_feature_matrix_S
    vectorizer_freq = CountVectorizer(tokenizer = tokenize_lemmatize_spacy(first_arg), binary=False, min_df=5, ngram_range=gram)
TypeError: tokenize_lemmatize_spacy() missing 1 required positional argument: 'first_arg'




The tokenize_lemmatize_spacy function looks like this:

def tokenize_lemmatize_spacy(texte, first_arg):
    texte = normalize(texte)
    mytokens = nlp(texte)

    if first_arg == 'lemmatize_only':
        # Lemmatizing each token and converting each token into lowercase
        mytokens = [word.lemma_.lower().strip() for word in mytokens if word.pos_ != "SPACE"]

    elif first_arg == 'lemmatize_PS':
        # Lemmatizing each token and converting each token into lowercase
        mytokens = [word.lemma_.lower().strip() for word in mytokens if word.pos_ != "SPACE" ]
        # Removing stop words and punctuations
        mytokens = [word for word in mytokens if word not in stopwords and word not in punctuations]

    else:
        raise Exception("Wrong feature type entered. Possible values:  'lemmatize_only', 'lemmatize_PS'")
    return mytokens


I tested tokenize_lemmatize_spacy on its own and it works, but when I try to use it in another script, I get the error shown above.

Best answer

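The tokenizer parameter of CountVectorizer expects a callable, but you are calling the function yourself and trying to pass its return value. The call also receives only one argument, which is bound to texte, so Python raises the TypeError about the missing first_arg before the vectorizer is even constructed.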
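To make the distinction concrete, here is a minimal sketch that reproduces the same kind of error without spaCy or sklearn. `toy_tokenize` is a hypothetical stand-in with the same two-parameter shape as the function in the question; it is not the asker's code.

```python
def toy_tokenize(texte, first_arg):
    # Hypothetical stand-in for tokenize_lemmatize_spacy: splits on whitespace.
    if first_arg != 'lemmatize_only':
        raise ValueError("unsupported mode")
    return texte.lower().split()

# A single positional argument is bound to `texte`, so `first_arg` is
# missing and Python raises TypeError at the call site.
try:
    toy_tokenize('lemmatize_only')
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

The function is invoked immediately on that line, so anything wrapping the call never gets a chance to receive a callable.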

Use functools.partial to bind first_arg ahead of time, which gives you a callable that takes only the text:

from functools import partial
vectorizer_freq = CountVectorizer(tokenizer=partial(tokenize_lemmatize_spacy,
                                                    first_arg='lemmatize_PS'),
                                  binary=False, min_df=5, ngram_range=gram)
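As a rough illustration of what partial produces, the sketch below uses a hypothetical stand-in, `toy_tokenize`, in place of tokenize_lemmatize_spacy (so it runs without spaCy), along with a tiny made-up stop-word set. It also shows a lambda as one possible alternative.

```python
from functools import partial

def toy_tokenize(texte, first_arg):
    # Hypothetical stand-in for tokenize_lemmatize_spacy (no spaCy needed).
    tokens = texte.lower().split()
    if first_arg == 'lemmatize_only':
        return tokens
    if first_arg == 'lemmatize_PS':
        # Drop a tiny illustrative stop-word set (an assumption for this demo).
        return [t for t in tokens if t not in {'the', 'a'}]
    raise ValueError("unsupported mode")

# partial fixes first_arg and returns a new callable that takes only the text.
tokenizer = partial(toy_tokenize, first_arg='lemmatize_PS')

# One alternative: a lambda that closes over the mode.
tokenizer_lambda = lambda texte: toy_tokenize(texte, 'lemmatize_PS')

tokens = tokenizer("The cat chased a mouse")
print(tokens)
```

Either callable could be passed as tokenizer= to CountVectorizer. A partial over a module-level function can be pickled, whereas a lambda cannot, which may matter if the fitted vectorizer is saved to disk. In the question's script the mode appears to be held in a variable named first_arg, so first_arg=first_arg would presumably avoid hard-coding 'lemmatize_PS'.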

09-08 04:56