Classifying nouns as abstract or concrete with NLTK or a similar method



How can I categorize a list of nouns into abstract or concrete in Python?


"Have a seat in that chair."


In the sentence above, "chair" is a noun and can be categorized as concrete.



I would suggest training a classifier using pretrained word vectors.


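You need two libraries: spaCy for tokenizing text and extracting word vectors, and scikit-learn for machine learning (NumPy is used to stack the vectors). Use a spaCy model that ships with word vectors, such as en_core_web_md, and install it first with "python -m spacy download en_core_web_md":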

import spacy
from sklearn.linear_model import LogisticRegression
import numpy as np

# the "md" model ships with 300-dimensional static word vectors
nlp = spacy.load("en_core_web_md")


Distinguishing concrete and abstract nouns is a simple task, so you can train a model with very few examples:

classes = ['concrete', 'abstract']
# todo: add more examples
train_set = [
    ['apple', 'owl', 'house'],          # index 0: concrete
    ['agony', 'knowledge', 'process'],  # index 1: abstract
]
# one vector per training word: the vector of its single token
X = np.stack([nlp(w)[0].vector for part in train_set for w in part])
# the label of each word is the index of the sub-list it came from
y = [label for label, part in enumerate(train_set) for _ in part]
classifier = LogisticRegression(C=0.1, class_weight='balanced').fit(X, y)
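
To show what the training step does without downloading a spaCy model, here is a minimal pure-Python sketch of the same idea. It reuses the train_set layout and the enumerate labelling trick, but it swaps the real 300-dimensional vectors for hypothetical 2-D toy vectors (TOY_VECTORS) and replaces scikit-learn with a hand-written logistic regression (fit_logistic and predict are hypothetical names). It illustrates the technique and is not equivalent to scikit-learn's solver; class_weight='balanced' is omitted because both classes have the same number of examples here, so balanced weighting would leave every weight at 1.

```python
import math

# Hypothetical 2-D "word vectors" for illustration only; real
# en_core_web_md vectors have 300 dimensions and different values.
TOY_VECTORS = {
    'apple': [0.9, 0.1], 'owl': [0.8, 0.2], 'house': [0.85, 0.05],
    'agony': [0.1, 0.9], 'knowledge': [0.2, 0.8], 'process': [0.15, 0.85],
}

classes = ['concrete', 'abstract']
train_set = [
    ['apple', 'owl', 'house'],
    ['agony', 'knowledge', 'process'],
]

# Same layout as the spaCy version: the label is the index of the sub-list.
X = [TOY_VECTORS[w] for part in train_set for w in part]
y = [label for label, part in enumerate(train_set) for _ in part]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=500, l2=0.01):
    """Binary logistic regression trained with batch gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # gradient of the log-loss w.r.t. the score
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        # The L2 penalty plays a role similar to scikit-learn's C
        # (a smaller C means a stronger penalty).
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Return the class index (0 or 1) for vector x."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if sigmoid(score) >= 0.5 else 0


w, b = fit_logistic(X, y)
# a vector that lies near the concrete training examples
print(classes[predict(w, b, [0.7, 0.3])])
```

With real spaCy vectors the principle is the same: the classifier learns a direction in vector space that separates the two sub-lists, and new nouns are labelled by which side of that boundary their vectors fall on.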


When you have a trained model, you can apply it to any text:

for token in nlp("Have a seat in that chair with comfort and drink some juice to soothe your thirst."):
    # classify only the tokens the tagger marks as nouns
    if token.pos_ == 'NOUN':
        print(token, classes[classifier.predict([token.vector])[0]])


# seat concrete
# chair concrete
# comfort abstract
# juice concrete
# thirst abstract


You can improve the model by applying it to different nouns, spotting the errors, and adding the misclassified nouns to the training set under the correct label.
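
The feedback loop described above can be sketched as a small helper that moves a misclassified word into the sub-list of its correct class. The helper add_correction and the example corrections are hypothetical, for illustration only; after updating train_set you would rebuild X and y and call fit again exactly as in the training code.

```python
classes = ['concrete', 'abstract']
train_set = [
    ['apple', 'owl', 'house'],
    ['agony', 'knowledge', 'process'],
]


def add_correction(train_set, classes, word, correct_class):
    """Put `word` under `correct_class` in train_set (mutates it in place).

    The word is first removed from every other class's sub-list, so it never
    appears under two labels, and it is not added twice to the target list.
    """
    target = classes.index(correct_class)
    for label, part in enumerate(train_set):
        if label != target:
            part[:] = [w for w in part if w != word]
    if word not in train_set[target]:
        train_set[target].append(word)


# Hypothetical review results: nouns a model got wrong, paired with the
# label a human assigned after inspecting the predictions.
corrections = {'table': 'concrete', 'freedom': 'abstract'}
for word, label in corrections.items():
    add_correction(train_set, classes, word, label)

print(train_set)
# Next step (not shown): rebuild X and y from train_set and refit.
```

Keeping the word lists as the single source of truth means the labels never drift out of sync with the examples: y is always derived from the position of each word, just as in the original training code.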

