The code below splits the sentence into individual tokens. The output is:

 "Cloud"  "computing"  "is"  "benefiting"  "major"  "manufacturing"  "companies"


import en_core_web_sm
nlp = en_core_web_sm.load()

doc = nlp("Cloud computing is benefiting major manufacturing companies")
for token in doc:
    print(token.text)

Ideally, I want "Cloud computing" to be read together, since it is technically a single term.

Basically, I am looking for bigrams. Does spaCy have a feature that produces bigrams or trigrams?

Best answer

spaCy can detect noun chunks. To have your noun phrases parsed as single entities, do the following:

  • Detect the noun chunks
    https://spacy.io/usage/linguistic-features#noun-chunks
  • Merge the noun chunks
  • Run the dependency parse again; it will now parse "Cloud computing" as a single entity.
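The steps above can be sketched for spaCy 3.x roughly as follows. This is a minimal sketch, not a definitive implementation: it assumes spaCy 3.x, where merging goes through doc.retokenize(), and the helper name merge_chunks is made up for illustration. The usage lines in the comments further assume the en_core_web_sm model is installed.

```python
import spacy
from spacy.tokens import Doc


def merge_chunks(doc: Doc) -> Doc:
    """Merge every noun chunk of `doc` into one token (modifies `doc` in place).

    `merge_chunks` is a hypothetical helper name, not part of spaCy.
    """
    # Collect the chunks first, so we do not iterate over the doc while changing it.
    chunks = list(doc.noun_chunks)
    with doc.retokenize() as retokenizer:
        for chunk in chunks:
            # Carry the root token's tag, lemma and entity type over to the merged token.
            attrs = {
                "TAG": chunk.root.tag,
                "LEMMA": chunk.root.lemma,
                "ENT_TYPE": chunk.root.ent_type,
            }
            retokenizer.merge(chunk, attrs=attrs)
    return doc


# Usage, assuming the en_core_web_sm model is installed:
#   nlp = spacy.load("en_core_web_sm")
#   doc = merge_chunks(nlp("Cloud computing is benefiting major manufacturing companies"))
#   print([(token.text, token.pos_) for token in doc])
```

spaCy 3.x also ships a built-in pipeline component for this, added with nlp.add_pipe("merge_noun_chunks"), which may be simpler when every document should be processed this way.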

    >>> # spaCy 2.x session: the 'en' shortcut and Span.merge were removed in spaCy 3
    >>> import spacy
    >>> nlp = spacy.load('en')
    >>> doc = nlp("Cloud computing is benefiting major manufacturing companies")
    >>> list(doc.noun_chunks)
    [Cloud computing, major manufacturing companies]
    >>> for noun_phrase in list(doc.noun_chunks):
    ...     noun_phrase.merge(noun_phrase.root.tag_, noun_phrase.root.lemma_, noun_phrase.root.ent_type_)
    ...
    Cloud computing
    major manufacturing companies
    >>> [(token.text, token.pos_) for token in doc]
    [('Cloud computing', 'NOUN'), ('is', 'VERB'), ('benefiting', 'VERB'), ('major manufacturing companies', 'NOUN')]
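Noun chunks only cover noun phrases. If plain bigrams or trigrams over every token are wanted, a sliding window over the token texts may be enough. The sketch below is pure Python and does not call spaCy; the helper name ngrams is hypothetical, and the token list stands in for [token.text for token in doc].

```python
from typing import Iterable, List, Tuple


def ngrams(tokens: Iterable[str], n: int) -> List[Tuple[str, ...]]:
    """Return every run of n consecutive tokens as a tuple.

    `ngrams` is a hypothetical helper name, not a spaCy function.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    items = list(tokens)
    # Zip n staggered views of the list; zip stops at the shortest view,
    # so only complete windows are produced.
    return list(zip(*(items[i:] for i in range(n))))


# Stand-in for [token.text for token in doc]
tokens = ["Cloud", "computing", "is", "benefiting", "major", "manufacturing", "companies"]

bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)

print(bigrams[0])     # ('Cloud', 'computing')
print(len(bigrams))   # 6
print(len(trigrams))  # 5
```

Unlike the noun-chunk approach, this treats every adjacent pair as a bigram, so pairs such as ('is', 'benefiting') also appear and would need filtering if only meaningful terms are wanted.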
    

Source: the Stack Overflow question "python-3.x - Is there a bigram or trigram feature in spaCy?": https://stackoverflow.com/questions/53598243/
