Why tokenize text in Lucene?

Question

I'm a beginner with Lucene. Here's my source:

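import org.apache.lucene.document.FieldType;
import org.apache.lucene.document.StringField;

// Not tokenized: the whole value is indexed as a single term.
FieldType ft = new FieldType(StringField.TYPE_STORED);
ft.setTokenized(false);
ft.setStored(true);
// Tokenized: the analyzer splits the value into separate terms.
FieldType ftNA = new FieldType(StringField.TYPE_STORED);
ftNA.setTokenized(true);
ftNA.setStored(true);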

Why tokenize in Lucene? For example, take the String value "my name is lee":

Tokenized: "my", "name", "is", "lee"
Not tokenized: "my name is lee"

I don't understand why indexing relies on tokenization. What is the difference between a tokenized and a non-tokenized field?

Recommended answer

Lucene works by finding tokens in documents which satisfy constraints expressed by a query.

If you search for lee, for instance, the query will find all documents that contain the token lee. If the field isn't tokenized, the whole value is indexed as a single token, so you can only match the full string my name is lee; a search for just lee finds nothing.
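
To make the difference concrete, here is a minimal sketch in plain Java. It does not use Lucene's API: the class TermDemo and the method indexedTerms are hypothetical names, and splitting on whitespace is only a stand-in for a real analyzer.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Toy model only: this is not Lucene's API.
final class TermDemo {
    // Returns the terms that would be indexed for one field value.
    static Set<String> indexedTerms(String value, boolean tokenized) {
        Set<String> terms = new LinkedHashSet<>();
        if (tokenized) {
            // Stand-in for an analyzer: split the value on whitespace.
            terms.addAll(Arrays.asList(value.trim().split("\\s+")));
        } else {
            // The whole value becomes a single term.
            terms.add(value);
        }
        return terms;
    }

    public static void main(String[] args) {
        String value = "my name is lee";
        // Tokenized: "lee" is one of the indexed terms.
        System.out.println(indexedTerms(value, true).contains("lee"));   // true
        // Not tokenized: only the full string is a term.
        System.out.println(indexedTerms(value, false).contains("lee"));  // false
        System.out.println(indexedTerms(value, false).contains(value));  // true
    }
}
```

A term lookup can only succeed for strings that were stored as terms, which is why the untokenized field cannot be found by searching for lee alone.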

Now suppose you search for "is lee". This is a PhraseQuery, which means it'll match the token is followed by the token lee.
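
The idea of "one token followed by another" can be sketched in plain Java as a check for consecutive token positions. This is an illustration only, not how Lucene's PhraseQuery is implemented; PhraseDemo and matchesPhrase are hypothetical names, and whitespace splitting again stands in for an analyzer.

```java
// Toy model only: this is not Lucene's PhraseQuery implementation.
final class PhraseDemo {
    // True if the phrase tokens occur at consecutive positions in the field value.
    static boolean matchesPhrase(String fieldValue, String phrase) {
        String[] tokens = fieldValue.trim().split("\\s+");
        String[] wanted = phrase.trim().split("\\s+");
        for (int start = 0; start + wanted.length <= tokens.length; start++) {
            boolean allMatch = true;
            for (int i = 0; i < wanted.length; i++) {
                if (!tokens[start + i].equals(wanted[i])) {
                    allMatch = false;
                    break;
                }
            }
            if (allMatch) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(matchesPhrase("my name is lee", "is lee")); // true
        System.out.println(matchesPhrase("my name is lee", "lee is")); // false
    }
}
```

Both tokens appear in the field either way, but only the first phrase has them in the right order and next to each other.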

Tokenization is needed because Lucene works with an inverted index, i.e., it maps tokens to the documents that contain them.
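
As a rough picture of what an inverted index looks like, the following plain-Java sketch maps each token to the set of document ids that contain it. It is a simplification under stated assumptions: InvertedIndexDemo and build are hypothetical names, document ids are just list positions, and the three sample documents are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model only: a real Lucene index stores far more than this.
final class InvertedIndexDemo {
    // Maps each whitespace-separated token to the ids of the documents containing it.
    static Map<String, Set<Integer>> build(List<String> docs) {
        Map<String, Set<Integer>> index = new TreeMap<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            for (String token : docs.get(docId).trim().split("\\s+")) {
                index.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        // Hypothetical sample documents; ids are their list positions.
        List<String> docs = Arrays.asList("my name is lee", "lee is here", "my name is kim");
        Map<String, Set<Integer>> index = build(docs);
        System.out.println(index.get("lee")); // [0, 1]
        System.out.println(index.get("kim")); // [2]
    }
}
```

Searching for a token is then a single map lookup rather than a scan of every document, which is the reason the values have to be split into tokens at indexing time.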
