Custom full-text index stored in Cassandra



Problem description

I'm using Cassandra as my database and I need full-text search capability. I'm aware of Apache Solr, Apache Cassandra, and DSE Search.

However, I do not want to use costly, proprietary software (DSE Search). The reason I do not want to use Apache Solr is that I don't want to deal with HA, sharding, and redundancy for it. Cassandra already handles HA, sharding, and redundancy well, so I would like to store my full-text index in the existing Cassandra DB.

So what I'm looking for is something that will break down a string into its indexable parts. For example:

String input = "I like apples and bannanas.";

String tokens[] = makeTokenIndex(input);

//tokens = {"I","like","apples","bannanas","apple","bannana"}

Obviously I could split strings on spaces and use the words as index keys, but I'm looking for something smarter than that: something that can handle plurals, find the root of a word (stemming), and so on.
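To make the requirement concrete, here is a minimal sketch of what makeTokenIndex could look like using only the Java standard library. The stop-word list and the naiveStem helper are assumptions made up for illustration; naiveStem only strips simple English plural endings and is not a real stemmer. A production setup would more likely rely on an existing analyzer such as Lucene's EnglishAnalyzer. Note also that this sketch lowercases tokens, so "I" becomes "i", and returns a Set rather than the String[] shown above.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

class SimpleTokenIndexer {
    // Hypothetical stop-word list, chosen only for this example.
    private static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "or", "the");

    // Crude plural stripping; an assumption standing in for a real stemmer.
    static String naiveStem(String word) {
        if (word.length() > 4 && word.endsWith("ies")) {
            return word.substring(0, word.length() - 3) + "y";
        }
        if (word.length() > 3 && word.endsWith("s") && !word.endsWith("ss")) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }

    // Lowercases the input, splits on anything that is not a letter or digit,
    // drops stop words, and keeps both the surface form and its naive stem.
    static Set<String> makeTokenIndex(String input) {
        Set<String> tokens = new LinkedHashSet<>();
        for (String raw : input.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (raw.isEmpty() || STOP_WORDS.contains(raw)) {
                continue;
            }
            tokens.add(raw);
            tokens.add(naiveStem(raw));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(makeTokenIndex("I like apples and bannanas."));
    }
}
```

Keeping both the original word and its stem means a query for either "apple" or "apples" can hit the same document, at the cost of a somewhat larger index.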

Would modifying Apache Lucene be the best solution for this, or is there another option?

Solution

I've not used Cassandra, but I think you're talking about using a Cassandra implementation of Lucene's Directory interface. Lucene uses a Directory to interact with a storage mechanism.

I found a couple of projects that might help:

I can't speak with experience about either one, though.
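Independent of the Directory approach, the data shape the question asks about can be sketched as a plain inverted index: each term maps to the set of documents containing it. The class below is an in-memory stand-in, not Cassandra client code. It assumes a hypothetical table layout in which the term would be the partition key and the document ID a clustering column; the class and method names are invented for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class InvertedIndexSketch {
    // term -> sorted document IDs. In the assumed Cassandra layout, each map key
    // would be a partition and each set element a clustering row.
    private final Map<String, Set<String>> postings = new TreeMap<>();

    // Records that every given token occurs in the document docId.
    void index(String docId, Iterable<String> tokens) {
        for (String token : tokens) {
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
        }
    }

    // Returns the IDs of documents containing the term, or an empty set.
    Set<String> lookup(String term) {
        return postings.getOrDefault(term, Set.of());
    }

    public static void main(String[] args) {
        InvertedIndexSketch idx = new InvertedIndexSketch();
        idx.index("doc1", List.of("i", "like", "apples", "apple"));
        idx.index("doc2", List.of("apple", "pie"));
        System.out.println(idx.lookup("apple"));
    }
}
```

A single-term lookup in this layout touches one partition, which is the access pattern Cassandra handles well. Multi-term queries would need the intersection of several lookups done on the client side, and very common terms could produce very large partitions.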


08-29 16:26