

I have a set of objects of type Idea:

public class Idea
{
    public string Title { get; set; }
    public string Body { get; set; }
}

I want to search these objects by substring. For example, when I have an object with the title "idea", I want it to be found when I enter any substring of "idea": i, id, ide, idea, d, de, dea, e, ea, a.


I'm using RavenDB for storing data. The search query looks like this:

var ideas = session
              .Query<IdeaByBodyOrTitle.IdeaSearchResult, IdeaByBodyOrTitle>()
              .Where(x => x.Query.Contains(query));


public class IdeaByBodyOrTitle : AbstractIndexCreationTask<Idea, IdeaByBodyOrTitle.IdeaSearchResult>
{
    public class IdeaSearchResult
    {
        public string Query;
        public Idea Idea;
    }

    public IdeaByBodyOrTitle()
    {
        Map = ideas => from idea in ideas
                       select new
                           {
                               Query = new object[] { idea.Title.SplitSubstrings().Concat(idea.Body.SplitSubstrings()).Distinct().ToArray() },
                           };
        Indexes.Add(x => x.Query, FieldIndexing.Analyzed);
    }
}


SplitSubstrings() is an extension method which returns all distinct substrings of a given string:

static class StringExtensions
{
    public static string[] SplitSubstrings(this string s)
    {
        s = s ?? string.Empty;
        List<string> substrings = new List<string>();
        for (int i = 0; i < s.Length; i++)
        {
            for (int j = 1; j <= s.Length - i; j++)
            {
                substrings.Add(s.Substring(i, j));
            }
        }
        return substrings.Select(x => x.Trim()).Where(x => !string.IsNullOrEmpty(x)).Distinct().ToArray();
    }
}

This is not working, in particular because RavenDB does not recognize the SplitSubstrings() method, since it lives in my custom assembly. How can I make this work, that is, how can I force RavenDB to recognize this method? Besides that, is my approach appropriate for this kind of search (searching by substring)?



Basically, I want to build an auto-complete feature on top of this search, so it needs to be fast.


Btw: I'm using RavenDB - Build #960



You can perform a substring search across multiple fields using the following approach:


public class IdeaByBodyOrTitle : AbstractIndexCreationTask<Idea>
{
    public IdeaByBodyOrTitle()
    {
        // Index both fields so that each one gets its own terms.
        Map = ideas => from idea in ideas
                       select new
                           {
                               idea.Title,
                               idea.Body
                           };
    }
}



By default, if you check the index terms inside the Raven client, they look like this:

Title                    Body
------------------       -----------------
"the idea title 1"       "the idea body 1"
"the idea title 2"       "the idea body 2"


Based on that, a wildcard query can be constructed:

var wildquery = string.Format("*{0}*", QueryParser.Escape(query));

This is then used with the .In and .Where constructs (with an OR operator between the fields):

var ideas = session.Query<Idea, IdeaByBodyOrTitle>()
                   .Where(x => x.Title.In(wildquery) || x.Body.In(wildquery));



Alternatively, you can use a pure Lucene query:

var ideas = session.Advanced.LuceneQuery<Idea, IdeaByBodyOrTitle>()
                   .Where("(Title:" + wildquery + " OR Body:" + wildquery + ")");


You can also use the .Search expression, but you have to construct your index differently if you want to search across multiple fields:

public class IdeaByBodyOrTitle : AbstractIndexCreationTask<Idea, IdeaByBodyOrTitle.IdeaSearchResult>
{
    public class IdeaSearchResult
    {
        public string Query;
        public Idea Idea;
    }

    public IdeaByBodyOrTitle()
    {
        Map = ideas => from idea in ideas
                       select new
                           {
                               Query = new object[] { idea.Title, idea.Body },
                           };
    }
}

var result = session.Query<IdeaByBodyOrTitle.IdeaSearchResult, IdeaByBodyOrTitle>()
                    .Search(x => x.Query, wildquery,
                            escapeQueryOptions: EscapeQueryOptions.AllowAllWildcards,
                            options: SearchOptions.And);


Also keep in mind that *term* is rather expensive, especially the leading wildcard. In this post you can find more info about it. It says that a leading wildcard forces Lucene to do a full scan of the index and thus can drastically slow down query performance. Lucene internally stores its indexes (actually the terms of string fields) sorted alphabetically and "reads" from left to right. That is why a search with a trailing wildcard is fast and one with a leading wildcard is slow.
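
To make the sorted-terms argument concrete, here is a minimal sketch in plain C#. It is not Lucene's actual data structure: SortedTermsSketch, PrefixSearch and SubstringSearch are hypothetical names, and the "index" is just an array of terms that is assumed to be sorted ordinally. It only illustrates why a trailing wildcard can jump straight to the matching range, while a leading wildcard has to look at every term.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a sorted term dictionary (an assumption, not Lucene's real structure).
public static class SortedTermsSketch
{
    // Trailing wildcard ("idea*"): binary-search to the first term >= prefix,
    // then read forward only while the terms still start with the prefix.
    // Assumes sortedTerms is sorted with StringComparer.Ordinal.
    public static List<string> PrefixSearch(string[] sortedTerms, string prefix)
    {
        int lo = 0, hi = sortedTerms.Length;
        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (string.CompareOrdinal(sortedTerms[mid], prefix) < 0) lo = mid + 1;
            else hi = mid;
        }

        var matches = new List<string>();
        for (int i = lo;
             i < sortedTerms.Length && sortedTerms[i].StartsWith(prefix, StringComparison.Ordinal);
             i++)
        {
            matches.Add(sortedTerms[i]);
        }
        return matches;
    }

    // Leading wildcard ("*dea*"): the sort order does not help,
    // so every single term has to be inspected.
    public static List<string> SubstringSearch(string[] sortedTerms, string fragment)
    {
        return sortedTerms.Where(t => t.Contains(fragment)).ToList();
    }
}
```

With terms sorted via Array.Sort(terms, StringComparer.Ordinal), PrefixSearch touches roughly log(n) terms plus the matches, whereas SubstringSearch always touches all n terms. That is the difference you feel on a large index.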

So alternatively x.Title.StartsWith("something") can be used, but this obviously does not search across all substrings. If you need a fast search, you can change the Index option for the fields you want to search on to Analyzed, but again it will not search across all substrings.
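
As a rough illustration of why an analyzed field still does not cover every substring, the sketch below approximates analysis as "lowercase and split on whitespace". That is an assumption for demonstration only (a real analyzer may also drop stop words or strip punctuation), and AnalyzedFieldSketch, Tokenize and MatchesPrefix are hypothetical names, not RavenDB or Lucene APIs.

```csharp
using System;
using System.Linq;

// Hypothetical approximation of an analyzed field (an assumption, not the real analyzer).
public static class AnalyzedFieldSketch
{
    // Lowercase the value and split it into whitespace-separated terms.
    public static string[] Tokenize(string value)
    {
        return (value ?? string.Empty)
            .ToLowerInvariant()
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
    }

    // A prefix query matches when any single term starts with the prefix.
    public static bool MatchesPrefix(string fieldValue, string prefix)
    {
        string p = (prefix ?? string.Empty).ToLowerInvariant();
        return Tokenize(fieldValue).Any(t => t.StartsWith(p, StringComparison.Ordinal));
    }
}
```

Under these assumptions, "The Idea Title 1" is found for "ide" or "tit" because a term starts with them, but not for "dea", which only occurs in the middle of a term.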

If there is a space inside the substring query, please check this question for a possible solution. For making suggestions, check http://architects.dzone.com/articles/how-do-suggestions-ravendb.


08-03 23:12