Solr - Total Term Frequency by Group

Question

Let's say I have the following set of grouped websites crawled and indexed in Solr (latest):

[
  {
    "id": "1",
    "domain": "http://www.category1website1.com",
    "domainGroup": "Group 1"
  },
  {
    "id": "2",
    "domain": "http://www.category1website2.com",
    "domainGroup": "Group 1"
  },
  {
    "id": "3",
    "domain": "http://www.category2website1.com",
    "domainGroup": "Group 2"
  }
]

I'm looking for a result set that will give me the term frequency in each individual domain but also the aggregated term frequency of that search term (aggregated by domainGroup).

Researching this has led me to 3 possibilities:

  1. Can be done with Facet Pivot
  2. Can be done with Facet + Term Frequency Vectors
  3. Cannot be done

Options 1 and 2 are different, and I'm not sure which would work for me, or, worse, whether neither would ("option" 3).

Sorry if that wasn't clear. I'm trying to retrieve the frequency of the "search term", but I also need the frequency aggregated by the domainGroup field. In other words, I need to search ALL domains for "search term" in one request and retrieve its frequency not only in each individual domain (the default) but also aggregated per domainGroup, i.e. the sum of the term frequencies over all domains in the same domainGroup.
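
To make the requested result concrete, here is a minimal client-side sketch of the two numbers being asked for: the term frequency per domain and its sum per domainGroup. It does not query Solr. The `content` field and its text are hypothetical, added only so there is something to count, and the naive whitespace tokenization stands in for Solr's analysis chain, so real counts may differ.

```python
from collections import defaultdict

# Documents from the question. The "content" field and its text are
# hypothetical, added only so there is something to count.
DOCS = [
    {"id": "1", "domain": "http://www.category1website1.com",
     "domainGroup": "Group 1", "content": "solr is fast and solr scales"},
    {"id": "2", "domain": "http://www.category1website2.com",
     "domainGroup": "Group 1", "content": "learning solr"},
    {"id": "3", "domain": "http://www.category2website1.com",
     "domainGroup": "Group 2", "content": "nothing relevant here"},
]


def term_frequency(text: str, term: str) -> int:
    """Count occurrences of a single-token term after naive whitespace tokenization.

    Solr's analyzers (stemming, stop words, etc.) may produce different counts.
    """
    target = term.lower()
    return sum(1 for token in text.lower().split() if token == target)


def frequencies(docs, term):
    """Return (per-domain term frequency, per-domainGroup summed term frequency)."""
    per_domain = {}
    per_group = defaultdict(int)
    for doc in docs:
        tf = term_frequency(doc["content"], term)
        per_domain[doc["domain"]] = tf
        per_group[doc["domainGroup"]] += tf
    return per_domain, dict(per_group)


per_domain, per_group = frequencies(DOCS, "solr")
print(per_domain)  # term frequency of "solr" in each domain
print(per_group)   # {'Group 1': 3, 'Group 2': 0}
```

The per-group figure is just the sum of the per-domain figures for domains sharing a domainGroup, which is the aggregation the question asks Solr to return in a single request.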

Answer

I think facets combined with term frequency vectors are what you need. Try a query like this:

http://something/solr/select/?qt=tvrh&q=query:http://www.category2website1.com&tv.fl=query&tv.all=true&f.id.tv.tf=true&facet.field=domainGroup&facet=true&facet.limit=-1&facet.mincount=1
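
If you build this request from code, a minimal sketch with Python's standard `urllib.parse` is below. It only constructs the URL (no network call), using the same parameters as the query above; `something` is the placeholder host from the answer. `urlencode` percent-encodes the `:` and `/` characters inside the `q` value. Depending on the query parser, the colon inside that value may also need escaping or quoting for Solr itself; the sketch keeps the value exactly as written above. Note that `facet.field=domainGroup` returns document counts per group, not term occurrences, so the per-document term-vector frequencies still have to be summed per domainGroup on the client.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholder host and path, as in the answer's query.
BASE_URL = "http://something/solr/select/"

# Same parameters as the answer's query, in the same order.
PARAMS = [
    ("qt", "tvrh"),
    ("q", "query:http://www.category2website1.com"),
    ("tv.fl", "query"),
    ("tv.all", "true"),
    ("f.id.tv.tf", "true"),
    ("facet.field", "domainGroup"),
    ("facet", "true"),
    ("facet.limit", "-1"),
    ("facet.mincount", "1"),
]

# urlencode percent-encodes ':' and '/' inside the q value.
url = BASE_URL + "?" + urlencode(PARAMS)
print(url)

# Round-trip check: decoding the query string gives back the original values.
decoded = parse_qs(urlsplit(url).query)
print(decoded["q"])  # ['query:http://www.category2website1.com']
```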

08-04 10:15