问题描述
我需要构建一个部分倒排索引
.类似的东西:
I need to build a partial Inverted Index
. Something like:
l = {{x, {h, a, b, c}}, {y, {c, d, e}}}
iI[l]
(*
-> {{a, {x}}, {b, {x}}, {c, {x, y}}, {d, {y}}, {e, {y}}, {h, {x}}}
*)
我认为它的作用很明显.在输入列表中,{x, y ...} 是唯一的,而 {a, b, c, ..} 不是.输出应该按 #[[1]]
排序.
I think it is pretty clear what it does. In the input list, the {x, y ...} are unique, while the {a, b, c, ..} are not. The output ought to be ordered by #[[1]]
.
现在,我正在这样做:
iI[list_List] := {#, list[[Position[list, #][[All, 1]]]][[All, 1]]} & /@
(Union@Flatten@Last@Transpose@list)
但是对于这么简单的任务来说看起来太复杂了,看起来太慢了,我应该可以应付Legion.
But it looks too convoluted for such an easy task, seems too slow, and I should be able to cope with Legion.
用于比较结果的试驾:
words = DictionaryLookup[];
abWords = DictionaryLookup["ab" ~~ ___];
l = {#, RandomChoice[abWords, RandomInteger[{1, 30}]]} & /@ words[[1 ;; 3000]];
First@Timing@iI[l]
(*
-> 5.312
*)
那么,有什么加速的想法吗?
So, any ideas for an speedup?
推荐答案
似乎是Reap
-Sow
的经典任务(由于@Heike 在最终版本中有所改进):
Seems a classic task for Reap
-Sow
(improvement in the final version due to @Heike):
iI[list_] := Sort[Reap[Sow @@@ list, _, List][[2]]]
那么,
iI[l]
{{a, {x}}, {b, {x}}, {c, {x, y}}, {d, {y}}, {e, {y}}, {h, {x}}}
和
In[22]:=
words=DictionaryLookup[];
abWords=DictionaryLookup["ab"~~___];
l={#,RandomChoice[abWords,RandomInteger[{1,30}]]}&/@words[[1;;3000]];
First@Timing@iI[l]
Out[25]= 0.047
编辑
这是一个具有类似(稍差)性能的替代版本:
Here is an alternative version with a similar (slightly worse) performance:
iIAlt[list_] :=
Sort@Transpose[{#[[All, 1, 2]], #[[All, All, 1]]}] &@
GatherBy[Flatten[Thread /@ list, 1], Last];
有趣的是,Reap
- Sow
在这里给出了一种比基于结构操作的解决方案更快的解决方案.
It is interesting that Reap
- Sow
here gives an even slightly faster solution than the one based on structural operations.
编辑 2
仅作说明 - 对于那些喜欢基于规则的解决方案的人,这里有一个基于 Dispatch
和 ReplaceList
的组合:
Just for an illustration - for those who prefer rule-based solutions, here is one based on a combination of Dispatch
and ReplaceList
:
iIAlt1[list_] :=
With[{disp = Dispatch@Flatten[Thread[Rule[#2, #]] & @@@ list]},
Map[{#, ReplaceList[#, disp]} &, Union @@ list[[All, 2]]]]
不过,它比其他两个慢了大约 2-3 倍.
It is about 2-3 times slower than the other two, though.
这篇关于时间高效的部分倒排索引构建的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!