Question
I'm trying to understand the Snowball stemming algorithm. The algorithm uses two regions, R1 and R2, which are defined as follows:
R1 is the region after the first non-vowel following a vowel, or is the null region at the end of the word if there is no such non-vowel.
R2 is the region after the first non-vowel following a vowel in R1, or is the null region at the end of the word if there is no such non-vowel.
http://snowball.tartarus.org/texts/r1r2.html
Examples:
b   e   a   u   t   i   f   u   l
                  |<------------->|   R1
                          |<----->|   R2

b   e   a   u   t   y
                  |<->|     R1
                    ->|<-   R2

a   n   i   m   a   d   v   e   r   s   i   o   n
      |<----------------------------------------->|   R1
              |<--------------------------------->|   R2

s   p   r   i   n   k   l   e   d
                  |<------------->|     R1
                                ->|<-   R2

e   u   c   h   a   r   i   s   t
          |<--------------------->|   R1
                      |<--------->|   R2
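The examples above can be reproduced with a minimal Python sketch of the two definitions. The sketch makes two assumptions. First, it uses the English vowel set a, e, i, o, u, y, which is the set the English Snowball stemmer uses. Second, it ignores any language-specific exceptions that a particular stemmer may add. `region_after` and `r1_r2` are hypothetical helper names and are not part of the Snowball distribution.

```python
# Minimal sketch of the Snowball R1/R2 region definitions.
# Assumption: English vowel set a, e, i, o, u, y; language-specific
# exceptions of real stemmers are ignored.
VOWELS = set("aeiouy")


def region_after(word: str, start: int = 0) -> int:
    """Return the index where the region begins: just after the first
    non-vowel that immediately follows a vowel, where that vowel lies at
    or after `start`. Returns len(word), the null region, if none exists."""
    for i in range(start + 1, len(word)):
        if word[i] not in VOWELS and word[i - 1] in VOWELS:
            return i + 1
    return len(word)


def r1_r2(word: str) -> tuple[str, str]:
    """Return (R1, R2) as substrings of `word`."""
    r1 = region_after(word)         # scan the whole word for R1
    r2 = region_after(word, r1)     # scan again, but only inside R1
    return word[r1:], word[r2:]


for w in ["beautiful", "beauty", "animadversion", "sprinkled", "eucharist"]:
    r1, r2 = r1_r2(w)
    print(f"{w:15} R1={r1!r:15} R2={r2!r}")
```

Run as-is, it prints iful/ul, y/(null), imadversion/adversion, kled/(null) and harist/ist. These match the diagrams. R2 is simply the same scan repeated, starting from the beginning of R1.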
My question is: why are "kled" in sprinkled and "harist" in eucharist defined as R1? I thought the correct results would be "inkled" and "arist".
Recommended answer
You should read the definition again. It says "the first non-vowel following a vowel", not "the first non-vowel followed by a vowel".
In sprinkled, the first non-vowel following a vowel is n, so the region after it is kled.
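The same goes for eucharist: the first non-vowel following a vowel is c, so the region after it is harist. Reading the definition as "the first non-vowel followed by a vowel" picks different letters: r in sprinkled and h in eucharist. That reading produces the "inkled" and "arist" you expected.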
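The difference between the two readings can be made concrete with a small Python sketch. Like the R1 definition, it assumes the English vowel set a, e, i, o, u, y. Both function names are hypothetical and exist only for this comparison.

```python
# Contrast the two readings of "the first non-vowel ... a vowel".
# Assumption: English vowel set a, e, i, o, u, y; function names are
# illustrative only, not part of any Snowball API.
VOWELS = set("aeiouy")


def after_nonvowel_following_vowel(word: str) -> str:
    """Correct reading: first non-vowel whose *previous* letter is a vowel."""
    for i in range(1, len(word)):
        if word[i] not in VOWELS and word[i - 1] in VOWELS:
            return word[i + 1:]
    return ""  # null region


def after_nonvowel_followed_by_vowel(word: str) -> str:
    """Misreading: first non-vowel whose *next* letter is a vowel."""
    for i in range(len(word) - 1):
        if word[i] not in VOWELS and word[i + 1] in VOWELS:
            return word[i + 1:]
    return ""  # null region


for w in ["sprinkled", "eucharist"]:
    print(w, after_nonvowel_following_vowel(w), after_nonvowel_followed_by_vowel(w))
# sprinkled kled inkled
# eucharist harist arist
```

The only change between the two functions is which neighbour of the non-vowel must be a vowel: the previous letter (`word[i - 1]`) or the next one (`word[i + 1]`).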