Question
I'm reading this tutorial on context-free grammars in Prolog, and at the bottom of the page they mention implementing a context-free grammar in Prolog using difference lists, with the following code block included:
s(X,Z):- np(X,Y), vp(Y,Z).
np(X,Z):- det(X,Y), n(Y,Z).
vp(X,Z):- v(X,Y), np(Y,Z).
vp(X,Z):- v(X,Z).
det([the|W],W).
det([a|W],W).
n([woman|W],W).
n([man|W],W).
v([shoots|W],W).
They mention that the pair of lists
[a,woman,shoots,a,man] [].
represents the sentence "a woman shoots a man", because it says: if I consume all the symbols on the left and leave behind the symbols on the right, then I have the sentence I am interested in. That is, the sentence we are interested in is the difference between the contents of these two lists.
That's all we need to know about difference lists to rewrite our recogniser. If we simply keep in mind the idea of consuming something and leaving something behind, we obtain the following recogniser:
As an explanation, but that just isn't doing anything to clarify this code for me. I understand it's more efficient than using append/3, but as for the notion of consuming things and leaving others behind, it seems so much more confusing than the previous append/3 implementation, and I just can't make heads or tails of it. Furthermore, when I copy and paste that code into a Prolog visualizer to try to understand it, the visualizer reports errors. Could anyone shed some light on this?
Answer
Listing as facts

Let's try to explain this with a counterexample. Let's specify the nouns, verbs, etc. with simple facts:
det(the).
det(a).
n(woman).
n(man).
v(shoots).
Now we can implement a noun phrase np as:
np([X,Y]) :-
    det(X),
    n(Y).
In other words, we say "a noun phrase is a sentence with two words, the first being a det, the second an n". And this will work: if we query np([a,woman]), it will succeed, etc.
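For completeness, here is a self-contained sketch of this facts-based recogniser (just the vocabulary facts above plus np/1) that can be loaded and queried directly:

```prolog
% Vocabulary as plain facts.
det(the).
det(a).
n(woman).
n(man).
v(shoots).

% A noun phrase is a two-word list: a determiner followed by a noun.
np([X,Y]) :-
    det(X),
    n(Y).
```

Querying `np([a,woman]).` succeeds, while `np([a,shoots]).` fails, since shoots is not a noun.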
But now we need to do something more advanced: define the verb phrase. There are two possible verb phrases: the one with a verb and a noun phrase, which was originally defined as:
vp(X,Z):- v(X,Y),np(Y,Z).
We could define this as:
vp([X|Y]) :-
    v(X),
    np(Y).
And the one with only a single verb:
vp(X,Z):- v(X,Z).
This can be converted into:
vp([X]) :-
    v(X).
The guessing problem

The problem, however, is that the two variants have a different number of words: there are verb phrases with one word and with three words. That's not really a problem yet, but now say - I know this is not correct English - there exists a sentence defined as a vp followed by an np, so this would be:
s(X,Z):- vp(X,Y), np(Y,Z).
in the original grammar.
The problem is that if we want to transform this into our new way of representing it, we need to know how many words vp will consume (how many words will be eaten by vp). We cannot know this in advance: since at this point we don't know much about the sentence, we cannot guess whether vp will eat one word or three.
We might of course guess the number of words with:
s([X|Y]) :-
    vp([X]),
    np(Y).
s([X,Y,Z|T]) :-
    vp([X,Y,Z]),
    np(T).
But I hope you can imagine that if you define verb phrases with 1, 3, 5, and 7 words, things will get problematic. Another way to solve this is to leave it to Prolog:
s(S) :-
    append(VP, NP, S),
    vp(VP),
    np(NP).
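To see the guessing in action, here is a self-contained sketch combining the single-list predicates with the append/3-based s/1 (using the toy vp-then-np grammar from this section):

```prolog
% Vocabulary.
det(the).
det(a).
n(woman).
n(man).
v(shoots).

np([X,Y]) :- det(X), n(Y).

vp([X])   :- v(X).
vp([X|Y]) :- v(X), np(Y).

% Let append/3 enumerate every split point between the two phrases;
% vp/1 and np/1 then accept or reject each candidate split.
s(S) :-
    append(VP, NP, S),
    vp(VP),
    np(NP).
```

Querying `s([shoots,the,man,the,woman]).` succeeds, but only after append/3 has proposed and discarded the shorter prefixes first.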
Now Prolog will first guess how to subdivide the sentence into two parts, and then try to match each part. But the problem is that for a sentence with n words, there are n+1 possible split points.
So Prolog will, for instance, first split it as:
VP=[],NP=[shoots,the,man,the,woman]
(remember we swapped the order of the verb phrase and noun phrase). Evidently vp will not be very happy if it gets an empty list, so that split will be rejected easily. But next it tries:
VP=[shoots],NP=[the,man,the,woman]
Now vp is happy with just shoots, but it requires some computational effort to realize that. np, however, is not amused by this long part. So Prolog backtracks again:
VP=[shoots,the],NP=[man,the,woman]
Now vp will complain again that it has been given too many words. Finally Prolog will split it correctly with:
VP=[shoots,the,man],NP=[the,woman]
The point is that it requires a large number of guesses, and for each of these guesses vp and np will require work as well. For a really complicated grammar, vp and np might split the sentence further, resulting in a huge amount of trial and error.
The true reason is that append/3 has no "semantic" clue about how to split the sentence, so it tries all possibilities. One is more interested in an approach where vp itself can provide information about what share of the sentence it really wants.
Furthermore, if you have to split the sentence into three parts, the number of ways to do this grows to O(n^2), and so on. So guessing will not do the trick.
You might also try to generate a random verb phrase first, and then hope it matches:
s(S) :-
    vp(VP),
    append(VP, NP, S),
    np(NP).
But in that case the number of guessed verb phrases blows up exponentially. Of course you can give "hints" etc. to speed up the process, but it will still take some time.
What you want to do is provide the relevant part of the sentence to every predicate, such that a predicate looks like:
predicate(Subsentence,Remaining)
Subsentence is the list of words the predicate starts from. For instance, a noun phrase might be handed [the,woman,shoots,the,man]. Every predicate consumes the words it is interested in: the words up to a certain point. In this case, the noun phrase is only interested in ['the','woman'], because that is a noun phrase. To allow the rest of the parsing to happen, it returns the remaining part [shoots,the,man], in the hope that some other predicate can consume the remainder of the sentence.
For our table of facts, that is easy:
det([the|W],W).
det([a|W],W).
n([woman|W],W).
n([man|W],W).
v([shoots|W],W).
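Putting these facts together with the rules from the question (the original grammar, with np followed by vp) gives a complete, runnable recogniser:

```prolog
% Each word consumes itself from the front of the first list and
% returns the rest of the list as the second argument.
det([the|W], W).
det([a|W], W).
n([woman|W], W).
n([man|W], W).
v([shoots|W], W).

np(X, Z) :- det(X, Y), n(Y, Z).
vp(X, Z) :- v(X, Y), np(Y, Z).
vp(X, Z) :- v(X, Z).
s(X, Z)  :- np(X, Y), vp(Y, Z).
```

Now `s([a,woman,shoots,a,man], []).` succeeds: the full sentence is consumed and nothing ([]) is left behind.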
It thus means that if you take a sentence [the,woman,shoots,...] and ask det/2 whether it starts with a determiner, it will say: "yes, the is a determiner, but the remaining part [woman,shoots,...] is not part of the determiner; please match it with something else".
This matching works because a list is represented as a linked list: [the,woman,shoots,...] is actually represented as [the|[woman|[shoots|...]]] (so each cell points to the next "sublist"). If you match:
[the|[woman|[shoots|...]]]
det([the|W], W)
it will unify [woman|[shoots|...]] with W, resulting in:
det([the|[woman|[shoots|...]]],[woman|[shoots|...]]).
By returning the remaining list, it has thus consumed the the part.
Now if we define the noun phrase as:
np(X,Z):- det(X,Y), n(Y,Z).
and we again call it with [the,woman,shoots,...], the query unifies X with that list. It will first call det, which consumes the, without any need for backtracking. Next, Y is equal to [woman,shoots,...], and now n/2 consumes woman and returns [shoots,...]. This is also the result np returns, and another predicate will have to consume it.
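That chain of bindings can be sketched as a runnable fragment, with the intermediate unifications noted in comments:

```prolog
det([the|W], W).
n([woman|W], W).

np(X, Z) :- det(X, Y), n(Y, Z).

% ?- np([the,woman,shoots,the,man], Rest).
% det/2 binds Y = [woman,shoots,the,man]   (consumes "the")
% n/2   binds Z = [shoots,the,man]         (consumes "woman")
% Rest = [shoots,the,man]
```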
Say you introduce your name as an additional noun:
n([doug,smith|W],W).
(sorry for using lowercase, but otherwise Prolog would see these as variables).
It will simply try to match the first two words with doug and smith, and if that succeeds, try to match the rest of the sentence. So one can form a sentence like [the,doug,smith,shoots,the,woman] (sorry about that; furthermore, in English some noun phrases map to a noun directly, np(X,Y) :- n(X,Y), so the the could be dropped in a more complete English grammar).
Is the guessing completely eliminated? No. It is still possible for consumption to overlap. For instance, you might add:
n([doug,smith|W],W).
n([doug|W],W).
In that case, if you query [the,doug,smith,shoots,the,woman], it will first consume/eat the in det; next it will look for a noun to consume from [doug,smith,...]. There are two candidates. Prolog will first try to eat only doug and match [smith,shoots,...] as an entire verb phrase, but since smith is not a verb, it will backtrack, reconsider eating only one word, and thus decide to eat both doug and smith as a noun instead.
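A self-contained version of this overlapping-noun example (the grammar from above with the two doug facts added) shows that the sentence is still accepted once backtracking settles on the two-word noun:

```prolog
det([the|W], W).
det([a|W], W).
n([woman|W], W).
n([man|W], W).
n([doug|W], W).          % one-word noun, overlaps with the next fact
n([doug,smith|W], W).    % two-word noun
v([shoots|W], W).

np(X, Z) :- det(X, Y), n(Y, Z).
vp(X, Z) :- v(X, Y), np(Y, Z).
vp(X, Z) :- v(X, Z).
s(X, Z)  :- np(X, Y), vp(Y, Z).
```

With this clause order, `s([the,doug,smith,shoots,the,woman], []).` first tries the one-word noun doug, fails to parse [smith,shoots,...] as a verb phrase, backtracks, and then succeeds with the two-word noun.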
The point is that when using append/3, Prolog would have to backtrack as well.
By using difference lists, a predicate can eat an arbitrary number of words and return the remainder, so that the other sentence parts, like the verb phrase, aim to consume that remainder. Furthermore, the list is always fully grounded, so one definitely does not brute-force all kinds of verb phrases first.