问题描述
披露:这是在FsCheck(我维护的F#随机测试框架)中提出的.我有一个解决方案,但我不喜欢它.而且,我不明白这个问题-它只是被规避了.
Disclosure: this came up in FsCheck, an F# random testing framework I maintain. I have a solution, but I do not like it. Moreover, I do not understand the problem - it was merely circumvented.
(monadic,如果我们要使用大词)序列的一个相当标准的实现是:
A fairly standard implementation of (monadic, if we're going to use big words) sequence is:
let sequence l =
let k m m' = gen { let! x = m
let! xs = m'
return (x::xs) }
List.foldBack k l (gen { return [] })
其中gen可以由所选的计算构建器代替.不幸的是,该实现会占用堆栈空间,因此如果列表足够长,最终堆栈就会溢出.问题是:为什么呢?我知道foldBack原则上不是尾递归,但是F#团队的聪明兔子已经在foldBack实现中规避了这一点.计算生成器实现中存在问题吗?
Where gen can be replaced by a computation builder of choice. Unfortunately, that implementation consumes stack space, and so eventually stack overflows if the list is long enough.The question is: why? I know in principle foldBack is not tail recursive, but the clever bunnies of the F# team have circumvented that in the foldBack implementation. Is there a problem in the computation builder implementation?
如果我将实现更改为以下内容,则一切正常:
If I change the implementation to the below, everything is fine:
let sequence l =
let rec go gs acc size r0 =
match gs with
| [] -> List.rev acc
| (Gen g)::gs' ->
let r1,r2 = split r0
let y = g size r1
go gs' (y::acc) size r2
Gen(fun n r -> go l [] n r)
为完整起见,可以在在FsCheck源文件中
推荐答案
基于Tomas的答案,让我们定义两个模块:
Building on Tomas's answer, let's define two modules:
module Kurt =
type Gen<'a> = Gen of (int -> 'a)
let unit x = Gen (fun _ -> x)
let bind k (Gen m) =
Gen (fun n ->
let (Gen m') = k (m n)
m' n)
type GenBuilder() =
member x.Return(v) = unit v
member x.Bind(v,f) = bind f v
let gen = GenBuilder()
module Tomas =
type Gen<'a> = Gen of (int -> ('a -> unit) -> unit)
let unit x = Gen (fun _ f -> f x)
let bind k (Gen m) =
Gen (fun n f ->
m n (fun r ->
let (Gen m') = k r
m' n f))
type GenBuilder() =
member x.Return v = unit v
member x.Bind(v,f) = bind f v
let gen = GenBuilder()
为简化起见,让我们将原始序列函数重写为
To simplify things a bit, let's rewrite your original sequence function as
let rec sequence = function
| [] -> gen { return [] }
| m::ms -> gen {
let! x = m
let! xs = sequence ms
return x::xs }
现在,无论sequence
是用Kurt.gen
还是Tomas.gen
定义的,sequence [for i in 1 .. 100000 -> unit i]
都将运行完成.问题不是sequence
在使用您的定义时导致堆栈溢出,而是从sequence
的调用返回的函数在调用 时导致堆栈溢出.
Now, sequence [for i in 1 .. 100000 -> unit i]
will run to completion regardless of whether sequence
is defined in terms of Kurt.gen
or Tomas.gen
. The issue is not that sequence
causes a stack overflow when using your definitions, it's that the function returned from the call to sequence
causes a stack overflow when it is called.
要了解为什么会这样,让我们根据基本的单子运算扩展sequence
的定义:
To see why this is so, let's expand the definition of sequence
in terms of the underlying monadic operations:
let rec sequence = function
| [] -> unit []
| m::ms ->
bind (fun x -> bind (fun xs -> unit (x::xs)) (sequence ms)) m
将Kurt.unit
和Kurt.bind
值内联并疯狂简化,我们得到了
Inlining the Kurt.unit
and Kurt.bind
values and simplifying like crazy, we get
let rec sequence = function
| [] -> Kurt.Gen(fun _ -> [])
| (Kurt.Gen m)::ms ->
Kurt.Gen(fun n ->
let (Kurt.Gen ms') = sequence ms
(m n)::(ms' n))
现在,希望很清楚为什么调用let (Kurt.Gen f) = sequence [for i in 1 .. 1000000 -> unit i] in f 0
会使堆栈溢出:f
需要一个非尾递归调用来对结果函数进行排序和求值,因此每个递归调用都会有一个堆栈帧.
Now it's hopefully clear why calling let (Kurt.Gen f) = sequence [for i in 1 .. 1000000 -> unit i] in f 0
overflows the stack: f
requires a non-tail-recursive call to sequence and evaluation of the resulting function, so there will be one stack frame for each recursive call.
将Tomas.unit
和Tomas.bind
内联到sequence
的定义中,我们得到以下简化版本:
Inlining Tomas.unit
and Tomas.bind
into the definition of sequence
instead, we get the following simplified version:
let rec sequence = function
| [] -> Tomas.Gen (fun _ f -> f [])
| (Tomas.Gen m)::ms ->
Tomas.Gen(fun n f ->
m n (fun r ->
let (Tomas.Gen ms') = sequence ms
ms' n (fun rs -> f (r::rs))))
对这种变体进行推理很棘手.您可以凭经验验证它不会对某些任意大的输入造成损失(如Tomas在他的回答中所示),并且您可以逐步进行评估以使自己相信这一事实.但是,堆栈消耗取决于传入列表中的Gen
实例,并且 可能会为本身不是尾部递归的输入而炸毁堆栈:
Reasoning about this variant is tricky. You can empirically verify that it won't blow the stack for some arbitrarily large inputs (as Tomas shows in his answer), and you can step through the evaluation to convince yourself of this fact. However, the stack consumption depends on the Gen
instances in the list that's passed in, and it is possible to blow the stack for inputs that aren't themselves tail recursive:
// ok
let (Tomas.Gen f) = sequence [for i in 1 .. 1000000 -> unit i]
f 0 (fun list -> printfn "%i" list.Length)
// not ok...
let (Tomas.Gen f) = sequence [for i in 1 .. 1000000 -> Gen(fun _ f -> f i; printfn "%i" i)]
f 0 (fun list -> printfn "%i" list.Length)
这篇关于为什么此F#序列函数不能尾部递归?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!