Monday, May 26, 2014

\(\mathbf Y\)-Combinator

In an earlier post, I wrote that I would explore the \(\mathbf Y\)-Combinator in more detail. Back then, we used it as a very simple and non-intrusive way of memoizing recursive functions. Here we look at the origins of the \(\mathbf Y\)-Combinator from a more theoretical point of view.

Several models for computability

Humans have an intuitive understanding of what it means for a function to be computable. This understanding, however, cannot be formalized directly, because it comes from intuition rather than rigid rules. In search of a formalization of the term computability, various formalisms were defined: WHILE-programs, partial recursive functions, the untyped lambda calculus, Turing machines, etc. Remarkably, these models all seem to capture our intuition of what computable means, and it has been proven that they compute exactly the same class of functions. The hypothesis that these models capture the intuitive notion of computability is known as the Church-Turing thesis.

Functional languages and the lambda calculus

Typical imperative languages are high-level abstractions of the core operations of a Turing machine, i.e. changing state on an infinite tape. Functional languages, on the other hand, are based on the core operation of the lambda calculus, namely lambda abstraction. In fact, the very first version of LISP was an emulation of the lambda calculus.

For our exploration of the lambda calculus, I chose three languages:

  • Scheme: A dynamically typed LISP dialect. For our purposes, being dynamically typed lets it play the role of an untyped lambda calculus.
  • Haskell: A statically typed, lazy functional programming language. Laziness is relevant because certain expressions terminate only thanks to the lazy evaluation order.
  • F#: A statically typed, strict functional programming language. It makes reasoning about programs particularly convenient, because terms have static types and the evaluation order is very straightforward. We also use it to show what impact laziness can have.

There are two kinds of lambda calculi: typed and untyped. In general, the untyped lambda calculus is strictly more powerful than the typed one. In particular, in the untyped lambda calculus, recursion can be expressed as a particular lambda term \(\mathbf Y = \lambda f.(\lambda x.f (x x))(\lambda x.f (x x))\). This term has the property that for any \(F\), \(\mathbf Y F = F (\mathbf Y F)\): unfolding one step gives \(\mathbf Y F = (\lambda x.F (x x))(\lambda x.F (x x)) = F ((\lambda x.F (x x))(\lambda x.F (x x))) = F (\mathbf Y F)\). In Scheme, this can be translated almost literally, because the language is dynamically typed:

(define Y
    (lambda (F)
        ;; apply F to an eta-expanded self-application, so that the
        ;; recursive call is delayed until an argument is supplied
        ((lambda (procedure)
            (F (lambda (arg) ((procedure procedure) arg))))
         (lambda (procedure)
            (F (lambda (arg) ((procedure procedure) arg)))))))

The corresponding term would not type-check in F# or Haskell. Consider the lambda term fun x -> x x. What would its type be? From the first occurrence, we infer x : 'a -> 'b, and from the second occurrence, we infer x : 'a. Unifying the two would require 'a = 'a -> 'b, a type that contains itself, which the type checker's occurs check rejects: the term cannot be typed.

It turns out that this is not merely a coincidence, but a fundamental property of the typed lambda calculus. By the strong normalization theorem, every well-typed term in the simply typed lambda calculus is strongly normalizing, i.e. every reduction sequence terminates. Hence every program necessarily terminates, and the calculus cannot be Turing complete. However, our modern programming languages like Haskell or F# certainly are Turing complete, and they are typed. What we need to add to the typed lambda calculus to restore Turing completeness is a fixed-point combinator, namely an equivalent of the \(\mathbf Y\)-Combinator.

In F#, we have an explicit keyword for that, rec:

let rec Y f x = f (Y f) x

In Haskell, no keyword is needed: every definition may be recursive, and the compiler figures out which ones actually are. The difference is merely syntactic.

y f = f (y f)

What the \(\mathbf Y\)-Combinator is useful for

It is useful to know that in functional languages recursion can be modelled with the \(\mathbf Y\)-Combinator, because it means we can easily rewrite a recursive function as a non-recursive one, passing the recursive call in as an extra argument instead of recursing explicitly:

let fact fact = function
    | 1 -> 1
    | n -> n * (fact (n-1))
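
To get the actual factorial function back, we tie the knot with the \(\mathbf Y\)-Combinator defined above:

let factorial = Y fact
factorial 5 // evaluates to 120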

Now, since the function is no longer recursive, or rather the recursion is performed extrinsically, we can inject additional behaviour. In an earlier post, I showed that we can write a special version of the \(\mathbf Y\)-Combinator that adds memoization to the whole recursive function. Another possible use is to keep track of the recursion depth in an iterative-deepening algorithm.
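
As a rough illustration, here is a minimal sketch of such a memoizing fixed-point combinator (memoY is just a name chosen here; this is not necessarily the exact version from the earlier post):

open System.Collections.Generic

// A fixed-point combinator that caches results by argument.
// All recursive calls go through g and therefore hit the cache as well.
let memoY (f : ('a -> 'b) -> 'a -> 'b) : 'a -> 'b =
    let cache = Dictionary<'a, 'b>()
    let rec g x =
        match cache.TryGetValue x with
        | true, v -> v
        | _ ->
            let v = f g x
            cache.[x] <- v
            v
    g

With that, let fastFact = memoY fact computes every intermediate result at most once.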

The use cases are not many, but if you happen to run into one, it is good to know that we can combine a (potentially specialized) \(\mathbf Y\)-Combinator with an otherwise recursive function.

Laziness matters

If you try to write the \(\mathbf Y\)-Combinator in F# as

let rec Y f = f (Y f)

then every time you create a function using it, you will get a stack overflow. This is because F# evaluates eagerly: to evaluate \(Y f\), it must first evaluate the argument in \(f (Y f)\), which again requires \(Y f\), and so on without end. With the additional \(\eta\)-expansion (the extra parameter x), the recursive call is hidden under a lambda and only evaluated once the function is applied to an argument.
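
Tracing the eager evaluation makes the infinite regress visible (a sketch, writing \(\Rightarrow\) for one evaluation step):

\(Y f \Rightarrow f (Y f) \Rightarrow f (f (Y f)) \Rightarrow f (f (f (Y f))) \Rightarrow \dots\)

whereas the \(\eta\)-expanded version immediately returns a closure and only unfolds one layer per call.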

Another interesting property of lazy evaluation is that in certain cases an expression terminates under a lazy evaluation strategy where it would not under an eager one. Consider the following example:

let konst x y = x
konst 3 (failwith "oh no")

This call throws, even though the argument y is never used, because F# evaluates the arguments before entering konst. The equivalent code in Haskell goes through without problems:

konst x y = x
konst 3 (error "oh no")

The reason is that Haskell is lazy in its evaluation strategy, so the erroring term is never actually evaluated, because it is never used.

Now this looks like a nice property to have, but it is not without caveats. First of all, lazy evaluation adds unnecessary overhead in cases where we know we will need all the values. It also makes code more difficult to reason about: from just looking at a function, it is not necessarily clear when which parts are evaluated. As a consequence, the runtime behaviour of the code is usually much harder to predict.

F#, on the other hand, follows a more traditional eager evaluation strategy. In fact, F# goes one step further than most languages, in so far as code is evaluated, compiled and type-checked in a top-down, left-to-right fashion, whereas most languages do not require such a dependency ordering.
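
For example (a small sketch), a value can only be referenced after its definition; swapping the two lines below would be a compile error in F#, while languages like C# or Haskell accept either order:

let double x = x * 2                 // must come first
let quadruple x = double (double x)  // may refer to double, which is already in scope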

Yet, sometimes it is useful to have lazy behaviour. For this, F#, being a .NET language, has a reasonable amount of laziness built into the ecosystem. First of all, we have sequences, cached sequences and lazy lists, which roughly correspond to Haskell's lists. These are useful for certain algorithms where it is most natural to start off with an infinite sequence and then filter, fold or search over it.
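
For instance (a small sketch), we can search an infinite sequence of squares, and only as many elements as needed are ever computed:

// An infinite sequence of natural numbers, consumed lazily.
let naturals = Seq.initInfinite id

let firstLargeSquare =
    naturals
    |> Seq.map (fun n -> n * n)
    |> Seq.find (fun sq -> sq > 1000)   // stops as soon as a match is found: 1024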

Apart from that, F# has a lazy keyword, which allows marking any expression for lazy evaluation. So whilst F# is eager by default, we can easily introduce laziness where it helps us.
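
A lazy expression is only evaluated when its value is forced, and the result is then cached (a small sketch):

let expensive = lazy (printfn "computing..."; 42)

printfn "%d" expensive.Value  // prints "computing..." and then 42
printfn "%d" expensive.Value  // prints only 42: the result was cached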

In the previous example, we can fix the code by making the problematic argument lazy:

let konst x y = x
konst 3 (lazy (failwith "no problem here"))

We have seen that laziness and memoization can have very interesting effects on complexity and computability.

This has been a more theoretical post; my next one will be much more hands-on. Stay tuned.
