Collector Functions in Scheme

The Collector paradigm is a functional way of assembling and processing collections of data. As a design pattern it also offers a few benefits, and at the very least provides a unique academic exercise.

I’ve recently been working through the classic book; The Little Schemer, which is a cute and concise introduction to Scheme. The Chapter covering this topic delivered the first real friction, so I took some time to chew through it, understand it, and expand its explanation.

Let’s start by defining a function which takes a list of numbers, and uses the collector paradigm to sort the numbers into two categories, Odd and Even:

(define sep-odds
  (lambda (lat col)
    (cond
      ((null? lat)
        (col '() '()))
      ((= (modulo (car lat) 2) 0)
        (sep-odds (cdr lat) (lambda (odds evens)
                              (col odds (cons (car lat) evens)))))
      (else
        (sep-odds (cdr lat) (lambda (odds evens)
                                (col (cons (car lat) odds) evens)))))))

This function will consume two arguments: a list of numbers (lat) and a collector function (col). Its body consist’s of one cond statement with 3 branches:

BranchDescription
(null? lat)Checks if lat is empty
(= (modulo (car lat) 2) 0)Checks if the first entry in the list is even
elseConcludes first list entry must be odd

Take a note as to how sep-odds recurs: It does not build a list in the simple way of functionally building up a collection. Instead, if sep-odds is to recur it builds a lambda function which will perform a particular task (in this case using cons to prepend the number to one of the lists) and then passes this lambda function to the next call of sep-odds as the col argument.

Wrapping a task in a lambda is a neat trick: It delays execution of the wrapped code until the lambda is called.

Another neat trick is that sep-odds stuffs the previous collector function (bound to col as a function parameter) into the new lambda’s body. I like to visualize this as a sandwich, with previous collectors being stuffed inside new ones. Finally, when one of these lambdas is called, it does its work, and then calls the collector function embedded inside it.

For our sep-odds function, you may notice that we need to provide an initial collector. Let’s get to it.

First Time with Scheme? Please Note: (car x) returns the first element of the list x, and (cdr x) returns list x excluding the first element.

We will define list as our 0th collector. This will be the very centre of the lambda function sandwich:

(define col-0 list)

Now let’s step through what happens when we perform the following function call:

(sep-odds '(1 2 3 4) col-0) ; return value -> ((1 3) (2 4))

(car lat) for the first call of sep-odds is 1, therefore the third condition (else) is matched. This will generate a new lambda with col-0 as col and 1 (car lat) inside it. For clarity, we will label this newly generated lambda function as col-1:

; consumes two lists (odds and evens) and adds 1 to the odds list
(define col-1
  (lambda (odds evens)
    (col-0 (cons 1 odds) evens)))

This conditional branch will then call sep-odds again, with the rest of the list (cdr lat) and the newly-generated col-1 as arguments:

(sep-odds '(2 3 4) col-1)

After this second function call the second conditional branch (= (modulo (car lat) 2) 0) is triggered as 2 is an even number. Same story again: a new lambda is built, but this time col-1 is bound to col:

; consumes two lists (odds and evens) and adds 2 to the evens list
(define col-2
  (lambda (odds evens)
    (col-1 odds (cons 2 evens))))

Now col-2 and the remainder of the list will serve as the arguments for the sep-odds recursive call:

(sep-odds '(3 4) col-2)

We repeat, again with an odd number (3):

; consumes two lists (odds and evens) and adds 3 to the odds list
(define col-3
  (lambda (odds evens)
    (col-2 (cons 3 odds) evens)))

and our next call:

(sep-odds '(4) col-3)

Again, with an even number (4):

; consumes two lists (odds and evens) and adds 4 to the evens list
(define col-4
  (lambda (odds evens)
    (col-3 odds (cons 4 evens))))

Finally, since the input list is empty, the null? conditional is triggered with this final call:

(sep-odds '() col-4)

That null? conditional applies the col function. In this case it is our latest, col-4:

; excerpt:
((null? lat) (col-4 '() '()))

At this moment our sandwich is at its juiciest. Each layer takes an input from the previous one, as the layers are stripped away. We will begin with the outside ‘bun’ of the sandwich and go inwards. Each collector function calls the collector embedded within it:

(define col-4
  (lambda (odds evens)
    (col-3 odds (cons 4 evens))))
;   |____|
;     |
;     |
      (define col-3
        (lambda (odds evens)
          (col-2 (cons 3 odds) evens)))
;         |____|
;           |
;           |
            (define col-2
              (lambda (odds evens)
                (col-1 odds (cons 2 evens))))
;               |____|
;                 |
;                 |
                  (define col-1
                    (lambda (odds evens)
                      (col-0 (cons 1 odds) evens)))

We are lucky we used labels, In reality the lambda generated looks like this:

(lambda (odds evens)
    ((lambda (odds evens)
          ((lambda (odds evens)
                ((lambda (odds evens)
                      (col-0 (cons 1 odds) evens))
                  odds (cons 2 evens)))
            (cons 3 odds) evens))
      odds (cons 4 evens)))

And this is how the function calls look with the computed values of the arguments:

1. (col-4 '() '())
2. (col-3 '() '(4))
3. (col-2 '(3) '(4))
4. (col-1 '(3) '(2 4))
5. (col-0 '(1 3) '(2 4))
6. (list '(1 3) '(2 4))
7. ((1 3) (2 4))

All of these expressions have the same value.

So now the big question: Why? Why complicate things like this? Why not just use multiple lists as arguments to gather the data we want?

Well, Collector functions offer two advantages: Delayed Execution, and Continuations.

To address the first point, in this example, each generated lambda is doing something relatively inexpensive: using cons to prepend a number to a list. But what if it was performing some sort of expensive operation? For example: performing operations on a matrix with massive proportions, or rendering graphics. Perhaps there is hypothetical example where we do not want to do anything computationally expensive until the data input has been completely verified, from start to finish? In that case a collector paradigm would work well. Building the collector works sort of as a type of manifest or queue, of operations to be performed.

Secondly, if you have a function which represents a series of steps, and each step has their own state, you can travel back to these states. In effect, you save a snapshot of that point in time. And in Scheme, since you can pass function along; you can pass these snapshots around. This gives you flexibility to return back to times where data was valid or matched some sort of pattern. One thing that comes to mind is traversing a Tree data structure and if you are exploring the children of a branch, you can pass forward a collector that represents that root, until you reach something you need and feed that data into the collector.