The Scheme language

About This Document
Introduction
The Scheme Idiom
Atoms, Lists, and Expressions
Data Types
The Read-Eval-Print Loop
Quote
Manipulating Lists
Defining Functions
The lambda form
Flow of Control
Recursion
Code and Data
More Flow of Control

About this document

This document was originally written for the benefit of students in my CSC 270 course. Several of the assignments in the course were done in the Scheme language, and others required understanding of how the Scheme language works. Adelphi has an "info" page on the subject, readable with the command info scheme, but most of the (enormous amount of) information in this info-page is intended for people who basically know the language and want to look up some detail they've forgotten. For those who don't basically know the language, this will have to do.

The students for whom I originally wrote these notes had spent a year learning Pascal, and were now required to learn a very different language called Scheme. The natural tendency of such students was to write in Scheme with a strong "Pascal accent": they wrote the same way they would have written in Pascal, but changed the semicolons, commas, and parentheses to make it legal Scheme. As a result, even the students who got these Pascal-programs-in-Scheme to work correctly missed the opportunity to learn the different perspective Scheme brings to programming. My main concern, therefore, was to get students to think in a manner natural to the Scheme language.

The Web page, however, has taken on a life of its own, and is apparently being used as a reference by programming teachers at a number of other schools. (If you are such a teacher, feel free to link to this page, but I would prefer that you link, not copy it so that I can easily make corrections.) Accordingly, I've revised it (with many helpful suggestions, not all of which I've taken, from Shriram Krishnamurthi of Rice University) to reflect its new, broader audience.

Introduction

Scheme is a newer dialect of an old language named Lisp (Lisp and Fortran both date to the late 1950's). But while Fortran was conceived as a way for engineers to easily write programs that used mathematical equations, Lisp was intended as a logical tool. Indeed, the theoretical basis of Lisp and Scheme, "lambda calculus", was invented as a way of describing computation in the 1930's, before there were any actual computers. (For more information on these languages, see the Lisp FAQ and the Scheme FAQ, collections of Frequently Asked Questions from the comp.lang.lisp and comp.lang.scheme newsgroups.) For the rest of this article, I'll refer to "Scheme" rather than "Lisp", since Scheme is the more popular of the two these days.

Using and programming in Scheme requires a very different way of thinking than does using and programming in Pascal, C, C++, or many other "procedural" languages. The most obvious difference is that Scheme is (traditionally) interactive. When working in most procedural languages, you write a program in a text editor, get out of the editor, compile the program, run the program, find some bugs, and go back to the editor to fix them. In Scheme, you type commands one at a time and you see the responses to them immediately; the programming process consists of typing particular commands that have the effect of defining new commands. You can put such definitions into a file with a text editor and then load it as a unit, and you can mix this way of doing things freely with the more usual command-response way. Indeed, one way to load a file is to type a command which has the effect of reading in a specified file just as though you were typing it.

I've used the word "command" in the above paragraph. In fact, a better word would be "expression": Scheme expressions are primarily thought of as having a value (for example, the expression (+ 3 5) has the value 8), and only secondarily as doing something. This is another big difference between Scheme and the procedural languages you've seen before: in Pascal, each statement does something -- it assigns a variable, or reads a number, or calls a procedure -- whereas in Scheme, each expression (like (+ 3 5) has a value, but only a few special ones (like (define x 5)) do anything as a side effect. On the other hand, it means that Scheme has a strong resemblance to the language of algebra.

Many high school and university computer science programs now teach Scheme as a first language, and I've seriously considered this for our program. For a discussion of why this might be a good idea, see the article Mastering the AP CS Curriculum Without Using C++ from the TeachScheme! Project.

The Scheme Idiom

Programmers coming to Scheme from other languages like C, Pascal, C++, Fortran, etc. often solve problems the same way they would have to in those languages, even though there's a simpler, more natural way in Scheme. Following is a list of common programming concepts in these other languages, accompanied by how a Scheme programmer would be more likely to handle the same situation.

Integer-indexed arrays

Scheme has integer-indexed arrays, but unless the problem really calls for integer indexing, it's often more natural and convenient to use variable-length lists. Scheme also provides support for "association lists", which can be thought of as sparse arrays with no restriction on the type of the index.

Numbers

Scheme supports numeric computation better than C, Pascal, or Fortran (see below), but in many non-numeric applications where C/Pascal programmers would use numbers for looping or internal representation, a Scheme programmer is more likely to use symbols or complex data structures, which can be manipulated just as easily.

Procedural sequence

Fortran, C, and Pascal programmers tend to specify the desired behavior of a program in terms of a sequence of actions to be taken in order. Scheme programmers tend instead to think in terms of function applications, and to specify program behavior in terms of algebraic properties.

Looping

Although Scheme provides powerful looping constructs, it's often more natural in Scheme to use recursion. According to the Scheme standard, if a function is written "tail-recursively", it is optimized to be just as efficient as the corresponding for-loop in C or Pascal.

Variables and assignment statements

As mentioned above, a Scheme programmer can write pages of code without ever using an assignment statement. (In Fall, 1999, I taught a full semester of beginning programming in Scheme without using an assignment statement; I was hoping to get to it by the end of the term, but other things like traversing binary and n-ary trees seemed more important.) Instead, the Scheme programmer typically uses function composition, with few intermediate variables, storing all the data in function parameters that never change.

Specialized control constructs

The C language (to pick a well-known example) defines a variety of control constructs, each with its own special syntax:

if
if...else
while
for
do...while
switch...case...default:
Brace-delimited compound statements
Labeled statements
goto
continue
break
return

Most of these constructs are encompassed by the Scheme "cond" and "do" statements. In rare instances a Scheme programmer may use a more specialized construct such as "case" or a powerful one like "call-with-current-continuation" (which has no analogue in C or Pascal), but "cond" and function invocation together constitute the vast majority of Scheme control constructs in practice. Furthermore, where a C programmer is stuck with the twelve constructs above, a Scheme programmer can easily define new control constructs (see below).

Atoms, Lists, Values, and Expressions

The word "expression" technically applies to something that "has a value" or "can be evaluated". One way that Scheme differs from languages like Pascal, C, C++, Java, etc. is that its expressions are themselves legitimate values that can be passed as parameters or held in variables, and the process of "evaluation" is really just converting a complex value like "(+ 3 (* 4 5))" into the simpler "(+ 3 20)" and then into the still simpler "23".

In short, every Scheme expression is a value. However, not every value is an expression, in the sense that one could type it into a Scheme interpreter without getting a syntax error message.

The values on which the Scheme language operates come in a variety of forms, of which the most common are

a number, like 3, 355/113, or 3.14;
a symbol, like y, or +, or add-course;
a string, like "Bloch" or "Software II"; and
a parenthesized list of simpler values, like (+ 3 (* 4 x)), ("Trina" "Ian" "Sherri" "Keith" "Peg"), or (blue red green (light grey)).

You'll run into other kinds of values too, such as functions and function closures; more on these later.

Numbers, symbols, and strings are collectively referred to as atoms, so called because they cannot be broken down into smaller meaningful pieces. Lists, on the other hand, are the most common kind of complex values, i.e. values made up of smaller pieces which would have meaning by themselves. One of Scheme's strong points is the ease with which it allows the programmer to manipulate lists. Lists are typed in, and printed out, in a standard form: a pair of parentheses around a sequence of list elements, each separated by at least one space from the previous and next list elements.

Practically, a symbol looks like a variable name: it consists of a sequence of one or more letters, numbers, and punctuation marks (most often hyphens and underscores) without intervening space. Philosophically, a symbol is a data object which has no inherent meaning (as opposed to a number, which has various built-in behaviours with respect to arithmetic). Symbols can be used as the names of variables, or they can be used as data in their own right. For example, if you were keeping track of birds you had seen, you might type
(define birdlist '(crow raven pigeon starling sparrow hawk))
This defines a new symbol named "birdlist" as a variable, whose value is a list containing the symbols named "crow", "raven", etc. These symbols are all distinct from one another, but they have no built-in properties; indeed, the only meaning ascribed to any of them is that birdlist is a variable name whose current value is such-and-such a list. (I'll explain the apostrophe before the list of bird names in a moment.)

Data Types

In every C or Pascal program, the first few lines (sometimes the first few hundred lines!) declare various variables to be of various types. In Scheme, there are no such declaration statements, because any variable can hold any type of value. To put it another way, in Scheme, variables don't have type; values have type. That is, a particular variable may hold an integer at one time, a fraction at another, then a floating-point number, a character string, a list, or even a function; at any given time the value of the variable, not the fixed type of the variable, determines what can be done with the variable and how to do it.

These typeless variables present both advantages and disadvantages for the programmer. The obvious advantage is the saving of time figuring out and typing declaration statements. A less obvious advantage is the ability to write a function that works equally well on a variety of different types of data. (For example, in Pascal it is impossible to write a general "matrix multiplication" function that works for all sizes of matrices; if you want to work with 2x2, 3x3, 4x4, and 10x10 matrices, you need to write four versions of the same function. In Scheme, you could easily write a single matrix-multiplication function that looked at its arguments to see what size they were.) On the other hand, Scheme will silently allow you to assign a floating-point number to a variable that you thought contained a list of character-strings, thus creating a potentially hard-to-find bug, while C or Pascal would give you a compiler error message before you had a chance to even run the buggy program.

Numbers

A few words about numbers in Scheme. In most programming languages, there are two kinds of numbers: "integers" (limited in range, e.g. from -2147483648 to 2147483647) and "reals" or "floating-point numbers" (limited in both range and precision, typically in accordance with the IEEE Floating-Point Standard). The number 1/3, for example, is not an integer, and cannot be represented exactly as a binary floating-point number, so 1/3+1/3+1/3 may not be 1. Programmers are taught never to compare real numbers for equality, but only for whether they are "close enough".

Scheme numbers, on the other hand, more closely approximate a mathematician's notion of "number". A Scheme integer is essentially unlimited in range (until you use up all the memory in your computer representing it). A Scheme fraction like 1/3 is stored exactly as a ratio of two (essentially unlimited) integers, so 1/3+1/3+1/3 is 1, as an ordinary equality test will tell you. The language also handles IEEE-standard floating-point numbers, of course, labelling them as "inexact" to distinguish them from integers and fractions, on which arithmetic is exact. Scheme also supports complexes and interval arithmetic, either of which would require defining a new compound data structure in a language like C or Pascal.

The Read-Eval-Print Loop

When you start an interactive Scheme system (the most common kind), you see a prompt which indicates you're talking to a "read-eval-print loop". That is, Scheme is trying to read an expression, and waiting for you to type it. Once it has read an expression, it evaluates this expression and then prints the result. It's the same thing a calculator does: you type in an expression like "3+(4*5)", the calculator computes the value of that expression, and shows you the value.

Note that both the expression you type in and the value the system prints out can be any of the aforementioned kinds: you can type in a symbol and get back a list, type in a list and get back a number, etc.

Each kind of expression is evaluated in a different way:

A number

evaluates to itself, e.g. if you type in 3.14, the printed result is 3.14.

A symbol

is treated as a variable name, and evaluates to the value stored in the variable, e.g. if the variable x has the value (blue red), and you type x, the printed result is (blue red).

A string

evaluates to itself, e.g. if you type in "Bloch", the printed result is "Bloch".

A list

is evaluated under the assumption that its first element is the name of a function, and the rest of the elements are expressions providing arguments to that function. (If the first element is not the name (or some other representation) of a function, you get an error message.) Thus the evaluator will evaluate the first element of the list, see what function it is, then evaluate (perhaps recursively) each of the remaining elements of the list, and finally apply the function to the resulting values.

For example, let's trace what happens when you type in

(+ y 4)

where y is a variable currently holding the value 6.

The reader waits for you to type in an expression with correctly-matched parentheses. As soon as you've done so (and hit return), it constructs a 3-element list whose elements are the symbol named "+", the symbol named "y", and the number 4.
The evaluator evaluates the first element of the list, the symbol named "+". The value of this symbol happens to be a predefined function that adds numbers.
The evaluator then goes on to evaluate each of the other elements of the list. The symbol named "y" evaluates to the number 6, while the number 4 evaluates to the number 4.
The evaluator applies the built-in addition function to the results 6 and 4, producing as result the number 10.
The printer prints the number 10 in decimal.

The above description of evaluating lists is slightly oversimplified. In fact, several function-looking words in Scheme (most notably define and cond) are called "special forms" and treated differently: the evaluator does not necessarily evaluate all the remaining list elements, but handles each special form differently.

For example, let's trace what happens when you type in

(define z (+ y 4))

where y is still a variable holding the value 6.

The reader, as soon as you're done typing, constructs a 3-element list: the first element is the symbol named "define", the second the symbol named "z", and the third is itself a 3-element list of which the first element is the symbol named "+", the second the symbol named "y", and the third the number 4.
The evaluator examines the symbol named "define", recognizes it as a special form, and therefore does not evaluate z and (+ y 4).
The built-in function for "define" creates a new variable bound to its first argument (the symbol named "z").
The built-in function for "define" calls the evaluator explicitly on its second argument (the 3-element list (+ y 4)), which is evaluated as in the previous example to yield the number 10.
The built-in function for "define" assigns the new variable the newly-calculated value, the number 10.
The built-in function for "define" returns its first argument, the symbol named "z".
The printer prints the name of this symbol, "z". (In some implementations of Scheme, "define" may return nothing at all, in which case the printer will print nothing at all.)

Quote or Apostrophe

If everything in an expression is pre-evaluated, one might ask, how can I work with symbols themselves? The simplest answer uses a special input symbol, the apostrophe (which most Schemers pronounce "quote"). When placed before a symbol (or anything else, for that matter), the apostrophe means "don't evaluate this". For example, if you type

'bluejay

the evaluator does not look up the value of the symbol bluejay, but simply returns the symbol bluejay, which is then printed:

bluejay

Note that in some implementations of Scheme, notably the "Beginner" and "Intermediate" modes of Dr. Scheme, the output still has an apostrophe, just like the input:

'bluejay

Similarly, if you type

'(+ 3 4)

the evaluator does not add 3 and 4 and return 7, but rather returns the three-element list it was given:

(+ 3 4)

Under the hood of quote

In fact, the apostrophe is simply a convenient abbreviation for a special form named quote, which simply takes an argument, doesn't evaluate it, and returns it unaltered. Thus when you type

'bluejay

the "read" part of the read-eval-print loop translates it into

(quote bluejay)

. The evaluator then treats quote as a special form, doesn't evaluate the argument bluejay, and returns the symbol bluejay. > Similarly, if you type

'(crow raven bluejay)

the reader treats it as though you had typed

(quote (crow raven bluejay))

and the evaluator returns the three-element list

(crow raven bluejay)

Manipulating Lists

One of Scheme's most useful features is its built-in support for variable-length lists. A list is constructed using the list function:

(list 4 5 6 7)
;Value: (4 5 6 7)

Note that the list function takes as many arguments as you choose to give it, and constructs a list with that many elements.

(list 5 y birdlist '(a b c))
;Value: (5 6 (crow raven pigeon starling sparrow hawk) (a b c))

In this example, some of the elements in the list were variables, and they were replaced by their values. list is not a special form: it evaluates its arguments in a perfectly ordinary way before putting them together into a list. Note also that a list can be an element of another list.

To get individual elements from a list, you can use the built-in functions first, second,...:

(first birdlist)
;Value: crow
(second birdlist)
;Value: raven
(third birdlist)
;Value: pigeon

Only the first few such functions are defined, and in practice most Scheme programmers only use first, second, and perhaps third. The reason is that a list is not an array, and should not be used for the same purposes as an array: if you really want to get at the i-th element of a list, you're better off with a real array. However, lists make it easy to work with "the rest of" a list: Scheme has a built-in function rest that gives you "all but the first element".

(A few old-fashioned Scheme programmers use the older names "car" and "cdr" instead of "first" and "rest". The names "car" and "cdr" are something of an embarrassment: they refer to two machine instructions on a particular model of IBM computer popular in the late 1950's, standing for "contents of the address register" and "contents of the data register". Don't worry about them, but if you hear somebody talking about them, you know it's a Lisp/Scheme old-timer.)

(rest birdlist)
;Value: (raven pigeon starling sparrow hawk)
(first birdlist)
;Value: crow
(first (rest birdlist))
;Value: raven
(second birdlist)
;Value: raven
(first (rest (rest birdlist)))
;Value: pigeon
(third birdlist)
;Value: pigeon
(first (rest (rest (rest birdlist))))
;Value: starling

Note that the names "second", "third", etc. are defined only for convenience: as long as you have "first" and "rest", you can get at any list element you wish.

So "first", "rest", and their relatives allow you to take a list apart. Scheme also provides functions to put a list together. The simplest one is cons, which takes two arguments (an object and a list) and constructs a new list with the object inserted in front of the list:

(cons 'swan birdlist)
;Value: (swan crow raven pigeon starling sparrow hawk)
(cons 4 '(5 6 7))
;Value: (4 5 6 7

Note that cons does not change any of its arguments (indeed, most Scheme functions don't):

birdlist
;Value: (crow raven pigeon starling sparrow hawk)
(cons 'swan birdlist)
;Value: (swan crow raven pigeon starling sparrow hawk)
birdlist
;Value: (crow raven pigeon starling sparrow hawk)

The simplest possible list is called "the empty list" or "the null list".

()
;Value: ()
'()
;Value: ()
(cons 'lonely ())
;Value: (lonely)
(cons 'a (cons 'b ()))
;Value: (a b)

Note that cons-ing something with the empty list () produces, as you would expect, a list starting with the "something", but not containing any other elements. A list with one element is very different from that element itself!

'lonely
;Value: lonely
(list 'lonely)
;Value: (lonely)
(cons 'lonely '())
;Value: (lonely)
(cons birdlist birdlist)
;Value: ((crow raven pigeon starling sparrow hawk) crow raven pigeon
starling sparrow hawk)

The last example illustrates that cons does not care whether its first argument is itself a list; cons treats it as a single object to be inserted into another list. If you want to combine two or more lists, you can use the append function:

(append birdlist birdlist)
;Value: (crow raven pigeon starling sparrow hawk crow raven pigeon
starling sparrow hawk)
birdlist
;Value: (crow raven pigeon starling sparrow hawk)

Note that append, too, does not change its arguments.

Defining Functions

The above techniques will allow you to manipulate numbers, strings, symbols, and lists interactively, but not to do real programming. For that you need to define new functions. Recall that the C library provides certain useful functions like printf and scanf, and you can also define your own C functions, which are then treated the same way as C library functions. In the same way, although Scheme provides many built-in functions like list, +, first, cons, display, etc., you can equally well define your own functions.

Here's a simple example:

(define (add1 x) (+ x 1)))
;Value: add1

This defines a new function named "add1" which takes one parameter, "x", and returns the value of the expression "(+ x 1)". After typing this into Scheme, we can use "add1" freely, e.g.

(add1 4)
;Value: 5
(define y 6)
;Value: y
(add1 y)
;Value: 7

and even use it in other functions that you define:

(define (add3 x) (add1 (add1 (add1 x))))
;Value: add3
(add3 y)
;Value: 9
y
;Value: 6

For another example, suppose we want to build a list with three copies of something:

(define (threecopies x) (list x x x))
;Value: threecopies
(threecopies 4)
;Value: (4 4 4)
(threecopies birdlist)
;Value: ((crow raven pigeon starling sparrow hawk) (crow raven pigeon
starling sparrow hawk) (crow raven pigeon starling sparrow hawk))

The Lambda Form

In C or Pascal, every time you define a function you start by giving it a name. The same happens in the above examples, "add1" and "add3". But in Scheme you can also define a function without a name. For example, if you wanted to add 1 to the number 6, but didn't want to define an "add1" function, you could say

((lambda (x) (+ x 1)) 6)
;Value: 7

This is of course an unrealistic example, since any reasonable Scheme programmer would say

(+ 1 6)

instead. But the function "threecopies" above provides a more realistic example. If you didn't plan to use the operation "make a list with three copies of x" in the future, you might skip defining "threecopies" and just say

((lambda (x) (list x x x)) birdlist)
;Value: ((crow raven pigeon starling sparrow hawk) (crow raven pigeon
starling sparrow hawk) (crow raven pigeon starling sparrow hawk))

((lambda (x) (list x x x)) '(a b c d e f g))
;Value: ((a b c d e f g) (a b c d e f g) (a b c d e f g))

In this case, using an unnamed function would be shorter and less error-prone than writing

(list '(a b c d e f g) '(a b c d e f g) '(a b c d e f g))

So what is this "lambda" thing, anyway? "lambda" is another special form used for building functions. The first argument to "lambda" is an example of how the function is to be called:
(func-name param1 param2 ...)
in parentheses, the function name and a list of its parameters. The second argument is an expression which is the body of the function. In fact, although we defined the "add1" function above with the syntax

(define (add1 x) (+ x 1))

some Scheme programmers prefer an alternate syntax:

(define add1 (lambda (x) (+ x 1)))

This way of stating things is longer and less convenient, but makes clearer what's happening: we define a new symbol named "add1" and give it, as its value, a function which takes an argument which it calls "x" and returns x+1. Thus typing "(add1 5)" is exactly equivalent to typing "((lambda (x) (+ x 1)) 5)", since "add1" has the function "(lambda (x) (+ x 1))" as its value. Similarly, the following two function calls are exactly equivalent:

(threecopies '(a b c d e f g))
;Value: ((a b c d e f g) (a b c d e f g) (a b c d e f g))
((lambda (x) (list x x x)) '(a b c d e f g))
;Value: ((a b c d e f g) (a b c d e f g) (a b c d e f g))

Flow of Control

Every programming language needs some way of making decisions. The simplest "flow-of-control" construct, therefore, is the conditional, or "if-then-else". In Scheme, "if" is a special form with three arguments. It evaluates the first; if the first evaluates to "#t" (Scheme's notation for "true"), it evaluates and returns the second argument, and if not, it evaluates and returns the third argument. For example,

x
;Value: 3
y
;Value: 6
(> y x)
;Value: #t
(< y x)
;Value: #f
(if (> y x) 'raven 'bluebird)
;Value: raven
(if (< y x) 'raven 'bluebird)
;Value: bluebird

In C and Pascal, it's common to write a sequence of statements like "if... then... else if... then... else if... then... else...". The same can be done in Scheme, but the parentheses start nesting very deeply:

(if (< x y) 'x-less-than-y
            (if (= x y) 'x-equals-y
                        (if (= x (+ y 1)) 'x-one-more-than-y
                                          'x-at-least-two-more-than-y)))

A more convenient, and incidentally more general, way to do this is with the "cond" special form. "cond" takes one or more arguments, each of which is a list of expressions. It evaluates the first expression in the first list: if it's non-null, it then evaluates the rest of the expressions in the first list and returns the value of the last one; if it's null, on the other hand, "cond" goes on to the second list and does the same thing: evaluate the first expression, see whether it's null, and so on. In other words,

(cond (#t expr1b)
      (expr2a expr2b)
      ...
      (exprna exprnb))

is equivalent to expr1b, while

(cond (#f expr1b)
      (expr2a expr2b)
      ...
      (exprna exprnb))

is equivalent to

(cond (expr2a expr2b)
      ...
      (exprna exprnb))

So the following "cond" form has the same effect as the above three nested "if"s:

(cond ((< x y) 'x-less-than-y)
      ((= x y) 'x-equals-y)
      ((= x (+ y 1)) 'x-one-more-than-y)
      (#t 'x-at-least-two-more-than-y))

By the way, you can use the word "else" in place of exprna, producing the following slightly more readable equivalent of the above:

(cond ((< x y) 'x-less-than-y)
      ((= x y) 'x-equals-y)
      ((= x (+ y 1)) 'x-one-more-than-y)
      (else 'x-at-least-two-more-than-y))

Recursion

In procedural languages like C and Pascal, some of the most common control constructs are loops, particularly for-loops and while-loops. Most dialects of Lisp, including Scheme, also have these control constructs, but they are less widely used than another approach called recursion.

Recursion, in a programming language, simply means a function calling itself. Of course, if the function f(x) is computed by calling f(x) again, the computer will go into an infinite loop, asking the same question over and over again, so it's important that each recursive call be on a simpler version of the same question. Perhaps the most well-known example is the factorial function,

(define (factorial n)
        (if (<= n 0) 1
            (* n (factorial (- n 1)))))

That is, the factorial of n is 1 if n <= 0, and otherwise it can be computed by first computing the factorial of n-1 and multiplying the result by n. Thus

(factorial 0) returns 1
(factorial 1) returns 1*1 = 1
(factorial 2) returns 2*1 = 2
(factorial 3) returns 3*2 = 6
(factorial 4) returns 4*6 = 24
(factorial 5) returns 5*24 = 120
and so on.

Note that the function is defined in terms of a simple case, in which the answer is known (or easily computed) without recursion, and a recurrence, in which the answer, the factorial of n, could be found easily if only we knew the answer to a simpler question, the factorial of n-1. This is the most common form of recursive functions, both in Scheme and in procedural languages like C and Pascal.

Recursive functions are generally used when the form of data the function uses is itself self-referential, and can be thought of as adding successive layers of complexity to a simple "base case". For example, the most common base case for working with natural numbers is 0; the most common base case for working with lists is the empty list.

I usually use the following step-by-step approach to writing recursive functions.

Specify clearly what you want the function to do or return. (This should be the first step in any programming task!)
Look for how the function behaves with respect to the structure of the input:
- Base case(s): What should the function do when given a "base case" input that can't usefully be broken down into smaller pieces? In these situations the answer is usually already known or can be easily computed without recursion.
- Recurrence(s): If the function is given a complex input, consider what the function would do if the input were broken down into simpler pieces (you should know this from step 1), and figure out how to get the answer to the complex version of the question from the answers to simpler versions of the question.
This step can be used to decide whether the function at hand lends itself to recursive programming: if you don't find any natural recurrences, perhaps this function would be more appropriately expressed in some other way.
Write the code:
- Write a cond with exactly as many clauses as the total number of base cases and recurrences.
- For each case, test whether the input matches this case. (Do the simplest ones first, and the recursive ones last!)
- For each case, translate the recurrence (how to get the answer to the complex question from answers to simpler questions) into Scheme.
Test and debug:
- for each simple case, come up with simple test data that use that case, and make sure the program gives the right answer on those data;
- for each recursive case, come up with simple test data that use that recursive case once, and make sure the program gives the right answer on those data;
- come up with fairly simple test data that use two or three recursive calls, and make sure the program works correctly on them;
- test the program on realistic data for which you know the correct answer;
- test the program on real data for which you don't know the answer, and see whether the answer looks reasonable.

I recommend writing the following functions as exercises in recursive programming. These are not homework assignments, to be turned in for a grade, but I might put a recursion problem on your next exam.

(factorial n) = n * (n-1) * (n-2) * ... * 2 * 1
(power x n) = x^n
(sum list) = the sum of the elements in a list (presumably of numbers)
(length list) = the number of elements in the list
(member x list) = #t if x is an element of the list, #f otherwise
(nth-element n list) = the n-th element of the list, assuming n >= 1
(delete x list) = a new list of the elements in the specified list that aren't eql? to the specified element x
(reverse list) = the same elements in reverse order; I came up with both a fairly simple recursive version, and a slightly more complicated (but more efficient) tail-recursive version
(fibonacci n) = 1 if n=0 or n=1, otherwise the sum of the previous two Fibonacci numbers
(count-atoms list) = the number of atoms in the specified list; note that this is not necessarily the same as (length list), because a list might contain another list of atoms as one of its elements. For example, (length '(a (b c) d)) = 3, but (count-atoms '(a (b c) d)) = 4.
(union list1 list2) = a list containing all the elements that appear in either list1 or list2, without repeats (you may assume that list1 and list2 themselves don't contain any repeats). I suggest using eql? for comparing two objects
(intersection list1 list2) = a list containing all the elements that appear in both list1 and list2, without repeats (see above).

Code and Data

As you've already gathered, Scheme functions and Scheme data look remarkably similar: they're both parenthesized lists of symbols, numbers, strings, and other lists. By contrast, a C or Pascal program is usually written in a text file, then compiled and linked into a binary form with which you can't do anything useful except "run", "rename", "copy", and "delete", while the data used by a C or Pascal program are mostly numbers, characters, strings, and pointers.

One implication of this is that in Scheme, you're not limited to the control constructs the language designer thought would be useful; you can make up your own. In Pascal, for example, the predefined control constructs are

begin statement statement ... end;
if condition then statement;
if condition then statement else statement;
while condition do statement;
repeat statement until condition;
for variable := start-value to end-value do statement;
case expression of label : statement; label : statement; ... end
goto label
Procedure and function calls

These (unless I've left out one or two) are all the control constructs available in Pascal, and you can't invent any more. In Scheme, by contrast, code and data look exactly alike, and you can write a function that operates on code just as easily as a function that operates on data. You can easily (and usefully!) define a "data structure" that contains code in one or more of its fields, and you can easily define routines that take code as parameters, pick it apart and put it back together differently, or return executable code as a value.

This property is necessary if the language is to be cleanly defined or written in itself, which is nice because you only have to learn one language to understand what's going on. For example, perhaps the most important function in Scheme is eval, which takes a Scheme object representing an expression, evaluates the expression, and returns the result. (The Scheme version of eval takes a second "environment" parameter; we'll discuss this later.) So one could write a Scheme read-eval-print loop in a few lines of Scheme:

(define scheme (lambda ()
    (print (eval (read) (the-environment)))
    (scheme)))

Suppose, for another example, you were trying to program a process with twenty steps, each depending on the previous ones, and you didn't know until run-time how many of them to perform. In Pascal:

case n of
	1: step1;
	2: begin step1; step2; end;
	3: begin step1; step2; step3; end;
	...
	20: begin step1; step2; ... step20; end;
	end;

or perhaps

if (n >= 1) then
  begin
  step1;
  if (n >= 2) then
    begin
    step2;
    if (n >= 3) then
      begin
      ...
      end;
    end;
  end;

or maybe even

for i := 1 to n do
  case i of
    1: step1;
    2: step2;
    ...
    20: step20;
    end;

All of these methods are inelegant, repetitive, and prone to typographical errors, and none of them really makes clear that what's going on is "doing the first n steps of a sequence of steps". But worse, what if "doing the first n steps" turned out to be a common situation in your program? You'd need to write all this code over again each time it happened, and knowing that you'd gotten one copy of it right would give you no assurance that you had typed the next copy correctly.

In Scheme, you could abstract the idea of "doing the first n steps" into a function, do-first-n, write, test, and debug it once, and use it every time you needed to do the first n steps of something. An added advantage is that the steps themselves, as well as n, can be decided at run-time, and simply passed in to do-first-n.

(define do-first-n (lambda (n steplist)
    (cond ((null? steplist) ())
          ((= n 0) ())
          (#t (eval (car steplist) (the-environment))
              (do-first-n (- n 1) (cdr steplist))))))

Of course, "do-first-n" is not a terribly common control construct in most programs. But there are many other "cliches" of programming that have to be written out in full every time they're used in a procedural language, but can be abstracted in Scheme. Here are some examples. Each of these happens to have a predefined Scheme function that does it for you, but you could have written any one of them yourself in a few lines, like the above recursive definition of do-first-n:

do some particular thing to each element of a list: done by the Scheme primitives map, map*, append-map, for-each, and their relatives;
find the first element of a list that meets a certain criterion: done by the Scheme primitives list-search-positive, member, there-exists?, and their relatives;
find all elements of a list that meet a certain criterion: done by the Scheme primitives list-transform-positive, delete, for-all?, and their relatives;
apply an arbitrary operation to combine a list into one value: done by the Scheme primitives apply, reduce, and their relatives;
sort a list according to an arbitrary comparison operator: done by the Scheme primitive sort;

In all these examples, there is a parameter ("steps to perform", "particular thing", "certain criterion", "arbitrary comparison operator") which, in a procedural language, would be considered code, and therefore more difficult to handle than data.

If we're going to be passing code all over the place as function arguments, how do we specify it? Well, most code comes in the form of defined functions, so we can just pass the function name, which Scheme treats as a variable whose value is the function itself:

(define add3 (lambda (x) (+ x 3)))
ADD3
(map add3 '(3 5 7 2 9 -5))
(6 8 10 5 12 -2)

But in a procedural language, you're not restricted to passing arguments that happen to live in variables already; you can pass literals (like 3 or "hello there") too. In the same way, you can discuss code in a literal form, without bothering to define a new function. Since add3 was defined above as essentially a variable whose value was the function (lambda (x) (+ x 3)), why not just pass the function itself?

(map (lambda (x) (+ x 3)) '(3 5 7 2 9 -5))
(6 8 10 5 12 -2)

In fact, perhaps a majority of the uses of these "cliche" functions are with lambda-expressions, rather than named functions, as arguments.

More Flow of Control

In principle, any flow-of-control structure you want can be built from "cond" and recursion. But sometimes you really want a for-loop or a while-loop or a switch statement (case for Pascal fans).

First, let me describe something that doesn't sound like a flow-of-control structure; it's really just a way of declaring local variables. Here's an example:

(let ((x 3) (y 6) (z 'bluebird)) (list x y z z y))
;Value: (3 6 bluebird bluebird 6)
(let ((x '(a b c d e f g))) (list x x x))
;Value: ((a b c d e f g) (a b c d e f g) (a b c d e f g))
(let ((n 5000)) (* n n n n)
;Value: 625000000000000
n
;Unbound variable: n

"let" expects its first argument to be a list of lists: each smaller list starts with a variable name, optionally followed by a value to give that variable. "let" declares all the variables temporarily, gives them the corresponding values (if any), evaluates the remaining arguments, returns the last one, and discards the variables it just declared.

Note that "let" is NOT an assignment statement! It is not intended to change the value of an existing variable, but rather to declare a new one; if the new one has the same name as an existing variable, the old variable (with its old value) will still exist after the "let" is over. A good way to think about this is to read the first example above as "let x be 3, y be 6, and z be bluebird for the moment. Now evaluate the expression `(list x y z z y)'. OK, you can forget about x, y, and z now."

There are several variants on "let", with names like "let*", "letrec", "fluid-let", and "named let", but I don't want to go into the differences now; you can read about them in the documentation.

The "do" special form resembles "let" -- it declares and initializes some local variables, then evaluates some expressions -- but it does all this repeatedly until a specified condition is met. "do" expects its first argument to be a list of lists, each consisting of a variable name, an optional initial value, and an optional "update expression". The second argument to "do" should be a list of expressions analogous to those in "cond": if the first one evaluates to non-null, it evaluates the rest and returns the last one, but otherwise, "do" goes on to evaluate its third, fourth, fifth, etc. arguments, then replaces each local variable with the value of its "update expression" and repeats the whole process. For example:

(do ((x 1 (+ x 1)) (sum 0 (+ sum x)))
    ((> x 10) sum)
    (display x) 
    (display " "))
1 2 3 4 5 6 7 8 9 10
;Value: 55

This is equivalent to the Scheme code (do-it 1 0) where "do-it" is defined by

(define (do-it x sum)
    (cond ((> x 10) sum)
          (else (display x)
                (display " ")
                (do-it (+ x 1) (+ sum x)))))

If you prefer another language, it's also roughly equivalent to the C code

{ int x,sum;
  for (x=1, sum=0; x <= 10; sum += x++)
    printf ("%d ",x);
  return sum;
}

You are visitor number to this page since Feb. 4, 1997.

Last modified:

Stephen Bloch / sbloch@adelphi.edu

The Scheme language

Table of Contents

Under the hood of quote