Data analysis

In this step you examine more closely what kinds of information the function operates upon, both its input and its output. You have several options:
Simple types
Numbers, booleans, strings, and symbols are all represented by built-in types in Scheme, so if each of your parameters, and your output, and any temporary value you need in between, is of one of these types, you need no further data analysis.
Several possibilities
If your input is to be a number, and some numbers are treated differently from others, you can define the different kinds of numbers. For example, a bank might pay no interest on accounts with a balance under $500.00, 3% interest on accounts from $500.00 to $999.99, and 4% interest on accounts larger than that; this problem effectively divides the set of numbers into three interestingly-different subsets: less than 500, at least 500 but less than 1000, and at least 1000.

Similarly, if your input or output is a symbol, and it'll always be one of five possibilities (say, 'A, 'B, 'C, 'D, 'F), this determines a data type very different from simply "symbol".

Mixed data
A variation on the above is mixed (or composite) data: one of your parameters, or your result, may be any one of several different types.

For example, a shape might be defined as either a square, a rectangle, or a circle; a list is either empty or a cons.

Scheme doesn't provide a way to explicitly define mixed data (Java, C++, and many other languages do), so we usually just write a comment describing the mixed data type:

;; A shape is either a square, a rectangle, or a circle.
Compound types
If one of your parameters, or your result, or a temporary value in between, is most naturally thought of as combining or comprising several simpler data, then you need a compound data type, or struct (in Scheme, C, and C++; record in Pascal.)

For example, a posn comprises a pair of numbers, while a student comprises a name (which may in turn comprise two strings or symbols), a student ID (represented as a symbol), and a GPA (represented as a number).

These are defined in Scheme using the define-struct function. Write down, in a comment, the names and intended types of each part of your compound data type, then write an appropriate define-struct, then write contracts for all the functions that are automatically defined by it:

;; A name is two symbols, personal and family.
(define-struct name (personal family))
;; make-name: symbol, symbol => name
;; name?: object => boolean
;; name-personal: name => symbol
;; name-family: name => symbol

;; A student is a name ("name"), a symbol ("ID"), and a number ("GPA").
(define-struct student (name ID GPA))
;; make-student: name, symbol, number => student
;; student?: object => boolean
;; student-name: student => name
;; student-ID: student => symbol
;; student-GPA: student => number
Note that the word "name" in the above serves both as the name of a data structure and as the name of a field in another data structure.
Self-referential data
Most self-referential (or "recursive") data structures are built by combining a mixed type with a compound type.

For example, a list is either empty or a cons. A cons comprises an object (the "first" element) and a list (the "rest").
Similarly, in a bottom-up family-tree data structure, a family-tree might be defined as either the symbol 'unknown or a person. A person comprises a symbol ("name"), another symbol ("eye-color"), a number ("birth-year"), and two family-trees ("mother" and "father").

Since self-referential data structures are built from compound and mixed types, they are written and documented by using both the "compound-data" and the "mixed-data" patterns above.


Last modified: Thu Jun 1 16:39:12 EDT 2000
Stephen Bloch / sbloch@adelphi.edu