In this project we'll develop an interpreter for a small piece of the C++ language. C++ is a huge language, so what we can handle in a few weeks will be a teeny tiny fraction, but it'll allow us to see how an interpreter can handle variables, type declarations, assignment, I/O, and sequence.
For every function you write, you must include a contract and a well-chosen set of test cases.
Start by downloading the following three files into the directory where you're working on this program:
read
function does for Scheme.reader.ss
unless you're
curious, but you should look at the first 25 lines (a bunch of comments
describing the output format) and the last 60 (a bunch of test cases
demonstrating how to use the reader). This file provides four functions
that you'll want to use:
read-string-as-expr
which takes in a string, hoping
it represents a C++ expression, and reads it into a nested list.read-input-as-expr
which takes in nothing and reads
from input, hoping it represents a C++ expression, into a nested list.read-string-as-stmt
which takes in a string, hoping
it represents a C++ statement, and reads it into a nested list.read-input-as-stmt
which takes in nothing and reads
from input, hoping it represents a C++ statement, into a nested list.(require "syntax.ss") ; to get the definition of syntax-tree (require "reader.ss") ; to get the definitions of the reading functionsRemember that in the PLAI language, you need to use
test
and test/exn
instead of check-expect
and
check-error
.
Write a function parse
that takes in a nested list
(as described in reader.ss
) and returns a syntax-tree
(as described in syntax.ss
). For example,
(check-expect (parse '(+ 3 (* 4 5))) (binop-expr '+ (num-lit 3) (binop-expr '* (num-lit 4) (num-lit 5)))) (check-expect (parse '(if (> (ident "x") 3) (= (ident "y") (+ (ident "x") 2)) (block (int "z" (+ (ident "x") 2)) (cout<< "Defined z to be ") (cout<< (ident "z"))))) (if-else-stmt (binop-expr '> (ident "x") (num-lit 3)) (assign-expr (ident "y") (binop-expr '+ (ident "x") (num-lit 2))) (block (list (decl-with-init 'int (ident "z") (binop-expr '+ (ident "x") 2)) (output-stmt "Defined z to be ") (output-stmt (ident "z"))))))The parser should catch as many user errors as possible, such as unrecognized types, mismatched types, etc: anything that gets past the parser should at least have a chance of working, depending on the values of variables at run-time. Think of it as like a compiler: it can and should catch things like assigning an int value to a string variable, but it can't catch division by zero because that depends on run-time values.
For this part of the assignment, you may assume there are no variables, and therefore no declaration statements, input statements, or assignment expressions. You may encounter the operators +, -, *, /, %, >, and == in any reasonable combination. You will not see the = (assignment) operator, because there are no variables to assign. Remember that the > and == operators in C++ return ints: 0 for false and 1 for true.
Write a function eval-expr
(similar to
calc
from project 1) that takes in an
abstract syntax tree representing an expression and returns a
Scheme object (or an error message) for the value. For now,
you may assume there are no strings, so everything is type
int
. Here are three sample test cases.
(check-expect (eval-expr (binop-expr '+ (num-lit 3) (binop-expr '* (num-lit 4) (num-lit 5))) 23) (check-expect (eval-expr (parse-expr '(+ 3 (* 4 5)))) 23) (check-expect (eval-expr (parse-expr (read-string-as-expr "3+4*5"))) 23)In the first example, we give
eval-expr
a syntax tree
constructed by hand, and it evaluates the expression to 23. parse-expr
on a nested-list
representation, such as might be produced by the reader; again, it
should evaluate to 23. read-string-as-expr
converts it to nested-list
form, parse-expr
converts it to syntax-tree form, and
eval-expr
evaluates it to 23.
string "3+4*5" and converts it into a syntax tree, which
eval-expr
then evaluates to 23.
Write a function exec-stmt
that takes in an
abstract syntax tree representing a statement and does what it
says. We still have no variables, and therefore no declaration
statements, input statements, or assignment expressions. And we
still have no strings, only ints.
However, you have to handle output statements, expression
statements (just evaluate the expression and throw away the
result), if
and if/else
statements,
and compound statements (aka blocks). Remember that in a C++
if
or if/else
statement, 0 is treated
as false, and anything else is treated as true.
You may need to use the built-in void
function,
which takes no arguments, does nothing, and returns nothing.
You can use (void)
as the answer to a
cond
clause when you don't want to do anything.
If you've gotten to this point successfully, save your
work as version1.ss
, then go on with version 2,
which adds...
Write a function expr-type
that takes in
a syntax-tree representing an expression (which may
include string literals) and returns either 'int
,
'string
to indicate the type of the expression,
or an error message to indicate that it has no type,
e.g.
"hello" > 4because the > operator doesn't work on a string and a number.
Note that the > and == operators should work on two strings, or on two ints, but not on one of each. Likewise, the + operator should work on two strings, or on two ints, but not on one of each.
Modify your eval-expr
and
exec-stmt
functions so they work correctly with
strings as well as numbers. (The > and == operators on
strings should do the appropriate string comparison; the +
operator should do string concatenation as in Java.)
Save your work as version2.ss
, then go on with
version 3, which adds...
Now we'll start adding variables to the language. First, we'll allow variables to appear in an expression. We still won't allow assignment or input statements yet.
Modify your eval-expr
and exec-stmt
functions so they take in an environment as an extra
parameter. The environment (which you need to define) specifies
what variables are defined, with their types and values.
Since we don't have declaration statements yet, you'll have to test
these functions by filling in environments yourself.
Modify your exec-stmt
function so it handles
declaration statements correctly. A declaration statement in C++,
like int x = 7;
,
is analogous to a with
-expression from Project 1;
its body is the rest of the block in which the declaration
statement appears. But as with the "multiple-branch with
"
in Project 1, you can't have the same variable declared twice in
the same block. However, you can have the same variable
declared twice in different blocks, even if one is inside the other:
int x = 7; if (x > 4) { int x = 6; cout << "x is "; cout << x; } else { int x = 5; cout << "x is "; cout << x; } cout << "\nAfter the if/else, x is "; cout << x;is perfectly legal C++, and your program should handle it correctly: it should output "x is 6" on one line, and on the next, "After the if/else, x is 7".
C++ also allows variables to be declared without an initial
value, e.g. int xyz;
;
the actual value is unpredictable (unlike in Java,
which guarantees the "default" value). To simulate this,
your program should initialize such variables to a random
integer (for ints) or a gibberish string (for strings).
Hint: You'll probably need to change the contract for
exec-stmt
so it not only takes in an environment, but also
returns an environment.
Save your work as version3.ss
. Version 4
adds assignment and input to the language.
Modify your program so that variable names and types are still kept in an environment, but variable values are kept in a separate data structure called a store (as described in PLAI). Declaration statements may change the environment (by adding new variable bindings), and they can change the store (by filling in initial values, e.g. "int x = 7;"); assignment expressions can change the store but they cannot change the environment. Eventually (see References, below) it'll be possible to have two variable names refer to the same location in the store.
Remember that in C++ (and in C and Java, for that matter), assignment is an expression, which can appear anywhere that any other expression can appear, e.g.
if (x = 3) { y = (x = 2) * 5; } else { cout << y = x = x + 1; }
Input statements (of the form "cin >> x;") should also change the store but not the environment.
Hint: some functions will need to return both a value and a store, or both an environment and a store. There is a way for a Scheme function to return two things, and if you wish you can look it up and try it; if not, define a struct with two fields, and return an instance of this struct.
Modify your program to handle reference-variables, which behave as follows:
int x = 7; int &y = x; // y refers to the same memory location as x x = 12; cout << y; // should output 12
Note: I modified the reader and syntax files
on Oct. 12 to handle this syntax; to get this working, you'll
need to download the new versions of
reader.ss
and
syntax.ss
.
You may want to put the parser in its own file,
parser.ss
. Remember, the parser should catch as many user
errors as possible:
anything that gets past the parser should at least have a chance of
working, depending on the values of variables at run-time.
As mentioned above, I recommend doing this assignment in discrete versions: version 1 handles expressions and statements, but only one data type, and no variables; version 2 handles data types, but still no variables; version 3 handles variables, but still no assignment or input; version 4 handles assignment and input; and version 5 handles reference variables.
It is quite possible that you won't get all 5 versions working. If not, turn in as much as you've got, with an explanation of where you got stuck and what you've tried.