CSC 270
Homework 6
In this assignment, we'll implement a number of operations on
mathematical sets: membership, union, intersection, difference,
Cartesian product, etc.
C, C++, or Java version
What I want you to do
Define a data type StringSet
that represents a set of
strings. It should support the following operations:
- a way to build an empty StringSet
- a function that tells how many elements are in the StringSet
- a function that tells whether a given string is in the
StringSet
- a way to add a specified string to a StringSet, thus increasing its
size by one unless the string is already in the set.
(Alternatively, you could provide a way to build a StringSet containing
a single specified string.)
- a way to build, given a positive integer, a large StringSet
containing the numerals "1", "2", "3", ... up to the given integer.
- a way to compute the intersection of two StringSets (i.e. a set of
strings that are in both StringSets), without
modifying either of them
- a way to compute the union of two StringSets (i.e. a set of strings
that are in either of the StringSets), without modifying
either of them
- a way to compute the difference of two StringSets (i.e. a set of
strings that are in the first of the StringSets and not the second),
without modifying either of them
- a way to compute the Cartesian product of two StringSets (i.e. a
set of strings formed by combining a left parenthesis, a string from
the first set, a comma, a string from the second set, and a right
parenthesis), without modifying either of them. For example, the
Cartesian product of the sets {"ann", "bob", "charlie"} and {"light",
"dark"} would be {"(ann, light)", "(ann, dark)", "(bob, light)", "(bob,
dark)", "(charlie, light)", "(charlie, dark)"}.
There are a lot of possible implementations (listed
below); I want you to write
your class definition in such a way as to hide the
implementation, so the person using your class shouldn't need
to know whether you're using singly-linked lists, doubly-linked lists,
etc.
- Unsorted array of strings
- Simple but not terribly efficient
- Sorted array of strings
- More hassle to maintain, but much more efficient to search (e.g.
using binary search)
- Unsorted linked list of strings
- Sorted linked list of strings
- In a linked list, sorting isn't so helpful because you can't really
do a binary search on it efficiently anyway. Instead, you might
try...
- Binary search tree
- A binary tree in which each non-leaf node has a "key",
and all the items alphabetically less than the key are descended from
the left child, while all the items alphabetically greater than the key
are descended from the right child.
- Bit-vector
-
If there are 100 possible set elements, a set is represented by
an array of 100 booleans, each true if the corresponding element is in the
set and false if not. This implementation can be extremely efficient,
but it only works when you have a limited collection of possible elements,
e.g. single characters. In this implementation, you won't be able to do
the Cartesian product or the set of numerals up to a specified integer, but
you can do the rest of the operations.
- Hash table
- This is more hassle to implement, but it can be an efficient
way to handle sets whose "possible elements" don't
come from a nice limited collection. Java provides a built in HashTable
class, so you could start with that, and then perhaps try implementing
your own.
Scheme version:
In Scheme, it's easier to allow anything to be in the set,
and for Cartesian products to simply be two-element lists, or
cons-pairs, rather than strings with parentheses and commas in them.
Other than that, the task is basically the same. However, if you
restrict yourself to strings, you can investigate the various sorted
representations for efficiency.
Prolog version:
I would start by using Prolog's built-in list capabilities, storing
symbols and numbers rather than strings. You'll need a two-place
predicate contains
that succeeds iff the list which is its
first argument contains its second argument as an element; a three-place
predicate intersect
that unifies its third argument with
the intersection of its first and second; and likewise for
union
, difference
, and
cartesian_product
. I would use a two-place
pair
structure, or perhaps simply two-element lists, to
represent the ordered pairs in a Cartesian product. I think this would
be a fun assignment to do in Prolog.
If you wanted to make it more efficient, you could store things as a
binary search tree instead of an unsorted list.
Last modified:
Stephen Bloch / sbloch@adelphi.edu