The HashSet
class is guaranteed to act like a "set", in
that it can't contain multiple copies of the same thing... but it's not
always clear what constitutes "the same thing." By default,
HashSet
uses "intensional equality": two things are "the
same" if they're at the same location in the computer's memory. This is
different from the "extensional equality" that we've been using for most
of the semester, and that makes more sense for Strings, StringLists, and
StringSets: two Strings are "the same" if they contain the same sequence
of characters, two StringLists are "the same" if they contain the same
sequence of strings, and two StringSets are "the same" if every String
that's in one is also in the other.
Consider the following test case:
StringList steveNDeb = new StringList().addAtEnd("Steve").addAtEnd("Deborah"); StringList again = new StringList().addAtEnd("Steve").addAtEnd("Deborah"); HashSetmySet = new HashSet (); t.checkExpect (mySet.add(steveNDeb), true); // should work t.checkExpect (mySet.add(again), false); // may fail t.checkExpect (mySet.size(), 1); // may fail
HashSet
doesn't recognize that steveNDeb
and
again
are "the same", so it's perfectly happy to add both
into the "set".
I know of two ways to fix this:
TreeSet
instead of a
HashSet
. Unfortunately,
StringList
doesn't have a built-in ordering, so if you just
try to create a TreeSet<StringList>
you'll get an
error message; you need to write a class (named something like
LexOrderStringLists
) that
implements Comparator<StringList>
, create an instance
of this class, and pass it into the
TreeSet<StringList>
constructor, as seen in the Gee
book (pages 142-150) or in class on April 23.Write your own hashCode
and
equals
methods for the StringList
class. The
Gee book discusses this briefly on pages 115-117, but doesn't really
explain how to write these methods for non-trivial classes.
The equals
method needs to confirm (using
instanceof
) that the parameter is the right type,
then cast it to that type, then confirm that all the important
parts of this
are equal to the corresponding
parts of the parameter. For example,
class Posn { private int x, y; ... public boolean equals (Object other) { if (other instanceof Posn) { Posn otherPosn = (Posn)other; return (this.x == otherPosn.x) && (this.y == otherPosn.y); } else { return false; // if the other thing isn't a Posn at all, it // certainly isn't the same as this Posn } } }
The hashCode
method is generally
written by calling hashCode
on all the important parts and
combining the resulting integers somehow into one integer. (You can't
call hashCode
on int
or double
;
think of these numbers as their own hash codes.)
The simplest way is to just add them up:
class Employee { private String name, ssn; private int salary; ... public int hashCode () { return this.name.hashCode() + this.ssn.hashCode() + salary; } }This way two Employees with different names or different ssn's or different salaries will almost certainly have different hash codes. Now imagine that one Employee's name were the same as another's ssn, and vice versa; since addition is commutative, they would end up with the same hash code, which makes the
HashSet
class work less well. To fix this, we could say
return 2 * this.name.hashCode() + this.ssn.hashCode() + salary;instead.
Of course, no employee is likely to have a name that looks like a
Social Security number. But you're assigned to do this for
StringList
, and it's quite possible that two
StringList
s could have the same strings in different
orders (think debNSteve
and steveNDeb
);
again, since addition is commutative, they would end up with the
same hash code. Figure out how to solve this.