Last updated on 24 September 2018

Posted by YesDay on 24 September 2018 Tags: java gotchas sortedset

1. SortedSet fails to obey the general contract of the Set interface

Consider the code below, which creates a SortedSet using a comparator based on string length:

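import java.util.Comparator;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

public class SortedSetTest {
    public static void main(String[] args) {
        // Order strings by length only; "aa" and "bb" compare as 0.
        SortedSet<String> sortedSet = new TreeSet<>(Comparator.comparing(String::length));
        sortedSet.addAll(Set.of("aa", "bb"));
        System.out.println(sortedSet);
    }
}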

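The output of the above contains only one element (Set.of does not guarantee an iteration order, so it may be either [aa] or [bb]); for example: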

[aa]

While I would expect

[aa, bb]

or

[bb, aa]

One of the two elements (bb in the output above) disappears, which breaks the Set contract. The comparator is supposed to only order the elements, not to decide whether two of them are the same; that is the job of equals in every other collection.
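To see that the set really treats "bb" as a duplicate of "aa" rather than losing it by accident, here is a minimal sketch that inspects the return value of add and the result of contains. The class name LengthComparatorDemo and the probe helper are made up for illustration.

```java
import java.util.Comparator;
import java.util.SortedSet;
import java.util.TreeSet;

class LengthComparatorDemo {
    // Builds a set ordered by string length only and reports how it treats "bb".
    static boolean[] probe() {
        SortedSet<String> byLength = new TreeSet<>(Comparator.comparing(String::length));
        boolean addedAa = byLength.add("aa");          // first element of length 2
        boolean addedBb = byLength.add("bb");          // compare("bb", "aa") == 0, so it is rejected
        boolean containsBb = byLength.contains("bb");  // lookups also go through compare
        return new boolean[] {addedAa, addedBb, containsBb};
    }

    public static void main(String[] args) {
        boolean[] r = probe();
        // prints: add(aa)=true, add(bb)=false, contains(bb)=true
        System.out.println("add(aa)=" + r[0] + ", add(bb)=" + r[1] + ", contains(bb)=" + r[2]);
    }
}
```

In other words, the set claims to contain "bb" even though only "aa" was ever stored.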

If I extend the comparator so that it always returns a non-zero value for unequal items, as below, I get the expected result:

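import java.util.Comparator;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

public class SortedSetTest {
    public static void main(String[] args) {
        // Order by length first, then break ties by the string itself.
        SortedSet<String> sortedSet = new TreeSet<>(Comparator.comparing(String::length)
            .thenComparing(String::toString));
        sortedSet.addAll(Set.of("aa", "bb"));
        System.out.println(sortedSet);
    }
}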

The output now is [aa, bb] as I would expect.

The question is, why does SortedSet ignore the equals method in the first place and remove an unequal object from the set?

The comparator method inside the SortedSet interface is documented as follows:

Returns the comparator used to order the elements in this set, or null if this set uses the natural ordering of its elements.

The above indicates that the comparator is only used to order the elements in the set and not to distinguish them from one another, which is what equals is for in all the collections.

Digging further into the javadoc, it turns out that the comment above is misleading, because a note in the SortedSet class documentation contradicts that reading:

Note that the ordering maintained by a sorted set (whether or not an explicit comparator is provided) must be consistent with equals if the sorted set is to correctly implement the Set interface. (See the Comparable interface or Comparator interface for a precise definition of consistent with equals.) This is so because the Set interface is defined in terms of the equals operation, but a sorted set performs all element comparisons using its compareTo (or compare) method, so two elements that are deemed equal by this method are, from the standpoint of the sorted set, equal. The behavior of a sorted set is well-defined even if its ordering is inconsistent with equals; it just fails to obey the general contract of the Set interface.
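One concrete way the Set contract breaks is that equals between sets stops being symmetric. The sketch below, assuming the same length-based comparator as earlier, compares a TreeSet holding "aa" with a HashSet holding "bb". The class and method names are made up for illustration.

```java
import java.util.Comparator;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

class SetContractDemo {
    // A TreeSet ordered by length only, holding the single element "aa".
    static Set<String> lengthOrdered() {
        Set<String> s = new TreeSet<>(Comparator.comparing(String::length));
        s.add("aa");
        return s;
    }

    // An ordinary HashSet holding the single element "bb".
    static Set<String> hashed() {
        return new HashSet<>(Set.of("bb"));
    }

    public static void main(String[] args) {
        Set<String> tree = lengthOrdered();
        Set<String> hash = hashed();
        // The TreeSet answers contains("bb") via the comparator, so it sees a match.
        System.out.println(tree.equals(hash)); // true
        // The HashSet answers contains("aa") via hashCode/equals, so it does not.
        System.out.println(hash.equals(tree)); // false
    }
}
```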

Note that, by definition, consistent with equals means:

The ordering imposed by a comparator c on a set of elements S is said to be consistent with equals if and only if c.compare(e1, e2)==0 has the same boolean value as e1.equals(e2) for every e1 and e2 in S.
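The definition can be turned into a small brute-force check over a sample of elements. This is only a sketch: the helper isConsistentWithEquals is hypothetical, and a passing result says nothing about elements outside the sample.

```java
import java.util.Collection;
import java.util.Comparator;
import java.util.List;

class ConsistencyCheck {
    // Returns true if, for every pair in the sample, compare == 0 exactly when equals is true.
    static <T> boolean isConsistentWithEquals(Comparator<? super T> c,
                                              Collection<? extends T> sample) {
        for (T e1 : sample) {
            for (T e2 : sample) {
                if ((c.compare(e1, e2) == 0) != e1.equals(e2)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> sample = List.of("aa", "bb", "ccc");
        Comparator<String> byLength = Comparator.comparing(String::length);
        Comparator<String> byLengthThenText = byLength.thenComparing(Comparator.naturalOrder());
        System.out.println(isConsistentWithEquals(byLength, sample));         // false
        System.out.println(isConsistentWithEquals(byLengthThenText, sample)); // true
    }
}
```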

2. How the SortedSet could be fixed in a future version of Java

The behaviour described in the previous section is clearly a very poor design choice in the SortedSet interface, and the inadequate documentation makes the issue even worse. I hope that it will be fixed in a future version of Java.

In the meantime, here are some hints on how the issue could be addressed from both the javadoc and the implementation perspective of the SortedSet:

  • Fix the javadoc

  • Default equals inside SortedSet

  • Fallback to default comparator inside implementation classes

The following sections describe the above in more detail.

2.1. Fix the javadoc

The most obvious improvement is to update the relevant javadoc documentation. In particular, the comparator method inside the SortedSet interface could be documented as follows in order to prevent misuse:

Returns the comparator used to order the elements in this set, or null if this set uses the natural ordering of its elements. Note that the set also uses this comparator, instead of the equals method, to decide whether two elements are the same: e1 and e2 are treated as equal whenever c.compare(e1, e2)==0. Refer to the class javadoc for more details regarding the consistency of the ordering with the equals method.

The above would be very helpful because in the rest of the collection interfaces the ordering does not replace the equals method, and users naturally expect a comparator to be used only for sorting. For example, a HashSet can store two objects whose compareTo returns 0 and whose hashCode values are the same, as long as equals reports them as unequal.
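The HashSet example can be sketched as follows. The Word class is hypothetical: it is ordered and hashed by length, but two words are equal only if their text matches.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical element type: compareTo and hashCode look at length, equals looks at text.
final class Word implements Comparable<Word> {
    final String text;

    Word(String text) {
        this.text = text;
    }

    @Override
    public int compareTo(Word other) {
        return Integer.compare(text.length(), other.text.length());
    }

    @Override
    public int hashCode() {
        return text.length();
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Word && ((Word) o).text.equals(text);
    }
}

class HashSetVsTreeSet {
    // HashSet decides membership with hashCode and equals.
    static int hashSetSize() {
        Set<Word> set = new HashSet<>(List.of(new Word("aa"), new Word("bb")));
        return set.size();
    }

    // TreeSet (natural ordering) decides membership with compareTo.
    static int treeSetSize() {
        Set<Word> set = new TreeSet<>(List.of(new Word("aa"), new Word("bb")));
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(hashSetSize()); // 2
        System.out.println(treeSetSize()); // 1
    }
}
```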

2.2. Default equals inside SortedSet

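Another improvement would be to spell out the element-equality rule as a default method inside the SortedSet interface. Java does not allow a default method to override Object.equals, and a set's own equals compares whole sets rather than elements, so the sketch below uses a separately named method (the name elementEquals is hypothetical):

// Hypothetical default method; not part of the real SortedSet API.
default boolean elementEquals(E e1, E e2) {
    Comparator<? super E> c = comparator();
    // A null comparator means the set uses the natural ordering.
    return c != null
        ? c.compare(e1, e2) == 0
        : ((Comparable<? super E>) e1).compareTo(e2) == 0;
}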

The above would make it much clearer for the user that the comparator can possibly break the Set contract.

2.3. Fallback to default comparator inside implementation classes

While the previous two improvements don’t actually fix the issue, here is how it could be fixed for good. The implementation classes of SortedSet (like TreeSet) could automatically fall back to the default comparator when the custom comparator’s compare method returns 0, or simply call equals (just like HashSet does), or at worst throw an IllegalStateException with a message that the comparator is not consistent with equals. Alternatively, making any of the above configurable when instantiating the implementation class would at least provide a reasonable safeguard. I believe the usability gain in this case would far outweigh the slight performance overhead introduced by the fallback comparator.
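Until something like this exists in the JDK, two of these ideas can be approximated in user code by wrapping the comparator before handing it to TreeSet. This is a sketch only: the class SaferComparators and its methods are hypothetical, and the natural-ordering fallback assumes the element type's compareTo is itself consistent with equals (as it is for String).

```java
import java.util.Comparator;
import java.util.SortedSet;
import java.util.TreeSet;

class SaferComparators {
    // Option 1: break ties with the natural ordering so unequal elements never compare as 0.
    static <T extends Comparable<? super T>> Comparator<T> withNaturalFallback(Comparator<T> primary) {
        return primary.thenComparing(Comparator.naturalOrder());
    }

    // Option 2: fail fast when the comparator reports 0 for elements that equals tells apart.
    static <T> Comparator<T> strict(Comparator<T> primary) {
        return (a, b) -> {
            int result = primary.compare(a, b);
            if (result == 0 && !a.equals(b)) {
                throw new IllegalStateException(
                    "Comparator is not consistent with equals for " + a + " and " + b);
            }
            return result;
        };
    }

    public static void main(String[] args) {
        Comparator<String> byLength = Comparator.comparing(String::length);

        SortedSet<String> fallback = new TreeSet<>(withNaturalFallback(byLength));
        fallback.add("bb");
        fallback.add("aa");
        System.out.println(fallback); // [aa, bb]

        SortedSet<String> failFast = new TreeSet<>(strict(byLength));
        failFast.add("aa");
        try {
            failFast.add("bb"); // same length as "aa" but not equal
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The fail-fast variant costs one extra equals call per tie, which is the kind of small overhead discussed above.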