March 28, 2016

Creating a custom Collector

Creating a custom implementation of java.util.stream.Collector seems daunting at first, but once you give it a try, you'll see that it can actually be pretty easy.

If you are not used to lambdas and functional concepts, your first look at the Collector interface will be intimidating. From a pre-Java 8 perspective, there are four interfaces to implement to create a custom Collector: java.util.function.Supplier (the supplier, which creates the container), java.util.function.BiConsumer (the accumulator, which adds an element to the container), java.util.function.BinaryOperator (the combiner, which merges two containers) and java.util.function.Function (the finisher, which converts the container to the result). Fortunately, they are all functional interfaces, which allows us to take some shortcuts with lambdas and method references.
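To make the four pieces concrete before tackling SetValuedMap, here is a minimal sketch that uses only JDK types. The class name FourPartsSketch and the method concatenating() are made up for illustration; the collector simply glues strings together, with a StringBuilder as the accumulation type and a String as the result type.

```java
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;
import java.util.stream.Stream;

// Hypothetical example class, not part of the original project.
class FourPartsSketch {

    static Collector<String, StringBuilder, String> concatenating() {
        // Supplier: creates a fresh, empty accumulation container.
        Supplier<StringBuilder> supplier = StringBuilder::new;
        // Accumulator: folds one stream element into the container.
        BiConsumer<StringBuilder, String> accumulator = (sb, s) -> sb.append(s);
        // Combiner: merges two partial containers (used by parallel streams).
        BinaryOperator<StringBuilder> combiner = (left, right) -> left.append(right);
        // Finisher: converts the container into the final result.
        Function<StringBuilder, String> finisher = StringBuilder::toString;

        return Collector.of(supplier, accumulator, combiner, finisher);
    }

    public static void main(String[] args) {
        String joined = Stream.of("a", "b", "c").collect(concatenating());
        System.out.println(joined);
    }
}
```

Each of the four variables could be written inline as an argument to Collector.of(); they are spelled out here only to show which functional interface plays which role.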

The example I chose to implement is a Collector for SetValuedMap from the Apache Commons Collections project. We'd like a static method similar to the standard Collectors.toMap() method which will generate a Collector instance yielding a SetValuedMap implementation.

There are a lot of moving parts to the Collector in terms of generics. First, our static method will have generic parameters for the type of the elements in the stream <T>, the type of the keys in the map <K> and the type of the values in the map <V>. Then we need to identify the three generic parameters of the Collector interface: <T> is the same as before, the type of the objects in the stream; <A>, the accumulation type, will be SetValuedMap<K,V>; and <R>, the result type, will also be SetValuedMap<K,V>. It is frequently the case with Collectors that <A> and <R> are the same.

Now that we have our generic ducks in a row, we can start figuring out our Collector implementation. For the supplier, we can use a constructor of a SetValuedMap implementation, e.g. HashSetValuedHashMap::new.

The accumulator will be a lambda function that takes a map and a stream object and puts the stream object into the map. For that, we will need to pass in functions that convert a stream object to a key and to a value respectively (just like in Collectors.toMap()).
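Here is a rough sketch of such an accumulator in isolation. So that it runs without any extra dependency, it uses a plain Map<K, Set<V>> from the JDK as a stand-in for SetValuedMap<K,V>; that substitution, the class name AccumulatorSketch and the method name accumulator() are my assumptions for illustration. With a real SetValuedMap, the body should shrink to a single map.put(key, value) call.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.Function;

// Hypothetical example class; Map<K, Set<V>> stands in for SetValuedMap<K, V>.
class AccumulatorSketch {

    static <T, K, V> BiConsumer<Map<K, Set<V>>, T> accumulator(
            Function<? super T, ? extends K> keyMapper,
            Function<? super T, ? extends V> valueMapper) {
        // Derive the key and value from the element, then add the value
        // to the set stored under that key (creating the set on first use).
        return (map, element) -> map
                .computeIfAbsent(keyMapper.apply(element), key -> new HashSet<>())
                .add(valueMapper.apply(element));
    }

    public static void main(String[] args) {
        BiConsumer<Map<Integer, Set<String>>, String> byLength =
                accumulator(String::length, String::toUpperCase);

        Map<Integer, Set<String>> map = new HashMap<>();
        byLength.accept(map, "ant");
        byLength.accept(map, "bee");
        byLength.accept(map, "ant");   // duplicate value, absorbed by the set
        byLength.accept(map, "wasp");

        // Key 3 holds ANT and BEE, key 4 holds WASP.
        System.out.println(map);
    }
}
```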

The combiner will need to accept two maps and return a map containing the entries from both. Again, this can be specified with a lambda function.
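A combiner along those lines might look like the sketch below. As before, a JDK Map<K, Set<V>> stands in for SetValuedMap<K,V>, and the names CombinerSketch and combiner() are invented for this example. It folds the right-hand map into the left-hand one and returns the left one, which the Collector contract permits; with a real SetValuedMap, I would expect a putAll() call followed by returning the first map to do the same job.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.BinaryOperator;

// Hypothetical example class; Map<K, Set<V>> stands in for SetValuedMap<K, V>.
class CombinerSketch {

    static <K, V> BinaryOperator<Map<K, Set<V>>> combiner() {
        return (left, right) -> {
            // Merge every value set from the right map into the left map.
            right.forEach((key, values) ->
                    left.computeIfAbsent(key, k -> new HashSet<>()).addAll(values));
            return left;
        };
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> left = new HashMap<>();
        left.put("a", new HashSet<>(Arrays.asList(1, 2)));

        Map<String, Set<Integer>> right = new HashMap<>();
        right.put("a", new HashSet<>(Arrays.asList(2, 3)));
        right.put("b", new HashSet<>(Arrays.asList(4)));

        Map<String, Set<Integer>> merged =
                CombinerSketch.<String, Integer>combiner().apply(left, right);

        // "a" now holds 1, 2 and 3; "b" holds 4.
        System.out.println(merged);
    }
}
```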

We don't need anything beyond the identity function for the finisher, so we can use the Collector.of() overload that takes only a supplier, an accumulator and a combiner. That means we are ready to create our Collector implementation.
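Putting the pieces together, the factory method could be sketched as follows. This is not the original listing: to keep it runnable with the JDK alone, it again uses Map<K, Set<V>> in place of SetValuedMap<K,V>, and the names SetMapCollectorSketch and toSetMap() are placeholders of mine. With Commons Collections on the classpath, the supplier would become HashSetValuedHashMap::new as described above, and the accumulator and combiner should reduce to put() and putAll() calls.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collector;
import java.util.stream.Stream;

// Hypothetical example class; Map<K, Set<V>> stands in for SetValuedMap<K, V>.
class SetMapCollectorSketch {

    // <A> and <R> are both the map type, so no finisher is needed.
    static <T, K, V> Collector<T, Map<K, Set<V>>, Map<K, Set<V>>> toSetMap(
            Function<? super T, ? extends K> keyMapper,
            Function<? super T, ? extends V> valueMapper) {
        return Collector.of(
                // Supplier: a new empty map.
                HashMap::new,
                // Accumulator: put the mapped value under the mapped key.
                (map, element) -> map
                        .computeIfAbsent(keyMapper.apply(element), key -> new HashSet<>())
                        .add(valueMapper.apply(element)),
                // Combiner: fold the right map into the left one.
                (left, right) -> {
                    right.forEach((key, values) ->
                            left.computeIfAbsent(key, k -> new HashSet<>()).addAll(values));
                    return left;
                });
    }

    public static void main(String[] args) {
        Map<Character, Set<Integer>> lengthsByInitial =
                Stream.of("apple", "avocado", "banana", "apple")
                        .collect(toSetMap(word -> word.charAt(0), String::length));

        // 'a' maps to the lengths 5 and 7, 'b' maps to 6.
        System.out.println(lengthsByInitial);
    }
}
```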

Our new factory method can be used to create a version of Collectors.groupingBy() which eliminates duplicates, simply by passing Function.identity() for the valueMapper.
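As a sketch of that idea, the example below wraps a compact copy of the factory method and compares its output with the JDK's own Collectors.groupingBy(classifier, Collectors.toSet()). The names GroupingSketch, toSetMap() and groupingToSets() are hypothetical, and Map<K, Set<V>> is once more my stand-in for SetValuedMap<K,V> so the code runs without Commons Collections.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collector;
import java.util.stream.Collectors;

// Hypothetical example class; Map<K, Set<V>> stands in for SetValuedMap<K, V>.
class GroupingSketch {

    static <T, K, V> Collector<T, Map<K, Set<V>>, Map<K, Set<V>>> toSetMap(
            Function<? super T, ? extends K> keyMapper,
            Function<? super T, ? extends V> valueMapper) {
        return Collector.of(
                HashMap::new,
                (map, element) -> map
                        .computeIfAbsent(keyMapper.apply(element), key -> new HashSet<>())
                        .add(valueMapper.apply(element)),
                (left, right) -> {
                    right.forEach((key, values) ->
                            left.computeIfAbsent(key, k -> new HashSet<>()).addAll(values));
                    return left;
                });
    }

    // The stream elements themselves become the values, grouped by key.
    static <T, K> Collector<T, Map<K, Set<T>>, Map<K, Set<T>>> groupingToSets(
            Function<? super T, ? extends K> keyMapper) {
        return toSetMap(keyMapper, Function.<T>identity());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("ant", "bee", "ant", "wasp");

        Map<Integer, Set<String>> grouped =
                words.stream().collect(groupingToSets(String::length));
        Map<Integer, Set<String>> reference =
                words.stream().collect(Collectors.groupingBy(String::length, Collectors.toSet()));

        // Both maps hold the same groups, with the duplicate "ant" removed.
        System.out.println(grouped.equals(reference));
    }
}
```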

If we wanted to get really fancy, we could also pass in a Comparator to use with our set, but I will leave that as an exercise for the reader.

See the file MoreCollectorsTest.java for some examples of these methods in action.