Complete Guide to Collector Interface

1. Introduction

A Collector describes a mutable reduction operation: it accumulates the elements of a Stream into a mutable result container and, optionally, transforms that container into a final result. After processing data with a Stream, we usually need to collect the result of that processing.

We use a Collector when we want to gather elements into a container such as a Collection, or to group, partition, or summarize data (sum, min, max, average), among other things.

Below is an example of how to collect results into a List.
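A minimal sketch of such a collection step, assuming a small hard-coded stream of integers. The class and method names are illustrative and do not come from the original article:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class CollectToListExample {

    // Keeps the even numbers and collects them into a List.
    static List<Integer> evenNumbers() {
        return Stream.of(1, 2, 3, 4, 5, 6)
                .filter(n -> n % 2 == 0)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(evenNumbers()); // prints [2, 4, 6]
    }
}
```

The stream pipeline does the filtering, and the Collector passed to collect() decides what kind of result comes out at the end.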

High-level view of Collector.

This article explains the Collector interface and the functionality it provides, using a collector that gathers elements into an ArrayList as the running example.

2. Content

The Collector<T, A, R> interface declares three type parameters: T, A and R.

T: Represents the type of the input elements, for example: Integer

R: Represents the result type of the reduction, for example: List<Integer>

A: Represents the mutable accumulation type, i.e. the intermediate container that holds elements while they are being collected. It is often an implementation detail that connects the input T to the result R.

For our use-case, the class declaration would look like this:

public class ArrayListCollectorImpl<T> implements Collector<T, List<T>, List<T>>

T: Represents any input type T

R: Represents result type, i.e. List<T>

A: Represents accumulation type, i.e. List<T>
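In the ArrayList case A and R happen to be the same type, which can hide the role of each parameter. The following sketch, which is an illustration and not part of the article's running example, uses three different types: T is String, A is a one-element int[] used as a mutable counter, and R is Integer. The class and method names are hypothetical.

```java
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;
import java.util.stream.Stream;

class TypeParametersDemo {

    // Sums the lengths of all strings: T = String, A = int[], R = Integer.
    static Integer totalLength() {
        Supplier<int[]> supplier = () -> new int[1];
        BiConsumer<int[], String> accumulator = (acc, s) -> acc[0] += s.length();
        BinaryOperator<int[]> combiner = (left, right) -> {
            left[0] += right[0];
            return left;
        };
        Function<int[], Integer> finisher = acc -> acc[0];

        Collector<String, int[], Integer> totalLength =
                Collector.of(supplier, accumulator, combiner, finisher);

        return Stream.of("a", "bb", "ccc").collect(totalLength);
    }

    public static void main(String[] args) {
        System.out.println(totalLength()); // prints 6
    }
}
```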

3. Supplier<A> supplier()

The supplier() method returns a function that creates a new, empty container to store the result.

The Supplier returns a mutable result container, i.e. a container that can hold one or more values.

In our use-case, we want to collect the elements of a Stream into an ArrayList, so the supplier method would look like this:

public Supplier<List<T>> supplier() {
	return new Supplier<List<T>>() {
		@Override
		public List<T> get() {
			return new ArrayList<>();
		}
	};
}

We can use a lambda expression too:

public Supplier<List<T>> supplier() {
	return () -> new ArrayList<>();
}

The most concise option is a constructor reference, which reads cleanly and conveys the intent:

public Supplier<List<T>> supplier() {
	return ArrayList::new;
}
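One property worth seeing in isolation is that every call to get() yields a fresh container, which is what lets a parallel stream give each partial result its own list. A small sketch, with a hypothetical class name chosen for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

class SupplierDemo {

    // Returns true when two calls to get() yield distinct, independent lists.
    static boolean createsFreshContainers() {
        Supplier<List<String>> supplier = ArrayList::new;
        List<String> first = supplier.get();
        List<String> second = supplier.get();
        first.add("a");
        // The second container is unaffected by changes to the first.
        return first != second && second.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(createsFreshContainers()); // prints true
    }
}
```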

6. BiConsumer<A, T> accumulator()

The accumulator() method returns a function that folds a single element into a result container.

The BiConsumer takes the mutable result container and the next element, and adds the element to the container.

In our use-case, we can achieve this by adding the value to the List using List’s add(e) method.

public BiConsumer<List<T>, T> accumulator() {
	return new BiConsumer<List<T>, T>() {
		@Override
		public void accept(List<T> t, T u) {
			t.add(u);
		}
	};
}

We can achieve the same using a lambda expression:

public BiConsumer<List<T>, T> accumulator() {
	return (list, val) -> list.add(val);
}
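The accumulator can also be written as the method reference List::add. The sketch below drives a supplier and an accumulator by hand, roughly the way a sequential collect() would. The class name and sample values are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Supplier;

class AccumulatorDemo {

    // Creates one container and feeds three elements into it.
    static List<Integer> accumulate() {
        Supplier<List<Integer>> supplier = ArrayList::new;
        // Method reference form of (list, val) -> list.add(val).
        BiConsumer<List<Integer>, Integer> accumulator = List::add;

        List<Integer> container = supplier.get();
        for (int value : new int[] {10, 20, 30}) {
            accumulator.accept(container, value);
        }
        return container;
    }

    public static void main(String[] args) {
        System.out.println(accumulate()); // prints [10, 20, 30]
    }
}
```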

7. BinaryOperator<A> combiner()

The combiner() method is interesting because it is used to combine two partial results into one. Why is that needed? During parallel execution of a Stream, the source is split into parts, and each part is accumulated into its own result container.

The BinaryOperator merges two mutable result containers into one.

Think of this as the merge step of merge sort, where two arrays are merged into one. A combiner does the same: applied repeatedly, it folds all the partial containers into a single result.

This is how you can implement the combiner:

public BinaryOperator<List<T>> combiner() {
	return new BinaryOperator<List<T>>() {
		@Override
		public List<T> apply(List<T> left, List<T> right) {
			left.addAll(right);
			return left;
		}
	};
}

We can achieve the same using a lambda expression:

public BinaryOperator<List<T>> combiner() {
	return (left, right) -> {
		left.addAll(right);
		return left;
	};
}
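To make the merge concrete, the following sketch applies the combiner by hand to two partial results, as if two threads had each accumulated part of the stream. The class name and the two sample lists are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

class CombinerDemo {

    // Merges two partial containers into the left one and returns it.
    static List<Integer> mergePartials() {
        BinaryOperator<List<Integer>> combiner = (left, right) -> {
            left.addAll(right);
            return left;
        };

        // Stand-ins for the partial results of two halves of a parallel stream.
        List<Integer> left = new ArrayList<>(Arrays.asList(1, 2, 3));
        List<Integer> right = new ArrayList<>(Arrays.asList(4, 5));

        return combiner.apply(left, right);
    }

    public static void main(String[] args) {
        System.out.println(mergePartials()); // prints [1, 2, 3, 4, 5]
    }
}
```

Appending the right container to the left one keeps the elements of the left half before those of the right half.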

8. Function<A, R> finisher()

The finisher() method performs the final transformation from the accumulation type A to the result type R. When we implement the Collector interface directly we must provide it, but the transformation itself is optional.

For now, we do not want to transform the result, so we can simply return Function.identity(), a function that returns its argument unchanged.

public Function<List<T>, List<T>> finisher() {
	return Function.identity();
}
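For contrast, here is a sketch of a finisher that does transform the container: it wraps the accumulated list in a read-only view. This is an illustration under assumed names, not something the article's ArrayList collector does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

class FinisherDemo {

    // Applies a non-identity finisher to an already accumulated container.
    static List<Integer> finish() {
        Function<List<Integer>, List<Integer>> finisher = Collections::unmodifiableList;
        List<Integer> container = new ArrayList<>(Arrays.asList(1, 2, 3));
        return finisher.apply(container);
    }

    // Returns true when the list rejects modification.
    static boolean isReadOnly(List<Integer> list) {
        try {
            list.add(99);
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<Integer> result = finish();
        System.out.println(result);             // prints [1, 2, 3]
        System.out.println(isReadOnly(result)); // prints true
    }
}
```

A collector that uses a finisher like this one must not declare IDENTITY_FINISH, because the finisher has to run.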

9. Set<Characteristics> characteristics()

The Collector interface also defines a set of characteristics in the nested Characteristics enum. They are:

  1. CONCURRENT: Signifies that the collector is concurrent, meaning that the accumulator can be called on the same result container from multiple threads. We can set this characteristic for collectors backed by a ConcurrentHashMap.
  2. UNORDERED: Signifies that the collection operation does not commit to preserving the encounter order of the input elements. We can set this characteristic for HashSet collectors.
  3. IDENTITY_FINISH: Signifies that the finisher() function is the identity function and can be skipped. If set, the unchecked cast from A to R must succeed. For our use-case it will succeed because both the accumulation type A and the result type R are List<T>.

public Set<Characteristics> characteristics() {
	return EnumSet.of(Characteristics.IDENTITY_FINISH);
}
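Putting the five methods together, the complete class could look like the sketch below. It assembles the snippets shown so far (using List::add for the accumulator); the two demo methods and their sample data are additions for illustration.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class ArrayListCollectorImpl<T> implements Collector<T, List<T>, List<T>> {

    @Override
    public Supplier<List<T>> supplier() {
        return ArrayList::new;
    }

    @Override
    public BiConsumer<List<T>, T> accumulator() {
        return List::add;
    }

    @Override
    public BinaryOperator<List<T>> combiner() {
        return (left, right) -> {
            left.addAll(right);
            return left;
        };
    }

    @Override
    public Function<List<T>, List<T>> finisher() {
        return Function.identity();
    }

    @Override
    public Set<Characteristics> characteristics() {
        return EnumSet.of(Characteristics.IDENTITY_FINISH);
    }

    // Sequential use of the collector.
    static List<String> collectLetters() {
        return Stream.of("a", "b", "c").collect(new ArrayListCollectorImpl<String>());
    }

    // Parallel use: partial lists are merged by the combiner.
    static List<Integer> collectInParallel() {
        return IntStream.rangeClosed(1, 1000)
                .boxed()
                .parallel()
                .collect(new ArrayListCollectorImpl<Integer>());
    }

    public static void main(String[] args) {
        System.out.println(collectLetters());           // prints [a, b, c]
        System.out.println(collectInParallel().size()); // prints 1000
    }
}
```

Because the collector does not declare UNORDERED, the parallel run still returns the elements in encounter order.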

10. public static <T, R> Collector<T, R, R> of(Supplier<R> supplier,
BiConsumer<R, T> accumulator,
BinaryOperator<R> combiner,
Characteristics... characteristics)

of(..) is a static factory method for creating custom collectors without a finisher() function. The resulting Collector always has the IDENTITY_FINISH characteristic.

Let us use it with the functions we implemented above.

Supplier<List<Integer>> supplier = ArrayList::new;

BiConsumer<List<Integer>, Integer> accumulator 
			= (list, val) -> list.add(val);

BinaryOperator<List<Integer>> combiner = (left, right) -> {
	left.addAll(right);
	return left;
};

Characteristics characteristics = Characteristics.IDENTITY_FINISH;

Collector<Integer, List<Integer>, List<Integer>> toList = 
		Collector.of(supplier, 
					accumulator, 
					combiner, 
					characteristics);

List<Integer> result = Stream.of(1, 2, 3, 4, 5).collect(toList);

Assert.assertEquals(5, result.size());
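Since this overload takes no finisher, the collector it returns reports IDENTITY_FINISH even when no characteristics are passed at all. The sketch below checks that; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Supplier;
import java.util.stream.Collector;

class FactoryCharacteristicsDemo {

    // Builds a collector without passing any characteristics and returns what it reports.
    static Set<Collector.Characteristics> characteristicsWithoutFinisher() {
        Supplier<List<Integer>> supplier = ArrayList::new;
        BiConsumer<List<Integer>, Integer> accumulator = List::add;
        BinaryOperator<List<Integer>> combiner = (left, right) -> {
            left.addAll(right);
            return left;
        };

        Collector<Integer, List<Integer>, List<Integer>> toList =
                Collector.of(supplier, accumulator, combiner);

        return toList.characteristics();
    }

    public static void main(String[] args) {
        System.out.println(characteristicsWithoutFinisher()); // prints [IDENTITY_FINISH]
    }
}
```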

11. public static <T, A, R> Collector<T, A, R> of(Supplier<A> supplier,
BiConsumer<A, T> accumulator,
BinaryOperator<A> combiner,
Function<A, R> finisher,
Characteristics... characteristics)

of(..) is a static factory method provided for creating custom collectors with a finisher() function.

Supplier<List<Integer>> supplier = ArrayList::new;

BiConsumer<List<Integer>, Integer> accumulator 
	= (list, val) -> list.add(val);

BinaryOperator<List<Integer>> combiner = (left, right) -> {
	left.addAll(right);
	return left;
};

Function<List<Integer>, List<Integer>> finisher = Function.identity();

Characteristics characteristics = Characteristics.IDENTITY_FINISH;

Collector<Integer, List<Integer>, List<Integer>> toList = 
		Collector.of(supplier, 
					accumulator, 
					combiner, 
					finisher, 
					characteristics);

List<Integer> result = Stream.of(1, 2, 3, 4, 5).collect(toList);

Assert.assertEquals(5, result.size());
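This overload becomes more useful when the finisher is not the identity. The following sketch accumulates strings into a List<String> and uses the finisher to turn that list into a single comma-separated String, so A and R differ and IDENTITY_FINISH is deliberately not passed. The class name and sample input are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;
import java.util.stream.Stream;

class JoiningFinisherDemo {

    // Collects strings into a list, then joins them in the finisher.
    static String joinLetters() {
        Supplier<List<String>> supplier = ArrayList::new;
        BiConsumer<List<String>, String> accumulator = List::add;
        BinaryOperator<List<String>> combiner = (left, right) -> {
            left.addAll(right);
            return left;
        };
        // Non-identity finisher: List<String> -> String.
        Function<List<String>, String> finisher = list -> String.join(", ", list);

        // No characteristics: the finisher must run, so IDENTITY_FINISH is omitted.
        Collector<String, List<String>, String> joining =
                Collector.of(supplier, accumulator, combiner, finisher);

        return Stream.of("a", "b", "c").collect(joining);
    }

    public static void main(String[] args) {
        System.out.println(joinLetters()); // prints a, b, c
    }
}
```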

12. Conclusion

The design of Collector is powerful and elegant: with a handful of methods, it can do a lot. We took the example of designing an ArrayList collector. In practice there is no need to write one, because the Collectors class already provides a ready-made List collector, Collectors.toList():

List<Transaction> dataSet = Transactions.getDataSet();
List<Transaction> result = 
				dataSet
					.stream()
					.filter(txn -> txn.country() == CountryCode.US)
					.collect(Collectors.toList());

Assert.assertFalse(result.isEmpty());
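The snippet above relies on Transaction, Transactions and CountryCode types that are not shown in this article. A self-contained sketch of the same idea follows, in which the Transaction class, the CountryCode enum and the data set are all hypothetical stand-ins invented for illustration:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class TransactionFilterDemo {

    // Hypothetical stand-in for the article's CountryCode type.
    enum CountryCode { US, IN, DE }

    // Hypothetical stand-in for the article's Transaction type.
    static final class Transaction {
        private final String id;
        private final CountryCode country;

        Transaction(String id, CountryCode country) {
            this.id = id;
            this.country = country;
        }

        String id() { return id; }
        CountryCode country() { return country; }
    }

    // Hypothetical data set; the article's Transactions.getDataSet() is not shown.
    static List<Transaction> dataSet() {
        return Arrays.asList(
                new Transaction("t1", CountryCode.US),
                new Transaction("t2", CountryCode.IN),
                new Transaction("t3", CountryCode.US),
                new Transaction("t4", CountryCode.DE));
    }

    // Returns the ids of the US transactions, in encounter order.
    static List<String> usTransactionIds() {
        return dataSet().stream()
                .filter(txn -> txn.country() == CountryCode.US)
                .map(Transaction::id)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(usTransactionIds()); // prints [t1, t3]
    }
}
```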

As of Java 14 (Zulu 14.28.21), the Collectors class implements 44 collectors, all exposed through public static factory methods.

In the next article we will discuss how to collect data into collections such as List (ArrayList), Set (HashSet), Map (HashMap, ConcurrentHashMap, ConcurrentSkipListMap), and other collections (TreeSet, PriorityQueue).
