Pipeline #

Stream computations are specified as a pipeline, which consists of:

  1. a source (e.g. a collection or an I/O channel),
  2. a sequence of zero or more intermediate operations, each of which transforms a stream into another stream (e.g. filtering or modifying the stream’s elements),
  3. one terminal operation (e.g. collecting the elements of the transformed stream).

Intermediate operations are generally specified via callback functions (and so may the terminal operation). More precisely, the Java interface Stream provides instance methods (like filter or map) that take a callback function as argument and return a new stream, transformed accordingly.

Example #

Consider the following classes Unit, Butterfly, Caterpillar and Unicorn.
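The class definitions are not reproduced in this section. A minimal sketch that is consistent with the snippets below could look as follows. The fields color and health and the constructor Butterfly(color, health) are implied by the code in this section; the field types, the other constructors and the choice to make every class a direct subclass of Unit are assumptions.

```java
// Hypothetical definitions, inferred from how the classes are used in this section.
class Unit {
    String color;
    int health;   // assumed to be an int (the snippets only compare it to 0)

    Unit(String color, int health) {
        this.color = color;
        this.health = health;
    }
}

class Butterfly extends Unit {
    Butterfly(String color, int health) {
        super(color, health);
    }
}

class Caterpillar extends Unit {
    Caterpillar(String color, int health) {
        super(color, health);
    }
}

class Unicorn extends Unit {
    Unicorn(String color, int health) {
        super(color, health);
    }
}
```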

The pipeline below:

  • creates a stream out of a list of units,
  • retains only butterflies,
  • extracts the color of each butterfly,
  • collects these colors as a set.

import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

List<Unit> units = getUnits();

Set<String> butterflyColors = units.stream()                        // create a stream
                              .filter(u -> u instanceof Butterfly)  // retain butterflies
                              .map(b -> b.color)                    // map each butterfly to its color
                              .collect(Collectors.toSet());         // collect the colors as a set

Explanation #

This pipeline can be decomposed as follows:

Source #

The instance method Collection.stream generates a stream out of a collection (e.g. out of a List or a Set):

units.stream()

In this example, because units has type List<Unit>, the stream that is returned by units.stream() has type Stream<Unit>.
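As an illustration of this typing rule, here is a small sketch with collections of strings and integers (the class SourceDemo and its helper size are hypothetical names introduced for this example): the element type of the stream always matches the element type of the collection.

```java
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.stream.Stream;

class SourceDemo {
    // Works for any Collection<T>: stream() yields a Stream<T>.
    static <T> long size(Collection<T> collection) {
        Stream<T> stream = collection.stream();
        return stream.count();   // count() is a terminal operation
    }

    public static void main(String[] args) {
        List<String> colors = List.of("blue", "red", "blue");
        Set<Integer> healths = Set.of(1, 2);

        Stream<String> colorStream = colors.stream();     // List<String> gives Stream<String>
        Stream<Integer> healthStream = healths.stream();  // Set<Integer> gives Stream<Integer>

        System.out.println(colorStream.count());   // 3
        System.out.println(healthStream.count());  // 2
    }
}
```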

Intermediate operations #

Filter #

.filter(u -> u instanceof Butterfly)

The instance method Stream.filter retains certain elements of the stream.

Let us assume that the stream has type Stream<$\mathit{T}$> (for instance, in our example, $\mathit{T}$ is Unit).

The method filter takes as argument a callback function of type

$\qquad \mathit{T} \to$ Boolean

(equivalently, in Java’s terminology, the callback function must implement the built-in functional interface Predicate<$\mathit{T}$>).

In this example, the callback function is

u -> u instanceof Butterfly

which has type

$\qquad$ Unit $\to$ Boolean

The method filter returns a Stream<$\mathit{T}$> (e.g. in this example a Stream<Unit>) that consists of the elements for which the callback function evaluates to true (in this case, it retains only butterflies).
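To make the role of Predicate explicit, the sketch below stores the callback in a variable before passing it to filter. It uses strings rather than units so that it is self-contained; the class FilterDemo and the method retain are hypothetical names.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class FilterDemo {
    // Keeps the elements of `words` that are accepted by `keep`.
    static List<String> retain(List<String> words, Predicate<String> keep) {
        return words.stream()
                .filter(keep)                    // Stream<String> in, Stream<String> out
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // This lambda maps a String to a boolean, so it implements Predicate<String>.
        Predicate<String> startsWithB = w -> w.startsWith("b");

        System.out.println(retain(List.of("blue", "red", "black"), startsWithB)); // [blue, black]
    }
}
```

Note that filter never changes the type of the stream: it only removes elements.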

Map #

.map(b -> b.color)

The instance method Stream.map applies a function to each element of the stream.

Let us assume that the stream has type Stream<$\mathit{T}$> (for instance, in our example, $\mathit{T}$ is Unit).

The method map takes as argument a callback function of type

$\qquad T \to T'$

where $T'$ can be any type (equivalently, in Java’s terminology, the callback function must implement the built-in functional interface Function<$T$,$T'$>, seen earlier).

In this example, the callback function is

b -> b.color

which has type

$\qquad$ Unit $\to$ String

Let us name this callback function $f$.

The method map returns a Stream<$T'$> (e.g. in this example a Stream<String>) that consists of all objects $f(a)$ such that $a$ belongs to the original stream.
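The change of element type can be made visible by naming the callback. The following sketch (with the hypothetical class MapDemo and method lengths) maps each string to its length, so a Stream<String> becomes a Stream<Integer>.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

class MapDemo {
    static List<Integer> lengths(List<String> words) {
        // f has type String -> Integer, i.e. it implements Function<String, Integer>.
        Function<String, Integer> f = String::length;

        return words.stream()                    // Stream<String>
                .map(f)                          // Stream<Integer>
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(lengths(List.of("blue", "red"))); // [4, 3]
    }
}
```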

Terminal operation #

.collect(Collectors.toSet());

The instance method Stream.collect takes as argument a so-called Collector, which is in charge of collecting the elements of the stream into a Collection (e.g. List or Set), or a Map, a String, etc.

In this example, we call the static method Collectors.toSet, which returns a collector that produces a set. Because the stream has type Stream<String>, the instruction .collect(Collectors.toSet()) returns a set with type Set<String>.
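For comparison, the sketch below collects the same stream of strings with a few other collectors from the standard class Collectors. The class CollectDemo, its constant COLORS and its method names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

class CollectDemo {
    static final List<String> COLORS = List.of("blue", "red", "blue");

    // Keeps duplicates and encounter order.
    static List<String> asList() {
        return COLORS.stream().collect(Collectors.toList());
    }

    // Removes duplicates; no ordering guarantee.
    static Set<String> asSet() {
        return COLORS.stream().collect(Collectors.toSet());
    }

    // Concatenates the elements, separated by ", ".
    static String asString() {
        return COLORS.stream().collect(Collectors.joining(", "));
    }

    // Maps each color to its number of occurrences.
    static Map<String, Long> asCounts() {
        return COLORS.stream()
                .collect(Collectors.groupingBy(c -> c, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(asList());    // [blue, red, blue]
        System.out.println(asString());  // blue, red, blue
        System.out.println(asSet());
        System.out.println(asCounts());
    }
}
```

In each case, the return type of collect is determined by the collector together with the element type of the stream.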

Reading a pipeline #

What do the following methods compute?

Set<Unit> method1(Set<Unit> set1, Set<Unit> set2) {
    return set1.stream()
            .filter(u -> set2.contains(u))
            .collect(Collectors.toSet());
}

Set<Unit> method2(Set<Unit> set1, Set<Unit> set2) {
    return set1.stream()
            .filter(set2::contains)
            .collect(Collectors.toSet());
}

Both methods compute the intersection of set1 and set2.
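This can be checked on a small input. The sketch below is a generic variant of method2 (the class IntersectionDemo and the method intersection are hypothetical names), applied to sets of integers so that it does not depend on the class Unit.

```java
import java.util.Set;
import java.util.stream.Collectors;

class IntersectionDemo {
    // Same pipeline as method2, for an arbitrary element type T.
    static <T> Set<T> intersection(Set<T> set1, Set<T> set2) {
        return set1.stream()
                .filter(set2::contains)          // keep the elements of set1 that are also in set2
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        Set<Integer> common = intersection(Set.of(1, 2, 3), Set.of(2, 3, 4));
        System.out.println(common.equals(Set.of(2, 3))); // true
    }
}
```

The method reference set2::contains and the lambda u -> set2.contains(u) denote the same predicate, which is why method1 and method2 behave identically.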

What do the methods method1 to method4 below compute?

Stream<Unit> method1(List<Unit> units) {
    return units.stream()
            .filter(u -> u.health > 0);
}

List<Unit> method2(Stream<Unit> stream) {
    return stream.map(u -> transform(u))
            .toList();   // collects the stream into an unmodifiable list (Java 16+)
}

Unit transform(Unit u) {
    if (u instanceof Caterpillar) {
        return new Butterfly(u.color, u.health);
    }
    return u;
}

List<Unit> method3(List<Unit> units) {
    return method2(method1(units));
}

List<Unit> method4(List<Unit> units) {
    return units.stream()
            .filter(u -> u.health > 0)
            .map(u -> u instanceof Caterpillar ?
                    new Butterfly(u.color, u.health) :
                    u
            )
            .toList();
}
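One way to compare these methods is to run them on a small input. The sketch below is only an illustration under assumptions: the nested classes Unit, Butterfly and Caterpillar are hypothetical minimal versions, the methods are made static, Collectors.toList() is used in place of toList() so that the code also compiles on older Java versions, and the helper describe is introduced only to print the results.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class PipelineCheck {
    // Hypothetical minimal classes (the actual definitions may differ).
    static class Unit {
        final String color;
        final int health;
        Unit(String color, int health) { this.color = color; this.health = health; }
    }
    static class Butterfly extends Unit {
        Butterfly(String color, int health) { super(color, health); }
    }
    static class Caterpillar extends Unit {
        Caterpillar(String color, int health) { super(color, health); }
    }

    // Keeps the units whose health is positive.
    static Stream<Unit> method1(List<Unit> units) {
        return units.stream().filter(u -> u.health > 0);
    }

    // Replaces a caterpillar by a butterfly with the same color and health.
    static Unit transform(Unit u) {
        return u instanceof Caterpillar ? new Butterfly(u.color, u.health) : u;
    }

    static List<Unit> method2(Stream<Unit> stream) {
        return stream.map(PipelineCheck::transform).collect(Collectors.toList());
    }

    // Composition of the two previous methods.
    static List<Unit> method3(List<Unit> units) {
        return method2(method1(units));
    }

    // The same two steps written as a single pipeline.
    static List<Unit> method4(List<Unit> units) {
        return units.stream()
                .filter(u -> u.health > 0)
                .map(u -> u instanceof Caterpillar ? new Butterfly(u.color, u.health) : u)
                .collect(Collectors.toList());
    }

    // Hypothetical helper: renders each unit as "ClassName(color, health)".
    static List<String> describe(List<Unit> units) {
        return units.stream()
                .map(u -> u.getClass().getSimpleName() + "(" + u.color + ", " + u.health + ")")
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Unit> units = List.of(
                new Caterpillar("green", 2),
                new Caterpillar("brown", 0),
                new Butterfly("blue", 1));

        System.out.println(describe(method3(units))); // [Butterfly(green, 2), Butterfly(blue, 1)]
        System.out.println(describe(method4(units))); // [Butterfly(green, 2), Butterfly(blue, 1)]
    }
}
```

On this input, method3 and method4 produce the same list: units with no health left are dropped, and each remaining caterpillar is replaced by a butterfly.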