Shay8
Hi,

Although I haven't finished reading the book (I've just finished chapter 6), I've been enjoying it so far. I've found it easy and fun to read, and it provides thorough and clear explanations.

I would, however, like to point out an error in the book in the way Collector.Characteristics.CONCURRENT is explained and in the way it is used.

On page 149, section 6.5.2, it says:

“CONCURRENT —The accumulator function can be called concurrently from multiple threads, and then this collector can perform a parallel reduction of the stream. If the collector isn’t also flagged as UNORDERED , it can perform a parallel reduction only when it’s applied to an unordered data source.”

Later on the same page it says:

“Finally, it’s CONCURRENT, but following what we just said, the stream will be processed in parallel only if its underlying data source is unordered.”

But according to Oracle's stream package API documentation, “concurrent” is not a synonym for “parallel”; in this context it refers specifically to “concurrent reduction”.

Below is the API documentation regarding Collector.Characteristics.CONCURRENT:

“Indicates that this collector is concurrent, meaning that the result container can support the accumulator function being called concurrently with the same result container from multiple threads.
If a CONCURRENT collector is not also UNORDERED, then it should only be evaluated concurrently if applied to an unordered data source.”

From what I understand, a Collector with this characteristic tells its caller, the Stream.collect(Collector) implementation, that for a parallel stream it need not create a separate result container confined to each thread and merge them afterwards; it can instead use one shared result container for the whole reduction. Declaring this characteristic is therefore a promise that the container returned by supplier() can safely be modified concurrently (usually a class from the java.util.concurrent package).
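To illustrate the point about shared containers, here is a minimal sketch using the JDK's own Collectors.groupingByConcurrent, which returns a ConcurrentMap and reports the CONCURRENT characteristic. The class name GroupingByConcurrentDemo and its methods are made up for this example and do not come from the book.

```java
import java.util.List;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical demo class; only the java.util.stream calls are real JDK API.
class GroupingByConcurrentDemo {

    // A collector from the JDK that is designed for concurrent reduction:
    // its result container is a ConcurrentMap.
    static Collector<Integer, ?, ConcurrentMap<Boolean, List<Integer>>> byParity() {
        return Collectors.groupingByConcurrent(n -> n % 2 == 0);
    }

    // Groups 0..upperBound-1 into even (true) and odd (false) numbers
    // using a parallel stream.
    static ConcurrentMap<Boolean, List<Integer>> groupByParity(int upperBound) {
        return IntStream.range(0, upperBound)
                .boxed()
                .parallel()
                .collect(byParity());
    }

    public static void main(String[] args) {
        System.out.println(byParity().characteristics());
        System.out.println(groupByParity(10));
    }
}
```

Because this collector is also UNORDERED, the order of the elements inside each list is not guaranteed on a parallel stream; only the contents are.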

Following this, the ToListCollector example in the book looks broken: its characteristics() implementation returns CONCURRENT, while the instance returned by supplier() is a java.util.ArrayList, which is not thread-safe. If the stream is parallel and its source is unordered, a concurrent reduction will be performed:
“The Stream.collect(Collector) implementation will only perform a concurrent reduction if
- The stream is parallel;
- The collector has the Collector.Characteristics.CONCURRENT characteristic, and;
- Either the stream is unordered, or the collector has the Collector.Characteristics.UNORDERED characteristic.”
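The conditions quoted above can be observed with a small experiment that counts how often collect() asks the supplier for a result container. This is only a sketch: ConcurrentReductionDemo and containersCreated are names I made up, and the single supplier call in the concurrent case is what I would expect from the OpenJDK implementation rather than something the documentation spells out as a number.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collector;
import java.util.stream.Collector.Characteristics;
import java.util.stream.IntStream;
import java.util.stream.Stream;

// Hypothetical demo class, not from the book or the JDK.
class ConcurrentReductionDemo {

    // Returns how many result containers were requested from the supplier
    // while collecting 10,000 elements with the given characteristics.
    static int containersCreated(boolean parallel, Characteristics... characteristics) {
        AtomicInteger supplierCalls = new AtomicInteger();
        Collector<Integer, Queue<Integer>, Queue<Integer>> collector = Collector.of(
                () -> {
                    supplierCalls.incrementAndGet();
                    // Thread-safe container, so declaring CONCURRENT is legitimate here.
                    return new ConcurrentLinkedQueue<Integer>();
                },
                Queue::add,
                (left, right) -> { left.addAll(right); return left; },
                characteristics);

        // IntStream.range is an ordered source.
        Stream<Integer> stream = IntStream.range(0, 10_000).boxed();
        if (parallel) {
            stream = stream.parallel();
        }
        stream.collect(collector);
        return supplierCalls.get();
    }

    public static void main(String[] args) {
        // Parallel + CONCURRENT + UNORDERED: all three conditions hold.
        System.out.println(containersCreated(true,
                Characteristics.CONCURRENT, Characteristics.UNORDERED));
        // Parallel + CONCURRENT only, on an ordered source: no concurrent reduction.
        System.out.println(containersCreated(true, Characteristics.CONCURRENT));
        // Sequential stream: the characteristics do not matter.
        System.out.println(containersCreated(false, Characteristics.CONCURRENT));
    }
}
```

In the second case the stream is still processed in parallel; it just falls back to separate containers that are merged with the combiner, so the count will usually be greater than one.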

The statement I quoted above from the book, “Finally, it’s CONCURRENT, but following what we just said, the stream will be processed in parallel only if its underlying data source is unordered.”, is not correct: a stream is processed in parallel whenever it is a parallel stream, not only when the source or the collector is unordered. The stream package documentation says so explicitly in its discussion of ordering: “However, most stream pipelines, such as the 'sum of weight of blocks' example above, still parallelize efficiently even under ordering constraints”.
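One possible fix, as far as I can tell, is simply to stop declaring CONCURRENT while keeping the ArrayList. The sketch below is my own variant, not the book's code: the class name NonConcurrentToListCollector and the helper collectInParallel are hypothetical. On a parallel ordered stream each task then gets its own list, and the combiner merges them in encounter order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;
import java.util.stream.IntStream;

// Hypothetical variant of a to-list collector that does NOT claim CONCURRENT,
// because ArrayList is not safe for concurrent modification.
class NonConcurrentToListCollector<T> implements Collector<T, List<T>, List<T>> {

    @Override
    public Supplier<List<T>> supplier() {
        return ArrayList::new;          // one list per parallel task
    }

    @Override
    public BiConsumer<List<T>, T> accumulator() {
        return List::add;               // only ever called on a thread-confined list
    }

    @Override
    public BinaryOperator<List<T>> combiner() {
        return (left, right) -> {       // merges partial results
            left.addAll(right);
            return left;
        };
    }

    @Override
    public Function<List<T>, List<T>> finisher() {
        return Function.identity();
    }

    @Override
    public Set<Characteristics> characteristics() {
        // IDENTITY_FINISH only: neither CONCURRENT nor UNORDERED.
        return Collections.unmodifiableSet(EnumSet.of(Characteristics.IDENTITY_FINISH));
    }

    // Collects 0..n-1 from a parallel, ordered stream.
    static List<Integer> collectInParallel(int n) {
        return IntStream.range(0, n)
                .boxed()
                .parallel()
                .collect(new NonConcurrentToListCollector<Integer>());
    }
}
```

The stream is still parallel here; the collector just takes part in an ordinary (non-concurrent) parallel reduction, which is the distinction the API documentation draws.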

See:
https://docs.oracle.com/javase/8/docs/api/java/util/stream/package-summary.html
https://docs.oracle.com/javase/8/docs/api/java/util/stream/Collector.Characteristics.html
https://docs.oracle.com/javase/8/docs/api/java/util/stream/Stream.html

Thanks,
Shay