Campbell Ritchie (59)
version4, page 155.
I found your example very interesting, showing how different attempts at parallelisation work or don't work. I didn't use the Maven plugin but wrote a method that runs the different methods for summing 1...10000000. I had difficulty getting any of the Streams to run faster than the loop. Maybe that varies from machine to machine.
What intrigues me is that your harness automatically calls System.gc(), so I tried that and it took 130 ms for the loop (each technique run 20×) and something around 7 s for iterate. But I had System.gc() amongst those 20×; without garbage collection it was slightly faster, an effect more pronounced when I tried the version without gc() first.
Is this to do with garbage collection? Is GC necessary at all? Are we actually hampering execution with GC? Are the Integer objects created on the heap, or does escape analysis allow them to be created on the stack, so the memory is repeatedly overwritten and there is nothing for the GC to do?
I tried the accumulators you suggested. A Stream<Integer> with iterate and parallel never finished in less than 1½ minutes. And the overhead from putting a ReentrantLock onto my accumulator was also pretty heavy.
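To show the kind of lock-guarded accumulator I mean, here is a minimal sketch. It is not the book's accumulator class: `LockedAccumulatorDemo`, `LockedAccumulator` and `lockedParallelSum` are hypothetical names made up for illustration, and I have used a parallel `LongStream.rangeClosed` rather than `iterate` to keep it short. Every `add` call has to take the lock, so the worker threads mostly queue behind each other, which is presumably where the overhead comes from.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.stream.LongStream;

public class LockedAccumulatorDemo {

    // Hypothetical accumulator: a shared mutable total guarded by a ReentrantLock.
    static final class LockedAccumulator {
        private final ReentrantLock lock = new ReentrantLock();
        private long total = 0;

        void add(long value) {
            lock.lock();
            try {
                total += value;
            } finally {
                lock.unlock();
            }
        }

        long total() {
            lock.lock();
            try {
                return total;
            } finally {
                lock.unlock();
            }
        }
    }

    // Sums 1..n by side effect from a parallel stream; correct, but every add contends for the lock.
    static long lockedParallelSum(long n) {
        LockedAccumulator accumulator = new LockedAccumulator();
        LongStream.rangeClosed(1, n).parallel().forEach(accumulator::add);
        return accumulator.total();
    }

    public static void main(String[] args) {
        System.out.println(lockedParallelSum(10_000_000L));
    }
}
```

The lock makes the result correct, but the additions are effectively serialised, so I would not expect this to beat a plain reduction such as `LongStream.rangeClosed(1, n).parallel().sum()`.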
A loop took 131 ms to add to 50000005000000 20×
Sequential iterate took 3972 ms to add to 50000005000000 20× without gc()
Sequential iterate took 7796 ms to add to 50000005000000 20× . . .
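For anybody who wants to try the same comparison, here is a minimal sketch of this sort of timing method. It is not the book's harness and not my exact code: the names `SumTiming`, `loopSum`, `iterateSum`, `rangeSum` and `timeMillis` are hypothetical, and I have assumed the iterate version is a boxed `Stream<Long>`. Note that when `gcBetweenRuns` is true the time spent inside `System.gc()` is included in the measurement, which would by itself make that variant look slower.

```java
import java.util.function.LongSupplier;
import java.util.stream.LongStream;
import java.util.stream.Stream;

public class SumTiming {

    // Plain loop over primitives: no objects are created.
    static long loopSum(long n) {
        long sum = 0;
        for (long i = 1; i <= n; i++) {
            sum += i;
        }
        return sum;
    }

    // Stream.iterate produces a Stream<Long>, so each element is a boxed Long.
    static long iterateSum(long n) {
        return Stream.iterate(1L, i -> i + 1).limit(n).reduce(0L, Long::sum);
    }

    // Primitive stream for comparison: no boxing.
    static long rangeSum(long n) {
        return LongStream.rangeClosed(1, n).sum();
    }

    // Runs the task `runs` times and returns the elapsed wall-clock time in ms.
    // Any System.gc() calls happen inside the timed region.
    static long timeMillis(LongSupplier task, long expected, int runs, boolean gcBetweenRuns) {
        long start = System.nanoTime();
        for (int r = 0; r < runs; r++) {
            if (gcBetweenRuns) {
                System.gc();
            }
            long result = task.getAsLong();
            if (result != expected) { // also stops the JIT from discarding the work
                throw new IllegalStateException("Unexpected sum: " + result);
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        long n = 10_000_000L;
        long expected = n * (n + 1) / 2;
        int runs = 20;
        System.out.printf("Loop: %d ms%n", timeMillis(() -> loopSum(n), expected, runs, false));
        System.out.printf("Iterate without gc(): %d ms%n", timeMillis(() -> iterateSum(n), expected, runs, false));
        System.out.printf("Iterate with gc(): %d ms%n", timeMillis(() -> iterateSum(n), expected, runs, true));
        System.out.printf("rangeClosed: %d ms%n", timeMillis(() -> rangeSum(n), expected, runs, false));
    }
}
```

This is a crude stopwatch with no warm-up, so the order in which the variants run can change the figures, as I found. A proper benchmark framework such as JMH would be more reliable.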

Campbell Ritchie (59)
Did I say Integer when I meant Long?