501466
#1
Having read chapters 4 and 5, I have a number of remarks. Not all of them are meant as suggestions for the author to change the text - if one were to include all the aspects of all the features discussed in the chapters, the text would most probably become unreadable.


General remark:

Be a bit more sparing with the exclamation marks.


Page 92, top: "comma separated variable" should be "comma separated value"


Page 96: Listing 4.4 - perhaps add "integer, parameter :: im = ..."

Strictly speaking such arrays are not static arrays. They have a
statically defined size, but unless they have the implicit or explicit
SAVE attribute, they will automatically appear and disappear with the
entering and leaving of the subprogram/scope (BLOCK!) that contains
them.
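
A minimal sketch of my own (not from the book) of what I mean by appearing and disappearing with the scope, here with a BLOCK construct:

program block_scope
  implicit none
  block
    real :: x(100)   ! statically sized, but the array exists only inside this block
    x = 1.0
    print *, sum(x)
  end block
  ! x no longer exists at this point
end program block_scope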


Page 99: The remark on implicit save

You may want to emphasize that this is a fundamental difference from a language
like C, where the syntactically identical construct is just syntactic sugar,
a shortcut for an assignment that happens on every call.
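
To illustrate the difference (a sketch of my own, not taken from the book): the counter below keeps its value between calls, because in Fortran the initialization in the declaration implies the SAVE attribute.

program demo_implicit_save
  implicit none
  call count_calls()
  call count_calls()
contains
  subroutine count_calls()
    integer :: ncalls = 0   ! initialization in the declaration implies SAVE
    ncalls = ncalls + 1
    print *, 'count_calls has been called', ncalls, 'times'
  end subroutine count_calls
end program demo_implicit_save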


Page 100: recommendation

While I would indeed avoid SAVEd variables as much as possible, I would
recommend that a programmer who does need them use the SAVE attribute
explicitly, as in the sketch below. That message is, I think, contained
in the text, but in a rather obscure way.
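
The same counter as in the sketch above, with the attribute spelled out so that the persistence is visible to the reader:

subroutine count_calls()
  implicit none
  integer, save :: ncalls = 0   ! explicit SAVE: the persistence is clearly intentional
  ncalls = ncalls + 1
  print *, 'count_calls has been called', ncalls, 'times'
end subroutine count_calls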

&#pi should become the regular Greek letter π, of course (two occurrences
on this page).


Page 105: deallocate an uninitialized array

One possibility might be that you learn that the array has the wrong size
before you actually come to use it. It is a trifle contrived, I admit.
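
A sketch of the kind of situation I have in mind (my own example, the numbers are made up):

program resize_before_use
  implicit none
  real, allocatable :: buffer(:)
  integer :: n
  allocate(buffer(100))   ! provisional size, contents never set
  n = 250                 ! in reality this size would come from input or a file header
  if (n /= size(buffer)) then
    deallocate(buffer)    ! deallocated before it was ever used
    allocate(buffer(n))
  end if
  print *, size(buffer)
end program resize_before_use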


Page 106: frequent reallocation

Reallocating a large array may not be all that time-consuming, but I can
imagine the copying of the contents is - and then there is the matter of
memory fragmentation. Most of the time this is completely outside the control
of an ordinary programmer, but it might still be of concern.
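
A minimal sketch (my own, not from the book) of the pattern I am thinking of: growing an allocatable array means allocating new storage and copying the old contents, and it is that copy which becomes expensive for large arrays.

program grow_array
  implicit none
  real, allocatable :: a(:), tmp(:)
  allocate(a(1000))
  a = 1.0
  allocate(tmp(2 * size(a)))   ! new, larger storage
  tmp(1:size(a)) = a           ! the copy is the costly part for large arrays
  call move_alloc(tmp, a)      ! a now refers to the larger storage, tmp is deallocated
  print *, size(a)
end program grow_array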


Page 108: Style Tip

One advantage of NOT using stat and errmsg is that if things go wrong
the runtime library will print an error message and you do not have to
program anything yourself. What the tip is lacking is that you need to
handle the error as well, not just put the keywords in.


Page 109: Listing 4.22

Perhaps, as a matter of style in line with the tip on page 108: use "status = 'old'" and
handle the error accordingly?
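
A sketch of what I mean (my own example; the file name is made up):

program open_existing
  implicit none
  integer :: stat
  open(unit=10, file='input.csv', status='old', iostat=stat)   ! 'old': the file must already exist
  if (stat /= 0) then
    print *, 'Could not open input.csv - does the file exist?'
    stop 1
  end if
  close(10)
end program open_existing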

Page 109: first paragraph

I understand that you skip a detailed discussion of I/O, but it is an
important aspect in general. And with parallel programming there is yet
another concern: race conditions.

Actually, I recently ran into some complications with reading CSV files.
I use list-directed input, but there are some variations between
compilers of which I was not aware, and the CSV files themselves were not
entirely what I expected. Well, there are variations in the way CSV
files are formatted, and a poor programmer like me has in general very
little control over that.
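
For the record, this is the kind of list-directed reading I mean (a minimal sketch of my own, reading from an internal file instead of an actual CSV file):

program read_csv_line
  implicit none
  character(len=*), parameter :: line = '1.0, 2.0, 3.0'
  real :: a, b, c
  read(line, *) a, b, c   ! list-directed input: commas and/or blanks act as separators
  print *, a, b, c
end program read_csv_line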


Page 129: names like Github and git

Maybe it is a good idea to use italics for such names? Then they stand
out as not being ordinary words, especially "git".


Page 141: synchronisation

Such synchronisation means that ALL images should perform the
allocation/deallocation. A code fragment like:

if ( this_image() /= 1 ) then
   allocate( work(1000)[*] )   ! Not on the main image - it does not
                               ! need a work array, so do not waste
                               ! memory
endif

would cause a deadlock, right?


Page 143: figure 5.8

The loop over the images has me puzzled - is it really useful? After
all, the loop is already run on all images.


Page 144: last full paragraph

"Hopefully, we should be able" -> "Hopefully, we will be able"


Page 148: code fragment

The calculation of the tile size is correct only if the size "im" of the
problem is a multiple of the number of images. That implicit assumption
makes the code a lot easier, but I recommend that it be made explicit in the
text. (In my experience this type of condition seldom or never holds.)
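
For illustration, a sketch (my own, not the book's code) of how the remainder could be distributed when "im" is not a multiple of the number of images:

program tile_sizes
  implicit none
  integer :: im, tile_size, remainder
  im = 100                          ! problem size, chosen arbitrarily here
  tile_size = im / num_images()
  remainder = mod(im, num_images())
  if (this_image() <= remainder) tile_size = tile_size + 1   ! spread the leftover points
  print *, 'image', this_image(), 'tile size', tile_size
end program tile_sizes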


Page 149: Figure 5.10 should be Figure 5.11


Page 150: Figure 5.11

Just a remark: the implementation of the serial algorithm now makes it
necessary to do multiple synchronisation steps. An alternative could
have been to calculate the entire new state (vector u and h) on the
basis of the old state. Then you would need only a single
synchronisation step.

Of course, that is a significant change with respect to the numerical method.
I am not sure of the consequences.


Page 152: "produces bit-for-bit the same results"

This is in general not guaranteed. The underlying problem, of course, is
that floating-point arithmetic is not mathematically exact and actually
lacks strict associativity. Alas, it is the cause of many a heated
discussion.
Milan Curcic
#2
Hi Arjen,

Thank you very much for your detailed suggestions. My comments below:


Page 92, top: "comma separated variable" should be "comma separated value"
Page 144: last full paragraph "Hopefully, we should be able" -> "Hopefully, we will be able"
Page 149: Figure 5.10 should be Figure 5.11


Fixed, thank you!


Strictly speaking such arrays are not static arrays. They have a
statically defined size, but unless they have the implicit or explicit
SAVE attribute, they will automatically appear and disappear with the
entering and leaving of the subprogram/scope (BLOCK!) that contains
them.


By static I mean that once declared, the array won't change in size until it is destroyed (out of scope).
They will also be allocated on the stack by most compilers that I know of, unless instructed otherwise,
though this is not dictated by the standard, so I try to avoid the discussion of stack vs. heap in the book.
If the compiler can determine the size at compile-time (for example, `integer, parameter :: im = 100; real :: x(im)`),
it will be a static array by all means (please let me know if I'm wrong!). There are cases in which this is not
true, for example determining `im` at run-time (user input) but passing it to a subroutine that declares a "static" array
of size `im`. I think covering these situations may be too much at this point in the book, but I may revisit this
at a later time if an example calls for it. Thank you for pointing this out, it's an important nuance.
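
The run-time case mentioned above is what the standard calls an automatic array; a minimal sketch (my own, not from the book):

program automatic_size
  implicit none
  integer :: n
  read *, n   ! size known only at run time
  call work(n)
contains
  subroutine work(n)
    integer, intent(in) :: n
    real :: x(n)   ! automatic array: its size is fixed on entry to the subroutine
    x = 0.0
    print *, size(x)
  end subroutine work
end program automatic_size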


Page 105: deallocate an uninitialized array

One possibility might be that you learn that the array has the wrong size
before you actually come to use it. It is a trifle contrived, I admit.


Good point, I added that as an example.


Page 108: Style Tip

One advantage of NOT using stat and errmsg is that if things go wrong
the runtime library will print an error message and you do not have to
program anything yourself. What the tip is lacking is that you need to
handle the error as well, not just put the keywords in.


Here I argue that if you care about having this kind of control, you should use these. Letting the runtime library deal with it is the easiest approach, but it gives no control. I did, however, miss the point that just including the keywords is not enough. I have clarified this in the Style Tip:

.Style Tip
****
If you want control over what happens if (de)allocation fails,
use `stat` and `errmsg` in your `allocate` and `deallocate`
statements to catch any errors that may come up. Of course, you'll
still need to tell the program what to do if an error occurs,
for example, stop the program with a custom message, print a warning
message and continue running, or try to recover in some other way.
****
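
For example, a minimal sketch of the kind of handling the tip asks for (my own illustration, not a listing from the book):

program alloc_with_check
  implicit none
  real, allocatable :: a(:)
  integer :: stat
  character(len=128) :: msg
  allocate(a(1000000), stat=stat, errmsg=msg)
  if (stat /= 0) then
    print *, 'Allocation failed: ', trim(msg)   ! decide here: stop, warn and continue, or recover
    stop 1
  end if
  print *, 'Allocated', size(a), 'elements'
end program alloc_with_check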


Page 109: Listing 4.22

Perhaps, as a matter of style in line with the tip on page 108: use "status = 'old'" and
handle the error accordingly?


It's the style I like, but here I chose to be less explicit for simplicity and to defer the explanation until later. We'll go over these in detail in Chapter 8.


Page 129: names like Github and git

Maybe it is a good idea to use italics for such names? Then they stand
out as not being ordinary words, especially "git".


I will check with my editors how they prefer to do this. Git and Github may be common nouns nowadays.



Page 141: synchronisation

Such synchronisation means that ALL images should perform the
allocation/deallocation. A code fragment like:

if ( this_image() /= 1 ) then
   allocate( work(1000)[*] )   ! Not on the main image - it does not
                               ! need a work array, so do not waste
                               ! memory
endif

would cause a deadlock, right?


Yes, it will. A short program to try it:

$ cat sync_on_alloc.f90 
program test
  implicit none
  integer, allocatable :: work(:)[:]
  if (this_image() == 1) allocate( work(1000)[*])
  print *, this_image(), allocated(work)
end program test
$ caf sync_on_alloc.f90 
$ cafrun -n 4 ./a.out 
           2 F
           3 F
           4 F



Images 2, 3, and 4 do not enter the if-block and don't allocate the coarray. Image 1 enters the if-block and hangs, because the allocation of a coarray requires all images to participate. The program hangs indefinitely.


Page 143: figure 5.8

The loop over the images has me puzzled - is it really useful? After
all, the loop is already run on all images.


Now that you say this, this is confusing to me as well! There is no significance to the loop going over the images. I will change it to loop over something less meaningful.


Page 148: code fragment

The calculation of the tile size is correct only if the size "im" of the
problem is a multiple of the number of images. That implicit assumption
makes the code a lot easier, but I recommend that it be made explicit in the
text. (In my experience this type of condition seldom or never holds.)


You're right; I will make this clearer and more explicit in the text.


Page 150: Figure 5.11

Just a remark: the implementation of the serial algorithm now makes it
necessary to do multiple synchronisation steps. An alternative could
have been to calculate the entire new state (vector u and h) on the
basis of the old state. Then you would need only a single
synchronisation step.

Of course, that is a significant change with respect to the numerical method.
I am not sure of the consequences.


Good catch, and indeed this is a specific numerical scheme (forward-backward: forward for momentum, backward for mass)
that does have a higher CFL limit in the stability analysis of the linearized system. That is not why I picked it, though;
I picked it to be able to reuse the arrays for u and h rather than keep separate arrays for the old and new state.
While the amount of code is not significantly different, I try to trim as much as possible to maintain simplicity and keep the focus on the Fortran aspects of the problem.


Page 152: "produces bit-for-bit the same results"

This is in general not guaranteed. The underlying problem, of course, is
that floating-point arithmetic is not mathematically exact and actually
lacks strict associativity. Alas, it is the cause of many a heated
discussion.


This is an important point for a book on parallel programming, and I'm sure I will address it a bit later.
I agree that this is not guaranteed in general, although it is true in this case. Operations like collective sum
that don't guarantee an order of operations would definitely break bit-for-bit reproducibility.
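
For instance (a minimal sketch, not from the book), the `co_sum` collective does not specify the order in which the partial contributions from the images are combined, so the result may differ in the last bits between runs or between image counts:

program collective_sum
  implicit none
  real :: x
  x = 1.0 / real(this_image())     ! some per-image value
  call co_sum(x, result_image=1)   ! the order of the partial sums is unspecified
  if (this_image() == 1) print *, 'sum =', x
end program collective_sum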

Cheers!
milan
501466
#3
It is easy to come up with all manner of details. It is much less easy to include only those details that really matter. After all:
In der Beschränkung zeigt sich der Meister - it is in limitation that the master shows himself.