pminten (16)
#1
The error handling chapter contains a lot of really good information, but I'm missing a bit of background on how to think about error handling. Error handling, especially at distributed-systems scale, requires a different mindset. You describe the let-it-fail thinking, but fault tolerance goes beyond that, and I don't feel the book gives me a sense of what fault tolerance looks like.

For example, there's no discussion of graceful degradation, but that's a key aspect of fault tolerance (together with self-repair). I'm also missing a mindset of "errors are not exceptional": errors are not only something that can occur, they are something that will occur often. If you have a distributed system of 100 nodes, there will be plenty of broken hard drives.

Let-it-fail / intentional programming does not mean having to worry less about errors; it means thinking about errors in a different way. Instead of focussing on error handling, the focus shifts to recovery.

I guess what I'm trying to say is that I miss a bit of background on the kind of thinking that leads to Erlang's error handling mechanisms and that helps in using them optimally. There are bits and pieces of this thinking spread around the chapter, but I'm missing them put together in a "big picture".
sjuric (86)
#2
Re: Fault-tolerancy background
Are you referring to chapter 7, or both chapters 7 and 8?

Chapter 8 treats fault tolerance in more detail, though I admit I didn't state as explicitly as you nicely put it that we should focus on recovery. A paragraph or two should probably be added there.
pminten (16)
#3
Re: Fault-tolerancy background
More chapter 8, I guess. Chapter 7 is more of a technical-basics chapter on error handling, while chapter 8 shows how to use supervisors in practice.
sjuric (86)
#4
Re: Fault-tolerancy background
OK, thanks for pointing it out. I'll consider adding some background.