453132 (5)
P = {0: {0: [(0.3333333333333333, 0, 0.0, False),
 (0.3333333333333333, 0, 0.0, False),
 (0.3333333333333333, 4, 0.0, False)]

This confuses me a lot. The first two tuples are identical:
(0.3333333333333333, 0, 0.0, False),

Please explain the reason.

peb (3)
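It is probabilistic: if the action is LEFT (0) in state 0, then the next state could be 0 with 66% prob. and 4 with 33% prob. The two identical tuples are two separate outcomes that happen to land in the same state, so their probabilities add up.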
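To make the arithmetic concrete, here is a minimal sketch that sums the probabilities per next state. It uses only the three tuples quoted in the question; the helper name `next_state_probs` is made up for illustration, and the tuple layout (probability, next state, reward, done) is an assumption read off the quoted values.

```python
from collections import defaultdict

# Transition list for state 0, action LEFT (0), copied from the question.
# Assumed layout of each tuple: (probability, next_state, reward, done).
transitions = [
    (0.3333333333333333, 0, 0.0, False),
    (0.3333333333333333, 0, 0.0, False),
    (0.3333333333333333, 4, 0.0, False),
]


def next_state_probs(transitions):
    """Sum the probabilities of all outcomes that lead to the same next state."""
    totals = defaultdict(float)
    for prob, next_state, _reward, _done in transitions:
        totals[next_state] += prob
    return dict(totals)


probs = next_state_probs(transitions)
for state, prob in sorted(probs.items()):
    print(f"next state {state}: {prob:.2f}")
# next state 0: 0.67
# next state 4: 0.33
```

So the duplicate entries are not a mistake in the data: they are two distinct outcomes that both leave the agent in state 0.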

I admit that in the text up to this point, it is unclear whether we are talking about the env only or a combined env+model. I had assumed the env is deterministic, but looking ahead at policy suggests that the env is probabilistic. That should be made clearer.

So it seems that the icy lake means actions might have uncertain effects. One might choose to go LEFT but end up going DOWN with 33% prob.
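Here is a rough sketch of how such slipping could produce the tuples in the question. Everything in it is an assumption rather than the book's code: a 4x4 grid with states numbered row by row, the action encoding 0 = LEFT, 1 = DOWN, 2 = RIGHT, 3 = UP (only LEFT = 0 is confirmed above), and a rule that the agent moves in the intended direction or in one of the two perpendicular directions with probability 1/3 each. Holes, the goal, rewards and terminal flags are left out, and `move` and `slippery_outcomes` are hypothetical helper names.

```python
# Assumed action encoding; only LEFT = 0 is confirmed by the thread.
LEFT, DOWN, RIGHT, UP = 0, 1, 2, 3
N = 4  # assumed 4x4 grid, states numbered row by row from 0 to 15


def move(state, direction):
    """Return the state reached by moving one cell, staying put at a wall."""
    row, col = divmod(state, N)
    if direction == LEFT:
        col = max(col - 1, 0)
    elif direction == DOWN:
        row = min(row + 1, N - 1)
    elif direction == RIGHT:
        col = min(col + 1, N - 1)
    elif direction == UP:
        row = max(row - 1, 0)
    return row * N + col


def slippery_outcomes(state, action):
    """List (probability, next_state) for the intended move and both perpendicular moves."""
    # With the cyclic encoding above, action - 1 and action + 1 (mod 4)
    # are the two directions perpendicular to the intended one.
    directions = [(action - 1) % 4, action, (action + 1) % 4]
    return [(1.0 / 3.0, move(state, d)) for d in directions]


for prob, next_state in slippery_outcomes(0, LEFT):
    print(f"p={prob:.2f} -> state {next_state}")
```

Under these assumptions, choosing LEFT in state 0 can result in UP (blocked by the wall, stay in 0), LEFT (blocked by the wall, stay in 0) or DOWN (move to 4), which matches the three tuples and would also explain the many 33% arrows in the diagram.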

One would hope that the final text is updated so that the book can be read in one pass. Currently it seems to require two passes. I assume this is why feedback is sought here.

441935 (4)
I'm joining the plea of the previous readers: please update the text. In one place it says that the probability of the designated action is 66% and the remaining ones are 33% each, which is a little confusing.

Also, the diagram of the MDP for Frozen Lake could be a bit clearer. There are lots of arrows labeled 33% without a clear explanation as to why.