
453132 (5)
#1
P = {0: {0: [(0.3333333333333333, 0, 0.0, False),
             (0.3333333333333333, 0, 0.0, False),
             (0.3333333333333333, 4, 0.0, False)],
         ...}}


This is confusing me a lot. The first two tuples are identical:

(0.3333333333333333, 0, 0.0, False)

Please explain the reason.

peb (3)
#2
It is probabilistic: if the action is LEFT (0) in state 0, then the next state could be 0 with 66% probability and 4 with 33% probability.
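To see where that 66% comes from, you can sum the duplicate entries in the list. This is just a sketch over the tuple layout shown in the book's output, `(prob, next_state, reward, done)`:

```python
# Collapse the duplicate tuples in P[0][0] into an effective
# next-state distribution. Tuple layout: (prob, next_state, reward, done).
from collections import defaultdict

P_0_left = [(1/3, 0, 0.0, False),
            (1/3, 0, 0.0, False),
            (1/3, 4, 0.0, False)]

effective = defaultdict(float)
for prob, next_state, reward, done in P_0_left:
    effective[next_state] += prob

print(dict(effective))  # roughly {0: 0.666..., 4: 0.333...}
```

The two identical `(1/3, 0, ...)` entries add up, which is exactly the "0 with 66% prob." reading.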

I admit that in the text up to this point, it is unclear whether we are talking about the environment only or a combined environment + model. I had assumed the environment was deterministic, but looking ahead at the policy suggests the environment is probabilistic. That should be made clearer.

So it seems that the icy lake means actions might have uncertain effects: one might choose to go LEFT but end up going DOWN with 33% probability.

One would hope that the final text is updated so that the book can be read in one pass. Currently it seems to require two passes. I assume this is why feedback is sought here.

441935 (4)
#3
I'm joining the plea of the previous readers: please update the text. In one place it says that the probability of the designated action is 66% and the remaining ones 33% each, which is a little confusing.

Also, the diagram of the MDP for Frozen Lake could be a bit clearer; there are lots of arrows labeled 33% without a clear explanation of why.
Miguel Morales (15)
#4
Okay, so each action has three possible outcomes: 33% in the intended direction, and 33% in each of the two orthogonal directions (66% total). If your agent says "right", it will go right, up, or down, each with a 33% chance.

The reason why those two are the same is because the position of state 0 is the top left corner. Action 0 is "left". If you select left on the top left corner you will end up bouncing back to 0 either from the left, or from the top, or end up in state 4... all from the same "left" action!
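That corner-bounce explanation can be sketched in a few lines. This is a hypothetical reimplementation of the slippery transition logic, assuming the 4x4 layout and Gym's action codes (0=LEFT, 1=DOWN, 2=RIGHT, 3=UP; state = row*4 + col), not the library's actual source:

```python
N = 4  # side length of the 4x4 Frozen Lake grid

def move(state, action):
    """One deterministic step, bouncing off the grid edges."""
    row, col = divmod(state, N)
    if action == 0:   col = max(col - 1, 0)      # LEFT
    elif action == 1: row = min(row + 1, N - 1)  # DOWN
    elif action == 2: col = min(col + 1, N - 1)  # RIGHT
    elif action == 3: row = max(row - 1, 0)      # UP
    return row * N + col

def transitions(state, action):
    """Slippery lake: the intended action and its two perpendicular
    neighbours each happen with probability 1/3."""
    slips = [(action - 1) % 4, action, (action + 1) % 4]
    return [(1/3, move(state, b)) for b in slips]

# Action LEFT from the top-left corner: two of the three slips
# (UP and LEFT) bounce back into state 0, one (DOWN) reaches state 4,
# which is why (1/3, 0, ...) appears twice in P[0][0].
print(transitions(0, 0))
```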
Miguel Morales (15)
#5
Nice feedback, guys. Yes, the environment (also known as the MDP) can be stochastic. Some environments are deterministic, but that's not always the case, and Frozen Lake is highly stochastic. I should probably introduce a deterministic problem first and then increase the complexity. Either way, I will make the improvements.
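The deterministic/stochastic distinction shows up directly in the shape of the `P` entries. Here is a minimal hand-built contrast in the book's `(prob, next_state, reward, done)` tuple format; the two-state layout is hypothetical, not the Frozen Lake map (in the Gym implementation, Frozen Lake itself can reportedly be made deterministic with `is_slippery=False`):

```python
# In a deterministic MDP, each (state, action) pair has a single
# outcome tuple with probability 1.0; in a stochastic MDP it has
# several tuples whose probabilities sum to 1.
deterministic = {0: {0: [(1.0, 1, 0.0, False)]}}   # action always works

stochastic = {0: {0: [(0.9, 1, 0.0, False),        # action usually works...
                      (0.1, 0, 0.0, False)]}}      # ...but may slip back

for name, P in [("deterministic", deterministic), ("stochastic", stochastic)]:
    print(name, "->", len(P[0][0]), "possible outcome(s)")
```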
Miguel Morales (15)
#6
These diagrams have received lots of similar feedback. I'll update them.