289572 (1)
#1
I am having trouble understanding the update_q() method of the QLearningDecisionPolicy class in Listing 8.10. The method takes an action argument but never uses it. The point of the function is to update the NN model. Intuitively, one needs both the state and the action to calculate Q-values (rewards). As written, the reward is given for the chosen action, and the NN is updated by calculating the Q-values of the current action and the results of the next possible actions.

I would do the optimization step so that the rewards of all actions are calculated, forming a Q-vector that the NN is optimized against.
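
To make the idea of a Q-vector target concrete, here is a minimal sketch of how the action index is commonly used in a Q-learning update. It is not the code from Listing 8.10: the function name build_q_target, its parameters, and the gamma value are assumptions chosen for illustration. The target vector starts as a copy of the network's current predictions, and only the entry for the action actually taken is replaced by the observed reward plus the discounted best Q-value of the next state.

```python
def build_q_target(q_current, q_next, action, reward, gamma=0.9):
    """Build the training target for one (state, action, reward, next_state) step.

    q_current: predicted Q-values for every action in the current state
    q_next:    predicted Q-values for every action in the next state
    action:    index of the action that was actually taken
    gamma:     discount factor (hypothetical default, not from the book)
    """
    # Actions that were not taken keep their predicted values,
    # so they contribute no error when the network is fit to the target.
    target = list(q_current)
    # Only the taken action is moved toward reward + discounted future value.
    target[action] = reward + gamma * max(q_next)
    return target


q_current = [0.2, 0.5, 0.1]
q_next = [0.0, 1.0, 0.3]
target = build_q_target(q_current, q_next, action=2, reward=1.0, gamma=0.5)
print(target)  # [0.2, 0.5, 1.5]
```

In this form the reward is only observed for the chosen action, which is why the other entries are left at their predicted values rather than computed from rewards.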

If the logic is correct as written, I think it at least needs more explanation.

This is an interesting topic that I have not found covered much in other books!

br,

\leif
Nishant Shukla (52)
#2
I agree the logic in the RL chapter needs to be clarified. I'm going to take a deeper look and fix it up by next week. It's definitely a neat and novel example, so it's worth my time doing it right! :)