57334 (1)
#1
page 139
action_q_vals[0, next_action_idx] = reward + self.gamma * next_action_q_vals[0, next_action_idx]

I think action_q_vals[0, next_action_idx] on the left-hand side should be action_q_vals[0, current_action_idx]. Is that right?
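
To make the question concrete, here is a minimal NumPy sketch of the update as I would expect it under the standard Q-learning rule, where the target r + gamma * max Q(s', a') is written into the entry for the action actually taken in the current state. The function name q_learning_target and the sample numbers are made up for illustration, and current_action_idx is assumed to be the index of the action taken; none of this is the book's code.

```python
import numpy as np

def q_learning_target(action_q_vals, next_action_q_vals, current_action_idx, reward, gamma):
    """Hypothetical helper: build the training target for one transition.

    Both Q-value arrays are assumed to have shape (1, num_actions).
    """
    # Copy so the network's original predictions are not modified in place.
    target = np.array(action_q_vals, dtype=float, copy=True)
    next_action_q_vals = np.asarray(next_action_q_vals, dtype=float)
    # Greedy action in the next state (flat index equals the column for shape (1, n)).
    next_action_idx = np.argmax(next_action_q_vals)
    # Overwrite only the entry of the action taken in the current state.
    target[0, current_action_idx] = reward + gamma * next_action_q_vals[0, next_action_idx]
    return target

# Made-up example values.
action_q_vals = np.array([[0.5, 0.2, 0.1]])
next_action_q_vals = np.array([[0.3, 0.9, 0.4]])
target = q_learning_target(action_q_vals, next_action_q_vals,
                           current_action_idx=0, reward=1.0, gamma=0.5)
print(target)  # entry 0 becomes roughly 1.0 + 0.5 * 0.9; entries 1 and 2 are unchanged
```

With the line as printed in the book, the entry at next_action_idx (index 1 in this example) would be overwritten instead, even though action 0 was the one taken.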

Thanks
367062 (2)
#2
I also wonder whether this is right and makes sense (see my topic: https://forums.manning.com/posts/list/41375.page).