308374 (11)
#1
I am wondering what exactly we are trying to show by plotting the average reward. In Bandit.py we picked 3 epsilons, and I see that the curves for epsilon 0.1 and 0.01 overlap after 1000 pulls. What conclusion should we draw from this plot?
Thanks
Phil Tabor (9)
#2
Good question. The point of trying 3 different epsilons is to show how the average reward depends on the exploration factor epsilon.

The overlap just means that, in the long run, the difference in performance between epsilon = 0.1 and epsilon = 0.01 disappears. (Note: I'm going from memory here, since I don't have the plot in front of me. I assume that by overlap you mean the two curves have the same average reward for a given pull.)
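
For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of an epsilon-greedy bandit that tracks the running average reward. It is not the Bandit.py from the book, which isn't reproduced in this thread. The function name run_bandit, the Gaussian rewards with unit variance, the arm means, and the third epsilon value (0.0) are all assumptions made for illustration.

```python
import random


def run_bandit(epsilon, true_means, n_pulls, seed=0):
    """Run epsilon-greedy on a Gaussian bandit (hypothetical setup).

    Returns a list whose t-th entry is the average reward over the
    first t+1 pulls.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms  # sample-mean reward estimate per arm
    counts = [0] * n_arms       # number of pulls per arm
    total = 0.0
    running_avg = []
    for t in range(1, n_pulls + 1):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest current estimate.
            arm = max(range(n_arms), key=lambda a: estimates[a])
        # Assumed reward model: Gaussian around the arm's true mean.
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        # Incremental update of the sample mean for the pulled arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
        running_avg.append(total / t)
    return running_avg


if __name__ == "__main__":
    true_means = [1.0, 2.0, 3.0]  # hypothetical arm means
    for eps in (0.1, 0.01, 0.0):  # 0.0 is an assumed third value
        avg = run_bandit(eps, true_means, n_pulls=10000)
        print(f"epsilon={eps}: avg after 1000 pulls = {avg[999]:.3f}, "
              f"after 10000 pulls = {avg[-1]:.3f}")
```

Printing or plotting the returned lists for each epsilon gives the same kind of curves discussed above. In general, a larger epsilon tends to find the best arm sooner, while a smaller epsilon wastes fewer pulls on exploration once the best arm is known, so how close the curves end up depends on the arm means and the number of pulls.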