627201 (9)
Hi, I tried to reproduce the code from chapter 9 from memory and noticed that the error calculation for batch gradient descent in chapter 9 differs from the one in chapter 8.

In chapter 8 it is calculated as follows:

layer_2_delta = (labels[batch_start:batch_end]-layer_2)/batch_size

In chapter 9 it is calculated in the following way:

layer_2_delta = (labels[batch_start:batch_end]-layer_2) / (batch_size * layer_2.shape[0])

I could not find any explanation of why the delta calculation differs, and the results obtained differ as well.
It would be good to get clarification on this part of the code.

Prabh (4)
I agree. Dividing layer_2_delta by layer_2.shape[0] is not explained anywhere in the text. I think what ultimately happens is that constant factors like this get absorbed into the alpha value: dividing the delta by a constant has the same effect on the weight update as dividing alpha by that constant. Take a look at alpha: it is 2. Your overall point is correct though; this should be explained, given that the book is supposed to be accessible to anyone.
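
To make the point about alpha concrete, here is a minimal sketch in plain Python (lists instead of NumPy arrays, so it runs without dependencies). It is not the book's code: the toy batch, the helper names scaled_delta and weight_update, and the single linear layer are all assumptions made for illustration. It compares the chapter 9 style divisor at step size alpha with the chapter 8 style divisor at step size alpha / rows, where rows stands in for layer_2.shape[0].

```python
import math

def scaled_delta(labels, predictions, divisor):
    """Element-wise (label - prediction) / divisor for a batch of rows."""
    return [[(y - p) / divisor for y, p in zip(y_row, p_row)]
            for y_row, p_row in zip(labels, predictions)]

def weight_update(inputs, delta, alpha):
    """alpha * inputs^T . delta: the update for one linear layer (hypothetical helper)."""
    n_in, n_out = len(inputs[0]), len(delta[0])
    return [[alpha * sum(inputs[r][i] * delta[r][j] for r in range(len(inputs)))
             for j in range(n_out)]
            for i in range(n_in)]

# Hypothetical toy batch: 2 examples, 3 inputs, 2 outputs.
batch_size = 2
inputs = [[1.0, 0.5, -1.0], [0.0, 2.0, 1.0]]
labels = [[1.0, 0.0], [0.0, 1.0]]
predictions = [[0.7, 0.2], [0.4, 0.5]]
rows = len(predictions)  # plays the role of layer_2.shape[0]

alpha = 2.0

# Chapter 9 style: divide by batch_size * rows, step size alpha.
update_ch9 = weight_update(
    inputs, scaled_delta(labels, predictions, batch_size * rows), alpha)

# Chapter 8 style: divide by batch_size only, step size alpha / rows.
update_ch8 = weight_update(
    inputs, scaled_delta(labels, predictions, batch_size), alpha / rows)

# The two updates agree element by element.
same = all(math.isclose(a, b, abs_tol=1e-12)
           for row_a, row_b in zip(update_ch9, update_ch8)
           for a, b in zip(row_a, row_b))
print(same)
```

If that holds, the extra division by layer_2.shape[0] only rescales the effective learning rate, and the same scaling should carry through to earlier layers because their deltas are computed linearly from layer_2_delta. Results would still differ between the two chapters whenever alpha is not adjusted to compensate, which matches what was observed above.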