Hi, I tried to reproduce the code from chapter 9 from memory and noticed that the error calculation for batch gradient descent in chapter 9 differs from the one in chapter 8.
In chapter 8 it is calculated as follows:
layer_2_delta = (labels[batch_start:batch_end]-layer_2)/batch_size
In chapter 9 it is calculated in the following way:
layer_2_delta = (labels[batch_start:batch_end]-layer_2) / (batch_size * layer_2.shape[0])
I could not find any explanation of why the delta calculation is different, but the results obtained differ as well.
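To make the difference concrete, here is a minimal sketch comparing the two formulas. It assumes layer_2 has shape (batch_size, num_labels), so that layer_2.shape[0] equals batch_size. The sizes, the random data, and the names num_labels, delta_ch8, delta_ch9 and ratio_holds are made up for illustration and are not taken from the book.

```python
import numpy as np

# Hypothetical sizes and random data, chosen only for illustration.
batch_size = 4
num_labels = 3
rng = np.random.default_rng(0)
labels = rng.random((8, num_labels))
# Assumed shape: one row per example in the batch.
layer_2 = rng.random((batch_size, num_labels))
batch_start, batch_end = 0, batch_size

# Chapter 8 version: raw error divided by the batch size.
delta_ch8 = (labels[batch_start:batch_end] - layer_2) / batch_size

# Chapter 9 version: additionally divided by the number of rows in layer_2.
delta_ch9 = (labels[batch_start:batch_end] - layer_2) / (batch_size * layer_2.shape[0])

# Under the shape assumption, the two deltas differ only by a constant
# factor of layer_2.shape[0], which here is the same as batch_size.
ratio_holds = np.allclose(delta_ch8, delta_ch9 * layer_2.shape[0])
print(ratio_holds)
```

If that shape assumption holds, the chapter 9 delta is just the chapter 8 delta divided by batch_size once more. That would make every weight update smaller, much like a lower learning rate, which could account for the different results. It does not explain why the extra factor was introduced.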
It would be good to get a clarification for this part of the code.