First of all, amazing book! I've learned so much from it. There is one thing I'm having trouble with though.
On page 174, layer_2_delta is calculated as follows:
layer_2_delta = (labels[batch_start:batch_end] - layer_2) / (batch_size * layer_2.shape[0])
I'm confused about why the division is done. Running the code without the division suggests that it moderates the weight updates, but I don't understand the intuition behind picking that exact value of batch_size * layer_2.shape[0].
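To make the question concrete, here is a minimal sketch of the effect, not the book's code. It assumes layer_2 has shape (batch_size, num_outputs), in which case layer_2.shape[0] is the same number as batch_size. The function weight_update and the toy values are hypothetical, chosen so the arithmetic is easy to follow. Because the weight update is a dot product over the batch dimension, it sums one contribution per example, so without any division it grows with the batch size; dividing the delta by batch_size turns that sum into an average.

```python
import numpy as np


def weight_update(batch_size, divide):
    """Return the raw weight update for a toy batch (hypothetical helper).

    Every row of the batch is the same example, so the effect of
    summing over the batch is easy to see.
    """
    layer_1 = np.ones((batch_size, 3))        # hidden activations: (batch, hidden)
    layer_2 = np.full((batch_size, 2), 0.25)  # predictions: (batch, outputs)
    labels = np.full((batch_size, 2), 0.75)   # targets: (batch, outputs)

    layer_2_delta = labels - layer_2          # 0.5 for every entry
    if divide:
        # Assumption: dividing by batch_size only, to average over the batch.
        layer_2_delta = layer_2_delta / batch_size

    # The dot product sums the per-example contributions across the batch.
    return layer_1.T.dot(layer_2_delta)


for batch_size in (1, 8, 64):
    summed = weight_update(batch_size, divide=False)
    averaged = weight_update(batch_size, divide=True)
    # The summed update scales with batch_size; the averaged one does not.
    print(batch_size, float(summed[0, 0]), float(averaged[0, 0]))
```

Under that shape assumption, the expression on page 174 would divide by batch_size twice, which is stronger than plain averaging. The second factor is the part I can't explain.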
Thanks!