627201 (9)
#1
Hi,

For the calculation of layer_1 and layer_2 on page 190, the following code is used:

layer_1 = sigmoid(np.sum(weights_0_1[x],axis=0)) # embed + sigmoid
layer_2 = sigmoid(np.dot(layer_1,weights_1_2))   # linear + sigmoid
layer_2_delta = layer_2 - y                      # compare pred with truth
layer_1_delta = layer_2_delta.dot(weights_1_2.T) # backprop


Due to the use of the non-linear activation, I would expect layer_1_delta to contain a sigmoid2deriv term, namely:

def sigmoid2deriv(output):
    return output*(1-output)

layer_1_delta = layer_2_delta.dot(weights_1_2.T)*sigmoid2deriv(layer_1)



Using such a function in the backpropagation also gives me higher performance on the test set in the same example.
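
For reference, here is a minimal, self-contained sketch of one training step with the derivative term included. The toy sizes, the learning rate alpha, the random weights, and the weight-update lines are my own assumptions for illustration, not a quote from the book:

import numpy as np

np.random.seed(1)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid2deriv(output):
    return output * (1 - output)

vocab_size, hidden_size = 10, 4           # toy sizes, not the book's
weights_0_1 = 0.2 * np.random.random((vocab_size, hidden_size)) - 0.1
weights_1_2 = 0.2 * np.random.random((hidden_size, 1)) - 0.1

x = [1, 3, 7]   # indices of the words present in one example
y = 1           # target label
alpha = 0.01

layer_1 = sigmoid(np.sum(weights_0_1[x], axis=0))   # embed + sigmoid
layer_2 = sigmoid(np.dot(layer_1, weights_1_2))     # linear + sigmoid

layer_2_delta = layer_2 - y                                                 # compare pred with truth
layer_1_delta = layer_2_delta.dot(weights_1_2.T) * sigmoid2deriv(layer_1)   # backprop with derivative

weights_1_2 -= alpha * np.outer(layer_1, layer_2_delta)   # output-layer update
weights_0_1[x] -= alpha * layer_1_delta                   # update the rows of the words in x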

Kind Regards
Prabh (4)
#2
I also agree with your point here. The author is clearly using a non-linear activation, so one needs to multiply the normal layer_1_delta by the derivative of that activation. I sure hope these issues get fixed in the final release.
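
A quick finite-difference check on a tiny network makes the point. This is my own sketch, not code from the book, and it assumes a binary cross-entropy loss (the loss for which layer_2_delta = layer_2 - y is the exact gradient with respect to the output pre-activation):

import numpy as np

np.random.seed(0)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid2deriv(output):
    return output * (1 - output)

def forward(weights_0_1, weights_1_2, x):
    layer_1 = sigmoid(np.sum(weights_0_1[x], axis=0))
    layer_2 = sigmoid(np.dot(layer_1, weights_1_2))
    return layer_1, layer_2

def loss(weights_0_1, weights_1_2, x, y):
    # binary cross-entropy (assumption), so d(loss)/d(output pre-activation) = layer_2 - y
    _, layer_2 = forward(weights_0_1, weights_1_2, x)
    return -(y * np.log(layer_2[0]) + (1 - y) * np.log(1 - layer_2[0]))

vocab_size, hidden_size = 6, 3            # toy sizes, not the book's
weights_0_1 = 0.2 * np.random.random((vocab_size, hidden_size)) - 0.1
weights_1_2 = 0.2 * np.random.random((hidden_size, 1)) - 0.1
x, y = [0, 2, 5], 1                       # distinct word indices and a target label

layer_1, layer_2 = forward(weights_0_1, weights_1_2, x)
layer_2_delta = layer_2 - y
with_deriv    = layer_2_delta.dot(weights_1_2.T) * sigmoid2deriv(layer_1)
without_deriv = layer_2_delta.dot(weights_1_2.T)

# numerical gradient of the loss w.r.t. the embedding row of the first word in x
eps = 1e-5
numeric = np.zeros(hidden_size)
for j in range(hidden_size):
    plus, minus = weights_0_1.copy(), weights_0_1.copy()
    plus[x[0], j]  += eps
    minus[x[0], j] -= eps
    numeric[j] = (loss(plus, weights_1_2, x, y) - loss(minus, weights_1_2, x, y)) / (2 * eps)

print("numeric gradient      ", numeric)
print("with sigmoid2deriv    ", with_deriv)     # matches the numeric gradient
print("without sigmoid2deriv ", without_deriv)  # does not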