For the calculation of layer_1 and layer_2 on page 190, the following code is used:

layer_1 = sigmoid(np.sum(weights_0_1[x],axis=0)) #embed + sigmoid
layer_2 = sigmoid(np.dot(layer_1,weights_1_2)) # linear + softmax
layer_2_delta = layer_2 - y # compare pred with truth
layer_1_delta = layer_2_delta.dot(weights_1_2.T) #backprop

Because layer_1 uses a non-linear activation, I would expect layer_1_delta to include a sigmoid2deriv term, namely:

def sigmoid2deriv(output):
    return output*(1-output)

layer_1_delta = layer_2_delta.dot(weights_1_2.T)*sigmoid2deriv(layer_1)

Using this function in the backpropagation also gives me higher performance on the test set for the same example.
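
One way to check the claim independently of test-set accuracy is a numerical gradient check. The following is only a minimal sketch in plain Python (no NumPy, to keep it dependency-free), not the book's code. The toy weights, the single output unit, and the helper names `forward` and `loss` are assumptions made for illustration. It also assumes the loss is binary cross-entropy, since that is the loss for which `layer_2 - y` is the gradient at the output pre-activation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid2deriv(output):
    # Derivative of the sigmoid, written in terms of its output.
    return output * (1.0 - output)

def forward(weights_0_1, weights_1_2, x):
    # Hypothetical helper: embed + sigmoid, then linear + sigmoid.
    hidden_size = len(weights_1_2)
    layer_1 = [sigmoid(sum(weights_0_1[i][j] for i in x))
               for j in range(hidden_size)]
    layer_2 = sigmoid(sum(h * w for h, w in zip(layer_1, weights_1_2)))
    return layer_1, layer_2

def loss(weights_0_1, weights_1_2, x, y):
    # Assumed loss: binary cross-entropy on the single output.
    _, p = forward(weights_0_1, weights_1_2, x)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# Toy values chosen for illustration only (5 "words", 3 hidden units).
weights_0_1 = [
    [0.1, -0.2, 0.3],
    [0.0, 0.4, -0.1],
    [-0.3, 0.2, 0.1],
    [0.2, 0.1, -0.4],
    [0.3, -0.1, 0.2],
]
weights_1_2 = [0.4, -0.3, 0.2]
x, y = [0, 2, 3], 1  # word indices of one review and its label

layer_1, layer_2 = forward(weights_0_1, weights_1_2, x)
layer_2_delta = layer_2 - y

# Backprop without and with the sigmoid derivative of layer_1.
delta_without = [layer_2_delta * w for w in weights_1_2]
delta_with = [d * sigmoid2deriv(h) for d, h in zip(delta_without, layer_1)]

# Central-difference gradient of the loss w.r.t. the embedding row of x[0].
eps = 1e-5
row = weights_0_1[x[0]]
numeric = []
for j in range(len(row)):
    original = row[j]
    row[j] = original + eps
    loss_plus = loss(weights_0_1, weights_1_2, x, y)
    row[j] = original - eps
    loss_minus = loss(weights_0_1, weights_1_2, x, y)
    row[j] = original  # restore the weight
    numeric.append((loss_plus - loss_minus) / (2 * eps))

print("numeric:      ", numeric)
print("with deriv:   ", delta_with)
print("without deriv:", delta_without)
```

Under these assumptions, the numerical gradient agrees with the version that includes `sigmoid2deriv(layer_1)`, while the version without it is larger in magnitude, because `output*(1-output)` is at most 0.25.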

Kind Regards
I agree with your point here. The author is clearly using a non-linear activation, so the normal layer_1_delta needs to be multiplied by the derivative of that activation. I sure hope these issues get fixed in the final release.
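
For reference, the chain-rule step behind this can be written out as follows. This is a sketch under stated assumptions: the symbols $z_1$, $z_2$ and $L$ are introduced here and are not in the book's code, and $L$ is assumed to be a loss whose gradient at the output pre-activation is `layer_2 - y` (cross-entropy with a sigmoid output, for example).

```latex
\begin{aligned}
z_1 &= \textstyle\sum_{i \in x} \mathrm{weights\_0\_1}[i],
  & \mathrm{layer\_1} &= \sigma(z_1), \\
z_2 &= \mathrm{layer\_1} \cdot \mathrm{weights\_1\_2},
  & \mathrm{layer\_2} &= \sigma(z_2), \\[4pt]
\frac{\partial L}{\partial z_2} &= \mathrm{layer\_2} - y
  = \mathrm{layer\_2\_delta}, \\
\frac{\partial L}{\partial\, \mathrm{layer\_1}}
  &= \mathrm{layer\_2\_delta} \cdot \mathrm{weights\_1\_2}^{\top}, \\
\frac{\partial L}{\partial z_1}
  &= \left( \mathrm{layer\_2\_delta} \cdot \mathrm{weights\_1\_2}^{\top} \right)
     \odot \mathrm{layer\_1} \odot \left( 1 - \mathrm{layer\_1} \right).
\end{aligned}
```

The last line is the quantity the update of weights_0_1 needs, and its final factor is exactly sigmoid2deriv(layer_1).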