Chapter 3 had an example showing how to calculate the prediction with a hidden layer in the network, but chapters 4 and 5 on gradient descent don't show an example with a hidden layer.
For the output nodes, it's obvious how to calculate the error, since the correct output can be compared against the predicted output. In all of the examples, this correct output was simply given to us and didn't need to be calculated. For the hidden layers, however, how do we calculate the 'correct' value to compare against each hidden node's predicted value, so that gradient descent can continue back another layer?
Hopefully my question is clear. For a network with 3 input nodes, 3 hidden nodes, and 3 output nodes, how do we calculate the goal values for the 3 hidden nodes so that we can apply gradient descent to the weights between the input and hidden nodes?
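
To make the setup concrete, here is a minimal sketch in plain Python of the situation I'm describing. The input values, weights, goal outputs, and the helper `vec_mat_mul` are all made up for illustration and are not taken from the book. The last step is the one I can't fill in.

```python
# Hypothetical 3-3-3 network; every number below is invented for illustration.

def vec_mat_mul(vector, matrix):
    # Each row of `matrix` holds the weights feeding one node in the next layer.
    return [sum(v * w for v, w in zip(vector, row)) for row in matrix]

inputs = [8.5, 0.65, 1.2]
goal_outputs = [0.1, 1.0, 0.1]  # given to us, as in the book's examples

weights_in_to_hid = [[0.1, 0.2, -0.1],
                     [-0.1, 0.1, 0.9],
                     [0.1, 0.4, 0.1]]
weights_hid_to_out = [[0.3, 1.1, -0.3],
                      [0.1, 0.2, 0.0],
                      [0.0, 1.3, 0.1]]

# Forward pass: input -> hidden -> output (no activation function, for simplicity).
hidden = vec_mat_mul(inputs, weights_in_to_hid)
outputs = vec_mat_mul(hidden, weights_hid_to_out)

# Output layer: the difference from the goal is easy to compute.
output_deltas = [pred - goal for pred, goal in zip(outputs, goal_outputs)]

# Hidden layer: there is no given goal value to compare against.
# hidden_deltas = [pred - goal for pred, goal in zip(hidden, goal_hidden)]
# What should goal_hidden be?
```

The output deltas are enough to update `weights_hid_to_out`, but I don't see what plays the same role for `weights_in_to_hid`.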
