422904 (7)
Chapter 3 had an example showing how to calculate a prediction with a hidden layer in the network, but chapters 4 and 5 on gradient descent don't show an example with a hidden layer.

For the output nodes, it's obvious how to calculate the error, since the correct output can be compared against the predicted output. In all of the examples, this correct output was simply given to us and didn't need to be calculated. For the hidden layers, however, how do we calculate the 'correct' value to compare against each hidden node's predicted value so that gradient descent can continue back another layer?

Hopefully my question is clear. For a network with 3 input nodes, 3 hidden nodes, and 3 output nodes, how do we calculate the goal values for the 3 hidden nodes so that we can apply gradient descent to the weights between the input and hidden nodes?
422904 (7)
To answer my own question: read the middle of chapter 6, starting at page 117.

When I posted this question, I'd just finished chapter 5 on gradient descent and was confused that it didn't actually cover gradient descent through a hidden layer. I was confused enough that I stopped at chapter 5 and read through other sources until I figured it out on my own, and then was happily surprised halfway through chapter 6 to find it covered after all.
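For anyone else who gets stuck at the same point: the key idea is that hidden nodes never need an explicit goal value. Instead, each hidden node's delta is computed from the output deltas, weighted by the connections between that hidden node and the outputs. Here's a minimal sketch of one training step for the 3-3-3 network from my question; the weight initialization, input, goal values, and learning rate are all made up for illustration, not taken from the book.

```python
import numpy as np

np.random.seed(1)

def relu(x):
    return np.maximum(0, x)

def relu_deriv(x):
    # 1 where the node was active, 0 where relu clipped it
    return (x > 0).astype(float)

alpha = 0.01
w_in_hid = np.random.rand(3, 3) - 0.5   # input -> hidden weights
w_hid_out = np.random.rand(3, 3) - 0.5  # hidden -> output weights

inputs = np.array([[1.0, 0.5, -0.2]])
goal = np.array([[0.0, 1.0, 0.0]])

# Forward pass
hidden = relu(inputs.dot(w_in_hid))
pred = hidden.dot(w_hid_out)

# Output delta: direct comparison with the goal, as in chapter 5
delta_out = pred - goal

# Hidden delta: no explicit goal needed. Each hidden node takes
# responsibility for the output error in proportion to the weights
# connecting it to the output nodes, masked by the relu derivative.
delta_hid = delta_out.dot(w_hid_out.T) * relu_deriv(hidden)

# Weight updates from the deltas
w_hid_out -= alpha * hidden.T.dot(delta_out)
w_in_hid -= alpha * inputs.T.dot(delta_hid)
```

So the answer to "what is the goal value for a hidden node?" is that there isn't one; backpropagation skips straight to the hidden node's delta by pushing the output deltas backward through the hidden-to-output weights.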

I think it'd be helpful to end chapter 5 with a teaser like "we still don't know how to handle hidden layers, but let's look at that next with backpropagation." Because chapter 6 opens with a brand-new example problem, it suggests that chapter 5 was 100% finished with gradient descent, when that isn't actually true.