
422904 (7)
Chapter 3 had an example showing how to calculate the prediction with a hidden layer in the network, but chapters 4 and 5 about gradient descent don't show an example with a hidden layer.

For the output nodes, it's obvious how to calculate the error, since the correct output can be compared against the predicted output. In all of the examples, this correct output was just given to us and didn't need to be calculated. However, for the hidden layers, how do we calculate the 'correct' value to compare against the hidden node's predicted value, so that we can continue the gradient descent back another layer?

Hopefully my question is clear. For a network with 3 input nodes, 3 hidden nodes, and 3 output nodes, how do we calculate the goal values for the 3 hidden nodes so that we can apply gradient descent to the weights between the input and hidden nodes?
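To make the sticking point concrete, here is a minimal forward pass for a 3-3-3 network like the one in the question. All of the weights and values below are made up for illustration. The output error falls right out of the comparison with the goal, but there is no obvious goal vector for the hidden layer:

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

# Hypothetical 3-3-3 network; all numbers are invented for illustration
inputs = np.array([1.0, 0.5, -0.2])
goal = np.array([0.0, 1.0, 0.0])          # correct output, given to us

w_ih = np.array([[0.1, 0.2, -0.1],        # input  -> hidden weights
                 [-0.1, 0.1, 0.9],
                 [0.1, 0.4, 0.1]])
w_ho = np.array([[0.3, -0.2, 0.1],        # hidden -> output weights
                 [0.1, 0.2, 0.0],
                 [-0.3, 0.2, 0.4]])

hidden = relu(inputs @ w_ih)              # hidden-layer activations
pred = hidden @ w_ho                      # network prediction

output_error = pred - goal                # easy: the goal is known
# hidden_error = hidden - ???             # no 'goal' exists for the hidden nodes
```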
422904 (7)
To answer my own question: read the middle of chapter 6, starting at page 117.

When I posted this question, I had just finished chapter 5 on gradient descent and was confused that it didn't actually cover gradient descent through a hidden layer. I was so confused that I stopped at chapter 5 and read through other sources until I figured it out on my own, and then was happily surprised halfway through chapter 6 to find it covered after all.
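For anyone else stuck at the same point, the idea from chapter 6 can be sketched roughly like this (weights, learning rate, and iteration count are all made up for illustration): the hidden layer never needs an explicit goal. Instead, backpropagation derives each hidden node's delta by pushing the output deltas backward through the hidden-to-output weights:

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def relu_deriv(x):
    return (x > 0).astype(float)

# Hypothetical 3-3-3 network; all numbers are invented for illustration
inputs = np.array([1.0, 0.5, -0.2])
goal = np.array([0.0, 1.0, 0.0])

w_ih = np.array([[0.1, 0.2, -0.1],
                 [-0.1, 0.1, 0.9],
                 [0.1, 0.4, 0.1]])
w_ho = np.array([[0.3, -0.2, 0.1],
                 [0.1, 0.2, 0.0],
                 [-0.3, 0.2, 0.4]])

alpha = 0.1
for _ in range(500):
    hidden_in = inputs @ w_ih
    hidden = relu(hidden_in)
    pred = hidden @ w_ho

    # Output layer: delta comes from comparing against the known goal
    out_delta = pred - goal

    # Hidden layer: no goal needed -- each hidden node's delta is the
    # output deltas weighted by how strongly that node feeds each output,
    # gated by the relu derivative
    hid_delta = (w_ho @ out_delta) * relu_deriv(hidden_in)

    # Gradient descent on both weight matrices
    w_ho -= alpha * np.outer(hidden, out_delta)
    w_ih -= alpha * np.outer(inputs, hid_delta)
```

In other words, the 'correct value' for a hidden node is never computed directly; its delta is inherited from the layer above.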

I think it'd be helpful to end chapter 5 with a teaser: "We still don't know how to handle hidden layers, but let's look at that next with backpropagation." Because chapter 6 opens with a brand-new example problem, it suggests that chapter 5 finished gradient descent completely, when that isn't actually true.