
393402 (2)
#1
I realize this isn't a book about ML in general, so feel free to discount the rest of this post as off-topic.

The "Underfit" row in figure 3.3 and the description that follows might be worth another sentence or three to explain how one gets a thumbs-up on train and a thumbs-down on test. The "Underfit" box in figure 3.4 shows one of the classic explanations for underfitting, which is applying a linear model to non-linear data, but that doesn't really explain why test or validation would outperform train.

The SO thread below addresses the issue to some extent. Most of the responses boil down to "are you really, really sure that's what you are seeing?" That would be my first thought as well if I saw better validation or test performance than train.
https://stats.stackexchange.com/questions/59630/test-accuracy-higher-than-training-how-to-interpret
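To make the linear-model-on-nonlinear-data explanation concrete, here is a quick sketch (plain NumPy, not code from the book) where a line is fit to quadratic data and both train and test error come out high, which is the usual signature of underfitting:

```python
# Sketch: fit a straight line y = w*x + b to quadratic data.
# A linear model cannot capture x**2, so it underfits: the
# error is large on train AND test, not just on one of them.
import numpy as np

rng = np.random.default_rng(0)
x_train = rng.uniform(-3, 3, 100)
x_test = rng.uniform(-3, 3, 100)
y_train = x_train ** 2 + rng.normal(0, 0.1, 100)
y_test = x_test ** 2 + rng.normal(0, 0.1, 100)

# Closed-form least-squares fit of the line.
A = np.stack([x_train, np.ones_like(x_train)], axis=1)
w, b = np.linalg.lstsq(A, y_train, rcond=None)[0]

train_mse = np.mean((w * x_train + b - y_train) ** 2)
test_mse = np.mean((w * x_test + b - y_test) ** 2)
# Both MSEs stay large (on the order of the variance of x**2),
# so the model gets a "thumbs-down" on train and test alike.
```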

Thanks for reading. I'm enjoying the book.
Nishant Shukla (52)
#2
Good point! I've just added a paragraph going into further detail about overfitting and underfitting.

Thank you for your suggestion!
edzzn (4)
#3
I also think that some parts need more explanation, since we are assuming that the reader has no prior knowledge of TensorFlow. For me, using a function like this one from Listing 3.2:

train_op = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost) 


is just magic if the only description is:

Define the operation that will be called on each iteration of the learning algorithm


I think that at least a couple of lines on what gradient descent is should be included, and maybe a bit on why it just works.

Why does calling
.minimize(cost)
make my code work? What is going on behind the scenes?
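For what it's worth, the update rule hidden inside an optimizer's minimize is plain gradient descent: compute the gradient of the cost with respect to each parameter and step in the opposite direction. A minimal sketch in plain Python (a toy stand-in, not the book's TensorFlow code), using the cost (w - 3)**2 whose gradient is 2*(w - 3):

```python
# Toy illustration of what minimize(cost) does each iteration:
# take the gradient of the cost and nudge the parameter the other way.

def grad(w):
    # Gradient of cost(w) = (w - 3)**2, which is minimized at w = 3.
    return 2.0 * (w - 3.0)

learning_rate = 0.1
w = 0.0
for _ in range(100):
    w -= learning_rate * grad(w)  # the core update behind minimize(cost)

# After enough iterations, w converges to the minimizer, 3.0.
```

TensorFlow's optimizer does the same thing, except it derives the gradients automatically from the computation graph and updates every trainable variable at once.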

I have a little experience with TensorFlow, so it is not a problem for me. But if I were totally new, I would be really confused.