sean.settle (5)
#1
Section 1.4.1 says validation will be explained in the next chapter, but a quick search didn't find any mention of validation in Chapter 2. Figure 3.3 depicts training and testing, but it would be helpful to include some mention of validation there. The text around Listing 3.4 would also be a perfect opportunity to go into a little more detail about how training, testing, and validation fit together, by briefly discussing how to find the optimal lambda (the regularization parameter) and then measuring how well the model generalizes. This would go a long way toward addressing which lambda to use and why.
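For readers following this thread, here is a minimal sketch of the workflow being asked for: fit on a training set, pick lambda on a validation set, and only then measure generalization on a held-out test set. This is not the book's listing. It assumes plain NumPy, a closed-form ridge fit on polynomial features, a 60/20/20 split, and a hypothetical helper named select_lambda; the data, degree, and candidate lambdas are made up for illustration.

```python
import numpy as np

def poly_features(x, degree):
    # Columns are x**0, x**1, ..., x**degree.
    return np.vander(x, degree + 1, increasing=True)

def fit_ridge(X, y, lam):
    # Closed-form ridge solution: solve (X^T X + lam * I) w = X^T y.
    # Note: this simple version also penalizes the intercept term.
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def mse(X, y, w):
    return float(np.mean((X @ w - y) ** 2))

def select_lambda(x, y, lambdas, degree=9, seed=0):
    # Hypothetical helper: shuffle, then split 60% train / 20% validation / 20% test.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    n_train = int(0.6 * len(x))
    n_val = int(0.2 * len(x))
    train, val, test = np.split(idx, [n_train, n_train + n_val])

    X = poly_features(x, degree)

    # Fit on the training set only; score each lambda on the validation set.
    val_errors = {}
    for lam in lambdas:
        w = fit_ridge(X[train], y[train], lam)
        val_errors[lam] = mse(X[val], y[val], w)

    # Keep the lambda with the lowest validation error.
    best_lam = min(val_errors, key=val_errors.get)

    # The test set is touched once, after lambda has been chosen.
    w = fit_ridge(X[train], y[train], best_lam)
    test_error = mse(X[test], y[test], w)
    return best_lam, val_errors, test_error

# Made-up noisy y = x^2 data, similar in spirit to the scatter discussed here.
data_rng = np.random.default_rng(42)
x = np.linspace(-1, 1, 100)
y = x ** 2 + data_rng.normal(0, 0.1, size=x.shape)

lambdas = [0.0, 1e-3, 1e-2, 1e-1, 1.0, 10.0]
best_lam, val_errors, test_error = select_lambda(x, y, lambdas)
print("validation MSE per lambda:", val_errors)
print("chosen lambda:", best_lam, "test MSE:", test_error)
```

The point of keeping three sets is that the validation error is already "used up" by the choice of lambda, so the test error is the only number here that estimates how well the chosen model generalizes.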
Amnon David (10)
#2
I might be missing something, but if anything, regularization on the y=x^2 scatter yields a curve that overfits more than the one without regularization. Extending the x-axis of the graph beyond the training range (file attached) shows that predictions for values outside that range fail miserably.
Amnon David (10)
#3
Answering myself: what I missed was that linear regression is about interpolation, not extrapolation. A polynomial fitted to points within some boundary cannot be expected to predict what goes on outside that boundary. The idea with regression is to find the most suitable polynomial for the cluster of points within the given range.
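The interpolation versus extrapolation point can be checked numerically. The sketch below is only an illustration under assumptions of my own, not the book's code: NumPy, a degree-9 polynomial with a ridge penalty of 0.01, noisy y = x^2 samples on [-1, 1], and an arbitrary "outside" range of [2, 3]. It compares the error against the true curve inside and outside the training range.

```python
import numpy as np

# Made-up training data: noisy samples of y = x^2 on [-1, 1].
rng = np.random.default_rng(0)
x_train = np.linspace(-1, 1, 50)
y_train = x_train ** 2 + rng.normal(0, 0.1, size=x_train.shape)

degree = 9   # assumed polynomial degree
lam = 0.01   # assumed regularization strength

# Ridge fit on polynomial features x**0 .. x**degree.
X = np.vander(x_train, degree + 1, increasing=True)
w = np.linalg.solve(X.T @ X + lam * np.eye(degree + 1), X.T @ y_train)

def predict(x):
    return np.vander(x, degree + 1, increasing=True) @ w

# Compare against the true curve inside and outside the training range.
x_inside = np.linspace(-1, 1, 200)
x_outside = np.linspace(2, 3, 200)
err_inside = float(np.mean((predict(x_inside) - x_inside ** 2) ** 2))
err_outside = float(np.mean((predict(x_outside) - x_outside ** 2) ** 2))

print("MSE inside  [-1, 1]:", err_inside)
print("MSE outside [ 2, 3]:", err_outside)
```

The higher-order terms are barely constrained by data on [-1, 1], where x^4, x^6, and x^8 look much like x^2, but they dominate once |x| grows past 1, which is why the fit can look fine inside the range and still diverge outside it.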