443715 (1)

It isn't clear to me why you divide the loss plus the regularization term by 2*x_train.size.

Is this just an arbitrary value?

cost = tf.div(tf.add(tf.reduce_sum(tf.square(Y-y_model)),
                     tf.mul(reg_lambda, tf.reduce_sum(tf.square(w)))),
              2*x_train.size)

Thank you very much.
Amnon David (10)
The reason for dividing by the training size is to "standardize" the cost function so that it doesn't depend on the number of training examples. This way you can compare apples to apples when you want to compare the cost dynamics across other models or training sets.
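To make that concrete, here is a minimal sketch in plain Python rather than TensorFlow. The one-weight model y = w * x, the helper names, and the synthetic data are all my own assumptions for illustration, not the book's code. The penalty weight is left at 0 so that only the effect of the divisor shows.

```python
def regularized_cost(xs, ys, w, reg_lambda=0.0, normalize=True):
    """Squared-error loss for the model y = w * x plus an L2 penalty on w.

    With normalize=True the total is divided by 2 * len(xs), mirroring the
    2*x_train.size divisor from the question.
    """
    squared_error = sum((y - w * x) ** 2 for x, y in zip(xs, ys))
    total = squared_error + reg_lambda * w ** 2
    return total / (2 * len(xs)) if normalize else total


def make_data(n):
    # Hypothetical data: y = 2x plus a residual alternating between +0.1 and -0.1.
    xs = [i / n for i in range(n)]
    ys = [2.0 * x + (0.1 if i % 2 == 0 else -0.1) for i, x in enumerate(xs)]
    return xs, ys


small = make_data(100)
large = make_data(10_000)

# Unnormalized: the sum of squared errors grows with the number of points.
raw_small = regularized_cost(*small, w=2.0, normalize=False)
raw_large = regularized_cost(*large, w=2.0, normalize=False)

# Normalized: both training sets give about the same per-example cost.
norm_small = regularized_cost(*small, w=2.0)
norm_large = regularized_cost(*large, w=2.0)

print(raw_small, raw_large)
print(norm_small, norm_large)
```

The raw sums differ by roughly the ratio of the training set sizes, while the normalized costs come out nearly identical, which is what makes them comparable.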

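Regarding the multiplication by 2, that's just a convenience: differentiating a squared term produces a factor of 2, and the 2 in the denominator cancels it so the gradient looks cleaner. Scaling the cost by a constant doesn't change where its minimum is.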
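Written out, the cancellation looks roughly like this. The notation is my own shorthand, not the book's: n stands for x_train.size, f(x_i; w) for y_model evaluated at the i-th training point, and \lambda for reg_lambda.

```latex
J(w) = \frac{1}{2n}\left[\sum_{i=1}^{n}\bigl(y_i - f(x_i; w)\bigr)^2
       + \lambda \sum_{j} w_j^2\right]

\frac{\partial J}{\partial w_j}
  = \frac{1}{2n}\left[-2\sum_{i=1}^{n}\bigl(y_i - f(x_i; w)\bigr)
      \frac{\partial f(x_i; w)}{\partial w_j} + 2\lambda w_j\right]
  = \frac{1}{n}\left[-\sum_{i=1}^{n}\bigl(y_i - f(x_i; w)\bigr)
      \frac{\partial f(x_i; w)}{\partial w_j} + \lambda w_j\right]
```

The 2 from the power rule cancels against the 2 in the denominator, leaving no stray constant in the gradient.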

Explained here too: