443715
#1
Hi!

It wasn't clear to me why you divide the loss plus the regularization term by 2*x_train.size.

Is this just an arbitrary value?

cost = tf.div(tf.add(tf.reduce_sum(tf.square(Y - y_model)),
                     tf.mul(reg_lambda, tf.reduce_sum(tf.square(w)))),
              2*x_train.size)


Thank you very much.
Amnon David
#2
The reason for dividing by the training size is to "standardize" the cost function so that it doesn't depend on how many training points you have. That way you can compare apples to apples when you look at the cost dynamics across different models or training sets.
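
As a quick illustration (a minimal NumPy sketch, not code from the book): with the same per-point residual, the summed cost grows with the number of training points, while the cost divided by the training size stays roughly the same.

import numpy as np

def cost(y_true, y_pred, w, reg_lambda, normalize):
    # squared error plus L2 penalty, optionally divided by 2 * training size
    sq_err = np.sum((y_true - y_pred) ** 2)
    reg = reg_lambda * np.sum(w ** 2)
    total = sq_err + reg
    return total / (2 * y_true.size) if normalize else total

rng = np.random.default_rng(0)
w = np.array([0.5])
for m in (100, 1000):
    y_true = rng.normal(size=m)
    y_pred = y_true + 0.1  # constant residual of 0.1 per point
    print(m,
          cost(y_true, y_pred, w, 0.01, normalize=False),  # grows with m
          cost(y_true, y_pred, w, 0.01, normalize=True))   # stays near 0.005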

Regarding the multiplication by 2: that's just there to make things look nice when you take the derivative of the squared term.
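
To see why, take a simple linear model y ≈ w*x (just for illustration, with the regularization term left out). With the 1/(2m) factor, the 2 that comes down from differentiating the square cancels:

J(w) = \frac{1}{2m}\sum_{i=1}^{m}\left(y^{(i)} - w x^{(i)}\right)^{2}

\frac{\partial J}{\partial w} = \frac{1}{2m}\sum_{i=1}^{m} 2\left(y^{(i)} - w x^{(i)}\right)\left(-x^{(i)}\right) = -\frac{1}{m}\sum_{i=1}^{m}\left(y^{(i)} - w x^{(i)}\right)x^{(i)}

Scaling the cost by a constant doesn't move its minimum, so gradient descent finds the same weights; the factor of 2 just keeps the gradient formula tidy.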

Explained here too:
https://math.stackexchange.com/questions/884887/why-divide-by-2m/884901