ebert (13) [Avatar] Offline
In chapter 3, the same training algorithm is executed differently in two scripts, and I don't quite understand why.

In listing 3.3, the code for running the algorithm is:

for epoch in range(training_epochs):
    for (x, y) in zip(trX, trY):
        sess.run(train_op, feed_dict={X: x, Y: y})

However, in listing 3.5, which applies regularization, a similar step (inside the reg_lambda loop) is coded as:

for reg_lambda in np.linspace(0, 1, 100):
    for epoch in range(training_epochs):
        sess.run(train_op, feed_dict={X: x_train, Y: y_train})

Why is it not necessary to loop through the (x, y) values in the second case?


Finally, I found the explanation for the difference in another tutorial. You can always decide how many points to include in each training step of the gradient descent method: one at a time, which is the case in the first script (stochastic gradient descent), or all of them at once, which is the case in the second one (batch gradient descent). It is also possible to proceed slice by slice (mini-batch gradient descent).
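To make the three variants concrete, here is a minimal sketch of mini-batch gradient descent for a simple 1-D linear regression, written in plain NumPy rather than the book's TensorFlow code so it stands alone (the data, learning rate, and batch size are all made up for illustration). Setting batch_size to 1 recovers the point-by-point loop of listing 3.3 (stochastic), and setting it to the full dataset size recovers the whole-array step of listing 3.5 (batch):

```python
import numpy as np

# Hypothetical data for y = 2*x plus noise (not from the book).
rng = np.random.default_rng(0)
x_train = rng.uniform(-1, 1, 100)
y_train = 2.0 * x_train + rng.normal(0, 0.1, 100)

w = 0.0               # single model parameter (slope)
learning_rate = 0.1
batch_size = 25       # 1 => stochastic GD, len(x_train) => batch GD

for epoch in range(100):
    # Shuffle once per epoch, then step through contiguous slices.
    idx = rng.permutation(len(x_train))
    for start in range(0, len(x_train), batch_size):
        batch = idx[start:start + batch_size]
        xb, yb = x_train[batch], y_train[batch]
        # Gradient of the mean squared error (y - w*x)^2 w.r.t. w,
        # averaged over the current mini-batch only.
        grad = -2.0 * np.mean(xb * (yb - w * xb))
        w -= learning_rate * grad

print(round(w, 1))  # converges close to the true slope 2.0
```

The only structural difference between the variants is how many rows go into each feed: the update rule itself is unchanged, which is why listing 3.5 can drop the inner loop and pass the whole arrays at once.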

The approaches are not equivalent: each one makes a different trade-off between how quickly the model parameters reach good values and how many computational resources are used.

I don't know if this topic will be covered in later chapters. If not, it might be worth adding a comment about it.
Nishant Shukla (52) [Avatar] Offline
Hi ebert,

You bring up a great point that I did not have a chance to address. I think it would be a good idea to outline the differences more explicitly.
Sorry for the late response, but I hope you're finding your read informative!