diegonc (5) [Avatar] Offline
Hello fellow readers and writers,

Currently I'm reading the section about gradient descent with multiple inputs and outputs,
and despite the nice drawings I'm failing to implement a neural network for the MNIST
dataset that is explained later in the book (I eventually get NaNs all over the place).

After re-reading the previous sections, my understanding is that the following steps are
being performed while learning a multi/multi network:

Assume m inputs and n outputs:

  • Prediction is computed by performing the product of the weights matrix (wm) of
    size n x m with the inputs vector (iv) of size m x 1; resulting in a prediction vector
    (pv) of size n x 1

    pv_i = wm_i1 * iv_1 + ... + wm_im * iv_m

  • The prediction is compared to the goal vector (gv) of the current input by performing
    an element-wise subtraction; this results in a delta vector (dv) of size n x 1 where its
    elements follow the formula:

    dv_i = pv_i - gv_i

  • A weights delta matrix (dm) is calculated by the outer product of the delta vector (n x 1)
    and the inputs vector (m x 1). This step produces a matrix (n x m) where elements follow the formula:

    dm_ij = dv_i * iv_j

  • Finally, the new weights matrix (nwm) is calculated from the element-wise subtraction of
    the initial weights matrix and the alpha scaled weights delta matrix

    nwm_ij = wm_ij - alpha * dm_ij
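    The four steps above can be sketched as one plain-Python function (the function
    name and the example numbers are mine, not the book's):

    ```python
    def grad_descent_step(wm, iv, gv, alpha):
        n, m = len(wm), len(iv)
        # Prediction: pv_i = wm_i1 * iv_1 + ... + wm_im * iv_m  (n x 1)
        pv = [sum(wm[i][j] * iv[j] for j in range(m)) for i in range(n)]
        # Delta: dv_i = pv_i - gv_i  (n x 1)
        dv = [pv[i] - gv[i] for i in range(n)]
        # Weight deltas (outer product): dm_ij = dv_i * iv_j  (n x m)
        dm = [[dv[i] * iv[j] for j in range(m)] for i in range(n)]
        # Update: nwm_ij = wm_ij - alpha * dm_ij
        return [[wm[i][j] - alpha * dm[i][j] for j in range(m)]
                for i in range(n)]
    ```

    Note that `dm` has the same shape (n x m) as the weights matrix, so the
    element-wise update in the last line lines up index for index.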

  • Now, the code samples in the book seem to imply slightly different computations which, in my
    opinion, are at least inconsistent.

    It starts by defining the weight as a matrix where the rows are the vectors of weights applied
    to compute one output.

               #toes %win #fans
    weights = [ [0.1, 0.1, -0.3] #hurt?
              , [0.1, 0.2, 0.0]  #win?
              , [0.0, 1.3, 0.1]  #sad?
              ]

    Then the weight deltas are computed as the outer product of the input and the delta vector...

    def outer_prod(vec_a, vec_b):
      # zeros_matrix (defined in the book) builds a len(vec_a) x len(vec_b)
      # matrix filled with zeros
      out = zeros_matrix(len(vec_a), len(vec_b))
      for i in range(len(vec_a)):
        for j in range(len(vec_b)):
          out[i][j] = vec_a[i] * vec_b[j]
      return out
    weight_deltas = outer_prod(input, delta)

    ...resulting in a matrix that looks like what's transcribed below:

                     | in_0*d_0   in_0*d_1   in_0*d_2 |
    weight_deltas =  | in_1*d_0   in_1*d_1   in_1*d_2 |
                     | in_2*d_0   in_2*d_1   in_2*d_2 |
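    A quick way to see the mismatch is to compare both argument orders: swapping
    the arguments of the outer product simply transposes the matrix, so the book's
    `outer_prod(input, delta)` differs from the `dm_ij = dv_i * iv_j` orientation
    derived above exactly by a transpose. A self-contained check (the delta values
    are made-up illustrative numbers):

    ```python
    def outer_prod(vec_a, vec_b):
        # out[i][j] = vec_a[i] * vec_b[j]
        return [[vec_a[i] * vec_b[j] for j in range(len(vec_b))]
                for i in range(len(vec_a))]

    input = [8.5, 0.65, 1.2]   # toes, %win, #fans
    delta = [0.2, -0.1, 0.4]   # hurt?, win?, sad? (illustrative values)

    a = outer_prod(input, delta)   # book's order:   a[i][j] = in_i * d_j
    b = outer_prod(delta, input)   # derived order:  b[i][j] = d_i  * in_j
    # a is exactly the transpose of b:
    same = all(a[i][j] == b[j][i] for i in range(3) for j in range(3))
    ```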

    And finally the weights are updated:

    for i in range(len(weights)):
      for j in range(len(weights[0])):
        weights[i][j] -= alpha * weight_deltas[i][j]

    However, note that the meanings of the elements being subtracted are mixed up. For instance, index (0,1)
    of the weight_deltas matrix is computed from the input corresponding to toes (in_0) and the delta corresponding
    to the win? output (d_1), yet it is subtracted from the original weight that relates %win to the hurt? output.

    That doesn't make any sense to me. But, as I was having some trouble training the neural network, I thought
    it would be a good idea to reach out to the community. Maybe some of you figured some way to make it work.
    What have you tried? Is your network recognising characters yet?

    Joey (17) [Avatar] Offline
    try using numpy to transpose your weight_deltas like:

    import numpy as np
    weight_deltas = np.array(outer_prod(input, delta)).T

    This will correctly adjust the weights, and the errors will reach 0.
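    A small self-contained check that the transpose puts each entry where the
    derivation in the question expects it (the input values follow the book's
    toes/%win/#fans example; the delta values are made up for illustration):

    ```python
    import numpy as np

    def outer_prod(vec_a, vec_b):
        return [[vec_a[i] * vec_b[j] for j in range(len(vec_b))]
                for i in range(len(vec_a))]

    input = [8.5, 0.65, 1.2]   # toes, %win, #fans
    delta = [0.2, -0.1, 0.4]   # hurt?, win?, sad? (illustrative values)

    weight_deltas = np.array(outer_prod(input, delta)).T
    # After transposing, row i pairs delta[i] with every input, so
    # weight_deltas[i][j] == delta[i] * input[j], matching np.outer(delta, input)
    assert np.allclose(weight_deltas, np.outer(delta, input))
    ```

    Equivalently, `np.outer(delta, input)` produces the correctly oriented matrix
    directly, without the transpose.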