Hello fellow readers and writers,

I'm currently reading the section about gradient descent with multiple inputs and outputs, and despite the nice drawings I'm failing to implement a neural network for the MNIST dataset that is explained later in the book (I eventually get NaNs all over the place).

After re-reading the previous sections, my understanding is that the following steps are performed when training a multi-input/multi-output network:

Assume *m* inputs and *n* outputs:

Prediction is computed as the product of the weights matrix (**wm**) of size n x m with the inputs vector (**iv**) of size m x 1, resulting in a prediction vector (**pv**) of size n x 1:

`pv_i = wm_i1 * iv_1 + ... + wm_im * iv_m`
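
In plain Python (no numpy), the way I read it, this step is just a matrix-vector product:

```python
# Minimal sketch of the prediction step as a plain matrix-vector product.
# wm is an n x m list of lists, iv is a plain list of length m.
def predict(wm, iv):
    # pv_i = wm_i1 * iv_1 + ... + wm_im * iv_m
    return [sum(w * x for w, x in zip(row, iv)) for row in wm]
```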

The prediction is compared to the goal vector (**gv**) of the current input by performing an element-wise subtraction; this results in a delta vector (**dv**) of size n x 1 whose elements follow the formula:

`dv_i = pv_i - gv_i`
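
Or, as code:

```python
# dv_i = pv_i - gv_i, element-wise over the n outputs
def deltas(pv, gv):
    return [p - g for p, g in zip(pv, gv)]
```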

A weights delta matrix (**dm**) is calculated as the outer product of the delta vector (n x 1) and the inputs vector (m x 1). This step produces an n x m matrix whose elements follow the formula:

`dm_ij = dv_i * iv_j`
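
Which in code would be something like:

```python
# dm_ij = dv_i * iv_j: rows are indexed by the delta (output) vector
def weight_deltas_matrix(dv, iv):
    return [[d * x for x in iv] for d in dv]
```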

Finally, the new weights matrix (**nwm**) is calculated by element-wise subtraction of the alpha-scaled weights delta matrix from the initial weights matrix:

`nwm_ij = wm_ij - alpha * dm_ij`
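
Putting the four steps together, one learning pass over a single example would look roughly like the sketch below (my reading of the section, with made-up function names, not the book's code):

```python
def learning_step(wm, iv, gv, alpha):
    # 1. Prediction: pv_i = wm_i1 * iv_1 + ... + wm_im * iv_m
    pv = [sum(w * x for w, x in zip(row, iv)) for row in wm]
    # 2. Deltas: dv_i = pv_i - gv_i
    dv = [p - g for p, g in zip(pv, gv)]
    # 3. Weight deltas: dm_ij = dv_i * iv_j
    dm = [[d * x for x in iv] for d in dv]
    # 4. Update: nwm_ij = wm_ij - alpha * dm_ij
    return [[w - alpha * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(wm, dm)]
```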

Now, the code samples in the book seem to imply *slightly* different computations which, in my opinion, are at least inconsistent.

It starts by defining the weights as a matrix where each row is the vector of weights used to compute one output.

```python
           #toes %win  #fans
weights = [ [0.1, 0.1, -0.3],  #hurt?
            [0.1, 0.2,  0.0],  #win?
            [0.0, 1.3,  0.1] ] #sad?
```

Then the weight deltas are computed as the outer product of the input and the delta vector...

```python
def outer_prod(vec_a, vec_b):
    # The book's zeros_matrix presumably returns a len(vec_a) x len(vec_b)
    # matrix of zeros; I've inlined it so the snippet runs on its own.
    out = [[0.0] * len(vec_b) for _ in range(len(vec_a))]
    for i in range(len(vec_a)):
        for j in range(len(vec_b)):
            out[i][j] = vec_a[i] * vec_b[j]
    return out

weight_deltas = outer_prod(input, delta)
```

...resulting in a matrix that looks like what's transcribed below:

```
                | in_0*d_0  in_0*d_1  in_0*d_2 |
weight_deltas = | in_1*d_0  in_1*d_1  in_1*d_2 |
                | in_2*d_0  in_2*d_1  in_2*d_2 |
```

And finally the weights are updated:

```python
for i in range(len(weights)):
    for j in range(len(weights[0])):
        weights[i][j] -= alpha * weight_deltas[i][j]
```

However, note that the meanings of the elements being subtracted are mixed up. For instance, index (0,1) of the matrix is computed from the input corresponding to the toes (in_0) and the delta corresponding to the win? output (d_1), yet it is subtracted from the weight that relates %win to the hurt? output.
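
If my derivation above is right, swapping the arguments so the deltas index the rows would line the three meanings up again:

```python
# What I'd expect from my derivation: dm_ij = dv_i * iv_j, so the
# delta (output) vector indexes the rows, matching weights[output][input].
weight_deltas = outer_prod(delta, input)
```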

That doesn't make any sense to me. But, as I was having some trouble training the neural network, I thought it would be a good idea to reach out to the community. Maybe some of you have figured out a way to make it work.

What have you tried? Is your network recognising characters yet?

Cheers