Page 52:

In boxes 3 and 5, the prediction of 0.85 is actually 0.935.

In box 4, the prediction of 0.85 is actually 0.765.

Pages 82, 83, and 84:

The 0.14 delta is actually -0.14.

It was explained before chapter 12 in the book that if one does not add a non-linear layer to the network, then the additional layers of the network can be expressed as a non-deep network.

However, in the RNN example in chapter 12, I do not see any non-linear functions being applied to the hidden layer of the RNN. It is mentioned only in the conclusion that the trained network is a linear recurrent neural network.

I believe that mentioning this earlier (together with the RNN code) would make the content clearer.

[code]import numpy as np

def softmax(x):
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0)
[/code]

Typically softmax is defined with the subtraction of np.max(x).

It would be good if the text clarified why this subtraction is used in this case (and whether it is needed, since both the usual softmax and this softmax lie in the range 0-1, but it is not obvious whether the predicted probabilities change).
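For what it's worth, subtracting the max is a standard numerical-stability trick, and it does not change the output probabilities (softmax is shift-invariant). A quick check:

```python
import numpy as np

def softmax(x):
    # stable version: shift by the max before exponentiating
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0)

def softmax_naive(x):
    # textbook definition without the shift
    e_x = np.exp(x)
    return e_x / e_x.sum(axis=0)

x = np.array([1.0, 2.0, 3.0])
print(np.allclose(softmax(x), softmax_naive(x)))  # True: same probabilities
print(softmax(np.array([1000.0, 1001.0])))        # still finite; the naive version overflows here
```

So the subtraction only protects against overflow for large inputs; the resulting probabilities are identical.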


[quote]
The most important object is our layers list, which has two vectors (layer['state'] and layer['previous->hidden']). In order to backpropagate, we're going to take our output gradient and add a new object to each list called layer['state_delta'] which is going to represent the gradient at that layer.
[/quote]

First of all, there is no layers list in the code.

Secondly, the variables in the code on the same page are named layer['hidden'] instead of layer['state'], and layer['hidden_delta'] instead of layer['state_delta'].

I'm not able to find which variable layer['previous->hidden'] refers to.

[quote]
However, we've replaced the identity matrix with a matrix called "recurrent", which is initialized to be all zeros (and will be learned through training)
[/quote]

But I'm not able to find the code for the initialization of a matrix with all zeros anywhere in the book, nor in the provided code for chapter 12.

Possibly, the book or the code should be corrected.

[b]The Surprising Power of Averaged Word Vectors[/b]

The execution of the following lines would lead to an error:

[code]

import numpy as np

norms = np.sum(weights_0_1 * weights_0_1,axis=1)

[/code]

because weights_0_1 is not declared in this notebook.

[quote]
Thus, because we're taking an average of a noisy signal (i.e.
[/quote]

Could you please dedicate a sentence to the "presumably" fictitious nature of the "toes" statistic?

I know it might sound odd that I personally spent hours investigating this (annoying) matter; I even called all my friends who know anything about sports, to no avail. A simple sentence along these lines would help: "the number of toes is a humorous stat describing the feet of the athletes, with a complete set believed to give the best stability while cornering; this isn't a published stat, so just follow along without overthinking it." Believe it or not, as the chapter progresses the plot surrounding the "toes" gets thicker and thicker until one gets thoroughly confused (mostly me).

Thank you!

[code]from __future__ import division[/code]

Maybe it should be mentioned in the book? Thanks!
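For context, in Python 2 the `/` operator truncates integer division unless that import is present; Python 3's two operators show the distinction (a minimal sketch):

```python
# In Python 3 (and in Python 2 after `from __future__ import division`),
# `/` is true division and `//` is floor division.
print(1 / 2)   # 0.5  (in plain Python 2 this would print 0)
print(1 // 2)  # 0    (explicit floor division in both versions)
```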

My post entitled "Calculating Error and Delta Updated" concerns the same issue.

What do you think of my conclusion?

Regards

Anthony

In chapter 6 the error is error = (goal_prediction - prediction) ** 2 and delta = prediction - goal_prediction. So now the error is calculated the opposite way, i.e. the prediction is subtracted from the goal.

The question is does it matter whether you subtract prediction from goal, or subtract goal from prediction?

I have noticed that if the node delta is prediction - true, then the weight delta has to be SUBTRACTED from the weights. This seems to give the derivative because the sign of the weight delta is that of the slope, so you need to subtract it from the weights so they move in the opposite direction of the slope.

However if the node delta is true - prediction, then the weight delta has to be ADDED to the weights. This seems to give the weight delta directly so you just add it to the weights.
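The two conventions can be checked side by side; a minimal one-weight sketch (numbers hypothetical):

```python
# One-weight gradient descent under both sign conventions.
input_val, goal, lr = 0.5, 0.8, 0.1

# Convention A: delta = prediction - goal, SUBTRACT the weight delta.
w_a = 0.0
for _ in range(1000):
    delta = input_val * w_a - goal
    w_a -= lr * delta * input_val

# Convention B: delta = goal - prediction, ADD the weight delta.
w_b = 0.0
for _ in range(1000):
    delta = goal - input_val * w_b
    w_b += lr * delta * input_val

print(w_a, w_b)  # both converge to the same weight (goal / input = 1.6)
```

The trajectories are identical step for step, since flipping the delta's sign and flipping the update's sign cancel out.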

Is this fact of any consequence?

Regards

Anthony Perkins

There isn't. The only other "from scratch" books I could find are these:

https://www.oreilly.co.jp/books/9784873117584/

https://www.oreilly.co.jp/books/9784873118369/

The first one has some mention of RL, but I couldn't find any English translation of it.

[/quote]

The first one has a Korean translation, and I have a copy; however, it does not cover RL. Though I cannot read the second one, for my lack of Japanese, it seems to be a book on NLP. [/quote]

Thanks! I could only find a Chinese translation of the original text -- hopefully someone will come up with English translations for both the first and second books. They seem to have good reviews.

pred = (8.5 * 0.1) + (0.65 * 0.2) + (1.2 * -0.1) = 0.85 + 0.13 - 0.12 = 0.86

true = 1

delta = pred - true = 0.86 - 1 = -0.14

So, the delta value must be -0.14, not 0.14.

[b]Also, wlrec[0] is different in the code and the figure.[/b]
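The arithmetic checks out in code (using the post's numbers):

```python
import numpy as np

inputs = np.array([8.5, 0.65, 1.2])
weights = np.array([0.1, 0.2, -0.1])
true = 1.0

pred = inputs.dot(weights)   # 0.85 + 0.13 - 0.12 = 0.86
delta = pred - true          # -0.14, not 0.14
print(round(pred, 2), round(delta, 2))  # 0.86 -0.14
```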

This not only makes the text look fuzzy instead of crisp, but also makes the book a lot harder to use. I was happily copy-pasting examples out of version 10...

And with the book labeled "finished" (despite major line-break problems and typos), I worry that this will be the final product.

-- Jds


Go to https://www.manning.com/books/grokking-deep-learning and click on "author's letter on completing manuscript".

The chapter "we" waited for the most. People are disappointed.

I'm sad to see this in the book's description:

"And at the end, you'll even build an A.I. that will learn to defeat you in a classic Atari game."

Because I plan to purchase Grokking Deep Reinforcement Learning, can the descriptions of the other MEAP books be trusted?

Furthermore, the description on pages 181-182 (Intro to the Embedding Layer) discusses one-hot encoding of words and how summing the vectors avoids a matrix-vector multiplication. But the code uses word indices instead of one-hot encoding. Also, the concept of embeddings is quite confusing, especially the significance of their hidden size (100 in the example code).
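The two routes are in fact equivalent: multiplying a one-hot (or multi-hot) vector by the weight matrix just selects and sums rows, which is why the code can index directly; the hidden size is simply the length of each word's embedding row. A small sketch (all names hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden_size = 5, 100   # hidden_size = embedding length per word
weights_0_1 = rng.normal(size=(vocab_size, hidden_size))

word_indices = [1, 3]              # the words present in a (tiny) input

# Route 1: multi-hot vector times the weight matrix
multi_hot = np.zeros(vocab_size)
multi_hot[word_indices] = 1
via_matmul = multi_hot.dot(weights_0_1)

# Route 2: index the rows directly and sum -- no multiplication needed
via_index = weights_0_1[word_indices].sum(axis=0)

print(np.allclose(via_matmul, via_index))  # True
```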

[code]layer_2_delta = (layer_2 - walk_stop[0:1])
              = -0.02129555 - 1
              = -1.02129555[/code]

Therefore, layer_2_delta should be [b]-1.02[/b] instead of 0.14.

At [b](3) LEARN: Backpropagate From layer_2 to layer_1[/b],

[code]layer_1_delta = layer_2_delta.dot(weights_1_2.T)
              = [-1.02129555] * [0.07763347, -0.16161097, 0.370439]
              = [-0.07928672, 0.16505257, -0.3783277]

layer_1_delta *= relu2deriv(layer_1)
              = [-0.07928672, 0.16505257, -0.3783277] * [0, 1, 0]
              = [0, 0.16505257, 0]
[/code]

Hence, layer_1_delta for the 2nd column (row in the diagram) should be [b].17[/b] instead of -.17.
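These numbers can be verified with NumPy (shapes assumed from the chapter's single-output network):

```python
import numpy as np

layer_2_delta = np.array([[-1.02129555]])                          # shape (1, 1)
weights_1_2 = np.array([[0.07763347], [-0.16161097], [0.370439]])  # shape (3, 1)

layer_1_delta = layer_2_delta.dot(weights_1_2.T)       # shape (1, 3)
layer_1_delta = layer_1_delta * np.array([[0, 1, 0]])  # relu2deriv mask

print(layer_1_delta)  # middle value is +0.16505..., not -0.17
```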


1. It seems there is a problem with the bold arrows. They should come in from the [u]three inputs[/u] to the [b]"win?"[/b] prediction instead of coming out of just the [b]"win & loss"[/b] input to the [u]three predictions[/u].

2. The error for [b]"win?"[/b] isn't [i].96[/i] but [i].0004[/i].

3. There is no function named zeros_matrix, granting that the w_sum function was just omitted for convenience. I imported [b]numpy[/b] and used [b]numpy.zeros[/b] for this one.
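If you'd rather stay dependency-free like the rest of the chapter, a minimal pure-Python stand-in (the name and nested-list shape are assumptions) could be:

```python
def zeros_matrix(rows, cols):
    # Build a rows-by-cols nested list of zeros, like numpy.zeros((rows, cols))
    return [[0.0 for _ in range(cols)] for _ in range(rows)]

weight_deltas = zeros_matrix(3, 3)
print(weight_deltas)
```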


[b](1) An Empty Network With Multiple Outputs[/b]

Because we are using the [i]scalar_ele_mul[/i] function in this example, [i]pred[/i] should use this function as well, so the correct code would be:

[code]def neural_network(input, weights):
    pred = scalar_ele_mul(input, weights)
    return pred
[/code]

[b](2) PREDICT: Make a Prediction and Calculate Error and Delta & (3) COMPARE: Calculating Each "Weight Delta" and Putting It on Each Weight[/b]

The values of [i]Delta[/i] & [i]Error[/i] should be switched for both [i]win?[/i] & [i]sad?[/i] in the diagram:

[code]pred[1]  = 0.65 * 0.2 = 0.13
delta[1] = 0.13 - 1.0 = -0.87
error[1] = (-0.87) ** 2 = 0.757

pred[2]  = 0.65 * 0.9 = 0.585
delta[2] = 0.585 - 0.1 = 0.485
error[2] = 0.485 ** 2 = 0.235
[/code]

[b](3) COMPARE: Calculating Each "Weight Delta" and Putting It on Each Weight & (4) LEARN: Updating the Weights[/b]

[i]weight_deltas[/i] should be [i]input[/i] multiplied by [i]deltas[/i], not [i]weights[/i], so the correct code would be:

[code]weight_deltas = scalar_ele_mul(input, delta)[/code]
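Assuming the book's scalar_ele_mul(number, vector) multiplies every element by a scalar, and targets inferred from the deltas above, the corrected numbers check out:

```python
def scalar_ele_mul(number, vector):
    # multiply every element of the vector by the scalar
    return [number * v for v in vector]

wlrec = 0.65                 # the single input
weights = [0.3, 0.2, 0.9]    # hurt?, win?, sad?
true = [0.1, 1.0, 0.1]       # targets assumed from the deltas in this post

pred = scalar_ele_mul(wlrec, weights)
delta = [p - t for p, t in zip(pred, true)]
error = [d ** 2 for d in delta]

print([round(d, 3) for d in delta])  # [..., -0.87, 0.485]
print([round(e, 3) for e in error])  # [..., 0.757, 0.235]
```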


There are a couple of minor mistakes in the code for [i](5) COMPARE + LEARN: Comparing our Errors and Setting our New Weight[/i]:

1) In Python, the [b]OR[/b] operator is [b]or[/b], not [b]||[/b] as in C descendants or Java.

2) The last [b]if[/b] statement should compare [b]e_up[/b] against [b]e_dn[/b].

So, the correct code would be:

[code]if(error > e_dn or error > e_up):
    if(e_dn < e_up):
        weight -= lr
    if(e_up < e_dn):
        weight += lr[/code]

In the example of dot products:

[code]h = np.zeros((5,4)).T  # matrix with 4 rows and 5 columns
i = np.zeros((5,6))     # matrix with 6 rows & 5 columns
j = h.dot(i)
print(j.shape)          # outputs (4,6)
[/code]

As [b]i[/b] isn't transposed like [b]h[/b] is, the comment for [b]i[/b] should be [b]matrix with 5 rows & 6 columns[/b] instead. It's a minor mistake, but I just wanted to let you know.
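The shapes are easy to confirm:

```python
import numpy as np

h = np.zeros((5, 4)).T   # transposed: 4 rows, 5 columns
i = np.zeros((5, 6))     # 5 rows, 6 columns (not 6 rows & 5 columns)
j = h.dot(i)             # (4,5) . (5,6) -> (4,6)
print(h.shape, i.shape, j.shape)  # (4, 5) (5, 6) (4, 6)
```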

In the [b]vect_mat_mul[/b] function on pg. 41 of MEAP 11:

[code]def vect_mat_mul(vect,matrix):
    assert(len(vect) == len(matrix))
    output = [0,0,0]
    for i in range(len(vect)):
        output[i] = w_sum(vect,matrix[i])
    return output[/code]

[b]assert[/b] is comparing the length of the vector to the length of the matrix, which is actually the number of rows in the matrix. However, because each row in the matrix represents one prediction, shouldn't the length of the inputs (the vector) be compared to the length of the weights per prediction (the number of columns in the matrix) instead, something like this?

[code]def vect_mat_mul(vect,matrix):
    assert(len(vect) == len(matrix[0]))
    output = [0,0,0]
    for i in range(len(vect)):
        output[i] = w_sum(vect,matrix[i])
    return output[/code]

To me, it's like multiplying a 3 x 3 matrix by a 3 x 1 column vector: comparing the number of rows in the matrix to the number of rows in the column vector doesn't seem right. The number of columns in the matrix should match the number of rows in the column vector for the multiplication to be defined.
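A non-square example makes the difference visible: with 4 output rows and 3 inputs, the original assert would reject a perfectly valid multiplication, while the corrected one passes (helper names follow the book's convention but the data is made up):

```python
def w_sum(a, b):
    assert len(a) == len(b)
    return sum(x * y for x, y in zip(a, b))

def vect_mat_mul(vect, matrix):
    # corrected: the inputs must match the columns of the matrix
    assert len(vect) == len(matrix[0])
    return [w_sum(vect, row) for row in matrix]

vect = [1.0, 2.0, 3.0]
matrix = [[1, 0, 0],
          [0, 1, 0],
          [0, 0, 1],
          [1, 1, 1]]   # 4 rows (outputs), 3 columns (inputs)

print(vect_mat_mul(vect, matrix))  # [1.0, 2.0, 3.0, 6.0]
# the original `assert len(vect) == len(matrix)` would fail here: 3 != 4
```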

Can you please reissue this MEAP? Thanks!

bad understanding of why this code works.

https://forums.manning.com/posts/list/43420.page;jsessionid=5CC885B413FB3E134959D0D9D5F71C10


After training I thought I'd be able to use the learned weights to make predictions:

[code]prediction = streetlights[1].dot(weights)[/code]

Even if I increase the number of iteration during training, the above prediction never comes close to the correct answer (i.e. [tt]walk_vs_stop[1][/tt]).

After training is complete, isn't that how to use the learned weights to make a prediction? TIA
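It isn't, for the two-layer network of that chapter: a prediction has to go through both weight matrices and the relu, not a single `weights` array. A minimal sketch of the chapter's setup (hyperparameters assumed from the book's example):

```python
import numpy as np

np.random.seed(1)

def relu(x):
    return (x > 0) * x

streetlights = np.array([[1, 0, 1],
                         [0, 1, 1],
                         [0, 0, 1],
                         [1, 1, 1]])
walk_vs_stop = np.array([[1, 1, 0, 0]]).T

alpha, hidden_size = 0.2, 4
weights_0_1 = 2 * np.random.random((3, hidden_size)) - 1
weights_1_2 = 2 * np.random.random((hidden_size, 1)) - 1

for _ in range(60):
    for i in range(len(streetlights)):
        layer_0 = streetlights[i:i + 1]
        layer_1 = relu(layer_0.dot(weights_0_1))
        layer_2 = layer_1.dot(weights_1_2)
        layer_2_delta = layer_2 - walk_vs_stop[i:i + 1]
        layer_1_delta = layer_2_delta.dot(weights_1_2.T) * (layer_1 > 0)
        weights_1_2 -= alpha * layer_1.T.dot(layer_2_delta)
        weights_0_1 -= alpha * layer_0.T.dot(layer_1_delta)

# Prediction: a forward pass through BOTH layers,
# not streetlights[1].dot(weights)
prediction = relu(streetlights[1:2].dot(weights_0_1)).dot(weights_1_2)
print(prediction)  # close to walk_vs_stop[1] == 1
```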

[code]pred = neural_network(input,weight)[/code]

while the variable is supposed to be weights, as initialized in the 2nd line:

[code]weights = np.array([0.1, 0.2, 0])[/code]

So, the correct code is supposed to be:

[code]pred = neural_network(input,weights)[/code]

[code]for i in range(a):[/code]

I believe it is supposed to be:

[code]for i in range(len(a)):[/code]

[code]layer_1 = np.mean(weights_0_1[left_context+right_context],axis=0)[/code]

The same question applies later, when updating weights_0_1:

[code]weights_0_1[left_context+right_context] -= layer_1_delta * alpha[/code]

My guess is that it helps to link nearby words in a sentence, but I'm not sure why np.mean does the work?


I'm just excited.

function definition (which has an argument with the same name) is confusing and feels out of place.

I suggest you open an erratum so they can incorporate your fix. Well done ;-)

[quote]MEAPs from the Grokking series are only available in PDF. They are PDF-only products during the MEAP. A final edition will be available in all formats.[/quote]

@page 41, the "vector of zeros" should be defined, maybe:

[code]output = [0]*len(vector)[/code]

Why compute the average of the *layer_2 deltas* with np.outer and not use the dot product instead (before the weights_1_2 update)? # line 11 of your code

Thanks.

Edit: ah, OK, sorry: it's a "vector * vector" product, not vector / matrix... that's fine.

On page 174, layer_2_delta is calculated as follows:

[code]layer_2_delta = (labels[batch_start:batch_end] - layer_2) / (batch_size * layer_2.shape[0])[/code]

I'm confused about why the division is done. Running the code without the division tells me that it provides some moderation to the weight updates, but I don't understand the intuition behind picking the exact value batch_size * layer_2.shape[0].

Thanks!

I really appreciate the use of code without dependencies to get the concepts across. I've been slowly amassing the notebooks as I work through each example. I use python 3 and am not largely familiar with python 2. I'm attributing much of the debugging to version changes and MEAP.

I'm on the edge of my seat waiting for more chapters and fingers crossed the Dec 2017 publish date is still on track.

Thanks Andrew for a wonderful look at the nuance of neural networks.


[code]left_context = review[max(0,target_i-window):target_i]
right_context = review[target_i+1:min(len(review),target_i+window)][/code]

The right context will only have one word as context.

It should be:

[code]right_context = review[target_i+1:min(len(review),target_i+1+window)][/code]

That is, the ending index has to be target_i+1+window (and not target_i+window).

The result may only improve slightly between a one-word and a two-word right context, but I wanted to state the correction.
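A quick check of the slice bounds (window = 2, position 5 in a 10-token review) confirms the off-by-one:

```python
review = list(range(10))   # stand-in tokens
window, target_i = 2, 5

left_context = review[max(0, target_i - window):target_i]
right_buggy = review[target_i + 1:min(len(review), target_i + window)]
right_fixed = review[target_i + 1:min(len(review), target_i + 1 + window)]

print(len(left_context), len(right_buggy), len(right_fixed))  # 2 1 2
```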


When I posted this question, I had just finished chapter 5 on gradient descent and was confused that it didn't actually cover gradient descent through a hidden layer. I was so confused that I actually stopped at chapter 5 and read other sources until I figured it out on my own, and then was happily surprised halfway through chapter 6 that it was covered after all.

I think it'd be helpful to end chapter 5 with a teaser: "we still don't know how to handle hidden layers, but let's look at that next with backpropagation." Since the beginning of chapter 6 starts with a brand-new example problem, it suggests that chapter 5 finished 100% of gradient descent, when that isn't actually true.

[quote]
A parametric model is characterized by having a fixed number of parameters, whereas a non-parametric model's number of parameters is infinite (determined by data).

As an example, let's say the problem was to fit a square peg into the correct (square) hole. Some humans (such as babies) just jam it into all the holes until it fits somewhere (parametric). A teenager, however, might just count the number of sides (4) and then search for the hole with an equal number (non-parametric).
[/quote]

Why is the teenager example called non-parametric when it has a known number of sides (4), which sounds to me like a parametric model's "fixed number of parameters"?

Could this be explained a bit better? Another analogy, maybe?


[tt]input: 0.5
weight: 0.5
output: 0.4
[/tt]

but the output of input x weight should be [i]0.25[/i], not 0.4. The raw error of 0.55 seems to be used correctly, as the mean squared error is shown as 0.30 (instead of 0.3025).

In step 4, our givens are:

[tt]input = 8.5
weight = .1
goal = 1.0
[/tt]

so our calculations should be:

[tt]predicted value = .85
raw error = -0.15
weight delta = input x raw error = 8.5 x -0.15 = -1.275
[/tt]

This also means that step 5 should show a new weight of 0.11275.
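With an alpha of 0.01 (an assumption, inferred from the 0.11275 figure), the update works out:

```python
input_val, weight, goal, alpha = 8.5, 0.1, 1.0, 0.01

pred = input_val * weight             # 0.85
raw_error = pred - goal               # -0.15
weight_delta = input_val * raw_error  # 8.5 * -0.15 = -1.275
weight = weight - alpha * weight_delta

print(round(weight, 5))  # 0.11275
```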

Should be:

"Later, we will unpack how other forms of unsupervised learning are also just a form of clustering and why these clusters are useful for supervised learning."

aaaaand it got updated. :)

These lines of code on page 86:

[code]delta = pred - true
weight_deltas = ele_mul(delta,weights)[/code]

I believe it should be weight_deltas = ele_mul(delta,input), as the image on page 86 shows.

Thanks for writing the book though - incredibly valuable!

This:

[code]
def w_sum(a, b):
    assert(len(a) == len(b))
    output = 0
    for i in range(a):
        output += (a[i] * b[i])
    return output
[/code]

Should be:

[code]
def w_sum(a, b):
    assert(len(a) == len(b))
    output = 0
    for i in range(len(a)):  # the len(a) was missing
        output += (a[i] * b[i])
    return output
[/code]

+ 1

I got quite confused while following this chapter. First it briefly explains what deep learning is, and then drops it completely. Everything after that is titled "machine learning". So, are these topics in the scope of this book, or just mentioned for completeness?

I would have expected a structure like this:

What is machine learning?

What kind of machine learning areas are there?

What kind of areas does deep learning cover?

Every chapter that refers to deep learning should be somehow recognizable by name. Names are important.

[code]weights = [ [0.1, 0.1, -0.3], #hurt?
            [0.1, 0.2, 0.0],  #win?
            [0.0, 1.3, 0.1] ] #sad?

def neural_network(input, weights):
    pred = vect_mat_mul(input,weights)
    return pred

toes  = [8.5, 9.5, 9.9, 9.0]
wlrec = [0.65,0.8, 0.8, 0.9]
nfans = [1.2, 1.3, 0.5, 1.0]

##
# Fixes
# 1) variable names a, b become vect and matrix
# 2) changed range(vect) to range(len(vect))
##
def vect_mat_mul(vect,matrix):
    assert(len(vect) == len(matrix))
    output = 0
    for i in range(len(vect)):
        # print(vect[i], "x", matrix[i])
        output += (vect[i] * matrix[i])
    return output

##
# Call neural_network on the first input for each set of weights
# and print the result
# Fixes
# 1) added variable weight to refer to the current set of weights
##
input = [toes[0],wlrec[0],nfans[0]]

for i in range(len(weights)):
    weight = weights[i]
    pred = neural_network(input,weight)
    print(pred)
[/code]

This prints the results:

0.555
0.9800000000000001
0.9650000000000001

delta = pred - true = 0.86 - 1 = -0.14

[code]import numpy as np

weight_deltas = np.array(outer_prod(input, delta)).T[/code]

This will correctly adjust the weights, and the errors reach 0.

it's only been listed as:

"What's new?
Chapters 1-9 have been updated."

so there are some updates to the previous chapters. For example, page 20, "Non-Parametric Learning", is new.

By the way, enjoying the book :)

Actually, I do get 20 with the code below. If you had reported that you got 3, I would have suggested that maybe you have the return statement inside the for loop - in which case it would break out after one iteration and return 3 - but to get 6 it would have had to break after 2 iterations! Hmmm...

[code]def w_sum(a,b):
    assert len(a) == len(b), "Number of Inputs does not equal number of Weights"
    output = 0
    for i in range(len(a)):
        output += (a[i] * b[i])
    return output

print(w_sum([3,1,5],[1,2,3]))[/code]

# 20

[code]
>>> knob_weight = 0.5
>>> input = 0.5
>>> goal_pred = 0.8
>>> pred = input * knob_weight
>>> error = (pred - goal_pred) ** 2
>>> error
0.30250000000000005
[/code]

This is just one example, but this problem occurs a number of times on page 46.

A few comments regarding the Python snippet which introduces the Jupyter use in the chapter 3 section "What is a Neural Network?" First, it wasn't immediately clear that the Python snippet was actually two columns. You might consider keeping it a single column of code (or include a screenshot after it's entered into a Jupyter notebook). Additionally, it might be worth explicitly mentioning that it's merely Python code that is being typed into Jupyter.

Lastly (and I'm probably not the first to notice or mention this), when I typed the snippet into Jupyter and ran it, it didn't print the value of 'pred' (which is what the text says it will do). I had to add a "print(pred)" statement to see any output. Screenshot attached.

Thanks.

--Troy

however, weight is not defined in either program. Instead, a weights list is defined.

Hence, the code should be:

[code]pred = neural_network(input, weights)[/code]

https://www.slideshare.net/ManningBooks/how-do-neural-networks-make-predictions-73138393 qualifies it:

"This network takes in one datapoint at a time (average number of toes [b]- on the players' feet - on the baseball team[/b])"

He works through a 1D network for the MNIST problem here:

https://github.com/makeyourownneuralnetwork/makeyourownneuralnetwork/blob/master/part2_neural_network_mnist_data.ipynb

This shows what happens with the digit '0':

https://github.com/makeyourownneuralnetwork/makeyourownneuralnetwork/blob/master/part2_mnist_data_set.ipynb

The part I had some trouble finding was how the 10 target neurons are set with the correct answers.

He also triggers the network to generate its own idea of a number. This is my major interest - it's the heart of style transfer.

https://github.com/makeyourownneuralnetwork/makeyourownneuralnetwork/blob/master/part3_neural_network_mnist_backquery.ipynb


Try running the following:

[code]import numpy as np

def vec_Mat_Mult(Mat, Vec):
    # Vec is a vector (a list): a column vector with n rows
    # Mat is a matrix (a list of lists): m x n
    # Mat x Vec is m x 1 : (m x n) x (n x 1)
    z = []
    for i in range(len(Mat)):  # loop over rows of the matrix
        assert (len(Vec) == len(Mat[i]))  # the number of columns of Mat
                                          # must match the rows of Vec
        wtd_sum = 0
        for j in range(len(Vec)):  # loop over columns of the matrix
            wtd_sum += Mat[i][j]*Vec[j]  # perform the sum
        z.append(wtd_sum)
    return z

vec = [1, 2, -20]
mat = [[1, 0, 0],
       [0, 1, 1],
       [-1, 2, 1],
       [0, 0, 1]]

result = vec_Mat_Mult(mat, vec)
result_2 = np.array(mat).dot(np.array(vec))

print(result)
print(result_2)[/code]

[code]  [ 0, 0, 1 ],
  [ 1, 1, 1 ],
  [ 0, 1, 1 ],
  [ 1, 0, 1 ] ] )[/code]

If:

[code]walk_vs_stop = np.array( [ 0 ],
                           [ 1 ],
                           [ 0 ],
                           [ 1 ],
                           [ 1 ],
                           [ 0 ] ] )[/code]

Both lack the leading bracket to make it a list of lists.

[code]for i in range(a):  # TypeError: 'list' object cannot be interpreted as an integer[/code]

[b]Python 3[/b]

fixed with:

[code]for i in range(0, len(a)):[/code]

I'm actually considering focusing more on expanding the book to include the more general methods that state-of-the-art approaches seek to solve (memory, attention, representation), thus giving the best of both worlds. It introduces something that people can actually train on their laptops, while also introducing concepts that will stand the test of time as important to the DL community, whereas something like "Deep Residual Networks" could be replaced again at ICML this year. Coverage on a trendy topic like Deep Residual Nets might be better suited for a blogpost.

I'd love to hear anyone's thoughts on the matter here.[/quote]

I'm also in total agreement. The approaches involving memory, attention, and representation align with the more general strategies developed by biological intelligence through evolution to solve real-world vision problems, unlike the more brute-force approaches requiring immense amounts of data and computing power.