
Please list errors found in the published version of Deep Learning with Python here. If necessary, we'll publish a comprehensive list for everyone's convenience. Thank you!

Listing 3.9 on page 74:
epochs = range(1, len(acc) + 1)
acc is not defined here. Suggested correction:
epochs = range(1, len(loss_values) + 1)

Listing 3.10 on page 75:
plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
acc and val_acc are not defined here. Suggested correction:
plt.plot(epochs, acc_values, 'bo', label='Training acc')
plt.plot(epochs, val_acc_values, 'b', label='Validation acc')

Listing 5.25 on page 161:
There is an extraneous comment in the code:
<1> Its shape is (1, 150, 150, 3)
Suggested correction:
# Its shape is (1, 150, 150, 3)

Listing 6.7 on page 187:
from keras.layers import Flatten, Dense
should be:
from keras.layers import Flatten, Dense, Embedding
because Embedding is needed two lines later.

Page 203, line 7:
function an[d] a multiplication operation

Listing 6.30 on page 209:
<1> temperature (in degrees Celsius)
Suggested correction:
# temperature (in degrees Celsius)

Listing 6.34 on pages 211-212:
The last two lines should be:
val_steps = (300000 - 200001 - lookback) // batch_size
test_steps = (len(float_data) - 300001 - lookback) // batch_size
Currently, they are missing the division by batch_size at the end.

In section 2.2.2 Vectors (1D tensors), on page 31, the code example defines a Numpy array with four entries ([12, 3, 6, 14]), yet the following paragraph discusses a vector having five entries. This could be corrected by adding one more entry to the array in the code.

In section 2.2.5 Key attributes, in the middle of page 33, it says, "More precisely, it’s an array of 60,000 matrices of 28 × 8 integers." That should instead be 28 × 28 integers.

At fold 2 and fold 3 in Figures 3.11 and 4.2, there are two "validation" rectangles in each row. I think each row should have one "validation" rectangle.
This is from the published PDF version purchased at the end of Nov. 2017.

Page 213 of the hardcopy, in the text under evaluate_naive_method, says that the method yields an MAE of 0.28 and that this has to be multiplied by the standard deviation to get the average absolute error.
But the validation data is never normalized. On page 210, the normalization is done only for the training data, which is right.
FYI, I ran the code as is and it gives a result of ~2.57 directly. No need to multiply by std?

Page 33 has the line: More precisely, it’s an array of 60,000 matrices of 28 × 8 integers. That should read: More precisely, it’s an array of 60,000 matrices of 28 × 28 integers.

Section 2.3.3 on page 42 has the line:
Because the rows and x and the columns of y must have the same size,
This should read:
Because the rows of x and the columns of y must have the same size,

The last sentence of section 3.5.7 on page 84 says:
The network is able to cram most of the necessary information into these eight-dimensional representations, but not all of it.
I believe that should be:
The network is able to cram most of the necessary information into these four-dimensional representations, but not all of it.

In Table 4.1, shouldn't the Last-layer activation for Multiclass, multilabel classification problems be "softmax" as well, like the Multiclass, single-label classification problems?

This is not so much an error as a confusingly written section of the book.

Page 36, section 2.2.10: the dimensions of the tensors disagree with the earlier description and with Figure 2.3 on page 35.

People with training in linear algebra (and experience in programming languages like Matlab and Python) will assume that the first axis points down from the top-left corner of the tensor, the second points right from the top-left corner, and the third points into the page. This is also what Figure 2.3 suggests.

According to this convention, the dimensions should be (3, 390, 250) for the stock price dataset and (128, 280, 1000000) for the tweets. Either that or the figure should be changed to represent the actual ordering of the data.
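For readers who want to compare the two conventions concretely, here is a minimal NumPy sketch (shapes taken from the report above; the variable names are mine):

```python
import numpy as np

# The book's ordering for the stock-price example: (samples, timesteps, features)
prices = np.zeros((250, 390, 3))
print(prices.shape)      # (250, 390, 3)
print(prices[0].shape)   # one day's record: (390, 3)

# The ordering suggested by Figure 2.3 / the Matlab convention:
prices_alt = np.zeros((3, 390, 250))
print(prices_alt.shape)  # (3, 390, 250)
```

Under NumPy's convention, the first axis is the one you index first, so prices[i] selects one sample; which ordering is "right" depends only on which axis the figure chooses to draw first.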

When I add the TensorBoard callback as specified in section 7.2.2 my model.fit() fails at the end of the 1st epoch with this error:

2018-01-16 20:19:43.503627: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: You must feed a value for placeholder tensor 'embed_input' with dtype float and shape [?,500]
[[Node: embed_input = Placeholder[dtype=DT_FLOAT, shape=[?,500], _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

model.fit() works fine when I remove the callback.

Errata in Listing 6.3 Using Keras for word-level one-hot encoding

In Listing 6.3 the code uses texts_to_matrix(samples, mode='binary') to get the results of word-level one-hot encoding, but the results are not one-hot encodings and are totally different from those of Listing 6.1.

The answers are not one-hot encoded; they always remain zeros.

The following code works:

answers = np.zeros(shape=(num_samples, answer_vocabulary_size))
indices = np.random.randint(0, answer_vocabulary_size, size=num_samples)
for i, x in enumerate(answers):
    x[indices[i]] = 1

In describing the meaning of positive or negative derivatives it is said that:

if a [the derivative] is negative, it means a small change of x around p will result in a decrease of f(x)

and similar phrasing for the case where a is positive.

This is not true, and it should be phrased as: if a is negative, increasing x by a small amount (not just changing it in any direction by a small amount) will result in a decrease of f(x).

As it currently stands, it does not accurately describe the relationship:
Derivative > 0: increase x -> increase f(x)
Derivative < 0: increase x -> decrease f(x)
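The point can be checked numerically with any function whose slope is known; a tiny sketch (my own example, not from the book):

```python
# f(x) = -2x has derivative -2 < 0 everywhere.
f = lambda x: -2 * x
p, eps = 1.0, 0.01

# Increasing x by a small amount decreases f(x)...
print(f(p + eps) < f(p))  # True
# ...but *decreasing* x by a small amount increases f(x),
# so "a small change of x" in either direction is not enough.
print(f(p - eps) > f(p))  # True
```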

The Conv2D padding should be "same", but the default is "valid".

layers.concatenate([branch_a, branch_b, branch_c, branch_d], axis=-1) raises the error "`Concatenate` layer requires inputs with matching shapes except for the concat axis" if padding is "valid".
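The mismatch can be seen from the usual output-size arithmetic for stride-1 convolutions (a sketch of the formulas, not Keras itself):

```python
# Spatial output size of a 2D convolution with stride 1:
#   'valid': out = in - kernel + 1
#   'same' : out = in
def conv_out(size, kernel, padding):
    return size if padding == 'same' else size - kernel + 1

# With 'valid' padding, branches using different kernel sizes produce
# different spatial shapes, so concatenating along the channel axis fails:
print(conv_out(32, 1, 'valid'))  # 32 (a 1x1 branch)
print(conv_out(32, 3, 'valid'))  # 30 (a 3x3 branch) -- shapes differ

# With 'same' padding every branch keeps the input's spatial size:
print(conv_out(32, 1, 'same'), conv_out(32, 3, 'same'))  # 32 32
```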

In section 2.4.4, second paragraph, "...such a chain of functions can be derived using the following identity..." the word 'derived' should be 'differentiated.'

The training plots and validation accuracy are duplicated between the 'Feature extraction with augmentation' Figures 5.17 and 5.18 and the 'Fine-tuning the last convolutional block' Figures 5.20 and 5.21.

The results quoted above 5.17 are also wrong; they are from the 'fine tuning the last convolutional blocks' section. I get a validation accuracy of 90.4% and a test set accuracy of 88.4% with augmentation.

The text of section 2.4 should better explain the differences between:
1) backpropagation: the general idea that the output error should be propagated backward through the network in order to update the weights, probably invented by Paul Werbos in 1974 and strongly influenced by ideas from control theory and cybernetics going back to the '50s;
2) the reverse mode of automatic differentiation for gradient backpropagation: the mechanism introduced by the Finn Seppo Linnainmaa in 1970 and rediscovered by LeCun ~1982, Parker ~1982, and Hinton and Rumelhart around 1985;
3) the chain rule, invented by Leibniz in 1676; and
4) gradient descent: a family of optimization methods; Cauchy invented the gradient descent method in 1847.
Those concepts and their relations are pretty confusing for many people.

In 5.1 Introduction to convnets, in the following sentence before Figure 5.4:

For instance, with 3 × 3 windows, the vector output[i, j, :] comes from the 3D patch input[i-1:i+1, j-1:j+1, :]. The full process is detailed in figure 5.4.

It should be:

For instance, with 3 × 3 windows, the vector output[i, j, :] comes from the 3D patch input[i-1:i+2, j-1:j+2, :]. The full process is detailed in figure 5.4.
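The corrected slice can be verified with a small NumPy check (the array and indices here are my own toy example):

```python
import numpy as np

x = np.arange(5 * 5 * 2).reshape(5, 5, 2)  # a toy 5x5 input with 2 channels
i, j = 2, 2

# Python slices exclude the stop index, so i-1:i+2 covers the three
# rows i-1, i, i+1 -- a full 3x3 window:
print(x[i-1:i+2, j-1:j+2, :].shape)  # (3, 3, 2)

# The slice as printed in the book only covers a 2x2 window:
print(x[i-1:i+1, j-1:j+1, :].shape)  # (2, 2, 2)
```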

The last sentence in the first paragraph on page 288 reads:
Minimizing this loss causes style(generated_image) to be close to style(reference_image), and content(generated_image) is close to content(generated_image), thus achieving style transfer as we defined it.

I believe it should read:
Minimizing this loss causes style(generated_image) to be close to style(reference_image), and content(generated_image) to be close to content(original_image), thus achieving style transfer as we defined it.

That page just lists an image for the book, but I can't find a confirmed errata list for the book. Is there any place where we can confirm whether the reported errors are actually errors or not?

There's a note there - a single item. We don't have any more at this time. As soon as the list is updated, I'll make a note here in the forum to let you know. Thanks!

I've got a similar result to yours when applying the Jupyter notebook code: for either the "feature extraction" or the "fine tuning" code, the validation acc is ~90%.
But I saw your attachment "These plots are produced when fine tuning the last conv and FC layers, not just FC layers" shows a similar result to the book itself. What do you mean by "both the last conv and FC layers, not just FC layers"? Is there any modification to this code:

conv_base.trainable = True
set_trainable = False
for layer in conv_base.layers:
    if layer.name == 'block5_conv1':
        set_trainable = True
    if set_trainable:
        layer.trainable = True
    else:
        layer.trainable = False

Tim Gasser wrote: Chapter 5.3.1 Feature Extraction:

The training plots and validation accuracy are duplicated between the 'Feature extraction with augmentation' Figures 5.17 and 5.18 and the 'Fine-tuning the last convolutional block' Figures 5.20 and 5.21.

The results quoted above 5.17 are also wrong; they are from the 'fine tuning the last convolutional blocks' section. I get a validation accuracy of 90.4% and a test set accuracy of 88.4% with augmentation.

The bag-of-2-grams and the bag-of-3-grams generated from the sentence “The cat sat on the mat” that are shown are rather the union of 1-grams and 2-grams: {"The", "The cat", "cat", "cat sat", "sat", "sat on", "on", "on the", "the", "the mat", "mat"} and the union of 1-grams, 2-grams, and 3-grams: {"The", "The cat", "cat", "cat sat", "The cat sat", "sat", "sat on", "on", "cat sat on", "on the", "the", "sat on the", "the mat", "mat", "on the mat"}.

Those could be useful representations of sentences, but by definition the bag-of-2-grams should be {"The cat", "cat sat", "sat on", "on the", "the mat"} and the bag-of-3-grams {"The cat sat", "cat sat on", "sat on the", "on the mat"}.

Below is a small Python snippet using the NLTK library to generate n-grams:

import nltk
from nltk import ngrams
from nltk.tokenize import word_tokenize
sentence = "The cat sat on the mat"
sentence_bigrams = {" ".join(bigram) for bigram in ngrams(word_tokenize(sentence), 2)}
print(sentence_bigrams)
sentence_trigrams = {" ".join(trigram) for trigram in ngrams(word_tokenize(sentence), 3)}
print(sentence_trigrams)

{'The cat', 'on the', 'cat sat', 'sat on', 'the mat'}
{'The cat sat', 'sat on the', 'cat sat on', 'on the mat'}

WeiHua wrote: Errata in Listing 6.3 Using Keras for word-level one-hot encoding

In Listing 6.3 the code uses texts_to_matrix(samples, mode='binary') to get the results of word-level one-hot encoding, but the results are not one-hot encodings and are totally different from those of Listing 6.1.

471288 wrote: FYI, I ran the code as is and it gives a result of ~2.57 directly. No need to multiply by std?

You're right. If we run the normalization code beforehand (Listing 6.32), then the naive method returns ~0.29, which you could multiply by the temperature standard deviation (~8.85 °C), giving ~2.57 °C.
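For the record, the arithmetic tying the two figures together (values as quoted in this thread, not recomputed from the dataset):

```python
mae_normalized = 0.29   # naive-method MAE on normalized data
temperature_std = 8.85  # temperature standard deviation quoted above, in degrees C

# Multiplying recovers the error in degrees Celsius:
print(mae_normalized * temperature_std)  # ~2.57
```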


Given the shapes of the input and weight matrices, the following line does not make sense:

output = relu(dot(W, input) + b)

The correct code would be

output = relu(dot(input, W) + b)

For instance, take the example in Section 3.4. Here, the input matrix has shape (batch_size, 10000) and the weight matrix for the first layer (obtained using model.get_weights()) has the shape (10000, 16). Now, when you do the matrix multiplication A * B, the number of columns in A must match the number of rows in B. Therefore, W * input does not make sense, but input * W does.

class Dense(Layer):
"""Just your regular densely-connected NN layer.
`Dense` implements the operation:
`output = activation(dot(input, kernel) + bias)`
where `activation` is the element-wise activation function
passed as the `activation` argument, `kernel` is a weights matrix
created by the layer, and `bias` is a bias vector created by the layer
(only applicable if `use_bias` is `True`).
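A quick NumPy shape check of the argument above (a batch size of 2 is used here just to keep the arrays small):

```python
import numpy as np

x = np.random.rand(2, 10000)   # input: (batch_size, 10000)
W = np.random.rand(10000, 16)  # first-layer kernel: (10000, 16)
b = np.zeros(16)

# relu(dot(input, W) + b) -- the inner dimensions align: (2, 10000) x (10000, 16)
out = np.maximum(np.dot(x, W) + b, 0)
print(out.shape)  # (2, 16)

# np.dot(W, x) would instead raise ValueError: shapes (10000,16) and
# (2,10000) not aligned.
```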

"For balanced-classification problems, where every class is equally likely, accuracy and area under the receiver operating characteristic curve (ROC AUC) are common metrics. For class-imbalanced problems, you can use precision and recall."

Why don't you recommend ROC AUC for imbalanced problems?

Let me share a modest contribution to the community around François Chollet's book. On my GitHub code repo you will find 4 companion notebooks for Chapter 7, «Advanced deep-learning best practices».

* 7.1-Keras functional API
* 7.2-Inspecting and monitoring DL models
* 7.3-Getting the_most out of your_models
* 7.4-Test Hyperas

(p216)
Below 6.3.2, the first bullet is "lookback = 720 - Observations will go back 5 days".
But listing 6.34 uses lookback = 1440, so it's better to change it to "lookback = 1440 - .... 10 days".

On page 41 (section 2.3.3) others have noted that "This operation returns a vector of 0s with the same shape as y" would be better with "same shape as x". However, x is a matrix, not a vector. I believe it should state "This operation returns a vector of 0s whose dimension equals the rows of matrix x" -- which is what the code correctly achieves.

>>> x = np.random.randint(5, size=(3,2))
>>> x
array([[3, 1],
[1, 1],
[0, 4]])
>>> y = np.array([ 10,20])
>>> y
array([10, 20])
>>> z = np.dot(x,y)
>>> z
array([50, 30, 80])
>>> x.shape
(3, 2)
>>> z.shape
(3,)

Notice that if x had 6 rows, the dot product vector would have 6 elements. Thus the dimension of the dot product vector (z) is not dependent on "shape as y" or "shape as x" (a matrix). Rather, it is equal to the number of rows of x.

(p349) In jupyter_notebook_config.py,
'IPKernelApp.ip' should be 'NotebookApp.ip' and
'Serves the notebooks locally' should be 'Serves the notebooks remotely'


Under Listing 5.33 on P167,
"L2 norm (the square root of the average of the square of the values in the tensor)"
should be
"L2 norm (the square root of the sum of the square of the values in the tensor)"
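The two readings, side by side, in a quick NumPy check (only the sum-based formula matches np.linalg.norm):

```python
import numpy as np

x = np.array([3.0, 4.0])
print(np.sqrt(np.sum(np.square(x))))   # 5.0 -- the L2 norm, equal to np.linalg.norm(x)
print(np.sqrt(np.mean(np.square(x))))  # ~3.54 -- the RMS value, not the L2 norm
```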

4th code line in P322,
"50model.compile(..." should be "model.compile(..."

On p.141, Listing 5.14, Training the convnet using data-augmentation generators, batch_size=32 in the validation_generator definition should be batch_size=20. We want to cover exactly the validation set of 1000 samples over the 50 validation steps (i.e. 50x20=1000).