
Susan Harkins (424) [Avatar] Offline
#1
Please list errors found in the published version of Deep Learning with Python here. If necessary, we'll publish a comprehensive list for everyone's convenience. Thank you!

Susan Harkins
Errata Editor
Diogo (1) [Avatar] Offline
#2
Listing 3.9 on page 74:
epochs = range(1, len(acc) + 1)
acc is not defined here. Suggested correction:
epochs = range(1, len(loss_values) + 1)

Listing 3.10 on page 75:
plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')

acc and val_acc are not defined here. Suggested correction:
plt.plot(epochs, acc_values, 'bo', label='Training acc')
plt.plot(epochs, val_acc_values, 'b', label='Validation acc')


Listing 5.25 on page 161
There is an extraneous comment in the code:
<1> Its shape is (1, 150, 150, 3)
Suggested correction:
# Its shape is (1, 150, 150, 3)

Listing 6.7 on page 187:
from keras.layers import Flatten, Dense
should be:
from keras.layers import Flatten, Dense, Embedding
because Embedding is needed two lines later.

Page 203, line 7:
function an[d] a multiplication operation

Listing 6.30 on page 209:
<1> temperature (in degrees Celsius)
Suggested correction:
# temperature (in degrees Celsius)

Listing 6.34 on pages 211-212:
The last two lines should be:
val_steps = (300000 - 200001 - lookback) // batch_size
test_steps = (len(float_data) - 300001 - lookback) // batch_size
Currently, they are missing the division by batch_size at the end.
dcer (8) [Avatar] Offline
#3
In section 2.2.2 Vectors (1D tensors), on page 31, the code example defines a Numpy array with four entries ([12, 3, 6, 14]), yet the following paragraph discusses a vector having five entries. This could be corrected by adding one more entry to the array in the code.
dcer (8) [Avatar] Offline
#4
In section 2.2.5 Key attributes, in the middle of page 33, it says, "More precisely, it’s an array of 60,000 matrices of 28 × 8 integers." That should instead be 28 × 28 integers.
Ryutaroh Matsumoto (1) [Avatar] Offline
#5
At fold 2 and fold 3 in Figures 3.11 and 4.2, there are two "validation" rectangles in each row. I think each row should have only one "validation" rectangle.
This is from the published PDF version purchased at the end of Nov. 2017.
Mark Thomas (7) [Avatar] Offline
#6
Small typo on page 6 - last paragraph.
For RG read RGB
471288 (1) [Avatar] Offline
#7
Page 213 of the hardcopy, in the text under evaluate_naive_method: it says the method yields an MAE of 0.28, which has to be multiplied by the standard deviation to get the average absolute error.
But the validation data is never normalized. On page 210, the normalization is done only for the training data, which is right.
FYI, I ran the code as is and it gives a result of ~2.57 directly. No need to multiply by the std?
gugat (2) [Avatar] Offline
#8
Listing 2.3.3 on page 41:

It says

"This operation returns a vector of 0s with the same shape as y"

Suggested

"This operation returns a vector of 0s with the same shape as x"
Mark Thomas (7) [Avatar] Offline
#9
Page 33 has the line: More precisely, it’s an array of
60,000 matrices of 28 × 8 integers.

That should read: More precisely, it’s an array of
60,000 matrices of 28 × 28 integers.
mgalloy (1) [Avatar] Offline
#10
Small typo on page xvi of print version, first sentence of last paragraph:

"After reading this book, you'll have a solid understand of what deep learning is..."

should be

"After reading this book, you'll have a solid understanding of what deep learning is..."
Mark Thomas (7) [Avatar] Offline
#11
Section 2.3.3 on page 42 has the line:
Because the rows and x and the columns of y must have the same size,
This should read:
Because the rows of x and the columns of y must have the same size,
Mark Thomas (7) [Avatar] Offline
#12
The last sentence of section 3.5.7 on page 84 says:
The network is able to cram most of the necessary information into these eight-dimensional representations, but not all of it.
I believe that should be:
The network is able to cram most of the necessary information into these four-dimensional representations, but not all of it.
Andrea Marchini (3) [Avatar] Offline
#13
Listing 6.2 on page 182:
token_index = dict(zip(range(1, len(characters) + 1), characters))
When using
token_index.get(character)

you don't get the index. Suggested correction:
token_index = dict(zip(characters, range(1, len(characters) + 1)))
Andrea Marchini (3) [Avatar] Offline
#14
Figure 6.14 on page 203 and Figure 6.15 on page 204: there is c_t where you should find c_t+1 (see attachment).
JK (2) [Avatar] Offline
#15
Page 87. Figure 3.11 3-fold cross-validation
Fold 2 Validation Validation Training
Fold 3 Validation Training Validation

I think Validation in red should be "Training".
Andrea Marchini (3) [Avatar] Offline
#16
Missing closing parenthesis in Listing 6.6 Loading the IMDB data for use with an Embedding layer on page 187:

x_train = preprocessing.sequence.pad_sequences(x_train, maxlen=maxlen

523417 (2) [Avatar] Offline
#17
In Table 4.1, shouldn't the Last-layer activation for Multiclass, multilabel classification problems be "softmax" as well, like the Multiclass, single-label classification problems?
Susan Harkins (424) [Avatar] Offline
#18
Mark Thomas (7) [Avatar] Offline
#19
The NOTE at the top of page 135 states that:

the size of the feature maps decreases (from 148 × 148 to 7 × 7)

whereas, for consistency with the previous paragraph, and the following code, that should be:

the size of the feature maps decreases (from 150 × 150 to 7 × 7)
Mark Thomas (7) [Avatar] Offline
#20
Mark Thomas wrote:The NOTE at the top of page 135 states that:

the size of the feature maps decreases (from 148 × 148 to 7 × 7)

whereas, for consistency with the previous paragraph, and the following code, that should be:

the size of the feature maps decreases (from 150 × 150 to 7 × 7)


Hmmm. Not sure now - the model definition specifies 150 x 150, but the resulting model summary shows 148 x 148. Why is that?
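
For reference: Conv2D uses padding='valid' by default, so a 3 × 3 window cannot be centered on the border pixels and each spatial dimension shrinks by 2. A minimal check (a sketch assuming the chapter's 150 × 150 × 3 input and a 32-filter first layer):

from keras import layers, models

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(150, 150, 3)))
model.summary()   # first layer's output shape: (None, 148, 148, 32)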
539519 (2) [Avatar] Offline
#21
This is not so much an error as a confusingly written section of the book.

Page 36, section 2.2.10: the dimensions of the tensors disagree with the earlier description and with Figure 2.3 on page 35.

People with training in linear algebra (and experience in programming languages like Matlab and Python) will assume that the first axis points down from the top-left corner of the tensor, the second right from the top-left corner, and the third into the page. This is also what Figure 2.3 suggests.

According to this convention, the dimensions should be (3, 390, 250) for the stock price dataset and (128, 280, 1000000) for the tweets. Either that or the figure should be changed to represent the actual ordering of the data.
539519 (2) [Avatar] Offline
#22
p. 75, Figure 3.8: The y-axis is "Loss" when it should be "Accuracy."

p.82, Figure 3.10: Same as above.
539195 (1) [Avatar] Offline
#23
When I add the TensorBoard callback as specified in section 7.2.2 my model.fit() fails at the end of the 1st epoch with this error:

2018-01-16 20:19:43.503627: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: You must feed a value for placeholder tensor 'embed_input' with dtype float and shape [?,500]
[[Node: embed_input = Placeholder[dtype=DT_FLOAT, shape=[?,500], _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

model.fit() works fine when I remove the callback:

tb = TensorBoard(log_dir=os.path.join(my_dir, 'tensorboard'), histogram_freq=1, embeddings_freq=1)


What is the fix for this?


Mark Thomas (7) [Avatar] Offline
#24
Page 173, 3rd paragraph, has a typo: 224 x 244 instead of 224 x 224
WeiHua (11) [Avatar] Offline
#25
Typo in Listing 3.20:

plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()


plt.ylabel('Loss') should be plt.ylabel('Accuracy')

WeiHua (11) [Avatar] Offline
#26
Listing 6.2 Character-level one-hot encoding (toy example)

The code in the book is wrong, but the code in 6.1-one-hot-encoding-of-words-or-characters.ipynb is right.
WeiHua (11) [Avatar] Offline
#27
Listing 6.21 Numpy implementation of a simple RNN

final_output_sequence = np.concatenate(successive_outputs, axis=0)


final_output_sequence.shape is (6400,), but the comment says "The final output is a 2D tensor of shape (timesteps, output_features)."

I think
final_output_sequence = np.stack(successive_outputs, axis=0)
may be right.
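
A quick shape check (a sketch filled with random data, assuming the listing's timesteps = 100 and output_features = 64):

import numpy as np

timesteps, output_features = 100, 64
successive_outputs = [np.random.random((output_features,)) for _ in range(timesteps)]

print(np.concatenate(successive_outputs, axis=0).shape)   # (6400,): the 1D outputs joined end to end
print(np.stack(successive_outputs, axis=0).shape)         # (100, 64): (timesteps, output_features)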
WeiHua (11) [Avatar] Offline
#28
Errata in Listing 6.3 Using Keras for word-level one-hot encoding

In 6.3 the code uses texts_to_matrix(samples, mode='binary') to get word-level one-hot encodings, but the results are not one-hot encodings and are totally different from those of 6.1

results in 6.1 are :
array([[[ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]],

       [[ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]]])

results in 6.3:
array([[ 0.,  1.,  1., ...,  0.,  0.,  0.],
       [ 0.,  1.,  0., ...,  0.,  0.,  0.]])
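
For what it's worth, texts_to_matrix(mode='binary') returns one multi-hot vector per sample (which words occur at all), not a per-token sequence like listing 6.1. A possible workaround (a sketch, not from the book, using the same Tokenizer plus to_categorical):

from keras.preprocessing.text import Tokenizer
from keras.utils import to_categorical

samples = ['The cat sat on the mat.', 'The dog ate my homework.']
tokenizer = Tokenizer(num_words=1000)
tokenizer.fit_on_texts(samples)

doc_matrix = tokenizer.texts_to_matrix(samples, mode='binary')   # (2, 1000): one multi-hot row per sample
sequences = tokenizer.texts_to_sequences(samples)                # lists of word indices per sample
one_hot = [to_categorical(seq, num_classes=1000) for seq in sequences]
print(doc_matrix.shape, one_hot[0].shape)                        # (2, 1000) (6, 1000): per-token one-hot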
WeiHua (11) [Avatar] Offline
#29
Errata in Listing 6.34 Preparing the training, validation, and test generators
val_steps = (300000 - 200001 - lookback)
test_steps = (len(float_data) - 300001 - lookback)

should be
val_steps = (300000 - 200001 - lookback) // batch_size
test_steps = (len(float_data) - 300001 - lookback) // batch_size
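
The reason for the division: each call to the generator yields one batch of batch_size samples, so covering N samples takes N // batch_size steps, not N steps. A rough check with the chapter's values (assuming lookback = 1440 and batch_size = 128):

lookback, batch_size = 1440, 128                 # values assumed from listing 6.33
val_samples = 300000 - 200001 - lookback         # timestamps the validation generator can draw from
print(val_samples, val_samples // batch_size)    # 98559 samples -> 769 steps per validation pass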
WeiHua (11) [Avatar] Offline
#30
Listing 7.1 Functional API implementation of a two-input question-answering model
embedded_text = layers.Embedding(
64, text_vocabulary_size)(text_input)

embedded_question = layers.Embedding(
32, question_vocabulary_size)(question_input)


should be

embedded_text = layers.Embedding(
text_vocabulary_size,64)(text_input)

embedded_question = layers.Embedding(
question_vocabulary_size,32)(question_input)
WeiHua (11) [Avatar] Offline
#31
Listing 7.2 Feeding data to a multi-input model

answers = np.random.randint(0, 1, size=(num_samples, answer_vocabulary_size))

The answers are not one-hot encoded, and they will always be zeros (np.random.randint(0, 1, ...) only ever returns 0, because the upper bound is exclusive).

The following code works:
answers = np.zeros(shape=(num_samples, answer_vocabulary_size))
indices = np.random.randint(0, answer_vocabulary_size, size=num_samples)
for i, x in enumerate(answers):
    x[indices[i]] = 1 
Oisín Moran (3) [Avatar] Offline
#32
Section 2.4.1 pg 47

In describing the meaning of positive or negative derivatives it is said that:
if a [the derivative] is negative, it means a small change of x around p will result in a decrease of f(x)

and similar phrasing for if a is positive.

This is not true, and it should be phrased as: if a is negative, increasing x by a small amount (not just changing it in any direction by a small amount) will result in a decrease of f(x).

As it currently stands, it does not accurately describe the relationship that:
Derivative>0: increase x -> increase f(x)
Derivative<0: increase x -> decrease f(x)
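
A tiny numeric illustration of the intended relationship, using a hypothetical f(x) = -2x whose derivative is negative everywhere:

f = lambda x: -2 * x       # derivative is -2 < 0
x, eps = 1.0, 0.01
print(f(x + eps) < f(x))   # True: increasing x decreases f(x)
print(f(x - eps) < f(x))   # False: decreasing x increases f(x), so "a small change" in either direction is not enough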
Oisín Moran (3) [Avatar] Offline
#33
In Listing 6.23 pg. 200

There is only one import statement:

from keras.layers import Dense


while there should be four:

from keras.models import Sequential
from keras.layers import Embedding, SimpleRNN, Dense
WeiHua (11) [Avatar] Offline
#34
p. 244, the code for the Inception modules is wrong.

The Conv2D padding should be "same", but the default is "valid".

layers.concatenate([branch_a, branch_b, branch_c, branch_d], axis=-1) fails with the error "`Concatenate` layer requires inputs with matching shapes except for the concat axis" if padding is "valid".
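
A minimal sketch of the shape mismatch, using a hypothetical 32 × 32 × 256 input and only two branches: with the default padding='valid', the strided 3 × 3 branch gives 15 × 15 while the 1 × 1 branch gives 16 × 16, so concatenation fails; padding='same' makes both 16 × 16:

from keras import layers

x = layers.Input(shape=(32, 32, 256))                                                # hypothetical input tensor
branch_a = layers.Conv2D(128, 1, strides=2, activation='relu', padding='same')(x)    # -> (16, 16, 128)
branch_b = layers.Conv2D(128, 3, strides=2, activation='relu', padding='same')(x)    # -> (16, 16, 128); 'valid' would give (15, 15, 128)
merged = layers.concatenate([branch_a, branch_b], axis=-1)                           # -> (16, 16, 256)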
WeiHua (11) [Avatar] Offline
#35
p. 250, the comment on EarlyStopping says "Monitors the model's validation accuracy", but the code monitors the training accuracy:

keras.callbacks.EarlyStopping(monitor='acc',patience=1,)

should be
keras.callbacks.EarlyStopping(monitor='val_acc',patience=1,)
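
One related caveat (not from the book): 'val_acc' is only available when fit() is given validation data, e.g. via validation_split or validation_data:

import keras

callbacks_list = [keras.callbacks.EarlyStopping(monitor='val_acc', patience=1)]
# model.fit(x, y, epochs=10, batch_size=32, callbacks=callbacks_list, validation_split=0.2)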
WeiHua (11) [Avatar] Offline
#36
p. 252, the code of class ActivationLogger gets the error "write() argument must be str, not bytes":

f = open('activations_at_epoch_' + str(epoch) + '.npz', 'w') 

should be
 f = open('activations_at_epoch_' + str(epoch) + '.npz', 'wb')

Oisín Moran (3) [Avatar] Offline
#37
Appendix A.4 pg. 344

$ watch -n 5 NVIDIA-smi -a --display=utilization

should read:
$ watch -n 5 nvidia-smi -a --display=utilization

(incorrect capitalization of nvidia)
536547 (1) [Avatar] Offline
#38
Code snippet from "Listing 5.7. Using ImageDataGenerator to read images from directories":

The first target_size=(150, 150) is missing a trailing comma; it should be target_size=(150, 150),
303763 (9) [Avatar] Offline
#39
Page 227: "At this point, you could retrain this model for the right number of epochs (eight) and run it on the test set."

Is "the right number of epochs" four instead of eight?
Andrey Melentyev (1) [Avatar] Offline
#40
Not entirely sure, but I think 2.4.3 (page 51) might have a typo in the Nesterov momentum example:

velocity = past_velocity * momentum + learning_rate * gradient


should instead be

velocity = past_velocity * momentum - learning_rate * gradient
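
A toy check that the minus sign moves the parameter toward the minimum (a hypothetical 1-D example, not the book's listing): minimizing f(w) = (w - 3)^2 with a momentum update.

f = lambda w: (w - 3.0) ** 2        # toy objective, minimum at w = 3
grad = lambda w: 2.0 * (w - 3.0)    # its derivative

w, past_velocity = 0.0, 0.0
momentum, learning_rate = 0.9, 0.1
for _ in range(200):
    velocity = past_velocity * momentum - learning_rate * grad(w)   # minus, not plus
    w = w + velocity
    past_velocity = velocity
print(round(w, 3))   # ~3.0; with '+' the update would move away from the minimum and diverge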
drpositron (9) [Avatar] Offline
#41
In section 2.4.4, second paragraph, "...such a chain of functions can be derived using the following identity..." the word 'derived' should be 'differentiated.'
8forty (4) [Avatar] Offline
#42
Page 36 section 2.2.10: "...every minute is encoded as a 3D vector...", should be 1D vector.
Tim Gasser (1) [Avatar] Offline
#43
Chapter 5.3.1 Feature Extraction:

The training plots and validation accuracy are duplicated between the 'Feature extraction with augmentation' figure 5.17 and 5.18 and 'Fine tuning the last convolutional block' Figures 5.20 and 5.21.

The results quoted above 5.17 are also wrong, they are from the 'fine tuning the last convolutional blocks' section. I get a validation accuracy of 90.4%, and test set accuracy of 88.4% with augmentation.


Rayed Bin Wahed (7) [Avatar] Offline
#44
Figure 3.8 Training and validation accuracy
The y-axis is labelled Loss. It should be Accuracy

The same mistake is repeated in Figure 3.10
Claude COULOMBE (15) [Avatar] Offline
#45
2.3.4 Tensor reshaping

"Naturally, the reshaped tensor has the same total number of coefficients as the initial tensor."

should be:

"Naturally, the reshaped tensor has the same total number of elements as the initial tensor."
Claude COULOMBE (15) [Avatar] Offline
#46
2.4.4. Chaining derivatives: The Backpropagation algorithm

... called the chain rule: f(g(x)) = f'(g(x)) * g'(x)

should be

... called the chain rule: f(g(x))' = f'(g(x)) * g'(x)

Maybe you could add that the chain rule was first used by Leibniz in 1676.
Claude COULOMBE (15) [Avatar] Offline
#47
The text of section 2.4 should better explain the difference between:
1) backpropagation: the general idea that the output error should be propagated backward through the network in order to update the weights, probably invented by Paul Werbos in 1974 and strongly influenced by ideas from control theory and cybernetics going back to the 50s;
2) the reverse mode of automatic differentiation for gradient backpropagation: the mechanism introduced by Seppo Linnainmaa in 1974 and rediscovered by LeCun ~1982, Parker ~1982, and Hinton and Rumelhart around 1985;
3) the chain rule, invented by Leibniz in 1676;
4) gradient descent: a family of optimization methods; Cauchy invented the gradient descent method in 1847.
Those concepts and their relations are pretty confusing for many people.
Andrey C. (14) [Avatar] Offline
#48
In 5.1. Introduction to convnets, in the following sentence before Figure 5.4.

For instance, with 3 × 3 windows, the vector output[i, j, :] comes from the 3D patch input[i-1:i+1, j-1:j+1, :]. The full process is detailed in figure 5.4.


Has to be:

For instance, with 3 × 3 windows, the vector output[i, j, :] comes from the 3D patch input[i-1:i+2, j-1:j+2, :]. The full process is detailed in figure 5.4.
303763 (9) [Avatar] Offline
#49
The last sentence in the first paragraph on page 288 reads:
Minimizing this loss causes style(generated_image) to be close to style(reference_image), and content(generated_image) is close to content(generated_image), thus achieving style transfer as we defined it.

I believe it should read:
Minimizing this loss causes style(generated_image) to be close to style(reference_image), and content(generated_image) to be close to content(original_image), thus achieving style transfer as we defined it.
546576 (1) [Avatar] Offline
#50
Susan Harkins wrote:An errata list is available at https://manning-content.s3.amazonaws.com/download/6/afc781e-23e4-4ed6-ab8f-9dbdce20718c/Chollet_DeepLearninginPython_err1.html. Thank you!

Susan Harkins
Errata Editor


Hi Susan,

That page just lists an image for the book, but I can't find a confirmed errata for the book. Is there any place where we can confirm whether the errors reported are actually errors or not?

Thanks!
Susan Harkins (424) [Avatar] Offline
#51
There's a note there - a single item. We don't have any more at this time. As soon as the list is updated, I'll make a note here in the forum to let you know. Thanks!

Susan Harkins
548294 (2) [Avatar] Offline
#52
Hi, Tim:

I've got a similar result as you when running the Jupyter notebook code: for both the "feature extraction" and the "fine tuning" code, the validation acc is ~90%.
But your attachment "These plots are produced when fine tuning the last conv and FC layers, not just FC layers" shows a result similar to the book itself. What do you mean by "both the last conv and FC layers, not just FC layers"? Is there any modification to the code:
conv_base.trainable = True

set_trainable = False
for layer in conv_base.layers:
    if layer.name == 'block5_conv1':
        set_trainable = True
    if set_trainable:
        layer.trainable = True
    else:
        layer.trainable = False


Tim Gasser wrote:Chapter 5.3.1 Feature Extraction:

The training plots and validation accuracy are duplicated between the 'Feature extraction with augmentation' figure 5.17 and 5.18 and 'Fine tuning the last convolutional block' Figures 5.20 and 5.21.

The results quoted above 5.17 are also wrong, they are from the 'fine tuning the last convolutional blocks' section. I get a validation accuracy of 90.4%, and test set accuracy of 88.4% with augmentation.


550543 (1) [Avatar] Offline
#53
gugat wrote:Listing 2.3.3 on page 41:

It says

"This operation returns a vector of 0s with the same shape as y"

Suggested

"This operation returns a vector of 0s with the same shape as x"


+1 Noticed this oversight as well...
502941 (1) [Avatar] Offline
#54
Listing 3.27 K-fold validation.

Text shows:
num_val_samples = len(train_data) //k

When *(I believe) it should show
num_val_samples = math.floor(len(train_data) / k)

Otherwise the length of train_data will be 0 for all k except 0, and 404 for k == 0.

*Note math.floor works fine because 4 | 404
551074 (5) [Avatar] Offline
#55
Hello,

I just bought a hard copy of this book. Could you please let me know where I can download the errata list? Thanks.
Susan Harkins (424) [Avatar] Offline
#56
We have one still in editing; it should be available in a few days. I'll post a link when it's up. Thanks!

Susan Harkins
Errata Editor
501510 (2) [Avatar] Offline
#57
Erratum in Deep Learning with Python, figure on p. 109
It seems that the figure on page 109 is not correct. Indeed, the second row does not show a 50% dropout ratio.
Susan Harkins (424) [Avatar] Offline
#58
An updated errata page for Errata in Deep Learning with Python is available at https://manning-content.s3.amazonaws.com/download/f/5133515-6524-4b8f-b52b-f2c6484d9a24/Chollet_DeepLearninginPython_err2.html. Thanks!

Susan Harkins
Errata Editor
551074 (5) [Avatar] Offline
#59
Hello Susan, thanks for the update. Do you usually update the pdf version of the book with all the errata corrected?
Susan Harkins (424) [Avatar] Offline
#60
Currently, we update only when we reprint.

Susan H.
556676 (1) [Avatar] Offline
#61
Listing 4-7. Loss Functions, pages 42-43

It seems the last code line should not be
print "Squared Error [[0.01,0.01,0.01]],[[0.99,0.99,0.01]]:", f_sigmoid([[0.01,0.01,0.01]],[[0.99,0.99,0.01]])

but rather
print("Squared Error [[0.01,0.01,0.01]],[[0.99,0.99,0.01]]:", f_squared_error([[0.01,0.01,0.01]],[[0.99,0.99,0.01]]))

Haesun Park (24) [Avatar] Offline
#62
I agree that this is Nesterov momentum and '- learning_rate * gradient' is right.

Andrey Melentyev wrote:Not entirely sure, but I think 2.4.3 (page 51) might have a typo in the Nesterov momentum example:

velocity = past_velocity * momentum + learning_rate * gradient


should instead be

velocity = past_velocity * momentum - learning_rate * gradient
Claude COULOMBE (15) [Avatar] Offline
#63
JK wrote:Page 87. Figure 3.11 3-fold cross-validation
Fold 2 Validation Validation Training
Fold 3 Validation Training Validation

I think Validation in red should be "Training".
Claude COULOMBE (15) [Avatar] Offline
#64
Section 6.1 - Understanding n-grams...

The bag-of-2-grams and the bag-of-3-grams shown as generated from the sentence “The cat sat on the mat” are rather the union of the 1-grams and 2-grams: {"The", "The cat", "cat", "cat sat", "sat", "sat on", "on", "on the", "the", "the mat", "mat"} and the union of the 1-grams, 2-grams, and 3-grams: {"The", "The cat", "cat", "cat sat", "The cat sat", "sat", "sat on", "on", "cat sat on", "on the", "the", "sat on the", "the mat", "mat", "on the mat"}.

Those could be useful representations of sentences, but by definition the bag-of-2-grams should be {"The cat", "cat sat", "sat on", "on the", "the mat"} and the bag-of-3-grams {"The cat sat", "cat sat on", "sat on the", "on the mat"}

Below a small Python code using the NLTK library to generate ngrams:

import nltk
from nltk import ngrams
from nltk.tokenize import word_tokenize
sentence = "The cat sat on the mat"
sentence_bigrams = {" ".join(bigram) for bigram in ngrams(word_tokenize(sentence), 2)} 
print(sentence_bigrams)
sentence_trigrams = {" ".join(trigram) for trigram in ngrams(word_tokenize(sentence), 3)}
print(sentence_trigrams)

{'The cat', 'on the', 'cat sat', 'sat on', 'the mat'}
{'The cat sat', 'sat on the', 'cat sat on', 'on the mat'}




Susan Harkins (424) [Avatar] Offline
#65
ptah (5) [Avatar] Offline
#66
WeiHua wrote:Errata in Listing 6.3 Using Keras for word-level one-hot encoding

In 6.3 the code uses texts_to_matrix(samples, mode='binary') to get word-level one-hot encodings, but the results are not one-hot encodings and are totally different from those of 6.1

results in 6.1 are :
array([[[ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]],

       [[ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]]])

results in 6.3:
array([[ 0.,  1.,  1., ...,  0.,  0.,  0.],
       [ 0.,  1.,  0., ...,  0.,  0.,  0.]])


This is seriously bugging me. Is there not a Keras way to do proper one-hot encoding?
Claude COULOMBE (15) [Avatar] Offline
#67
471288 wrote:FYI, I ran the code as is and it gives a result of ~2.57 directly. No need to multiply with std?


You're right. If we run the normalization code first (listing 6.32), the naive method returns ~0.29, which you can multiply by the temperature standard deviation (~8.85 °C) to get ~2.57 °C.
564108 (1) [Avatar] Offline
#68
On page 301, Listing 8.24

return z_mean + K.exp(z_log_var) * epsilon


should be

return z_mean + K.exp(z_log_var / 2) * epsilon
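
The reasoning, assuming z_log_var stands for log(sigma^2) as in the usual VAE parameterization: sigma = exp(0.5 * log(sigma^2)), so the sampling line needs the division by 2. A quick numeric check:

import numpy as np

sigma = 2.0
z_log_var = np.log(sigma ** 2)     # the encoder outputs log(sigma^2)
print(np.exp(z_log_var / 2))       # 2.0 -> recovers sigma
print(np.exp(z_log_var))           # 4.0 -> would scale epsilon by sigma^2 instead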

Haesun Park (24) [Avatar] Offline
#69
On page 68, above listing 3.1,

'about 80MB of data will be downloaded' should be 'about 17MB of data will be downloaded'


On page 69, in first bullet below 3.4.2,

'(samples, word_indices)' should be '(samples, max_length)' or '(samples, sequence_length)' like chapter 6.
ptah (5) [Avatar] Offline
#70
Are these feeding into the livebook?
Susan Harkins (424) [Avatar] Offline
#71
The production process for the LiveBook corrects known errata, when possible.

Susan Harkins
Errata Editor
Jool (5) [Avatar] Offline
#72
[redacted]
Claude COULOMBE (15) [Avatar] Offline
#73
WeiHua wrote:Listing 7.2 Feeding data to a multi-input model

answers = np.random.randint(0, 1, size=(num_samples, answer_vocabulary_size))

answers are not one-hot encoded and they will always be zeros

the following code works
answers = np.zeros(shape=(num_samples, answer_vocabulary_size))
indices = np.random.randint(0, answer_vocabulary_size, size=num_samples)
for i, x in enumerate(answers):
    x[indices[i]] = 1 


Nice! Another solution could be:

answers = np.random.randint(answer_vocabulary_size, size=(num_samples))
answers = keras.utils.to_categorical(answers, answer_vocabulary_size)
hysic (1) [Avatar] Offline
#74
Page 255, Line 2, I think "10,000 words" should be "2,000 words", because Listing 7.7 shows `max_features = 2000`.
Haesun Park (24) [Avatar] Offline
#75
In section 4.4.2,

"L2 regularization— The cost added ... the weight coefficients (the L2 norm of the weights)."

should be

"L2 regularization— The cost added ... the weight coefficients (the squared L2 norm of the weights)."
Haesun Park (24) [Avatar] Offline
#76
Section 4.4.2, Below of listing 4.6

"l2(0.001) means every coefficient in the weight matrix of the layer will add 0.001 * weight_coefficient_value to the total loss of the network."

but keras.regularizers.L1L2 is implemented as `regularization += K.sum(self.l2 * K.square(x))`,

so it should be changed to something like '0.001 * weight_coefficient_value^2'.
222629 (1) [Avatar] Offline
#77
Pages 38 and 70

Given the shapes of the input and weight matrices, the following line does not make sense:

output = relu(dot(W, input) + b)


The correct code would be

output = relu(dot(input, W) + b)


For instance, take the example in Section 3.4. Here, the input matrix has shape (batch_size, 10000) and the weight matrix for the first layer (obtained using model.get_weights()) has the shape (10000, 16). Now, when you do the matrix multiplication A * B, the number of columns in A must match the number of rows in B. Therefore, W * input does not make sense, but input * W does.

Just checked the source code (https://github.com/keras-team/keras/blob/master/keras/layers/core.py) and it confirms what I mention above:

class Dense(Layer):
    """Just your regular densely-connected NN layer.
    `Dense` implements the operation:
    `output = activation(dot(input, kernel) + bias)`
    where `activation` is the element-wise activation function
    passed as the `activation` argument, `kernel` is a weights matrix
    created by the layer, and `bias` is a bias vector created by the layer
    (only applicable if `use_bias` is `True`).
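
A quick numpy check of the shapes (a sketch with a hypothetical batch of 4 samples):

import numpy as np

inputs = np.random.random((4, 10000))            # (batch_size, input_dim)
W = np.random.random((10000, 16))                # the first Dense layer's kernel
b = np.zeros((16,))

output = np.maximum(0., np.dot(inputs, W) + b)   # relu(dot(input, W) + b)
print(output.shape)                              # (4, 16)
# np.dot(W, inputs) raises ValueError: shapes (10000,16) and (4,10000) not aligned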
Haesun Park (24) [Avatar] Offline
#78
(p112) In 4.5.2 Choosing a measure of success,

"For balanced-classification problems, where every class is equally likely, accuracy and area under the receiver operating characteristic curve (ROC AUC) are common metrics. For class-imbalanced problems, you can use precision and recall."

Why don't you recommend ROC AUC for imbalanced problems?
569183 (3) [Avatar] Offline
#79
Listing 2.6

for the line:

digit = train_images[4]

ndim of digit is 1 and its shape is (784,) (i.e. 28*28)

In order to work, digit has to be reshaped to ( 28, 28 )

digit = digit.reshape((28,28 ))

...
Haesun Park (24) [Avatar] Offline
#80
(163p) In Listing 5.28,

"Return a list of five Numpy arrays" should be "Return a list of eight Numpy arrays"

Thanks
Haesun Park (24) [Avatar] Offline
#81
(p170) In Listing5.39,

results = np.zeros((8 * size + 7 * margin, 8 * size + 7 * margin, 3))


should be

results = np.zeros((8 * size + 7 * margin, 8 * size + 7 * margin, 3), dtype='uint8')


for matplotlib 2.2, because matplotlib 2.2 clips float values to [0, 1].
569183 (3) [Avatar] Offline
#82
Listing 3.28

mae_history = history.history['val_mean_absolute_error']

should be

mae_history = history.history['mean_absolute_error']
569183 (3) [Avatar] Offline
#83
listing 3.26

I would suggest that build_model() take the shape as an argument, i.e. build_model(shape).

The existing code references train_data.shape[1], which is outside the scope of build_model

The amended code looks like this:

def build_model(shape):

...

model.add(layers.Dense(num_hidden_nodes, activation='relu', input_shape=(shape,)))
Claude COULOMBE (15) [Avatar] Offline
#84
Good news! Companion Notebooks for Chapter 7...
Let me share a modest contribution to the community around François Chollet's book. On my GitHub repo you will find 4 companion notebooks for Chapter 7, «Advanced deep-learning best practices».

* 7.1-Keras functional API
* 7.2-Inspecting and monitoring DL models
* 7.3-Getting the_most out of your_models
* 7.4-Test Hyperas
Haesun Park (24) [Avatar] Offline
#85
(p139) 2nd bullet below Listing 5.11,

"width_shift and height_shift" should be "width_shift_range and height_shift_range".

Thanks.
Haesun Park (24) [Avatar] Offline
#86
(p163)

In listing 5.29,
first_layer_activation[0, :, :, 4]
should be
first_layer_activation[0, :, :, 3]
because fourth channel's index is 3.

In listing 5.30,
first_layer_activation[0, :, :, 7]
should be
first_layer_activation[0, :, :, 6]
because seventh channel's index is 6.

In listing 5.31, "Tiles each filter into ..." should be "Tiles each activation into ..."

Thanks.
Haesun Park (24) [Avatar] Offline
#87
In Figure 6.15,

output_t = activation(Wo•input_t + Uo•state_t + Vo•c_t + bo)

should be

output_t = activation(Ct) * activation(Wo•input_t + Uo•state_t + bo)

(https://en.wikipedia.org/wiki/Long_short-term_memory#LSTM_with_a_forget_gate)
Haesun Park (24) [Avatar] Offline
#88
p196, at 5th line from bottom
"..(of shape (input_features,), and.." should be "..(of shape (input_features,)), and.."

Figure 6.10, "bo" should be "b"

p199, in summary() outputs
"simplernn" should be "simple_rnn"

Listing 6.25
output_t = activation(dot(state_t, Uo) + dot(input_t, Wo) + dot(C_t, Vo) + bo)

should be
output_t = activation(c_t) * activation(dot(input_t, Wo) + dot(state_t, Uo) + bo)


In Figure 6.14,
output_t = activation(Wo•input_t + Uo•state_t + Vo•c_t + bo) 

should be
output_t = activation(c_t) * activation(Wo•input_t + Uo•state_t + bo) 

Haesun Park (24) [Avatar] Offline
#89
(p216)
Below 6.3.2, the first bullet is "lookback = 720 - Observations will go back 5 days".
But listing 6.34 uses lookback = 1440, so it's better to change it to "lookback = 1440 - .... 10 days".
Haesun Park (24) [Avatar] Offline
#90
(p230) Listing 6.48
To make timeseries twice as long, lookback should be 1440, not 720.
Haesun Park (24) [Avatar] Offline
#91
In #2 comment of listing 8.8

"without its convolutional base" should be "with its convolutional base only".

Thanks.
Haesun Park (24) [Avatar] Offline
#92
(p293) In first sentence, "gradient-ascent process" should be "gradient-descent process" for consistency.
Haesun Park (24) [Avatar] Offline
#93
In title of Fig 8.13, "z_log_sigma" should be "z_log_var" for consistency.
Haesun Park (24) [Avatar] Offline
#94
(p300) Two "exp(z_log_variance)" should be "exp(0.5 * z_log_variance)".

(p301) In Listing 8.24, "K.exp(z_log_var)" should be "K.exp(0.5 * z_log_var)".
584426 (1) [Avatar] Offline
#95
page 174, listing 5.42: 'african_e66lephant_output' should be 'african_elephant_output'
589017 (1) [Avatar] Offline
#96
On page 41 (section 2.3.3) others have noted that "This operation returns a vector of 0s with the same shape as y" would be better with "same shape as x". However, x is a matrix, not a vector. I believe it should state "This operation returns a vector of 0s whose dimension equals the rows of matrix x" -- which is what the code correctly achieves.
>>> x = np.random.randint(5, size=(3,2))
>>> x
array([[3, 1],
       [1, 1],
       [0, 4]])

>>> y = np.array([ 10,20])
>>> y
array([10, 20])

>>> z = np.dot(x,y)
>>> z
array([50, 30, 80])

>>> x.shape
(3, 2)
>>> z.shape
(3,)


Notice that if x had 6 rows, the dot product vector would have 6 elements. Thus the dimension of the dot product vector (z) is not dependent on "shape as y" or "shape as x" (a matrix). Rather, it is equal to the number of rows of x.
Haesun Park (24) [Avatar] Offline
#97
(p349) In jupyter_notebook_config.py,
'IPKernelApp.ip' should be 'NotebookApp.ip' and
'Serves the notebooks locally' should be 'Serves the notebooks remotely'
Haesun Park (24) [Avatar] Offline
#99
Under Listing 5.33 in P167,
"L2 norm (the square root of the average of the square of the values in the tensor)"
should be
"L2 norm (the square root of the sum of the square of the values in the tensor)"

4th code line in P322,
"50model.compile(..." should be "model.compile(..."
Markus Jalsenius (1) [Avatar] Offline
#100
On p.141, Listing 5.14, Training the convnet using data-augmentation generators, batch_size=32 in the validation_generator definition should be batch_size=20. We want to cover exactly the validation set of 1000 samples over the 50 validation steps (i.e. 50x20=1000).