[tt]acc[/tt] is not defined here. Suggested correction:

[tt]epochs = range(1, len(loss_values) + 1)[/tt]

Listing 3.10 on page 75:

[tt] plt.plot(epochs, acc, 'bo', label='Training acc')

plt.plot(epochs, val_acc, 'b', label='Validation acc')[/tt]

[tt]acc[/tt] and [tt]val_acc[/tt] are not defined here. Suggested correction:

[tt]plt.plot(epochs, acc_values, 'bo', label='Training acc')

plt.plot(epochs, val_acc_values, 'b', label='Validation acc')[/tt]

Listing 5.25 on page 161:

There is an extraneous comment in the code:

[tt]<1> Its shape is (1, 150, 150, 3)[/tt]

Suggested correction:

[tt]# Its shape is (1, 150, 150, 3)[/tt]

Listing 6.7 on page 187:

[tt]from keras.layers import Flatten, Dense[/tt]

should be:

[tt]from keras.layers import Flatten, Dense, Embedding[/tt]

because [tt]Embedding[/tt] is needed two lines later.

Page 203, line 7:

function an[d] a multiplication operation

Listing 6.30 on page 209:

[tt]<1> temperature (in degrees Celsius)[/tt]

Suggested correction:

[tt]# temperature (in degrees Celsius)[/tt]

Listing 6.34 on pages 211-212:

The last two lines should be:

[tt]val_steps = (300000 - 200001 - lookback) // batch_size[/tt]

[tt]test_steps = (len(float_data) - 300001 - lookback) // batch_size[/tt]

Currently, they are missing the division by [tt]batch_size[/tt] at the end.
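For what it's worth, with the values used in the chapter (lookback = 1440, batch_size = 128, and len(float_data) = 420551; I'm taking these from the surrounding listings, so treat them as assumptions), the corrected lines work out to:

```python
lookback = 1440          # assumed from the chapter's generator setup
batch_size = 128         # assumed from the chapter's generator setup
len_float_data = 420551  # number of rows in the Jena climate array

# How many batches to draw from val_gen to see the whole validation set
val_steps = (300000 - 200001 - lookback) // batch_size

# How many batches to draw from test_gen to see the whole test set
test_steps = (len_float_data - 300001 - lookback) // batch_size

print(val_steps, test_steps)
```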

]]>

[code]

y = np.random.random((10, 32))

z = np.dot(x, y)

[/code]

or

[code]

y = y.transpose()

z = np.dot(x, y)

[/code]]]>

plt.imshow(digit, cmap=plt.cm.binary)

does not work on Windows machines]]>

(I'm using the provided github code, there are no typing errors.)

The problems start with chapter 5.1. There are no errors and the code runs. The problem is that the network isn't learning: the accuracy stays at 0.1, which indicates random guessing. This is true for both the training data and the test data.

Have any of you encountered the same problems? Any idea what could be wrong?

[u]Edit:[/u]

I forgot some relevant information. I'm using Ubuntu 16.04 in a VM I created for this purpose. All I did was a system update and a Python 3.6 installation; then I installed Keras/TensorFlow, IPython, etc. in a virtualenv.]]>

Was:

[code]acc = history.history['acc']

val_acc = history.history['val_acc'][/code]

Replace by:

[code]acc = history.history['binary_accuracy']

val_acc = history.history['val_binary_accuracy'][/code]

You can find the correct dict keys easily if you print history.history.]]>
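To illustrate (with made-up numbers): the keys of history.history are derived from the loss and the metric names passed to compile(), with a val_ prefix added when validation data is used. A hypothetical run compiled with metrics=['binary_accuracy'] would produce a dict shaped like this:

```python
# Hypothetical history.history contents after two epochs; the numbers
# are invented, only the key names matter here.
history_history = {
    'loss': [0.52, 0.31],
    'binary_accuracy': [0.79, 0.91],
    'val_loss': [0.58, 0.40],
    'val_binary_accuracy': [0.76, 0.88],
}

# Printing the keys shows exactly which names to use when plotting
print(sorted(history_history.keys()))

acc = history_history['binary_accuracy']
val_acc = history_history['val_binary_accuracy']
```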

Take a look at this picture:

[url]https://highlevelsynthesisblog.files.wordpress.com/2017/05/convolutionexample.png[/url]

This illustrates one step. In the next layer the convolution would include 9 dark-blue values.

]]>

I would want this notion explained at or before the point where it says, "It is thus reasonable to learn a new embedding space with every new task." I am not suggesting that the text get into the algorithm for doing it (I personally would like that but you may feel it is out of scope). Just don't beg the question of how semantic distance gets into the picture at all.

/jg]]>

This is inaccurate; a bag is an unordered collection, but it allows duplicates, which sets do not. One might try:

The term 'bag' means an unordered collection in which duplicates are significant. For example, {"the", "cat", "on", "the", "mat"} is the same bag of words as {"cat", "mat", "on", "the", "the"}, but not the same as {"cat", "mat", "on", "the"}.]]>

That's true, but it's also true that the original model has the best score, by quite a bit, on epoch 3, which confuses the point and begs the question, "Explain to me again why L2 is better?" It does not seem like a good example.]]>

I'm currently reading Chapter 3. After we encode the labels as an integer tensor in 3.5.6, the output layer of the network should be a layer with 1 neuron, right? Also, what kind of activation function should we use for the final layer? I tried a linear layer and ReLU, but they didn't work. Can someone help me with these? Thank you so much!]]>

The line:

from keras.utils import to_categorical

should be replaced by:

from keras.utils import np_utils

Therefore, the lines:

train_labels = to_categorical(train_labels)

test_labels = to_categorical(test_labels)

should be replaced by:

train_labels = np_utils.to_categorical(train_labels)

test_labels = np_utils.to_categorical(test_labels)

Can you confirm this?

My code for chapter 2 works fine if these changes are made.

I am using Keras version 2.0.6
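For what it's worth, either import path ends up at the same one-hot encoding. A dependency-free sketch of what to_categorical does (mirroring the book's own manual version from the same chapter):

```python
import numpy as np

def to_one_hot(labels, num_classes):
    # one row per label, with a single 1. at the label's index
    results = np.zeros((len(labels), num_classes))
    for i, label in enumerate(labels):
        results[i, label] = 1.
    return results

print(to_one_hot([0, 2, 1], 3))
```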

]]>

When the font size of the main text is increased in the Epub version, the font size of the code listings remains the same!

The Kindle version has the code listings in a font size that follows the font size of the main text. It is readable and works well. See attached screenshot for a comparison.]]>

Similar to the previous comment, I think the [b]input dimension[/b] to the Embedding layer should be vocabulary_size.

Thus the line:

[code]embedded_posts = Embedding(256, vocabulary_size)(posts_input)

[/code]

should be replaced with:

[code]embedded_posts = Embedding(output_dim=256, input_dim=vocabulary_size)(posts_input)

[/code]]]>

Looks like my counting was right for normal Conv2D (37 weights) but I was wrong about SeparableConv2D (41 weights not 45).
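If it helps others check the arithmetic: using the standard parameter formulas for the two Keras layers (kernel weights plus biases; depth multiplier 1 for the separable case), a 3×3 kernel with 4 input channels and 1 output channel reproduces exactly these counts, and the separable layer only becomes cheaper once the output channel count grows. (The specific channel numbers here are my guess at the configuration being counted.)

```python
def conv2d_params(k, c_in, c_out):
    # k*k*c_in weights per output filter, plus one bias per filter
    return k * k * c_in * c_out + c_out

def separable_conv2d_params(k, c_in, c_out):
    # depthwise: one k*k filter per input channel (no bias), then
    # pointwise: a 1x1 convolution from c_in to c_out, plus biases
    return k * k * c_in + c_in * c_out + c_out

print(conv2d_params(3, 4, 1))             # -> 37
print(separable_conv2d_params(3, 4, 1))   # -> 41

# With more output channels the separable version is far cheaper:
print(conv2d_params(3, 32, 64))           # -> 18496
print(separable_conv2d_params(3, 32, 64)) # -> 2400
```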

I'm not sure why it's 41, but in any case, 37 < 41, so I don't understand the claim that these use fewer trainable parameters.]]>

In the comments above the call to model.compile, it says "#Since we monitor 'acc', it should be part of the metrics of the model."

Does validation loss also need to be part of the metrics of the model, since we are monitoring it with the ModelCheckpoint callback?]]>

Excerpt From

Deep Learning with Python MEAP V06

Francois Chollet

[/quote]]]>

After having read the section, I get the principle, but it's still hand-wavy.

The current figure (6.24) doesn't add much to what the text already says.

If you could include another figure that builds on the figure of your SimpleRNN (Figure 6.12) or LSTM (Figure 6.14), that would be awesome!]]>

As someone who is reading about recurrent dropout for the first time, I was confused as to exactly which parts of the network are being dropped by the "dropout" and "recurrent_dropout" parameters. Can you give an example for an LSTM layer?]]>
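Not the author, but here is my understanding as a simplified SimpleRNN-style sketch (an illustration of the idea, not the actual Keras LSTM code): dropout masks the input connections, recurrent_dropout masks the recurrent (state-to-state) connections, and, following Gal and Ghahramani, the same mask is reused at every timestep rather than resampled:

```python
import numpy as np

rng = np.random.default_rng(0)
timesteps, input_dim, units = 4, 8, 16
rate = 0.2

# Masks are sampled ONCE per sequence, then reused at every timestep
input_mask = (rng.random(input_dim) >= rate).astype(float)  # 'dropout'
state_mask = (rng.random(units) >= rate).astype(float)      # 'recurrent_dropout'

W = rng.standard_normal((input_dim, units)) * 0.1  # input weights
U = rng.standard_normal((units, units)) * 0.1      # recurrent weights
b = np.zeros(units)

h = np.zeros(units)
for t in range(timesteps):
    x_t = rng.standard_normal(input_dim)
    # input connections dropped by input_mask,
    # recurrent connections dropped by state_mask
    h = np.tanh((x_t * input_mask) @ W + (h * state_mask) @ U + b)

print(h.shape)
```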

In section 3.3.1, TensorFlow now supports Windows (Python 3.5).[/quote]

Also, at the end of the second paragraph in section 1.3.4, Nervana Systems was acquired by Intel for over $400M; the "M" is missing.

]]>

One more point you might consider adding is that the use of the word "rank" is also confusing.
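A concrete example of the clash, assuming NumPy:

```python
import numpy as np

a = np.ones((3, 3))

# 'rank' in the deep-learning/tensor sense: the number of axes
print(a.ndim)                    # -> 2

# rank in the linear-algebra sense: dimension of the column space
print(np.linalg.matrix_rank(a))  # -> 1
```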

When I first picked up numpy, it was confusing to me because in linear algebra the rank of a matrix means something very different from the number of axes of a tensor.]]>

Please see attachment.

-Patrick

]]>


[quote]With broadcasting, you can generally apply two-tensor element-wise operations if

one tensor has shape (a, b, ... n, n + 1, ... m) and the other has shape (n, n + 1,

... m) . The broadcasting would then automatically happen for axes a to n - 1 .[/quote]

[tt]a[/tt], [tt]b[/tt] and [tt]m[/tt] are used as [i]independent[/i] dimension sizes. Similarly [tt]n[/tt] and [tt]n+1[/tt] should refer to independent dimension sizes, but the notation used strongly implies the constraint that the second of those dimensions is bigger than the first by exactly one element.

The whole idea is best expressed in mathematical notation with subscripts: (d_1, d_2, ... d_n, d_{n+1}, ... d_m) [I am using a TeX-like syntax here, with underscores introducing subscripts]. However, given the author's stated intention of [i]avoiding[/i] mathematical notation, it's not clear to me how to express the correct meaning neatly within those constraints.

In any case, as it currently stands, what is written is very misleading, as it mixes two different meanings for the same notation/syntax in the same set of comma-separated values:

1. the size of a dimension,

2. the number (identity) of a dimension.

Put another way

1. is the value of some component of [tt]array.shape[/tt]

2. is the value of the index used in accessing a component of [tt]array.shape[/tt].
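To make the two meanings concrete (my own NumPy example, in the spirit of the book's earlier one): in x.shape == (64, 3, 32, 10), an entry such as 32 is the size of a dimension (meaning 1), while the position 2 at which it sits is the identity of that dimension (meaning 2):

```python
import numpy as np

x = np.random.random((64, 3, 32, 10))
y = np.random.random((32, 10))

# meaning 1: the size of a dimension (a component of x.shape)
print(x.shape[2])   # -> 32
# meaning 2: the identity of a dimension (the index 2 used just above)

# Broadcasting matches y's shape against the LAST dimensions of x,
# then effectively repeats y along the remaining leading axes
z = np.maximum(x, y)
print(z.shape)      # -> (64, 3, 32, 10)
```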

Some notational distinction needs to be made between these two distinct meanings.]]>

os.mkdir(dst) / shutil.copyfile(src, dst)]]>

]]>

[code]import os

imdb_dir = '/Users/fchollet/Downloads/aclImdb'
train_dir = os.path.join(imdb_dir, 'train')

labels = []
texts = []

for label_type in ['neg', 'pos']:
    dir_name = os.path.join(train_dir, label_type)
    for fname in os.listdir(dir_name):
        if fname[-4:] == '.txt':
            f = open(os.path.join(dir_name, fname))
            texts.append(f.read())
            f.close()
            if label_type == 'neg':
                labels.append(0)
            else:
                labels.append(1)[/code]]]>

Code listing 7.20 uses EarlyStopping with monitor='acc'.

The default setting (from the keras.io docs and the master branch) is monitor='val_loss'.

My understanding is that the default settings are generically optimal.

If so, then why does the book monitor 'acc'?]]>

It is a minor error in the comments:

[code]# Directory with our validation cat pictures
test_cats_dir = os.path.join(test_dir, 'cats')
os.mkdir(test_cats_dir)

# Directory with our validation dog pictures
test_dogs_dir = os.path.join(test_dir, 'dogs')
os.mkdir(test_dogs_dir)[/code]

should be

[code]# Directory with our test cat pictures
test_cats_dir = os.path.join(test_dir, 'cats')
os.mkdir(test_cats_dir)

# Directory with our test dog pictures
test_dogs_dir = os.path.join(test_dir, 'dogs')
os.mkdir(test_dogs_dir)[/code]

validation -> test]]>

[url]https://github.com/fchollet/deep-learning-with-python-notebooks[/url]]]>

the input tensor and a tensor named W, an addition (+) between the resulting 2D tensor

and a vector b, and finally a relu operation. relu(x) is simply max(x, 0).]]>

I've double-checked it.

Was wondering if it's just me or there's something more to this.

Thanks

]]>

Thanks.]]>

Instead of

my_slice = train_images[:, 14:, 14:]

Since we want the bottom-right 14×14 block. The code is currently correct, but misleading: it is only a coincidence that we can index from position 14.
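A quick demonstration of the coincidence (my own example): on 28×28 images, indexing from 14 and indexing from -14 happen to select the same block, but on any other size the two spellings diverge:

```python
import numpy as np

imgs28 = np.arange(3 * 28 * 28).reshape((3, 28, 28))
# On 28x28 inputs the two spellings coincide, since 28 - 14 == 14
assert np.array_equal(imgs28[:, 14:, 14:], imgs28[:, -14:, -14:])

imgs32 = np.arange(3 * 32 * 32).reshape((3, 32, 32))
print(imgs32[:, 14:, 14:].shape)    # -> (3, 18, 18): not a 14x14 block
print(imgs32[:, -14:, -14:].shape)  # -> (3, 14, 14): always bottom-right
```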

]]>

Moreover, I'm enjoying the book and discovering how cool Keras is, as well as different ways of solving the problems, e.g. the MNIST dataset.]]>

The TensorFlow library wasn't compiled to use {SSE4.1, SSE4.2, AVX} instructions, but these are available on your machine and could speed up CPU computation.

See TensorFlow issue [url]https://github.com/tensorflow/tensorflow/issues/8037[/url]

These warnings can be suppressed by setting

[tt]TF_CPP_MIN_LOG_LEVEL=2[/tt]

or by compiling from source.
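For reference, a minimal way to set it (my sketch; the variable must be set before TensorFlow is imported, e.g. at the very top of the script):

```python
import os

# Must happen before `import tensorflow` for the setting to take effect
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

print(os.environ['TF_CPP_MIN_LOG_LEVEL'])
```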

Recommend you address this, or walk the reader through compiling from source.

Cheers,

Eric]]>

Background: For business reasons, we can't use AWS --- but need strong deep learning in the cloud.]]>

[quote]Since the Dense layers on top are randomly initialized...[/quote]

He refers to the Dense layers being "on top" in several locations but this doesn't make sense to me. I visualize the network this way: [url]http://imgur.com/a/eiJ92[/url] (the forum won't seem to link the image inline correctly)

The image data is on top, it gets passed into the pre-built VGG16 model which then passes it [i]down[/i] through the Dense layers. I've always viewed the output of a network to be at the bottom. Am I mistaken?

]]>

should be

[quote]which gets Dropout applied to the output of layer right after it[/quote]

i.e.

"before" should be "after"

]]>

should be changed to

[code]layer_output *= np.random.randint(0, high=2, size=layer_output.shape)[/code]

i.e.

1. the submodule name [tt]random[/tt] is needed ([tt]np.random.randint[/tt], not [tt]np.randint[/tt])

2. the [tt]high[/tt] parameter needs to be 2; otherwise you get all 0s.
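A self-contained sketch of the corrected line in context (my example values; this is the naive training-time dropout the section describes, with the usual rescaling by the keep probability):

```python
import numpy as np

layer_output = np.ones((2, 4))

# With high=2, randint draws from {0, 1}: roughly half the units are zeroed
np.random.seed(0)
mask = np.random.randint(0, high=2, size=layer_output.shape)
layer_output = layer_output * mask

# Scale up by 1 / keep_probability so the expected activation is unchanged
layer_output /= 0.5

print(sorted(np.unique(layer_output)))  # values are 0.0 and/or 2.0
```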

Same issue in [b]Listing 4.11[/b]]]>

Is num_samples the number of questions based on various texts each of which needs an "answer"?

And max_length: is this the maximum number of question/text/answers that can be fed to this neural net?]]>

[code]import keras[/code]

one can accomplish the same with:

[code]import tensorflow as tf

keras = tf.contrib.keras[/code]

As I understand it, this version of Keras is being optimized to run with TensorFlow.

Eventually it will be moved from "contrib" to a standard part of TensorFlow, at which point you'll import it this way:

[code]import tensorflow as tf

keras = tf.keras[/code]

]]>

[code]import numpy as np
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Dropout, Flatten, Dense
from keras import applications

# dimensions of our images
img_width, img_height = 150, 150

top_model_weights_path = 'bottleneck_fc_model.h5'
#train_data_dir = 'data/train'
#validation_data_dir = 'data/validation'
train_data_dir = '/home/icg/Keras/dogs_and_cats/data/train'
validation_data_dir = '/home/icg/Keras/dogs_and_cats/data/validation'
nb_train_samples = 2000
nb_validation_samples = 800
epochs = 50
batch_size = 16

def save_bottleneck_features():
    datagen = ImageDataGenerator(rescale=1. / 255)

    # build the VGG16 network
    model = applications.VGG16(include_top=False, weights='imagenet')

    generator = datagen.flow_from_directory(
        train_data_dir,
        target_size=(img_width, img_height),
        batch_size=batch_size,
        class_mode=None,  # the generator will only yield batches of data, no labels
        shuffle=False)    # data stays in order: first 1000 images are cats, then 1000 dogs
    # the predict_generator method returns the output of a model, given
    # a generator that yields batches of numpy data
    bottleneck_features_train = model.predict_generator(
        generator, nb_train_samples // batch_size)
    np.save(open('bottleneck_features_train.npy', 'wb'),
            bottleneck_features_train)

    generator = datagen.flow_from_directory(
        validation_data_dir,
        target_size=(img_width, img_height),
        batch_size=batch_size,
        class_mode=None,
        shuffle=False)
    bottleneck_features_validation = model.predict_generator(
        generator, nb_validation_samples // batch_size)
    np.save(open('bottleneck_features_validation.npy', 'wb'),
            bottleneck_features_validation)

def train_top_model():
    train_data = np.load(open('bottleneck_features_train.npy', 'rb'))
    train_labels = np.array(
        [0] * (nb_train_samples // 2) + [1] * (nb_train_samples // 2))
    validation_data = np.load(open('bottleneck_features_validation.npy', 'rb'))
    validation_labels = np.array(
        [0] * (nb_validation_samples // 2) + [1] * (nb_validation_samples // 2))

    model = Sequential()
    model.add(Flatten(input_shape=train_data.shape[1:]))
    model.add(Dense(256, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(optimizer='rmsprop',
                  loss='binary_crossentropy', metrics=['accuracy'])

    model.fit(train_data, train_labels,
              epochs=epochs,
              batch_size=batch_size,
              validation_data=(validation_data, validation_labels))
    model.save_weights(top_model_weights_path)

print('Saving bottleneck features...')
save_bottleneck_features()
print('Training top model...')
train_top_model()

# I got 90% test set accuracy...[/code]]]>

This really doesn't affect the machine learning content of the book but you might consider rephrasing to something like: "at the LHC, but more recent studies are being turned towards Keras-based deep neural networks due to their higher performance and ease of training on large datasets". The current sentence sounds like a universal adoption has been made and this is quite far from the truth. ]]>

The number of elements in the output vector remains the same as in the input tensor, so it is not about reduction.]]>

should be

[i]"But thinking of the vector being repeated 32 times alongside a new axis is a helpful mental model"[/i]

to match an example from the text (one paragraph above).

]]>

Epoch 1/10

...

Epoch 2/10

I guess the code and output don't match.]]>

What's new?

Chapter 6, "Deep Learning for Text and Sequences"

Chapter 7, "Advanced Neural Network Design"

Chapter 8, "Generative Deep Learning"

Chapter 9, "Conclusion"]]>

[b]32[/b] should be replaced by [b]20[/b] to be consistent with the batch_size used in the code.]]>

I think it should be

[code]layer_output *= np.random.randint(0, high=2, size=layer_output.shape)[/code]]]>

validation_score = model.evaluate(training_data)[/code]

This should be:

[code]model = get_model()

model.train(training_data)

validation_score = model.evaluate(validation_data)[/code]

Just after:

[code]model = get_model()

test_score = model.evaluate(training_data + validation_data)

[/code]

This should be:

[code]model = get_model()

model.train(training_data + validation_data)

test_score = model.evaluate(test_data)[/code]
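To make the corrected workflow concrete, here is a toy end-to-end version with stand-ins for get_model()/train()/evaluate() (a hypothetical mean-predictor "model" of my own; only the order of the train/evaluate calls matters):

```python
import numpy as np

class MeanModel:
    """Toy stand-in: 'training' memorizes the mean, 'evaluating' returns MSE."""
    def train(self, data):
        self.mean = np.mean(data)
    def evaluate(self, data):
        return float(np.mean((np.asarray(data) - self.mean) ** 2))

def get_model():
    return MeanModel()

training_data = [1.0, 2.0, 3.0]
validation_data = [2.0, 4.0]
test_data = [3.0]

# Tune on the validation split...
model = get_model()
model.train(training_data)
validation_score = model.evaluate(validation_data)

# ...then retrain on everything that isn't test data, and score once on test
model = get_model()
model.train(training_data + validation_data)
test_score = model.evaluate(test_data)

print(validation_score, test_score)
```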

]]>

[quote]“We saw earlier that the derivative of a function f(x) of a single coefficient could be interpreted as the slope of the curve of f. Likewise, gradient(f)(W0) can be interpreted as the tensor describing the curvature of f(W) around W0.”

[/quote]

I think the gradient is more like a measure of the slope of a hyperplane tangent to the (embedded) manifold than a curvature. Even for a plane curve, the (extrinsic) curvature is a function of the second derivative. In higher dimensions, curvature involves 2-forms and connections. Might be relevant if one gets into information geometry, rather than piece-wise linear models.]]>

[code]def naive_matrix_dot(x, y):
    # x and y are Numpy matrices
    assert len(x.shape) == 2
    assert len(y.shape) == 2
    # the 1st dimension of x must be
    # the same as the 0th dimension of y!
    assert x.shape[1] == y.shape[0]

    # this operation returns a matrix of 0s
    # with a specific shape
    z = np.zeros((x.shape[0], y.shape[1]))
    # we iterate over the rows of x
    for i in range(x.shape[0]):
        # and over the columns of y
        for j in range(y.shape[1]):
            row_x = x[i, :]
            column_y = y[:, j]
            z[i, j] = naive_vector_dot(row_x, column_y)
    return z[/code]]]>

Let's first break down these graphs. We have 2 inputs each, one "black-box-machine" in the middle and 1 output. For the bottom picture this makes perfect sense because we feed a tuple of x and y into this machine, which is a ML-algorithm and out we get a set of rules that we can now use to transform new data.

However for the top picture this breaks apart a little bit because the "classical programming" box does not represent anything. It should be the rules or the entire pipeline that wears the label "classical programming".

Classical Programming: Data --> Rules --> Answers

Machine Learning: (Data, Answers) --> ML Rules --> Rules

And these Rules can then be used to fill the gap in the first classical-programming pipeline. However, this is obviously less elegant than the current drawing. I haven't been able to come up with something that I'm entirely happy with :( I just wanted to say that the current Machine Learning box represents an ML algorithm, whereas the Classical Programming box doesn't represent any logic whatsoever, because it's already contained within the rules input.]]>

This could perhaps best be done not by reducing the number of graphs, but by keeping them where they are and instead adding to each:

- Picking distinct line types for training vs. test (e.g. dashed vs. dotted)

- Keeping the original (non-regularised) results the same colour in each graph

- Fading out the lines for previously-shown regularisation methods (reduce their opacity to 50% or lower?)

- Having the latest method in a new colour, and at full opacity

For example, the graph in the drop-out section could show the original results in the same dark blue currently used for all results, earlier results as faded (low-opacity) versions of themselves, and of course the drop-out lines in full colour (I like tangerine!). If the graph looks cluttered - well, lower the opacity further on the intermediate results. The key thing is that a single glance now allows the reader to compare all methods.]]>

[url]https://keras.io/datasets/#boston-housing-price-regression-dataset[/url]

Make sure the version is updated to the latest:

[code]In [6]: keras.__version__

Out[6]: '2.0.1'

In [7]: from keras.datasets import boston_housing

In [8]: (x_train, y_train), (x_test, y_test) = boston_housing.load_data()

In [9]: x_train

Out[9]:

array([[ 1.23247000e+00, 0.00000000e+00, 8.14000000e+00, ...,

2.10000000e+01, 3.96900000e+02, 1.87200000e+01],

[ 2.17700000e-02, 8.25000000e+01, 2.03000000e+00, ...,

1.47000000e+01, 3.95380000e+02, 3.11000000e+00],

[ 4.89822000e+00, 0.00000000e+00, 1.81000000e+01, ...,

2.02000000e+01, 3.75520000e+02, 3.26000000e+00],

...,

[ 3.46600000e-02, 3.50000000e+01, 6.06000000e+00, ...,

1.69000000e+01, 3.62250000e+02, 7.83000000e+00],

[ 2.14918000e+00, 0.00000000e+00, 1.95800000e+01, ...,

1.47000000e+01, 2.61950000e+02, 1.57900000e+01],

[ 1.43900000e-02, 6.00000000e+01, 2.93000000e+00, ...,

1.56000000e+01, 3.76700000e+02, 4.38000000e+00]])

In [10]: y_train

Out[10]:

array([ 18.72, 3.11, 3.26, 8.01, 14.65, 11.74, 23.6 , 26.42,

16.65, 34.41, 20.31, 34.37, 14.69, 14.19, 27.26, 20.62,

[/code]]]>

Page 42 of the PDF (or page 38 as page number):

"You can take the dot product of two matrices x and y (dot(x, y)) if and only if x.shape[1] == y.shape[0]. [b]The result is a matrix with shape (x.shape[1], y.shape[0])[/b], where coefficients are the vector products between the rows of x and the columns of y."

That should be [b](x.shape[0], y.shape[1])[/b].

Similarly, Listing 2.35 has problems with shapes:

[code]
    # this operation returns a matrix of 0s
    # with a specific shape
    z = np.zeros((x.shape[1], y.shape[0]))
    # we iterate over the rows of x
    for i in range(x.shape[1]):
        # and over the columns of y
        for j in range(y.shape[0]):
            row_x = x[:, j]
            column_y = y[i, :]
            z[i, j] = naive_vector_dot(row_x, column_y)
    return z[/code]

It should be:

[code]    # this operation returns a matrix of 0s
    # with a specific shape
    z = np.zeros((x.shape[0], y.shape[1]))
    # we iterate over the rows of x
    for i in range(x.shape[0]):
        # and over the columns of y
        for j in range(y.shape[1]):
            row_x = x[i, :]
            column_y = y[:, j]
            z[i, j] = naive_vector_dot(row_x, column_y)
    return z[/code]
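With both fixes applied (the (x.shape[0], y.shape[1]) shape here, plus the multiplication in naive_vector_dot mentioned below), the naive implementation can be checked directly against np.dot:

```python
import numpy as np

def naive_vector_dot(x, y):
    assert len(x.shape) == 1 and len(y.shape) == 1
    z = 0.
    for i in range(x.shape[0]):
        z += x[i] * y[i]   # multiply, don't add
    return z

def naive_matrix_dot(x, y):
    assert x.shape[1] == y.shape[0]
    z = np.zeros((x.shape[0], y.shape[1]))  # shape (x.shape[0], y.shape[1])
    for i in range(x.shape[0]):
        for j in range(y.shape[1]):
            z[i, j] = naive_vector_dot(x[i, :], y[:, j])
    return z

x = np.random.random((3, 4))
y = np.random.random((4, 5))
print(naive_matrix_dot(x, y).shape)                       # -> (3, 5)
print(np.allclose(naive_matrix_dot(x, y), np.dot(x, y)))  # -> True
```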

Someone else reported a problem in naive_vector_dot. (terms should be multiplied z += x[i] [b]*[/b] y[i])]]>

So far really enjoying the book.]]>

I found a typo in Listings 2.32 and 2.33.

They implement a dot product in a for loop, but say

z += x[i] + y[i]

it should be rather

z += x[i] * y[i]
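A quick check of the corrected line (my own toy values):

```python
import numpy as np

def naive_vector_dot(x, y):
    assert len(x.shape) == 1 and len(y.shape) == 1
    z = 0.
    for i in range(x.shape[0]):
        z += x[i] * y[i]   # the fix: multiply the terms
    return z

x = np.array([1., 2., 3.])
y = np.array([4., 5., 6.])
print(naive_vector_dot(x, y))  # -> 32.0 (4 + 10 + 18)
print(float(np.dot(x, y)))     # -> 32.0
```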

]]>