timi (2) [Avatar] Offline
#1
Very good content, looking forward to the next chapters.

Some notes that I took while reading:

Page 37 - In the tweets dataset, why is the additional dimension of 128 required? A tweet has 140 characters, so what is the 128 for? (If it is for Unicode, a clarification/example could help.)

Page 38 - Below Listing 2.24, "...which takes as input a 2D" - where does the 2D come from? The internal weights? The line of code only mentions '512', which seems like a single dimension.

Page 38 - Listing 2.25 - Why does x have to be a 2D tensor? Why can't it be just a vector? (The implementation would then be a single loop.)

Page 43 - Figure 2.3 - The labels 'row' and 'column' in the figure are confusing; it seems like they should be swapped, or I am missing something...

Page 55 - '...gradient updates in total (469 per epoch)...' - how was 469 derived? I couldn't get to it from 5, 128, or 2,345.

Page 71 - Regarding one-hot encoding - aren't we losing the order of the words by only encoding their presence? And if so, isn't that important?

Page 74 - In the PDF version, Listing 3.42 - the code is cut off.

General question about the sigmoid - it would be nice to get some intuition, or an appendix, on how it/the network guarantees that all probabilities sum to 1.


61687 (2) [Avatar] Offline
#2
Page 70 - In the PDF version, Listing 3.42 - the code is cut off.
Where can we get the source code?
d-man (2) [Avatar] Offline
#3
Re: "Page 38 - Below Listing 2.24, '...which takes as input a 2D' - where does the 2D come from? The internal weights? The line of code only mentions '512', which seems like a single dimension."

The author did add a disclaimer that we may not understand everything in this chapter and that some of the details will be covered in the next one, so I assume it is covered there (I haven't made it to chapter 3 yet).

But here is what this 2D tensor is referring to.

512 is the number of units in this network layer and is not related to the shape of the input tensor. The input 2D tensor has the shape (batch_size, input_dim), i.e. (batch_size, 784).

https://keras.io/layers/core/#dense
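
To make it concrete, here is a minimal sketch (not the book's listing; the 784 input size and the batch size of 128 are assumptions matching the MNIST example):

import numpy as np
from keras import models, layers

# 512 is the number of units (the layer's output width),
# not the shape of the input
network = models.Sequential()
network.add(layers.Dense(512, activation='relu', input_shape=(784,)))

# the layer consumes a 2D tensor of shape (batch_size, 784)
x = np.random.random((128, 784))
# and produces a 2D tensor of shape (batch_size, 512)
print(network.predict(x).shape)  # (128, 512)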
196209 (1) [Avatar] Offline
#4
Re: "Page 55 - '...gradient updates in total (469 per epoch)...' - how was 469 derived? I couldn't get to it from 5, 128, or 2,345."

Dividing the number of samples in the training data by the batch size gives you the number of updates per epoch,
e.g. 60000 / 128 = 469 (approximately).

I have to agree this could have been clearer.
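
A quick back-of-the-envelope check (assuming the 60,000 MNIST training samples, batch size 128, and 5 epochs used in that chapter):

import math

num_samples = 60000   # size of the MNIST training set
batch_size = 128
num_epochs = 5

updates_per_epoch = math.ceil(num_samples / batch_size)  # 469
total_updates = updates_per_epoch * num_epochs            # 2345
print(updates_per_epoch, total_updates)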
309495 (1) [Avatar] Offline
#5
Page 62 - PDF version:
"Note this will note not be included..." - there is a stray "note" here.
evohnave (3) [Avatar] Offline
#6
Version 4, p. 89, Listing 3.81 has "labels_train" and "labels_test", but these should be "train_labels" and "test_labels".
mythicalprogrammer (17) [Avatar] Offline
#7
Listing 3.59 code version 5

The code has a syntax error:

average_mae_history = [
np.mean([x[i] for x in all_mae_histories]) for i in range(num_epochs))]

There is an extra ')' in range(num_epochs))
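
Keeping the book's variable names as-is, the corrected line would read:

average_mae_history = [
    np.mean([x[i] for x in all_mae_histories]) for i in range(num_epochs)]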
515375 (1) [Avatar] Offline
#8
I found this forum while looking for solutions to most of my problems. I wanted to learn Python online in a dynamic way, which I couldn't find until now. This section is exclusively for Python learners and geeks, so it will surely help me learn this influential programming language. I am the first developer at our web development company to start learning Python, as I am always craving to learn new things.
312278 (1) [Avatar] Offline
#9
The sample in 2.3.3 (Tensor dot) actually does not work. The shape of x is (64, 3, 32, 10) and y is (32, 10), but for np.dot, y needs to be transposed so that its first axis matches x's last axis. So either
y = np.random.random((10, 32))
z = np.dot(x, y)

or
y = y.transpose()
z = np.dot(x, y)
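
A minimal runnable version of the fix (using the same shapes as the book's example):

import numpy as np

x = np.random.random((64, 3, 32, 10))
# y's first axis must match x's last axis (10) for np.dot to work
y = np.random.random((10, 32))

z = np.dot(x, y)
print(z.shape)  # (64, 3, 32, 32)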