
Ravi Annaswamy (7) [Avatar] Offline
Until I saw Chapter 11, I was getting very frustrated by the slow pace of the book's delivery (and by the blank pages),
but reading this chapter made me very happy.

First of all, the author builds an entire word2vec-style embedding using simple code (building on his backprop implementation).
This is the first time I have seen such straightforward code without the Keras, TensorFlow, or gensim frameworks. Trask's reputation
for 'build-from-scratch-and-see-deep-for-yourself' is proven once again.

Secondly, by applying it to two classic problems - first sentiment classification and then fill-in-the-blank prediction - he highlights the fact that embeddings are nothing but features of words that serve a particular purpose (what should I think when I see this word, in order to perform a task without error?). You rarely realize this, or see it described anywhere else.

The meaning of a word is just the collection of indicators/contexts/associations/implications the word invokes that are useful for a specific task. If you want an embedding that is general purpose for ALL tasks, you may not find one. (This was visible in the Kaggle Spooky Author Identification contest, where a 'custom' embedding outperformed the GloVe embedding precisely because GloVe is a fill-in-the-blank embedding whereas the task needed a classification embedding.)

Depending on the 'prediction' task - whether it is sentiment classification or 'nuanced' word prediction - different kinds and depths of meaning will be learned. This is such a clean, plain way to show it. Without the plain nuts-and-bolts implementation in numpy, demonstrating this would not be easy. Nice work, Andrew.

Really looking forward to reading the next chapters on NLP (RNN and Seq2Seq, and, if we are lucky, Seq2Seq with attention smilie) in your style of 'from scratch, nothing but numpy' code and corresponding intuitions.
533674 (3) [Avatar] Offline
I agree! Were you able to get the imdb dataset used in the chapter? I can't seem to find exactly what it is he is using.
Ravi Annaswamy (7) [Avatar] Offline
I tried it on my own dataset, but let me find the imdb dataset and provide you the code in a day or two.

Ravi Annaswamy (7) [Avatar] Offline
The files he mentions are downloadable from the following place:

You can download and save reviews.txt and labels.txt to your folder, and the code runs fine.

I had to change the following line:
tokens = map(lambda x:set(x.split(" ")),raw_reviews)

to:

tokens = list(map(lambda x:set(x.split(" ")),raw_reviews))
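For anyone wondering why the list() wrapper is needed: in Python 3, map() returns a lazy iterator rather than a list, so it has no len() and can be consumed only once, which breaks the later code that indexes and measures tokens. A minimal sketch with made-up toy reviews:

```python
# In Python 3, map() returns a lazy iterator rather than a list,
# so it has no len() and can be consumed only once; list(map(...))
# restores the Python 2 behavior the book's code assumes.
raw_reviews = ["this movie was great", "this movie was terrible"]  # toy data

tokens_lazy = map(lambda x: set(x.split(" ")), raw_reviews)
tokens = list(map(lambda x: set(x.split(" ")), raw_reviews))

print(len(tokens))           # works on the list
print("great" in tokens[0])  # sets support fast membership tests
# len(tokens_lazy) would raise: TypeError: object of type 'map' has no len()
```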

I also modified the print statement so that it prints only once per iteration instead of every 10 steps.

My code looks like this now:
for iter in range(iterations):
    # train on first 24,000
    for i in range(len(input_dataset)-1000):
        x,y = (input_dataset[i],target_dataset[i])
        layer_1 = sigmoid(np.sum(weights_0_1[x],axis=0)) # embed + sigmoid
        layer_2 = sigmoid(np.dot(layer_1,weights_1_2))   # linear + sigmoid
        layer_2_delta = layer_2 - y                      # compare pred with truth
        layer_1_delta = layer_2_delta.dot(weights_1_2.T) # backprop
        weights_0_1[x] -= layer_1_delta * alpha
        weights_1_2 -= np.outer(layer_1,layer_2_delta) * alpha
        if(np.abs(layer_2_delta) < 0.5):
            correct += 1
        total += 1
    progress = str(i/float(len(input_dataset)))
    print('Iter:'+str(iter)\
          +' Progress:'+progress[2:4]\
          +'.'+progress[4:6]\
          +'% Training Accuracy:'\
          + str(correct/float(total)) + '%')
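A note on the weights_0_1[x] line above, which may look odd at first: x is the list of word indices for one review, so NumPy fancy indexing pulls out one embedding row per word, and np.sum over axis 0 collapses them into a single bag-of-words vector. A minimal sketch with made-up toy sizes:

```python
import numpy as np

np.random.seed(1)
vocab_size, hidden_size = 5, 3                        # toy sizes, not the book's
weights_0_1 = 0.2*np.random.random((vocab_size, hidden_size)) - 0.1

x = [0, 2, 4]                         # word indices present in one review
rows = weights_0_1[x]                 # fancy indexing: one embedding row per word
layer_1_input = np.sum(rows, axis=0)  # bag-of-words sum of those embeddings

print(rows.shape)           # one row per index in x
print(layer_1_input.shape)  # a single hidden-layer-sized vector
```

Only the words actually present in the review touch the weights, which is why the update weights_0_1[x] -= layer_1_delta * alpha is so cheap.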

So that I can match the exact output he got:

Iter:0 Progress:95.99% Training Accuracy:0.83325%
Iter:1 Progress:95.99% Training Accuracy:0.8665416666666667%
Test Accuracy:0.85
533674 (3) [Avatar] Offline
Awesome! Thanks so much!
madara (6) [Avatar] Offline
I have a question:
why compute the *layer_2_delta* contribution with np.outer rather than a dot product? (the np.outer line just before the weights_1_2 update in your code)


Edit: ah, OK, sorry: it's a "vector * vector" product, not a vector-matrix one... that makes sense.
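For anyone else pausing on the same line, a minimal numpy sketch (toy shapes assumed) of why np.outer fits there: layer_1 and layer_2_delta are both 1-D vectors, and the gradient of weights_1_2 needs one entry per weight, which is exactly what the outer product produces.

```python
import numpy as np

layer_1 = np.array([0.2, 0.5, 0.1])  # hidden activations, shape (3,)
layer_2_delta = np.array([0.4])      # output error, shape (1,)

# np.outer(a, b)[i, j] == a[i] * b[j], so the result has one entry per
# weight in a (3, 1) weights_1_2 matrix -- exactly the gradient shape
# needed for the update weights_1_2 -= np.outer(...) * alpha.
grad = np.outer(layer_1, layer_2_delta)
print(grad.shape)
```

A dot product of the two vectors would collapse everything to a single scalar, losing the per-weight gradients.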