JWG (4)
To me, the crux of word embedding is the use of word co-occurrence frequencies as a proxy for semantic relatedness. This is never brought up in the text. The text makes clear that, in the result, geometric proximity should approximate semantic relatedness, but it gives no hint about how any semantic information gets into the process in the first place. An uninitiated reader who gets to "The Embedding layer is best understood as a dictionary..." might be forgiven for wondering whether this means the semantic information is obtained from some standard digitized Merriam-Webster, or the Cyc project, or something similar.

I would want this notion explained at or before the point where it says, "It is thus reasonable to learn a new embedding space with every new task." I am not suggesting that the text get into the algorithm for doing it (I personally would like that, but you may feel it is out of scope). Just don't leave unanswered the question of how semantic distance gets into the picture at all.
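
To make the co-occurrence idea concrete, here is a minimal sketch of the kind of illustration I have in mind. It is not the book's method or any library's algorithm. The toy corpus, the window size, and the helper names `cooccurrence_vectors` and `cosine` are all made up for this example. It just counts which words appear near each other and compares the resulting count vectors.

```python
from collections import Counter, defaultdict
from math import sqrt

# Toy corpus, invented purely for illustration.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the cat",
    "stocks fell as markets closed",
    "markets rose as stocks rallied",
]

def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

vectors = cooccurrence_vectors(corpus)

# "cat" and "dog" share context words ("the", "drinks", "chases");
# "cat" and "stocks" share none in this corpus.
sim_cat_dog = cosine(vectors["cat"], vectors["dog"])
sim_cat_stocks = cosine(vectors["cat"], vectors["stocks"])
print(sim_cat_dog > sim_cat_stocks)
```

No dictionary or knowledge base is consulted anywhere: "cat" ends up closer to "dog" than to "stocks" only because of the company each word keeps in the text. Methods such as word2vec and GloVe learn dense vectors from this same kind of co-occurrence signal rather than using raw counts directly.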