
413020 (1)
I am not sure the authors bothered to re-read their first draft. I just finished Chapter 1, and here are some thoughts. Apologies for the critical tone, but as I pull my hair out, I feel it is not inappropriate to share my feedback.

Concepts, e.g. regular expressions, are introduced without any explanation of what they are. There are references to tokens, n-grams, etc. that just magically appear, with the assumption that the reader (who I assume is not already an expert in NLP while still on Chapter 1) already knows what they are. A strange reference to combination locks starts and then disappears after two paragraphs, with a weird promise that the reader will never look at locks the same way again.

There are references to finite state machines and automata without any explanation of what these are or how they relate to NLP. There is a sidebar on the mathematical explanation of formal languages with bulleted sentences that appear to have no connection with each other - what is a 'regular language', what is 'context free', what does 'recursively enumerable' mean? If these concepts are important, they need at least a couple of sentences of explanation. If they are not important to subsequent chapters, what is the point of making the reader feel totally off balance? There is then another weird example about WWII code breaking that seems totally disconnected.

Then the discussion of bag-of-words and vectors starts, and again there are just way too many words that serve more to confuse than clarify. Wouldn't it be nice if these ideas were handled head-on and explained clearly with a couple of examples in half a page, rather than being spread over 6 pages so that the reader has to read them 6 times to follow the authors' line of thought?

In the end, I get that all the authors are trying to say over 34 rambling pages is:
1. You can do NLP using pattern matching and if/else rules, but that is a brittle and rather limiting approach.
2. The better way is statistical analysis that requires text to be converted to numbers, represented as vectors.
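
For what it's worth, the two points above can be shown in a few lines of Python. This is only a rough sketch of my own, not code from the book: the function names `rule_based_is_greeting` and `bag_of_words`, the tiny vocabulary, and the simple regex tokenizer are all made up for illustration.

```python
import re
from collections import Counter


def rule_based_is_greeting(text):
    # Point 1: a hand-written pattern. It only recognizes the greetings
    # someone thought to list, which is what makes this approach brittle.
    return re.match(r"^\s*(hi|hello|hey)\b", text, flags=re.IGNORECASE) is not None


def bag_of_words(text, vocabulary):
    # Point 2: turn text into numbers by counting how often each
    # vocabulary word occurs. Word order is discarded.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


# Hypothetical five-word vocabulary, chosen only for this example.
vocabulary = ["hello", "hi", "there", "good", "morning"]

print(rule_based_is_greeting("Hello there"))    # True
print(rule_based_is_greeting("Good morning"))   # False: the rule never anticipated it
print(bag_of_words("Good morning, good people", vocabulary))  # [0, 0, 0, 2, 1]
```

The rule misses "Good morning" because nobody wrote a pattern for it, while the count vector is just a list of numbers that a statistical method could work with.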

It seems that the authors are just mumbling to themselves, not speaking to a reader new to NLP.

If the rest of the book is as disorganized, well, god help me.
527411 (1)
While I didn't have quite as negative a reaction to the first chapter as OP, I definitely see where they're coming from, and had similar thoughts at times myself while reading.

I'm at section 2.2 (Building your vocabulary with a tokenizer) and though I'm certain I'm learning a lot and enjoying it, I definitely find parts of it needlessly difficult and kind of meandering (when such difficult new material is being introduced, losing the through-line with anecdotes that are kind of arcane/confusing isn't very helpful). I'm also left wondering about the intended audience of the book, as it seems to assume relative familiarity with some foundational concepts in NLP and with data science packages for Python. (P.S. finishing the section in Appendix B on np.arrays and pd.Series and Dataframes would help a lot.)

Anyway, I've stopped reading at the dot product chapter because I'm feeling a little overwhelmed. In order to be able to resume with this NLP book, I've been revisiting Jake VanderPlas's book on Data Science in Python, and I'm generally finding his accessible, direct tone, eager to explain and unpack, a whole lot more approachable than what I'm reading here. But maybe that's by design. J.V.'s book is more an introduction and a handbook; this is maybe more of a deep dive?

Still, though, I totally applaud the authors of this NLP book, because while it's certainly not easy, even for someone who's already pretty comfortable with Python, I'm confident that it contains nearly everything, or as much as could be reasonably expected in a sub ~800 page book, that I'd need to know about NLP in Python.