rickumali (134)
#1
I've decided to write a short blog-type post every week here on this forum. These posts are exclusive to this forum, and they will primarily be about the writing of the book.

One of the benefits of Git is that you can obtain the differences between commits. So how does this feature work with Microsoft Word? The short answer: it's somewhat OK.

Git for Windows (http://msysgit.github.io/) came with a piece of software called antiword. This program takes an MS Word document (with the ".doc" extension) and extracts all of its text. Git Bash is then configured so that whenever git diff is run on a Word document, the document is first converted to text by antiword. Git then takes the difference between the two extracted texts. This nifty piece of configuration prevents Git from saying that it cannot obtain the difference between two binary files. Unfortunately, the result is only a diff of the extracted text: Git cannot apply a diff to a binary Word document.
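To make the convert-then-diff idea concrete, here is a minimal Python sketch of the same two steps. It is an illustration, not the configuration that ships with Git for Windows: the function names extract_text and diff_text are made up for this example, and extract_text assumes the antiword program is installed and on your PATH.

```python
import difflib
import subprocess


def extract_text(doc_path):
    """Return the plain text of a .doc file by running antiword on it.

    Assumption: antiword is installed and on PATH. It prints the
    extracted text to standard output.
    """
    result = subprocess.run(
        ["antiword", doc_path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def diff_text(old_text, new_text, old_name="old.doc", new_name="new.doc"):
    """Return a unified diff (a list of lines) of two extracted texts."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile=old_name,
            tofile=new_name,
            lineterm="",
        )
    )
```

Calling diff_text(extract_text("ch1_old.doc"), extract_text("ch1_new.doc")) with two hypothetical file names would produce a readable text diff. As with Git, that diff describes the extracted text only, so it cannot be applied back to the Word files.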

I found the configuration interesting, but over the course of the writing, I found myself not needing it. If I need to compare files, I resort to Word's "Compare Documents" tool. This allows me to compare any two Word documents. Further, Word's venerable "Track Changes" feature records each change to the document. These changes can then be inspected by anyone else. It's the equivalent of an editor's red pen!

antiword does have its uses, however. For one thing, with the extracted text I can use command line tools to do analysis (word count and word frequency). Also, having the chapters as text makes it very easy to search across all my files with the grep command line tool (each chapter is a separate Word document, so searching the whole book at once isn't possible with Word). Finally, it's faster to open a text file than a Word document, so if I just want to reread something, I usually check the text file.
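The kind of analysis described above can be sketched in a few lines of Python once the chapters exist as plain text. This is only an illustration of the idea, not the commands I actually run: the function names word_frequencies and search_chapters are hypothetical, and the chapters are assumed to be passed in as a dictionary mapping a file name to its extracted text.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count how often each word appears, ignoring case."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    return Counter(words)


def search_chapters(chapters, pattern):
    """Search every chapter for a regular expression, grep-style.

    chapters maps a file name to that chapter's extracted text.
    Returns a list of (file name, line number, line) tuples.
    """
    regex = re.compile(pattern)
    hits = []
    for name, text in chapters.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                hits.append((name, lineno, line))
    return hits
```

The total word count is sum(word_frequencies(text).values()), and word_frequencies(text).most_common(10) lists the ten most frequent words.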

There won't be a blog post next week because of the Thanksgiving holiday (in the US). Thank you, everyone, for reading!