Tryggvi Björgvinsson
Hi all,

I'm just emerging from a pretty intense reorganization of the book. You'll be seeing it in the upcoming MEAP update. Here are a few of the big changes I've been working on:

Move chapters around - I've decided against structuring my book around the data project lifecycle. Based on reviews, I think it caused more confusion than it helped. Some reviewers saw the chapters as discussions about specific quality attributes. I wanted each attribute to be just a vehicle for learning a lesson, but the structure made it seem like the quality attribute was the real lesson. So each chapter now highlights its lesson, and chapters have been moved around so that the lessons hang together logically. The book now consists of two parts: Foundations and Tips. The first part (chapters 1-7) represents the foundations of data quality work. The second part (chapters 8-16) covers tips, tricks, gotchas, and good-to-knows for working with data usability.

Two new, previously unplanned chapters - I decided to add two chapters that weren't part of the book as I had originally planned it. The first (chapter 3 in the MEAP update) discusses the data project lifecycle and lists some common quality attributes for each of its steps. The second (chapter 6 in the MEAP update) discusses methodologies and how you can organize your usability work.

Docker - I briefly touched upon this idea in a reply to another post on the forum and decided to go ahead and just do it. All of the code in the book can now be run as Docker containers. This has two major benefits. First, I can focus on the lesson in the chapter instead of on how to write the code. The code is just a means to an end (as it always is), not the real lesson, and with Docker I can hide the environment setup so the chapter stays on the important parts. Second, I can use more real-world examples. Instead of simulating availability by serving from two directories in the file system, I can just as easily (for you, the readers) set up real database replication and monitor it. The book now uses a real monitoring solution (Icinga2) instead of a toy example, so it should be more apparent how to apply the lessons in real-world scenarios. I hope this is for the better, but there is one big drawback: it becomes harder to create exercises for each chapter, so many exercises have been removed.

TL;DR - I have heard that some readers think the book discusses trivial lessons: “I know this, why teach it?” That’s a problem for a book like The Art of Data Usability that combines several areas (quality, monitoring, coding). People know different things, so of course some will know a lot about one or more of the topics, like the fact that you can use warnings in monitoring solutions, while others won't. To serve all reader groups, every chapter (except the first) now includes a new section called “TL;DR” (Too Long; Didn’t Read) that describes a situation or problem that may come up and how the chapter's lesson can solve it. In doing so, it also summarizes the lesson briefly. Based on that section, readers can decide whether they want to read the chapter or already know its lesson (and can therefore skip it).

Data packages - The previous version of the book used CSV on the Web, a specification created by a W3C working group, to validate a data file. In the MEAP update I've swapped CSV on the Web out for Data packages, a set of specifications from a project called Frictionless data (disclaimer: I used to work on that project). I didn't make the move because I worked on it; in fact, I originally chose CSV on the Web precisely because I hadn't worked on it and it was a similar project. The reason is that the tooling around Frictionless data is more complete, which means I can focus on the lesson in the chapter instead of writing code from scratch to validate against a subset of a schema. Another reason is that I was not very happy with W3C when they decided to embrace Encrypted Media Extensions (DRM on the web). I do not like DRM, both as a consumer and as an author. This is one of the reasons I love Manning: they do not use DRM, so they put the same trust in you (readers) as I do (not using DRM should be a no-brainer). So the move to Data packages is both a practical decision (better tooling) and a political one (in my mind I'm punishing W3C as an organisation, not the great people who worked on CSV on the Web).

Other changes have also been made, such as more images and edits to the text, but those are the biggest changes. I'm pretty excited about this MEAP update. The last few months have been intense, and this is the outcome. I really hope you like it.