285443 (3)
Hi Jeff,

I think this is a very interesting topic, and I'm looking forward to reading the book as you write it. I know this is a first draft, but based on Ch1-3, I think there is a lot of room for improvement. I have chapter-wise feedback below, but here are the high-level problems with the book:

- Lack of extended/running examples. There is no continuity in the examples in the chapters, and significant effort is devoted to new "cute" scenarios. It's made worse by the fact that these new scenarios aren't better at clarifying the problem than the scenario in Ch1.

- Unsure about its difficulty level. Chapters 2 and 3 really suffer from this. A significant number of pages is devoted to explaining Akka, Scala, Spark, and distributed databases (both in general and Couchbase in particular). I think you should just call out the features you want and give generous references ("we're using futures (ref:x), and implicits (ref:y)").

- Very handwavy. I don't disagree with the points made in the book (e.g. "don't overwrite data- have a history", "your ML system shouldn't cause your webapp to fail", "use a distributed database so you have fault tolerance and throughput"). However, in almost every case (except Ch1), these facts are supported by contrived examples (e.g. Ch3, the strawman for mutable data store is a hash map behind a lock and not a database like postgres) or just stated as a fact.

Chapter 1
I really liked the discussed scenario because it's very realistic and made the problems quite clear. This might be asking for too much, but having sample code for us to look at for the strawman implementation would make some of the problems much easier to understand. For example, one of the problems mentioned is about buggy feature extraction bringing the site down. Wouldn't it be great if we could just curl some json and watch the service fail?

I also thought the "when not to use reactive ml" section was a bit weak because it basically implied that any non-prototype system should be reactive. That's a strong statement, and I'm not disputing it, but if that's the intention, some more explanation would be helpful.

Finally, in "Traits", I didn't understand what you mean by "possible worlds". Maybe this will be clearer in subsequent sections?

Chapter 2
My biggest problem is that the examples in this chapter don't really have anything to do with the scenarios outlined in the previous chapter. This means that pages have to be spent explaining the new scenario, and there's also not a lot of relevance/reinforcement to Ch1 scenarios.

The uncertainty section is also a bit confusing, because you use the same word "uncertainty" for three different issues:
- inherent uncertainty in measurement (e.g. as mentioned in Ch1, and later in Ch3)
- the absence or presence of a key in a map (in the Scala section)
- returning a degraded response (tail latency section)

The Scala section's pattern matching and map with Option seemed a bit out of place. It seemed like the goal of this section is to use Futures for async computation, timeouts, and control of tail latency, but the section could be organized better and these topics could be called out explicitly.
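To make concrete what I mean by calling the topics out explicitly, here is a minimal sketch of the "future plus timeout plus degraded response" idea. It is in Python rather than the book's Scala, and slow_predict, DEGRADED_RESPONSE, and the timings are names and numbers I made up for illustration, not anything from the draft.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError
import time

# Hypothetical fallback returned when the model call is too slow.
DEGRADED_RESPONSE = {"prediction": None, "degraded": True}

def slow_predict(delay_s):
    """Stand-in for a model call; sleeps to simulate latency."""
    time.sleep(delay_s)
    return {"prediction": 0.87, "degraded": False}

def predict_with_timeout(executor, delay_s, timeout_s):
    """Run the prediction asynchronously and cap how long the caller waits."""
    future = executor.submit(slow_predict, delay_s)
    try:
        return future.result(timeout=timeout_s)
    except TimeoutError:
        # The caller gets a degraded answer instead of waiting on the slow tail.
        return DEGRADED_RESPONSE

with ThreadPoolExecutor(max_workers=2) as executor:
    fast = predict_with_timeout(executor, delay_s=0.01, timeout_s=0.5)
    slow = predict_with_timeout(executor, delay_s=0.5, timeout_s=0.05)
```

Here the fast call returns the real prediction and the slow call falls back to the degraded response, which is the tail-latency behavior I think the section is after.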

The Akka section occasionally calls out the scenario in Ch1, which is great, but I wish that Ch1 had problem statements clearly written in section headers so that we could concretely refer to the problem being addressed. E.g. instead of just "this would have been a good next step for the developers of Pooch Predictor system in Chapter 1", you could say "this would solve problems mentioned in section 1.1.x, 1.1.y" etc.

I also think you could do a bit more handholding in the Akka code examples:
- what is a prop?
- what is a OneForOne strategy?

Finally, I really wish you'd given an Akka solution to a problem in Ch1 instead of making a new problem.

Same with the Spark section. I think an example of feature extraction and model generation for a problem from Ch1 would be better.

Chapter 3
I didn't understand the "hairy data" part in the chapter. I also really wish we'd just discussed the data collection problem from Ch1.

I'm not sure I understood how the following scenarios are related to the concept of "Immutable Facts":
- asking the Vulture Corp to give you data with confidence bounds
- storing data in original form and not in a lossy transform (e.g. enough_prey vs "num_widebeests")

I feel like the contents of the data (whether or not it includes uncertainty values, whether or not a lossy transform) are orthogonal to how you decide to store the data (immutable fact log or just update the data in place).
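A minimal sketch of what I mean by "orthogonal", in Python rather than Scala. The Reading type and its field names (count, lower, upper) are hypothetical, not taken from the chapter. The same record, confidence bounds included, can go into either kind of store:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Reading:
    location: str
    count: int
    lower: int  # hypothetical lower confidence bound
    upper: int  # hypothetical upper confidence bound

readings = [Reading("north", 12, 10, 15), Reading("north", 18, 16, 21)]

# Storage choice 1: update in place, keeping only the latest value per key.
in_place = {}
for r in readings:
    in_place[r.location] = r

# Storage choice 2: append-only log of immutable facts.
fact_log = []
for r in readings:
    fact_log.append(r)

latest_from_log = fact_log[-1]
```

Both stores hold the uncertainty bounds. The only difference is that the log also keeps the earlier reading, which is a storage decision and not a data-model decision.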

I didn't like the "persisting data" section. First, I was annoyed by an entirely new scenario (cheetah, pangolin, etc.), and I also found the animal characters to be confusing rather than clarifying (is a pangolin a thread that writes? a new system?). Second, I thought the mutable scenario was quite contrived (a mutable hashmap behind a mutex), and I didn't agree with its use to reach the conclusion that "distributing your workload doesn't scale if you use shared mutable state". You could have mutable storage in a distributed manner (basically any distributed database falls in this category).

The out of order issue you mentioned could also happen in a distributed immutable log setting. Suppose you ask for the "most recent K" values (as you do in a subsequent section). The definition of most recent depends on the order in which the immutable facts were written, and if they're written in a distributed setting, without explicit synchronization or timestamps, you could again get a different order depending on which distributed database was writing things faster.
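A small sketch of that ordering problem, in Python rather than Scala. The Fact type, the two writers, and the event_time field are all assumptions of mine for illustration:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fact:
    sensor_id: str
    event_time: int  # when the reading was taken (hypothetical logical clock)
    value: float

# Two hypothetical writers append to the same immutable log.
# Writer B is faster, so its later reading lands before writer A's earlier one.
log = []
log.append(Fact("sensor-1", event_time=2, value=20.0))  # from writer B
log.append(Fact("sensor-1", event_time=1, value=10.0))  # from writer A, delayed
log.append(Fact("sensor-1", event_time=3, value=30.0))  # from writer B

def most_recent_by_arrival(facts, k):
    """'Most recent' defined by write order: depends on which writer was faster."""
    return facts[-k:]

def most_recent_by_event_time(facts, k):
    """'Most recent' defined by an explicit timestamp carried in each fact."""
    return sorted(facts, key=lambda f: f.event_time)[-k:]

by_arrival = [f.event_time for f in most_recent_by_arrival(log, 2)]
by_event_time = [f.event_time for f in most_recent_by_event_time(log, 2)]
```

With k = 2, the arrival-order query returns the readings taken at times 1 and 3, while the timestamp-order query returns those taken at times 2 and 3. The log is immutable in both cases, yet the answer still depends on how "most recent" is defined.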

I also felt like an inordinate amount of time was spent in Ch3 describing Couchbase and its APIs, when we could instead have talked about the desirable characteristics (independent of whether we use Couchbase): distributed (for throughput + fault tolerance), non-blocking client API for read/write, schema-free writes.

Applications: please just talk about the original scenario
Reactivities: very open ended, needs a more specific example

Sorry to be harsh on this book, but it's only because I think the topic is important and I think that you have some great insights to share that are just marred by some presentation issues.

Jeff Smith (14)
Thanks for reading the initial draft and providing so much feedback so quickly.

A fair bit of your feedback is really something that I'd prefer Manning to respond to. In particular, things like avoiding running examples, the level of "reality" in an example, and the use of explanation vs. external references are really all book design decisions that lie in their hands, because such concerns fall within their area of expertise: helping readers learn.

Responding in more detail to some of the other items:

"Possible worlds" is a concept from the literature on uncertain data management, but I'm using it in a somewhat looser sense here.

I agree that I am using the term uncertainty in a fairly broad sense.

I view pattern matching as a technique for managing uncertainty in type and futures as a technique for managing uncertainty in time.

The reference to hairy data was just a reference to complex, evolving data models, as shown by the several data models that evolve over the course of the chapter.

Correct, I am choosing to conflate the usage of immutable data and an uncertain data model into one example. They're driven by different but sometimes related factors. For example, it's really bad to mutate or throw away uncertain data, because you'll never be able to reason about the uncertainty bounds of the lost data.

The out-of-order issue that you describe is not relevant if there is only a single writer for a given sensor and a per-sensor query. So, I think that the scenario that you describe is an interesting complication beyond my example.

Yes, the material on Couchbase was intended to primarily be an introduction to the desirable characteristics of a database.

Thanks again for all of the feedback. We'll definitely think more about these items in further revisions.

457791 (1)
I am wondering how I can get a preview edition, as I want to learn more about this fantastic book...