Will you please put some or all of the ideas in Henning, M. (2007). API Design Matters. ACM Queue, 5(4), 24-36 into The Design of Everyday APIs? I could not finish Irresistible APIs and promised never to look at it again because I could not find any of the classical ideas about API design in it, and, of course, no amount of pounding on my desk, the keyboard, or the computer could make the examples work.

API design is important to computer programmers because they deal with it every day in their work, using the C++ Standard Library, the Java Documentation, Mathematica, old LISP manuals, The Collected Algorithms of the ACM, etc., etc., etc. It is also important that APIs be maintainable and maintained. You cannot just throw new routines at a library without any thought of consistency of presentation, organization, encouraging use, and people finding them again. There is no point in proselytizing code reuse if programmers can't find the stuff again and the documentation is not clear, concise, and complete.

In any case, I wish you the best of luck with your book. Speed of completing the work may not be important. What may be most important is that it be a great book that you and the rest of the world treasure for a very long time.
It would be better if at least one line in the Curl examples in section 2.4.2 could be made to work.
Real-world Machine Learning is a great book, and I have really enjoyed reading it. The best part of the book is the attention to detail. For example, when machine learning is used as an adjective, it is hyphenated; when it is used as a noun, it is not hyphenated. Few authors today take the time to get that detail right. However, the day might not be complete without one small demur.

When a binary categorical variable, such as gender, is turned into two columns, such as male and female, both Excel Regression and Mathematica LinearModelFit blow up because the two columns, together with the constant (intercept) column, are not linearly independent. This is also true of the AutoMPG dataset if region is made three columns instead of two. Every book I have ever read about regression analysis says this is true: a binary categorical variable should be one column of zeros or ones, if for no other reason than that the partial derivative of the regression with respect to that column, which is the column's coefficient, then makes sense. If there are three columns for region, then what does the partial derivative of the regression WRT, say, Europe mean: the change in MPG WRT a car that is made nowhere?

I don't understand how the software used for the book completes an analysis where gender is two columns in the Titanic dataset or region is three columns in the AutoMPG dataset. What meaning then do you assign to the coefficient of male or female or to the coefficient of Asia, Europe, or America?
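
The linear dependence in question can be checked numerically. The following is only a minimal sketch in Java; the class name, the rank routine, and the five-row toy data are made up for illustration and are not taken from the book or its datasets. It assumes the regression includes an intercept column. Because male + female equals the intercept in every row, the three-column design matrix has rank 2, while dropping one dummy column gives full column rank.

```java
class DummyVariableTrap {
    /** Rank of a matrix by Gaussian elimination with partial pivoting. */
    static int rank(double[][] m) {
        int rows = m.length, cols = m[0].length;
        double[][] a = new double[rows][];
        for (int i = 0; i < rows; i++) a[i] = m[i].clone(); // work on a copy
        int rank = 0;
        for (int c = 0; c < cols && rank < rows; c++) {
            // Pick the largest remaining entry in column c as the pivot.
            int pivot = rank;
            for (int r = rank + 1; r < rows; r++)
                if (Math.abs(a[r][c]) > Math.abs(a[pivot][c])) pivot = r;
            if (Math.abs(a[pivot][c]) < 1e-12) continue; // no pivot in this column
            double[] tmp = a[pivot]; a[pivot] = a[rank]; a[rank] = tmp;
            // Eliminate column c from the rows below the pivot row.
            for (int r = rank + 1; r < rows; r++) {
                double f = a[r][c] / a[rank][c];
                for (int k = c; k < cols; k++) a[r][k] -= f * a[rank][k];
            }
            rank++;
        }
        return rank;
    }

    public static void main(String[] args) {
        // Hypothetical toy data. Columns: intercept, male, female.
        double[][] twoDummies = {
            {1, 1, 0}, {1, 0, 1}, {1, 1, 0}, {1, 0, 1}, {1, 0, 1}
        };
        // Same rows with the female column dropped. Columns: intercept, male.
        double[][] oneDummy = {
            {1, 1}, {1, 0}, {1, 1}, {1, 0}, {1, 0}
        };
        System.out.println("intercept + male + female: rank "
                + rank(twoDummies) + " of 3 columns");
        System.out.println("intercept + male: rank "
                + rank(oneDummy) + " of 2 columns");
    }
}
```

With one column per gender, the coefficient of male reads as the difference between male and the baseline (female), which is the interpretation the regression texts give.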

Thanks again for the wonderful book.
I am trying to complete the exercises at the end of Chapter 2; I need the HelloWorldFigaro example as the starting point, as per the first exercise. I have unzipped the PPP_SourceCode14.zip file. According to Appendix A, all I have to do is to navigate to the PracticalProbProg/examples directory and type:

sbt console
sbt "runMain chap01.HelloWorldFigaro"

I have typed every conceivable combination of the above characters and all I see are errors. I have been at this for over an hour and a half with no results. Here is the output:

17:26 Fri 02/12 E:\>cd E:\Development\PPP_SourceCode14\PracticalProbProg\examples
17:26 Fri 02/12 E:\Development\PPP_SourceCode14\PracticalProbProg\examples>sbt console
Java HotSpot(TM) Client VM warning: ignoring option MaxPermSize=256m; support was removed in 8.0
[info] Loading project definition from E:\Development\PPP_SourceCode14\PracticalProbProg\examples\project
[info] Updating {file:/E:/Development/PPP_SourceCode14/PracticalProbProg/examples/project/}examples-build...
[info] Resolving org.fusesource.jansi#jansi;1.4 ...
[info] Done updating.
[info] Compiling 1 Scala source to E:\Development\PPP_SourceCode14\PracticalProbProg\examples\project\target\scala-2.10\sbt-0.13\classes...
[info] Set current project to Examples (in build file:/E:/Development/PPP_SourceCode14/PracticalProbProg/examples/)
[info] Updating {file:/E:/Development/PPP_SourceCode14/PracticalProbProg/examples/}Examples...
[info] Resolving jline#jline;2.12.1 ...
[info] Done updating.
[info] Compiling 36 Scala sources and 1 Java source to E:\Development\PPP_SourceCode14\PracticalProbProg\examples\target\scala-2.11\classes...
[info] Starting scala interpreter...
Welcome to Scala version 2.11.6 (Java HotSpot(TM) Client VM, Java 1.8.0_66).
Type in expressions to have them evaluated.
Type :help for more information.

scala> sbt "runMain chap01.HelloWorldFigaro"
<console>:1: error: ';' expected but string literal found.
sbt "runMain chap01.HelloWorldFigaro"

I have typed this with quotes, without quotes, with package name, without package name, etc., etc., etc., etc., etc.....
Nothing works; there are nothing but errors.
I also tried all of this with PPP_SourceCode13.zip. Same result; nothing but errors.

What is the problem? Is there some alternative way to get at the HelloWorld example?

Around location 1009 in the Kindle edition, the book says "The difference between Chain and Apply is that apply returns ordinary scala values, but the function in Chain returns an element, ... ."

But at location 1039 it says, "First you use Apply to package the argument elements into a single element whose value is a tuple of values of the arguments."

According to the documentation, the abstract base class Apply "(r)eturns a list of arguments on which the element depends".

So, what does Apply actually return, an Element, a scala value, or a list of scala values?
This example does not work; temperature is undefined:

import com.cra.figaro.algorithm.sampling.Importance
def greaterThan50(d: Double) = d > 50
println(Importance.probability(temperature, greaterThan50 _))

Avi Pfeffer. Practical Probabilistic Programming MEAP v11 (Kindle Locations 900-902). Manning Publications Co. Kindle Edition.
Figure 1.2 would make more sense if the top part were labeled "Non-referential (no interface with outside world)".
I appreciate your book; I have finished it now and have at least some idea how big data works. However, I do wish you had read my note with the same attention to detail that I paid your book.

First, as I mentioned in my note, the problem that prevents the book's sample code from running is in the Windows-specific code. The code can create directories under Windows, but throws an exception when it tries to write to them. As you write, the sample code uses a very old version of Hadoop, and only very recent versions of Hadoop work under Windows.

Second, the sample code does not work on Windows; I don't think it ever worked on Windows, or was even tried on Windows. So, like I said, I would have appreciated a copy that worked in my environment.
Thanks to the authors for such a helpful book on Java 8, lambdas, and functional programming. I am almost finished reading it and have enjoyed all my time with it. I particularly liked Ch 12; I had spent many weeks studying the new time and date classes API documentation and thought I knew it all, but Ch 12 made several points I had completely missed. Thanks again.

I wrote a driver for the SubsetsMain example to see how large a set it could enumerate the subsets of, and whether immutable objects and the garbage collection they engender would scale up to larger problems. With 32GB of memory in the computer, I was only able to compute the subsets of a set of size 25 before there was a problem. FWIW, I used the bad example of concat, put an l.clear() line after the copyList.addAll(l); line in insertAll, and cleared subs in the driver with the lines for(List<Integer> l : subs) l.clear(); subs.clear(); to make SubsetsMain find all the subsets of a set of size 28. The final result was about 3 times faster than the initial attempt: about 5.92 secs vs. 15.22 secs to find the subsets of a set of size 25. The results were (Core i7 3820):

{{# Subsets, Input Size, Time/Subset (s), TotalTime (s)},
{2, 1, 0.000847778, 0.00169556},
{4, 2, 6.826*10^-6, 0.000027307},
{8, 3, 3.128*10^-6, 0.000025031},
{16, 4, 3.537*10^-6, 0.000056604},
{32, 5, 1.591*10^-6, 0.000050915},
{64, 6, 1.084*10^-6, 0.000069403},
{128, 7, 8.26*10^-7, 0.000105813},
{256, 8, 6.25*10^-7, 0.00016014},
{512, 9, 4.72*10^-7, 0.00024206},
{1024, 10, 4.47*10^-7, 0.000457951},
{2048, 11, 4.09*10^-7, 0.000839388},
{4096, 12, 4.01*10^-7, 0.00164265},
{8192, 13, 3.14*10^-7, 0.00257903},
{16384, 14, 1.92*10^-7, 0.00315361},
{32768, 15, 1.29*10^-7, 0.004247},
{65536, 16, 1.23*10^-7, 0.00810858},
{131072, 17, 9.4*10^-8, 0.0124025},
{262144, 18, 7.*10^-8, 0.0183653},
{524288, 19, 6.9*10^-8, 0.0364506},
{1048576, 20, 6.2*10^-8, 0.0655527},
{2097152, 21, 1.32*10^-7, 0.277843},
{4194304, 22, 1.2*10^-7, 0.506426},
{8388608, 23, 1.26*10^-7, 1.06054},
{16777216, 24, 1.47*10^-7, 2.47489},
{33554432, 25, 1.76*10^-7, 5.93673},
{67108864, 26, 3.95*10^-7, 26.5249},
{134217728, 27, 1.56*10^-7, 20.9477},
{268435456, 28, 4.21*10^-7, 113.09}}
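
For anyone who wants to produce a table in the same shape, here is a rough sketch of a timing driver. It is not the book's SubsetsMain and not my modified version of it: the class name and the iterative subsets method are stand-ins made up for illustration, and the upper limit of 16 is chosen only to keep the run short. Timings from a single unwarmed run like this are noisy, so treat the numbers as indicative at best.

```java
import java.util.ArrayList;
import java.util.List;

class SubsetsTiming {
    /** Returns all subsets of {1, ..., n}; every subset is a freshly allocated list. */
    static List<List<Integer>> subsets(int n) {
        List<List<Integer>> result = new ArrayList<>();
        result.add(new ArrayList<>()); // start with the empty set
        for (int x = 1; x <= n; x++) {
            // For each subset found so far, add a copy that also contains x.
            int size = result.size();
            for (int i = 0; i < size; i++) {
                List<Integer> withX = new ArrayList<>(result.get(i));
                withX.add(x);
                result.add(withX);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("# Subsets, Input Size, Time/Subset (s), TotalTime (s)");
        for (int n = 1; n <= 16; n++) {
            long start = System.nanoTime();
            List<List<Integer>> subs = subsets(n);
            double total = (System.nanoTime() - start) / 1e9;
            System.out.printf("%d, %d, %.3e, %.6f%n",
                    subs.size(), n, total / subs.size(), total);
        }
    }
}
```

Raising the limit past the low twenties will likely require a larger heap (for example via the JVM's -Xmx option), since all 2^n lists are held in memory at once.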

Just to be a slightly annoying former colonial, the myNextWorkingDay TemporalAdjuster I wrote for the DateTimeExamples (Ch 12) was almost twice as fast as yours. Here is the code:

private static class myNextWorkingDay implements TemporalAdjuster {
    @Override
    public Temporal adjustInto(Temporal temporal) {
        // Step forward one day at a time until the date is not on a weekend.
        do {
            temporal = temporal.plus(1, ChronoUnit.DAYS);
        } while (DayOfWeek.of(temporal.get(ChronoField.DAY_OF_WEEK)) == DayOfWeek.SATURDAY ||
                 DayOfWeek.of(temporal.get(ChronoField.DAY_OF_WEEK)) == DayOfWeek.SUNDAY);
        return temporal;
    }
}

I once read a Java consultant's blog in which he wrote something like, "If you want performance in Java, never save anything to memory." That was good advice, and I have treasured it ever since I saw it. It does impact readability, though.
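
To try the adjuster outside the book's DateTimeExamples class, something like the following self-contained sketch should do. The class name and the NEXT_WORKING_DAY constant are made up for illustration. Note that this variant is written as a lambda and reads the day of week once per iteration into a local variable, so it is not identical to the code above and I make no claim about its speed.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.temporal.ChronoField;
import java.time.temporal.ChronoUnit;
import java.time.temporal.Temporal;
import java.time.temporal.TemporalAdjuster;

class NextWorkingDayDemo {
    /** Steps forward one day at a time until the date is not a Saturday or Sunday. */
    static final TemporalAdjuster NEXT_WORKING_DAY = temporal -> {
        Temporal t = temporal;
        DayOfWeek day;
        do {
            t = t.plus(1, ChronoUnit.DAYS);
            day = DayOfWeek.of(t.get(ChronoField.DAY_OF_WEEK));
        } while (day == DayOfWeek.SATURDAY || day == DayOfWeek.SUNDAY);
        return t;
    };

    public static void main(String[] args) {
        LocalDate friday = LocalDate.of(2016, 2, 12);      // a Friday
        System.out.println(friday.with(NEXT_WORKING_DAY)); // prints 2016-02-15 (Monday)
    }
}
```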

Again, thanks for your very useful book and for all your hard work in researching and writing it.

Charles Elliott
Thank you for your reply. I did download the code as you suggested, and I chose, eventually, to import it into Eclipse using the Maven option. The Eclipse Maven plug-in is installed. O/S is Win 8.1, Prof., 64-bit. I wrote a driver that instantiates BatchWorkflow and calls BatchWorkflow.initTestData().

I have spent about 4 days on this. There are two problems: First, there are multiple references to log4j in pom.xml, so when the code needs to call log4j, it cannot decide which instance to load. The problem was "solved" by misspelling one of the references to log4j in the pom, but then it loads the NOP logger, so there is no logging at all. The second problem is a null pointer exception at org.apache.hadoop.fs.Path.<init>(Path.java:61). I downloaded two copies of Hadoop (2.6.0 & 2.7.0), and neither has a line 61 in org.apache.hadoop.fs.Path labeling code. Line 61 is, however, in a section of code that is trying to interpret "X:\\" as a part of a file reference. The strange part of this is that part of the BatchWorkflow.initTestData() code is working because it does create the directory tree:


However, BatchWorkflow.initTestData() apparently fails when it goes to write the data.

Whoever wrote the code in BatchWorkflow.initTestData() must never have tried it, even in a Linux environment, because the logging problem must have always been there.

A working copy of big-data-code-master_Final.zip would be greatly appreciated.

Charles Elliott
I cannot answer your question authoritatively; in fact, I probably should not be trying. However, my plan is to create a main class in the same directory as E:\Development\Big-Data\src\java\manning\batchlayer\BatchWorkflow.java, and then just run through the code snippets in Chapter 9, one by one, by instantiating the class BatchWorkflow and then calling the methods in it. I would call initTestData() first, then create some more test data for the "ingest" example code.

The reason I am writing is, while I did get the project to compile with maven, I don't know how to import the project into Eclipse. Can you tell me how you did that?

Thanks in advance for any help you may care to provide.

Charles Elliott
I downloaded and installed SBT into Windows 8.1. I typed in sbt console, and it downloaded a bunch of stuff. Then I typed in "runMain appendixA/Test" and sbt appeared to define a resource string. Then I typed in runMain appendixA/Test or runMain appendixA/Test null or runMain chap01/HelloWorld and all I saw was

<console>:8: warning: postfix operator HelloWorld should be enabled
by making the implicit value scala.language.postfixOps visible.
This can be achieved by adding the import clause 'import scala.language.postfixOps'
or by setting the compiler option -language:postfixOps.
See the Scala docs for value scala.language.postfixOps for a discussion
why the feature should be explicitly enabled.
runMain chap01/HelloWorld

I can write programs in C, C++, Pascal, Lisp, VisualBasic, C#, Java, and Perl, but the above is Greek to me. Even if I knew where to find out what it was talking about, there is no way I want to pore through pages of documentation just to read your book and work the examples.

I spent at least 4 hours trying to make Eclipse-Scala work. It emitted nothing but errors; there is nothing to show for all that time. I am used to Eclipse; I have written hundreds of Java programs on Eclipse. It would be so nice if I could make it work with Scala and Figaro, but nothing. Eclipse-Scala is all new; the Eclipse is the latest version, the Scala is the latest version. But it will not do anything. There is no up-to-date documentation on the website. I know you don't want to support Eclipse-Scala, but you have to give us something to work with, and a working Eclipse would be a real treat.

I did make Test.scala work with the command line scala Test.scala. It produced "1.0." Wow!

I can't figure out if you are trying to be annoying or just don't care, but whichever it is, you are very good at it. I am sure you have no time, but neither do your readers. All I want to do is learn practical probabilistic programming for a project I am working on. I can't spend hours and hours just trying to make the examples in the book work. It just is not practical.