Susan Harkins

Errata Editor]]>

After this post, links for epub and mobi magically appeared. Also, all the MEAPs I have had offered epub and mobi for download.]]>

Why did you replace Java with Python?

Is there a technical reason, or is it just a new version for Python developers?

Thanks in advance.]]>

Firstly, thanks for your interest in this book!

I did take a closer look at this, and I think you are absolutely right. There is an error here that has carried over from the first edition.

However, there is also a misunderstanding in the way you have used the general form of the naive similarity metric.

Beta is a hyperparameter. In general, we graph this to show how the hyperparameter can impact the similarity score for different values of distance. However, we only really use beta=1 in the text.

So, in the first example

1/(1+sqrt(3^2)) = 1/4 = 0.25

in the second example (note how I haven't changed beta here)

1/(1+sqrt(1^2+1^2+1^2)) = 1/(1+sqrt(3)) ≈ 1/2.73 ≈ 0.37, which is obviously not the same.

In general, we are trying to illustrate that a small difference in rating across many movies in common should probably not return the same similarity score as a larger difference in rating across fewer movies in common.

Actually, we can still demonstrate this, if we say:

[quote]If the two users had watched three movies, and among these three movies their ratings differed by the square root of three on each [/quote]

In this case

1/(1+sqrt(3*(sqrt(3)^2)))=1/(1+3)=1/4=0.25

Which is the same.
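To check the arithmetic, here is a minimal Python sketch of the three cases above. It assumes beta=1, as in the text, and so leaves beta out entirely; the function name `naive_similarity` is just an illustrative choice, not something taken from the book's listings.

```python
import math

def naive_similarity(rating_diffs):
    """Naive similarity with beta = 1: 1 / (1 + sqrt(sum of squared differences))."""
    distance = math.sqrt(sum(d * d for d in rating_diffs))
    return 1.0 / (1.0 + distance)

# One movie in common, ratings differ by 3.
print(round(naive_similarity([3]), 2))                 # 0.25

# Three movies in common, ratings differ by 1 on each.
print(round(naive_similarity([1, 1, 1]), 2))           # 0.37

# Three movies in common, ratings differ by sqrt(3) on each.
print(round(naive_similarity([math.sqrt(3)] * 3), 2))  # 0.25
```

The first and third cases agree only up to floating-point rounding, which is why the values are rounded before printing.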

Thanks for flagging this.

All the best,

Doug

]]>

Firstly, thanks for your interest in the book!

In answer to your question, this is done to obtain an average distance per common item.

Before this line, [b]sim[/b] contains the sum of the squared distances between common items. This number will tend to be larger if there are lots of movies in common, regardless of how alike the users are in their preferences.

To rectify this, we normalise by the total number of movies in common. This gives us the average squared distance per movie in common. The next step takes the square root to get a typical (root-mean-square) distance per movie. The variable [b]sim[/b] doesn't really become a similarity until the final step, when [b]tanh[/b] is used:

[code]sim = 1.0 - math.tanh(sim)[/code]

This is explained in the book as follows:

..."let's look at the first (default) similarity definition between two users as shown in listing 3.2, where sim_type=0. If the users have some movies in common, we divide the sum of their squared differences by the number of common movies, take the positive square root, and pass on that value to a special function. This function is called the hyperbolic tangent function. We subtract the value of the hyperbolic tangent from 1.0, so that the final value of similarity ranges between 0.0 and 1.0, with 0.0 implying dissimilarity and 1.0 implying the highest similarity."
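The steps in that passage can be sketched roughly as follows. This is a standalone illustration rather than the code from listing 3.2: the function name `tanh_similarity`, the dictionary-based ratings, and the choice to return 0.0 when there are no movies in common are all assumptions made for the example.

```python
import math

def tanh_similarity(ratings_a, ratings_b):
    """Similarity in [0.0, 1.0] from two {movie: rating} dictionaries."""
    common = ratings_a.keys() & ratings_b.keys()
    if not common:
        # Assumption: treat users with no movies in common as dissimilar.
        return 0.0

    # Sum of squared rating differences over the common movies.
    sim = sum((ratings_a[m] - ratings_b[m]) ** 2 for m in common)

    # Normalise by the number of common movies, then take the square root.
    sim = math.sqrt(sim / len(common))

    # Only at this point does the value become a similarity.
    return 1.0 - math.tanh(sim)

alice = {"A": 4, "B": 3, "C": 5}
bob = {"A": 5, "B": 4, "C": 4, "D": 2}
print(tanh_similarity(alice, bob))    # 1 - tanh(1), roughly 0.238
print(tanh_similarity(alice, alice))  # 1.0
```

Because of the normalisation, two users who differ by one point on every common movie get the same score whether they share three movies or thirty.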

Hopefully this makes things clearer.

All the best,

Doug

]]>

All the best,

Doug]]>

Firstly, many thanks for your interest in this book!

We decided to rewrite AIW using Python for two reasons. Firstly, Python is rapidly becoming the language of choice for many data scientists, so readers will benefit from gaining exposure to it within the pages of this book. Secondly, Python is supported by several excellent machine learning libraries. We chose scikit-learn for AIW.

For reasons why you might consider scikit-learn:

http://radar.oreilly.com/2013/12/six-reasons-why-i-recommend-scikit-learn.html

http://daoudclarke.github.io/machine%20learning%20in%20practice/2013/09/18/why-i-love-scikit-learn/

Both articles were written in late 2013, but they are equally relevant today.

Happy reading!

Doug]]>

It looks like the chapters have been distributed *without* the data files. I'm looking into getting these uploaded here: http://www.manning.com/mcilwraith/

Please check back periodically.

All the best,

Doug]]>

Many thanks for your feedback.

I agree, this is difficult to read. I'll work with the publication team to ensure that this is improved in subsequent MEAP releases.

All the best,

Doug]]>

The Manning Early Access Program (MEAP) exists to get content into the hands of readers as soon as possible. Consequently, this content is often in a rough 'draft' format and contains various errors and typos that have been overlooked.

I can confirm that these errors will be corrected by the time of final release.

I do hope this does not impact your enjoyment of the MEAP release too much.

All the best,

Doug

]]>