jnyika (1)
#1
I was reviewing the first page of the MEAP for your upcoming book and I was curious: will you describe what subject areas, backgrounds, courses, etc. would help a non-data-scientist prepare to understand at a deeper level why the techniques you will discuss work, and also understand the boundary conditions and limits of the models?

Your book appears very promising, principally because it is focused on practical applications of data science techniques. To get the most out of it, I would love to understand what I could review first to better prepare.

Kind regards, and I look forward to publication.

Jim
nina.zumel (17)
#2
There isn't currently a specific discussion on this, though we touch on it in the Preface (which hasn't been released to the MEAP yet). For reading the book, I would recommend references on basic statistics/probability, R (R In Action is an excellent reference), and perhaps basic linear algebra. Some experience with programming/scripting languages will help.

We will introduce the concepts from the above subjects as they are needed to make our points, so I would say have the reference books in hand as you are reading, rather than trying to read up on all the subjects first -- our goal is that reading our book will make reading the reference books easier, because our presentation will be more concrete than theirs.

To our other readers: Should we have a specific discussion of "pre-requisites" for the aspiring data scientist? And what would you recommend as "pre-requisites" or companion references to our book?
mcarloni (4)
#3
I've read that R should be used in some instances, and Python for others. Perhaps a discussion of what R can/cannot do would be relevant.

A few companion references:

http://gigaom.com/2013/04/16/how-to-hire-data-scientists-and-get-hired-as-one/
http://tryr.codeschool.com/
http://www.twotorials.com/
http://www.microsiris.com/Statistical%20Decision%20Tree/
http://saedsayad.com/
john.mount (79)
#4
That is a good point; we will have to do more to situate R and its strengths and weaknesses in a larger context.

Python is interesting (and getting more interesting, especially with NumPy, SciPy and IPython notebooks). As you go forward you will have to use a lot of different tools. If your project requires a decent amount of programming, you would want to look at Python, as it is a very good programming language. If your project involves using standard methods on a data frame, as we are emphasizing, you can start with R (though Pandas is extending Python's abilities there).

That is actually why the "with R" is in the book title. It is to try not to over-promise that we will have space to discuss all of the exciting tools. Some problems are beyond R's scope and the book's scope, but we hope the reasoning carries through.

Thanks for the very useful links (especially tryR). They will help us and others.
mcarloni (4)
#5
You're welcome.

R can be accessed from Python via RPy, which may be useful for new R users with some Python experience.
http://rpy.sourceforge.net
ccc31807 (9)
#6
> There isn't currently a specific discussion on this,
> though we touch on it in the Preface (which hasn't
> been released to the MEAP, yet). For reading the
> book, I would recommend references to basic
> statistics/probability, R (R In Action is an
> excellent reference), perhaps basic linear algebra.
> Some experience with programming/scripting languages
> will help.
>
> We will introduce the concepts from the above
> subjects as they are needed to make our points, so I
> would say have the reference books in hand as you are
> reading, rather than trying to read up on all the
> subjects first -- our goal is that reading our book
> will make reading the reference books easier, because
> our presentation will be more concrete than theirs.
>
> To our other readers: Should we have a specific
> discussion of "pre-requisites" for the aspiring data
> scientist? And what would you recommend as
> "pre-requisites" or companion references to our book?

Similar topics include statistics, data analysis, programming, R tutorials, business intelligence, and even artificial intelligence, machine learning, and information architecture. To me, 'practical data science' implies getting knowledge or information from data AND communicating that knowledge or information to end users.

I'm not sure that a list of narrowly targeted references would do much good. Rather, I think that wider coverage of related topics would be best. A good example of this kind of approach is the end-of-chapter references in Philipp Janert's book on data analysis.

Still, I think that a focus on R from start to finish (from a programming perspective) is very important, with coverage of related technologies such as Perl, Python, Java, and .NET.

CC.
john.mount (79)
#7
We definitely suggest moving beyond R as you work; we just needed to limit the scope of the book.