staran
#1
Enjoyed reading this book, as it covers topics not found in R in Action.

In the Linear Regression chapter, I see that you took the log of the dependent variable, income. My understanding was that the transformation was only necessary for the predictor/independent variables, to ensure that the normality assumptions required for linear regression were not violated. I would appreciate it if you could let me know whether my understanding is correct, and I would also like to understand your motivation for using the log transformation.

Thanks in advance
-STaran
john.mount
#2
Re: Log Transformation
Thanks for your comment!

This is a tricky subject. What assumptions you need depends on what modeling framework you are using. We ended up sharing a slightly mixed point of view in the book (frankly, we are sympathetic to the Bayesian view, which suggests transforms, but we didn't want to go to a full Bayesian model). We try to clarify this a bit in our errata: http://winvector.github.io/PDSwR/PracticalDataScienceWithRErrata.html

Overall things are a bit more operational than some would like: a method is good if it helps with the data and problems you have at hand (though obviously you need to choose from principled methods).

If you are using the Gauss-Markov theorem to justify linear regression, you need only assume facts about the errors (that they are uncorrelated and of the same magnitude; see http://www.win-vector.com/blog/2014/08/reading-the-gauss-markov-theorem/ ). If you are using a Bayesian/generative derivation, you may want to assume some distributional facts about the x's and perhaps the y's. The "assuming normality" advice is along those lines, but not strictly what is traditionally taught in statistics.
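
For example, a quick way to eyeball those error assumptions in R is a residuals-versus-fitted plot (a generic sketch using a dataset that ships with base R, not code from the book):

# Residual diagnostics for a linear fit (generic sketch; 'cars' ships with base R).
fit <- lm(dist ~ speed, data = cars)
# Look for roughly constant spread (same magnitude of errors) and no obvious
# pattern (uncorrelated errors) in the residuals.
plot(fitted(fit), resid(fit), xlab = "fitted values", ylab = "residuals")
abline(h = 0, lty = 2)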

See Andrew Gelman for some good ideas on regression: http://andrewgelman.com/2013/08/04/19470/ (he writes from a Bayesian point of view; the frequentists pride themselves on working from weaker assumptions).

Among the most convincing reasons to log-transform are:
1) To fix structural assumptions. For problems like income, wealth, and hedonic regression, it is plausible that each factor contributes a relative change in expectation (air conditioning may be valued as adding 10% to the value of a car, even though its cost is in dollars). So it is natural to model y ~ prod_i x_i^(b_i), i.e. log(y) ~ sum_i b_i * log(x_i) (notice we transformed both x's and y's here; not needed for categorical variables), or even log(y) ~ sum_i b_i * x_i (only the y's transformed). See the small R sketch after this list.
2) To fix domain issues, such as y being non-negative.
3) To compress range. If y varies over several orders of magnitude, only a few very large y values would dominate the fit. So if log(y) has a more reasonable range, you are using more of your data (though you have changed the error model).
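
Here is a minimal sketch of point 1 in R, fitting a multiplicative (log-log) model to synthetic data (the data and the coefficient values are hypothetical illustrations, not examples from the book):

# Fit a multiplicative (log-log) model to synthetic data (hypothetical example).
set.seed(123)
d <- data.frame(x = runif(100, min = 1, max = 100))
d$y <- 50 * d$x^0.7 * exp(rnorm(100, sd = 0.1))  # true model: y = 50 * x^0.7 * noise
fit_loglog <- lm(log(y) ~ log(x), data = d)
coef(fit_loglog)  # the log(x) coefficient should be near the true exponent, 0.7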

Also be aware: we are mostly using regression for prediction (estimating new, unseen y's), not inference (what we called extracting advice: estimating the betas, or coefficients). The requirements/standards are lower when making predictions than when inferring parameters. See http://www.win-vector.com/blog/2014/04/what-is-meant-by-regression-modeling/ for a bit of discussion on this. For more transforms, see the book or http://www.win-vector.com/blog/2012/03/modeling-trick-the-signed-pseudo-logarithm/
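
As a taste of that last link: a signed log-like transform can be built from the inverse hyperbolic sine, which, unlike log, is defined at zero and for negative values (this is one common construction; see the blog post for the exact variant discussed there):

# A signed log-like transform built from asinh (a sketch; see the linked blog
# post for the exact variant). Unlike log10, it handles 0 and negative values.
pseudoLog10 <- function(x) asinh(x / 2) / log(10)
pseudoLog10(c(-1000, -1, 0, 1, 1000))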
staran
#3
Re: Log Transformation
John,
Thanks for your reply. Frankly, I still need to get my arms around the Bayesian model and will stick to the frequentist inference approach using confidence intervals.

On the topic of log transformations, I get that we could very easily be using log transforms for either the dependent or the independent variables. One other side question, around the interpretation of coefficients: I would really appreciate it if you could explain how the interpretations below are derived. Sometimes we use percentages and sometimes we use units. Maybe I need to brush up on my high school maths again.

Y = B0 + B1*ln(X) + u ---> a 1% change in X is associated with a change in Y of 0.01*B1

ln(Y) = B0 + B1*X + u ---> a change in X by one unit (∆X = 1) is associated with a 100*B1% change in Y

ln(Y) = B0 + B1*ln(X) + u ---> a 1% change in X is associated with a B1% change in Y, so B1 is the elasticity of Y with respect to X.

Thanks.
STaran
staran
#4
Re: Log Transformation
Never mind. I found an excellent video explanation at http://www.cazaar.com/ta/econ113/interpreting-beta

Thanks.
-STaran
nina.zumel
#5
Re: Log Transformation
Generally speaking, you can interpret the changes like this:

if y0 = b0 + b1*x + u, then a change in x by one unit means:

y1 = b0 + b1*(x + 1) + u = b0 + b1*x + b1 + u = y0 + b1

So a unit change in x produces a change in y of b1, *if all other variable values stay fixed*.

With respect to the cases you’ve listed:

1) If we also assume that we are taking the logarithm as "log to the base 10" rather than the natural logarithm (and we mostly use log10 in the book), then a unit increase in log10(x) —> log10(x) + 1 corresponds to multiplying x by 10 in "x-space",

so y1 = b0 + b1*(log10(x) + 1) + u = y0 + b1 —> multiplying x by 10 increases y by b1

2) log10(y1) = log10(y0) + b1 (as above), so

y1 = y0 * 10^b1 —> a unit change in x increases y by a multiplicative factor of 10^b1

3) Multiplying x by 10 increases y by a multiplicative factor of 10^b1

If you are using natural log, then substitute e for 10 in the statements above.
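
For concreteness, here is a quick numeric check of the three cases in R (the b0, b1, and x values are arbitrary illustrations):

# Numeric check of the three cases above (arbitrary b0, b1, x values).
b0 <- 2; b1 <- 0.3; x <- 50

f1 <- function(x) b0 + b1 * log10(x)        # case 1: y = b0 + b1*log10(x)
f1(10 * x) - f1(x)                          # equals b1

f2 <- function(x) 10^(b0 + b1 * x)          # case 2: log10(y) = b0 + b1*x
f2(x + 1) / f2(x)                           # equals 10^b1

f3 <- function(x) 10^(b0 + b1 * log10(x))   # case 3: log10(y) = b0 + b1*log10(x)
f3(10 * x) / f3(x)                          # equals 10^b1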

Hope this helps,
Nina