How do you change the y-axis numbers for the density without changing them for the data?

Thanks in advance,

Chris]]>

#install.packages(filepath, repos = NULL, type="source")

My only concern is whether the method used to perform the post hoc comparisons is the most appropriate. After reading some posts, it seems that there are many methods to correct for inflated alpha error. Hope this helps.]]>

age --> DOB ?]]>

I am having trouble with Singular Value Decomposition where some of the values are NA in my matrix.

items

users i1 i2 i3 i4 i5 i6 i7 i8 i9 i10
u1     4  0 NA  2 NA NA  4 NA NA   4
u2     2  0 NA NA NA NA NA NA  1  NA
u3    NA  5  0  3  2  3  5 NA NA  NA
u4    NA  5  3  0  2  1 NA  3  3  NA
u5    NA NA NA  0 NA  2 NA NA  3   0

m <- matrix(sample(c(NA,0:5), 50, replace=TRUE, prob=c(.5, rep(.5/6, 6))), nrow=5, ncol=10, dimnames = list(users=paste('u', 1:5, sep=''), items=paste('i', 1:10, sep='')))

svd(m)

I am getting the following error.

Error in svd(m) : infinite or missing values in 'x'

I would appreciate it if you would kindly let me know whether there is any solution for this.

Kind Regards,

-STaran]]>
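For what it's worth, svd() cannot handle missing values, so the usual approach is to impute the NAs first. A minimal sketch, assuming mean-imputation by column is acceptable for the data (a small hand-made matrix stands in for the ratings):

```r
# Small stand-in ratings matrix with missing entries
m <- matrix(c(4, NA, 2, NA,
              2,  1, NA, 3,
              NA, 5,  0, 2), nrow = 3, byrow = TRUE)

# Replace each NA with its column mean before decomposing
col_means <- colMeans(m, na.rm = TRUE)
for (j in seq_len(ncol(m))) {
  m[is.na(m[, j]), j] <- col_means[j]
}

s <- svd(m)   # succeeds now that no missing values remain
```

More principled alternatives exist (e.g., iterative low-rank imputation); column means are only the simplest starting point.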

When I run following code on page-361, I get error:

> y <- x[which(sd(x) > 0)]

Error in is.data.frame(x) :

(list) object cannot be coerced to type 'double'

It's the same in the R code "RinA CH15 Code.txt".

Please suggest a solution.]]>
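A likely cause: in R 2.14 and later, sd() no longer accepts a data frame, which produces exactly this "(list) object cannot be coerced" error when x is a data frame. A sketch of a workaround, computing sd column by column with sapply (the toy data frame here is hypothetical):

```r
# Hypothetical stand-in for x: a data frame with one constant column
x <- data.frame(a = c(1, 2, 3), b = c(5, 5, 5), c = c(2, 4, 6))

# sd(x) fails on a data frame in recent R; compute per-column sds instead
sds <- sapply(x, sd)

# keep only the columns whose standard deviation is greater than zero
y <- x[, sds > 0, drop = FALSE]
```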

name <- strsplit((roster$Student), " ")

Error in strsplit((roster$Student), " ") : non-character argument

I suspect that there is a typo; if not, why the parentheses around roster$Student? I'm not quite following.

Thanks]]>
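For what it's worth, the usual cause of this error is that roster$Student is a factor (the default for character columns when reading data in older versions of R), and strsplit() requires a character vector. A sketch with a hypothetical roster:

```r
# Hypothetical roster; factor() reproduces the failing situation
roster <- data.frame(Student = factor(c("John Davis", "Angela Williams")))

# strsplit() needs character input, so convert the factor first
name <- strsplit(as.character(roster$Student), " ")
```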

> x <- mtcars$wt

> x <- x/sum(x)*100

> x

Charles Kangai]]>

I have tried coercing the first argument into a data frame, since the error message suggests it is being passed a list, and still it does not work; i.e., the following does not work:

by(as.data.frame(mtcars[vars]), mtcars$am, dstats)

The problem is with the by function. If you replace dstats with mean or sd, it still does not work. It only works with the summary function. I wonder how the author got the output in the book?

Thanks in advance.

Charles Kangai (Bristol, UK)]]>
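One plausible explanation: by() hands each group to the function as a data frame, and in recent versions of R mean() and sd() no longer accept data frames (summary() still does). A sketch of a dstats that summarizes each column explicitly:

```r
vars <- c("mpg", "hp", "wt")

# by() passes each group as a data frame, so summarize column by column
dstats <- function(df) sapply(df, function(col) c(mean = mean(col), sd = sd(col)))

by(mtcars[vars], mtcars$am, dstats)
```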

I am working with the Airline data. I have read the data by year into a list that has the index of the years '1987'=data, '1988'=data, ....

So the data has several columns. And I can get to a column using:

sapply(data, '[', columnName). Now if columnName is 'dayOfMonth' and ...

If I assign this to a variable as: x <- sapply(data, '[[', 'dayOfMonth')

and then print it,

I get the list index and then the elements.

$`1987.dayOfMonth`

[1] 14 15 17 18 19 21 22 ...

Now I am trying to actually apply a factor to this column as in:

y <- factor(x)

However just indicating x - this does not take me to the list elements.

Instead I have to do: y <- factor(sapply(x, '[[', 1))

I cannot seem to do it as part of a for loop where I am going through all the years and applying a factor to dayOfMonth as I can never get to the list elements themselves.

I can get to the list elements to apply factor if I say y[[1]]$'1987.dayOfMonth'

But again this is not possible as I am doing this as part of a loop.

What is the best way to get to the elements directly so I can apply a factor to them? The code is getting too convoluted, as I am having to do too many things to get to the factor when I think there should be a simpler way.

Also the data is part of an object which has a list that stores the years so I am already 3 levels in with the first sapply.

I hope this narrative is not too confusing - but I have been quite stuck at this, because every time I try to apply the factor it says:

Error in sort.list(y) : 'x' must be atomic for 'sort.list'

Have you called 'sort' on a list?

Thank you again for your book and help in this matter.]]>
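For the record, the error arises because factor() (via sort.list) needs an atomic vector, and sapply() with '[' returns a list of one-column pieces. A sketch on a toy version of the structure (the year names and column name are only illustrative):

```r
# Toy stand-in for the list-of-years data described above
data <- list(`1987` = data.frame(dayOfMonth = c(14, 15, 17)),
             `1988` = data.frame(dayOfMonth = c(1, 2, 3)))

# '[[' extracts the column itself as a vector; '[' would keep a data frame
x <- lapply(data, `[[`, "dayOfMonth")

# factor() needs an atomic vector, so flatten the list first...
y <- factor(unlist(x))

# ...or keep the by-year structure and factor each element
y_by_year <- lapply(x, factor)
```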

library(RODBC)

> channel <-odbcConnectExcel("test.xls")

> mydataframe <- sqlFetch(channel,"Sheet1")

but I receive the following error:

Sheet1: table not found on channel

The Excel worksheet is definitely called "Sheet1", and the Excel file is called test.xls. I have no idea why this is not working.]]>

One solution is to use cbind.data.frame(treatment, improved, Freq) instead of as.data.frame(cbind(treatment, improved, Freq))]]>

The package you recommended downloaded its contents and I tried again, but I still could not get ggm to yield the pcor function. When I input library(ggm), the response was

Loading required package: gRbase

Loading required package: graph

Error: package graph could not be loaded

In addition: Warning messages:

1: package ggm was built under R version 2.15.1

2: package gRbase was built under R version 2.15.1

3: In library(pkg, character.only = TRUE, logical.return = TRUE, lib.loc = lib.loc) :

there is no package called graph

And then the error message to the pcor command was as before

On the R Programming Group on LinkedIn, there's a suggestion that ggm doesn't work on 2.15.1: 'it needs RBGL, which is no longer available'.

Any other way I can get the partial correlation?]]>
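If ggm refuses to install, partial correlations can be computed with base R alone, from the inverse of the correlation matrix. A sketch, worth cross-checking against ggm::pcor once it installs:

```r
# All pairwise partial correlations, each controlling for the remaining
# variables: pcor_ij = -P_ij / sqrt(P_ii * P_jj), where P = solve(cor(df))
pcor_all <- function(df) {
  P <- solve(cor(df))
  pc <- -P / sqrt(diag(P) %o% diag(P))
  diag(pc) <- 1
  pc
}

pcor_all(mtcars[, c("mpg", "wt", "hp")])
```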

Initially, I believe I'd provided sufficient environment variable definitions, etc., that R could find the file and open it. However, I don't believe I'd ever actually set the R_PROFILE environment variable [i]in R as part of R startup[/i].

Eventually, I used R_ENVIRON in Windows XP to point to an Renviron.site file, and within that file, set R_PROFILE to the complete path and filename of Rprofile.site. This resolved the problem and the Rprofile.site executes as expected.]]>

I'm involved in teaching R and would like to have the option of distributing selected chapters from R in Action to my classes. Is it possible to purchase individual chapters? I have the pdf so can create the chapters myself.

Kind regards

Tony]]>

treatment <- rep(c("Placebo", "Treated"), each=6)

improved <- rep(c("None", "Some", "Marked"), times=2,each=2)

sex <-rep(c("female","male"),times=6)

Freq <- c(19,10,7,0,6,1,6,7,5,2,16,5)

mytable <- as.data.frame(cbind(treatment, improved,sex, Freq))

When I now use the table2flat function, I get a data frame mydata with 69 observations (rows). But I would expect 84 cases (rows). When I try the same with the data in table 7.2, I get 20 rows in mydata. Also here I would expect 84 rows.

Consequently I get different statistics from page 157 (bottom) when I use the assocstats(mytable) function:

mytable2 <-xtabs(~treatment+improved, data=mytable)

assocstats(mytable2):

assocstats(mytable2)

X^2 df P(> X^2)

Likelihood Ratio 1.2900 2 0.52467

Pearson 1.2749 2 0.52864

Phi-Coefficient : 0.136

Contingency Coeff.: 0.135

Cramer's V : 0.136

Is there a mistake in my thinking? Am I doing something wrong?]]>
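A likely culprit in the listing above: cbind() on a mix of character and numeric vectors builds a character matrix, so as.data.frame(cbind(...)) turns Freq into a factor rather than a count, and the rows cannot expand correctly. Building the data frame directly keeps Freq numeric (a sketch):

```r
treatment <- rep(c("Placebo", "Treated"), each = 6)
improved  <- rep(c("None", "Some", "Marked"), times = 2, each = 2)
sex       <- rep(c("female", "male"), times = 6)
Freq      <- c(19, 10, 7, 0, 6, 1, 6, 7, 5, 2, 16, 5)

# data.frame() preserves each column's type, unlike as.data.frame(cbind(...))
mytable <- data.frame(treatment, improved, sex, Freq)

sum(mytable$Freq)   # 84, matching the expected number of cases
```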

Thank you for an excellent book. I've had it since early in MEAP and it has been extremely useful.

In section 13.2.2, there is a 'trick' where all of the variables are set to their means, except the variable of interest. I am wondering if you could suggest an analogous method I could use if I have predictors that are factors (since I can't use 'mean' on them).

Thanks,

Dan]]>
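One common analog of "hold at the mean" for factor predictors is to hold each factor at its most frequent (modal) level, or at its reference level. A sketch of the modal-level idea:

```r
# Return a factor's most frequent level (ties go to the earlier level)
modal_level <- function(f) {
  levels(f)[which.max(table(f))]
}

am <- factor(mtcars$am, labels = c("automatic", "manual"))
modal_level(am)   # "automatic" (19 of 32 cars)
```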

Rob]]>

Sure, knowing the rules solves the problems of dealing with a new language - but the suggestion I just made would put R syntax/functionality more in line with conventional PLs.

In Java, new int[10] creates a 10-element array. The elements of the array are of type int; the array itself is not of type int, it is of type int[].

All you've said, and my experiments, are helping me to deal with (accept and learn) what I feel are a few counter-intuitive aspects of R.

- Bob]]>

I tried to import a data file which contains a) completely blank rows and b) rows with only 1 value instead of 42 (and I would like to ignore these lines as well).

How can I handle this problem? I realized that the read.table function has the option blank.lines.skip. However, how can I handle problem b)?!

I work with RStudio and have already noticed that some details seem to be different than in the "standard" R software package. For example, RStudio doesn't seem to recognize the edit() and fix() functions. I copied the whole library from R to RStudio, so I don't think I'm missing a necessary library. Do you know how I can handle this issue?

Thanks a lot for your help in advance!

Best regards,

Rafi]]>

I get the following using 12.13.1 on Mac OS 10.7.1

Note n is not rounded to the whole number 34.

> pwr.t.test(n = , d = 0.8, sig.level = 0.05, power = 0.9, alternative = "two.sided")

Two-sample t test power calculation

n = 33.82554

d = 0.8

sig.level = 0.05

power = 0.9

alternative = two.sided

NOTE: n is number in *each* group

rmsharp]]>

Regards and condolences ...]]>

Rob]]>

Thanks for your responses.

Regards,

Sanjeev]]>

Let's say that you are predicting Y from X1, X2, X3, X4, and X5.

fit <- lm(Y ~ X1+X2+X3+X4+X5, data=mydata)

library(car)

vif(fit)

If the square root of the VIF for any variable is greater than 2, you probably have a multicollinearity issue.

sqrt(vif(fit)) > 2

See page 198 of the book for more details.

Hope this helps.

Rob]]>

That worked out really well. Thank you! It even returned the row numbers as column headers, which is exactly what I needed. The apply() family of functions is what I needed.

----------

I got a chance to look up the apply family of functions and also in your book. Which brings me to the next question - and last in this sequence.

I would like to create a user-defined function that applies a certain formula across the columns. In section 5.5 of your book, and also in other places, I found user-defined functions that apply in a similar way, but they are not column-specific.

----------

For example, if I have a dataSet with columns c1, c2, c3, I would like to return a weighted sum of the columns as:

ComputeStats <- function(x, na.omit=TRUE) {

voteTotal <- (sum(1 * x$c1, 2 * x$c2, 3 * x$c3))

return(voteTotal)

}

voteTotals <- apply(dataSet, 1, ComputeStats)

and expect something similar to what you showed for mean, median, etc. to happen.

But it complains on dataSet$c1 - so I tried dataSet[c1], dataSet[,c1] and even dataSet[,1]

----------

No luck.
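A likely cause: apply() passes each row to the function as a named vector, not a data frame, so $ indexing fails; indexing by name works. A sketch on a hypothetical dataSet:

```r
dataSet <- data.frame(c1 = c(1, 2), c2 = c(3, 4), c3 = c(5, 6))

# Each row arrives as a named numeric vector, so index by name, not with $
ComputeStats <- function(x) {
  sum(1 * x["c1"], 2 * x["c2"], 3 * x["c3"])
}

voteTotals <- apply(dataSet, 1, ComputeStats)
voteTotals   # 22 28
```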

Anyway - just thought I'd check to see if that can be done without looping. Again, I found examples of apply that work with simple custom functions such as x = x + 2, which applies x + 2 across all columns individually.

Just wondering if this last aspect can be managed.

----------

Again thank you for the suggestions Robert and really appreciate your book in understanding R and more importantly how to stepwise analyze data.

Btw, a suggestion would be to add some material on using ggplot2.

Thanks again.]]>

Btw, I did not have to change any code, but when I was calling it in a script it kept returning the same numbers. After I restarted R, it magically started working.

But thanks for this as well.]]>

Peter]]>

Thanks very much, and congratulations on your very useful book.]]>

This could be a note to experienced programmers.]]>

i = 10

is used instead of i <- 10. Both appear to work, but I would suggest commenting on this usage or changing it to i <- 10.]]>

Thank you.]]>

Although this is technically correct, I think it sounds a bit confusing, as the chart at the beginning of section 2.2 labels one-element arrays as "scalars." It might be beneficial to expand or change the wording here for the inexperienced.

Thank you!

jperickson]]>

Studying chapter 5.6.3, "The reshape package," in the most recent preprint and Wickham's paper in the Journal of Statistical Software ( http://www.jstatsoft.org/v21/i12 ), I was wondering about two pieces:

- In your excellent explanation of the cast function (p. 133) you used triple dots in the formula, I guess to hint that more variables could be listed in that part of the formula. I understood (I might be wrong) that triple dots are defined as "all other variables" and can thus be used only once in a cast formula.

- On page 134 you stated "As you can see, the flexibility of the reshape() functions is amazing." which I believe is also true, but was not discussed in that chapter. You probably meant "the flexibility of the melt() and cast() function" or the functions of the reshape package.

best regards,

Robert]]>

plot(mtcars$wt, mtcars$mpg,

xlab="Miles Per Gallon",

ylab="Car Weight")

While it should be:

plot(mtcars$wt, mtcars$mpg,

xlab="Car Weight",

ylab="Miles Per Gallon")

As a result, Figure 3.18 is incorrectly labeled as well.

"Car Weight" should be on the x-axis while "Miles Per Gallon" should be on the y-axis. This needs fixing: I stared at this figure for a long time because it just didn't make sense, and the labeling was the cause of the confusion.]]>

The second last paragraph on p33 reads, "The statement status <- factor (status, ordered = TRUE) will encode the vector as (3,2,1,3) and associate these values internally as 1=Excellent, 2=Improved, and 3=Poor."

Does R assign these values internally? If R does assign these values, could you please explicitly spell out the rule? If not, how may the user assign these values? I beg your pardon if I have overlooked any clue about internal association of values with ordinal variables. Thanks in anticipation.

Regards,

KwokC]]>

On Mac 10.6 snow leopard,

minor.tick(nx=2, ny=3, tick.ratio=0.5)

gives me one tick mark on the Y axis, alternating between 1/3 and 2/3 of the way between major marks.]]>

Axis text will [b]be to[/b] italic]]>

I'd be inclined to drop the "actually".]]>

> is.finite(c(1,2,3,NA,NaN,Inf,-Inf,NULL))

[1] TRUE TRUE TRUE FALSE FALSE FALSE FALSE]]>

[b]The period (.) has no special significance in object names. However, the dollar sign ($) has a somewhat analogous meaning, identifying the parts of an object.[/b]

The [i]analogous[/i] is a bit confusing without a referent. Perhaps

The period (.) has no special significance in object names. However, the dollar sign ($) has a somewhat analogous meaning [b]to the period in other object-oriented languages[/b], identifying the parts of an object.]]>

The formulas are not translating properly when turned into PDF documents from word documents. The publishers will fix this.

With regard to overdispersion, take a look at

http://www-m4.ma.tum.de/courses/GLM/lec5.pdf

It might answer your question.

I appreciate your comments and interest!

Sincerely,

Rob]]>

Thanks!

Rob]]>

Sincerely,

Rob]]>

Sincerely,

Rob]]>

page 61 (last saved 2/9/2010):

e.g., --> e.g.

page 64 (last saved 2/9/2010):

we have create --> we have created

page 65 (last saved 2/9/2010):

not every sentence in table 3.7 ends with a '.'

page 69 (last saved 2/9/2010):

an better display --> a better display

page 70 (last saved 2/9/2010):

interpretation each --> interpretation of each

page 74 (last saved 2/9/2010):

is again place --> is again placed

page 83 (last saved 2/9/2010):

vectors(sumx and meanx) . --> vectors (sumx and meanx).

page 96 (last saved 2/9/2010):

to chosen from --> to choose from

page 101 (last saved 2/9/2010):

sd(c(1,2,3,4) --> sd(c(1,2,3,4))

page 102 (last saved 2/9/2010):

listing 4.2 --> listing 5.2

page 103 (last saved 2/9/2010):

6.25 12.25 --> 6.25, 12.25

(x-means)**2) --> (x-means)**2

page 116 (last saved 2/9/2010):

semicolons). --> semicolons

page 122 (last saved 2/9/2010):

have get --> have got

page 127 (last saved 2/9/2010):

We the following --> In the following

page 135 (last saved 2/9/2010):

(lbls) resolves --> (lbls)) resolves

page 140 (last saved 2/9/2010):

amount=0.01) --> amount=0.01))

page 144 (last saved 2/9/2010):

boxplot.stats(mtcars$mpg) --> boxplot.stats(mtcars$mpg) or boxplot(mtcars$mpg)$stats

page 162 (last saved 2/9/2010):

section 6.4) , --> section 6.4),

page 180 (last saved 2/9/2010):

if want --> if we want

page 189 (last saved 2/9/2010):

in the formula on the second half of the page: a matched set of brackets is missing

the i in Yi should be printed as subscript

page 192 (last saved 2/9/2010):

Weight = -87.52 + 3.45* Height --> weight = -87.52 + 3.45* height

page 194 (last saved 2/9/2010):

Weight --> weight

Height --> height

Height --> height

page 206 (last saved 2/9/2010):

chapter 12). --> chapter 12.)

page 207 (last saved 2/9/2010):

add to the function in listing 8.6: rug(jitter(x))

page 217 (last saved 2/9/2010):

add the following lines of text at the of page 217:

For every graph the user may click on points of interest and R will show the label of that point.

Escape or Stop will show the next graph.

page 219 (last saved 2/9/2010):

library(car --> library(car)

page 240 (last saved 2/9/2010):

aov( formula --> aov(formula

page 249 (last saved 2/9/2010):

function the HN --> function in the HN

page 251 (last saved 2/9/2010):

designs assumes --> designs assume

page 258 (last saved 2/9/2010):

add on top of the page: > attach(wlbl)

page 269 (last saved 2/9/2010):

that Reaction time --> that reaction time

page 270 (last saved 2/9/2010):

remove the '.' at the end of 4 paragraphs under the topic Specifically

page 296 (last saved 4/2/2010):

mvrnorm() --> mvrnorm()

page 323 (last saved 9/13/2010):

fist use --> first use

package in available --> package available

page 324 (last saved 9/13/2010):

remove the first space in ' data identifies'

remove the '.' at the end of 3 paragraphs under the topic where

page 329 (last saved 9/13/2010):

would we could --> we could

page 347 (last saved 6/29/2010):

variety of variety of --> variety of

page 350 (last saved 6/29/2010):

can be save --> can be saved

page 353 (last saved 9/13/2010):

the 2 formulas on this page are not legible

page 358 (last saved 9/13/2010):

the formula on this page is not legible

page 366 (last saved 9/13/2010):

better fit your --> better fit for your

page 375 (last saved 9/13/2010):

widely use --> widely used

page 378 (last saved 9/13/2010):

to decided to extract --> to decide to extract

page 379 (last saved 9/13/2010):

we when we --> when we

page 380 (last saved 9/13/2010):

PA1 and P2 --> PA1 and PA2

verbal land --> verbal and

page 457 (last saved 10/13/2010):

( http --> (http

page 472 (last saved 10/13/2010):

.ls,objects() --> ls.objects()

page 1 - end

use either 'dataframe' or 'data frame' consistently]]>

page 39: library(xlxs) --> library(xlsx)

page 39: myworkbook.xlxs --> myworkbook.xlsx

page 39: Webscaping --> Webscraping

page 39: webscaping --> webscraping

page 44: e.g., --> e.g.

page 58: filled green filled --> filled green]]>

page 23: will be help --> will help]]>

Sincerely,

Rob]]>

Sincerely,

Rob]]>

Sincerely,

Rob]]>

Sorry for the inconvenience.

Sincerely,

Rob]]>

Thank you very, very much for your coaching!

I tried your second script. The formatting of the table looked right, but the numbers in the cells did not look right.

I wrote my own script (please see below) that does get the right numbers into the cells. But it is really clunky. I hope there is a more elegant way to do this in R.

Also, I am wondering how to do Tukey's pairwise tests to see whether the subpopulations of males vs. females within each region differ in the proportions who prefer different contact options. This would mean testing whether, in the Northeast, males and females differed in the proportion who prefer the over-the-phone option (i.e., to see whether 11% vs. 13% is a significant difference), etc.

I am not sure why your script does not give the right numbers. I am not sure how to interpret this: prop.table(t, 3) converts the cell counts in table t to proportions, where the proportions add up to 1.0 for the 3rd variable (C here). Replace the 3 with any index or vector of indexes to control the margins. But the error seemed to have occurred at that step.

Cheers,

YZK

----------------------------------------------------------------------------------------------------------------

#This script seems to get the right table printed.

library(Hmisc)

fcc.df = spss.get("FCC.Consumer.Survey.Fall.2009.sav")

t= table( fcc.df$cregion, fcc.df$sex, fcc.df$q35b)

tbase=t

for (i in (1: dim(tbase)[1])) {

for ( j in (1: dim(tbase)[2]) ) {

for ( k in (1: dim(tbase)[3])) {

tbase[i,j,k] = sum( t[i,j,])

print( c("i=", i, " j=",j, " k=", k))

#This print statement is unnecessary for the calculations just a diagnostic

}

}

}

#tbase holds the total number of respondents of a given gender in a given region

t3 = ftable (t/tbase, row.vars=3)

print ("Entries in cells are in percentages for easier readability")

#I wonder how to put % sign into table when printing

print( t3*100, digits = 1)]]>
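For what it's worth, the triple loop above (dividing each cell by its region-by-sex total) is exactly what prop.table() computes when given both of the first two margins. A sketch on toy data (the variable values are stand-ins for the survey's):

```r
region <- rep(c("NE", "S"), each = 4)
sex    <- rep(c("F", "M"), times = 4)
pref   <- c("phone", "mail", "phone", "phone",
            "mail", "mail", "phone", "mail")
t <- table(region, sex, pref)

# Proportions summing to 1 within each region x sex cell, i.e. t / tbase
p <- prop.table(t, margin = c(1, 2))

t3 <- ftable(p, row.vars = 3)
print(t3 * 100, digits = 1)
```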

You are correct about demo() creating variables in the current environment. I have not found a way around this yet. If you quit R and restart it, the variables created by demo() will be gone (as long as you do not save the workspace).

In any case, the datasets (like iris) that exist in packages are unaffected by what you do on the command line. The original copies are protected against anything but uninstalling the package.

Hope this helps.

Rob]]>

On page 453, you have a box plot with pairwise comparisons listed across the top. I would find very useful a further discussion of how to make a stand-alone table of pairwise comparisons, or how to annotate a plot with the accompanying "abc" indicators.

Thanks

Chase]]>

Rob]]>

Sincerely,

Rob]]>

Sincerely,

Rob]]>

Rob]]>

Sincerely,

Rob]]>

I have made the change.

Sincerely,

Rob]]>

I have made the corrections.

Sincerely,

Rob]]>

Sincerely,

Rob]]>

I have made the corrections.

Sincerely,

Rob]]>

Sincerely,

Rob]]>

Sincerely,

Rob]]>

You provided code to do multiple imputation:

library(mice)

data(sleep, package="VIM")

imp <- mice(sleep, seed=1234)

fit <- with(imp,lm(Dream ~ Span + Gest))

pooled <- pool(fit)

summary(pooled)

Following are my questions:

1. I would like to perform this procedure for some of the variables with missing values, but not for all of them. How can I do it?

2. How can I force R to fill in the missing values using variables that I choose?

3. I would like to perform Cox regression based on the imputed data sets, the same way you use the with() function in the code above. How can I do it?

I would greatly appreciate your help.

Thank you, Victoria]]>
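A sketch of how questions 1 and 2 are usually handled in mice, via its method and predictorMatrix arguments; the specific variables picked here (NonD, BrainWgt) are only illustrative. For question 3, with() accepts a Cox model the same way it accepts lm(), e.g. with(imp, coxph(Surv(time, status) ~ x1 + x2)) from the survival package, given suitable variables:

```r
library(mice)
data(sleep, package = "VIM")

imp0 <- mice(sleep, maxit = 0)      # dry run: collect the default settings

# 1. Skip imputation for chosen variables: blank out their method
meth <- imp0$method
meth["NonD"] <- ""                  # NonD's missing values stay missing

# 2. Control which variables feed the imputation model
pred <- imp0$predictorMatrix
pred[, "BrainWgt"] <- 0             # never use BrainWgt as a predictor

imp <- mice(sleep, method = meth, predictorMatrix = pred, seed = 1234)
```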

as invalid URL]]>

Rob]]>

It is a good idea. I will definitely consider adding some info on this topic. In the meantime, the best online source I have seen is

http://www.stats.uwo.ca/faculty/murdoch/software/debuggingR/

I hope this helps.

Sincerely,

Rob]]>

"In the following code, one figure is again place in row 1 and two figures are placed in row 2. However, the figure in row 1 is 1/3 the height of the figures in row 2. Additionally, the figure in the bottom right cell is 1/4 the width of the figure in the bottom left cell."

Due to bizarre pagination, an orphaned "2" and indentation of the first line, it reads as if the first line is an incomplete sentence and the second line is the beginning of Item 2.

ClayH]]>

"We can same graph with the code"

probably should read:

"We can produce the same graph with the code"

ClayH]]>

Thanks for responding to this. I actually did mean to say "imputing". When we impute missing values, we replace them with reasonable non-missing estimates (filler values, if you will) before continuing the data analysis. Missing data is such an important and pervasive problem for the data analyst that I decided to add a separate chapter on the subject (chapter 16).

Keep the comments coming!

Sincerely,

Rob]]>

I am missing a function for calculating the trend of a given y (the slope of the regression line for a time series that will be aggregated).

For example:

y = c(1.3, 1.6, 2.3, 4.5) - this vector will be aggregated - response variable

x = c(1,2,3,4) - just 1:length(y) - explanatory variable

I am missing a formula for the regression slope (beta_1) with arbitrary length of y.

some working examples from me:

M_mean<-cast(melt_data, ID_1+ID_2~variable, mean)

M_median<-cast(melt_data, ID_1+ID_2~variable, median)

M_std<-cast(melt_data, ID_1+ID_2~variable, sd)

M_range<-cast(melt_data, ID_1+ID_2~variable, function(x) max(x) - min(x))

M_skew<-cast(melt_data, ID_1+ID_2~variable, function(x) sum((x-mean(x))**3/sqrt(var(x))**3)/length(x))

M_kurtosis<-cast(melt_data, ID_1+ID_2~variable, function(x) sum((x-mean(x))**4/var(x)**2)/length(x) - 3)

Regards,

Jüri Kuusik

e-mail: kuusik@neti.ee]]>
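For a vector aggregated by cast(), the slope against the time index 1..n is beta_1 = cov(x, y) / var(x), which can be wrapped as an aggregation function like the ones above (a sketch; melt_data and the ID variables are from the poster's setup):

```r
# Slope of the regression of y on its time index 1..length(y)
trend <- function(y) {
  x <- seq_along(y)
  cov(x, y) / var(x)
}

trend(c(1.3, 1.6, 2.3, 4.5))   # 1.03
# M_trend <- cast(melt_data, ID_1 + ID_2 ~ variable, trend)
```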

Rob]]>

The statement is not working because gender was left out of the dataset in the last edit. I will make the corrections. Sorry for the confusion.

Sincerely,

Rob]]>

Rob]]>

You can purchase the ebook from http://www.manning.com/kabacoff/

I hope this helps.

Sincerely,

Rob]]>

Rob]]>

Thanks for the suggestion. I will try to incorporate it.

Sincerely,

Rob]]>

By the time you read this, a new chapter on Power Analysis will have been released on MEAP. I have also written two new graphics chapters that should come out shortly.

Sincerely,

Rob]]>

Sincerely,

Rob]]>

Sincerely,

Rob]]>

From a cursory look at your code, I am wondering if you need to specify the dependent (outcome) variable in the line

comparison <- LSD.test(value, Treatment,Residuals,df,MSerror,group=F)

Sincerely,

Rob]]>

I have fixed it.

Sincerely,

Rob]]>