YZK_R (5)
#1
Dear Dr. Kabacoff,

First, thank you very much for answering my previous query. I truly appreciated the clear, quick and friendly reply. It makes learning from your book feel like a more personal and "supported" experience.

I am continuing to explore R and your book.

I see several examples dealing with tabulation of data. But I have not found an example of how to take typical survey data and do a typical nested cross-tabulation. (I have not yet read the entire book. Please forgive me if the example is there and I simply have not found it.)

In such data the response to a single question on a 5-point scale gets turned into 5 columns of data (at least, that is how it looks in SPSS).

The cross-tabulation would "nest" two or more categories of respondents (for example, show respondents grouped by age into 3 age categories and then show males and females as subcategories within the age groups).

The cross-tab would show what percentage of the people within each gender/age group gave each answer to the question of interest (e.g., responded that they would definitely/probably/maybe/probably not/definitely not buy a widget).

I wonder how to construct such tables efficiently (typically, in a survey analysis one has to do a ton of these).

Another thing that a cross-tab might show is which groups had pairwise statistically significant differences (Tukey T tests).

You mentioned that there is a package called "survey" that addresses survey data. I explored that package and checked with its author. Unfortunately, that package is not designed for market-research-type survey analysis.

If you are looking for a "public" data set as it comes from a survey, here is an example: http://www.data.gov/raw/2077 (If you download the data posted under CSV, you will actually get a zip file containing a .sav file. I read it in with Hmisc and it seems to have loaded fine.)

Cheers,
YZK

Message was edited by:
YZK_R

robert.kabacoff (170)
#2
Re: Nested cross-tabulation with pairwise significance tests?
Dear YZK_R,

I downloaded the file you suggested and created a table that I think will fit your needs. Take a look at the following code:

# input data
library(Hmisc)
s <- spss.get("FCC.Consumer.Survey.Fall.2009.sav")

# create 3 way table
t <- table(s$cregion, s$sex, s$q35b)

# flatten and get row proportions
t2 <- ftable(prop.table(t, 3))

# print results
print(t2, digits=1)

The NaNs in the lawyer column are because no one selected that option. Give it a try and let me know what you think.

Sincerely,

Rob
YZK_R (5)
#3
Re: Nested cross-tabulation with pairwise significance tests?
Dear Dr. Kabacoff,
Thank you very, very much yet again.

The code you created does put together a nested table. But it is not quite what I was trying to do. I apologize for not making myself clearer in my previous query.

I was not able to tweak the code you sent (so far) to do what I was trying to do. I hope you would be kind enough to show me how to do that.

I was trying to put the "answers of interest" (in this case Q35) as rows and the "demographics" (e.g., genders nested within regions) as columns.

The table would show that in the Northeast 11% of males and 13% of females prefer the “over the phone” option, etc.

The table would also indicate where Tukey’s T test would show significant differences for males vs. females (within specific regions).

I will send you a PDF file with a couple of examples of such tables via e-mail.

The tables I will send you are the output of SPSS. My version of SPSS does not seem to be able to do Tukey’s T test in the table. It only does plain pairwise Student's t-tests. (Ideally, it should do Tukey’s test since we are making multiple pairwise comparisons. This is one of the reasons I would like to figure out how to do this in R.)

If it is possible for you to show us how to generate something similar in R, that would be really helpful.

Cheers,
Yana
robert.kabacoff (170)
#4
Re: Nested cross-tabulation with pairwise significance tests?
Let's break it down:


t <- table(A, B, C) creates a 3-way cross tabulation of categorical variables A, B, and C.

prop.table(t, 3) converts the cell counts in table t to proportions, where the proportions add up to 1.0 for the 3rd variable (C here). Replace the 3 with any index or vector of indexes to control the margins.

ftable(prop.table(t, 3), row.vars=3) creates a flattened table with variable C as the rows. Thus each row sums to 1. The ftable function is very flexible. You can specify both the row.vars and col.vars.
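
To see what the margin argument does, here is a minimal sketch on a small made-up data set. The data frame d and its columns are invented for illustration and are not part of the FCC file. It compares margin 3 (proportions sum to 1 within each answer) with margin c(1, 2) (proportions sum to 1 within each region-by-sex group), which may be closer to a "percent of each subgroup" table.

```r
# Made-up data (not the FCC survey): region x sex x answer
d <- data.frame(
  region = c("NE", "NE", "NE", "NE", "S", "S", "S", "S"),
  sex    = c("M", "M", "F", "F", "M", "M", "F", "F"),
  answer = c("phone", "web", "phone", "phone", "web", "web", "phone", "web")
)
tab <- table(region = d$region, sex = d$sex, answer = d$answer)

# margin = 3: proportions add up to 1 within each answer
p3 <- prop.table(tab, 3)
apply(p3, 3, sum)          # one total per answer

# margin = c(1, 2): proportions add up to 1 within each region x sex group
p12 <- prop.table(tab, c(1, 2))
apply(p12, c(1, 2), sum)   # one total per region/sex combination

# answers as rows, sex nested within region as columns
ftable(p12, row.vars = 3, col.vars = c(1, 2))
```

With margin c(1, 2), each column of the flattened table sums to 1 rather than each row.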

Putting it all together:

# input data
library(Hmisc)
s <- spss.get("FCC.Consumer.Survey.Fall.2009.sav")

# create 3 way table
t <- table(s$cregion, s$sex, s$q35b)

# flatten and get row proportions
t2 <- ftable(prop.table(t, 3), row.vars=3)

# print results
print(t2, digits=1)

I hope this helps.

Rob
YZK_R (5)
#5
Re: Nested cross-tabulation with pairwise significance tests?
Dear Dr. Kabacoff,

Thank you very, very much for your coaching!

I tried your second script. The formatting of the table looked right, but the numbers in the cells did not look right.

I wrote my own script (please see below) that does get the right numbers into the cells. But it is really clunky. I hope there is a more elegant way to do this in R.

Also, I am wondering how to do Tukey’s pairwise tests to see whether the subpopulations of males vs. females within each region differ in the proportions who prefer the different contact options. This would mean testing whether males and females in the Northeast differ in the proportion who prefer the “over the phone” option (i.e., whether 11% vs. 13% is a significant difference), etc.

I am not sure why your script does not give the right numbers. I am not sure how to interpret this: “prop.table(t, 3) converts the cell counts in table t to proportions, where the proportions add up to 1.0 for the 3rd variable (C here). Replace the 3 with any index or vector of indexes to control the margins.” But the error seemed to have occurred at that step.

Cheers,
YZK
----------------------------------------------------------------------------------------------------------------
#This script seems to get the right table printed.
library(Hmisc)
fcc.df = spss.get("FCC.Consumer.Survey.Fall.2009.sav")

t= table( fcc.df$cregion, fcc.df$sex, fcc.df$q35b)

tbase=t

for (i in 1:dim(tbase)[1]) {
  for (j in 1:dim(tbase)[2]) {
    for (k in 1:dim(tbase)[3]) {
      tbase[i, j, k] = sum(t[i, j, ])
      print(c("i=", i, " j=", j, " k=", k))
      #This print statement is unnecessary for the calculations – just a diagnostic
    }
  }
}
#tbase holds the total number of respondents of a given gender in a given region

t3 = ftable (t/tbase, row.vars=3)

print ("Entries in cells are in percentages for easier readability")
#I wonder how to put % sign into table when printing
print( t3*100, digits = 1)
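
A possible shorter route, sketched here on made-up data rather than the FCC file (the data frame d, its column names and its counts are all hypothetical): prop.table with margin c(1, 2) divides every cell by its region-by-sex total, which is what the triple loop computes as t/tbase. The sketch also shows one way to print a % sign by formatting the numbers as strings. For the male vs. female comparisons it uses prop.test within each region and adjusts the p-values with Holm's method via p.adjust. That is a substitute for Tukey's procedure, not an implementation of it, and it is not necessarily what SPSS does.

```r
# Made-up counts standing in for the survey (hypothetical names and data)
d <- data.frame(
  region = rep(c("NE", "S"), each = 40),
  sex    = rep(rep(c("M", "F"), each = 20), times = 2),
  answer = c(rep(c("phone", "web"), times = c(5, 15)),   # NE, M
             rep(c("phone", "web"), times = c(10, 10)),  # NE, F
             rep(c("phone", "web"), times = c(8, 12)),   # S, M
             rep(c("phone", "web"), times = c(8, 12)))   # S, F
)
tab <- table(region = d$region, sex = d$sex, answer = d$answer)

# Loop-free equivalent of t / tbase: percentages within region x sex
pct <- 100 * prop.table(tab, c(1, 2))
ft  <- ftable(pct, row.vars = 3, col.vars = c(1, 2))
print(ft, digits = 3)

# Same layout with a % sign: build a character matrix by hand
a <- aperm(pct, c(3, 2, 1))   # answer x sex x region
cols <- as.vector(outer(dimnames(a)$sex, dimnames(a)$region,
                        function(s, r) paste(r, s, sep = ".")))
m <- matrix(sprintf("%.0f%%", as.vector(a)), nrow = dim(a)[1],
            dimnames = list(dimnames(a)$answer, cols))
print(noquote(m))

# Male vs. female on "phone" within each region: two-sample test of
# proportions, then Holm adjustment for the multiple comparisons
p_raw <- sapply(dimnames(tab)$region, function(r) {
  x <- tab[r, , "phone"]     # "phone" counts by sex in region r
  n <- rowSums(tab[r, , ])   # respondents by sex in region r
  prop.test(x, n)$p.value
})
p_adj <- p.adjust(p_raw, method = "holm")
print(round(cbind(p_raw, p_adj), 3))
```

With the real file, the table would be built from cregion, sex and q35b as in the scripts above, and the test would be repeated for each answer of interest. Answers that nobody selected would need to be dropped first, since they give empty cells.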