BobFutrelle (9)
#1
I'm reading values from a file (comma-separated, blank-separated, or newline-separated). I've tried converting the result to a vector, a list, etc., but its length is always 1, so I can't plot it against, say, a 1:500 vector.

1. What's the answer?
2. Where can I find it explained in the book?

Good book, by the way :)

- Bob
robert.kabacoff (170)
#2
Re: When I read in 500 values, R says length() of it is 1 (one)
Hi Bob,

I am not sure that I understand the question. Let's say that you have a comma-delimited file named mydata.csv with the following data:

A,B,C
1,2,3
2,5,6
2,6,6
1,2,3

Then we get the following in R:

> mydata <- read.csv("mydata.csv")
> mydata
  A B C
1 1 2 3
2 2 5 6
3 2 6 6
4 1 2 3
> length(mydata)
[1] 3
> dim(mydata)
[1] 4 3
> str(mydata)
'data.frame': 4 obs. of 3 variables:
 $ A: int 1 2 2 1
 $ B: int 2 5 6 2
 $ C: int 3 6 6 3

What am I missing? What would you like to do?

Rob
BobFutrelle (9)
#3
My question is even simpler.
I want to display word frequency versus rank.
I read in a 500 item file of frequencies like this, starting with the frequency of "the" in some big document,

2134713
2038529
1501908
1156570
1057301
...
9801
9758

I want it to create an object of length 500.
I then want to plot it versus the sequence,
1 2 3 4 5 6 ... 499 500
Which are the ranks of the successive words.

Thanks for your quick reply - you ARE the expert. :)

- Bob
BobFutrelle (9)
#4
As a hack, I pasted my 500 values into the c() function,

c(2134713,2038529,1501908,1156570,1057301,...,9801,9758)

This gave me an object of length 500, which plotted nicely
against a 1:500 sequence, showing the approximately
exponential fall of the frequencies versus rank.

-Bob
robert.kabacoff (170)
#5
Hi Bob,

I think I understand. Let's try this. Say your data is in a file called mydata.txt as:
2134713
2038529
1501908
1156570
1057301
9801
9758

Then in R

> mydata <- read.table("mydata.txt")
> mydata
       V1
1 2134713
2 2038529
3 1501908
4 1156570
5 1057301
6    9801
7    9758

and

> plot(1:nrow(mydata), mydata$V1, xlab="Rank", ylab="Frequency")

should give you the plot you want.
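
For a file with a single column of numbers and no header, base R's scan() is another option: it returns a plain numeric vector, so length() reports the number of values directly. The following is only a sketch; the temporary file and the names path, freq, df and freq2 are made up for illustration, and the data are the seven values shown above.

```r
# Write a small illustrative file (one value per line, no header).
path <- tempfile(fileext = ".txt")
writeLines(c("2134713", "2038529", "1501908", "1156570",
             "1057301", "9801", "9758"), path)

# scan() reads whitespace-separated numbers into an atomic vector.
freq <- scan(path, quiet = TRUE)
length(freq)      # number of values read

# read.table() returns a data frame; extract the column to get a vector.
df <- read.table(path)
freq2 <- df$V1    # equivalently df[[1]]
length(df)        # number of columns
length(freq2)     # number of rows
```

Either freq or freq2 can then be passed to plot() against seq_along(freq).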

Does this work for you?

Rob
BobFutrelle (9)
#6
Does this work for me?
It sure does.
Like a charm.

Now I'll apply a log function to the x and y data.
Plotting log(Frequency) vs. log(Rank) should give something like a linear dependence.

I noticed that mydata$V1 is an object of length 500, so I will be able to apply log() across all the values.
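
A minimal sketch of the log-log plot described here, assuming the frequencies are already in a numeric vector. The names freq, rank, log_freq and log_rank are illustrative, and the data are just the seven values quoted earlier in the thread, not the full 500.

```r
# Illustrative frequencies (the seven values shown earlier in the thread).
freq <- c(2134713, 2038529, 1501908, 1156570, 1057301, 9801, 9758)
rank <- seq_along(freq)

# log() is vectorized: it is applied to every element at once.
log_freq <- log(freq)
log_rank <- log(rank)

plot(log_rank, log_freq, xlab = "log(Rank)", ylab = "log(Frequency)")

# Alternative: keep the raw values and ask plot() for log-scaled axes.
plot(rank, freq, log = "xy", xlab = "Rank", ylab = "Frequency")
```

The second form keeps the axis labels in the original units, which can be easier to read.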

My other question remains open: How/where can I find info in the book that would teach me how to do what you showed me?

- Bob
robert.kabacoff (170)
#7
Hi Bob,

The colon operator is introduced on page 24 (along with vectors).
The data frame is covered on page 27.
Reading the data into a data frame from a text file is covered on page 35.
The plot command is described in detail in chapter 3.

Hope this helps.

Rob
BobFutrelle (9)
#8
Maybe you can get some insight into my confusion when you see this sequence of commands:

> mydata <- read.table("Top500frequencies.txt")
> typeof(mydata)
[1] "list"
> length(mydata)
[1] 1
> typeof(mydata$V1)
[1] "integer"
> length(mydata$V1)
[1] 499

Why does an object of type 'list' have length 1 and at the same time, an object of type 'integer' have a length of 499? Seems backwards.

This is quite at variance with the properties of data types and data structures in virtually all other programming languages such as C, Java, etc.

- Bob
robert.kabacoff (170)
#9
Bob,

What is going on under the hood is somewhat complicated, but if I read you right, then

mydata is a list with one component - the vector V1

The V1 component of mydata is a vector with 499 elements.

When applied to a list, length() tells you how many components it has; when applied to a vector, it tells you how many elements it has.
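
The distinction can be checked on a small data frame. This is a sketch with a made-up three-row data frame named df, not the 499-row one from the thread.

```r
# A data frame is stored as a list of equal-length column vectors.
df <- data.frame(V1 = c(2134713L, 2038529L, 1501908L))

typeof(df)        # "list"
length(df)        # 1: one column
nrow(df)          # 3: three rows
length(df$V1)     # 3: three elements in the column vector

# Adding a column changes length(df) but not nrow(df).
df$rank <- seq_len(nrow(df))
length(df)        # 2
```

When the number of observations is what you want, nrow() on the data frame or length() on a single column is the call to use.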

Rob
BobFutrelle (9)
#10
Since you have just stated that the V1 component of mydata is a vector, I would want typeof(<the V1 component of mydata>) to be 'vector', just as you say it is. Instead typeof() returns the type of the objects held in each vector element, 'integer'.

Sure, knowing the rules solves the problems of dealing with a new language - but the suggestion I just made would put R syntax/functionality more in line with conventional PLs.

In java, new int[10] creates an array, a 10-element array. The elements of the array are of type integer, the array is not of type integer, it is of type int[].
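
For comparison with the Java case, here is a short sketch of how base R reports these properties. R has no separate scalar types, so "integer" names the storage type of a whole vector, and is.vector() is the call that answers whether something is a vector. The variable names x, y and l are illustrative.

```r
x <- c(10L, 20L, 30L)

typeof(x)       # "integer": the storage type of the vector
class(x)        # "integer"
is.vector(x)    # TRUE
is.atomic(x)    # TRUE

# A single number is still a vector, of length one.
y <- 5L
is.vector(y)    # TRUE
length(y)       # 1

# A list is the generic (non-atomic) vector type.
l <- list(1L, "a")
typeof(l)       # "list"
is.vector(l)    # TRUE
```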

All you've said and my experiments are helping me to deal with (accept and learn) what I feel are a few counter-intuitive aspects of R.

- Bob