import-bot (20211) [Avatar] Offline
#1
[Originally posted by kirkw]

I have a _very_ large data file in the format of:

DATE JULIAN INDEX TREEID CIRCUIT BCON BBRK FBUD
FOPN FPST SEX LENGTH WIDTH LFIN L75 L95 TIP
COMMENTS
03/28/1995 87 87 ACPE-01 21 MS 0 ND
0 0 0 0 0 0 0 0
03/28/1995 87 87 ACPE-02 36 SS 0 ND
0 0 0 0 0 0 0 0
03/28/1995 87 87 ACPE-03 41 SS 0 ND
0 0 0 0 0 0 0 0
03/28/1995 87 87 ACPE-04 69 NS 0 ND
0 0 0 0 0 0 0 0
03/28/1995 87 87 ACRU-01 8 MS 0
VS 0 0 M 0 0 0 0 0
0
03/28/1995 87 87 ACRU-02 34 MS 0
VS 0 0 M 0 0 0 0 0
0
03/28/1995 87 87 ACRU-03 65 MS 0
VS 0 0 0 0 0 0 0
0
03/28/1995 87 87 ACRU-04 76 MS 0
MS 0 0 0 0 0 0 0
0
03/28/1995 87 87 ACRU-05 106 MS 0
MS 0 0 0 0 0 0 0
0
03/28/1995 87 87 ACSA-01 11 SS 0
ND 0 0 0 0 0 0 0
0
03/28/1995 87 87 ACSA-02 98 NS 0
ND 0 0 0 0 0 0 0
0
03/28/1995 87 87 ACSA-03 103 NS 0
ND 0 0 0 0 0 0 0
0
04/03/1995 93 93 ACPE-01 22 MS 0 ND
0 0 0 0 0 0 0 0
04/03/1995 93 93 ACPE-02 36 MS 0 ND
0 0 0 0 0 0 0 0
04/03/1995 93 93 ACPE-03 41 MS 0 ND
0 0 0 0 0 0 0 0
04/03/1995 93 93 ACPE-04 69 NS 0 ND
0 0 0 0 0 0 0 0
04/03/1995 93 93 ACRU-01 8 MS 0
VS 0 0 M 0 0 0 0 0
0
04/03/1995 93 93 ACRU-02 34 MS 0
VS 0 0 M 0 0 0 0 0
0
04/03/1995 93 93 ACRU-03 65 MS 0
VS 0 0 0 0 0 0 0
0
04/03/1995 93 93 ACRU-04 76 MS 0
VS 0 0 0 0 0 0 0
0
04/03/1995 93 93 ACRU-05 106 VS 0
VS 0 0 0 0 0 0 0
0
04/03/1995 93 93 ACSA-01 11 NS 0
ND 0 0 0 0 0 0 0
0
04/03/1995 93 93 ACSA-02 98 NS 0
ND 0 0 0 0 0 0 0
0
04/03/1995 93 93 ACSA-03 103 NS 0
ND 0 0 0 0 0 0 0 0


I need to find the mean for two different fields (bbrk and lfin). I have put
together the following script, which seems to work (except that I'm not sure
what this warning means):

Name "main::header" used only once: possible typo at ./aggrigate.pl line 7.


#!/usr/bin/perl -w

$, = ' ';   # set output field separator
$\ = "\n";  # set output record separator

# take care of header
$header = (<>);

$firstRec = 1;
while (<>) {
    if (!eof()) {
        chomp;
        @data = split;
        @tree = split('-', $data[3]);
        $unique = 1;
        if ($firstRec == 0) {
            for ($i = 0; $i <= $#date; $i++) {
                if ($data[0] eq $date[$i] and $data[2] eq $julianIndex[$i]
                    and $treeid[$i] eq $tree[0]) {
                    $unique = 0;
                    $element = $i;
                    $i = $#date;
                }
            }
        }
        if ($unique == 1) {
            push(@date, $data[0]);
            push(@julianIndex, $data[2]);
            push(@treeid, $tree[0]);
            push(@count, 1);
            push(@total7, $data[6]);
            push(@total14, $data[13]);
            $firstRec = 0;
        } else {
            $count[$element]++;
            $total7[$element] += $data[6];
            $total14[$element] += $data[13];
        }
    }
}
for ($j = 0; $j <= $#date; $j++) {
    $lfinAvg = $total14[$j] / $count[$j];
    $bbrkAvg = $total7[$j] / $count[$j];
    print $date[$j], $julianIndex[$j], $treeid[$j], $bbrkAvg, $lfinAvg,
        $count[$j];
}

This script finds the mean for each species in the data set for each
observation date. Now I would like to get similar output for a subset of the
species in the data set. I can easily remove the requirement
"$treeid[$i] eq $tree[0]" from the first "if" statement and get a mean for all
species. However, I'd like to be able to specify two or three species within
the field "TREEID" and get back the means for each date from those two or
three trees.

Any thoughts would be appreciated...

Kirk
#2
Re: aggregating data
[Originally posted by dave]

Kirk,

I'm really sorry to have taken so long to respond to your post, but there was
a large amount of code in your example and it's taken me a while to get to
grips with it. In the end, I'm afraid I threw it away and started from
scratch. Here's my version of your script:

#!/usr/bin/perl -w

use strict;

my %stuff;

<>;    # read and discard the header line

while (<>) {
    chomp;
    my @data = split;
    my ($tree) = split(/-/, $data[3]);

    $stuff{$data[0]}{$tree}{bbrk} += $data[6];     # BBRK is the 7th column
    $stuff{$data[0]}{$tree}{lfin} += $data[13];    # LFIN is the 14th column

    $stuff{$data[0]}{$tree}{count}++;
    $stuff{$data[0]}{$tree}{julian} = $data[2];
}

$\ = "\n";
$, = ' ';
foreach my $date (keys %stuff) {
    foreach my $tree (keys %{$stuff{$date}}) {
        print $date, $stuff{$date}{$tree}{julian}, $tree,
            $stuff{$date}{$tree}{bbrk} / $stuff{$date}{$tree}{count},
            $stuff{$date}{$tree}{lfin} / $stuff{$date}{$tree}{count},
            $stuff{$date}{$tree}{count};
    }
}

Which I hope you'll agree is a lot shorter and easier to follow.

As for your problem of only wanting to summarise certain species of tree,
here's what I'd do:

1/ Define a list containing the species that you're interested in

my @trees = qw(ACPE ACRU);

2/ Convert that into a hash where the keys are the trees and the values are
all 1

my %trees;
@trees{@trees} = (1) x @trees;

3/ In the while loop, ignore the line unless its species is a key in %trees

while (<>) {
    chomp;
    my @data = split;
    my ($tree) = split(/-/, $data[3]);

    next unless $trees{$tree};

    # rest of while loop
}

This will then only report results for the two tree species listed.
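
Putting the three steps together, a minimal self-contained sketch might look like this. The sample rows are invented for illustration and cut down to three columns (DATE TREEID VALUE), so the column indexes here are assumptions that differ from the real file, and the {total} key is just a stand-in for bbrk or lfin.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Species to keep; a hash slice turns the list into a lookup hash.
my @trees = qw(ACPE ACRU);
my %trees;
@trees{@trees} = (1) x @trees;

# Hypothetical sample rows (DATE TREEID VALUE), invented for illustration.
my @lines = (
    '03/28/1995 ACPE-01 2',
    '03/28/1995 ACPE-02 4',
    '03/28/1995 ACRU-01 6',
    '03/28/1995 ACSA-01 9',
);

my %stuff;
foreach my $line (@lines) {
    my @data = split ' ', $line;
    my ($tree) = split /-/, $data[1];

    next unless $trees{$tree};    # skip species we did not ask for

    $stuff{$data[0]}{$tree}{total} += $data[2];
    $stuff{$data[0]}{$tree}{count}++;
}

# Sort the keys so the report comes out in a stable order.
foreach my $date (sort keys %stuff) {
    foreach my $tree (sort keys %{ $stuff{$date} }) {
        my $rec = $stuff{$date}{$tree};
        print "$date $tree ", $rec->{total} / $rec->{count}, " $rec->{count}\n";
    }
}
```

With these sample rows the script prints one line for ACPE (mean 3 over 2 rows) and one for ACRU (mean 6 over 1 row); the ACSA row never reaches the totals.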

Hope this helps. Please let me know if anything is unclear.

Dave...
#3
Re: aggregating data
[Originally posted by kirkw]

Dave,

My, that _is_ less convoluted. Thanks. I'm still diff'ing it against what I was
starting with so that I can understand what you did. I'm also interested in the
function qw(). Is it discussed in the book anywhere?

Kirk
#4
Re: aggregating data
[Originally posted by dave]

> Dave,
>
> My, that _is_ less convoluted. Thanks. I'm still diff'ing it against what I was
> starting with so that I can understand what you did.

Kirk,

Glad you like it :-)

Let me know if you have any questions on it.

> I'm also interested in the function qw(). Is it discussed
> in the book anywhere?

No it isn't, tho' I do use it a few times.

qw() isn't a function, it's an operator. The difference is mainly academic,
but it means that you can look up the details in the perlop manual page (as
opposed to perlfunc).

qw() is a nice shortcut for creating lists of items. Code like:

my @days = qw(Sun Mon Tue Wed Thu Fri Sat);

is much cleaner and easier to follow than:

my @days = ('Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat');

which is the longer way to do the same thing.
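
As a quick check that the two forms really do build the same list, here is a small sketch (the names @short, @long and @also are made up for the example):

```perl
use strict;
use warnings;

# qw() splits its contents on whitespace and quotes each word for you.
my @short = qw(Sun Mon Tue Wed Thu Fri Sat);
my @long  = ('Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat');

# Compare the two lists by joining each into a single string.
print "same\n" if join(',', @short) eq join(',', @long);

# Other delimiters work as well as parentheses.
my @also = qw/ACPE ACRU/;
print scalar(@also), "\n";    # number of elements in @also
```

Because qw() splits on whitespace, it only suits items that contain no spaces; anything with embedded spaces still needs the quoted form.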

hth,

Dave...