
sstark
I'm somewhat of a novice in OO programming and am having trouble using the HTML::Parser modules. The perldoc is very confusing for someone at my level, and the examples in Cross's Data Munging book are helpful but inadequate.

All I'm trying to do is parse some HTML, grabbing the text between certain tags and storing it in hashes and arrays. For example, in pseudocode:

find all <span> tags where the attribute id = "date"
(e.g. <span id="date"> )
store the text between <span> and </span> into %myhash


I've gotten HTML::Parser to grab the attributes of tags and the text of the tags separately, but can't figure out how to get them both together:

use HTML::Parser;
my ($page) = shift;
# Handlers are passed as code references (\&name); a bare &name would call the sub instead.
my $h = HTML::Parser->new(
    text_h  => [\&text,  'text'],
    start_h => [\&start, 'tagname,attr,attrseq'],
    end_h   => [\&end,   'tagname'],
);
$h->parse_file($page);
sub text { ... }
sub start { ... }
sub end { ... }
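
One possible way to tie the attributes and the text together with HTML::Parser is to keep a flag that the start handler sets and the text handler checks. The following is only a minimal sketch under stated assumptions: the sample HTML string and the names @dates, $in_date, and $buffer are made up for illustration, and nested <span> tags are not handled.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical sample input; a real script would call parse_file($page) instead.
my $html = '<p>Intro</p><span id="date">12 May 2004</span>'
         . '<span id="other">skip me</span><span id="date">13 May 2004</span>';

my @dates;          # text collected from each <span id="date">
my $in_date = 0;    # true while we are inside a matching span
my $buffer  = '';   # text may arrive in several chunks, so accumulate it

my $parser = HTML::Parser->new(
    api_version => 3,
    start_h => [ sub {
        my ($tagname, $attr) = @_;
        # Remember that we entered a span whose id attribute is "date".
        if ($tagname eq 'span' && defined $attr->{id} && $attr->{id} eq 'date') {
            $in_date = 1;
            $buffer  = '';
        }
    }, 'tagname,attr' ],
    text_h => [ sub {
        my ($text) = @_;
        $buffer .= $text if $in_date;   # only keep text inside a matching span
    }, 'dtext' ],
    end_h => [ sub {
        my ($tagname) = @_;
        # Closing the span ends the capture and stores the collected text.
        if ($tagname eq 'span' && $in_date) {
            push @dates, $buffer;
            $in_date = 0;
        }
    }, 'tagname' ],
);

$parser->parse($html);
$parser->eof;

print "$_\n" for @dates;
```

The same pattern would work with a hash instead of an array, for example by keying the stored text on the id value.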

Similarly, I've been able to get HTML::TokeParser to work, as described at the end of the HTML Parser chapter, but I can't figure out how to get the attribute of the current tag (the perldoc says that $p->get_token gets the attributes of the *next* tag, but not the current one).

my $p = HTML::TokeParser->new($file);
open(IN, "$file");
open(OUT, ">anth.txt");
while( <IN> ){
    if($p->get_tag('span')){
        my($text) = $p->get_text;
    }
}
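
For comparison, here is a minimal sketch of how the pseudocode above might look with HTML::TokeParser alone. It relies on get_tag returning an array reference of the form [$tagname, \%attr, \@attrseq, $text] for the start tag it has just read, so the attributes of the current tag are available without a separate file loop. The sample HTML string and the @dates array are assumptions made for illustration.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical sample input; passing a file name to new() reads from a file instead.
my $html = '<p>Intro</p><span id="date">12 May 2004</span>'
         . '<span id="other">skip me</span><span id="date"> 13 May 2004 </span>';

# A scalar reference tells HTML::TokeParser to parse the string itself.
my $p = HTML::TokeParser->new(\$html) or die "Cannot create parser";

my @dates;
# The parser does its own reading, so no separate while (<IN>) loop is needed.
while (my $tag = $p->get_tag('span')) {
    my $attr = $tag->[1];    # hash reference of the attributes on this <span>
    next unless defined $attr->{id} && $attr->{id} eq 'date';
    # Read text up to the closing </span>, with surrounding whitespace trimmed.
    push @dates, $p->get_trimmed_text('/span');
}

print "$_\n" for @dates;
```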

I'm sure I'm doing something basically wrong here, but I can't figure it out from the available documentation. Any help would be greatly appreciated.

Hopefully this forum is still alive!

Scott