Tuesday, April 21, 2009

Parsing HTML in Perl

use HTML::TreeBuilder;

# Load the file into a tree
my $html_tree = HTML::TreeBuilder->new;
$html_tree->parse_file($file_name);

# Get all of the meta tags
my @meta_tags = $html_tree->find('meta');

The code uses the HTML::TreeBuilder class, part of the HTML::Tree distribution on CPAN, to parse the HTML file named in the $file_name variable and build a tree of its tags in memory. Once the tree is built, I can use the find method to collect all of the meta tags into the array @meta_tags. With the array in hand, I can step through the tags one at a time and process them as required.
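As a rough sketch of the "step through them" part, the loop below reads the name and content attributes of each meta tag. The sample HTML string, the %meta hash, and the use of parse_content (to parse a string rather than a file) are my own assumptions for illustration and are not part of the original snippet; what counts as "processing" will depend on your needs.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Sample HTML, made up for this illustration only
my $html = <<'HTML';
<html><head>
<meta name="author" content="Jane Doe">
<meta name="keywords" content="perl, html">
<title>Example</title>
</head><body><p>Hello</p></body></html>
HTML

# Build the tree from a string instead of a file
my $html_tree = HTML::TreeBuilder->new;
$html_tree->parse_content($html);

# Get all of the meta tags
my @meta_tags = $html_tree->find('meta');

# Step through the tags and collect name => content pairs
my %meta;
for my $tag (@meta_tags) {
    my $name    = $tag->attr('name');
    my $content = $tag->attr('content');
    next unless defined $name && defined $content;   # skip e.g. http-equiv tags
    $meta{$name} = $content;
    print "$name: $content\n";
}

# Release the memory held by the tree when finished
$html_tree->delete;
```

Each element in @meta_tags is an HTML::Element object, so attr is enough to read any attribute on it. To read from a file as in the snippet above, swap parse_content($html) for parse_file($file_name).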

It is worth noting that the HTML::Tree module depends on the HTML::Parser module, which in turn depends on the HTML::Tagset module.

The following link contains code to extract Excel data:
http://www.tc.umn.edu/~hause011/code/extract_from_many_excel.html
