.\" Automatically generated by Pod::Man 2.27 (Pod::Simple 3.28)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" Set up some character translations and predefined strings. \*(-- will
.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
.\" double quote, and \*(R" will give a right double quote. \*(C+ will
.\" give a nicer C++. Capital omega is used to do unbreakable dashes and
.\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff,
.\" nothing in troff, for use with C<>.
.tr \(*W-
.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
.ie n \{\
. ds -- \(*W-
. ds PI pi
. if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
. if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch
. ds L" ""
. ds R" ""
. ds C` ""
. ds C' ""
'br\}
.el\{\
. ds -- \|\(em\|
. ds PI \(*p
. ds L" ``
. ds R" ''
. ds C`
. ds C'
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.\"
.\" If the F register is turned on, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD. Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.\"
.\" Avoid warning from groff about undefined register 'F'.
.de IX
..
.nr rF 0
.if \n(.g .if rF .nr rF 1
.if (\n(rF:(\n(.g==0)) \{
. if \nF \{
. de IX
. tm Index:\\$1\t\\n%\t"\\$2"
..
. if !\nF==2 \{
. nr % 0
. nr F 2
. \}
. \}
.\}
.rr rF
.\"
.\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2).
.\" Fear. Run. Save yourself. No user-serviceable parts.
. \" fudge factors for nroff and troff
.if n \{\
. ds #H 0
. ds #V .8m
. ds #F .3m
. ds #[ \f1
. ds #] \fP
.\}
.if t \{\
. ds #H ((1u-(\\\\n(.fu%2u))*.13m)
. ds #V .6m
. ds #F 0
. ds #[ \&
. ds #] \&
.\}
. \" simple accents for nroff and troff
.if n \{\
. ds ' \&
. ds ` \&
. ds ^ \&
. ds , \&
. ds ~ ~
. ds /
.\}
.if t \{\
. ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u"
. ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u'
. ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u'
. ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u'
. ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u'
. ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u'
.\}
. \" troff and (daisy-wheel) nroff accents
.ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V'
.ds 8 \h'\*(#H'\(*b\h'-\*(#H'
.ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#]
.ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H'
.ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u'
.ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#]
.ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#]
.ds ae a\h'-(\w'a'u*4/10)'e
.ds Ae A\h'-(\w'A'u*4/10)'E
. \" corrections for vroff
.if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u'
.if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u'
. \" for low resolution devices (crt and lpr)
.if \n(.H>23 .if \n(.V>19 \
\{\
. ds : e
. ds 8 ss
. ds o a
. ds d- d\h'-1'\(ga
. ds D- D\h'-1'\(hy
. ds th \o'bp'
. ds Th \o'LP'
. ds ae ae
. ds Ae AE
.\}
.rm #[ #] #H #V #F C
.\" ========================================================================
.\"
.IX Title "HTML::Tree 3"
.TH HTML::Tree 3 "2019-10-04" "perl v5.16.3" "User Contributed Perl Documentation"
.\" For nroff, turn off justification. Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "NAME"
HTML::Tree \- build and scan parse\-trees of HTML
.SH "VERSION"
.IX Header "VERSION"
This document describes version 5.07 of
HTML::Tree, released August 31, 2017
as part of HTML-Tree.
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
.Vb 3
\& use HTML::TreeBuilder;
\& my $tree = HTML::TreeBuilder\->new();
\& $tree\->parse_file($filename);
\&
\& # Then do something with the tree, using HTML::Element
\& # methods \-\- for example:
\&
\& $tree\->dump;
\&
\& # Finally:
\&
\& $tree\->delete;
.Ve
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
HTML-Tree is a suite of Perl modules for making parse trees out of
\&\s-1HTML\s0 source. It consists mainly of two modules, whose documentation
you should refer to: HTML::TreeBuilder
and HTML::Element.
.PP
HTML::TreeBuilder is the module that builds the parse trees. (It uses
HTML::Parser to do the work of breaking the \s-1HTML\s0 up into tokens.)
.PP
The tree that TreeBuilder builds for you is made up of objects of the
class HTML::Element.
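.PP
As a minimal sketch of what that looks like in practice, the following
builds a tree and queries it with HTML::Element methods. The sample markup
and the variable names are illustrative assumptions, not part of the
distribution:
.PP
.Vb 3
\& use strict;
\& use warnings;
\& use HTML::TreeBuilder;
\&
\& # Sample markup; any HTML string would do here.
\& my $html = \*(Aq<html><head><title>Demo</title></head>\*(Aq
\&          . \*(Aq<body><p><a href="/a">A</a> <a href="/b">B</a></p></body></html>\*(Aq;
\&
\& my $tree = HTML::TreeBuilder\->new_from_content($html);
\&
\& # Each node returned below is an HTML::Element object.
\& my $title = $tree\->find_by_tag_name(\*(Aqtitle\*(Aq);
\& my @links = map { $_\->attr(\*(Aqhref\*(Aq) } $tree\->look_down(_tag => \*(Aqa\*(Aq);
\&
\& print $title\->as_text, "\en";
\& print "$_\en" for @links;
\&
\& $tree\->delete;
.Ve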
.PP
If you find that you do not properly understand the documentation
for HTML::TreeBuilder and HTML::Element, it may be because you are
unfamiliar with tree-shaped data structures, or with object-oriented
modules in general. Sean Burke has written some articles for
\&\fIThe Perl Journal\fR (\f(CW\*(C`www.tpj.com\*(C'\fR) that seek to provide that background.
The full text of those articles is contained in this distribution, as:
.IP "HTML::Tree::AboutObjects" 4
.IX Item "HTML::Tree::AboutObjects"
\&\*(L"User's View of Object-Oriented Modules\*(R" from \s-1TPJ17\s0
.IP "HTML::Tree::AboutTrees" 4
.IX Item "HTML::Tree::AboutTrees"
\&\*(L"Trees\*(R" from \s-1TPJ18\s0
.IP "HTML::Tree::Scanning" 4
.IX Item "HTML::Tree::Scanning"
\&\*(L"Scanning \s-1HTML\*(R"\s0 from \s-1TPJ19\s0
.PP
Readers already familiar with object-oriented modules and tree-shaped
data structures should read just the last article. Readers without
that background should read the first, then the second, and then the
third.
.SH "METHODS"
.IX Header "METHODS"
All these methods simply redirect to the corresponding method in
HTML::TreeBuilder. It's more efficient to use HTML::TreeBuilder
directly, and skip loading HTML::Tree at all.
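.PP
A short sketch of the two call styles, assuming only that the constructor
redirection works as described above (the markup string and variable names
are illustrative):
.PP
.Vb 3
\& use strict;
\& use warnings;
\& use HTML::TreeBuilder;
\&
\& my $html = \*(Aq<p>Hello, world</p>\*(Aq;
\&
\& # Preferred: call the constructor on HTML::TreeBuilder directly.
\& my $tree = HTML::TreeBuilder\->new_from_content($html);
\& my $text = $tree\->as_text;
\& print "$text\en";
\& $tree\->delete;
\&
\& # Same result, but loads the extra HTML::Tree wrapper module first.
\& require HTML::Tree;
\& my $same      = HTML::Tree\->new_from_content($html);
\& my $same_text = $same\->as_text;
\& print "$same_text\en";
\& $same\->delete;
.Ve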
.SS "new"
.IX Subsection "new"
Redirects to \*(L"new\*(R" in HTML::TreeBuilder.
.SS "new_from_file"
.IX Subsection "new_from_file"
Redirects to \*(L"new_from_file\*(R" in HTML::TreeBuilder.
.SS "new_from_content"
.IX Subsection "new_from_content"
Redirects to \*(L"new_from_content\*(R" in HTML::TreeBuilder.
.SS "new_from_url"
.IX Subsection "new_from_url"
Redirects to \*(L"new_from_url\*(R" in HTML::TreeBuilder.
.SH "SUPPORT"
.IX Header "SUPPORT"
You can find documentation for this module with the perldoc command.
.PP
.Vb 1
\& perldoc HTML::Tree
.Ve
.PP
You can also look for information at:
.IP "\(bu" 4
AnnoCPAN: Annotated \s-1CPAN\s0 documentation
.Sp
<http://annocpan.org/dist/HTML\-Tree>
.IP "\(bu" 4
\&\s-1CPAN\s0 Ratings
.Sp
<http://cpanratings.perl.org/d/HTML\-Tree>
.IP "\(bu" 4
\&\s-1RT: CPAN\s0's request tracker
.Sp
<http://rt.cpan.org/NoAuth/Bugs.html?Dist=HTML\-Tree>
.IP "\(bu" 4
Search \s-1CPAN\s0
.Sp
<http://search.cpan.org/dist/HTML\-Tree>
.IP "\(bu" 4
Stack Overflow
.Sp
<http://stackoverflow.com/questions/tagged/html\-tree>
.Sp
If you have a question about how to use HTML-Tree, Stack Overflow is
the place to ask it. Make sure you tag it with both \f(CW\*(C`perl\*(C'\fR and \f(CW\*(C`html\-tree\*(C'\fR.
.SH "SEE ALSO"
.IX Header "SEE ALSO"
HTML::TreeBuilder, HTML::Element, HTML::Tagset,
HTML::Parser, HTML::DOMbo
.PP
The book \fIPerl & \s-1LWP\s0\fR by Sean M. Burke, published by
O'Reilly and Associates in 2002 (\s-1ISBN 0\-596\-00178\-9\s0), has several
chapters on \s-1HTML\s0 processing in general and on HTML-Tree
specifically. There's more info at:
.PP
.Vb 1
\& http://www.oreilly.com/catalog/perllwp/
\&
\& http://www.amazon.com/exec/obidos/ASIN/0596001789
.Ve
.SH "SOURCE REPOSITORY"
.IX Header "SOURCE REPOSITORY"
HTML-Tree is now maintained using Git. The main public repository is
<https://github.com/kentfredric/HTML\-Tree>.
.PP
The best way to send a patch is to make a pull request there.
.SH "ACKNOWLEDGEMENTS"
.IX Header "ACKNOWLEDGEMENTS"
Thanks to Gisle Aas, Sean Burke and Andy Lester for their original work.
.PP
Thanks to Chicago Perl Mongers (http://chicago.pm.org) for their
patches submitted to HTML::Tree as part of the Phalanx project
(http://qa.perl.org/phalanx).
.PP
Thanks to the following people for additional patches and documentation:
Terrence Brannon, Gordon Lack, Chris Madsen and Ricardo Signes.
.SH "AUTHOR"
.IX Header "AUTHOR"
Current maintainers:
.IP "\(bu" 4
Christopher J. Madsen \f(CW\*(C`<perl\ AT\ cjmweb.net>\*(C'\fR
.IP "\(bu" 4
Jeff Fearn \f(CW\*(C`<jfearn\ AT\ cpan.org>\*(C'\fR
.PP
Original HTML-Tree author:
.IP "\(bu" 4
Gisle Aas
.PP
Former maintainers:
.IP "\(bu" 4
Sean M. Burke
.IP "\(bu" 4
Andy Lester
.IP "\(bu" 4
Pete Krawczyk \f(CW\*(C`<petek\ AT\ cpan.org>\*(C'\fR
.PP
You can follow or contribute to HTML-Tree's development at
<https://github.com/kentfredric/HTML\-Tree>.
.SH "COPYRIGHT AND LICENSE"
.IX Header "COPYRIGHT AND LICENSE"
Copyright 1995\-1998 Gisle Aas, 1999\-2004 Sean M. Burke,
2005 Andy Lester, 2006 Pete Krawczyk, 2010 Jeff Fearn,
2012 Christopher J. Madsen.
(Except the articles contained in HTML::Tree::AboutObjects,
HTML::Tree::AboutTrees, and HTML::Tree::Scanning, which are all
copyright 2000 The Perl Journal.)
.PP
Except for those three \s-1TPJ\s0 articles, the whole HTML-Tree distribution,
of which this file is a part, is free software; you can redistribute
it and/or modify it under the same terms as Perl itself.
.PP
Those three \s-1TPJ\s0 articles may be distributed under the same terms as
Perl itself.
.PP
The programs in this library are distributed in the hope that they
will be useful, but without any warranty; without even the implied
warranty of merchantability or fitness for a particular purpose.