org.htmlparser.visitors
Class NodeVisitor

java.lang.Object
  extended by org.htmlparser.visitors.NodeVisitor
Direct Known Subclasses:
HtmlPage, LinkFindingVisitor, ObjectFindingVisitor, StringBean, StringFindingVisitor, TagFindingVisitor, TextExtractingVisitor, UrlModifyingVisitor

public abstract class NodeVisitor
extends Object

The base class for the 'Visitor' pattern. To use visitAllNodesWith(), subclass this class and override the methods for the node types you want to process.

visitAllNodesWith() calls beginParsing(), then the visitXXX() method matching each node encountered in depth-first order, and finally finishedParsing().
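
The call order described above can be illustrated with a small self-contained model. This sketch does not use the HTML Parser library: SketchNode, SketchVisitor and visitAll are hypothetical stand-ins that only mimic the documented sequence, and the assumption that a parent is visited before its children is ours. The library's actual traversal code may differ in detail.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins, NOT the org.htmlparser classes: they only model the
// documented call order of visitAllNodesWith().
class SketchNode {
    final String name;
    final List<SketchNode> children = new ArrayList<>();

    SketchNode(String name, SketchNode... kids) {
        this.name = name;
        children.addAll(Arrays.asList(kids));
    }
}

// Records every callback it receives, in order.
class SketchVisitor {
    final List<String> calls = new ArrayList<>();

    void beginParsing() { calls.add("beginParsing"); }
    void visitTag(SketchNode node) { calls.add("visitTag " + node.name); }
    void finishedParsing() { calls.add("finishedParsing"); }
}

class TraversalOrderSketch {
    // Depth-first walk; assumes a node is visited before its children.
    static void visit(SketchNode node, SketchVisitor visitor) {
        visitor.visitTag(node);
        for (SketchNode child : node.children) {
            visit(child, visitor);
        }
    }

    // Models visitAllNodesWith(): begin, visit every node, finish.
    static List<String> visitAll(List<SketchNode> topLevel, SketchVisitor visitor) {
        visitor.beginParsing();
        for (SketchNode node : topLevel) {
            visit(node, visitor);
        }
        visitor.finishedParsing();
        return visitor.calls;
    }

    public static void main(String[] args) {
        SketchNode html = new SketchNode("HTML",
            new SketchNode("HEAD", new SketchNode("TITLE")),
            new SketchNode("BODY"));
        // Prints: beginParsing, visitTag HTML, visitTag HEAD, visitTag TITLE,
        // visitTag BODY, finishedParsing (one per line).
        for (String call : visitAll(Arrays.asList(html), new SketchVisitor())) {
            System.out.println(call);
        }
    }
}
```

In this model TITLE is reported before BODY because the walk finishes the HEAD subtree before moving on to the next sibling, which is what "depth-first" means here.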

Typical code to print every tag (with its start position) and every text node:

 import org.htmlparser.Parser;
 import org.htmlparser.Tag;
 import org.htmlparser.Text;
 import org.htmlparser.util.ParserException;
 import org.htmlparser.visitors.NodeVisitor;
 
 public class MyVisitor extends NodeVisitor
 {
     public MyVisitor ()
     {
     }

     public void visitTag (Tag tag)
     {
         System.out.println ("\n" + tag.getTagName () + tag.getStartPosition ());
     }

     public void visitStringNode (Text string)
     {
         System.out.println (string);
     }

     public static void main (String[] args) throws ParserException
     {
         Parser parser = new Parser ("http://cbc.ca");
         NodeVisitor visitor = new MyVisitor ();
         parser.visitAllNodesWith (visitor);
     }
 }
 
If you want to handle more than one tag type with the same visitor, you will need to check the tag type in the visitTag method. You can do that either by checking the tag name:
     public void visitTag (Tag tag)
     {
         // getTagName () returns the name in upper case
         if (tag.getTagName ().equals ("BODY"))
         {
             // ... do something with the BODY tag
         }
         else if (tag.getTagName ().equals ("FRAME"))
         {
             // ... do something with the FRAME tag
         }
     }
 
or you can use instanceof if all the tags you want to handle have a registered tag (i.e. they are generated by the NodeFactory):
     public void visitTag (Tag tag)
     {
         // BodyTag and FrameTag are in the org.htmlparser.tags package
         if (tag instanceof BodyTag)
         {
             BodyTag body = (BodyTag)tag;
             // ... do something with body
         }
         else if (tag instanceof FrameTag)
         {
             FrameTag frame = (FrameTag)tag;
             // ... do something with frame
         }
         else
         {
             // other specific tags and generic TagNode objects
         }
     }


Constructor Summary
NodeVisitor()
          Creates a node visitor that recurses itself and its children.
NodeVisitor(boolean recurseChildren)
          Creates a node visitor that recurses itself and its children only if recurseChildren is true.
NodeVisitor(boolean recurseChildren, boolean recurseSelf)
          Creates a node visitor that recurses itself only if recurseSelf is true and its children only if recurseChildren is true.
 
Method Summary
 void beginParsing()
          Override this method if you wish to do special processing prior to the start of parsing.
 void finishedParsing()
          Override this method if you wish to do special processing upon completion of parsing.
 boolean shouldRecurseChildren()
          Depth traversal predicate.
 boolean shouldRecurseSelf()
          Self traversal predicate.
 void visitEndTag(Tag tag)
          Called for each Tag visited that is an end tag.
 void visitRemarkNode(Remark remark)
          Called for each RemarkNode visited.
 void visitStringNode(Text string)
          Called for each StringNode visited.
 void visitTag(Tag tag)
          Called for each Tag visited.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

NodeVisitor

public NodeVisitor()
Creates a node visitor that recurses itself and its children.


NodeVisitor

public NodeVisitor(boolean recurseChildren)
Creates a node visitor that recurses itself and its children only if recurseChildren is true.

Parameters:
recurseChildren - If true, the visitor will visit children; otherwise only the top-level nodes are visited.

NodeVisitor

public NodeVisitor(boolean recurseChildren,
                   boolean recurseSelf)
Creates a node visitor that recurses itself only if recurseSelf is true and its children only if recurseChildren is true.

Parameters:
recurseChildren - If true, the visitor will visit children; otherwise only the top-level nodes are visited.
recurseSelf - If true, the visitor will visit the top-level node.

Method Detail

beginParsing

public void beginParsing()
Override this method if you wish to do special processing prior to the start of parsing.


visitTag

public void visitTag(Tag tag)
Called for each Tag visited.

Parameters:
tag - The tag being visited.

visitEndTag

public void visitEndTag(Tag tag)
Called for each Tag visited that is an end tag.

Parameters:
tag - The end tag being visited.

visitStringNode

public void visitStringNode(Text string)
Called for each StringNode visited.

Parameters:
string - The string node being visited.

visitRemarkNode

public void visitRemarkNode(Remark remark)
Called for each RemarkNode visited.

Parameters:
remark - The remark node being visited.

finishedParsing

public void finishedParsing()
Override this method if you wish to do special processing upon completion of parsing.
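
One possible use of the beginParsing() and finishedParsing() hooks is to reset per-parse state and publish a result, so that one visitor instance can be reused for several parses. The sketch below is a self-contained model rather than library code: HookSketchVisitor, TagCounter and the parse driver are hypothetical stand-ins, and tags are reduced to plain strings.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for NodeVisitor, NOT the org.htmlparser class: it keeps
// only the three callbacks this sketch needs, with tag names as plain strings.
abstract class HookSketchVisitor {
    void beginParsing() { }
    void visitTag(String tagName) { }
    void finishedParsing() { }
}

// Counts tags per parse; the reset in beginParsing() makes the instance reusable.
class TagCounter extends HookSketchVisitor {
    private int running;
    private int lastTotal = -1; // -1 until a parse has finished

    @Override
    void beginParsing() { running = 0; }

    @Override
    void visitTag(String tagName) { running++; }

    @Override
    void finishedParsing() { lastTotal = running; }

    int getLastTotal() { return lastTotal; }
}

class ParsingHooksSketch {
    // Stand-in driver that mimics the documented order: begin, visits, finish.
    static void parse(List<String> tagNames, HookSketchVisitor visitor) {
        visitor.beginParsing();
        for (String name : tagNames) {
            visitor.visitTag(name);
        }
        visitor.finishedParsing();
    }

    public static void main(String[] args) {
        TagCounter counter = new TagCounter();
        parse(Arrays.asList("HTML", "BODY", "P"), counter);
        System.out.println(counter.getLastTotal()); // prints 3
        parse(Arrays.asList("HTML", "BODY"), counter);
        System.out.println(counter.getLastTotal()); // prints 2, not 5
    }
}
```

Without the reset in beginParsing(), the second parse in this model would report the cumulative count of both parses.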


shouldRecurseChildren

public boolean shouldRecurseChildren()
Depth traversal predicate.

Returns:
true if children are to be visited.

shouldRecurseSelf

public boolean shouldRecurseSelf()
Self traversal predicate.

Returns:
true if a node itself is to be visited.

© 2006 Derrick Oswald
Sep 17, 2006

HTML Parser is an open source library released under the Common Public License.