
org.htmlparser
Class PrototypicalNodeFactory

java.lang.Object
  extended by org.htmlparser.PrototypicalNodeFactory
All Implemented Interfaces:
Serializable, NodeFactory

public class PrototypicalNodeFactory
extends Object
implements Serializable, NodeFactory

A node factory based on the prototype pattern. The factory holds prototype nodes and clones them as needed to form new Text, Remark and Tag nodes.

Text and remark nodes are generated from prototypes accessed via the textPrototype and remarkPrototype properties respectively. Tag nodes are generated as follows:

Prototype tags, in the form of undifferentiated tags, are held in a hash table. On a request for a tag, the attributes are examined for the name of the tag to be created. If a prototype of that name has been registered (exists in the hash table), it is cloned and the clone is given the characteristics (Attributes, start and end position) of the requested tag.

If no tag has been registered under that name, a generic tag is created from the prototype accessed via the tagPrototype property.

The hash table of registered tags can be automatically populated with all the known tags from the org.htmlparser.tags package when the factory is constructed, or it can start out empty and be populated explicitly.
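
The lookup described above can be pictured with a small self-contained sketch. It does not use the org.htmlparser classes: PrototypeRegistrySketch, SketchTag and all of their members are hypothetical stand-ins, and the uppercase keying is an assumption taken from the note on put() rather than from the library source.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical registry: clone a registered prototype, else clone the generic one. */
class PrototypeRegistrySketch {

    /** Hypothetical stand-in for a tag node; not the org.htmlparser Tag interface. */
    static class SketchTag implements Cloneable {
        final String kind; // which prototype this node was cloned from
        String name;       // characteristics copied from the request
        int start;
        int end;

        SketchTag(String kind) {
            this.kind = kind;
        }

        @Override
        public SketchTag clone() {
            try {
                return (SketchTag) super.clone();
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e); // unreachable: the class is Cloneable
            }
        }
    }

    private final Map<String, SketchTag> registry = new HashMap<>();
    private final SketchTag generic = new SketchTag("generic");

    /** Registers a prototype under an uppercased name (assumed convention). */
    void register(String id, SketchTag prototype) {
        registry.put(id.toUpperCase(Locale.ENGLISH), prototype);
    }

    /** Clones the matching prototype, or the generic one, and fills in the request data. */
    SketchTag create(String name, int start, int end) {
        SketchTag prototype = registry.get(name.toUpperCase(Locale.ENGLISH));
        SketchTag node = (prototype != null ? prototype : generic).clone();
        node.name = name;
        node.start = start;
        node.end = end;
        return node;
    }

    public static void main(String[] args) {
        PrototypeRegistrySketch factory = new PrototypeRegistrySketch();
        factory.register("A", new SketchTag("link"));
        System.out.println(factory.create("a", 0, 20).kind);      // link
        System.out.println(factory.create("blink", 21, 28).kind); // generic
    }
}
```

Because every node is a clone, changes made to a created node never leak back into the registered prototype.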

The following example overrides the text returned by Text.toPlainTextString(), in this case decoding it (converting character references), and illustrates setting the text prototype:

 PrototypicalNodeFactory factory = new PrototypicalNodeFactory ();
 factory.setTextPrototype (
     // create an anonymous inner class that is a subclass of TextNode
     new TextNode () {
         public String toPlainTextString()
         {
             String original = super.toPlainTextString ();
             return (org.htmlparser.util.Translate.decode (original));
         }
     });
 Parser parser = new Parser ();
 parser.setNodeFactory (factory);
 

Here is an example of using a custom link tag, in this case just printing the URL, which illustrates registering a tag:


 class PrintingLinkTag extends LinkTag
 {
     public void doSemanticAction () throws ParserException
     {
         System.out.println (getLink ());
     }
 }
 PrototypicalNodeFactory factory = new PrototypicalNodeFactory ();
 factory.registerTag (new PrintingLinkTag ());
 Parser parser = new Parser ();
 parser.setNodeFactory (factory);
 

See Also:
Serialized Form

Field Summary
protected  Map mBlastocyst
          The list of tags to return.
protected  Remark mRemark
          The prototypical remark node.
protected  Tag mTag
          The prototypical tag node.
protected  Text mText
          The prototypical text node.
 
Constructor Summary
PrototypicalNodeFactory()
          Create a new factory with all tags registered.
PrototypicalNodeFactory(boolean empty)
          Create a new factory.
PrototypicalNodeFactory(Tag tag)
          Create a new factory with the given tag as the only registered tag.
PrototypicalNodeFactory(Tag[] tags)
          Create a new factory with the given tags registered.
 
Method Summary
 void clear()
          Clean out the registry.
 Remark createRemarkNode(Page page, int start, int end)
          Create a new remark node.
 Text createStringNode(Page page, int start, int end)
          Create a new string node.
 Tag createTagNode(Page page, int start, int end, Vector attributes)
          Create a new tag node.
 Tag get(String id)
          Gets a tag from the registry.
 Remark getRemarkPrototype()
          Get the object that is cloned to generate remark nodes.
 Set getTagNames()
          Get the list of tag names.
 Tag getTagPrototype()
          Get the object that is cloned to generate tag nodes.
 Text getTextPrototype()
          Get the object that is cloned to generate text nodes.
 Tag put(String id, Tag tag)
          Adds a tag to the registry.
 void registerTag(Tag tag)
          Register a tag.
 PrototypicalNodeFactory registerTags()
          Register all known tags in the tag package.
 Tag remove(String id)
          Remove a tag from the registry.
 void setRemarkPrototype(Remark remark)
          Set the object to be used to generate remark nodes.
 void setTagPrototype(Tag tag)
          Set the object to be used to generate tag nodes.
 void setTextPrototype(Text text)
          Set the object to be used to generate text nodes.
 void unregisterTag(Tag tag)
          Unregister a tag.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

mText

protected Text mText
The prototypical text node.


mRemark

protected Remark mRemark
The prototypical remark node.


mTag

protected Tag mTag
The prototypical tag node.


mBlastocyst

protected Map mBlastocyst
The list of tags to return. The list is keyed by tag name.

Constructor Detail

PrototypicalNodeFactory

public PrototypicalNodeFactory()
Create a new factory with all tags registered. Equivalent to PrototypicalNodeFactory(false).


PrototypicalNodeFactory

public PrototypicalNodeFactory(boolean empty)
Create a new factory.

Parameters:
empty - If true, creates an empty factory; otherwise creates a factory with all tags registered.

PrototypicalNodeFactory

public PrototypicalNodeFactory(Tag tag)
Create a new factory with the given tag as the only registered tag.

Parameters:
tag - The single tag to register in the otherwise empty factory.

PrototypicalNodeFactory

public PrototypicalNodeFactory(Tag[] tags)
Create a new factory with the given tags registered.

Parameters:
tags - The tags to register in the otherwise empty factory.
Method Detail

put

public Tag put(String id,
               Tag tag)
Adds a tag to the registry.

Parameters:
id - The name under which to register the tag. For proper operation, the id should be uppercase so it will be matched by a Map lookup.
tag - The tag to be returned from a createTagNode(org.htmlparser.lexer.Page, int, int, java.util.Vector) call.
Returns:
The tag previously registered with that id if any, or null if none.

get

public Tag get(String id)
Gets a tag from the registry.

Parameters:
id - The name of the tag to return.
Returns:
The tag registered under the id name, or null if none.

remove

public Tag remove(String id)
Remove a tag from the registry.

Parameters:
id - The name of the tag to remove.
Returns:
The tag that was registered with that id, or null if none.

clear

public void clear()
Clean out the registry.


getTagNames

public Set getTagNames()
Get the list of tag names.

Returns:
The names of the tags currently registered.

registerTag

public void registerTag(Tag tag)
Register a tag. Registers the given tag under every id that the tag has (i.e. all names returned by tag.getIds()).

For proper operation, the ids are converted to uppercase so they will be matched by a Map lookup.

Parameters:
tag - The tag to register.

unregisterTag

public void unregisterTag(Tag tag)
Unregister a tag. Unregisters the given tag from every id the tag has.

The ids are converted to uppercase to undo the operation of registerTag.

Parameters:
tag - The tag to unregister.
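
The register/unregister pair can be sketched in isolation as follows. This is only an illustration under stated assumptions: MultiIdRegistrySketch and Proto are hypothetical names, a plain HashMap stands in for the registry, and the ids "h1" to "h3" are sample values rather than output from any real tag's getIds().

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical registry that files one prototype under each of its ids. */
class MultiIdRegistrySketch {

    /** Hypothetical prototype that answers to several names. */
    static class Proto {
        final String[] ids;

        Proto(String... ids) {
            this.ids = ids;
        }
    }

    private final Map<String, Proto> registry = new HashMap<>();

    /** Files the prototype under the uppercase form of every id it has. */
    void registerTag(Proto tag) {
        for (String id : tag.ids) {
            registry.put(id.toUpperCase(Locale.ENGLISH), tag);
        }
    }

    /** Removes the same uppercase keys that registerTag would have added. */
    void unregisterTag(Proto tag) {
        for (String id : tag.ids) {
            registry.remove(id.toUpperCase(Locale.ENGLISH));
        }
    }

    Set<String> getTagNames() {
        return registry.keySet();
    }

    public static void main(String[] args) {
        MultiIdRegistrySketch registry = new MultiIdRegistrySketch();
        Proto heading = new Proto("h1", "h2", "h3");
        registry.registerTag(heading);
        System.out.println(registry.getTagNames().size());    // 3
        registry.unregisterTag(heading);
        System.out.println(registry.getTagNames().isEmpty()); // true
    }
}
```

Applying the same case conversion on both paths is what lets unregistering undo registering, whatever case the ids were written in.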

registerTags

public PrototypicalNodeFactory registerTags()
Register all known tags in the tag package. Registers tags from the tag package by calling registerTag().

Returns:
'this' node factory, as a convenience.

getTextPrototype

public Text getTextPrototype()
Get the object that is cloned to generate text nodes.

Returns:
The prototype for Text nodes.
See Also:
setTextPrototype(org.htmlparser.Text)

setTextPrototype

public void setTextPrototype(Text text)
Set the object to be used to generate text nodes.

Parameters:
text - The prototype for Text nodes. If null the prototype is set to the default (TextNode).
See Also:
getTextPrototype()
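
The null handling documented for the prototype setters might look like the sketch below. It is not the library's code: TextPrototypeHolderSketch and SketchText are hypothetical names, and SketchText merely stands in for the default TextNode.

```java
/** Hypothetical holder showing a setter that falls back to a default prototype. */
class TextPrototypeHolderSketch {

    /** Hypothetical stand-in for a text node prototype. */
    static class SketchText {
        String toPlainTextString() {
            return "plain";
        }
    }

    private SketchText textPrototype = new SketchText();

    SketchText getTextPrototype() {
        return textPrototype;
    }

    void setTextPrototype(SketchText text) {
        // null means "restore the default", mirroring the documented contract
        textPrototype = (text != null) ? text : new SketchText();
    }

    public static void main(String[] args) {
        TextPrototypeHolderSketch holder = new TextPrototypeHolderSketch();
        holder.setTextPrototype(new SketchText() {
            @Override
            String toPlainTextString() {
                return "custom";
            }
        });
        System.out.println(holder.getTextPrototype().toPlainTextString()); // custom
        holder.setTextPrototype(null);
        System.out.println(holder.getTextPrototype().toPlainTextString()); // plain
    }
}
```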

getRemarkPrototype

public Remark getRemarkPrototype()
Get the object that is cloned to generate remark nodes.

Returns:
The prototype for Remark nodes.
See Also:
setRemarkPrototype(org.htmlparser.Remark)

setRemarkPrototype

public void setRemarkPrototype(Remark remark)
Set the object to be used to generate remark nodes.

Parameters:
remark - The prototype for Remark nodes. If null the prototype is set to the default (RemarkNode).
See Also:
getRemarkPrototype()

getTagPrototype

public Tag getTagPrototype()
Get the object that is cloned to generate tag nodes. Clones of this object are returned from createTagNode(org.htmlparser.lexer.Page, int, int, java.util.Vector) when no specific tag is found in the list of registered tags.

Returns:
The prototype for Tag nodes.
See Also:
setTagPrototype(org.htmlparser.Tag)

setTagPrototype

public void setTagPrototype(Tag tag)
Set the object to be used to generate tag nodes. Clones of this object are returned from createTagNode(org.htmlparser.lexer.Page, int, int, java.util.Vector) when no specific tag is found in the list of registered tags.

Parameters:
tag - The prototype for Tag nodes. If null the prototype is set to the default (TagNode).
See Also:
getTagPrototype()

createStringNode

public Text createStringNode(Page page,
                             int start,
                             int end)
Create a new string node.

Specified by:
createStringNode in interface NodeFactory
Parameters:
page - The page the node is on.
start - The beginning position of the string.
end - The ending position of the string.
Returns:
A text node comprising the indicated characters from the page.

createRemarkNode

public Remark createRemarkNode(Page page,
                               int start,
                               int end)
Create a new remark node.

Specified by:
createRemarkNode in interface NodeFactory
Parameters:
page - The page the node is on.
start - The beginning position of the remark.
end - The ending position of the remark.
Returns:
A remark node comprising the indicated characters from the page.

createTagNode

public Tag createTagNode(Page page,
                         int start,
                         int end,
                         Vector attributes)
Create a new tag node. Note that the attributes vector contains at least one element, which is the tag name (standalone attribute) at position zero. This can be used to decide which type of node to create, or gate other processing that may be appropriate.

Specified by:
createTagNode in interface NodeFactory
Parameters:
page - The page the node is on.
start - The beginning position of the tag.
end - The ending position of the tag.
attributes - The attributes contained in this tag.
Returns:
A tag node comprising the indicated characters from the page.
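
As a rough illustration of using the element at position zero to decide which node to create, the sketch below derives a registry key from an attribute vector. It rests on assumptions: TagNameSketch and registryKey are hypothetical names, plain strings stand in for the library's attribute objects, and uppercasing the name follows the note on put().

```java
import java.util.Locale;
import java.util.Vector;

/** Hypothetical helper that turns the first attribute into a registry key. */
class TagNameSketch {

    /** Returns the uppercased tag name, or null if the vector carries no name. */
    static String registryKey(Vector<String> attributes) {
        if (attributes.isEmpty() || attributes.elementAt(0) == null) {
            return null;
        }
        return attributes.elementAt(0).toUpperCase(Locale.ENGLISH);
    }

    public static void main(String[] args) {
        Vector<String> attributes = new Vector<>();
        attributes.add("img");           // element zero: the tag name
        attributes.add("src=\"a.png\""); // remaining elements: ordinary attributes
        System.out.println(registryKey(attributes)); // IMG
    }
}
```

A factory subclass could use such a key either to pick a prototype or to gate extra processing for particular tags.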

© 2006 Derrick Oswald
Sep 17, 2006

HTML Parser is an open source library released under Common Public License. SourceForge.net