
org.htmlparser.nodes
Class TagNode

java.lang.Object
  extended by org.htmlparser.nodes.AbstractNode
      extended by org.htmlparser.nodes.TagNode
All Implemented Interfaces:
Serializable, Cloneable, Node, Tag
Direct Known Subclasses:
BaseHrefTag, CompositeTag, DoctypeTag, FrameTag, ImageTag, InputTag, JspTag, MetaTag, ProcessingInstructionTag

public class TagNode
extends AbstractNode
implements Tag

TagNode represents a generic tag. If no scanner is registered for a given tag name, this is what you get. This is also the base class for all tags created by the parser.

See Also:
Serialized Form

Field Summary
protected static Hashtable breakTags
          Set of tags that break the flow.
protected  Vector mAttributes
          The tag attributes.
protected static Scanner mDefaultScanner
          The default scanner for non-composite tags.
 
Fields inherited from class org.htmlparser.nodes.AbstractNode
children, mPage, nodeBegin, nodeEnd, parent
 
Constructor Summary
TagNode()
          Create an empty tag.
TagNode(Page page, int start, int end, Vector attributes)
          Create a tag with the location and attributes provided.
TagNode(TagNode tag, TagScanner scanner)
          Create a tag like the one provided.
 
Method Summary
 void accept(NodeVisitor visitor)
          Default tag visiting code.
 boolean breaksFlow()
          Determines if the given tag breaks the flow of text.
 String getAttribute(String name)
          Returns the value of an attribute.
 Attribute getAttributeEx(String name)
          Returns the attribute with the given name.
 Vector getAttributesEx()
          Gets the attributes in the tag.
 String[] getEnders()
          Return the set of tag names that cause this tag to finish.
 int getEndingLineNumber()
          Get the line number where this tag ends.
 Tag getEndTag()
          Get the end tag for this (composite) tag.
 String[] getEndTagEnders()
          Return the set of end tag names that cause this tag to finish.
 String[] getIds()
          Return the set of names handled by this tag.
 String getRawTagName()
          Return the name of this tag.
 int getStartingLineNumber()
          Get the line number where this tag starts.
 int getTagBegin()
          Gets the nodeBegin.
 int getTagEnd()
          Gets the nodeEnd.
 String getTagName()
          Return the name of this tag.
 String getText()
          Return the text contained in this tag.
 Scanner getThisScanner()
          Return the scanner associated with this tag.
 boolean isEmptyXmlTag()
          Is this an empty xml tag of the form <tag/>.
 boolean isEndTag()
          Predicate to determine if this tag is an end tag (e.g. </HTML>).
 void removeAttribute(String key)
          Remove the attribute with the given key, if it exists.
 void setAttribute(Attribute attribute)
          Set an attribute.
 void setAttribute(String key, String value)
          Set the attribute with the given key/value pair.
 void setAttribute(String key, String value, char quote)
          Set the attribute with the given key/value pair, quoting the value with quote.
 void setAttributeEx(Attribute attribute)
          Set an attribute.
 void setAttributesEx(Vector attribs)
          Sets the attributes.
 void setEmptyXmlTag(boolean emptyXmlTag)
          Set this tag to be an empty xml node, or not.
 void setEndTag(Tag end)
          Set the end tag for this (composite) tag.
 void setTagBegin(int tagBegin)
          Sets the nodeBegin.
 void setTagEnd(int tagEnd)
          Sets the nodeEnd.
 void setTagName(String name)
          Set the name of this tag.
 void setText(String text)
          Parses the given text to create the tag contents.
 void setThisScanner(Scanner scanner)
          Set the scanner associated with this tag.
 String toHtml(boolean verbatim)
          Render the tag as HTML.
 String toPlainTextString()
          Get the plain text from this node.
 String toString()
          Print the contents of the tag.
 
Methods inherited from class org.htmlparser.nodes.AbstractNode
clone, collectInto, doSemanticAction, getChildren, getEndPosition, getFirstChild, getLastChild, getNextSibling, getPage, getParent, getPreviousSibling, getStartPosition, setChildren, setEndPosition, setPage, setParent, setStartPosition, toHtml
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface org.htmlparser.Node
clone, collectInto, doSemanticAction, getChildren, getEndPosition, getFirstChild, getLastChild, getNextSibling, getPage, getParent, getPreviousSibling, getStartPosition, setChildren, setEndPosition, setPage, setParent, setStartPosition, toHtml
 

Field Detail

mDefaultScanner

protected static final Scanner mDefaultScanner
The default scanner for non-composite tags.


mAttributes

protected Vector mAttributes
The tag attributes, as objects of type Attribute. The first element is the tag name; subsequent elements are either whitespace or real attributes.
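As a rough illustration of this layout, the sketch below models the vector for the tag text a href="x.html"  target=_top using a hypothetical Attr class (not the library's Attribute). Because whitespace runs are stored as elements of their own, concatenating every element reproduces the original spacing, including the double space.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the mAttributes layout; not the library's Attribute class.
class AttributeLayoutSketch {

    /** One vector element: a name, an optional value, and a quote character (0 = unquoted). */
    static final class Attr {
        final String name;   // tag name, attribute name, or the whitespace run itself
        final String value;  // null for the tag name, whitespace, and stand-alone attributes
        final char quote;    // 0 means the value is written without quotes

        Attr(String name, String value, char quote) {
            this.name = name;
            this.value = value;
            this.quote = quote;
        }

        String render() {
            if (value == null) {
                return name; // tag name, whitespace, or stand-alone attribute
            }
            String q = (quote == 0) ? "" : String.valueOf(quote);
            return name + "=" + q + value + q;
        }
    }

    /** Models the tag text: a href="x.html"  target=_top */
    static List<Attr> example() {
        List<Attr> attrs = new ArrayList<>();
        attrs.add(new Attr("a", null, (char) 0));         // element 0: the tag name
        attrs.add(new Attr(" ", null, (char) 0));         // whitespace element
        attrs.add(new Attr("href", "x.html", '"'));       // double-quoted attribute
        attrs.add(new Attr("  ", null, (char) 0));        // two spaces, kept exactly
        attrs.add(new Attr("target", "_top", (char) 0));  // unquoted attribute
        return attrs;
    }

    /** Concatenates every element, so the original spacing is reproduced. */
    static String render(List<Attr> attrs) {
        StringBuilder sb = new StringBuilder();
        for (Attr a : attrs) {
            sb.append(a.render());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("<" + render(example()) + ">");
    }
}
```

Keeping whitespace as data, rather than discarding it, is what makes a verbatim round trip possible.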


breakTags

protected static Hashtable breakTags
Set of tags that break the flow.

Constructor Detail

TagNode

public TagNode()
Create an empty tag.


TagNode

public TagNode(Page page,
               int start,
               int end,
               Vector attributes)
Create a tag with the location and attributes provided.

Parameters:
page - The page this tag was read from.
start - The starting offset of this node within the page.
end - The ending offset of this node within the page.
attributes - The list of attributes that were parsed in this tag.
See Also:
Attribute

TagNode

public TagNode(TagNode tag,
               TagScanner scanner)
Create a tag like the one provided.

Parameters:
tag - The tag to emulate.
scanner - The scanner for this tag.
Method Detail

getAttribute

public String getAttribute(String name)
Returns the value of an attribute.

Specified by:
getAttribute in interface Tag
Parameters:
name - Name of attribute, case insensitive.
Returns:
The value associated with the attribute, or null if the attribute does not exist or is a stand-alone or empty attribute.
See Also:
Tag.setAttribute(java.lang.String, java.lang.String)
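A minimal sketch of this lookup over a simplified attribute list (AttributeLookupSketch and Attr are hypothetical names, not library types). It matches names case-insensitively, skips element 0 (the tag name), and shows why getAttributeEx() is useful: a stand-alone attribute such as checked yields a null value but a non-null attribute object.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of case-insensitive attribute lookup; not the library's code.
class AttributeLookupSketch {

    static final class Attr {
        final String name;
        final String value; // null for the tag name, whitespace, and stand-alone attributes

        Attr(String name, String value) {
            this.name = name;
            this.value = value;
        }
    }

    /** First attribute whose name matches, ignoring case; element 0 (the tag name) is skipped. */
    static Attr getAttributeEx(List<Attr> attrs, String name) {
        for (int i = 1; i < attrs.size(); i++) {
            // Whitespace elements never match because their "name" is the whitespace itself.
            if (attrs.get(i).name.equalsIgnoreCase(name)) {
                return attrs.get(i);
            }
        }
        return null;
    }

    /** The attribute's value, or null if it is absent or has no value (stand-alone). */
    static String getAttribute(List<Attr> attrs, String name) {
        Attr a = getAttributeEx(attrs, name);
        return (a == null) ? null : a.value;
    }

    /** Models the tag text: input TYPE="checkbox" checked */
    static List<Attr> example() {
        List<Attr> attrs = new ArrayList<>();
        attrs.add(new Attr("input", null));
        attrs.add(new Attr(" ", null));
        attrs.add(new Attr("TYPE", "checkbox"));
        attrs.add(new Attr(" ", null));
        attrs.add(new Attr("checked", null)); // stand-alone attribute
        return attrs;
    }

    public static void main(String[] args) {
        List<Attr> attrs = example();
        System.out.println(getAttribute(attrs, "type"));              // checkbox
        System.out.println(getAttribute(attrs, "checked"));           // null
        System.out.println(getAttributeEx(attrs, "checked") != null); // true
    }
}
```

To tell "absent" from "present without a value", test getAttributeEx() for null rather than getAttribute().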

setAttribute

public void setAttribute(String key,
                         String value)
Set the attribute with the given key/value pair, choosing a quote character for the value if one is needed.

Specified by:
setAttribute in interface Tag
Parameters:
key - The name of the attribute.
value - The value of the attribute.
See Also:
Tag.getAttribute(java.lang.String), Tag.setAttribute(String,String,char)
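This page does not say how the quote character is chosen. One plausible heuristic, sketched here with hypothetical names and not taken from the library, leaves a value unquoted unless it is empty or contains whitespace or quotes, then prefers a double quote and falls back to a single quote when the value itself contains double quotes.

```java
// Hypothetical quoting heuristic; the library's actual rule is not documented on this page.
class QuoteChoiceSketch {

    /**
     * Returns 0 (unquoted) when no quoting is needed, '"' when quoting is needed and
     * the value has no double quote, otherwise '\'' when it has no single quote.
     * If both kinds appear, '"' is returned and the embedded double quotes would
     * still have to be escaped (not shown).
     */
    static char chooseQuote(String value) {
        if (value == null) {
            return 0; // stand-alone attribute: nothing to quote
        }
        boolean needed = value.isEmpty(); // name="" must keep its quotes
        boolean hasDouble = false;
        boolean hasSingle = false;
        for (int i = 0; i < value.length(); i++) {
            char ch = value.charAt(i);
            if (Character.isWhitespace(ch)) {
                needed = true;
            } else if (ch == '"') {
                hasDouble = true;
                needed = true;
            } else if (ch == '\'') {
                hasSingle = true;
                needed = true;
            }
        }
        if (!needed) {
            return 0;
        }
        if (!hasDouble) {
            return '"';
        }
        if (!hasSingle) {
            return '\'';
        }
        return '"'; // both quote kinds present: caller must escape the double quotes
    }

    public static void main(String[] args) {
        String[] samples = { "main", "two words", "say \"hi\"", "" };
        for (String s : samples) {
            char q = chooseQuote(s);
            System.out.println("[" + s + "] -> " + (q == 0 ? "unquoted" : String.valueOf(q)));
        }
    }
}
```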

removeAttribute

public void removeAttribute(String key)
Remove the attribute with the given key, if it exists.

Specified by:
removeAttribute in interface Tag
Parameters:
key - The name of the attribute.

setAttribute

public void setAttribute(String key,
                         String value,
                         char quote)
Set the attribute with the given key/value pair, quoting the value with quote.

Specified by:
setAttribute in interface Tag
Parameters:
key - The name of the attribute.
value - The value of the attribute.
quote - The quote character to be used around value. If zero, it is an unquoted value.
See Also:
Tag.getAttribute(java.lang.String)

getAttributeEx

public Attribute getAttributeEx(String name)
Returns the attribute with the given name.

Specified by:
getAttributeEx in interface Tag
Parameters:
name - Name of attribute, case insensitive.
Returns:
The attribute or null if it does not exist.
See Also:
Tag.setAttributeEx(org.htmlparser.Attribute)

setAttributeEx

public void setAttributeEx(Attribute attribute)
Set an attribute.

Specified by:
setAttributeEx in interface Tag
Parameters:
attribute - The attribute to set.
See Also:
setAttribute(Attribute)

setAttribute

public void setAttribute(Attribute attribute)
Set an attribute. This replaces an attribute of the same name. To set the zeroth attribute (the tag name), use setTagName().

Parameters:
attribute - The attribute to set.
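A sketch of the replace-or-append behavior, assuming a plain list whose element 0 is the tag name (AttributeReplaceSketch and Attr are hypothetical names). A removal helper in the spirit of removeAttribute() is included; dropping the whitespace element in front of the removed attribute, so stray spaces do not accumulate, is a design choice of this sketch rather than documented library behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of replace-or-append and removal on an attribute list.
class AttributeReplaceSketch {

    static final class Attr {
        final String name;
        final String value; // null for the tag name and whitespace elements

        Attr(String name, String value) {
            this.name = name;
            this.value = value;
        }

        boolean isWhitespace() {
            return value == null && name.trim().isEmpty();
        }
    }

    /** Replaces the first attribute with the same name (ignoring case), else appends it. */
    static void setAttribute(List<Attr> attrs, Attr attribute) {
        for (int i = 1; i < attrs.size(); i++) { // index 0 is the tag name
            if (attrs.get(i).name.equalsIgnoreCase(attribute.name)) {
                attrs.set(i, attribute);
                return;
            }
        }
        attrs.add(new Attr(" ", null)); // separate the new attribute with a space
        attrs.add(attribute);
    }

    /** Removes the named attribute and the whitespace element just before it, if present. */
    static void removeAttribute(List<Attr> attrs, String key) {
        for (int i = 1; i < attrs.size(); i++) {
            if (attrs.get(i).name.equalsIgnoreCase(key)) {
                attrs.remove(i);
                if (i - 1 >= 1 && attrs.get(i - 1).isWhitespace()) {
                    attrs.remove(i - 1);
                }
                return;
            }
        }
    }

    /** Renders the list as tag text, always double-quoting values (a simplification). */
    static String render(List<Attr> attrs) {
        StringBuilder sb = new StringBuilder();
        for (Attr a : attrs) {
            sb.append(a.value == null ? a.name : a.name + "=\"" + a.value + "\"");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Attr> attrs = new ArrayList<>();
        attrs.add(new Attr("img", null));
        attrs.add(new Attr(" ", null));
        attrs.add(new Attr("src", "a.png"));
        setAttribute(attrs, new Attr("alt", "logo"));  // appended
        setAttribute(attrs, new Attr("SRC", "b.png")); // replaces src
        System.out.println(render(attrs));
        removeAttribute(attrs, "alt");
        System.out.println(render(attrs));
    }
}
```

Starting every loop at index 1 mirrors the note above: the tag name is changed only through setTagName().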

getAttributesEx

public Vector getAttributesEx()
Gets the attributes in the tag.

Specified by:
getAttributesEx in interface Tag
Returns:
The list of Attribute objects in the tag. The first element is the tag name; subsequent elements are either whitespace or real attributes.
See Also:
Tag.setAttributesEx(java.util.Vector)

getTagName

public String getTagName()
Return the name of this tag.

Note: This value is converted to uppercase and does not begin with "/" if it is an end tag. Nor does it end with a slash in the case of an XML type tag. To get at the original text of the tag name use getRawTagName(). The conversion to uppercase is performed with an ENGLISH locale.

Specified by:
getTagName in interface Tag
Returns:
The tag name.
See Also:
Tag.setTagName(java.lang.String)
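The relationship between getRawTagName() and getTagName() described in the note can be sketched as a small normalization function (TagNameSketch is hypothetical, not the library's code). The fixed Locale.ENGLISH matters: under a Turkish default locale, "title".toUpperCase() yields a dotted capital I instead of "TITLE".

```java
import java.util.Locale;

// Hypothetical sketch of how getTagName() relates to getRawTagName(); not the library's code.
class TagNameSketch {

    /** True for raw names such as "/html", i.e. end tags. */
    static boolean isEndTag(String rawName) {
        return rawName != null && rawName.startsWith("/");
    }

    /** Strips a leading '/' (end tag) and a trailing '/' (empty XML tag), then upper-cases. */
    static String normalize(String rawName) {
        if (rawName == null) {
            return null;
        }
        String name = rawName;
        if (name.startsWith("/")) {
            name = name.substring(1);
        }
        if (name.endsWith("/")) {
            name = name.substring(0, name.length() - 1);
        }
        // A fixed locale avoids surprises such as the Turkish dotted/dotless i mappings.
        return name.toUpperCase(Locale.ENGLISH);
    }

    public static void main(String[] args) {
        for (String raw : new String[] { "html", "/html", "br/", "Title" }) {
            System.out.println(raw + " -> " + normalize(raw) + (isEndTag(raw) ? " (end tag)" : ""));
        }
    }
}
```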

getRawTagName

public String getRawTagName()
Return the name of this tag.

Specified by:
getRawTagName in interface Tag
Returns:
The tag name or null if this tag contains nothing or only whitespace.

setTagName

public void setTagName(String name)
Set the name of this tag. This creates or replaces the first attribute of the tag (the zeroth element of the attribute vector).

Specified by:
setTagName in interface Tag
Parameters:
name - The tag name.
See Also:
Tag.getTagName()

getText

public String getText()
Return the text contained in this tag.

Specified by:
getText in interface Node
Overrides:
getText in class AbstractNode
Returns:
The complete contents of the tag (within the angle brackets).
See Also:
Node.setText(java.lang.String)

setAttributesEx

public void setAttributesEx(Vector attribs)
Sets the attributes. The vector should have the same layout as the one returned by getAttributesEx(): Attribute objects, with the tag name as the first element, followed by whitespace or real attributes.

Specified by:
setAttributesEx in interface Tag
Parameters:
attribs - The attribute collection to set.
See Also:
Tag.getAttributesEx()

setTagBegin

public void setTagBegin(int tagBegin)
Sets the nodeBegin.

Parameters:
tagBegin - The nodeBegin to set

getTagBegin

public int getTagBegin()
Gets the nodeBegin.

Returns:
The nodeBegin value.

setTagEnd

public void setTagEnd(int tagEnd)
Sets the nodeEnd.

Parameters:
tagEnd - The nodeEnd to set

getTagEnd

public int getTagEnd()
Gets the nodeEnd.

Returns:
The nodeEnd value.

setText

public void setText(String text)
Parses the given text to create the tag contents.

Specified by:
setText in interface Node
Overrides:
setText in class AbstractNode
Parameters:
text - A string of the form <TAGNAME xx="yy">.
See Also:
Node.getText()

toPlainTextString

public String toPlainTextString()
Get the plain text from this node.

Specified by:
toPlainTextString in interface Node
Specified by:
toPlainTextString in class AbstractNode
Returns:
An empty string (tag contents do not display in a browser). If you want this tag's HTML equivalent, use toHtml().

toHtml

public String toHtml(boolean verbatim)
Render the tag as HTML.

Specified by:
toHtml in interface Node
Specified by:
toHtml in class AbstractNode
Parameters:
verbatim - If true, return text as close to the original page text as possible.
Returns:
The tag as an HTML fragment.
See Also:
Node.toHtml()

toString

public String toString()
Print the contents of the tag.

Specified by:
toString in interface Node
Specified by:
toString in class AbstractNode
Returns:
A string describing the tag. For text that looks like HTML, use toHtml().

breaksFlow

public boolean breaksFlow()
Determines if the given tag breaks the flow of text.

Specified by:
breaksFlow in interface Tag
Returns:
true if following text would start on a new line, false otherwise.
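A sketch of the kind of lookup behind breaksFlow(), assuming it is a case-insensitive membership test against a set like breakTags. The set below is an illustrative subset chosen for this example; this page does not list the library's actual entries, and BreakFlowSketch is a hypothetical name.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of breaksFlow(): a membership test against a set of tag names.
class BreakFlowSketch {

    // Illustrative subset chosen for this example; not the library's actual breakTags contents.
    private static final Set<String> BREAK_TAGS = new HashSet<>(Arrays.asList(
            "BR", "P", "DIV", "LI", "TR", "HR", "TABLE", "H1", "H2", "H3"));

    /** True if text following a tag with this name would start on a new line. */
    static boolean breaksFlow(String tagName) {
        return BREAK_TAGS.contains(tagName.toUpperCase(Locale.ENGLISH));
    }

    public static void main(String[] args) {
        for (String name : new String[] { "br", "p", "b", "span" }) {
            System.out.println(name + " breaks flow: " + breaksFlow(name));
        }
    }
}
```

The field above is declared as a Hashtable; a HashSet is used here only because membership is all the check needs.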

accept

public void accept(NodeVisitor visitor)
Default tag visiting code. Based on isEndTag(), calls either visitTag() or visitEndTag().

Specified by:
accept in interface Node
Specified by:
accept in class AbstractNode
Parameters:
visitor - The visitor that is visiting this node.
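The dispatch rule can be sketched with a hypothetical TagVisitor interface that receives raw tag names instead of Tag objects (the library's own visitor receives the tag itself). The trace helper records which callback each tag reaches.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of accept(): route end tags and other tags to different callbacks.
class VisitorDispatchSketch {

    /** Stand-in for a visitor with separate start-tag and end-tag callbacks. */
    interface TagVisitor {
        void visitTag(String rawName);
        void visitEndTag(String rawName);
    }

    /** End tags (raw name starting with '/') go to visitEndTag, all others to visitTag. */
    static void accept(String rawName, TagVisitor visitor) {
        if (rawName.startsWith("/")) {
            visitor.visitEndTag(rawName);
        } else {
            visitor.visitTag(rawName);
        }
    }

    /** Records each callback so the dispatch order can be inspected. */
    static List<String> trace(String... rawNames) {
        final List<String> log = new ArrayList<>();
        TagVisitor recorder = new TagVisitor() {
            @Override
            public void visitTag(String rawName) {
                log.add("start:" + rawName);
            }

            @Override
            public void visitEndTag(String rawName) {
                log.add("end:" + rawName);
            }
        };
        for (String name : rawNames) {
            accept(name, recorder);
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(trace("p", "b", "/b", "/p"));
    }
}
```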

isEmptyXmlTag

public boolean isEmptyXmlTag()
Is this an empty xml tag of the form <tag/>.

Specified by:
isEmptyXmlTag in interface Tag
Returns:
true if the last character of the last attribute is a '/'.

setEmptyXmlTag

public void setEmptyXmlTag(boolean emptyXmlTag)
Set this tag to be an empty xml node, or not. Adds or removes an ending slash on the tag.

Specified by:
setEmptyXmlTag in interface Tag
Parameters:
emptyXmlTag - If true, ensures there is an ending slash in the node, i.e. <tag/>, otherwise removes it.
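Since the tag text is everything inside the angle brackets, both methods can be sketched as string operations (EmptyXmlTagSketch is hypothetical; the library edits the last attribute instead). Java strings are immutable, so the setter here returns the new text rather than modifying it in place.

```java
// Hypothetical string-based sketch of isEmptyXmlTag() / setEmptyXmlTag().
class EmptyXmlTagSketch {

    /** True if the tag text (the part inside the angle brackets) ends with '/'. */
    static boolean isEmptyXmlTag(String text) {
        return !text.isEmpty() && text.charAt(text.length() - 1) == '/';
    }

    /** Adds or removes the trailing slash as requested; returns the resulting tag text. */
    static String setEmptyXmlTag(String text, boolean emptyXmlTag) {
        boolean isEmpty = isEmptyXmlTag(text);
        if (emptyXmlTag && !isEmpty) {
            return text + "/";
        }
        if (!emptyXmlTag && isEmpty) {
            return text.substring(0, text.length() - 1);
        }
        return text; // already in the requested state
    }

    public static void main(String[] args) {
        System.out.println("<" + setEmptyXmlTag("br", true) + ">");
        System.out.println("<" + setEmptyXmlTag("img src=\"a.png\"/", false) + ">");
    }
}
```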

isEndTag

public boolean isEndTag()
Predicate to determine if this tag is an end tag (e.g. </HTML>).

Specified by:
isEndTag in interface Tag
Returns:
true if this tag is an end tag.

getStartingLineNumber

public int getStartingLineNumber()
Get the line number where this tag starts.

Specified by:
getStartingLineNumber in interface Tag
Returns:
The (zero based) line number in the page where this tag starts.

getEndingLineNumber

public int getEndingLineNumber()
Get the line number where this tag ends.

Specified by:
getEndingLineNumber in interface Tag
Returns:
The (zero based) line number in the page where this tag ends.
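A minimal sketch of a zero-based line number computed from a character offset, assuming the page is available as a String (LineNumberSketch is hypothetical; the library works from its Page object). It counts newline characters before the offset, so a tag that spans lines gets different starting and ending line numbers. A real implementation would more likely cache line-start offsets and binary-search them.

```java
// Hypothetical sketch: zero-based line number of a character offset in a page.
class LineNumberSketch {

    /** Counts '\n' characters before offset; the first line is line 0. */
    static int lineOf(String page, int offset) {
        int line = 0;
        int limit = Math.min(offset, page.length());
        for (int i = 0; i < limit; i++) {
            if (page.charAt(i) == '\n') {
                line++;
            }
        }
        return line;
    }

    public static void main(String[] args) {
        String page = "<html>\n<body>\n<p class=\"a\"\n   id=\"b\">\n</body>";
        int start = page.indexOf("<p");
        int end = page.indexOf('>', start);
        System.out.println("starts on line " + lineOf(page, start) + ", ends on line " + lineOf(page, end));
    }
}
```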

getIds

public String[] getIds()
Return the set of names handled by this tag. Since this is a generic tag, it has no ids.

Specified by:
getIds in interface Tag
Returns:
The names to be matched that create tags of this type.

getEnders

public String[] getEnders()
Return the set of tag names that cause this tag to finish. These are the normal (non-end) tags that, if encountered while scanning (a composite tag), will cause the generation of a virtual tag. Since this is a non-composite tag, the default is no enders.

Specified by:
getEnders in interface Tag
Returns:
The names of following tags that stop further scanning.

getEndTagEnders

public String[] getEndTagEnders()
Return the set of end tag names that cause this tag to finish. These are the end tags that, if encountered while scanning (a composite tag), will cause the generation of a virtual tag. Since this is a non-composite tag, it has no end tag enders.

Specified by:
getEndTagEnders in interface Tag
Returns:
The names of following end tags that stop further scanning.
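For a composite tag, the two arrays can be read as a simple decision rule. The sketch below (hypothetical names; the LI example values are illustrative, not the library's) treats a start tag listed in the enders, or an end tag listed in the end tag enders, as closing the open tag, which is the point where a scanner would generate a virtual end tag. Modeling the non-composite case with empty arrays, nothing closes the tag.

```java
import java.util.Arrays;

// Hypothetical sketch of how enders / end tag enders decide when an open tag is closed.
class EnderSketch {

    /**
     * True if the incoming tag implicitly closes the open composite tag.
     * Names are assumed to be upper-cased already, as getTagName() returns them.
     */
    static boolean closesOpenTag(String incomingName, boolean incomingIsEndTag,
                                 String[] enders, String[] endTagEnders) {
        String[] candidates = incomingIsEndTag ? endTagEnders : enders;
        return Arrays.asList(candidates).contains(incomingName);
    }

    public static void main(String[] args) {
        // Illustrative values for an open LI: another <LI> or a closing </UL>/</OL> ends it.
        String[] enders = { "LI" };
        String[] endTagEnders = { "UL", "OL" };
        System.out.println(closesOpenTag("LI", false, enders, endTagEnders)); // true
        System.out.println(closesOpenTag("UL", true, enders, endTagEnders));  // true
        System.out.println(closesOpenTag("UL", false, enders, endTagEnders)); // false: nested list
    }
}
```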

getThisScanner

public Scanner getThisScanner()
Return the scanner associated with this tag.

Specified by:
getThisScanner in interface Tag
Returns:
The scanner associated with this tag.
See Also:
Tag.setThisScanner(org.htmlparser.scanners.Scanner)

setThisScanner

public void setThisScanner(Scanner scanner)
Set the scanner associated with this tag.

Specified by:
setThisScanner in interface Tag
Parameters:
scanner - The scanner for this tag.
See Also:
Tag.getThisScanner()

getEndTag

public Tag getEndTag()
Get the end tag for this (composite) tag. For a non-composite tag this always returns null.

Specified by:
getEndTag in interface Tag
Returns:
The tag that terminates this composite tag, e.g. </HTML>.
See Also:
Tag.setEndTag(org.htmlparser.Tag)

setEndTag

public void setEndTag(Tag end)
Set the end tag for this (composite) tag. For a non-composite tag this is a no-op.

Specified by:
setEndTag in interface Tag
Parameters:
end - The tag that terminates this composite tag, e.g. </HTML>.
See Also:
Tag.getEndTag()

© 2005 Derrick Oswald
Jun 10, 2006

HTML Parser is an open source library released under LGPL. SourceForge.net