CompositeTag (HTML Parser 2.0)

org.htmlparser.tags
Class CompositeTag

java.lang.Object
  org.htmlparser.nodes.AbstractNode
      org.htmlparser.nodes.TagNode
          org.htmlparser.tags.CompositeTag

All Implemented Interfaces:: Serializable, Cloneable, Node, Tag

Direct Known Subclasses:: AppletTag, BodyTag, Bullet, BulletList, DefinitionList, DefinitionListBullet, Div, FormTag, FrameSetTag, HeadingTag, HeadTag, Html, LabelTag, LinkTag, ObjectTag, OptionTag, ParagraphTag, ScriptTag, SelectTag, Span, StyleTag, TableColumn, TableHeader, TableRow, TableTag, TextareaTag, TitleTag

public class CompositeTag
extends TagNode

The base class for tags that have an end tag. Provides extra accessors for the children, above and beyond what the basic Tag provides. Also handles the conversion of its children for the toHtml method.

See Also:: Serialized Form

Field Summary
`protected static CompositeTagScanner`	`mDefaultCompositeScanner` The default scanner for composite tags.
`protected Tag`	`mEndTag` The tag that causes this tag to finish.

Fields inherited from class org.htmlparser.nodes.TagNode
`breakTags, mAttributes, mDefaultScanner`

Fields inherited from class org.htmlparser.nodes.AbstractNode
`children, mPage, nodeBegin, nodeEnd, parent`

Constructor Summary
`CompositeTag()` Create a composite tag.

Method Summary
`void`	`accept(NodeVisitor visitor)` Tag visiting code.
`Node`	`childAt(int index)` Get the child at the given index.
`SimpleNodeIterator`	`children()` Get an iterator over the children of this node.
`void`	`collectInto(NodeList list, NodeFilter filter)` Collect this node and its child nodes (if-applicable) into the list parameter, provided the node satisfies the filtering criteria.
`Text[]`	`digupStringNode(String searchText)` Finds the text nodes containing the given text, however deeply embedded, and returns them.
`SimpleNodeIterator`	`elements()` Return the child tags as an iterator.
`int`	`findPositionOf(Node searchNode)` Returns the node number of a child node given the node object.
`int`	`findPositionOf(String text)` Returns the node number of the first node containing the given text.
`int`	`findPositionOf(String text, Locale locale)` Returns the node number of the first node containing the given text.
`Node`	`getChild(int index)` Get the child of this node at the given position.
`int`	`getChildCount()` Return the number of child nodes in this tag.
`Node[]`	`getChildrenAsNodeArray()` Get the children as an array of `Node` objects.
`String`	`getChildrenHTML()` Return the HTML code for the children of this tag.
`Tag`	`getEndTag()` Get the end tag for this tag.
`String`	`getStringText()` Return the text between the start tag and the end tag.
`String`	`getText()` Return the text contained in this tag.
`protected void`	`putChildrenInto(StringBuffer sb, boolean verbatim)` Add the textual contents of the children of this node to the buffer.
`protected void`	`putEndTagInto(StringBuffer sb, boolean verbatim)` Add the textual contents of the end tag of this node to the buffer.
`void`	`removeChild(int i)` Remove the child at the position given.
`Tag`	`searchByName(String name)` Searches all children for a tag whose name attribute matches the given name.
`NodeList`	`searchFor(Class classType, boolean recursive)` Collect all child nodes that are of a certain type. Note that this does not check for parent types (a node must be exactly of the given class to match), and only recurses through child tags when recursive is true.
`NodeList`	`searchFor(String searchString)` Searches for all nodes whose text representation contains the search string.
`NodeList`	`searchFor(String searchString, boolean caseSensitive)` Searches for all nodes whose text representation contains the search string.
`NodeList`	`searchFor(String searchString, boolean caseSensitive, Locale locale)` Searches for all nodes whose text representation contains the search string.
`void`	`setEndTag(Tag tag)` Set the end tag for this tag.
`String`	`toHtml(boolean verbatim)` Return this tag as HTML code.
`String`	`toPlainTextString()` Return the textual contents of this tag and its children.
`String`	`toString()` Return a string representation of the contents of this tag, its children and its end tag, suitable for debugging.
`void`	`toString(int level, StringBuffer buffer)` Return a string representation of the contents of this tag, its children and its end tag, suitable for debugging.

Methods inherited from class org.htmlparser.nodes.TagNode
`breaksFlow, getAttribute, getAttributeEx, getAttributesEx, getEnders, getEndingLineNumber, getEndTagEnders, getIds, getRawTagName, getStartingLineNumber, getTagBegin, getTagEnd, getTagName, getThisScanner, isEmptyXmlTag, isEndTag, removeAttribute, setAttribute, setAttribute, setAttribute, setAttributeEx, setAttributesEx, setEmptyXmlTag, setTagBegin, setTagEnd, setTagName, setText, setThisScanner`

Methods inherited from class org.htmlparser.nodes.AbstractNode
`clone, doSemanticAction, getChildren, getEndPosition, getFirstChild, getLastChild, getNextSibling, getPage, getParent, getPreviousSibling, getStartPosition, setChildren, setEndPosition, setPage, setParent, setStartPosition, toHtml`

Methods inherited from class java.lang.Object
`equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait`

Methods inherited from interface org.htmlparser.Node
`clone, doSemanticAction, getChildren, getEndPosition, getFirstChild, getLastChild, getNextSibling, getPage, getParent, getPreviousSibling, getStartPosition, setChildren, setEndPosition, setPage, setParent, setStartPosition, toHtml`

Field Detail

mEndTag

protected Tag mEndTag

The tag that causes this tag to finish. May be a virtual tag generated by the scanning logic.

mDefaultCompositeScanner

protected static final CompositeTagScanner mDefaultCompositeScanner

The default scanner for composite tags.

Constructor Detail

CompositeTag

public CompositeTag()

Create a composite tag.

Method Detail

children

public SimpleNodeIterator children()

Get an iterator over the children of this node.

Returns:: An iterator over the children of this node.

getChild

public Node getChild(int index)

Get the child of this node at the given position.

Parameters:: index - The position of the child in the node list.
Returns:: The child at that index.

getChildrenAsNodeArray

public Node[] getChildrenAsNodeArray()

Get the children as an array of Node objects.

Returns:: The children in an array.

removeChild

public void removeChild(int i)

Remove the child at the position given.

Parameters:: i - The index of the child to remove.

elements

public SimpleNodeIterator elements()

Return the child tags as an iterator. Equivalent to calling getChildren ().elements ().

Returns:: An iterator over the children.

toPlainTextString

public String toPlainTextString()

Return the textual contents of this tag and its children.

Specified by:: toPlainTextString in interface Node
Overrides:: toPlainTextString in class TagNode

Returns:: The 'browser' text contents of this tag.
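To make the idea of "browser" text concrete, here is a minimal sketch built on a hypothetical, JDK-only node model. The names `PlainTextSketch`, `Text` and `Composite` are illustrative and are not HTML Parser classes. Markup is dropped, and the text of descendant text nodes is concatenated in document order. How the real method treats other node types (remarks, scripts) is not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified node model; not the HTML Parser API. */
class PlainTextSketch {
    interface Node {
        // Append the text a browser would display for this node.
        void appendPlainText(StringBuilder sb);
    }

    static final class Text implements Node {
        final String text;
        Text(String text) { this.text = text; }
        public void appendPlainText(StringBuilder sb) { sb.append(text); }
    }

    static final class Composite implements Node {
        final List<Node> children;
        Composite(Node... children) { this.children = new ArrayList<>(Arrays.asList(children)); }

        // Start and end tags contribute nothing; only descendant text is kept.
        public void appendPlainText(StringBuilder sb) {
            for (Node child : children) child.appendPlainText(sb);
        }

        String toPlainTextString() {
            StringBuilder sb = new StringBuilder();
            appendPlainText(sb);
            return sb.toString();
        }
    }
}
```

For a tree shaped like <P>Hello, <B>world</B>!</P>, the sketch yields "Hello, world!". The markup is kept only by the HTML-producing methods.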

putChildrenInto

protected void putChildrenInto(StringBuffer sb,
                               boolean verbatim)

Add the textual contents of the children of this node to the buffer.

Parameters:: sb - The buffer to append to.; verbatim - If true, return as close to the original page text as possible.

putEndTagInto

protected void putEndTagInto(StringBuffer sb,
                             boolean verbatim)

Add the textual contents of the end tag of this node to the buffer.

Parameters:: sb - The buffer to append to.; verbatim - If true, return as close to the original page text as possible.

toHtml

public String toHtml(boolean verbatim)

Return this tag as HTML code.

Specified by:: toHtml in interface Node
Overrides:: toHtml in class TagNode

Parameters:: verbatim - If true return as close to the original page text as possible.
Returns:: This tag, its contents (children) and the end tag as HTML code.
See Also:: Node.toHtml()
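A composite tag assembles its HTML in three steps: the start tag, then the children (putChildrenInto), then the end tag (putEndTagInto). The hypothetical, JDK-only model below sketches that order. Its handling of verbatim is an assumption. An injected (virtual) end tag never appeared in the original page, so this sketch omits it when verbatim is true.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of composite-tag serialization; not the HTML Parser code. */
class ToHtmlSketch {
    interface Node {
        void appendHtml(StringBuilder sb, boolean verbatim);
    }

    static final class Text implements Node {
        final String text;
        Text(String text) { this.text = text; }
        public void appendHtml(StringBuilder sb, boolean verbatim) { sb.append(text); }
    }

    static final class EndTag {
        final String html;
        final boolean virtual; // true if injected by the scanner rather than read from the page
        EndTag(String html, boolean virtual) { this.html = html; this.virtual = virtual; }
    }

    static final class Composite implements Node {
        final String startTag;
        final EndTag endTag; // may be null
        final List<Node> children;

        Composite(String startTag, EndTag endTag, Node... children) {
            this.startTag = startTag;
            this.endTag = endTag;
            this.children = new ArrayList<>(Arrays.asList(children));
        }

        // Start tag, then children, then end tag.
        public void appendHtml(StringBuilder sb, boolean verbatim) {
            sb.append(startTag);
            putChildrenInto(sb, verbatim);
            putEndTagInto(sb, verbatim);
        }

        void putChildrenInto(StringBuilder sb, boolean verbatim) {
            for (Node child : children) child.appendHtml(sb, verbatim);
        }

        // Assumption: in verbatim mode a virtual end tag is skipped,
        // because it never appeared in the original page text.
        void putEndTagInto(StringBuilder sb, boolean verbatim) {
            if (endTag != null && !(verbatim && endTag.virtual)) sb.append(endTag.html);
        }

        String toHtml(boolean verbatim) {
            StringBuilder sb = new StringBuilder();
            appendHtml(sb, verbatim);
            return sb.toString();
        }
    }
}
```

Take the source <UL><LI>one</UL>, where the scanner injected the missing </LI>. The sketch produces <UL><LI>one</LI></UL> in non-verbatim mode and reproduces the original text in verbatim mode.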

searchByName

public Tag searchByName(String name)

Searches all children for a tag whose name attribute matches the given name. Returns the first match.

Parameters:: name - The value of the name attribute to match.
Returns:: The first child tag whose name attribute matches.

searchFor

public NodeList searchFor(String searchString)

Searches for all nodes whose text representation contains the search string. Collects all nodes containing the search string into a NodeList. This search is case-insensitive and the search string and the node text are converted to uppercase using an English locale. For example, if you wish to find any textareas in a form tag containing "hello world", the code would be:


 NodeList nodeList = formTag.searchFor("Hello World");

Parameters:: searchString - Search criterion.
Returns:: A collection of nodes whose string contents or representation have the searchString in them.

searchFor

public NodeList searchFor(String searchString,
                          boolean caseSensitive)

Searches for all nodes whose text representation contains the search string. Collects all nodes containing the search string into a NodeList. For example, if you wish to find any textareas in a form tag containing "hello world", the code would be:


 NodeList nodeList = formTag.searchFor("hello world", false);

Parameters:: searchString - Search criterion.; caseSensitive - If true this search should be case sensitive. Otherwise, the search string and the node text are converted to uppercase using an English locale.
Returns:: A collection of nodes whose string contents or representation have the searchString in them.

searchFor

public NodeList searchFor(String searchString,
                          boolean caseSensitive,
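                          Locale locale)

Searches for all nodes whose text representation contains the search string. Collects all nodes containing the search string into a NodeList. For a case-insensitive search, the given locale is used when converting to uppercase. For example, if you wish to find any textareas in a form tag containing "hello world", the code would be:


 NodeList nodeList = formTag.searchFor("hello world", false, Locale.ENGLISH);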

Parameters:: searchString - Search criterion.; caseSensitive - If true this search should be case sensitive. Otherwise, the search string and the node text are converted to uppercase using the locale provided.; locale - The locale for uppercase conversion.
Returns:: A collection of nodes whose string contents or representation have the searchString in them.
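The matching rule described for the three searchFor overloads can be sketched as follows. This is a hypothetical, JDK-only model. Child nodes are represented by plain strings holding their text representation, and recursion into nested tags is not modeled. The Turkish-locale check shows why the locale parameter matters: in a Turkish locale, a lowercase i uppercases to a dotted capital İ, so "title" no longer matches "TITLE".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch of the documented searchFor matching rule; not the HTML Parser code. */
class SearchForSketch {
    // Exact containment when caseSensitive; otherwise both strings are
    // upper-cased with the given locale before the containment test.
    static boolean matches(String nodeText, String searchString, boolean caseSensitive, Locale locale) {
        if (caseSensitive) return nodeText.contains(searchString);
        return nodeText.toUpperCase(locale).contains(searchString.toUpperCase(locale));
    }

    // Collect the child texts (standing in for child nodes) that match.
    static List<String> searchFor(List<String> childTexts, String searchString,
                                  boolean caseSensitive, Locale locale) {
        List<String> found = new ArrayList<>();
        for (String text : childTexts) {
            if (matches(text, searchString, caseSensitive, locale)) found.add(text);
        }
        return found;
    }
}
```

This is why the one- and two-argument overloads pin an English locale. It makes case-insensitive results independent of the JVM's default locale.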

searchFor

public NodeList searchFor(Class classType,
                          boolean recursive)

Collect all child nodes that are of a certain type. Note that this does not check for parent types (a node must be exactly of the given class to match), and only recurses through child tags when recursive is true.

Parameters:: classType - The class to search for.; recursive - If true, recursively search through the children.
Returns:: A list of children found.
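The following is a minimal sketch of searchFor(Class, recursive). It assumes that "does not check for parent types" means a node's runtime class must equal classType exactly, so subclasses do not match. The node classes (`Node`, `Composite`, `Link`, `ImageLink`) are hypothetical stand-ins, not HTML Parser types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of searchFor(Class, boolean); not the HTML Parser code. */
class ClassSearchSketch {
    static class Node {
        final List<Node> children = new ArrayList<>();
        Node add(Node... kids) { children.addAll(Arrays.asList(kids)); return this; }
    }
    static class Composite extends Node {}
    static class Link extends Composite {}
    static class ImageLink extends Link {} // subclass: not matched when searching for Link

    static List<Node> searchFor(Node parent, Class<?> classType, boolean recursive) {
        List<Node> found = new ArrayList<>();
        for (Node child : parent.children) {
            if (child.getClass() == classType) found.add(child); // exact class only
            if (recursive && child instanceof Composite) {
                found.addAll(searchFor(child, classType, true)); // descend into nested tags
            }
        }
        return found;
    }
}
```

With recursive set to false, only the direct children are examined. Setting it to true also picks up matches nested inside child tags.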

findPositionOf

public int findPositionOf(String text)

Returns the node number of the first node containing the given text. This can be useful to index into the composite tag and get other children. Text is compared without case sensitivity and conversion to uppercase uses an English locale.

Parameters:: text - The text to search for.
Returns:: The index in the children list of the node containing the text, or -1 if not found.
See Also:: findPositionOf(String, Locale)

findPositionOf

public int findPositionOf(String text,
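                          Locale locale)

Returns the node number of the first node containing the given text. Text is compared without case sensitivity, and conversion to uppercase uses the given locale.

Parameters:: text - The text to search for.; locale - The locale to use in converting to uppercase.
Returns:: The index in the children list of the node containing the text, or -1 if not found.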

findPositionOf

public int findPositionOf(Node searchNode)

Returns the node number of a child node given the node object. This would typically be used in conjunction with digupStringNode, after which the string node's parent can be used to find the string node's position. This is faster than calling findPositionOf(text) again. Note that the search covers only the immediate children; there is no recursion in this method.

Parameters:: searchNode - The child node to find.
Returns:: The offset of the child tag or -1 if it was not found.
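The findPositionOf overloads can be sketched on a hypothetical, JDK-only node model. Each node holds a string that stands in for its text representation, plus a parent link. Both searches scan only the direct children. Matching nodes by reference identity in findPositionOf(Node) is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical model of the findPositionOf overloads; not the HTML Parser code. */
class FindPositionSketch {
    static class Node {
        Node parent;
        final String text; // stands in for the node's text representation
        final List<Node> children = new ArrayList<>();
        Node(String text) { this.text = text; }
        Node add(Node child) { child.parent = this; children.add(child); return this; }

        // Index of the first direct child whose text contains `text`, comparing
        // upper-cased strings in the given locale; -1 if there is no match.
        int findPositionOf(String text, Locale locale) {
            String needle = text.toUpperCase(locale);
            for (int i = 0; i < children.size(); i++) {
                if (children.get(i).text.toUpperCase(locale).contains(needle)) return i;
            }
            return -1;
        }

        // The one-argument form is documented as using an English locale.
        int findPositionOf(String text) { return findPositionOf(text, Locale.ENGLISH); }

        // Linear scan of the direct children only, no recursion.
        // Assumption: nodes are matched by reference identity.
        int findPositionOf(Node searchNode) {
            for (int i = 0; i < children.size(); i++) {
                if (children.get(i) == searchNode) return i;
            }
            return -1;
        }
    }
}
```

The workflow described above corresponds to hit.parent.findPositionOf(hit) in this sketch. It reuses a node already found and avoids a second text search.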

childAt

public Node childAt(int index)

Get the child at the given index.

Parameters:: index - The index into the child node list.
Returns:: The child node at the given index, or null if none.

collectInto

public void collectInto(NodeList list,
                        NodeFilter filter)

Collect this node and its child nodes (if applicable) into the list parameter, provided the node satisfies the filtering criteria.

This mechanism makes it easy to write powerful filtering code without handling embedded tags separately. For example, when trying to get all the links on a page, it is not enough to look only at the top-level nodes, because many tags (such as form tags) can contain links embedded in them. One could find those links by checking whether each node is a CompositeTag and walking its children; this method provides a convenient way to do that.

Using collectInto(), programs get a lot shorter. Now, the code to extract all links from a page would look like:

 NodeList list = new NodeList();
 NodeFilter filter = new TagNameFilter ("A");
 for (NodeIterator e = parser.elements(); e.hasMoreNodes();)
      e.nextNode().collectInto(list, filter);

Thus, list will hold all the link nodes, irrespective of how deep the links are embedded.

Another way to accomplish the same objective is:

 NodeList list = new NodeList();
 NodeFilter filter = new TagClassFilter (LinkTag.class);
 for (NodeIterator e = parser.elements(); e.hasMoreNodes();)
      e.nextNode().collectInto(list, filter);

This is slightly less specific because the LinkTag class may be registered for more than one node name, e.g. <LINK> tags too.

Specified by:: collectInto in interface Node
Overrides:: collectInto in class AbstractNode

Parameters:: list - The list to add nodes to.; filter - The filter to apply.
See Also:: org.htmlparser.filters
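The recursion behind collectInto can be sketched with a hypothetical, JDK-only node model. Here `java.util.function.Predicate` stands in for NodeFilter, and `CollectIntoSketch.Node` is not an HTML Parser class. Each node adds itself if it passes the filter and then delegates to its children. As a result, a link inside a form is found just like a top-level one.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical model of collectInto; Predicate stands in for NodeFilter. */
class CollectIntoSketch {
    static class Node {
        final String name; // tag name such as "A" or "FORM"
        final List<Node> children;
        Node(String name, Node... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }

        // Add this node if it passes the filter, then recurse into the children,
        // so matches are found however deeply they are nested.
        void collectInto(List<Node> list, Predicate<Node> filter) {
            if (filter.test(this)) list.add(this);
            for (Node child : children) child.collectInto(list, filter);
        }
    }
}
```

Calling collectInto on every top-level node, as in the loops above, therefore gathers every matching node in the page.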

getChildrenHTML

public String getChildrenHTML()

Return the HTML code for the children of this tag.

Returns:: A string with the HTML code for the contents of this tag.

accept

public void accept(NodeVisitor visitor)

Tag visiting code. Invokes accept() on the start tag and then walks the child list, invoking accept() on each of the children, finishing up with an accept() call on the end tag. If shouldRecurseSelf() returns true, it then asks the visitor to visit this tag itself.

Specified by:: accept in interface Node
Overrides:: accept in class TagNode

Parameters:: visitor - The NodeVisitor object to be signalled for each child and possibly this tag.

getChildCount

public int getChildCount()

Return the number of child nodes in this tag.

Returns:: The child node count.

getEndTag

public Tag getEndTag()

Get the end tag for this tag. For example, if the node is <LABEL>The label</LABEL>, then this method would return the </LABEL> end tag.

Specified by:: getEndTag in interface Tag
Overrides:: getEndTag in class TagNode

Returns:: The end tag for this node. Note: If the start and end positions of the end tag are the same, then the end tag was injected (it's a virtual end tag).
See Also:: Tag.setEndTag(org.htmlparser.Tag)
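The note on virtual end tags reduces to a zero-length check. Such a tag occupies no characters of the page, so it cannot have come from the source text. Below is a hypothetical, JDK-only sketch using raw character offsets. The placement of the injected </LI> at the start of the following <LI> is an assumption made for illustration.

```java
/** Hypothetical illustration of the virtual end tag note; not the HTML Parser code. */
class VirtualEndTagSketch {
    /** Character offsets of an end tag within the page text. */
    static final class EndTag {
        final int startPosition, endPosition;
        EndTag(int startPosition, int endPosition) {
            this.startPosition = startPosition;
            this.endPosition = endPosition;
        }

        // A zero-length end tag was not read from the page, so per the
        // documentation it was injected by the scanning logic.
        boolean isVirtual() { return startPosition == endPosition; }

        // The page text this end tag covers (empty for a virtual tag).
        String sourceText(String page) { return page.substring(startPosition, endPosition); }
    }
}
```

With the real classes, the same test would compare getStartPosition() and getEndPosition() of the tag returned by getEndTag().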

setEndTag

public void setEndTag(Tag tag)

Set the end tag for this tag.

Specified by:: setEndTag in interface Tag
Overrides:: setEndTag in class TagNode

Parameters:: tag - The new end tag for this tag. Note: no checking is performed, so you can generate bad HTML by setting the end tag with a name not equal to the name of the start tag, e.g. <LABEL>The label</TITLE>
See Also:: Tag.getEndTag()
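setEndTag performs no checking, so a caller that wants well-formed output may need to verify the names first. The following is a hypothetical, JDK-only sketch of such a guard. `setEndTagChecked` is an illustrative name and not part of the library. Tag names are compared case-insensitively, matching HTML's case-insensitive tag names.

```java
/** Hypothetical model of an unchecked versus guarded end tag setter. */
class EndTagGuardSketch {
    static final class Composite {
        final String tagName;
        String endTagName;
        Composite(String tagName) { this.tagName = tagName; }

        // Unchecked, like the documented setEndTag: any name is accepted.
        void setEndTag(String name) { endTagName = name; }

        // Hypothetical guarded variant: rejects an end tag whose name differs.
        void setEndTagChecked(String name) {
            if (!tagName.equalsIgnoreCase(name)) {
                throw new IllegalArgumentException(
                    "end tag </" + name + "> does not match <" + tagName + ">");
            }
            setEndTag(name);
        }

        String render(String body) { return "<" + tagName + ">" + body + "</" + endTagName + ">"; }
    }
}
```

The unchecked path reproduces the <LABEL>The label</TITLE> example, and the guarded path refuses it.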

digupStringNode

public Text[] digupStringNode(String searchText)

Finds the text nodes containing the given text, however deeply embedded they might be, and returns them. Each text node retains links to its parents, so further navigation is possible.

Parameters:: searchText - The text to search for.
Returns:: The list of text nodes (recursively) found.
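A depth-first search captures the idea of digging up text nodes while keeping their parent links intact. This is a hypothetical, JDK-only sketch, and `DigupSketch` and its nested classes are not HTML Parser types. The plain case-sensitive `contains` test is an assumption, since the documentation does not state how the text is compared.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of digupStringNode; not the HTML Parser code. */
class DigupSketch {
    static class Node {
        Node parent;
        final List<Node> children = new ArrayList<>();
        Node add(Node child) { child.parent = this; children.add(child); return this; }
    }

    static final class Text extends Node {
        final String text;
        Text(String text) { this.text = text; }
    }

    // Depth-first search for text nodes containing searchText, at any depth.
    static Text[] digup(Node root, String searchText) {
        List<Text> found = new ArrayList<>();
        collect(root, searchText, found);
        return found.toArray(new Text[0]);
    }

    private static void collect(Node node, String searchText, List<Text> found) {
        for (Node child : node.children) {
            if (child instanceof Text && ((Text) child).text.contains(searchText)) {
                found.add((Text) child); // parent link is untouched, so callers can navigate up
            }
            collect(child, searchText, found);
        }
    }
}
```

Because each hit keeps its parent, the natural next step is to pass it to its parent's findPositionOf(Node).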

toString

public String toString()

Return a string representation of the contents of this tag, its children and its end tag, suitable for debugging.

Specified by:: toString in interface Node
Overrides:: toString in class TagNode

Returns:: A textual representation of the tag.

getText

public String getText()

Return the text contained in this tag.

Specified by:: getText in interface Node
Overrides:: getText in class TagNode

Returns:: The complete contents of the tag (within the angle brackets).
See Also:: Node.setText(java.lang.String)

getStringText

public String getStringText()

Return the text between the start tag and the end tag.

Returns:: The contents of the CompositeTag.
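One way to picture getStringText: the raw page text between the end of the start tag and the beginning of the end tag. The hypothetical, JDK-only sketch below works on character offsets, and treating the method as exactly this substring is an assumption. Unlike toPlainTextString, nested markup stays in the result.

```java
/** Hypothetical model of getStringText; not the HTML Parser code. */
class StringTextSketch {
    // Raw page text between the end of the start tag and the start of the end tag.
    // Nested markup is kept, unlike the 'browser' text of toPlainTextString.
    static String getStringText(String page, int startTagEnd, int endTagStart) {
        return page.substring(startTagEnd, endTagStart);
    }
}
```

For <LABEL>The label</LABEL> this gives "The label". For <P>Hi <B>there</B></P> it gives "Hi <B>there</B>", with the <B> tags preserved.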

toString

public void toString(int level,
                     StringBuffer buffer)

Return a string representation of the contents of this tag, its children and its end tag, suitable for debugging.

Parameters:: level - The indentation level to use.; buffer - The buffer to append to.

HTML Parser is an open source library released under Common Public License.
