java.lang.Object
  org.htmlparser.nodes.AbstractNode
    org.htmlparser.nodes.TagNode
      org.htmlparser.tags.CompositeTag
public class CompositeTag
The base class for tags that have an end tag.
Provides extra accessors for the children above and beyond what the basic Tag provides. Also handles the conversion of its children for the toHtml method.
Field Summary | |
---|---|
protected static CompositeTagScanner | mDefaultCompositeScanner - The default scanner for non-composite tags. |
protected Tag | mEndTag - The tag that causes this tag to finish. |
Fields inherited from class org.htmlparser.nodes.TagNode |
---|
breakTags, mAttributes, mDefaultScanner |
Fields inherited from class org.htmlparser.nodes.AbstractNode |
---|
children, mPage, nodeBegin, nodeEnd, parent |
Constructor Summary | |
---|---|
CompositeTag() | Create a composite tag. |
Method Summary | |
---|---|
void | accept(NodeVisitor visitor) - Tag visiting code. |
Node | childAt(int index) - Get child at given index. |
SimpleNodeIterator | children() - Get an iterator over the children of this node. |
void | collectInto(NodeList list, NodeFilter filter) - Collect this node and its child nodes (if applicable) into the list parameter, provided the node satisfies the filtering criteria. |
Text[] | digupStringNode(String searchText) - Finds a text node, however embedded it might be, and returns it. |
SimpleNodeIterator | elements() - Return the child tags as an iterator. |
int | findPositionOf(Node searchNode) - Returns the node number of a child node given the node object. |
int | findPositionOf(String text) - Returns the node number of the first node containing the given text. |
int | findPositionOf(String text, Locale locale) - Returns the node number of the first node containing the given text. |
Node | getChild(int index) - Get the child of this node at the given position. |
int | getChildCount() - Return the number of child nodes in this tag. |
Node[] | getChildrenAsNodeArray() - Get the children as an array of Node objects. |
String | getChildrenHTML() - Return the HTML code for the children of this tag. |
Tag | getEndTag() - Get the end tag for this tag. |
String | getStringText() - Return the text between the start tag and the end tag. |
String | getText() - Return the text contained in this tag. |
protected void | putChildrenInto(StringBuffer sb, boolean verbatim) - Add the textual contents of the children of this node to the buffer. |
protected void | putEndTagInto(StringBuffer sb, boolean verbatim) - Add the textual contents of the end tag of this node to the buffer. |
void | removeChild(int i) - Remove the child at the position given. |
Tag | searchByName(String name) - Searches all children for a name attribute. |
NodeList | searchFor(Class classType, boolean recursive) - Collect all objects that are of a certain type. Note that this will not check for parent types, and will not recurse through child tags unless recursive is true. |
NodeList | searchFor(String searchString) - Searches for all nodes whose text representation contains the search string. |
NodeList | searchFor(String searchString, boolean caseSensitive) - Searches for all nodes whose text representation contains the search string. |
NodeList | searchFor(String searchString, boolean caseSensitive, Locale locale) - Searches for all nodes whose text representation contains the search string. |
void | setEndTag(Tag tag) - Set the end tag for this tag. |
String | toHtml(boolean verbatim) - Return this tag as HTML code. |
String | toPlainTextString() - Return the textual contents of this tag and its children. |
String | toString() - Return a string representation of the contents of this tag, its children and its end tag suitable for debugging. |
void | toString(int level, StringBuffer buffer) - Return a string representation of the contents of this tag, its children and its end tag suitable for debugging. |
Methods inherited from class org.htmlparser.nodes.TagNode |
---|
breaksFlow, getAttribute, getAttributeEx, getAttributesEx, getEnders, getEndingLineNumber, getEndTagEnders, getIds, getRawTagName, getStartingLineNumber, getTagBegin, getTagEnd, getTagName, getThisScanner, isEmptyXmlTag, isEndTag, removeAttribute, setAttribute, setAttribute, setAttribute, setAttributeEx, setAttributesEx, setEmptyXmlTag, setTagBegin, setTagEnd, setTagName, setText, setThisScanner |
Methods inherited from class org.htmlparser.nodes.AbstractNode |
---|
clone, doSemanticAction, getChildren, getEndPosition, getFirstChild, getLastChild, getNextSibling, getPage, getParent, getPreviousSibling, getStartPosition, setChildren, setEndPosition, setPage, setParent, setStartPosition, toHtml |
Methods inherited from class java.lang.Object |
---|
equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Methods inherited from interface org.htmlparser.Node |
---|
clone, doSemanticAction, getChildren, getEndPosition, getFirstChild, getLastChild, getNextSibling, getPage, getParent, getPreviousSibling, getStartPosition, setChildren, setEndPosition, setPage, setParent, setStartPosition, toHtml |
Field Detail |
---|
protected Tag mEndTag
protected static final CompositeTagScanner mDefaultCompositeScanner
Constructor Detail |
---|
public CompositeTag()
Method Detail |
---|
public SimpleNodeIterator children()
public Node getChild(int index)
index
- The index in the node list of the child.
public Node[] getChildrenAsNodeArray()
Returns the children as an array of Node objects.
public void removeChild(int i)
i
- The index of the child to remove.
public SimpleNodeIterator elements()
public String toPlainTextString()
toPlainTextString
in interface Node
toPlainTextString
in class TagNode
protected void putChildrenInto(StringBuffer sb, boolean verbatim)
verbatim
- If true return as close to the original page text as possible.
sb
- The buffer to append to.
protected void putEndTagInto(StringBuffer sb, boolean verbatim)
verbatim
- If true return as close to the original page text as possible.
sb
- The buffer to append to.
public String toHtml(boolean verbatim)
toHtml
in interface Node
toHtml
in class TagNode
verbatim
- If true return as close to the original page text as possible.
Node.toHtml()
public Tag searchByName(String name)
name
- Attribute to match in tag
public NodeList searchFor(String searchString)
NodeList nodeList = formTag.searchFor("Hello World");
searchString
- Search criterion.
Returns the nodes that have searchString in them.
public NodeList searchFor(String searchString, boolean caseSensitive)
NodeList nodeList = formTag.searchFor("Hello World");
searchString
- Search criterion.
caseSensitive
- If true this search should be case sensitive. Otherwise, the search string and the node text are converted to uppercase using an English locale.
Returns the nodes that have searchString in them.
public NodeList searchFor(String searchString, boolean caseSensitive, Locale locale)
NodeList nodeList = formTag.searchFor("Hello World");
searchString
- Search criterion.
caseSensitive
- If true this search should be case sensitive. Otherwise, the search string and the node text are converted to uppercase using the locale provided.
locale
- The locale for uppercase conversion.
Returns the nodes that have searchString in them.
public NodeList searchFor(Class classType, boolean recursive)
classType
- The class to search for.
recursive
- If true, recursively search through the children.
public int findPositionOf(String text)
text
- The text to search for.
findPositionOf(String, Locale)
public int findPositionOf(String text, Locale locale)
locale
- The locale to use in converting to uppercase.
text
- The text to search for.
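The case-insensitive matching described for searchFor and findPositionOf (both the search string and the node text converted to uppercase in a given locale) can be sketched with the standard library alone. The class and helper names below (LocaleSearchSketch, containsText, firstPositionOf) are hypothetical and are not part of HTML Parser; returning -1 when nothing matches is an assumption of this sketch, not documented behavior.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class LocaleSearchSketch {

    // Hypothetical helper: true when text contains search. When caseSensitive
    // is false, both strings are uppercased with the given locale first.
    static boolean containsText(String text, String search,
                                boolean caseSensitive, Locale locale) {
        if (caseSensitive) {
            return text.contains(search);
        }
        return text.toUpperCase(locale).contains(search.toUpperCase(locale));
    }

    // Hypothetical helper: index of the first entry containing the search
    // text (ignoring case), or -1 if none does (an assumption of this sketch).
    static int firstPositionOf(List<String> childTexts, String search, Locale locale) {
        for (int i = 0; i < childTexts.size(); i++) {
            if (containsText(childTexts.get(i), search, false, locale)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> children = Arrays.asList("<b>Intro</b>", "Hello World", "footer");
        System.out.println(firstPositionOf(children, "hello world", Locale.ENGLISH));
        System.out.println(containsText("Hello World", "hello", true, Locale.ENGLISH));
        // In a Turkish locale, 'i' uppercases to a dotted capital I,
        // so "title" no longer matches "TITLE".
        System.out.println(containsText("TITLE", "title", false, Locale.forLanguageTag("tr")));
    }
}
```

The Turkish case shows why the overloads accept a Locale: uppercase conversion is locale-dependent, so the same search can match under one locale and fail under another.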
public int findPositionOf(Node searchNode)
searchNode
- The child node to find.
public Node childAt(int index)
index
- The index into the child node list.
public void collectInto(NodeList list, NodeFilter filter)
This mechanism allows powerful filtering code to be written very easily,
without bothering about collection of embedded tags separately.
For example, when collecting all the links on a page, it is not enough to look at the top-level nodes, because many tags (such as form tags) can contain embedded links. We could get the links out by checking whether the current node is a CompositeTag and going through its children; this method provides a convenient way to do that.
Using collectInto(), programs get a lot shorter. Now, the code to extract all links from a page would look like:
    NodeList list = new NodeList();
    NodeFilter filter = new TagNameFilter("A");
    for (NodeIterator e = parser.elements(); e.hasMoreNodes();)
        e.nextNode().collectInto(list, filter);
Thus, list will hold all the link nodes, irrespective of how deep the links are embedded.
Another way to accomplish the same objective is:
    NodeList list = new NodeList();
    NodeFilter filter = new TagClassFilter(LinkTag.class);
    for (NodeIterator e = parser.elements(); e.hasMoreNodes();)
        e.nextNode().collectInto(list, filter);
This is slightly less specific because the LinkTag class may be registered for more than one node name, e.g. <LINK> tags too.
collectInto
in interface Node
collectInto
in class AbstractNode
list
- The list to add nodes to.
filter
- The filter to apply.
org.htmlparser.filters
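The recursive collection that collectInto() is described as performing can be illustrated without the library. This is a minimal sketch under stated assumptions: SketchNode and its Predicate-based filter are hypothetical stand-ins for Node and NodeFilter, and the code is not the library's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class CollectIntoSketch {

    // Hypothetical stand-in for a parsed node; not the library's Node interface.
    static class SketchNode {
        final String name;
        final List<SketchNode> children = new ArrayList<>();

        SketchNode(String name, SketchNode... kids) {
            this.name = name;
            for (SketchNode kid : kids) {
                children.add(kid);
            }
        }

        // Add this node if it satisfies the filter, then recurse into each
        // child so that matches are found however deeply they are embedded.
        void collectInto(List<SketchNode> list, Predicate<SketchNode> filter) {
            if (filter.test(this)) {
                list.add(this);
            }
            for (SketchNode child : children) {
                child.collectInto(list, filter);
            }
        }
    }

    // Count the nodes named "A" anywhere under root.
    static int countLinks(SketchNode root) {
        List<SketchNode> found = new ArrayList<>();
        root.collectInto(found, n -> n.name.equals("A"));
        return found.size();
    }

    public static void main(String[] args) {
        // One link at the top level and one embedded inside a form.
        SketchNode page = new SketchNode("BODY",
                new SketchNode("A"),
                new SketchNode("FORM", new SketchNode("INPUT"), new SketchNode("A")));
        System.out.println(countLinks(page)); // prints 2
    }
}
```

The caller never has to check whether a node has children: the recursion lives in the node, which is the convenience the description above attributes to collectInto().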
public String getChildrenHTML()
public void accept(NodeVisitor visitor)
Invokes accept() on the start tag and then walks the child list invoking accept() on each of the children, finishing up with an accept() call on the end tag. If shouldRecurseSelf() returns true it then asks the visitor to visit itself.
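The visiting order described above (start tag, then each child, then the end tag) can be sketched as follows. SketchTag and SketchVisitor are hypothetical stand-ins, not the library's TagNode or NodeVisitor, and the shouldRecurseSelf() check is left out for brevity.

```java
import java.util.ArrayList;
import java.util.List;

public class AcceptOrderSketch {

    // Hypothetical visitor callback; the real NodeVisitor has a richer API.
    interface SketchVisitor {
        void visit(String label);
    }

    // Hypothetical composite tag: a name, child tags, and an optional end tag.
    static class SketchTag {
        final String name;
        final boolean hasEndTag;
        final List<SketchTag> children = new ArrayList<>();

        SketchTag(String name, boolean hasEndTag) {
            this.name = name;
            this.hasEndTag = hasEndTag;
        }

        void accept(SketchVisitor visitor) {
            visitor.visit(name);                 // the start tag
            for (SketchTag child : children) {   // then each child in order
                child.accept(visitor);
            }
            if (hasEndTag) {                     // finally the end tag, if any
                visitor.visit("/" + name);
            }
        }
    }

    // Record the labels in the order the visitor receives them.
    static List<String> visitOrder(SketchTag root) {
        List<String> seen = new ArrayList<>();
        root.accept(seen::add);
        return seen;
    }

    public static void main(String[] args) {
        SketchTag form = new SketchTag("FORM", true);
        form.children.add(new SketchTag("INPUT", false));
        form.children.add(new SketchTag("LABEL", true));
        System.out.println(visitOrder(form)); // [FORM, INPUT, LABEL, /LABEL, /FORM]
    }
}
```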
accept
in interface Node
accept
in class TagNode
visitor
- The NodeVisitor object to be signalled for each child and possibly this tag.
public int getChildCount()
public Tag getEndTag()
getEndTag
in interface Tag
getEndTag
in class TagNode
Tag.setEndTag(org.htmlparser.Tag)
public void setEndTag(Tag tag)
setEndTag
in interface Tag
setEndTag
in class TagNode
tag
- The new end tag for this tag.
Note: no checking is performed, so you can generate bad HTML by setting the end tag with a name not equal to the name of the start tag, e.g. <LABEL>The label</TITLE>
Tag.getEndTag()
public Text[] digupStringNode(String searchText)
searchText
- The text to search for.
public String toString()
toString
in interface Node
toString
in class TagNode
public String getText()
getText
in interface Node
getText
in class TagNode
Node.setText(java.lang.String)
public String getStringText()
public void toString(int level, StringBuffer buffer)
level
- The indentation level to use.
buffer
- The buffer to append to.
© 2006 Derrick Oswald Sep 17, 2006
HTML Parser is an open source library released under Common Public License. |