HTML Parser Home Page
public interface Node
Specifies the minimum requirements for nodes returned by the Lexer or Parser.
There are three types of nodes in HTML: text, remarks and tags. You may wish to define your own nodes to be returned by the Lexer or Parser, but each of the types must support this interface.
More specific interface requirements for each of the node types are specified by the Text, Remark and Tag interfaces.
Method Summary | |
---|---|
void | accept(NodeVisitor visitor): Apply the visitor to this node. |
Object | clone(): Allow cloning of nodes. |
void | collectInto(NodeList list, NodeFilter filter): Collect this node and its child nodes into a list, provided the node satisfies the filtering criteria. |
void | doSemanticAction(): Perform the meaning of this tag. |
NodeList | getChildren(): Get the children of this node. |
int | getEndPosition(): Gets the ending position of the node. |
Node | getFirstChild(): Get the first child of this node. |
Node | getLastChild(): Get the last child of this node. |
Node | getNextSibling(): Get the next sibling to this node. |
Page | getPage(): Get the page this node came from. |
Node | getParent(): Get the parent of this node. |
Node | getPreviousSibling(): Get the previous sibling to this node. |
int | getStartPosition(): Gets the starting position of the node. |
String | getText(): Returns the text of the node. |
void | setChildren(NodeList children): Set the children of this node. |
void | setEndPosition(int position): Sets the ending position of the node. |
void | setPage(Page page): Set the page this node came from. |
void | setParent(Node node): Sets the parent of this node. |
void | setStartPosition(int position): Sets the starting position of the node. |
void | setText(String text): Sets the string contents of the node. |
String | toHtml(): Return the HTML for this node. |
String | toHtml(boolean verbatim): Return the HTML for this node. |
String | toPlainTextString(): A string representation of the node. |
String | toString(): Return the string representation of the node. |
Method Detail |
---|
String toPlainTextString()
A string representation of the node. For example, to print the plain text of every node:
    for (NodeIterator e = parser.elements (); e.hasMoreNodes ();)
        // or do whatever processing you wish with the plain text string
        System.out.println (e.nextNode ().toPlainTextString ());
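String toHtml()
Return the HTML for this node.
String toHtml(boolean verbatim)
Return the HTML for this node.
Parameters:
verbatim - If true, return as close to the original page text as possible.
String toString()
Return the string representation of the node. Typically used as System.out.println (node); or within a debugging environment.
Overrides:
toString in class Object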
void collectInto(NodeList list, NodeFilter filter)
Collect this node and its child nodes into a list, provided the node satisfies the filtering criteria.
This mechanism allows powerful filtering code to be written very easily, without handling the collection of embedded tags separately. For example, when we try to get all the links on a page, it is not possible to get them at the top level, because many tags (such as form tags) can contain links embedded in them. We could get the links out by checking whether the current node is a CompositeTag and going through its children; this method provides a convenient way to do that.
Using collectInto(), programs get a lot shorter. The code to extract all links from a page would look like:
    NodeList list = new NodeList ();
    NodeFilter filter = new TagNameFilter ("A");
    for (NodeIterator e = parser.elements (); e.hasMoreNodes ();)
        e.nextNode ().collectInto (list, filter);
Thus, list will hold all the link nodes, irrespective of how deep the links are embedded.
Another way to accomplish the same objective is:
    NodeList list = new NodeList ();
    NodeFilter filter = new TagClassFilter (LinkTag.class);
    for (NodeIterator e = parser.elements (); e.hasMoreNodes ();)
        e.nextNode ().collectInto (list, filter);
This is slightly less specific because the LinkTag class may be registered for more than one node name, e.g. <LINK> tags too.
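The recursive idea behind collectInto() can be illustrated without the library. The following is a minimal sketch, assuming a hypothetical SketchNode class and a java.util.function.Predicate in place of NodeFilter; none of these names come from HTML Parser, and the actual implementation may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for a parsed node; NOT the org.htmlparser Node interface.
class SketchNode {
    final String name;
    final List<SketchNode> children = new ArrayList<>();

    SketchNode(String name, SketchNode... kids) {
        this.name = name;
        for (SketchNode kid : kids) children.add(kid);
    }

    // Add this node if it passes the filter, then recurse into every child,
    // so matches are found no matter how deeply they are nested.
    void collectInto(List<SketchNode> list, Predicate<SketchNode> filter) {
        if (filter.test(this)) list.add(this);
        for (SketchNode child : children) child.collectInto(list, filter);
    }
}

class CollectIntoSketch {
    // Build a small tree: one top-level link plus one link nested inside a form.
    static List<SketchNode> collectLinks() {
        SketchNode page = new SketchNode("HTML",
            new SketchNode("A"),
            new SketchNode("FORM",
                new SketchNode("INPUT"),
                new SketchNode("A")));
        List<SketchNode> links = new ArrayList<>();
        page.collectInto(links, n -> n.name.equals("A"));
        return links;
    }

    public static void main(String[] args) {
        System.out.println(collectLinks().size() + " links collected");
    }
}
```

In this sketch both links are collected, including the one nested inside the form, which is the behaviour the description above attributes to collectInto().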
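Parameters:
list - The list to collect nodes into.
filter - The criteria to use when deciding if a node should be added to the list.
int getStartPosition()
Gets the starting position of the node.
See Also:
setStartPosition(int)
void setStartPosition(int position)
Sets the starting position of the node.
Parameters:
position - The new start position.
See Also:
getStartPosition()
int getEndPosition()
Gets the ending position of the node.
See Also:
setEndPosition(int)
void setEndPosition(int position)
Sets the ending position of the node.
Parameters:
position - The new end position.
See Also:
getEndPosition()
Page getPage()
Get the page this node came from.
See Also:
setPage(org.htmlparser.lexer.Page)
void setPage(Page page)
Set the page this node came from.
Parameters:
page - The page that supplied this node.
See Also:
getPage()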
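void accept(NodeVisitor visitor)
Apply the visitor to this node.
Parameters:
visitor - The visitor to this node.
Node getParent()
Get the parent of this node. This will always return null when parsing with the Lexer.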
Currently, the object returned from this method can be safely cast to a CompositeTag, but this behaviour should not be expected in the future.
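Since the cast is not guaranteed to remain safe, one defensive option is to test the type before casting. The following is a minimal sketch, assuming hypothetical stand-in types (ParentedNode, LeafNode, CompositeNode) rather than the real org.htmlparser classes; it only illustrates the instanceof guard.

```java
// Hypothetical stand-ins; the real types live in org.htmlparser and are not used here.
interface ParentedNode {
    ParentedNode getParent();
}

class LeafNode implements ParentedNode {
    private final ParentedNode parent;
    LeafNode(ParentedNode parent) { this.parent = parent; }
    public ParentedNode getParent() { return parent; }
}

class CompositeNode implements ParentedNode {
    private final String tagName;
    CompositeNode(String tagName) { this.tagName = tagName; }
    public ParentedNode getParent() { return null; }
    String getTagName() { return tagName; }
}

class ParentCastSketch {
    // Returns the parent's tag name, or null when there is no parent or the
    // parent is not the composite type we hoped for.
    static String parentTagName(ParentedNode node) {
        ParentedNode parent = node.getParent();
        if (parent instanceof CompositeNode) { // also false when parent is null
            return ((CompositeNode) parent).getTagName();
        }
        return null;
    }

    public static void main(String[] args) {
        CompositeNode form = new CompositeNode("FORM");
        System.out.println(parentTagName(new LeafNode(form)));
        System.out.println(parentTagName(new LeafNode(null)));
    }
}
```

The guard handles both a missing parent and a parent of an unexpected type, so calling code would keep working if the returned type changed.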
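Returns:
The parent of this node if there is one, null otherwise.
See Also:
setParent(org.htmlparser.Node)
void setParent(Node node)
Sets the parent of this node.
Parameters:
node - The node that contains this node.
See Also:
getParent()
NodeList getChildren()
Get the children of this node.
Returns:
The list of children of this node if there is one, null otherwise.
See Also:
setChildren(org.htmlparser.util.NodeList)
void setChildren(NodeList children)
Set the children of this node.
Parameters:
children - The new list of children this node contains.
See Also:
getChildren()
Node getFirstChild()
Get the first child of this node.
Returns:
The first child of this node if there is one, null otherwise.
Node getLastChild()
Get the last child of this node.
Returns:
The last child of this node if there is one, null otherwise.
Node getPreviousSibling()
Get the previous sibling to this node.
Returns:
The previous sibling to this node if there is one, null otherwise.
Node getNextSibling()
Get the next sibling to this node.
Returns:
The next sibling to this node if there is one, null otherwise.
String getText()
Returns the text of the node.
See Also:
setText(java.lang.String)
void setText(String text)
Sets the string contents of the node.
Parameters:
text - The new text for the node.
See Also:
getText()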
void doSemanticAction() throws ParserException
getChildren()
.
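Throws:
ParserException - If a problem is encountered performing the semantic action.
Object clone() throws CloneNotSupportedException
Allow cloning of nodes. Creates and returns a copy of this object. The precise meaning of "copy" may depend on the class of the object. The general intent is that, for any object x, the expression:
    x.clone() != x
will be true, and that the expression:
    x.clone().getClass() == x.getClass()
will be true, but these are not absolute requirements. While it is typically the case that:
    x.clone().equals(x)
will be true, this is not an absolute requirement.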
By convention, the returned object should be obtained by calling super.clone. If a class and all of its superclasses (except Object) obey this convention, it will be the case that x.clone().getClass() == x.getClass().
By convention, the object returned by this method should be independent of this object (which is being cloned). To achieve this independence, it may be necessary to modify one or more fields of the object returned by super.clone before returning it. Typically, this means copying any mutable objects that comprise the internal "deep structure" of the object being cloned and replacing the references to these objects with references to the copies. If a class contains only primitive fields or references to immutable objects, then it is usually the case that no fields in the object returned by super.clone need to be modified.
The method clone for class Object performs a specific cloning operation. First, if the class of this object does not implement the interface Cloneable, then a CloneNotSupportedException is thrown. Note that all arrays are considered to implement the interface Cloneable. Otherwise, this method creates a new instance of the class of this object and initializes all its fields with exactly the contents of the corresponding fields of this object, as if by assignment; the contents of the fields are not themselves cloned. Thus, this method performs a "shallow copy" of this object, not a "deep copy" operation.
The class Object does not itself implement the interface Cloneable, so calling the clone method on an object whose class is Object will result in throwing an exception at run time.
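The difference between the shallow copy made by Object.clone and an independent copy can be seen in a small example. This is a minimal sketch using a hypothetical TaggedItem class that is not part of HTML Parser; the shallowClone helper is likewise an illustrative name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class for illustration; it is not part of HTML Parser.
class TaggedItem implements Cloneable {
    String name;
    List<String> attributes = new ArrayList<>();

    TaggedItem(String name) { this.name = name; }

    // Shallow copy: Object.clone() copies field values as if by assignment,
    // so both objects end up referring to the same attributes list.
    TaggedItem shallowClone() throws CloneNotSupportedException {
        return (TaggedItem) super.clone();
    }

    // Independent copy: start from super.clone(), then replace the mutable
    // field with a copy of its own.
    @Override
    public TaggedItem clone() throws CloneNotSupportedException {
        TaggedItem copy = (TaggedItem) super.clone();
        copy.attributes = new ArrayList<>(attributes);
        return copy;
    }
}

class CloneSketch {
    public static void main(String[] args) throws CloneNotSupportedException {
        TaggedItem original = new TaggedItem("A");
        original.attributes.add("href");

        TaggedItem shallow = original.shallowClone();
        TaggedItem deep = original.clone();

        // Mutate the original after cloning to expose shared state.
        original.attributes.add("target");

        System.out.println(shallow.attributes.size()); // shares the list with original
        System.out.println(deep.attributes.size());    // keeps its own list
        System.out.println(deep != original && deep.getClass() == original.getClass());
    }
}
```

Because TaggedItem implements Cloneable and obtains its copy from super.clone, the conventions described above hold here: the clone is a distinct object of the same class, and only the overriding clone() isolates the mutable list.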
Throws:
CloneNotSupportedException - if the object's class does not support the Cloneable interface. Subclasses that override the clone method can also throw this exception to indicate that an instance cannot be cloned.
See Also:
Cloneable
© 2006 Derrick Oswald Sep 17, 2006
HTML Parser is an open source library released under Common Public License.