Interface Summary | |
---|---|
Scanner | Generic interface for scanning. |
Class Summary | |
---|---|
CompositeTagScanner | The main scanning logic for nested tags. |
JspScanner | Placeholder for yet to be written scanner for JSP tags. |
ScriptDecoder | Decode script. |
ScriptScanner | The ScriptScanner handles script CDATA. |
StyleScanner | The StyleScanner handles style elements. |
TagScanner | TagScanner is an abstract superclass, subclassed to create specific scanners. |

The scanners package contains classes responsible for the tertiary identification of tags. The lower-level classes in the lexer package convert byte streams to characters and characters to nodes (via the NodeFactory). In the case of tags, the scanners in this package can then complete the tag or override the current tag and return an augmented tag. The existing implementation of the composite tag scanner, for example, gathers the children of composite tags, identifying the nested structure of HTML documents. The script scanner overrides the nodes returned by the lexer and creates a tag containing a single string that is the script code.
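
As a rough illustration of how a composite scanner can gather children from a flat node stream, here is a minimal, self-contained sketch. It is a toy model and not the library's implementation: `CompositeScanSketch`, `Node`, and `scan` are hypothetical names, and the sketch assumes every open tag has a matching close tag.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Toy model only: these types are hypothetical stand-ins, not the HTML Parser API.
final class CompositeScanSketch {

    /** A flat node as a lexer might emit it: an open tag, a close tag, or text. */
    static final class Node {
        final String kind;   // "open", "close", or "text"
        final String value;  // tag name, or the text content
        final List<Node> children = new ArrayList<>();

        Node(String kind, String value) {
            this.kind = kind;
            this.value = value;
        }
    }

    /**
     * Completes the given open tag by pulling nodes from the flat stream
     * into its child list until the matching close tag is seen.
     */
    static Node scan(Node open, Iterator<Node> lexer) {
        while (lexer.hasNext()) {
            Node next = lexer.next();
            if (next.kind.equals("close") && next.value.equals(open.value)) {
                return open; // matching end tag: the composite tag is complete
            }
            if (next.kind.equals("open")) {
                next = scan(next, lexer); // recurse to build the nested tag first
            }
            open.children.add(next);
        }
        return open; // input ended before the end tag; return what was gathered
    }
}
```

Calling `scan` on an open `ul` node followed by the stream `li`, text, `/li`, `/ul` would return the `ul` node with one `li` child that itself holds the text node. A real scanner also has to cope with unclosed and mismatched tags, which this sketch ignores.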

You might need to create a scanner (one that implements the Scanner interface) if the text you are trying to parse doesn't look like HTML, as is the case for the script scanner, or if the normal processing of tags by nesting their structure is inadequate.
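
The first case, content that doesn't look like HTML, might be sketched as follows. This is an assumption-laden toy and not the library's `Scanner` contract: the `Scanner` interface shown here, its `scan` signature, and `RawTextScanner` are all hypothetical simplifications that work on a plain string rather than on lexer nodes.

```java
// Toy model only: a simplified, hypothetical scanner contract.
final class RawTextScanSketch {

    /** Hypothetical interface; the real Scanner interface works on tags and a lexer. */
    interface Scanner {
        /** Returns the content of the tag that opened just before position. */
        String scan(String tagName, String source, int position);
    }

    /**
     * Treats everything up to the matching end tag as one opaque string,
     * so markup-like text inside a script is not parsed as tags.
     */
    static final class RawTextScanner implements Scanner {
        @Override
        public String scan(String tagName, String source, int position) {
            String endTag = "</" + tagName;
            // Case-insensitive search for the end tag, starting at position.
            for (int i = position; i + endTag.length() <= source.length(); i++) {
                if (source.regionMatches(true, i, endTag, 0, endTag.length())) {
                    return source.substring(position, i);
                }
            }
            return source.substring(position); // no end tag: take the remainder
        }
    }
}
```

For example, given a `script` element whose body contains `<b>` inside a string literal, a scanner along these lines would return the whole body as a single string instead of producing a nested `b` tag.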
© 2006 Derrick Oswald Sep 17, 2006
HTML Parser is an open source library released under the Common Public License.