org.htmlparser.scanners (HTML Parser 2.0)

Package org.htmlparser.scanners

The scanners package contains classes responsible for the third stage of tag identification.

Interface Summary
Scanner	Generic interface for scanning.

Class Summary
CompositeTagScanner	The main scanning logic for nested tags.
JspScanner	Placeholder for a yet-to-be-written scanner for JSP tags.
ScriptDecoder	Decode script.
ScriptScanner	The ScriptScanner handles script CDATA.
StyleScanner	The StyleScanner handles style elements.
TagScanner	TagScanner is an abstract superclass, subclassed to create specific scanners.

Package org.htmlparser.scanners Description

The scanners package contains classes responsible for the third stage of tag identification. The lower-level classes in the lexer package convert byte streams to characters, and characters to nodes (via the NodeFactory). For tags, the scanners in this package then either complete the tag or override it and return an augmented tag. The existing composite tag scanner, for example, gathers the children of composite tags, which establishes the nested structure of the HTML document. The script scanner overrides the nodes returned by the lexer and creates a tag containing a single string: the script code.
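The child-gathering step described above can be illustrated with a minimal, self-contained sketch. It does not use the library: `CompositeScanSketch`, `Node`, and this `scan` method are hypothetical stand-ins, and a plain `Iterator` plays the role of the lexer. The sketch assumes well-formed input and simply ignores stray end tags, which a real scanner would have to handle more carefully.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in types; this is NOT the org.htmlparser API.
class CompositeScanSketch {

    /** A flat node as a lexer might emit it: text, a start tag, or an end tag. */
    static final class Node {
        final String name;      // tag name, or null for a text node
        final String text;      // text content, or null for a tag
        final boolean endTag;   // true for an end tag such as </div>
        final List<Node> children = new ArrayList<>();

        private Node(String name, String text, boolean endTag) {
            this.name = name;
            this.text = text;
            this.endTag = endTag;
        }

        static Node start(String name) { return new Node(name, null, false); }
        static Node end(String name)   { return new Node(name, null, true); }
        static Node text(String text)  { return new Node(null, text, false); }
    }

    /**
     * Pulls nodes from the lexer and attaches them to the given tag as
     * children until the matching end tag is seen. Nested start tags are
     * scanned recursively, which builds the tree structure.
     */
    static Node scan(Node tag, Iterator<Node> lexer) {
        while (lexer.hasNext()) {
            Node next = lexer.next();
            if (next.endTag) {
                if (next.name.equals(tag.name)) {
                    break;      // matching end tag: this composite tag is complete
                }
                continue;       // stray end tag: ignored in this sketch
            }
            if (next.name != null) {
                next = scan(next, lexer);   // nested start tag: gather its children first
            }
            tag.children.add(next);
        }
        return tag;
    }

    public static void main(String[] args) {
        // Flat node stream for: <div><b>hi</b> there</div> (the <div> start tag is passed in).
        Iterator<Node> lexer = Arrays.asList(
                Node.start("b"), Node.text("hi"), Node.end("b"),
                Node.text(" there"), Node.end("div")).iterator();
        Node div = scan(Node.start("div"), lexer);
        System.out.println(div.children.size() + " children under <" + div.name + ">");
    }
}
```

The point of the sketch is the division of labor: the lexer only produces a flat sequence of nodes, and the scanner turns that sequence into a tree by deciding which nodes belong to which tag.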
You might need to create a scanner (a class that implements the Scanner interface) if the text you are trying to parse doesn't look like HTML, as is the case for the script scanner, or if the normal processing of tags by nesting is inadequate.
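As a rough illustration of the first case, the sketch below shows a scanner for content that should not be parsed as HTML. It is self-contained and does not use the library: `RawTextScannerSketch`, `Source`, `Tag`, the nested `Scanner` interface, and its `scan(Tag, Source)` signature are all hypothetical stand-ins, so consult the real Scanner interface for the actual method to implement. The sketch assumes the raw content ends at the first case-insensitive occurrence of the closing tag.

```java
// Hypothetical stand-in types; this is NOT the org.htmlparser API.
class RawTextScannerSketch {

    /** Minimal cursor over the page text, standing in for a lexer. */
    static final class Source {
        final String page;
        int position;   // index of the next unread character

        Source(String page, int position) {
            this.page = page;
            this.position = position;
        }
    }

    /** A tag whose content is kept as one uninterpreted string. */
    static final class Tag {
        final String name;
        String body = "";

        Tag(String name) {
            this.name = name;
        }
    }

    /** Stand-in for a scanning interface: complete or augment the given tag. */
    interface Scanner {
        Tag scan(Tag tag, Source source);
    }

    /** Reads everything up to the closing tag as raw text instead of nodes. */
    static final class RawTextScanner implements Scanner {
        @Override
        public Tag scan(Tag tag, Source source) {
            String close = "</" + tag.name;
            int end = source.page.length();     // default: unterminated, take the rest
            for (int i = source.position; i + close.length() <= source.page.length(); i++) {
                // Case-insensitive match so that </SCRIPT> also ends the content.
                if (source.page.regionMatches(true, i, close, 0, close.length())) {
                    end = i;
                    break;
                }
            }
            tag.body = source.page.substring(source.position, end);
            // Move the cursor past the '>' that finishes the closing tag.
            int gt = source.page.indexOf('>', end);
            source.position = (gt < 0) ? source.page.length() : gt + 1;
            return tag;
        }
    }

    public static void main(String[] args) {
        String page = "<script>if (a < b) x = \"<b>\";</script><p>";
        Source source = new Source(page, "<script>".length());
        Tag script = new RawTextScanner().scan(new Tag("script"), source);
        System.out.println(script.body);
    }
}
```

Characters such as `<` inside the script are never offered to the tag-recognition logic here, which is the reason for bypassing normal node processing for this kind of content.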