org.htmlparser.scanners
Class ScriptScanner

java.lang.Object
  extended by org.htmlparser.scanners.TagScanner
      extended by org.htmlparser.scanners.CompositeTagScanner
          extended by org.htmlparser.scanners.ScriptScanner
All Implemented Interfaces:
Serializable, Scanner

public class ScriptScanner
extends CompositeTagScanner

The ScriptScanner handles script CDATA.

See Also:
Serialized Form

Field Summary
static boolean STRICT
          Strict parsing of CDATA flag.
 
Constructor Summary
ScriptScanner()
          Create a script scanner.
 
Method Summary
 Tag scan(Tag tag, Lexer lexer, NodeList stack)
          Scan for script.
 
Methods inherited from class org.htmlparser.scanners.CompositeTagScanner
addChild, createVirtualEndTag, finishTag, isTagToBeEndedFor
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

STRICT

public static boolean STRICT
Strict parsing of CDATA flag. If this flag is set to true, script is parsed without regard to quotes. This means that erroneous script such as:
 document.write("</script>");
 
will be parsed in strict accordance with appendix B.3.2, "Specifying non-HTML data", of the HTML 4.01 Specification, and hence will be split into two or more nodes. Correct JavaScript would escape the ETAGO:
 document.write("<\/script>");
 
If true, CDATA parsing will stop at the first ETAGO ("</") regardless of whether it is quoted. If false, balanced quotes (either single or double) will shield an ETAGO. Because quotes can also appear within single-line or multiline comments, comments are parsed as well. In most cases, users prefer non-strict handling since there is so much broken script out in the wild.
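
The difference between the two modes can be illustrated in isolation. The following is a minimal sketch, not the library's implementation: the class EtagoSketch and the method findEtago are hypothetical names introduced here, an ETAGO is assumed to be "</" followed by an ASCII letter, and comment handling is omitted for brevity (so an apostrophe inside a comment would confuse this sketch in non-strict mode).

```java
// Hypothetical illustration only; this is NOT ScriptScanner's actual code.
final class EtagoSketch {

    /**
     * Returns the index of the ETAGO that would end the script CDATA,
     * or -1 if there is none.
     *
     * @param script the text following the opening script tag
     * @param strict if true, quotes are ignored; if false, a quoted ETAGO is skipped
     */
    static int findEtago(String script, boolean strict) {
        char quote = 0; // 0 means "not inside a quoted string"
        for (int i = 0; i < script.length(); i++) {
            char c = script.charAt(i);
            if (!strict) {
                if (quote != 0) {
                    if (c == '\\') {
                        i++; // skip the escaped character
                    } else if (c == quote) {
                        quote = 0; // closing quote found
                    }
                    continue; // everything inside quotes is shielded
                }
                if (c == '"' || c == '\'') {
                    quote = c; // opening quote
                    continue;
                }
            }
            if (c == '<'
                    && i + 2 < script.length()
                    && script.charAt(i + 1) == '/'
                    && isAsciiLetter(script.charAt(i + 2))) {
                return i;
            }
        }
        return -1;
    }

    private static boolean isAsciiLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    public static void main(String[] args) {
        String broken = "document.write(\"</script>\");</script>";
        System.out.println("strict:     " + findEtago(broken, true));
        System.out.println("non-strict: " + findEtago(broken, false));
    }
}
```

For the erroneous script above, the strict call returns the index of the quoted "</script>", while the non-strict call returns the index of the real end tag. With the escaped form ("<\/script>") both modes agree, since the quoted text no longer contains an ETAGO.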

Constructor Detail

ScriptScanner

public ScriptScanner()
Create a script scanner.

Method Detail

scan

public Tag scan(Tag tag,
                Lexer lexer,
                NodeList stack)
         throws ParserException
Scan for script. Accumulates text from the page until </[a-zA-Z] is encountered.

Specified by:
scan in interface Scanner
Overrides:
scan in class CompositeTagScanner
Parameters:
tag - The tag this scanner is responsible for.
lexer - The source of CDATA.
stack - The parse stack, not used.
Returns:
The resultant tag (may be unchanged).
Throws:
ParserException - if an unrecoverable problem occurs.
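
The termination pattern described for scan, </[a-zA-Z], requires a letter after the "</", so sequences such as "</>" or "</ " do not end the script. The following is a small standalone sketch of that rule using java.util.regex; the class ScriptTextSketch and the method accumulate are hypothetical names, and the sketch ignores the quote handling controlled by STRICT.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; this is NOT how ScriptScanner.scan is implemented.
final class ScriptTextSketch {

    // "</" followed by an ASCII letter, as stated in the scan description.
    private static final Pattern END = Pattern.compile("</[a-zA-Z]");

    /** Returns the text up to (not including) the first "</" + letter, or all of it if none. */
    static String accumulate(String page) {
        Matcher m = END.matcher(page);
        return m.find() ? page.substring(0, m.start()) : page;
    }

    public static void main(String[] args) {
        // "</>" is not followed by a letter, so it does not terminate the script text.
        System.out.println(accumulate("x = '</>'; y = 1;</script>rest"));
    }
}
```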

© 2006 Derrick Oswald
Sep 17, 2006

HTML Parser is an open source library released under the Common Public License.