
org.htmlparser.util
Class ParserUtils

java.lang.Object
  extended by org.htmlparser.util.ParserUtils

public class ParserUtils
extends Object


Constructor Summary
ParserUtils()
           
 
Method Summary
static Parser createParserParsingAnInputString(String input)
          Create a Parser object that takes a String as input (instead of a URL or a string representing the URL location).
static Node[] findTypeInNode(Node node, Class type)
          Search the given node and collect any objects of the given type.
static String removeChars(String s, char occur)
           
static String removeEscapeCharacters(String inputString)
           
static String removeTrailingBlanks(String text)
           
static String[] splitButChars(String input, String charsDoNotBeRemoved)
          Split the input string, treating every character as a separator except for the characters specified in the charsDoNotBeRemoved parameter.
static String[] splitButDigits(String input, String charsDoNotBeRemoved)
          Split the input string, treating every non-numeric character as a separator except for the characters specified in the charsDoNotBeRemoved parameter.
static String[] splitChars(String input, String charsToBeRemoved)
          Split the input string, treating the chars specified in the charsToBeRemoved parameter as separators.
static String[] splitSpaces(String input, String charsToBeRemoved)
          Split the input string, treating all whitespace chars (spaces, tabs and the like) and the chars specified in the charsToBeRemoved parameter as separators.
static String[] splitTags(String input, Class nodeType)
          Split the input string into a string array, using the tags as delimiters.
static String[] splitTags(String input, Class nodeType, boolean recursive, boolean insideTag)
          Split the input string into a string array, using the tags as delimiters.
static String[] splitTags(String input, NodeFilter filter)
          Split the input string into a string array, using the tags as delimiters.
static String[] splitTags(String input, NodeFilter filter, boolean recursive, boolean insideTag)
          Split the input string into a string array, using the tags as delimiters.
static String[] splitTags(String input, String[] tags)
          Split the input string into a string array, using the tags as delimiters.
static String[] splitTags(String input, String[] tags, boolean recursive, boolean insideTag)
          Split the input string into a string array, using the tags as delimiters.
static String trimAllTags(String input, boolean inside)
          Trim the input string, removing every tag it contains.
static String trimButChars(String input, String charsDoNotBeRemoved)
          Remove all characters from the input string except those specified in the charsDoNotBeRemoved parameter.
static String trimButCharsBeginEnd(String input, String charsDoNotBeRemoved)
          Remove all characters from the beginning and the end of the input string except those specified in the charsDoNotBeRemoved parameter.
static String trimButDigits(String input, String charsDoNotBeRemoved)
          Remove all non-numeric characters from the input string except those specified in the charsDoNotBeRemoved parameter.
static String trimButDigitsBeginEnd(String input, String charsDoNotBeRemoved)
          Remove all non-numeric characters from the beginning and the end of the input string except those specified in the charsDoNotBeRemoved parameter.
static String trimChars(String input, String charsToBeRemoved)
          Remove from the input string all the chars specified in the charsToBeRemoved parameter.
static String trimCharsBeginEnd(String input, String charsToBeRemoved)
          Remove from the beginning and the end of the input string all the chars specified in the charsToBeRemoved parameter.
static String trimSpaces(String input, String charsToBeRemoved)
          Remove all whitespace chars (spaces, tabs and the like) from the input string.
static String trimSpacesBeginEnd(String input, String charsToBeRemoved)
          Remove all whitespace chars (spaces, tabs and the like) from the beginning and the end of the input string.
static String trimTags(String input, Class nodeType)
          Remove all matching tags from the input string and return the result without the tags and their content.
static String trimTags(String input, Class nodeType, boolean recursive, boolean insideTag)
          Remove all matching tags from the input string and return the result without the tags and, optionally, their content.
static String trimTags(String input, NodeFilter filter)
          Remove all matching tags from the input string and return the result without the tags and their content.
static String trimTags(String input, NodeFilter filter, boolean recursive, boolean insideTag)
          Remove all matching tags from the input string and return the result without the tags and, optionally, their content.
static String trimTags(String input, String[] tags)
          Remove all matching tags from the input string and return the result without the tags and their content.
static String trimTags(String input, String[] tags, boolean recursive, boolean insideTag)
          Remove all matching tags from the input string and return the result without the tags and, optionally, their content.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ParserUtils

public ParserUtils()
Method Detail

removeChars

public static String removeChars(String s,
                                 char occur)

removeEscapeCharacters

public static String removeEscapeCharacters(String inputString)

removeTrailingBlanks

public static String removeTrailingBlanks(String text)

findTypeInNode

public static Node[] findTypeInNode(Node node,
                                    Class type)
Search the given node and collect any objects of the given type.

Parameters:
node - The node to search.
type - The class to search for.
Returns:
A node array with the matching nodes.

splitButDigits

public static String[] splitButDigits(String input,
                                      String charsDoNotBeRemoved)
Split the input string, treating every non-numeric character as a separator except for the characters specified in the charsDoNotBeRemoved parameter.
For example, if you call splitButDigits("<DIV> +12.5, +3.4 </DIV>", "+."),
you obtain the string array {"+12.5", "+3.4"} as output (1, 2, 3, 4 and 5 are digits, and + and . are chars that must not be removed).

Parameters:
input - The input string.
charsDoNotBeRemoved - The chars that must not be removed.
Returns:
The array of strings as output.
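
The behaviour described above can be sketched with the JDK alone. This is a hypothetical re-implementation for illustration, not the library source; the class name SplitButDigitsSketch is an assumption, and empty pieces are dropped so that the documented example is reproduced.

```java
import java.util.ArrayList;
import java.util.List;

final class SplitButDigitsSketch {
    // Hypothetical stand-in for ParserUtils.splitButDigits; not the library source.
    static String[] splitButDigits(String input, String charsDoNotBeRemoved) {
        List<String> pieces = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : input.toCharArray()) {
            if (Character.isDigit(c) || charsDoNotBeRemoved.indexOf(c) >= 0) {
                current.append(c);               // digit or protected char: part of a piece
            } else if (current.length() > 0) {
                pieces.add(current.toString());  // any other char ends the current piece
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            pieces.add(current.toString());
        }
        return pieces.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] out = splitButDigits("<DIV> +12.5, +3.4 </DIV>", "+.");
        System.out.println(String.join("|", out)); // prints +12.5|+3.4
    }
}
```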

trimButDigits

public static String trimButDigits(String input,
                                   String charsDoNotBeRemoved)
Remove all non-numeric characters from the input string, except for the characters specified in the charsDoNotBeRemoved parameter.
For example, if you call trimButDigits("<DIV> +12.5 </DIV>", "+."),
you obtain the string "+12.5" as output (1, 2 and 5 are digits, and + and . are chars that must not be removed).
For example, if you call trimButDigits("<DIV> +1 2 . 5 </DIV>", "+."),
you obtain the string "+12.5" as output (the spaces between 1 and 2, 2 and ., and . and 5 are removed).

Parameters:
input - The input string.
charsDoNotBeRemoved - The chars that must not be removed.
Returns:
The string as output.
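
As a rough illustration of the filtering rule, the following JDK-only sketch keeps digits and protected chars and drops everything else. The class name TrimButDigitsSketch is an assumption and the code is not taken from the library.

```java
final class TrimButDigitsSketch {
    // Hypothetical stand-in for ParserUtils.trimButDigits; not the library source.
    static String trimButDigits(String input, String charsDoNotBeRemoved) {
        return input.chars()
                // Keep digits and every char listed in charsDoNotBeRemoved.
                .filter(c -> Character.isDigit(c) || charsDoNotBeRemoved.indexOf(c) >= 0)
                .collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append)
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(trimButDigits("<DIV> +12.5 </DIV>", "+."));    // prints +12.5
        System.out.println(trimButDigits("<DIV> +1 2 . 5 </DIV>", "+.")); // prints +12.5
    }
}
```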

trimButDigitsBeginEnd

public static String trimButDigitsBeginEnd(String input,
                                           String charsDoNotBeRemoved)
Remove all non-numeric characters from the beginning and the end of the input string, except for the characters specified in the charsDoNotBeRemoved parameter.
Only chars at the beginning and at the end of the string are removed.
For example, if you call trimButDigitsBeginEnd("<DIV> +12.5 </DIV>", "+."),
you obtain the string "+12.5" as output (1, 2 and 5 are digits, and + and . are chars that must not be removed).
For example, if you call trimButDigitsBeginEnd("<DIV> +1 2 . 5 </DIV>", "+."),
you obtain the string "+1 2 . 5" as output (the spaces inside the string are not removed).

Parameters:
input - The input string.
charsDoNotBeRemoved - The chars that must not be removed.
Returns:
The string as output.
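
The difference from trimButDigits is that only the two ends are scanned. A minimal JDK-only sketch of that idea follows; the class and helper names are assumptions, not library code. It moves two indices inward until each reaches a digit or a protected char.

```java
final class TrimButDigitsBeginEndSketch {
    // Hypothetical stand-in for ParserUtils.trimButDigitsBeginEnd; not the library source.
    static String trimButDigitsBeginEnd(String input, String charsDoNotBeRemoved) {
        int start = 0;
        int end = input.length();
        // Skip removable chars from the left, then from the right.
        while (start < end && !kept(input.charAt(start), charsDoNotBeRemoved)) start++;
        while (end > start && !kept(input.charAt(end - 1), charsDoNotBeRemoved)) end--;
        return input.substring(start, end); // the interior is left untouched
    }

    private static boolean kept(char c, String charsDoNotBeRemoved) {
        return Character.isDigit(c) || charsDoNotBeRemoved.indexOf(c) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(trimButDigitsBeginEnd("<DIV> +1 2 . 5 </DIV>", "+.")); // prints +1 2 . 5
    }
}
```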

splitSpaces

public static String[] splitSpaces(String input,
                                   String charsToBeRemoved)
Split the input string, treating all whitespace chars (spaces, tabs and the like) and the chars specified in the charsToBeRemoved parameter as separators.
For example, if you call splitSpaces("<DIV> +12.5, +3.4 </DIV>", "<>DIV/,"),
you obtain the string array {"+12.5", "+3.4"} as output (space chars, <, >, D, I, V, / and the comma are chars that must be removed).

Parameters:
input - The input string.
charsToBeRemoved - The chars to be removed.
Returns:
The array of strings as output.
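
One way to approximate this behaviour with the JDK is java.util.StringTokenizer, which splits on any char of a delimiter set and skips empty tokens. The sketch below is an assumption-laden stand-in, not the library source: the class name is hypothetical, and "spaces and tabs like chars" is taken to mean space, tab, newline, carriage return and form feed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

final class SplitSpacesSketch {
    // Assumed whitespace set; the library may treat more chars as whitespace.
    private static final String WHITESPACE = " \t\n\r\f";

    // Hypothetical stand-in for ParserUtils.splitSpaces; not the library source.
    static String[] splitSpaces(String input, String charsToBeRemoved) {
        StringTokenizer tokens = new StringTokenizer(input, WHITESPACE + charsToBeRemoved);
        List<String> pieces = new ArrayList<>();
        while (tokens.hasMoreTokens()) {
            pieces.add(tokens.nextToken());
        }
        return pieces.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] out = splitSpaces("<DIV> +12.5, +3.4 </DIV>", "<>DIV/,");
        System.out.println(String.join("|", out)); // prints +12.5|+3.4
    }
}
```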

trimSpaces

public static String trimSpaces(String input,
                                String charsToBeRemoved)
Remove all whitespace chars (spaces, tabs and the like) from the input string, along with the chars specified in the charsToBeRemoved parameter.
For example, if you call trimSpaces("<DIV> +12.5 </DIV>", "<>DIV/"),
you obtain the string "+12.5" as output (space chars, <, >, D, I, V and / are chars that must be removed).
For example, if you call trimSpaces("<DIV> Trim All Spaces Also The Ones Inside The String </DIV>", "<>DIV/"),
you obtain the string "TrimAllSpacesAlsoTheOnesInsideTheString" as output (all the spaces inside the string are removed).

Parameters:
input - The input string.
charsToBeRemoved - The chars to be removed.
Returns:
The string as output.
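
A literal reading of this description can be sketched as follows (hypothetical class name, JDK only, not the library source). Note that under this literal reading the capital I of "Inside" in the second example also belongs to the removal set "<>DIV/", so the sketch yields "...Onesnside..." rather than the string shown above; whether the library behaves the same way is not confirmed here.

```java
final class TrimSpacesSketch {
    // Hypothetical stand-in for ParserUtils.trimSpaces; not the library source.
    static String trimSpaces(String input, String charsToBeRemoved) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            // Drop whitespace and every char listed in charsToBeRemoved, wherever it occurs.
            boolean removable = Character.isWhitespace(c) || charsToBeRemoved.indexOf(c) >= 0;
            if (!removable) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(trimSpaces("<DIV> +12.5 </DIV>", "<>DIV/")); // prints +12.5
        // The 'I' of "Inside" is in the removal set, so it disappears as well.
        System.out.println(trimSpaces(
                "<DIV> Trim All Spaces Also The Ones Inside The String </DIV>", "<>DIV/"));
        // prints TrimAllSpacesAlsoTheOnesnsideTheString
    }
}
```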

trimSpacesBeginEnd

public static String trimSpacesBeginEnd(String input,
                                        String charsToBeRemoved)
Remove all whitespace chars (spaces, tabs and the like) from the beginning and the end of the input string, along with the chars specified in the charsToBeRemoved parameter.
Only chars at the beginning and at the end of the string are removed.
For example, if you call trimSpacesBeginEnd("<DIV> +12.5 </DIV>", "<>DIV/"),
you obtain the string "+12.5" as output (space chars, <, >, D, I, V and / are chars that must be removed).
For example, if you call trimSpacesBeginEnd("<DIV> Trim all spaces but not the ones inside the string </DIV>", "<>DIV/"),
you obtain the string "Trim all spaces but not the ones inside the string" as output (all the spaces inside the string are preserved).

Parameters:
input - The input string.
charsToBeRemoved - The chars to be removed.
Returns:
The string as output.
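
Unlike String.trim(), this method also strips the listed chars from both ends. A minimal JDK-only sketch, with hypothetical names and not taken from the library, could look like this:

```java
final class TrimSpacesBeginEndSketch {
    // Hypothetical stand-in for ParserUtils.trimSpacesBeginEnd; not the library source.
    static String trimSpacesBeginEnd(String input, String charsToBeRemoved) {
        int start = 0;
        int end = input.length();
        // Advance past removable chars on the left, then retreat past them on the right.
        while (start < end && removable(input.charAt(start), charsToBeRemoved)) start++;
        while (end > start && removable(input.charAt(end - 1), charsToBeRemoved)) end--;
        return input.substring(start, end);
    }

    private static boolean removable(char c, String charsToBeRemoved) {
        return Character.isWhitespace(c) || charsToBeRemoved.indexOf(c) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(trimSpacesBeginEnd(
                "<DIV> Trim all spaces but not the ones inside the string </DIV>", "<>DIV/"));
        // prints Trim all spaces but not the ones inside the string
    }
}
```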

splitButChars

public static String[] splitButChars(String input,
                                     String charsDoNotBeRemoved)
Split the input string, treating every character as a separator except for the characters specified in the charsDoNotBeRemoved parameter.
For example, if you call splitButChars("<DIV> +12.5, +3.4 </DIV>", "+.1234567890"),
you obtain the string array {"+12.5", "+3.4"} as output (+, ., 1, 2, 3, 4, 5, 6, 7, 8, 9 and 0 are chars that must not be removed).

Parameters:
input - The input string.
charsDoNotBeRemoved - The chars that must not be removed.
Returns:
The array of strings as output.

trimButChars

public static String trimButChars(String input,
                                  String charsDoNotBeRemoved)
Remove all characters from the input string, except for the characters specified in the charsDoNotBeRemoved parameter.
For example, if you call trimButChars("<DIV> +12.5 </DIV>", "+.1234567890"),
you obtain the string "+12.5" as output (+, ., 1, 2, 3, 4, 5, 6, 7, 8, 9 and 0 are chars that must not be removed).
For example, if you call trimButChars("<DIV> +1 2 . 5 </DIV>", "+.1234567890"),
you obtain the string "+12.5" as output (the spaces between 1 and 2, 2 and ., and . and 5 are removed).

Parameters:
input - The input string.
charsDoNotBeRemoved - The chars that must not be removed.
Returns:
The string as output.

trimButCharsBeginEnd

public static String trimButCharsBeginEnd(String input,
                                          String charsDoNotBeRemoved)
Remove all characters from the beginning and the end of the input string, except for the characters specified in the charsDoNotBeRemoved parameter.
Only chars at the beginning and at the end of the string are removed.
For example, if you call trimButCharsBeginEnd("<DIV> +12.5 </DIV>", "+.1234567890"),
you obtain the string "+12.5" as output (+, ., 1, 2, 3, 4, 5, 6, 7, 8, 9 and 0 are chars that must not be removed).
For example, if you call trimButCharsBeginEnd("<DIV> +1 2 . 5 </DIV>", "+.1234567890"),
you obtain the string "+1 2 . 5" as output (the spaces inside the string are not removed).

Parameters:
input - The input string.
charsDoNotBeRemoved - The chars that must not be removed.
Returns:
The string as output.
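
To contrast the whole-string and begin/end variants of the whitelist-based methods, here is a small JDK-only sketch. The class ButCharsSketch and its method bodies are assumptions written for illustration, not the library source.

```java
final class ButCharsSketch {
    // Whole-string variant: every char outside the whitelist is dropped.
    static String trimButChars(String input, String charsDoNotBeRemoved) {
        StringBuilder out = new StringBuilder();
        for (char c : input.toCharArray()) {
            if (charsDoNotBeRemoved.indexOf(c) >= 0) {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Begin/end variant: only leading and trailing non-whitelisted chars are dropped.
    static String trimButCharsBeginEnd(String input, String charsDoNotBeRemoved) {
        int start = 0;
        int end = input.length();
        while (start < end && charsDoNotBeRemoved.indexOf(input.charAt(start)) < 0) start++;
        while (end > start && charsDoNotBeRemoved.indexOf(input.charAt(end - 1)) < 0) end--;
        return input.substring(start, end);
    }

    public static void main(String[] args) {
        String input = "<DIV> +1 2 . 5 </DIV>";
        System.out.println(trimButChars(input, "+.1234567890"));         // prints +12.5
        System.out.println(trimButCharsBeginEnd(input, "+.1234567890")); // prints +1 2 . 5
    }
}
```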

splitChars

public static String[] splitChars(String input,
                                  String charsToBeRemoved)
Split the input string, treating the chars specified in the charsToBeRemoved parameter as separators.
For example, if you call splitChars("<DIV> +12.5, +3.4 </DIV>", " <>DIV/,"),
you obtain the string array {"+12.5", "+3.4"} as output (space chars, <, >, D, I, V, / and the comma are chars that must be removed).

Parameters:
input - The input string.
charsToBeRemoved - The chars to be removed.
Returns:
The array of strings as output.
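
Unlike splitSpaces, this method only splits on the chars that are listed, so the space has to be part of charsToBeRemoved if it should act as a separator. The JDK-only sketch below (hypothetical class name, not the library source, empty pieces assumed to be dropped) shows the effect of leaving the space out.

```java
import java.util.StringTokenizer;

final class SplitCharsSketch {
    // Hypothetical stand-in for ParserUtils.splitChars; not the library source.
    static String[] splitChars(String input, String charsToBeRemoved) {
        StringTokenizer tokens = new StringTokenizer(input, charsToBeRemoved);
        String[] pieces = new String[tokens.countTokens()];
        for (int i = 0; i < pieces.length; i++) {
            pieces[i] = tokens.nextToken();
        }
        return pieces;
    }

    public static void main(String[] args) {
        String input = "<DIV> +12.5, +3.4 </DIV>";
        // Space listed as a separator.
        System.out.println(String.join("|", splitChars(input, " <>DIV/,"))); // prints +12.5|+3.4
        // Space not listed: the pieces keep their surrounding spaces.
        System.out.println(String.join("|", splitChars(input, "<>DIV/,")));  // prints  +12.5| +3.4 
    }
}
```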

trimChars

public static String trimChars(String input,
                               String charsToBeRemoved)
Remove from the input string all the chars specified in the charsToBeRemoved parameter.
For example, if you call trimChars("<DIV> +12.5 </DIV>", "<>DIV/ "),
you obtain the string "+12.5" as output (<, >, D, I, V, / and the space char are chars that must be removed).
For example, if you call trimChars("<DIV> Trim All Chars Also The Ones Inside The String </DIV>", "<>DIV/ "),
you obtain the string "TrimAllCharsAlsoTheOnesInsideTheString" as output (all the spaces inside the string are removed).

Parameters:
input - The input string.
charsToBeRemoved - The chars to be removed.
Returns:
The string as output.

trimCharsBeginEnd

public static String trimCharsBeginEnd(String input,
                                       String charsToBeRemoved)
Remove from the beginning and the end of the input string all the chars specified in the charsToBeRemoved parameter.
Only chars at the beginning and at the end of the string are removed.
For example, if you call trimCharsBeginEnd("<DIV> +12.5 </DIV>", "<>DIV/ "),
you obtain the string "+12.5" as output (the space char, <, >, D, I, V and / are chars that must be removed).
For example, if you call trimCharsBeginEnd("<DIV> Trim all spaces but not the ones inside the string </DIV>", "<>DIV/ "),
you obtain the string "Trim all spaces but not the ones inside the string" as output (all the spaces inside the string are preserved).

Parameters:
input - The input string.
charsToBeRemoved - The chars to be removed.
Returns:
The string as output.

splitTags

public static String[] splitTags(String input,
                                 String[] tags)
                          throws ParserException,
                                 UnsupportedEncodingException
Split the input string into a string array, using the tags as delimiters.

Throws:
ParserException
UnsupportedEncodingException
See Also:
splitTags(String input, String[] tags, boolean recursive, boolean insideTag)

splitTags

public static String[] splitTags(String input,
                                 String[] tags,
                                 boolean recursive,
                                 boolean insideTag)
                          throws ParserException,
                                 UnsupportedEncodingException
Split the input string into a string array, using the tags as delimiters.
For example, if you call splitTags("Begin <DIV><DIV> +12.5 </DIV></DIV> ALL OK", new String[] {"DIV"}),
you obtain the string array {"Begin ", " ALL OK"} as output (the <DIV> tags and their content are split out recursively).
For example, if you call splitTags("Begin <DIV><DIV> +12.5 </DIV></DIV> ALL OK", new String[] {"DIV"}, false, false),
you obtain the string array {"Begin ", "<DIV> +12.5 </DIV>", " ALL OK"} as output (the <DIV> tags are split out, but not their content, and not recursively).
For example, if you call splitTags("Begin <DIV><DIV> +12.5 </DIV></DIV> ALL OK", new String[] {"DIV"}, true, false),
you obtain the string array {"Begin ", " +12.5 ", " ALL OK"} as output (the <DIV> tags are split out recursively, but not their content).
For example, if you call splitTags("Begin <DIV><DIV> +12.5 </DIV></DIV> ALL OK", new String[] {"DIV"}, false, true),
you obtain the string array {"Begin ", " ALL OK"} as output (the <DIV> tags and their content are split out).

Parameters:
input - The input string.
tags - The tags to be used as splitting delimiters.
recursive - Optional parameter (defaults to true when omitted); if true, the tags are removed recursively.
insideTag - Optional parameter (defaults to true when omitted); if true, the content of the tags is removed as well.
Returns:
The string array containing the strings delimited by tags.
Throws:
ParserException
UnsupportedEncodingException
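
The interplay of recursive and insideTag in the four examples above can be modelled with a depth-counting scan. The sketch below is a hypothetical JDK-only model (regex-based, class name SplitTagsSketch assumed) that reproduces those four documented results; it does not use the library's parser and is not its implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SplitTagsSketch {
    // Hypothetical model of the documented examples; not the library source.
    static String[] splitTags(String input, String[] tags, boolean recursive, boolean insideTag) {
        List<String> quoted = new ArrayList<>();
        for (String tag : tags) quoted.add(Pattern.quote(tag));
        Pattern tagPattern = Pattern.compile(
                "<(/?)(?:" + String.join("|", quoted) + ")\\b[^>]*>", Pattern.CASE_INSENSITIVE);

        List<String> pieces = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        Matcher m = tagPattern.matcher(input);
        int depth = 0; // number of matching tags currently open
        int last = 0;  // end of the previous tag
        while (m.find()) {
            // Text before this tag is kept unless it is tag content and insideTag is set.
            if (depth == 0 || !insideTag) current.append(input, last, m.start());
            boolean closing = !m.group(1).isEmpty();
            if (closing) depth = Math.max(0, depth - 1);
            boolean nested = depth > 0; // true for tags inside an outer matching tag
            if (!closing) depth++;
            if (nested && !recursive && !insideTag) {
                current.append(m.group()); // nested tag stays as ordinary text
            } else {
                flush(current, pieces);    // tag acts as a delimiter
            }
            last = m.end();
        }
        if (depth == 0 || !insideTag) current.append(input, last, input.length());
        flush(current, pieces);
        return pieces.toArray(new String[0]);
    }

    // Close the current piece; empty pieces are dropped.
    private static void flush(StringBuilder current, List<String> pieces) {
        if (current.length() > 0) {
            pieces.add(current.toString());
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        String in = "Begin <DIV><DIV> +12.5 </DIV></DIV> ALL OK";
        String[] div = {"DIV"};
        System.out.println(String.join("|", splitTags(in, div, false, false)));
        // prints Begin |<DIV> +12.5 </DIV>| ALL OK
    }
}
```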

splitTags

public static String[] splitTags(String input,
                                 Class nodeType)
                          throws ParserException,
                                 UnsupportedEncodingException
Split the input string into a string array, using the tags as delimiters.
Takes a Class as input parameter instead of the tags[] string array.

Throws:
ParserException
UnsupportedEncodingException
See Also:
splitTags(String input, String[] tags, boolean recursive, boolean insideTag)

splitTags

public static String[] splitTags(String input,
                                 Class nodeType,
                                 boolean recursive,
                                 boolean insideTag)
                          throws ParserException,
                                 UnsupportedEncodingException
Split the input string into a string array, using the tags as delimiters.
Takes a Class as input parameter instead of the tags[] string array.

Throws:
ParserException
UnsupportedEncodingException
See Also:
splitTags(String input, String[] tags, boolean recursive, boolean insideTag)

splitTags

public static String[] splitTags(String input,
                                 NodeFilter filter)
                          throws ParserException,
                                 UnsupportedEncodingException
Split the input string into a string array, using the tags as delimiters.
Takes a NodeFilter as input parameter instead of the tags[] string array.

Throws:
ParserException
UnsupportedEncodingException
See Also:
splitTags(String input, String[] tags, boolean recursive, boolean insideTag)

splitTags

public static String[] splitTags(String input,
                                 NodeFilter filter,
                                 boolean recursive,
                                 boolean insideTag)
                          throws ParserException,
                                 UnsupportedEncodingException
Split the input string into a string array, using the tags as delimiters.
Takes a NodeFilter as input parameter instead of the tags[] string array.

Throws:
ParserException
UnsupportedEncodingException
See Also:
splitTags(String input, String[] tags, boolean recursive, boolean insideTag)

trimAllTags

public static String trimAllTags(String input,
                                 boolean inside)
Trim the input string, removing every tag it contains.
The method removes every substring of the form "<XXX>", where XXX can be any string.
If the inside parameter is set to true, the method also deletes the YYY string in an input such as "<XXX>YYY<ZZZ>"; note that ZZZ is not necessarily the closing tag of XXX.

Parameters:
input - The input string.
inside - If true, the method also deletes what is inside the tags.
Returns:
The string without tags.
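
One possible reading of this description can be sketched with two regular expressions. This is an assumption, not the library source: with inside set to true, the sketch removes a tag together with any following "text + tag" pairs, so the text between any two tags disappears whether or not the tags match.

```java
final class TrimAllTagsSketch {
    // Hypothetical stand-in for ParserUtils.trimAllTags; not the library source.
    static String trimAllTags(String input, boolean inside) {
        if (inside) {
            // A tag followed by any number of "text, tag" pairs: the text between tags goes too.
            return input.replaceAll("<[^>]*>(?:[^<]*<[^>]*>)*", "");
        }
        // Only the "<...>" substrings themselves are removed.
        return input.replaceAll("<[^>]*>", "");
    }

    public static void main(String[] args) {
        System.out.println(trimAllTags("a<XXX>YYY<ZZZ>b", false)); // prints aYYYb
        System.out.println(trimAllTags("a<XXX>YYY<ZZZ>b", true));  // prints ab
    }
}
```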

trimTags

public static String trimTags(String input,
                              String[] tags)
                       throws ParserException,
                              UnsupportedEncodingException
Remove all matching tags from the input string and return the result without the tags and their content.

Throws:
ParserException
UnsupportedEncodingException
See Also:
trimTags(String input, String[] tags, boolean recursive, boolean insideTag)

trimTags

public static String trimTags(String input,
                              String[] tags,
                              boolean recursive,
                              boolean insideTag)
                       throws ParserException,
                              UnsupportedEncodingException
Remove all matching tags from the input string and return the result without the tags and, optionally, their content.
For example, if you call trimTags("<DIV><DIV> +12.5 </DIV></DIV> ALL OK", new String[] {"DIV"}),
you obtain the string " ALL OK" as output (the <DIV> tags and their content are trimmed recursively).
For example, if you call trimTags("<DIV><DIV> +12.5 </DIV></DIV> ALL OK", new String[] {"DIV"}, false, false),
you obtain the string "<DIV> +12.5 </DIV> ALL OK" as output (the <DIV> tags are trimmed, but not their content, and not recursively).
For example, if you call trimTags("<DIV><DIV> +12.5 </DIV></DIV> ALL OK", new String[] {"DIV"}, true, false),
you obtain the string " +12.5 ALL OK" as output (the <DIV> tags are trimmed recursively, but not their content).
For example, if you call trimTags("<DIV><DIV> +12.5 </DIV></DIV> ALL OK", new String[] {"DIV"}, false, true),
you obtain the string " ALL OK" as output (the <DIV> tags and their content are trimmed).

Parameters:
input - The input string.
tags - The tags to be removed.
recursive - Optional parameter (defaults to true when omitted); if true, the tags are removed recursively.
insideTag - Optional parameter (defaults to true when omitted); if true, the content of the tags is removed as well.
Returns:
The string without tags.
Throws:
ParserException
UnsupportedEncodingException
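
The four flag combinations above can be modelled with a small depth-counting scan over the matching tags. The following JDK-only sketch is hypothetical (regex-based, class name TrimTagsSketch assumed) and is not the library's parser-based implementation. It keeps whitespace exactly as found, so with recursive = true and insideTag = false it returns two spaces between "+12.5" and "ALL OK".

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class TrimTagsSketch {
    // Hypothetical model of the documented examples; not the library source.
    static String trimTags(String input, String[] tags, boolean recursive, boolean insideTag) {
        String names = Arrays.stream(tags).map(Pattern::quote).collect(Collectors.joining("|"));
        Matcher m = Pattern.compile("<(/?)(?:" + names + ")\\b[^>]*>", Pattern.CASE_INSENSITIVE)
                .matcher(input);
        StringBuilder out = new StringBuilder();
        int depth = 0; // number of matching tags currently open
        int last = 0;  // end of the previous tag
        while (m.find()) {
            // Content of a matching tag is dropped only when insideTag is set.
            if (depth == 0 || !insideTag) out.append(input, last, m.start());
            boolean closing = !m.group(1).isEmpty();
            if (closing) depth = Math.max(0, depth - 1);
            boolean nested = depth > 0;
            if (!closing) depth++;
            // Nested tags survive only when neither flag asks for their removal.
            if (nested && !recursive && !insideTag) out.append(m.group());
            last = m.end();
        }
        if (depth == 0 || !insideTag) out.append(input, last, input.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String in = "<DIV><DIV> +12.5 </DIV></DIV> ALL OK";
        String[] div = {"DIV"};
        System.out.println("[" + trimTags(in, div, true, true) + "]");   // prints [ ALL OK]
        System.out.println("[" + trimTags(in, div, false, false) + "]"); // prints [<DIV> +12.5 </DIV> ALL OK]
    }
}
```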

trimTags

public static String trimTags(String input,
                              Class nodeType)
                       throws ParserException,
                              UnsupportedEncodingException
Remove all matching tags from the input string and return the result without the tags and their content.
Takes a Class as input parameter instead of the tags[] string array.

Throws:
ParserException
UnsupportedEncodingException
See Also:
trimTags(String input, String[] tags, boolean recursive, boolean insideTag)

trimTags

public static String trimTags(String input,
                              Class nodeType,
                              boolean recursive,
                              boolean insideTag)
                       throws ParserException,
                              UnsupportedEncodingException
Remove all matching tags from the input string and return the result without the tags and, optionally, their content.
Takes a Class as input parameter instead of the tags[] string array.

Throws:
ParserException
UnsupportedEncodingException
See Also:
trimTags(String input, String[] tags, boolean recursive, boolean insideTag)

trimTags

public static String trimTags(String input,
                              NodeFilter filter)
                       throws ParserException,
                              UnsupportedEncodingException
Remove all matching tags from the input string and return the result without the tags and their content.
Takes a NodeFilter as input parameter instead of the tags[] string array.

Throws:
ParserException
UnsupportedEncodingException
See Also:
trimTags(String input, String[] tags, boolean recursive, boolean insideTag)

trimTags

public static String trimTags(String input,
                              NodeFilter filter,
                              boolean recursive,
                              boolean insideTag)
                       throws ParserException,
                              UnsupportedEncodingException
Remove all matching tags from the input string and return the result without the tags and, optionally, their content.
Takes a NodeFilter as input parameter instead of the tags[] string array.

Throws:
ParserException
UnsupportedEncodingException
See Also:
trimTags(String input, String[] tags, boolean recursive, boolean insideTag)

createParserParsingAnInputString

public static Parser createParserParsingAnInputString(String input)
                                               throws ParserException,
                                                      UnsupportedEncodingException
Create a Parser object that takes a String as input (instead of a URL or a string representing the URL location).
The string is parsed as if it were a file.

Parameters:
input - The input string.
Returns:
The Parser object with the string as its input stream.
Throws:
ParserException
UnsupportedEncodingException

© 2006 Derrick Oswald
Sep 17, 2006

HTML Parser is an open source library released under Common Public License. SourceForge.net