
Package org.htmlparser.filters

The filters package contains example filters to select only desired nodes.


Class Summary
AndFilter Accepts nodes matching all of its predicate filters (AND operation).
CssSelectorNodeFilter A NodeFilter that accepts nodes based on whether they match a CSS2 selector.
HasAttributeFilter This class accepts all tags that have a certain attribute and, optionally, a certain value for that attribute.
HasChildFilter This class accepts all tags that have a child acceptable to the filter.
HasParentFilter This class accepts all tags that have a parent acceptable to another filter.
HasSiblingFilter This class accepts all tags that have a sibling acceptable to another filter.
IsEqualFilter This class accepts only one specific node.
LinkRegexFilter This class accepts tags of class LinkTag that contain a link matching a given regex pattern.
LinkStringFilter This class accepts tags of class LinkTag that contain a link matching a given pattern string.
NodeClassFilter This class accepts all tags of a given class.
NotFilter Accepts all nodes not acceptable to its predicate filter.
OrFilter Accepts nodes matching any of its predicate filters (OR operation).
RegexFilter This filter accepts all string nodes matching a regular expression.
StringFilter This class accepts all string nodes containing the given string.
TagNameFilter This class accepts all tags matching the tag name.
 

Package org.htmlparser.filters Description

The filters package contains example filters to select only desired nodes. For example, to display tags having the "id" attribute, you could use:

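Parser parser = new Parser ("http://yadda");
// parse() returns the matching nodes; print them to display the tags
NodeList tags = parser.parse (new HasAttributeFilter ("id"));
System.out.println (tags.toHtml ());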

These filters can be combined to yield powerful extraction capabilities. For example, to get a list of links whose content is an image, you could use:

NodeList list = new NodeList ();
NodeFilter filter =
    new AndFilter (
        new TagNameFilter ("A"),
        new HasChildFilter (
            new TagNameFilter ("IMG")));
for (NodeIterator e = parser.elements (); e.hasMoreNodes (); )
    e.nextNode ().collectInto (list, filter);
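
To make the semantics of such a combination concrete, the following is a minimal, self-contained sketch of the same composition idea. It does not use the library: Node, Filter, and the helper methods tagName, hasAttribute, hasChild, and, collectInto, and sample are all hypothetical stand-ins written for this illustration, and hasChild is assumed here to look at direct children only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class FilterSketch {
    /** Hypothetical stand-in for a parsed tag: a name, attributes, and child tags. */
    static final class Node {
        final String name;
        final Map<String, String> attributes;
        final List<Node> children;

        Node(String name, Map<String, String> attributes, Node... children) {
            this.name = name;
            this.attributes = attributes;
            this.children = List.of(children);
        }
    }

    /** Stand-in filter type: a single accept method deciding whether a node is kept. */
    interface Filter {
        boolean accept(Node node);
    }

    // Accepts tags whose name matches, ignoring case.
    static Filter tagName(String name) {
        return n -> n.name.equalsIgnoreCase(name);
    }

    // Accepts tags that carry the given attribute, whatever its value.
    static Filter hasAttribute(String attribute) {
        return n -> n.attributes.containsKey(attribute);
    }

    // Accepts tags with at least one direct child acceptable to the given filter.
    static Filter hasChild(Filter childFilter) {
        return n -> n.children.stream().anyMatch(childFilter::accept);
    }

    // Accepts nodes acceptable to every one of the given filters (AND operation).
    static Filter and(Filter... filters) {
        return n -> {
            for (Filter f : filters) {
                if (!f.accept(n)) {
                    return false;
                }
            }
            return true;
        };
    }

    // Walks the tree in document order, adding every accepted node to the list.
    static void collectInto(Node node, List<Node> list, Filter filter) {
        if (filter.accept(node)) {
            list.add(node);
        }
        for (Node child : node.children) {
            collectInto(child, list, filter);
        }
    }

    // Returns the "id" attribute (or the tag name if absent) of each accepted node.
    static List<String> collectIds(Node root, Filter filter) {
        List<Node> matches = new ArrayList<>();
        collectInto(root, matches, filter);
        List<String> ids = new ArrayList<>();
        for (Node n : matches) {
            ids.add(n.attributes.getOrDefault("id", n.name));
        }
        return ids;
    }

    // Sample tree: a text-only link, a link wrapping an image, and a bare image.
    static Node sample() {
        return new Node("BODY", Map.of(),
            new Node("A", Map.of("id", "text-link")),
            new Node("A", Map.of("id", "image-link"), new Node("IMG", Map.of())),
            new Node("IMG", Map.of()));
    }

    public static void main(String[] args) {
        Filter imageLinks = and(tagName("A"), hasChild(tagName("IMG")));
        System.out.println(collectIds(sample(), imageLinks));         // [image-link]
        System.out.println(collectIds(sample(), hasAttribute("id"))); // [text-link, image-link]
    }
}
```

Because each combinator both takes and returns the same filter type, filters nest to any depth, which is what lets the AndFilter and HasChildFilter example above read as a single expression.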


© 2006 Derrick Oswald
Sep 17, 2006

HTML Parser is an open source library released under the Common Public License.