
Package org.htmlparser.http

The http package is responsible for HTTP connections to servers.


Interface Summary
ConnectionMonitor    Interface for HTTP connection notification callbacks.
 

Class Summary
ConnectionManager    Handles proxies, password-protected URLs, and request properties, including cookies.
Cookie               An HTTP cookie.
HttpHeader           Utility methods to display HTTP headers.
 

Package org.htmlparser.http Description

The http package is responsible for HTTP connections to servers. The Lexer and Parser provide many ways to supply text to be parsed, but this package only deals with cases where a URL is supplied as a string, with the expectation that the Lexer or Parser will perform the HTTP connection.

The ConnectionManager class adds proxy, password-protected URL, and cookie capabilities when accessing the internet via HTTP. Each of these capabilities requires conditioning the HTTP connection. An HTTP header utility class is also included.
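As a rough illustration of what "conditioning" a connection can involve, the sketch below uses only the JDK, not the ConnectionManager itself, to compute the Basic authentication headers that a password-protected URL and an authenticating proxy would expect, and to copy them onto a connection before it is opened. The class ConditioningSketch and its methods are hypothetical names chosen for this example; the ConnectionManager may do this differently internally.

```java
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, JDK only; not part of org.htmlparser.http.
final class ConditioningSketch {

    // Builds "Basic " + base64("user:password"), the HTTP Basic credential format.
    static String basicCredentials(String user, String password) {
        String pair = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(pair.getBytes(StandardCharsets.UTF_8));
    }

    // Collects the request headers for the supplied credentials; null pairs are skipped.
    static Map<String, String> authHeaders(String user, String password,
                                           String proxyUser, String proxyPassword) {
        Map<String, String> headers = new LinkedHashMap<>();
        if (user != null && password != null) {
            headers.put("Authorization", basicCredentials(user, password));
        }
        if (proxyUser != null && proxyPassword != null) {
            headers.put("Proxy-Authorization", basicCredentials(proxyUser, proxyPassword));
        }
        return headers;
    }

    // Copies the headers onto a connection that has not been connected yet.
    static void condition(URLConnection connection, Map<String, String> headers) {
        for (Map.Entry<String, String> header : headers.entrySet()) {
            connection.setRequestProperty(header.getKey(), header.getValue());
        }
    }

    public static void main(String[] args) throws Exception {
        // openConnection() only creates the object; no network traffic happens here.
        HttpURLConnection connection = (HttpURLConnection)
                URI.create("http://www.example.com/").toURL().openConnection();
        Map<String, String> headers = authHeaders("FredB", "holy$cow", "FredBarnes", "secret");
        condition(connection, headers);
        System.out.println("Conditioned " + connection.getURL()
                + " with headers " + headers.keySet());
    }
}
```

Computing the header values separately from applying them keeps the credential logic easy to check without touching the network.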

The ConnectionMonitor interface is a callback mechanism for the ConnectionManager to notify an interested application when an HTTP connection is made. Example uses may include conditioning the connection further, accessing HTTP header information, or providing reporting or statistical functions. Callbacks are not performed for FileURLConnections, which are also handled by the connection manager.
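The callback flow described above can be sketched with the JDK alone. In this minimal example, the nested Monitor interface is a hypothetical stand-in for ConnectionMonitor (the real interface may declare exceptions or differ in other details), and connect() is an assumed helper showing one way a manager could notify a monitor around HTTP connections while skipping other connection types, such as file URLs.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URLConnection;

// Hypothetical sketch of the notification pattern; not the library's implementation.
final class MonitorSketch {

    // Stand-in for a connection callback interface.
    interface Monitor {
        void preConnect(HttpURLConnection connection);
        void postConnect(HttpURLConnection connection);
    }

    // Connects, notifying the monitor before and after, but only for HTTP connections.
    static void connect(URLConnection connection, Monitor monitor) throws IOException {
        boolean notify = monitor != null && connection instanceof HttpURLConnection;
        if (notify) {
            // Last chance to condition the request or log its headers.
            monitor.preConnect((HttpURLConnection) connection);
        }
        connection.connect();
        if (notify) {
            // Response headers are available from this point on.
            monitor.postConnect((HttpURLConnection) connection);
        }
    }
}
```

A file: URL yields a URLConnection that is not an HttpURLConnection, so the monitor is never invoked for it, which mirrors the behavior noted for FileURLConnections.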

The Cookie class is a container for cookie data received and sent in HTTP requests and responses. It may be necessary to prime the ConnectionManager with cookies received via a login procedure in order to access protected HTML content.
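To make the idea of priming concrete, the following sketch shows how a set of name/value pairs might be turned into a single Cookie request header, and how a cookie domain such as ".freshmeat.net" can be checked against a host. It relies on the JDK's java.net.HttpCookie.domainMatches for the domain test; CookieHeaderSketch and its methods are hypothetical names, and nothing here is the Cookie class from this package.

```java
import java.net.HttpCookie;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, JDK only; illustrates cookie priming in general terms.
final class CookieHeaderSketch {

    // Joins cookies as "name=value; name=value", the format of a Cookie request header.
    static String cookieHeader(Map<String, String> cookies) {
        StringJoiner joiner = new StringJoiner("; ");
        for (Map.Entry<String, String> cookie : cookies.entrySet()) {
            joiner.add(cookie.getKey() + "=" + cookie.getValue());
        }
        return joiner.toString();
    }

    // Delegates to the JDK's domain-matching rules for cookies.
    static boolean appliesTo(String cookieDomain, String host) {
        return HttpCookie.domainMatches(cookieDomain, host);
    }

    public static void main(String[] args) {
        // Cookies as they might be captured from a login response.
        Map<String, String> cookies = new LinkedHashMap<>();
        cookies.put("USER", "FreddyBaby");
        cookies.put("PHPSESSID", "e5dbeb6152e70d99427f2458d8969f8b");
        if (appliesTo(".freshmeat.net", "www.freshmeat.net")) {
            System.out.println("Cookie: " + cookieHeader(cookies));
        }
    }
}
```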

A typical use of this package might look something like this:

ConnectionManager manager = Parser.getConnectionManager ();
// set up proxying
manager.setProxyHost ("proxyhost.mycompany.com");
manager.setProxyPort (8888);
manager.setProxyUser ("FredBarnes");
manager.setProxyPassword ("secret");
// set up cookies
Cookie cookie = new Cookie ("USER", "FreddyBaby");
manager.setCookie (cookie, "www.freshmeat.net");
cookie = new Cookie ("PHPSESSID", "e5dbeb6152e70d99427f2458d8969f8b");
cookie.setDomain (".freshmeat.net");
manager.setCookie (cookie, null);
// set up security to access a password protected URL
manager.setUser ("FredB");
manager.setPassword ("holy$cow");
// set up (an inner class) for callbacks
ConnectionMonitor monitor = new ConnectionMonitor ()
    {
        public void preConnect (HttpURLConnection connection)
        {
            System.out.println (HttpHeader.getRequestHeader (connection));
        }
        public void postConnect (HttpURLConnection connection)
        {
            System.out.println (HttpHeader.getResponseHeader (connection));
        }
    };
manager.setMonitor (monitor);
// perform the connection
Parser parser = new Parser ("http://freshmeat.net");

The ConnectionManager used by the Parser class is actually held by the Page class and is accessible from either class via getConnectionManager(). It is a static (singleton) instance, so subsequent connections made by the parser reuse the contents of the cookie jar from previous connections. Cookie processing is disabled by default; it can be enabled either by setting a cookie or by calling setCookieProcessingEnabled().
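The shared-instance behavior can be pictured with a toy model. The sketch below is not the ConnectionManager; it is a hypothetical SharedJarSketch class that assumes a single static instance, cookie processing that is off until a cookie is set or it is switched on explicitly, and a jar whose contents carry over to later requests.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Toy model of a shared cookie jar; every name here is hypothetical.
final class SharedJarSketch {
    private static final SharedJarSketch INSTANCE = new SharedJarSketch();

    private final Map<String, String> cookies = new LinkedHashMap<>();
    private boolean cookieProcessingEnabled; // false by default

    private SharedJarSketch() { }

    // Every caller gets the same jar, so state carries across connections.
    static SharedJarSketch getInstance() {
        return INSTANCE;
    }

    // Setting a cookie explicitly also switches cookie processing on.
    synchronized void setCookie(String name, String value) {
        cookies.put(name, value);
        cookieProcessingEnabled = true;
    }

    synchronized void setCookieProcessingEnabled(boolean enabled) {
        cookieProcessingEnabled = enabled;
    }

    synchronized boolean isCookieProcessingEnabled() {
        return cookieProcessingEnabled;
    }

    // Models a cookie arriving in a response; ignored while processing is off.
    synchronized void receive(String name, String value) {
        if (cookieProcessingEnabled) {
            cookies.put(name, value);
        }
    }

    // Header for the next request, or null when nothing should be sent.
    synchronized String cookieHeader() {
        if (!cookieProcessingEnabled || cookies.isEmpty()) {
            return null;
        }
        StringJoiner joiner = new StringJoiner("; ");
        for (Map.Entry<String, String> cookie : cookies.entrySet()) {
            joiner.add(cookie.getKey() + "=" + cookie.getValue());
        }
        return joiner.toString();
    }
}
```

Because the jar is static, a cookie stored during one connection is still there for the next one, which is the property the paragraph above relies on.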


© 2005 Derrick Oswald
Jun 10, 2006

HTML Parser is an open source library released under the LGPL.