How to Use the HTML Parser libraries

Step 1: Java

Make sure that a Java Development Kit (JDK) is installed, not just a Java Runtime Environment (JRE). If you are working in an IDE (Integrated Development Environment), this is usually taken care of for you. If you are using only the command line, you should see help information when you type:
  javac
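From inside a program, you can also tell a JDK from a JRE. The sketch below is a hypothetical helper, not part of the HTML Parser. It assumes Java 6 or later, because the javax.tools package it uses was added in Java 6. On older installations, use the javac check above instead. ToolProvider.getSystemJavaCompiler() returns null when no compiler is available, which is typically the case for a runtime-only installation.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// Hypothetical helper (not part of HTML Parser): reports whether the running
// JVM can supply a Java compiler, i.e. whether this looks like a JDK.
// Requires Java 6+ (javax.tools).
final class JdkCheck {
    static boolean hasCompiler() {
        // null means no system compiler is available (typically a JRE only).
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        return compiler != null;
    }

    public static void main(String[] args) {
        if (hasCompiler()) {
            System.out.println("JDK detected: a Java compiler is available.");
        } else {
            System.out.println("No compiler found: this looks like a JRE only.");
        }
    }
}
```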
  
Java versions greater than 1.2 are supported for the parser, and Java 1.1 for the lexer. You can check your version with the command:
  java -version
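Programs can read the same information from the java.version system property. The sketch below is a hypothetical helper, not part of the HTML Parser. It extracts the major version from both numbering schemes: older releases report strings such as "1.4.2_19" or "1.8.0_292", where the real version is the second field, and Java 9 and later report strings such as "11.0.2".

```java
// Hypothetical helper (not part of HTML Parser): extracts the major version
// from a java.version string in either the "1.x" or the Java 9+ scheme.
final class VersionCheck {
    static int majorVersion(String version) {
        // Split on '.', '_' and '-' so "1.8.0_292" and "17-ea" both parse.
        String[] parts = version.split("[._-]");
        int first = Integer.parseInt(parts[0]);
        // Old "1.x" numbering: the meaningful version is the second component.
        if (first == 1 && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return first;
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        System.out.println("java.version = " + version
                + " (major version " + majorVersion(version) + ")");
    }
}
```

For example, majorVersion("1.8.0_292") gives 8 and majorVersion("11.0.2") gives 11.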
  
If you are using Java 5, you may need to pass the option "-source 1.3" to javac to avoid some warnings.

Step 2: Setting the CLASSPATH

To use the HTML Parser, you need to add htmlparser.jar to the classpath. This jar includes all the files in htmllexer.jar, which contains the subset of classes used by the lexer. If you are using an IDE, add htmlparser.jar to the list of jars/libraries used by your project.

NetBeans

Eclipse

Command Line

You can either add the jar to the CLASSPATH environment variable, or specify it each time on the command line:
Windows
  set CLASSPATH=[htmlp_dir]\lib\htmlparser.jar;%CLASSPATH%
where [htmlp_dir] is the directory where you unzipped the distribution (xxx\htmlparser1_5), or use:
  javac -classpath [htmlp_dir]\lib\htmlparser.jar MyProgram.java
Linux
  export CLASSPATH=[htmlp_dir]/lib/htmlparser.jar:$CLASSPATH
where [htmlp_dir] is the directory where you unzipped the distribution (xxx/htmlparser1_5), or use:
  javac -classpath [htmlp_dir]/lib/htmlparser.jar MyProgram.java
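To confirm that the jar is actually visible at run time, you can try to load one of its classes. The sketch below is a hypothetical helper, not part of the HTML Parser. It looks up org.htmlparser.Parser, the class imported in Step 3, and prints the java.class.path system property so you can see what the JVM was given.

```java
// Hypothetical helper (not part of HTML Parser): checks whether a class can
// be loaded from the current classpath without initializing it.
final class ClasspathCheck {
    static boolean isOnClasspath(String className) {
        try {
            // initialize = false: only locate and load the class.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
        String name = "org.htmlparser.Parser";
        if (isOnClasspath(name)) {
            System.out.println(name + " found: htmlparser.jar is on the classpath.");
        } else {
            System.out.println(name + " not found: check the CLASSPATH setting.");
        }
    }
}
```

Run it with the same classpath you plan to use for your own program, for example with [htmlp_dir]/lib/htmlparser.jar added as shown above.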

Step 3: Import Necessary Classes

Whatever classes you use from the HTML Parser libraries will need to be imported by your program. For example, the simplest usage is:
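    import org.htmlparser.Parser;
    import org.htmlparser.util.NodeList;
    import org.htmlparser.util.ParserException;
  
    class Test
    {
        public static void main (String[] args)
        {
            // args[0] is the resource to parse: a URL or a file name
            if (args.length < 1)
            {
                System.out.println ("usage: java Test <url-or-file>");
                return;
            }
            try
            {
                Parser parser = new Parser (args[0]);
                // a null filter means no filtering: all nodes are returned
                NodeList list = parser.parse (null);
                System.out.println (list.toHtml ());
            }
            catch (ParserException pe)
            {
                pe.printStackTrace ();
            }
        }
    }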
  
Note that the import statements could also have been written as:
    import org.htmlparser.*;
    import org.htmlparser.util.*;
  

Step 4: Compile & Run

Within an IDE the compile and execute steps are usually combined.

NetBeans

Eclipse

Command Line

If the above program is saved in a file called Test.java, it can be compiled and run with the following commands, where [url] is the address of the web page (or the name of the file) to parse:
Windows
  javac -classpath [htmlp_dir]\lib\htmlparser.jar Test.java
  java -classpath .;[htmlp_dir]\lib\htmlparser.jar Test [url]
  
Linux
  javac -classpath [htmlp_dir]/lib/htmlparser.jar Test.java
  java -classpath .:[htmlp_dir]/lib/htmlparser.jar Test [url]
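The Windows and Linux commands differ only in their separators: ";" versus ":" between classpath entries, and "\" versus "/" inside paths. If you generate these commands from a program or build script, Java can supply the right separators for you. The sketch below is a hypothetical helper, not part of the HTML Parser. It assumes Java 8 or later (for String.join and java.nio.file.Paths), which is newer than the minimum versions the parser itself supports.

```java
import java.io.File;
import java.nio.file.Paths;

// Hypothetical helper (not part of HTML Parser): builds a -classpath value
// using the platform's separators. File.pathSeparator is ";" on Windows and
// ":" on Linux; Paths.get joins path parts with "\" or "/" respectively.
final class ClasspathBuilder {
    static String parserJar(String htmlpDir) {
        return Paths.get(htmlpDir, "lib", "htmlparser.jar").toString();
    }

    static String join(String... entries) {
        return String.join(File.pathSeparator, entries);
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java ClasspathBuilder <htmlp_dir>");
            return;
        }
        // "." adds the current directory, where Test.class was compiled.
        String cp = join(".", parserJar(args[0]));
        System.out.println("java -classpath " + cp + " Test");
    }
}
```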