Package org.htmlparser.lexerapplications.thumbelina

Extract the images behind thumbnail images.


Class Summary
Picture          Class to track pictures within the frame.
PicturePanel     Hold and display a group of pictures.
Sequencer        Display received images at a constant rate.
Thumbelina       View images behind thumbnails.
ThumbelinaFrame  Encapsulate a Thumbelina bean and add menu and preferences support.
TileSet          Class to track picture regions.

Package org.htmlparser.lexerapplications.thumbelina Description

Extract the images behind thumbnail images. This package is a demonstration of filtering the tags that are produced by the Lexer package. In this case the idea is to find links to known types of image file (.gif, .png and .jpg) whose link content is a reference to a smaller or lower-resolution image, often called a thumbnail image; hence the name.

Besides a lot of support code to provide a user interface, the heart of the process is found in Thumbelina.extractImageLinks(), which has a wee state machine that notes when an <IMG> tag is discovered within the body of an <A></A> tag pair. This triggers a fetch of the HREF (image file).
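The idea behind that state machine can be sketched as follows. This is a minimal, hypothetical illustration and not the actual Thumbelina.extractImageLinks() code: the class name ThumbnailLinkSketch is invented here, and a simple regular expression stands in for the Lexer, so it only copes with well-behaved markup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch only; the real code works on tags produced by the Lexer. */
final class ThumbnailLinkSketch {
    // Any tag: group 1 is "/" for a closing tag, group 2 the name, group 3 the attributes.
    private static final Pattern TAG =
        Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)([^>]*)>");
    // A quoted HREF attribute value, matched case-insensitively.
    private static final Pattern HREF =
        Pattern.compile("(?i)\\bhref\\s*=\\s*[\"']([^\"']*)[\"']");

    private enum State { OUTSIDE_LINK, INSIDE_IMAGE_LINK }

    /** Returns the HREFs of image links whose body contains an IMG tag. */
    static List<String> extractImageLinks(String html) {
        List<String> links = new ArrayList<>();
        State state = State.OUTSIDE_LINK;
        String href = null;
        Matcher tag = TAG.matcher(html);
        while (tag.find()) {
            boolean closing = !tag.group(1).isEmpty();
            String name = tag.group(2).toLowerCase(Locale.ROOT);
            switch (state) {
                case OUTSIDE_LINK:
                    if (!closing && name.equals("a")) {
                        Matcher m = HREF.matcher(tag.group(3));
                        if (m.find() && isImage(m.group(1))) {
                            href = m.group(1);
                            state = State.INSIDE_IMAGE_LINK;
                        }
                    }
                    break;
                case INSIDE_IMAGE_LINK:
                    if (!closing && name.equals("img")) {
                        links.add(href);            // thumbnail seen: HREF is the full image
                        state = State.OUTSIDE_LINK;
                    } else if (closing && name.equals("a")) {
                        state = State.OUTSIDE_LINK; // link closed without a thumbnail
                    }
                    break;
            }
        }
        return links;
    }

    /** True for the image file types named in the package description. */
    private static boolean isImage(String url) {
        String u = url.toLowerCase(Locale.ROOT);
        return u.endsWith(".gif") || u.endsWith(".png") || u.endsWith(".jpg");
    }
}
```

Calling ThumbnailLinkSketch.extractImageLinks() on a page fragment returns only the links that both point at an image file and wrap an IMG tag; text-only links and links to ordinary pages are skipped.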

The fetch is performed in the background by the AWT Toolkit image loading code, which runs four threads (on my machine). When an image is received it is added to the list of pending images. This list is drained by the Sequencer as it presents images at fixed intervals.
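The hand-off between the loading threads and the display can be pictured with a small producer/consumer sketch. It is an assumption-laden illustration rather than the real Sequencer class: SequencerSketch, add(), tick() and start() are names invented here, and a generic Consumer stands in for the panel that shows each image.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical sketch of a pending list drained at a fixed rate. */
final class SequencerSketch<T> {
    // Thread-safe queue: loader threads add to it, the timer thread removes from it.
    private final ConcurrentLinkedQueue<T> pending = new ConcurrentLinkedQueue<>();
    private final Consumer<T> display;

    SequencerSketch(Consumer<T> display) {
        this.display = display;
    }

    /** Called by a background loader when an image has arrived. */
    void add(T image) {
        pending.add(image);
    }

    /** Presents at most one pending image; returns false if none was waiting. */
    boolean tick() {
        T next = pending.poll();
        if (next == null) {
            return false;
        }
        display.accept(next);
        return true;
    }

    /** Starts calling tick() every periodMillis; the caller shuts the timer down. */
    ScheduledExecutorService start(long periodMillis) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(this::tick, 0, periodMillis, TimeUnit.MILLISECONDS);
        return timer;
    }
}
```

Because each tick presents only one image, a burst of arrivals is smoothed into a steady display rate, at the cost of the pending list growing while images arrive faster than they are shown.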

The TileSet and Picture classes provide a framework for displaying the various sizes of image that arrive in a random way, while still being able to repaint the panel when required.

The images are retained in memory only long enough to be covered over by subsequent images, but in general the manipulation of images is a memory-intensive task that requires a higher than normal limit on the maximum heap memory; for example, use the -Xmx256M command line switch to avoid java.lang.OutOfMemoryError messages.
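To confirm which maximum heap limit the virtual machine was actually started with, the standard Runtime.maxMemory() call can be queried. The helper below is a small sketch; the class and method names are invented for illustration and are not part of this package.

```java
/** Hypothetical helper that reports the heap ceiling of the running JVM. */
final class HeapCheckSketch {
    /** Maximum heap the JVM will attempt to use, in whole megabytes. */
    static long maxHeapMegabytes() {
        // maxMemory() returns bytes; it is Long.MAX_VALUE when there is no inherent limit.
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }
}
```

Printing HeapCheckSketch.maxHeapMegabytes() at startup is a quick way to check that the command line switch took effect before loading a large page of images.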

The rest is just the UI code, which can be altered by intrepid programmers as they see fit.


  • Fix the race condition in which a background thread adds new URLs after a reset.
  • Send output to a log window instead of showing URLs in the title bar.
  • Add pending list items as greyed out items to the history list.
  • Make status bar a pipeline with valves and limit switches (better on/off buttons).
  • Fix the race condition that sometimes prevents PicturePanel from resizing with the frame.
  • Tree view.
  • Drag and drop support.
  • JavaHelp.
  • Allow filter configuration.
  • Handle OutOfMemoryError more gracefully (trap System.err?).
  • Add more background threads.
  • Find out how to honour reset on the image fetcher threads.

© 2005 Derrick Oswald
Jun 10, 2006

HTML Parser is an open source library released under LGPL.