Class StringExtractor


  • public class StringExtractor
    extends java.lang.Object
    Extracts plain-text strings from a web page. This illustrative program gathers the textual contents of a web page, using a StringBean to accumulate the user-visible text (what a browser would display) into a single string.
    • Constructor Summary

      Constructors 
      Constructor Description
      StringExtractor(java.lang.String resource)
      Construct a StringExtractor to read from the given resource.
    • Method Summary

      Modifier and Type Method Description
      java.lang.String extractStrings(boolean links)
      Extract the text from a page.
      static void main(java.lang.String[] args)
      Command-line entry point (mainline).
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • StringExtractor

        public StringExtractor(java.lang.String resource)
        Construct a StringExtractor to read from the given resource.
        Parameters:
        resource - Either a URL or a file name.
    • Method Detail

      • extractStrings

        public java.lang.String extractStrings(boolean links)
                                        throws ParserException
        Extract the text from a page.
        Parameters:
        links - If true, include hyperlinks in the output.
        Returns:
        The textual contents of the page.
        Throws:
        ParserException - If a parse error occurs.
      • main

        public static void main(java.lang.String[] args)
        Command-line entry point (mainline).
        Parameters:
        args - The command-line arguments.
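
To make the effect of the links parameter of extractStrings concrete, here is a minimal sketch that uses only the JDK. It is not the library's implementation: the real class relies on a StringBean and a full HTML parser, whereas this toy uses regular expressions on an in-memory string. The class name VisibleTextSketch, the two-argument extractStrings signature, and the "text [url]" link format are all assumptions made for illustration; entities are not decoded.

```java
import java.util.regex.Pattern;

class VisibleTextSketch {
    // Matches <a ... href="URL" ...>text</a>; group 1 is the URL, group 2 the link text.
    private static final Pattern ANCHOR = Pattern.compile(
            "<a\\s[^>]*href=\"([^\"]*)\"[^>]*>(.*?)</a>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Script and style elements, whose contents a browser does not display as text.
    private static final Pattern HIDDEN = Pattern.compile(
            "<(script|style)\\b.*?</\\1>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /**
     * Toy analogue of extractStrings(boolean links) for an in-memory HTML string.
     * The "text [url]" link format is an assumption of this sketch, not documented behavior.
     */
    static String extractStrings(String html, boolean links) {
        String s = HIDDEN.matcher(html).replaceAll("");
        // Keep the link text; append the link target only when links is true.
        s = ANCHOR.matcher(s).replaceAll(links ? "$2 [$1]" : "$2");
        // Replace every remaining tag with a space, then collapse runs of whitespace.
        s = s.replaceAll("<[^>]+>", " ");
        return s.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String html = "<html><body><p>Hello <a href=\"http://example.com/\">world</a>!</p>"
                + "<script>var x = 1;</script></body></html>";
        System.out.println(extractStrings(html, false)); // Hello world!
        System.out.println(extractStrings(html, true));  // Hello world [http://example.com/]!
    }
}
```

Regular expressions are used here only to keep the sketch short and dependency-free; they do not handle malformed or nested markup reliably, which is why a parser-based extractor such as the one documented above is preferable for real pages.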