Package weka.core

Class Stopwords

  • All Implemented Interfaces:
    RevisionHandler

    public class Stopwords
    extends java.lang.Object
    implements RevisionHandler
    Class that can test whether a given string is a stop word. Lowercases all words before the test.

    The format for reading and writing is one word per line. Lines starting with '#' are interpreted as comments and are skipped.
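    The file format described above can be illustrated with a small parser. This is a minimal sketch using only the JDK, not the Weka implementation: the class name StopwordFormatSketch and its parse method are hypothetical, and the handling of blank lines and of leading whitespace before '#' is an assumption.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in (not weka.core.Stopwords): parses the documented format.
final class StopwordFormatSketch {

    /** Reads one word per line, skips '#' comment lines, and closes the reader. */
    static Set<String> parse(BufferedReader reader) throws IOException {
        Set<String> words = new TreeSet<>();
        try (BufferedReader in = reader) {
            String line;
            while ((line = in.readLine()) != null) {
                line = line.trim();
                if (line.startsWith("#")) {
                    continue; // comment line
                }
                if (line.isEmpty()) {
                    continue; // assumption: blank lines are ignored
                }
                words.add(line.toLowerCase(Locale.ROOT));
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        String file = "# custom stopwords\nThe\nand\n# another comment\nOf\n";
        // prints [and, of, the]
        System.out.println(parse(new BufferedReader(new StringReader(file))));
    }
}
```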

    The default stopwords are based on Rainbow.

    Accepts the following parameters:

    -i file
    loads the stopwords from the given file

    -o file
    saves the stopwords to the given file

    -p
    outputs the current stopwords on stdout

    Any additional parameters are interpreted as words to test as stopwords.

    Version:
    $Revision: 1.6 $
    Author:
    Eibe Frank (eibe@cs.waikato.ac.nz), Ashraf M. Kibriya (amk14@cs.waikato.ac.nz), FracPete (fracpete at waikato dot ac dot nz)
    • Constructor Summary

      Constructors 
      Constructor Description
      Stopwords()
      initializes the stopwords (based on Rainbow).
    • Method Summary

      Modifier and Type Method Description
      void add​(java.lang.String word)
      adds the given word to the stopword list (the word is automatically trimmed and converted to lower case)
      void clear()
      removes all stopwords
      java.util.Enumeration elements()
      Returns a sorted enumeration over all stored stopwords
      java.lang.String getRevision()
      Returns the revision string.
      boolean is​(java.lang.String word)
      Returns true if the given string is a stop word.
      static boolean isStopword​(java.lang.String str)
      Returns true if the given string is a stop word.
      static void main​(java.lang.String[] args)
      Command-line entry point. Accepts the parameters -i file, -o file and -p; any additional parameters are tested as stopwords.
      void read​(java.io.BufferedReader reader)
      Reads the stopwords from the given reader into this object.
      void read​(java.io.File file)
      Reads the stopwords from the given file into this object.
      void read​(java.lang.String filename)
      Reads the stopwords from the given file into this object.
      boolean remove​(java.lang.String word)
      removes the word from the stopword list
      java.lang.String toString()
      returns the current stopwords in a string
      void write​(java.io.BufferedWriter writer)
      Writes the current stopwords to the given writer.
      void write​(java.io.File file)
      Writes the current stopwords to the given file
      void write​(java.lang.String filename)
      Writes the current stopwords to the given file
      • Methods inherited from class java.lang.Object

        equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
    • Constructor Detail

      • Stopwords

        public Stopwords()
        initializes the stopwords (based on Rainbow).
    • Method Detail

      • clear

        public void clear()
        removes all stopwords
      • add

        public void add​(java.lang.String word)
        adds the given word to the stopword list (the word is automatically trimmed and converted to lower case)
        Parameters:
        word - the word to add
      • remove

        public boolean remove​(java.lang.String word)
        removes the word from the stopword list
        Parameters:
        word - the word to remove
        Returns:
        true if the word was found in the list and then removed
      • is

        public boolean is​(java.lang.String word)
        Returns true if the given string is a stop word.
        Parameters:
        word - the word to test
        Returns:
        true if the word is a stopword
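        The normalization described for add and the lowercasing applied before each test can be sketched as follows. This is an illustrative stand-in built on a HashSet, not the Weka source: the class StopwordSetSketch is hypothetical, and ignoring empty words in add, as well as removing without normalization, are assumptions.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in (not weka.core.Stopwords) mirroring the documented behavior.
final class StopwordSetSketch {
    private final Set<String> words = new HashSet<>();

    /** Trims and lowercases the word before storing it. */
    void add(String word) {
        String normalized = word.trim().toLowerCase(Locale.ROOT);
        if (!normalized.isEmpty()) { // assumption: empty words are ignored
            words.add(normalized);
        }
    }

    /** Returns true if the word was present and has been removed. */
    boolean remove(String word) {
        return words.remove(word); // assumption: no normalization on removal
    }

    /** Lowercases the word, then tests membership. */
    boolean is(String word) {
        return words.contains(word.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        StopwordSetSketch stopwords = new StopwordSetSketch();
        stopwords.add("  The ");
        System.out.println(stopwords.is("THE"));     // prints true
        System.out.println(stopwords.is("weka"));    // prints false
        System.out.println(stopwords.remove("the")); // prints true
    }
}
```

        Because stored words are always lower case, a case-insensitive test only needs to lowercase the query; no second pass over the list is required.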
      • elements

        public java.util.Enumeration elements()
        Returns a sorted enumeration over all stored stopwords
        Returns:
        the enumeration over all stopwords
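        Since elements() returns a raw java.util.Enumeration, callers have to iterate it by hand. The sketch below shows one way to produce and consume a sorted enumeration with JDK collections only. The class SortedEnumerationSketch and both of its methods are hypothetical helpers; sortedElements merely stands in for elements() and is not how Weka necessarily implements it.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helpers, not part of Weka.
final class SortedEnumerationSketch {

    /** Stand-in for elements(): enumerates the given words in sorted order. */
    static Enumeration<String> sortedElements(Collection<String> words) {
        return Collections.enumeration(new TreeSet<>(words));
    }

    /** Copies every element of a (possibly raw) enumeration into a list of strings. */
    static List<String> drain(Enumeration<?> enumeration) {
        List<String> result = new ArrayList<>();
        while (enumeration.hasMoreElements()) {
            result.add(String.valueOf(enumeration.nextElement()));
        }
        return result;
    }
}
```

        With the real class, the same drain helper could presumably be applied to the result of elements() directly, since Enumeration<?> also accepts a raw Enumeration.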
      • read

        public void read​(java.lang.String filename)
                  throws java.lang.Exception
        Reads the stopwords from the given file into this object.
        Parameters:
        filename - the file to read the stopwords from
        Throws:
        java.lang.Exception - if reading fails
      • read

        public void read​(java.io.File file)
                  throws java.lang.Exception
        Reads the stopwords from the given file into this object.
        Parameters:
        file - the file to read the stopwords from
        Throws:
        java.lang.Exception - if reading fails
      • read

        public void read​(java.io.BufferedReader reader)
                  throws java.lang.Exception
        Reads the stopwords from the given reader into this object. The reader is closed automatically.
        Parameters:
        reader - the reader to get the stopwords from
        Throws:
        java.lang.Exception - if reading fails
      • write

        public void write​(java.lang.String filename)
                   throws java.lang.Exception
        Writes the current stopwords to the given file
        Parameters:
        filename - the file to write the stopwords to
        Throws:
        java.lang.Exception - if writing fails
      • write

        public void write​(java.io.File file)
                   throws java.lang.Exception
        Writes the current stopwords to the given file
        Parameters:
        file - the file to write the stopwords to
        Throws:
        java.lang.Exception - if writing fails
      • write

        public void write​(java.io.BufferedWriter writer)
                   throws java.lang.Exception
        Writes the current stopwords to the given writer. The writer is closed automatically.
        Parameters:
        writer - the writer to write the stopwords to
        Throws:
        java.lang.Exception - if writing fails
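        The writing behavior, one word per line with the writer closed afterwards, can be sketched with the JDK alone. The class StopwordWriterSketch is hypothetical and not taken from Weka; writing the words in sorted order is an assumption, since the documentation above only states the one-word-per-line format.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.Collection;
import java.util.TreeSet;

// Hypothetical stand-in (not weka.core.Stopwords): writes the documented format.
final class StopwordWriterSketch {

    /** Writes one word per line and closes the writer, even if writing fails. */
    static void write(Collection<String> words, BufferedWriter writer) throws IOException {
        try (BufferedWriter out = writer) {
            for (String word : new TreeSet<>(words)) { // assumption: sorted output
                out.write(word);
                out.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        StringWriter target = new StringWriter();
        write(Arrays.asList("the", "and", "of"), new BufferedWriter(target));
        System.out.print(target); // prints and, of, the on separate lines
    }
}
```

        Because the writer is closed inside the method, a caller who needs to keep the underlying stream open would have to wrap it accordingly before passing it in.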
      • toString

        public java.lang.String toString()
        returns the current stopwords in a string
        Overrides:
        toString in class java.lang.Object
        Returns:
        the current stopwords
      • isStopword

        public static boolean isStopword​(java.lang.String str)
        Returns true if the given string is a stop word.
        Parameters:
        str - the word to test
        Returns:
        true if the word is a stopword
      • getRevision

        public java.lang.String getRevision()
        Returns the revision string.
        Specified by:
        getRevision in interface RevisionHandler
        Returns:
        the revision
      • main

        public static void main​(java.lang.String[] args)
                         throws java.lang.Exception
        Accepts the following parameters:

        -i file
        loads the stopwords from the given file

        -o file
        saves the stopwords to the given file

        -p
        outputs the current stopwords on stdout

        Any additional parameters are interpreted as words to test as stopwords.

        Parameters:
        args - the command-line parameters
        Throws:
        java.lang.Exception - if something goes wrong
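        The option handling described for main can be sketched as a small argument parser. This is an illustration of the documented contract only: the class StopwordsCliSketch, its fields and the error message are hypothetical, and Weka's own option parsing may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the documented options; not Weka's own parser.
final class StopwordsCliSketch {
    String inputFile;   // value of -i, or null if absent
    String outputFile;  // value of -o, or null if absent
    boolean print;      // true if -p was given
    final List<String> wordsToTest = new ArrayList<>();

    static StopwordsCliSketch parse(String[] args) {
        StopwordsCliSketch options = new StopwordsCliSketch();
        for (int i = 0; i < args.length; i++) {
            switch (args[i]) {
                case "-i":
                    options.inputFile = requireValue(args, ++i, "-i");
                    break;
                case "-o":
                    options.outputFile = requireValue(args, ++i, "-o");
                    break;
                case "-p":
                    options.print = true;
                    break;
                default:
                    // any additional parameter is a word to test
                    options.wordsToTest.add(args[i]);
            }
        }
        return options;
    }

    private static String requireValue(String[] args, int index, String flag) {
        if (index >= args.length) {
            throw new IllegalArgumentException("Missing file name after " + flag);
        }
        return args[index];
    }
}
```

        For example, the arguments -i my.txt -p the Weka would be read as: load my.txt, print the list, then test the words "the" and "Weka".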