Class Bagging

  • All Implemented Interfaces:
    java.io.Serializable, java.lang.Cloneable, AdditionalMeasureProducer, CapabilitiesHandler, OptionHandler, Randomizable, RevisionHandler, TechnicalInformationHandler, WeightedInstancesHandler

    public class Bagging
    extends RandomizableIteratedSingleClassifierEnhancer
    implements WeightedInstancesHandler, AdditionalMeasureProducer, TechnicalInformationHandler
    Class for bagging a classifier to reduce variance. It can perform classification or regression, depending on the base learner.

    For more information, see

    Leo Breiman (1996). Bagging predictors. Machine Learning. 24(2):123-140.

    BibTeX:

     @article{Breiman1996,
        author = {Leo Breiman},
        journal = {Machine Learning},
        number = {2},
        pages = {123-140},
        title = {Bagging predictors},
        volume = {24},
        year = {1996}
     }
     

    Valid options are:

     -P <num>
      Size of each bag, as a percentage of the
      training set size. (default 100)
     
     -O
      Calculate the out of bag error.
     
     -S <num>
      Random number seed.
      (default 1)
     
     -I <num>
      Number of iterations.
      (default 10)
     
     -D
      If set, classifier is run in debug mode and
      may output additional info to the console
     
     -W <classname>
      Full name of base classifier.
      (default: weka.classifiers.trees.REPTree)
     
     Options specific to classifier weka.classifiers.trees.REPTree:
     
     -M <minimum number of instances>
      Set minimum number of instances per leaf (default 2).
     
     -V <minimum variance for split>
      Set minimum numeric class variance proportion
      of train variance for split (default 1e-3).
     
     -N <number of folds>
      Number of folds for reduced error pruning (default 3).
     
     -S <seed>
      Seed for random data shuffling (default 1).
     
     -P
      No pruning.
     
     -L
      Maximum tree depth (default -1, no maximum)
     
    Options after -- are passed to the designated classifier.
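
    The option list above uses -S and -P twice with different meanings: before the -- separator they belong to Bagging, after it they belong to the base classifier. The following is a minimal sketch of that split, written in plain Java. The class OptionSplitSketch and its method splitAtDoubleDash are hypothetical names used for illustration only; they are not part of Weka and do not reproduce its option parser.

```java
import java.util.Arrays;

// Hypothetical helper (not part of Weka): splits an option array at the first "--".
final class OptionSplitSketch {

    /** Returns { optionsBeforeSeparator, optionsAfterSeparator }. */
    static String[][] splitAtDoubleDash(String[] args) {
        int sep = Arrays.asList(args).indexOf("--");
        if (sep < 0) {
            // No separator: every option belongs to the ensemble itself.
            return new String[][] { args.clone(), new String[0] };
        }
        String[] own = Arrays.copyOfRange(args, 0, sep);
        String[] base = Arrays.copyOfRange(args, sep + 1, args.length);
        return new String[][] { own, base };
    }

    public static void main(String[] argv) {
        // Option values are the documented defaults from the list above.
        String[] args = { "-P", "100", "-S", "1", "-I", "10",
                          "-W", "weka.classifiers.trees.REPTree",
                          "--", "-M", "2", "-N", "3", "-S", "1" };
        String[][] parts = splitAtDoubleDash(args);
        System.out.println("Bagging options: " + String.join(" ", parts[0]));
        System.out.println("REPTree options: " + String.join(" ", parts[1]));
    }
}
```

    In this sketch the first -S is the ensemble's random number seed and the second -S is the base classifier's shuffling seed, which is why their position relative to -- matters.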

    Version:
    $Revision: 11572 $
    Author:
    Eibe Frank (eibe@cs.waikato.ac.nz), Len Trigg (len@reeltwo.com), Richard Kirkby (rkirkby@cs.waikato.ac.nz)
    See Also:
    Serialized Form
    • Constructor Detail

      • Bagging

        public Bagging()
        Constructor.
    • Method Detail

      • globalInfo

        public java.lang.String globalInfo()
        Returns a string describing the classifier.
        Returns:
        a description suitable for displaying in the explorer/experimenter GUI
      • getTechnicalInformation

        public TechnicalInformation getTechnicalInformation()
        Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
        Specified by:
        getTechnicalInformation in interface TechnicalInformationHandler
        Returns:
        the technical information about this class
      • setOptions

        public void setOptions(java.lang.String[] options)
                        throws java.lang.Exception
        Parses a given list of options.

        The valid options are the same as those listed in the class description above. Options after -- are passed to the designated classifier.

        Specified by:
        setOptions in interface OptionHandler
        Overrides:
        setOptions in class RandomizableIteratedSingleClassifierEnhancer
        Parameters:
        options - the list of options as an array of strings
        Throws:
        java.lang.Exception - if an option is not supported
      • bagSizePercentTipText

        public java.lang.String bagSizePercentTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter GUI
      • getBagSizePercent

        public int getBagSizePercent()
        Gets the size of each bag, as a percentage of the training set size.
        Returns:
        the bag size, as a percentage.
      • setBagSizePercent

        public void setBagSizePercent(int newBagSizePercent)
        Sets the size of each bag, as a percentage of the training set size.
        Parameters:
        newBagSizePercent - the bag size, as a percentage.
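
        To make the meaning of the bag size percentage concrete, here is a minimal plain-Java sketch of how a bag could be drawn: the bag holds a percentage of the training set size, and its members are sampled uniformly with replacement using a seeded generator. BagSamplingSketch, bagSize, and sampleBag are hypothetical names, and the rounding rule (truncation) is an assumption; this is not Weka's implementation and it ignores instance weights.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical illustration (not Weka code): bag sizing and bootstrap sampling.
final class BagSamplingSketch {

    /** Bag size for a given percentage of the training set size (truncated, by assumption). */
    static int bagSize(int numInstances, int bagSizePercent) {
        return (int) ((long) numInstances * bagSizePercent / 100);
    }

    /** Draws instance indices uniformly with replacement; the same seed yields the same bag. */
    static int[] sampleBag(int numInstances, int bagSizePercent, long seed) {
        Random random = new Random(seed);
        int[] bag = new int[bagSize(numInstances, bagSizePercent)];
        for (int i = 0; i < bag.length; i++) {
            bag[i] = random.nextInt(numInstances);
        }
        return bag;
    }

    public static void main(String[] argv) {
        // 150 training instances, bags of 50 percent, seed 1.
        int[] bag = sampleBag(150, 50, 1L);
        System.out.println("bag size = " + bag.length);
        System.out.println("first indices = " + Arrays.toString(Arrays.copyOf(bag, 5)));
    }
}
```

        Because sampling is with replacement, a bag of 100 percent still omits some training instances, and those omitted instances are what an out-of-bag estimate relies on.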
      • calcOutOfBagTipText

        public java.lang.String calcOutOfBagTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter GUI
      • setCalcOutOfBag

        public void setCalcOutOfBag(boolean calcOutOfBag)
        Set whether the out of bag error is calculated.
        Parameters:
        calcOutOfBag - whether to calculate the out of bag error
      • getCalcOutOfBag

        public boolean getCalcOutOfBag()
        Get whether the out of bag error is calculated.
        Returns:
        whether the out of bag error is calculated
      • measureOutOfBagError

        public double measureOutOfBagError()
        Gets the out of bag error that was calculated as the classifier was built.
        Returns:
        the out of bag error
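
        As a rough illustration of what an out-of-bag error measures, the sketch below lets each training instance be voted on only by the models whose bag did not contain it, then reports the fraction of such instances that the vote gets wrong. It is a simplified, classification-only sketch with hard votes and no instance weights; OutOfBagSketch and its parameters are hypothetical and are not taken from Weka's source.

```java
// Hypothetical illustration (not Weka code): out-of-bag error from hard votes.
final class OutOfBagSketch {

    /**
     * @param inBag       inBag[m][i] is true if instance i was drawn into the bag of model m
     * @param predictions predictions[m][i] is the class index model m predicts for instance i
     * @param labels      true class index of each instance
     * @param numClasses  number of class values
     * @return fraction of voted-on instances that the out-of-bag vote misclassifies
     */
    static double outOfBagError(boolean[][] inBag, int[][] predictions,
                                int[] labels, int numClasses) {
        int voted = 0;
        int errors = 0;
        for (int i = 0; i < labels.length; i++) {
            int[] votes = new int[numClasses];
            int voters = 0;
            for (int m = 0; m < inBag.length; m++) {
                if (!inBag[m][i]) {            // only models that never saw instance i
                    votes[predictions[m][i]]++;
                    voters++;
                }
            }
            if (voters == 0) {
                continue;                      // instance was in every bag: no estimate
            }
            int best = 0;                      // ties go to the lowest class index
            for (int c = 1; c < numClasses; c++) {
                if (votes[c] > votes[best]) {
                    best = c;
                }
            }
            voted++;
            if (best != labels[i]) {
                errors++;
            }
        }
        return voted == 0 ? 0.0 : (double) errors / voted;
    }
}
```

        The estimate needs no separate test set, since every instance is scored only by models that were trained without it.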
      • enumerateMeasures

        public java.util.Enumeration enumerateMeasures()
        Returns an enumeration of the additional measure names.
        Specified by:
        enumerateMeasures in interface AdditionalMeasureProducer
        Returns:
        an enumeration of the measure names
      • getMeasure

        public double getMeasure(java.lang.String additionalMeasureName)
        Returns the value of the named measure.
        Specified by:
        getMeasure in interface AdditionalMeasureProducer
        Parameters:
        additionalMeasureName - the name of the measure to query for its value
        Returns:
        the value of the named measure
        Throws:
        java.lang.IllegalArgumentException - if the named measure is not supported
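
        The enumerateMeasures/getMeasure pair follows a lookup-by-name pattern: one method lists the measure names and the other returns the value for a name or rejects it. The sketch below shows that pattern in plain Java. MeasureLookupSketch is a hypothetical class, and the measure name "measureOutOfBagError" is assumed here to mirror the method of the same name documented above.

```java
import java.util.Collections;
import java.util.Enumeration;

// Hypothetical illustration (not Weka code): named-measure lookup.
final class MeasureLookupSketch {

    // Assumed measure name, mirroring the measureOutOfBagError() method.
    private static final String OOB_MEASURE = "measureOutOfBagError";

    private final double outOfBagError;

    MeasureLookupSketch(double outOfBagError) {
        this.outOfBagError = outOfBagError;
    }

    /** Lists the names that getMeasure accepts. */
    Enumeration<String> enumerateMeasures() {
        return Collections.enumeration(Collections.singletonList(OOB_MEASURE));
    }

    /** Returns the value of the named measure, or throws if the name is unknown. */
    double getMeasure(String name) {
        if (OOB_MEASURE.equals(name)) {
            return outOfBagError;
        }
        throw new IllegalArgumentException(name + " is not a supported measure");
    }
}
```

        A caller would typically iterate over enumerateMeasures() and pass each name to getMeasure, so that it never has to hard-code the measure names.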
      • buildClassifier

        public void buildClassifier(Instances data)
                             throws java.lang.Exception
        Builds the bagged classifier from the given training data.
        Overrides:
        buildClassifier in class IteratedSingleClassifierEnhancer
        Parameters:
        data - the training data to be used for generating the bagged classifier.
        Throws:
        java.lang.Exception - if the classifier could not be built successfully
      • distributionForInstance

        public double[] distributionForInstance(Instance instance)
                                         throws java.lang.Exception
        Calculates the class membership probabilities for the given test instance.
        Overrides:
        distributionForInstance in class Classifier
        Parameters:
        instance - the instance to be classified
        Returns:
        predicted class probability distribution
        Throws:
        java.lang.Exception - if distribution can't be computed successfully
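
        One common way for a bagged ensemble to produce class membership probabilities is to sum the distributions predicted by the individual base classifiers and normalize the result. The sketch below shows that aggregation step in plain Java. DistributionAverageSketch and averageDistributions are hypothetical names, and this is an assumption about the general technique rather than a copy of Weka's code.

```java
// Hypothetical illustration (not Weka code): averaging per-model class distributions.
final class DistributionAverageSketch {

    /**
     * @param perModel perModel[m][c] is the probability model m assigns to class c
     * @return the normalized sum of the per-model distributions
     */
    static double[] averageDistributions(double[][] perModel) {
        int numClasses = perModel[0].length;
        double[] sums = new double[numClasses];
        for (double[] dist : perModel) {
            for (int c = 0; c < numClasses; c++) {
                sums[c] += dist[c];
            }
        }
        double total = 0.0;
        for (double s : sums) {
            total += s;
        }
        if (total == 0.0) {
            return sums;                 // nothing to normalize: all models returned zeros
        }
        for (int c = 0; c < numClasses; c++) {
            sums[c] /= total;
        }
        return sums;
    }
}
```

        For example, three models predicting {0.9, 0.1}, {0.5, 0.5}, and {0.7, 0.3} combine to approximately {0.7, 0.3}.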
      • toString

        public java.lang.String toString()
        Returns a description of the bagged classifier.
        Overrides:
        toString in class java.lang.Object
        Returns:
        description of the bagged classifier as a string
      • main

        public static void main(java.lang.String[] argv)
        Main method for testing this class.
        Parameters:
        argv - the options