weka.classifiers.trees
Class SimpleCart

java.lang.Object
  extended by weka.classifiers.Classifier
      extended by weka.classifiers.RandomizableClassifier
          extended by weka.classifiers.trees.SimpleCart
All Implemented Interfaces:
java.io.Serializable, java.lang.Cloneable, AdditionalMeasureProducer, CapabilitiesHandler, OptionHandler, Randomizable, RevisionHandler, TechnicalInformationHandler

public class SimpleCart
extends RandomizableClassifier
implements AdditionalMeasureProducer, TechnicalInformationHandler

Class implementing a CART decision tree learner with minimal cost-complexity pruning.
Note: missing values are handled with the "fractional instances" method rather than the surrogate split method.
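The "fractional instances" idea can be illustrated with a small stand-alone sketch. An instance whose split attribute is missing is sent down every branch, and its weight is divided in proportion to the known-value training weight that went down each branch. The class FractionalInstances, the helper splitWeight, and the even-split fallback for empty branches are hypothetical; this is not SimpleCart's source code.

```java
import java.util.Arrays;

class FractionalInstances {
    // Splits the weight of an instance whose split value is missing across the
    // branches, proportionally to the total weight of known-value instances in
    // each branch (hypothetical helper, not part of Weka).
    static double[] splitWeight(double instanceWeight, double[] branchWeights) {
        double total = 0.0;
        for (double w : branchWeights) {
            total += w;
        }
        double[] result = new double[branchWeights.length];
        for (int i = 0; i < branchWeights.length; i++) {
            // Fall back to an even split if no known-value weight is available.
            result[i] = total > 0
                    ? instanceWeight * branchWeights[i] / total
                    : instanceWeight / branchWeights.length;
        }
        return result;
    }

    public static void main(String[] args) {
        // 30 known-value instances went left, 10 went right.
        double[] parts = splitWeight(1.0, new double[] {30.0, 10.0});
        System.out.println(Arrays.toString(parts)); // [0.75, 0.25]
    }
}
```

The fragments still sum to the original weight, so class counts accumulated further down the tree stay consistent with the node totals.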

For more information, see:

Leo Breiman, Jerome H. Friedman, Richard A. Olshen, Charles J. Stone (1984). Classification and Regression Trees. Wadsworth International Group, Belmont, California.

BibTeX:

 @book{Breiman1984,
    address = {Belmont, California},
    author = {Leo Breiman and Jerome H. Friedman and Richard A. Olshen and Charles J. Stone},
    publisher = {Wadsworth International Group},
    title = {Classification and Regression Trees},
    year = {1984}
 }
 

Valid options are:

 -S <num>
  Random number seed.
  (default 1)
 -D
  If set, classifier is run in debug mode and
  may output additional info to the console
 -M <min no>
  The minimal number of instances at the terminal nodes.
  (default 2)
 -N <num folds>
  The number of folds used in the minimal cost-complexity pruning.
  (default 5)
 -U
  Don't use minimal cost-complexity pruning.
  (default: pruning is used)
 -H
  Don't use the heuristic method for binary splits of nominal attributes in multi-class problems.
  (default: the heuristic is used)
 -A
  Use the 1 SE rule to make the pruning decision.
  (default: not used)
 -C
  Percentage of the training data to use, given as a fraction in (0, 1].
  (default 1)

Version:
$Revision: 1.4 $
Author:
Haijian Shi (hs69@cs.waikato.ac.nz)
See Also:
Serialized Form

Constructor Summary
SimpleCart()
           
 
Method Summary
 void buildClassifier(Instances data)
          Builds the classifier.
 void calculateAlphas()
          Updates the alpha field for all nodes.
 double[] distributionForInstance(Instance instance)
          Computes class probabilities for an instance using the decision tree.
 java.util.Enumeration enumerateMeasures()
          Returns an enumeration of the measure names.
 Capabilities getCapabilities()
          Returns default capabilities of the classifier.
 boolean getHeuristic()
          Gets whether heuristic search is used for nominal attributes in multi-class problems.
 double getMeasure(java.lang.String additionalMeasureName)
          Returns the value of the named measure.
 double getMinNumObj()
          Gets the minimal number of instances at the terminal nodes.
 int getNumFoldsPruning()
          Gets the number of folds used in the internal cross-validation.
 java.lang.String[] getOptions()
          Gets the current settings of the classifier.
 java.lang.String getRevision()
          Returns the revision string.
 double getSizePer()
          Gets the percentage of the training data used, as a fraction in (0, 1].
 TechnicalInformation getTechnicalInformation()
          Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
 boolean getUseOneSE()
          Gets whether the 1SE rule is used to choose the final model.
 boolean getUsePrune()
          Gets whether minimal cost-complexity pruning is used.
 java.lang.String globalInfo()
          Returns a description suitable for displaying in the explorer/experimenter.
 java.lang.String heuristicTipText()
          Returns the tip text for this property
 java.util.Enumeration listOptions()
          Returns an enumeration describing the available options.
static void main(java.lang.String[] args)
          Main method.
 double measureTreeSize()
          Returns the size of the tree.
 java.lang.String minNumObjTipText()
          Returns the tip text for this property
 void modelErrors()
          Updates the numIncorrectModel field for all nodes, i.e., the training errors made when the subtree rooted at each node is pruned to a leaf.
 java.lang.String numFoldsPruningTipText()
          Returns the tip text for this property
 int numInnerNodes()
          Counts the number of inner nodes in the tree.
 int numLeaves()
          Computes the number of leaf nodes.
 int numNodes()
          Computes the size of the tree.
 void prune(double alpha)
          Prunes the original tree using the CART pruning scheme, given a cost-complexity parameter alpha.
 int prune(double[] alphas, double[] errors, Instances test)
          Method for performing one fold in the cross-validation of minimal cost-complexity pruning.
 void setHeuristic(boolean value)
          Sets whether heuristic search is used for nominal attributes in multi-class problems.
 void setMinNumObj(double value)
          Sets the minimal number of instances at the terminal nodes.
 void setNumFoldsPruning(int value)
          Sets the number of folds used in the internal cross-validation.
 void setOptions(java.lang.String[] options)
          Parses a given list of options.
 void setSizePer(double value)
          Sets the percentage of the training data to use, as a fraction in (0, 1].
 void setUseOneSE(boolean value)
          Sets whether the 1SE rule is used to choose the final model.
 void setUsePrune(boolean value)
          Sets whether minimal cost-complexity pruning is used.
 java.lang.String sizePerTipText()
          Returns the tip text for this property
 java.lang.String toString()
          Returns a textual representation of the decision tree.
 void treeErrors()
          Updates the numIncorrectTree field for all nodes.
 java.lang.String useOneSETipText()
          Returns the tip text for this property
 java.lang.String usePruneTipText()
          Returns the tip text for this property
 
Methods inherited from class weka.classifiers.RandomizableClassifier
getSeed, seedTipText, setSeed
 
Methods inherited from class weka.classifiers.Classifier
classifyInstance, debugTipText, forName, getDebug, makeCopies, makeCopy, setDebug
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

SimpleCart

public SimpleCart()
Method Detail

globalInfo

public java.lang.String globalInfo()
Returns a description suitable for displaying in the explorer/experimenter.

Returns:
a description suitable for displaying in the explorer/experimenter

getTechnicalInformation

public TechnicalInformation getTechnicalInformation()
Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.

Specified by:
getTechnicalInformation in interface TechnicalInformationHandler
Returns:
the technical information about this class

getCapabilities

public Capabilities getCapabilities()
Returns default capabilities of the classifier.

Specified by:
getCapabilities in interface CapabilitiesHandler
Overrides:
getCapabilities in class Classifier
Returns:
the capabilities of this classifier
See Also:
Capabilities

buildClassifier

public void buildClassifier(Instances data)
                     throws java.lang.Exception
Builds the classifier.

Specified by:
buildClassifier in class Classifier
Parameters:
data - the training instances
Throws:
java.lang.Exception - if something goes wrong

prune

public void prune(double alpha)
           throws java.lang.Exception
Prunes the original tree using the CART pruning scheme, given a cost-complexity parameter alpha.

Parameters:
alpha - the cost-complexity parameter
Throws:
java.lang.Exception - if something goes wrong
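Assuming "the CART pruning scheme" here is the weakest-link pruning of Breiman et al., pruning for a fixed alpha can be sketched as: compute g(t) for every inner node, collapse the node with the smallest g(t) while that value does not exceed alpha, and repeat. The Node class, its per-node class counts, and the example tree are hypothetical. For clarity the sketch recomputes all errors on every iteration instead of updating them incrementally, as SimpleCart's modelErrors(), treeErrors() and calculateAlphas() presumably do.

```java
import java.util.ArrayList;
import java.util.List;

class WeakestLinkPruning {
    // Hypothetical binary node; classCounts is the (weighted) class distribution
    // of the training instances reaching the node. Leaves have no children.
    static final class Node {
        final double[] classCounts;
        Node left;
        Node right;

        Node(double[] classCounts, Node left, Node right) {
            this.classCounts = classCounts;
            this.left = left;
            this.right = right;
        }

        Node(double[] classCounts) {
            this(classCounts, null, null);
        }

        boolean isLeaf() {
            return left == null;
        }
    }

    // R(t): training errors if t were replaced by a majority-class leaf.
    static double leafError(Node t) {
        double total = 0.0;
        double majority = 0.0;
        for (double c : t.classCounts) {
            total += c;
            majority = Math.max(majority, c);
        }
        return total - majority;
    }

    // R(T_t): training errors of the leaves of the subtree rooted at t.
    static double subtreeError(Node t) {
        return t.isLeaf() ? leafError(t) : subtreeError(t.left) + subtreeError(t.right);
    }

    static int countLeaves(Node t) {
        return t.isLeaf() ? 1 : countLeaves(t.left) + countLeaves(t.right);
    }

    static void collectInnerNodes(Node t, List<Node> out) {
        if (!t.isLeaf()) {
            out.add(t);
            collectInnerNodes(t.left, out);
            collectInnerNodes(t.right, out);
        }
    }

    // Repeatedly collapses the inner node with the smallest
    // g(t) = (R(t) - R(T_t)) / (leaves(T_t) - 1) while that value is <= alpha.
    static void prune(Node root, double alpha) {
        while (true) {
            List<Node> inner = new ArrayList<>();
            collectInnerNodes(root, inner);
            Node weakest = null;
            double minG = Double.POSITIVE_INFINITY;
            for (Node t : inner) {
                double g = (leafError(t) - subtreeError(t)) / (countLeaves(t) - 1);
                if (g < minG) {
                    minG = g;
                    weakest = t;
                }
            }
            if (weakest == null || minG > alpha) {
                return; // no inner node left, or every link is stronger than alpha
            }
            weakest.left = null; // turn the weakest link into a leaf
            weakest.right = null;
        }
    }

    public static void main(String[] args) {
        // The right child's split does not reduce training error (2 errors
        // either way), so its g(t) is 0 and it is pruned even at alpha = 0.
        Node right = new Node(new double[] {2, 8},
                new Node(new double[] {1, 4}), new Node(new double[] {1, 4}));
        Node root = new Node(new double[] {10, 10}, new Node(new double[] {8, 2}), right);
        prune(root, 0.0);
        System.out.println(countLeaves(root)); // 2
    }
}
```

After the right subtree is collapsed, the root's g(t) becomes (10 - 4) / 1 = 6, so any alpha of at least 6 would prune the tree down to a single leaf.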

prune

public int prune(double[] alphas,
                 double[] errors,
                 Instances test)
          throws java.lang.Exception
Method for performing one fold in the cross-validation of minimal cost-complexity pruning. Generates a sequence of alpha-values with error estimates for the corresponding (partially pruned) trees, given the test set of that fold.

Parameters:
alphas - array to hold the generated alpha-values
errors - array to hold the corresponding error estimates
test - test set of that fold (to obtain error estimates)
Returns:
the number of pruning iterations performed
Throws:
java.lang.Exception - if something goes wrong
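The per-fold alphas and errors arrays can be combined into a cross-validated error estimate roughly as follows. The sketch assumes each fold's alphas are ascending and start at 0 (the unpruned tree), with errors[k] holding the test-fold error of the tree pruned up to alphas[k]. The lookup rule and the geometric-mean query point follow Breiman et al.'s description; whether SimpleCart uses exactly this rule is an assumption, and CrossValidatedError is a hypothetical name.

```java
class CrossValidatedError {
    // Error of one fold's pruning sequence at a query alpha. alphas is ascending
    // (alphas[0] == 0 for the unpruned tree) and errors[k] is the test-fold error
    // of the tree obtained once all links with alpha <= alphas[k] are pruned.
    static double errorAt(double[] alphas, double[] errors, double query) {
        int k = 0;
        while (k + 1 < alphas.length && alphas[k + 1] <= query) {
            k++;
        }
        return errors[k];
    }

    // Cross-validated error estimate: the fold errors at the query alpha, averaged.
    static double averageError(double[][] foldAlphas, double[][] foldErrors, double query) {
        double sum = 0.0;
        for (int f = 0; f < foldAlphas.length; f++) {
            sum += errorAt(foldAlphas[f], foldErrors[f], query);
        }
        return sum / foldAlphas.length;
    }

    public static void main(String[] args) {
        // Two hypothetical folds, each with its own pruning sequence.
        double[][] alphas = { {0.0, 0.5, 2.0}, {0.0, 1.0, 3.0} };
        double[][] errors = { {0.30, 0.20, 0.40}, {0.25, 0.15, 0.50} };
        // Query at the geometric mean of two consecutive main-tree alphas, 0.5 and 2.0.
        double query = Math.sqrt(0.5 * 2.0);
        // Averages 0.20 (fold 0) and 0.15 (fold 1).
        System.out.println(averageError(alphas, errors, query));
    }
}
```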

modelErrors

public void modelErrors()
                 throws java.lang.Exception
Updates the numIncorrectModel field for all nodes, i.e., the training errors made when the subtree rooted at each node is pruned to a leaf. This is needed for calculating the alpha-values.

Throws:
java.lang.Exception - if something goes wrong
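Assuming numIncorrectModel holds the training error R(t) a node would have if it were turned into a majority-class leaf, the per-node quantity can be sketched from the node's class counts. LeafError and leafError are hypothetical names.

```java
class LeafError {
    // Number (or total weight) of training instances misclassified if the node
    // were turned into a leaf predicting its majority class.
    static double leafError(double[] classCounts) {
        double total = 0.0;
        double majority = 0.0;
        for (double c : classCounts) {
            total += c;
            majority = Math.max(majority, c);
        }
        return total - majority;
    }

    public static void main(String[] args) {
        System.out.println(leafError(new double[] {7.0, 2.0, 1.0})); // 3.0
    }
}
```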

treeErrors

public void treeErrors()
                throws java.lang.Exception
Updates the numIncorrectTree field for all nodes. This is needed for calculating the alpha-values.

Throws:
java.lang.Exception - if something goes wrong
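Similarly, assuming numIncorrectTree holds R(T_t), the training error summed over the leaves of the subtree rooted at a node, it can be computed recursively. The Node class below, which stores each node's leaf error directly, is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class SubtreeError {
    // Hypothetical node: leafError is the error if this node were a leaf;
    // children is empty for a leaf.
    static final class Node {
        final double leafError;
        final List<Node> children;

        Node(double leafError, Node... children) {
            this.leafError = leafError;
            this.children = Arrays.asList(children);
        }
    }

    // R(T_t): sum of the training errors of all leaves below t.
    static double subtreeError(Node t) {
        if (t.children.isEmpty()) {
            return t.leafError;
        }
        double sum = 0.0;
        for (Node child : t.children) {
            sum += subtreeError(child);
        }
        return sum;
    }

    public static void main(String[] args) {
        Node tree = new Node(10.0,
                new Node(2.0),
                new Node(6.0, new Node(1.0), new Node(1.5)));
        System.out.println(subtreeError(tree)); // 4.5
    }
}
```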

calculateAlphas

public void calculateAlphas()
                     throws java.lang.Exception
Updates the alpha field for all nodes.

Throws:
java.lang.Exception - if something goes wrong
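Assuming the alpha field follows the weakest-link definition of Breiman et al., each inner node t gets the value of alpha at which collapsing t into a leaf leaves the cost-complexity measure unchanged. Here R(t) is the leaf error (numIncorrectModel), R(T_t) the subtree error (numIncorrectTree), and |\tilde{T}_t| the number of leaves of T_t:

```latex
R_\alpha(T) = R(T) + \alpha\,|\tilde{T}|, \qquad
R(t) + \alpha = R(T_t) + \alpha\,|\tilde{T}_t|
\quad\Longrightarrow\quad
g(t) = \frac{R(t) - R(T_t)}{|\tilde{T}_t| - 1}
```

Breiman et al. state R as error rates; using raw error counts instead scales every alpha by the same constant, so the order in which nodes are pruned does not change.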

distributionForInstance

public double[] distributionForInstance(Instance instance)
                                 throws java.lang.Exception
Computes class probabilities for an instance using the decision tree.

Overrides:
distributionForInstance in class Classifier
Parameters:
instance - the instance for which class probabilities are to be computed
Returns:
the class probabilities for the given instance
Throws:
java.lang.Exception - if something goes wrong
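A hypothetical sketch of computing class probabilities with a binary tree follows. A leaf returns its normalized class counts. An instance with a missing test value (encoded as NaN here, an assumption) gets the two branch distributions blended by the training weight each branch received, in the spirit of the fractional-instances method. The Node layout and the numeric "less than threshold" test are illustrative only.

```java
import java.util.Arrays;

class TreeDistribution {
    // Hypothetical binary node. Inner nodes test x[attribute] < threshold and
    // remember how much training weight went to each branch; leaves hold
    // (weighted) class counts.
    static final class Node {
        int attribute;
        double threshold;
        double leftWeight;
        double rightWeight;
        Node left;
        Node right;
        double[] classCounts; // non-null only for leaves

        static Node leaf(double... counts) {
            Node n = new Node();
            n.classCounts = counts;
            return n;
        }

        static Node split(int attribute, double threshold,
                          double leftWeight, double rightWeight, Node left, Node right) {
            Node n = new Node();
            n.attribute = attribute;
            n.threshold = threshold;
            n.leftWeight = leftWeight;
            n.rightWeight = rightWeight;
            n.left = left;
            n.right = right;
            return n;
        }
    }

    static double[] distribution(Node node, double[] x) {
        if (node.classCounts != null) {
            // Leaf: normalize the class counts into probabilities.
            double total = 0.0;
            for (double c : node.classCounts) {
                total += c;
            }
            double[] p = new double[node.classCounts.length];
            for (int i = 0; i < p.length; i++) {
                p[i] = total > 0 ? node.classCounts[i] / total : 1.0 / p.length;
            }
            return p;
        }
        double value = x[node.attribute];
        if (Double.isNaN(value)) {
            // Missing value: blend both branches by their training weight.
            double[] l = distribution(node.left, x);
            double[] r = distribution(node.right, x);
            double wl = node.leftWeight / (node.leftWeight + node.rightWeight);
            double[] p = new double[l.length];
            for (int i = 0; i < p.length; i++) {
                p[i] = wl * l[i] + (1 - wl) * r[i];
            }
            return p;
        }
        return distribution(value < node.threshold ? node.left : node.right, x);
    }

    public static void main(String[] args) {
        Node root = Node.split(0, 5.0, 30.0, 10.0,
                Node.leaf(27.0, 3.0), Node.leaf(2.0, 8.0));
        System.out.println(Arrays.toString(distribution(root, new double[] {1.0})));
        System.out.println(Arrays.toString(distribution(root, new double[] {Double.NaN})));
    }
}
```

With a known value of 1.0 the instance reaches the left leaf and gets about [0.9, 0.1]; with a missing value it gets 0.75 of the left distribution plus 0.25 of the right one, about [0.725, 0.275].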

toString

public java.lang.String toString()
Returns a textual representation of the decision tree.

Overrides:
toString in class java.lang.Object
Returns:
a textual description of the classifier

numNodes

public int numNodes()
Computes the size of the tree.

Returns:
size of the tree

numInnerNodes

public int numInnerNodes()
Counts the number of inner nodes in the tree.

Returns:
the number of inner nodes

numLeaves

public int numLeaves()
Computes the number of leaf nodes.

Returns:
number of leaf nodes
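Because CART trees are binary, the three counts are tied together: every inner node has exactly two children, so the number of leaves is one more than the number of inner nodes. A recursive sketch on a hypothetical Node class, assuming numNodes() counts inner nodes and leaves together:

```java
class TreeCounts {
    // Hypothetical binary node; a leaf has no children.
    static final class Node {
        final Node left;
        final Node right;

        Node(Node left, Node right) {
            this.left = left;
            this.right = right;
        }

        static Node leaf() {
            return new Node(null, null);
        }

        boolean isLeaf() {
            return left == null && right == null;
        }
    }

    static int numLeaves(Node t) {
        return t.isLeaf() ? 1 : numLeaves(t.left) + numLeaves(t.right);
    }

    static int numInnerNodes(Node t) {
        return t.isLeaf() ? 0 : 1 + numInnerNodes(t.left) + numInnerNodes(t.right);
    }

    // Tree size: every node is either an inner node or a leaf.
    static int numNodes(Node t) {
        return numLeaves(t) + numInnerNodes(t);
    }

    public static void main(String[] args) {
        Node tree = new Node(Node.leaf(), new Node(Node.leaf(), Node.leaf()));
        System.out.println(numNodes(tree) + " " + numInnerNodes(tree) + " " + numLeaves(tree)); // 5 2 3
    }
}
```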

listOptions

public java.util.Enumeration listOptions()
Returns an enumeration describing the available options.

Specified by:
listOptions in interface OptionHandler
Overrides:
listOptions in class RandomizableClassifier
Returns:
an enumeration of all the available options.

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Parses a given list of options.

Valid options are:

 -S <num>
  Random number seed.
  (default 1)
 -D
  If set, classifier is run in debug mode and
  may output additional info to the console
 -M <min no>
  The minimal number of instances at the terminal nodes.
  (default 2)
 -N <num folds>
  The number of folds used in the minimal cost-complexity pruning.
  (default 5)
 -U
  Don't use minimal cost-complexity pruning.
  (default: pruning is used)
 -H
  Don't use the heuristic method for binary splits of nominal attributes in multi-class problems.
  (default: the heuristic is used)
 -A
  Use the 1 SE rule to make the pruning decision.
  (default: not used)
 -C
  Percentage of the training data to use, given as a fraction in (0, 1].
  (default 1)

Specified by:
setOptions in interface OptionHandler
Overrides:
setOptions in class RandomizableClassifier
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported

getOptions

public java.lang.String[] getOptions()
Gets the current settings of the classifier.

Specified by:
getOptions in interface OptionHandler
Overrides:
getOptions in class RandomizableClassifier
Returns:
the current settings of the classifier

enumerateMeasures

public java.util.Enumeration enumerateMeasures()
Returns an enumeration of the measure names.

Specified by:
enumerateMeasures in interface AdditionalMeasureProducer
Returns:
an enumeration of the measure names

measureTreeSize

public double measureTreeSize()
Returns the size of the tree.

Returns:
the size of the tree

getMeasure

public double getMeasure(java.lang.String additionalMeasureName)
Returns the value of the named measure.

Specified by:
getMeasure in interface AdditionalMeasureProducer
Parameters:
additionalMeasureName - the name of the measure to query for its value
Returns:
the value of the named measure
Throws:
java.lang.IllegalArgumentException - if the named measure is not supported

minNumObjTipText

public java.lang.String minNumObjTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setMinNumObj

public void setMinNumObj(double value)
Sets the minimal number of instances at the terminal nodes.

Parameters:
value - minimal number of instances at the terminal nodes

getMinNumObj

public double getMinNumObj()
Gets the minimal number of instances at the terminal nodes.

Returns:
minimal number of instances at the terminal nodes
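One way such a minimal-leaf-size constraint can be enforced during split selection is sketched below; the exact check SimpleCart performs is not documented on this page, so the rule and the helper names isValidSplit and canSplit are assumptions for illustration.

```java
class MinLeafSize {
    // A binary split is admissible only if both branches keep at least
    // minNumObj (possibly fractional) instances. Hypothetical helper.
    static boolean isValidSplit(double leftWeight, double rightWeight, double minNumObj) {
        return leftWeight >= minNumObj && rightWeight >= minNumObj;
    }

    // Since the two branch weights add up to the node weight, a node with less
    // than 2 * minNumObj weight has no admissible split and must stay a leaf.
    static boolean canSplit(double nodeWeight, double minNumObj) {
        return nodeWeight >= 2 * minNumObj;
    }

    public static void main(String[] args) {
        System.out.println(isValidSplit(5.0, 1.0, 2.0)); // false
        System.out.println(canSplit(3.0, 2.0));          // false
    }
}
```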

numFoldsPruningTipText

public java.lang.String numFoldsPruningTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setNumFoldsPruning

public void setNumFoldsPruning(int value)
Sets the number of folds used in the internal cross-validation.

Parameters:
value - the number of folds used in the internal cross-validation

getNumFoldsPruning

public int getNumFoldsPruning()
Gets the number of folds used in the internal cross-validation.

Returns:
the number of folds used in the internal cross-validation

usePruneTipText

public java.lang.String usePruneTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui.

setUsePrune

public void setUsePrune(boolean value)
Sets whether minimal cost-complexity pruning is used.

Parameters:
value - whether to use minimal cost-complexity pruning

getUsePrune

public boolean getUsePrune()
Gets whether minimal cost-complexity pruning is used.

Returns:
whether minimal cost-complexity pruning is used

heuristicTipText

public java.lang.String heuristicTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui.

setHeuristic

public void setHeuristic(boolean value)
Sets whether heuristic search is used for nominal attributes in multi-class problems.

Parameters:
value - whether to use heuristic search for nominal attributes in multi-class problems

getHeuristic

public boolean getHeuristic()
Gets whether heuristic search is used for nominal attributes in multi-class problems.

Returns:
whether heuristic search is used for nominal attributes in multi-class problems
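The exact form of this heuristic is not described on this page. For context, the sketch shows the classic ordering result of Breiman et al. for two-class problems with the Gini index: sorting the k values of a nominal attribute by class proportion and testing only the k - 1 cuts along that order finds the best subset split without enumerating all 2^(k-1) - 1 subsets. Multi-class heuristics typically approximate this idea; whether getHeuristic() controls exactly this ordering is an assumption, and OrderedNominalSplit is a hypothetical name.

```java
import java.util.Arrays;
import java.util.Comparator;

class OrderedNominalSplit {
    // Gini impurity 1 - sum_c p_c^2 of a class-count vector.
    static double gini(double[] counts) {
        double total = 0.0;
        for (double c : counts) {
            total += c;
        }
        if (total == 0) {
            return 0.0;
        }
        double sumSq = 0.0;
        for (double c : counts) {
            sumSq += (c / total) * (c / total);
        }
        return 1.0 - sumSq;
    }

    // counts[v] = {weight of class 0, weight of class 1} for nominal value v.
    // Sorts the values by their class-0 proportion and tries only the k - 1
    // cut points along that order; returns which values go to the left branch.
    static boolean[] bestSplit(double[][] counts) {
        int k = counts.length;
        Integer[] order = new Integer[k];
        for (int v = 0; v < k; v++) {
            order[v] = v;
        }
        Arrays.sort(order, Comparator.comparingDouble((Integer idx) -> {
            double n = counts[idx][0] + counts[idx][1];
            return n == 0 ? 0.0 : counts[idx][0] / n;
        }));
        double bestImpurity = Double.POSITIVE_INFINITY;
        int bestCut = 1;
        for (int cut = 1; cut < k; cut++) {
            double[] left = new double[2];
            double[] right = new double[2];
            for (int i = 0; i < k; i++) {
                double[] side = i < cut ? left : right;
                side[0] += counts[order[i]][0];
                side[1] += counts[order[i]][1];
            }
            double nl = left[0] + left[1];
            double nr = right[0] + right[1];
            // Weighted Gini impurity of the two branches.
            double impurity = (nl * gini(left) + nr * gini(right)) / (nl + nr);
            if (impurity < bestImpurity) {
                bestImpurity = impurity;
                bestCut = cut;
            }
        }
        boolean[] inLeft = new boolean[k];
        for (int i = 0; i < bestCut; i++) {
            inLeft[order[i]] = true;
        }
        return inLeft;
    }

    public static void main(String[] args) {
        // Three hypothetical values A, B, C with class-0 proportions 0.8, 0.1, 0.6.
        double[][] counts = { {8, 2}, {1, 9}, {6, 4} };
        System.out.println(Arrays.toString(bestSplit(counts))); // [false, true, false]
    }
}
```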

useOneSETipText

public java.lang.String useOneSETipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui.

setUseOneSE

public void setUseOneSE(boolean value)
Sets whether the 1SE rule is used to choose the final model.

Parameters:
value - whether to use the 1SE rule to choose the final model

getUseOneSE

public boolean getUseOneSE()
Gets whether the 1SE rule is used to choose the final model.

Returns:
whether the 1SE rule is used to choose the final model
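Assuming the candidates are the trees of the pruning sequence, ordered from least to most pruned, the 1SE rule of Breiman et al. picks the most heavily pruned tree whose cross-validated error is within one standard error of the minimum. The binomial estimate sqrt(e(1 - e) / N) used below follows Breiman et al.; that SimpleCart computes the standard error the same way is an assumption, and OneSeRule is a hypothetical name.

```java
class OneSeRule {
    // errors[i] is the cross-validated error rate of the i-th candidate tree,
    // where a larger i means a more heavily pruned tree; n is the number of
    // training instances. Returns the index of the selected tree.
    static int select(double[] errors, int n, boolean useOneSE) {
        int best = 0;
        for (int i = 1; i < errors.length; i++) {
            if (errors[i] < errors[best]) {
                best = i;
            }
        }
        if (!useOneSE) {
            return best; // plain minimum-error choice
        }
        // Binomial standard error of the minimum error rate.
        double se = Math.sqrt(errors[best] * (1.0 - errors[best]) / n);
        int chosen = best;
        for (int i = best + 1; i < errors.length; i++) {
            if (errors[i] <= errors[best] + se) {
                chosen = i; // a smaller tree that is still within one SE
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        double[] errors = {0.30, 0.22, 0.20, 0.21, 0.26, 0.40};
        System.out.println(select(errors, 100, false)); // 2
        System.out.println(select(errors, 100, true));  // 3 (0.21 <= 0.20 + 0.04)
    }
}
```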

sizePerTipText

public java.lang.String sizePerTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui.

setSizePer

public void setSizePer(double value)
Sets the percentage of the training data to use, as a fraction in (0, 1].

Parameters:
value - the fraction of the training data to use

getSizePer

public double getSizePer()
Gets the percentage of the training data used, as a fraction in (0, 1].

Returns:
the fraction of the training data used
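A sketch of restricting training to a fraction of the data follows. It assumes the subset is drawn by a shuffle seeded like the -S option and that n * sizePer is truncated to an integer. Both choices, and the helper name subsample, are hypothetical and not taken from SimpleCart's source.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class TrainingSubsample {
    // Returns a random subset containing floor(size * sizePer) items, shuffled
    // with the given seed (hypothetical helper; sizePer must lie in (0, 1]).
    static <T> List<T> subsample(List<T> data, double sizePer, long seed) {
        if (!(sizePer > 0 && sizePer <= 1)) {
            throw new IllegalArgumentException("sizePer must be in (0, 1]");
        }
        List<T> copy = new ArrayList<>(data);
        Collections.shuffle(copy, new Random(seed));
        int keep = (int) (copy.size() * sizePer);
        return new ArrayList<>(copy.subList(0, keep));
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            data.add(i);
        }
        System.out.println(subsample(data, 0.5, 1L).size()); // 5
    }
}
```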

getRevision

public java.lang.String getRevision()
Returns the revision string.

Specified by:
getRevision in interface RevisionHandler
Returns:
the revision

main

public static void main(java.lang.String[] args)
Main method.

Parameters:
args - the options for the classifier