Using Regression Tree

Each option below appears on one of the Regression Tree dialogs: Data, Parameters, Scoring, or Simulation.

Regression Tree Dialog, Data dialog

The following options appear on the Regression Tree Data dialog.

Variables In Input Data

All variables in the dataset are listed here.

Selected Variables

Variables listed here will be utilized in the Analytic Solver Data Science output.

Output Variable

Select the variable whose outcome is to be predicted here.

Regression Tree Dialog, Parameters dialog

The following options appear on the Regression Tree Parameters dialog.

Partition Data

Analytic Solver Data Science includes the ability to partition a dataset from within a classification or prediction method by clicking Partition Data on the Parameters dialog. Analytic Solver Data Science will partition your dataset (according to the partition options you set) immediately before running the prediction method. If partitioning has already occurred on the dataset, this option will be disabled. For more information on partitioning, please see the Data Science Partitioning chapter.
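As an illustration of what partitioning does, the following minimal sketch randomly assigns records to training, validation, and test partitions. The function name, the default percentages, and the seed are assumptions made for this example; they are not the product's settings or API.

```python
import random


def partition_rows(rows, train_frac=0.6, valid_frac=0.4, seed=12345):
    """Randomly split rows into training, validation, and test lists.

    Hypothetical helper: whatever is left after the training and
    validation fractions goes to the test partition.
    """
    if train_frac < 0 or valid_frac < 0 or train_frac + valid_frac > 1 + 1e-9:
        raise ValueError("fractions must be non-negative and sum to at most 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_train = round(len(shuffled) * train_frac)
    n_valid = round(len(shuffled) * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test


train, valid, test = partition_rows(range(100), train_frac=0.5, valid_frac=0.3)
print(len(train), len(valid), len(test))
```

Every record lands in exactly one partition, so a model fitted on the training partition never sees the validation or test records.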

Rescale Data

Click Rescale Data to open the Rescaling dialog.

Use Rescaling to normalize one or more features in your data during the data preprocessing stage. Analytic Solver Data Science provides the following methods for feature scaling: Standardization, Normalization, Adjusted Normalization and Unit Norm. For more information on this new feature, see the Rescale Continuous Data section within the Transform Continuous Data chapter that occurs earlier in this guide.
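The four method names above correspond to widely used scaling formulas. The sketch below uses the common textbook definitions, applied to a single list of values. It assumes that Adjusted Normalization maps values to the range [-1, 1] and that Unit Norm divides by the Euclidean (L2) length; the product's exact formulas and options may differ.

```python
import math
import statistics


def standardize(values):
    """Standardization: subtract the mean, divide by the standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population standard deviation (assumed)
    return [(v - mean) / sd for v in values]


def normalize(values):
    """Normalization: map the minimum to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def adjusted_normalize(values):
    """Adjusted Normalization (assumed): map the range to [-1, 1]."""
    return [2 * v - 1 for v in normalize(values)]


def unit_norm(values):
    """Unit Norm (assumed L2): scale the vector to length 1."""
    length = math.sqrt(sum(v * v for v in values))
    return [v / length for v in values]


print(normalize([2, 4, 6]))
print(adjusted_normalize([2, 4, 6]))
print(unit_norm([3, 4]))
```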

Analytic Solver Data Mining: Notes on Rescaling and Simulation functionality

Tree Growth

In the Tree Growth section, select Levels, Nodes, Splits, and Records in Terminal Nodes. Values entered for these options limit tree growth. For example, if 10 is entered for Levels, the tree will be limited to 10 levels.
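To show how such limits stop growth, here is a minimal regression tree sketch that honors two of them: a maximum number of levels and a minimum number of records in each terminal node. The parameter names `max_levels` and `min_leaf`, the dictionary layout, and the squared-error split criterion are assumptions for this example, not the product's implementation.

```python
def sse(ys):
    """Sum of squared errors of ys around their mean (the error criterion)."""
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def best_split(rows, ys, min_leaf):
    """Return (feature, threshold, gain) for the best split, or None."""
    best = None
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, ys) if r[j] <= t]
            right = [y for r, y in zip(rows, ys) if r[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue  # split would leave too few records in a node
            gain = sse(ys) - sse(left) - sse(right)
            if best is None or gain > best[2]:
                best = (j, t, gain)
    return best


def grow(rows, ys, max_levels, min_leaf=1, level=0):
    """Grow a tree, stopping at max_levels or when no useful split exists."""
    node = {"value": sum(ys) / len(ys)}
    split = best_split(rows, ys, min_leaf) if level < max_levels else None
    if split is None or split[2] <= 0:
        return node  # terminal node
    j, t, _ = split
    left = [(r, y) for r, y in zip(rows, ys) if r[j] <= t]
    right = [(r, y) for r, y in zip(rows, ys) if r[j] > t]
    node.update(feature=j, threshold=t)
    node["left"] = grow([r for r, _ in left], [y for _, y in left],
                        max_levels, min_leaf, level + 1)
    node["right"] = grow([r for r, _ in right], [y for _, y in right],
                         max_levels, min_leaf, level + 1)
    return node


def depth(node):
    """Number of split levels below this node."""
    if "left" not in node:
        return 0
    return 1 + max(depth(node["left"]), depth(node["right"]))


def predict(node, row):
    """Follow the splits to a terminal node and return its mean."""
    while "left" in node:
        go_left = row[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["value"]


X = [[1], [2], [3], [4], [5], [6], [7], [8]]
y = [1, 1, 1, 1, 5, 5, 9, 9]
shallow = grow(X, y, max_levels=1)
deep = grow(X, y, max_levels=10)
print(depth(shallow), predict(shallow, [7]))
print(depth(deep), predict(deep, [7]))
```

With `max_levels=1` the records with values 5 and 9 share one terminal node, so the prediction is their mean; a larger limit lets the tree separate them.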

Prune (Using Validation Set)

If a validation partition exists, this option is enabled. When this option is selected, Analytic Solver Data Science will prune the tree using the validation set. Pruning the tree using the validation set reduces the error from over-fitting the tree to the training data.
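The idea of pruning with a validation set can be sketched with reduced-error pruning: working from the bottom up, a subtree is replaced by a terminal node whenever that does not increase the squared error on the validation records. This section does not state which pruning algorithm the product uses, so treat this as an illustrative stand-in. The tree is a hand-built dictionary (a hypothetical layout) with a mean `value` at every node.

```python
def predict(node, row):
    """Follow the splits to a terminal node and return its mean."""
    while "left" in node:
        go_left = row[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["value"]


def prune(node, rows, ys):
    """Reduced-error pruning against validation records (rows, ys)."""
    if "left" not in node:
        return node
    j, t = node["feature"], node["threshold"]
    left = [(r, y) for r, y in zip(rows, ys) if r[j] <= t]
    right = [(r, y) for r, y in zip(rows, ys) if r[j] > t]
    # Prune the children first, using only the records that reach them.
    node = dict(
        node,
        left=prune(node["left"], [r for r, _ in left], [y for _, y in left]),
        right=prune(node["right"], [r for r, _ in right], [y for _, y in right]),
    )
    subtree_err = sum((y - predict(node, r)) ** 2 for r, y in zip(rows, ys))
    leaf_err = sum((y - node["value"]) ** 2 for y in ys)
    # Collapse when the single node does at least as well on validation data
    # (this also collapses branches that no validation record reaches).
    if leaf_err <= subtree_err:
        return {"value": node["value"]}
    return node


# Hypothetical tree fitted on training data.
tree = {
    "value": 4.0, "feature": 0, "threshold": 4,
    "left": {"value": 1.0},
    "right": {
        "value": 7.0, "feature": 0, "threshold": 6,
        "left": {"value": 5.0},
        "right": {"value": 9.0},
    },
}
valid_rows = [[2], [5], [6], [7], [8]]
valid_ys = [1, 8, 6, 7, 7]
pruned = prune(tree, valid_rows, valid_ys)
print(pruned)
```

In this example the lower split fits the training data but not the validation data, so it is collapsed, while the top split is kept.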

Show Feature Importance

Select Show Feature Importance to include the Feature Importance table in the output. This table shows the relative importance of each feature, measured as the reduction of the error criterion that the feature contributed during tree growth.
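A minimal sketch of this kind of measure: add up the error reduction (`gain`) of every split that uses a feature, then divide by the total so the values sum to 1. The dictionary layout, the `gain` field, and the normalization step are assumptions for illustration; the product's table may be scaled differently.

```python
def feature_importance(tree, n_features):
    """Share of the total error reduction attributed to each feature."""
    totals = [0.0] * n_features
    stack = [tree]
    while stack:
        node = stack.pop()
        if "left" in node:  # decision node: credit its feature with the gain
            totals[node["feature"]] += node["gain"]
            stack.extend([node["left"], node["right"]])
    grand = sum(totals)
    return [t / grand if grand else 0.0 for t in totals]


# Hypothetical fitted tree: feature 0 is split once, feature 1 once,
# and feature 2 is never used.
tree = {
    "feature": 0, "gain": 48.0,
    "left": {"value": 1.0},
    "right": {
        "feature": 1, "gain": 16.0,
        "left": {"value": 5.0},
        "right": {"value": 9.0},
    },
}
importance = feature_importance(tree, n_features=3)
print(importance)
```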

Maximum Number of Levels

This option specifies the maximum number of levels of the tree to be displayed in the output. Use Trees to Display to choose which trees to display: Fully Grown, Best Pruned, Minimum Error, or User Specified.

Select Fully Grown to "grow" a complete tree using the training data.
Select Best Pruned to create the tree with the fewest nodes, subject to the constraint that the error be kept below a specified level (the minimum error plus the standard error of that error).
Select Minimum Error to produce the tree that yields the minimum prediction error when tested on the validation data.
To create a tree with a specified number of decision nodes, select User Specified and enter the desired number of nodes.
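The difference between the Minimum Error and Best Pruned trees can be sketched as a selection over a set of candidate pruned trees. In the example below each candidate is a pair of (number of decision nodes, validation error), and the standard error is passed in as a number. The candidate values and the function name are made up for illustration, and treating a tree exactly at the limit as acceptable is an assumption.

```python
def pick_trees(candidates, std_error):
    """Return (minimum_error_tree, best_pruned_tree).

    candidates: list of (decision_nodes, validation_error) pairs.
    """
    # Minimum Error: lowest validation error (smaller tree wins a tie).
    min_error_tree = min(candidates, key=lambda c: (c[1], c[0]))
    # Best Pruned: fewest nodes among trees within one standard error.
    limit = min_error_tree[1] + std_error
    within_limit = [c for c in candidates if c[1] <= limit]
    best_pruned_tree = min(within_limit, key=lambda c: c[0])
    return min_error_tree, best_pruned_tree


# Hypothetical pruning sequence: (decision nodes, validation error).
candidates = [(0, 30.0), (1, 12.0), (2, 9.5), (3, 9.0), (4, 9.4)]
min_error_tree, best_pruned_tree = pick_trees(candidates, std_error=0.6)
print(min_error_tree, best_pruned_tree)
```

Here the three-node tree has the lowest validation error, but the two-node tree is within one standard error of it and is therefore chosen as the Best Pruned tree.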

Regression Tree Dialog, Scoring dialog

The following options appear on the Regression Tree Scoring dialog.

Score Training Data

Select these options to show an assessment of the performance of the Regression Tree algorithm in predicting the value of the output variable in the training data. The report is displayed according to your specifications - Detailed, Summary, and Lift charts.

New in V2023: When Frequency Chart is selected, a frequency chart will be displayed when the RT_TrainingScore worksheet is selected. The chart opens in an interactive application similar to the Analyze Data feature, explained in detail in the Analyze Data chapter that appears earlier in this guide. The chart includes frequency distributions of the actual and predicted responses, shown individually or side-by-side depending on the user's preference, as well as basic and advanced statistics for variables, percentiles, and six sigma indices.

Score Validation Data

These options are enabled when a validation data set is present. Select these options to show an assessment of the performance of the Regression Tree algorithm in predicting the value of the output variable in the validation data. The report is displayed according to your specifications - Detailed, Summary, and Lift charts. When Frequency Chart is selected, a frequency chart (described above) will be displayed when the RT_ValidationScore worksheet is selected.

Score Test Data

These options are enabled when a test set is present. Select these options to show an assessment of the performance of the Regression Tree algorithm in predicting the value of the output variable in the test data. The report is displayed according to your specifications - Detailed, Summary, and Lift charts. When Frequency Chart is selected, a frequency chart (described above) will be displayed when the RT_TestScore worksheet is selected.
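For a numeric output variable, a scoring summary typically compares actual and predicted values with a handful of error measures. The sketch below computes several common ones for any partition. The choice of measures and their labels are assumptions, since this section does not list the contents of the Summary report.

```python
import math


def score_summary(actual, predicted):
    """Common error measures for predictions of a numeric output variable."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    n = len(residuals)
    sse = sum(r * r for r in residuals)              # sum of squared errors
    mean_actual = sum(actual) / n
    sst = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "SSE": sse,
        "MSE": sse / n,                              # mean squared error
        "RMSE": math.sqrt(sse / n),                  # root mean squared error
        "MAD": sum(abs(r) for r in residuals) / n,   # mean absolute deviation
        "R2": 1 - sse / sst,                         # share of variance explained
    }


summary = score_summary(actual=[1, 2, 3, 4], predicted=[1, 2, 3, 6])
print(summary)
```

Comparing the same measures across the training, validation, and test partitions is a quick check for over-fitting: a much larger error on validation or test data suggests the tree is too closely fitted to the training data.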

Score New Data

The options in this group allow you to apply the model to score an altogether new dataset. See the "Scoring New Data" chapter.

Regression Tree Dialog, Simulation dialog

All supervised algorithms in V2023 include a new Simulation tab. This tab uses the functionality from the Generate Data feature (described earlier in this guide) to generate synthetic data based on the training partition, and uses the fitted model to produce predictions for the synthetic data. The resulting report, RT_Simulation, will contain the synthetic data, the predicted values and the Excel-calculated Expression column, if present. In addition, frequency charts containing the Predicted, Training, and Expression (if present) sources or a combination of any pair may be viewed, if the charts are of the same type.

Evaluation: Select Calculate Expression to append an Expression column to the frequency chart displayed on the RT_Simulation output tab. Expression can be any valid Excel formula that references a variable and the response as [@COLUMN_NAME]. Click the Expression Hints button for more information on entering an expression.
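The overall flow (generate synthetic records, predict them with the fitted model, then evaluate an expression on each record and its prediction) can be sketched as follows. This is a simplified stand-in: it resamples training records instead of fitting distributions as the Generate Data feature does, the model is a hypothetical one-split tree, and a Python function takes the place of the Excel formula.

```python
import random


def simulate(train_rows, model, expression, n=5, seed=1):
    """Build a small report of synthetic rows, predictions, and expression values."""
    rng = random.Random(seed)
    report = []
    for _ in range(n):
        row = rng.choice(train_rows)  # stand-in for sampling a fitted distribution
        prediction = model(row)
        report.append({
            "row": row,
            "prediction": prediction,
            "expression": expression(row, prediction),
        })
    return report


def model(row):
    """Hypothetical fitted tree with a single split on the first variable."""
    return 1.0 if row[0] <= 4 else 7.0


def expression(row, prediction):
    """Hypothetical expression: is the predicted response above 5?"""
    return prediction > 5


train_rows = [[1], [2], [3], [4], [5], [6], [7], [8]]
report = simulate(train_rows, model, expression)
for line in report:
    print(line)
```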

For more information on this dialog, see the Generate Data chapter that appears earlier in this guide.