
Decision Tree


Python code

  # Assumes X_train, y_train, X_test, y_test and a feature DataFrame df
  # are already defined (see the data-loading sketch below).
  import matplotlib.pyplot as plt
  from math import sqrt

  from sklearn import tree
  from sklearn.tree import DecisionTreeRegressor
  from sklearn.metrics import mean_absolute_error, mean_squared_error

  plt.rcParams['figure.figsize'] = (18, 8)

  # Fit a regression tree, pre-pruned by limiting its depth
  reg = DecisionTreeRegressor(max_depth=5)
  reg.fit(X_train, y_train)

  # Evaluate on the held-out data
  y_test_pred = reg.predict(X_test)
  test_error = mean_squared_error(y_test, y_test_pred)
  print("  - Validation RMSE:", sqrt(test_error))
  print("  - Validation MAE:", mean_absolute_error(y_test, y_test_pred))

  # Plot the fitted tree, one box per node
  tree.plot_tree(reg, filled=True, feature_names=list(df.columns))
  plt.show()
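
The snippet above assumes that X_train, y_train, X_test, y_test and the feature DataFrame df are already defined. One possible way to obtain them, using for instance scikit-learn's California housing data (any regression dataset would do), is:

  from sklearn.datasets import fetch_california_housing
  from sklearn.model_selection import train_test_split

  # Load a regression dataset as pandas objects
  data = fetch_california_housing(as_frame=True)
  df, y = data.data, data.target

  # Hold out 20% of the individuals for validation
  X_train, X_test, y_train, y_test = train_test_split(df, y, test_size=0.2, random_state=0)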

Presentation

Decision trees, which can be used for both regression and classification, make it possible to explain a target value from a set of discrete or continuous variables. They are quite powerful methods, non-parametric and non-linear, and consist of:

  • partitioning the individuals into groups that are as homogeneous as possible with respect to the variable to be predicted;
  • taking into account a hierarchy of the predictive power of the variables considered.

This hierarchy makes it possible to visualize the results as a tree and to build explicit predictive rules.

Several iterations are necessary. At each iteration:

  • the individuals are divided into k classes (typically k = 2) in order to explain the output variable;
  • the first division is obtained by choosing the explanatory variable that best separates the individuals;
  • this division defines sub-populations, represented by the nodes of the tree;
  • each node is associated with a proportion measure, which helps explain membership of a class or the significance of an output variable;
  • the operation is repeated for each sub-population until no further separation is possible (a simplified sketch of this recursion is given after the list).
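
To make this recursion concrete, here is a deliberately simplified sketch of such a construction for regression, with binary splits and the within-group variance as homogeneity measure (an illustrative helper, not scikit-learn's implementation):

  import numpy as np

  def build_tree(X, y, depth=0, max_depth=3, min_samples=5):
      """Recursive partitioning: each node stores the mean of y as its prediction."""
      node = {"prediction": float(np.mean(y))}
      if depth >= max_depth or len(y) < min_samples:
          return node                                   # leaf: stop splitting
      best = None
      for j in range(X.shape[1]):                       # most discriminating variable...
          for t in np.unique(X[:, j]):                  # ...and threshold
              left = X[:, j] <= t
              if left.all() or not left.any():
                  continue
              # homogeneity of the two sub-populations: weighted within-group variance
              score = left.sum() * np.var(y[left]) + (~left).sum() * np.var(y[~left])
              if best is None or score < best[0]:
                  best = (score, j, t, left)
      if best is None:
          return node
      _, j, t, left = best
      node["feature"], node["threshold"] = j, float(t)
      node["left"] = build_tree(X[left], y[left], depth + 1, max_depth, min_samples)
      node["right"] = build_tree(X[~left], y[~left], depth + 1, max_depth, min_samples)
      return node

For example, build_tree(X_train.to_numpy()[:200], y_train.to_numpy()[:200]) returns the fitted tree as a nested dictionary of splits and leaf predictions (this brute-force search over all thresholds is only practical on small samples).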

Each leaf is characterized by a specific path through the tree, called a rule. The set of rules for all the leaves is the model. The interpretation of a rule is easy if we obtain pure leaves. Otherwise, one must rely on the empirical distribution of the variable to be predicted at each node of the tree.
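
With scikit-learn, these rules can be printed directly from the fitted tree, one indented line per split and one per leaf (reusing the reg and df objects from the code above):

  from sklearn.tree import export_text

  # One root-to-leaf path = one rule of the model
  print(export_text(reg, feature_names=list(df.columns)))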

A decision tree can very quickly lead to overfitting, so the tree has to be pruned: the construction is stopped at an adequate number of leaves.
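
With scikit-learn, pre-pruning is controlled by parameters such as max_depth or min_samples_leaf (as in the code above), while post-pruning is available through minimal cost-complexity pruning. A possible sketch, reusing the training and test sets from above (for brevity the pruning strength is selected on the test set; in practice it should be chosen by cross-validation):

  from sklearn.tree import DecisionTreeRegressor
  from sklearn.metrics import mean_squared_error

  # Candidate pruning strengths (alphas) along the cost-complexity path
  path = DecisionTreeRegressor(random_state=0).cost_complexity_pruning_path(X_train, y_train)

  best_alpha, best_mse = 0.0, float("inf")
  for alpha in path.ccp_alphas[::50]:        # coarsely subsampled to keep the example fast
      pruned = DecisionTreeRegressor(random_state=0, ccp_alpha=alpha)
      pruned.fit(X_train, y_train)
      mse = mean_squared_error(y_test, pruned.predict(X_test))
      if mse < best_mse:
          best_alpha, best_mse = alpha, mse
  print("Best ccp_alpha:", best_alpha, "- test MSE:", best_mse)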

To build the tree, various questions arise:

  • How to choose the decision variable? We must define the tree structure by selecting the variables, from the most discriminating to the least discriminating.
  • How to deal with continuous variables?
  • How to define the size of the tree? The right balance must be found between a trivial tree and overfitting.

Three main algorithms exist to build such decision trees by answering the above questions: CART, C4.5 and CHAID. They proceed as follows:

  • Choice of the decision variable:
    • the $\chi^2$ measure of deviation from independence (with Tschuprow's $T$) for CHAID,
    • the Gini index as splitting criterion for CART,
    • entropy for C4.5 (the Gini and entropy criteria are illustrated numerically after this list).
  • Tree size adjustment:
    • post-pruning for CART and C4.5: the fully grown (purest) tree is built first, then a criterion is used to compare trees of different sizes,
    • pre-pruning for CHAID: a stopping rule is set to stop the construction.
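
As a small numerical illustration, the Gini index and the entropy of a node can be computed from its class proportions; both are zero for a pure node and maximal for a perfectly mixed one:

  import numpy as np

  def gini(p):
      """Gini impurity: 1 - sum_i p_i^2 (0 for a pure node)."""
      p = np.asarray(p)
      return 1.0 - np.sum(p ** 2)

  def entropy(p):
      """Shannon entropy: -sum_i p_i log2(p_i) (0 for a pure node)."""
      p = np.asarray(p)
      p = p[p > 0]
      return -np.sum(p * np.log2(p))

  # A pure node versus a perfectly mixed binary node
  print(gini([1.0, 0.0]), entropy([1.0, 0.0]))    # 0.0 0.0
  print(gini([0.5, 0.5]), entropy([0.5, 0.5]))    # 0.5 1.0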

In practice, decision trees are mostly used as weak learners within ensemble methods.
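
For example, a random forest averages many such trees, each grown on a bootstrap sample of the individuals; a minimal sketch reusing the training and test sets defined above:

  from math import sqrt
  from sklearn.ensemble import RandomForestRegressor
  from sklearn.metrics import mean_squared_error

  # Ensemble of decision trees: bagging plus random feature subsets at each split
  forest = RandomForestRegressor(n_estimators=100, random_state=0)
  forest.fit(X_train, y_train)
  print("Forest validation RMSE:", sqrt(mean_squared_error(y_test, forest.predict(X_test))))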
