
Decision Tree


Python code

  import matplotlib.pyplot as plt
  from math import sqrt

  from sklearn import tree
  from sklearn.tree import DecisionTreeRegressor
  from sklearn.metrics import mean_absolute_error, mean_squared_error

  # Figure size for the tree plot below
  plt.rcParams['figure.figsize'] = (18, 8)

  # X_train, X_test, y_train, y_test and the DataFrame df are assumed to
  # have been prepared beforehand (e.g. with train_test_split).
  reg = DecisionTreeRegressor(max_depth=5)
  reg.fit(X_train, y_train)
  y_test_pred = reg.predict(X_test)

  test_error = mean_squared_error(y_test, y_test_pred)
  print("  - Validation RMSE:", sqrt(test_error))
  print("  - Validation MAE:", mean_absolute_error(y_test, y_test_pred))

  # Visualize the fitted tree
  tree.plot_tree(reg, filled=True, feature_names=list(df.columns))
  plt.show()

Presentation

Decision trees, which can be used for both regression and classification, make it possible to explain a target value from a set of discrete or continuous variables. They are quite powerful methods, non-parametric and non-linear, which consist in:

  • partitioning the individuals into groups that are as homogeneous as possible with respect to the variable to be predicted;
  • taking into account a hierarchy of the predictive power of the variables considered.

This hierarchy makes it possible to visualize the results as a tree and to build explicit predictive rules.

Several iterations are necessary. At each iteration:

  • the individuals are divided into k (= 2) classes in order to explain the output variable;
  • the first division is obtained by choosing the explanatory variable that best separates the individuals (a sketch of this choice is given after the list);
  • this division defines sub-populations, represented by the nodes of the tree;
  • each node is associated with a proportion measure, which describes class membership or the distribution of the output variable;
  • the operation is repeated for each sub-population until no further separation is possible.
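
For a regression target, "best separates" can be read as the split that most reduces the within-group variance. The sketch below (a minimal illustration, not the scikit-learn implementation) scans candidate thresholds on a single continuous variable and keeps the one that minimizes the weighted within-group variance; the variable and data are made up for the example.

  import numpy as np

  def best_split(x, y):
      """Threshold on x minimising the weighted within-group variance of y
      (one binary split, k = 2)."""
      order = np.argsort(x)
      x, y = x[order], y[order]
      best_t, best_score = None, np.inf
      # candidate thresholds: midpoints between consecutive distinct values
      for i in range(1, len(x)):
          if x[i] == x[i - 1]:
              continue
          t = (x[i] + x[i - 1]) / 2
          left, right = y[:i], y[i:]
          score = len(left) * left.var() + len(right) * right.var()
          if score < best_score:
              best_t, best_score = t, score
      return best_t

  # Toy data with a change point at x = 4: the chosen threshold is close to it
  rng = np.random.default_rng(0)
  x = rng.uniform(0, 10, 200)
  y = np.where(x < 4, 1.0, 3.0) + rng.normal(0, 0.2, 200)
  print(best_split(x, y))

The same scan, applied to every explanatory variable at every node, also shows how continuous variables are handled: through threshold splits.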

Each leaf is characterized by a specific path through the tree, called a rule. The set of rules over all the leaves is the model. A rule is easy to interpret when the leaf is pure; otherwise, one must rely on the empirical distribution of the variable to be predicted at each node of the tree.
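
With scikit-learn, these rules can be printed directly from the fitted tree. A short sketch, assuming the regressor reg and the DataFrame df from the code above:

  from sklearn.tree import export_text

  # One indented line per node; each root-to-leaf path is a rule of the model
  print(export_text(reg, feature_names=list(df.columns)))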

A decision tree can very quickly lead to overfitting, so it is necessary to prune the tree: stop at an adequate number of leaves when building it.
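
In scikit-learn, pruning can be done either by limiting the tree while it grows (max_depth, min_samples_leaf, ...) or by cost-complexity post-pruning (ccp_alpha). A possible sketch, assuming the same X_train, y_train, X_test and y_test as above:

  from sklearn.tree import DecisionTreeRegressor

  # Pre-pruning: bound the size of the tree while it is being built
  reg_small = DecisionTreeRegressor(max_depth=3, min_samples_leaf=20)
  reg_small.fit(X_train, y_train)

  # Post-pruning: grow the full tree, then compare the sub-trees obtained
  # along the cost-complexity path (larger ccp_alpha -> smaller tree)
  path = DecisionTreeRegressor().cost_complexity_pruning_path(X_train, y_train)
  for alpha in path.ccp_alphas:
      pruned = DecisionTreeRegressor(ccp_alpha=alpha).fit(X_train, y_train)
      print(alpha, pruned.get_n_leaves(), pruned.score(X_test, y_test))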

To build the tree, various questions arise:

  • How to choose the decision variable? We must define the tree structure by selecting the variables, from the most discriminating to the least discriminating.
  • How to deal with continuous variables?
  • How to choose the size of the tree? The right balance between a trivial tree and overfitting...

Three main algorithms exist to build such decision trees by answering the above questions: CART, C4.5 and CHAID. They proceed as follows:

  • Choice of the decision variable:
    • the $\chi^2$ measure of the deviation from independence and Tschuprow's $T$ for CHAID,
    • the Gini index (the splitting criterion) for CART,
    • entropy for C4.5 (a sketch of the last two measures is given after this list).
  • Adjustment of the tree size:
    • post-pruning for CART and C4.5: the purest (fully grown) tree is built first, then a criterion is used to compare trees of different sizes;
    • pre-pruning for CHAID: a stopping rule halts the construction.
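
For a node with class proportions p, the Gini index and the entropy can be computed as follows (a minimal sketch of the two impurity measures, not of the full algorithms):

  import numpy as np

  def gini(p):
      """Gini impurity of a vector of class proportions (CART criterion)."""
      p = np.asarray(p, dtype=float)
      return 1.0 - np.sum(p ** 2)

  def entropy(p):
      """Shannon entropy of a vector of class proportions (C4.5 criterion)."""
      p = np.asarray(p, dtype=float)
      p = p[p > 0]
      return -np.sum(p * np.log2(p))

  print(gini([0.5, 0.5]), entropy([0.5, 0.5]))   # maximally impure node: 0.5, 1.0
  print(gini([1.0, 0.0]), entropy([1.0, 0.0]))   # pure node: 0.0, 0.0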

In practice, decision trees are mostly used as weak learners within ensemble methods.
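
For instance, with scikit-learn, trees serve as base learners for bagging (random forests) or boosting. A sketch, reusing the X_train/y_train and X_test/y_test split assumed above:

  from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor

  # Bagging of deep trees (random forest) versus boosting of shallow trees
  rf = RandomForestRegressor(n_estimators=200).fit(X_train, y_train)
  gb = GradientBoostingRegressor(max_depth=3).fit(X_train, y_train)
  print("Random forest R^2:", rf.score(X_test, y_test))
  print("Boosting R^2:", gb.score(X_test, y_test))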
