
Supervised learning


Presentation

Supervised learning consists of associating a label with measures:

  1. discrete labels: classification
  2. continuous labels: regression

Classification can be done, for example, with KNN, SVM and decision trees, while regression can be done with linear regression, SVM, and random forests (see below). Finally, regressions can be univariate or multivariate.

Their quality is measured:

  • traditionally, by the root mean square error (RMSE), corresponding to the Euclidean norm (norm $\ell_2$);
  • but also, by the mean absolute error (MAE), corresponding to the Manhattan norm (norm $\ell_1$), useful when there are many extreme values;
  • or, less frequently, the norm $\ell_k$, with $\ell_0$ for the number of non-zero elements and $\ell_\infty$ for the largest of the absolute values.

Note that:

  • the higher the index of the norm, the more importance it gives to large values while neglecting small ones;
  • the RMSE is therefore more sensitive to outliers (values far from the rest) than the MAE;
  • when outliers are exponentially rare (as in the case of a Gaussian), the RMSE works very well and is generally preferred;
  • sklearn.metrics implements these metrics, as shown below.
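
For example, these error measures can be computed as follows (a minimal sketch; the y_true and y_pred values are synthetic, chosen only for illustration):

  >>> import numpy as np
  >>> from sklearn.metrics import mean_absolute_error, mean_squared_error
  >>> y_true = np.array([3.0, 5.0, 2.5, 7.0])      # known values
  >>> y_pred = np.array([2.5, 5.0, 4.0, 8.0])      # predicted values
  >>> mean_absolute_error(y_true, y_pred)          # MAE (norm l1)
  >>> np.sqrt(mean_squared_error(y_true, y_pred))  # RMSE (norm l2)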

Example of regression: Determine the distance of vehicles according to the brightness of their headlights.

Regressions

Regression models

Generally speaking, a "regression model" is any model designed to predict the value of a variable, called the "variable to be explained", as a function of one or more other variables, called "explanatory variables". Various types of regression models exist: linear, logistic, ordered logistic...

Linear regression

Presentation

Linear regression is the oldest and most common regression model, dating back to the 1750s, and is used when the variable to be explained is quantitative.

It is based on the assumption that the variable to be explained is equal to a linear combination of the explanatory variables, plus unexplained variations called noise (or errors, or residuals). The mathematical formulation of this model is therefore:

$Y_i = \beta_0 + \beta_1 X_{i,1} + \beta_2 X_{i,2} + ... + \beta_p X_{i,p} + \epsilon_i$

where:

  • $Y_i$ is the value of the variable to be explained for the subject $i$,
  • $X_{i,1} ... X_{i,p}$ are the values of the explanatory variables for the subject $i$, and
  • $\epsilon_i$ is the noise associated with the subject $i$.

$\beta_0, \beta_1, ..., \beta_p$ are therefore the parameters of the model, to be determined.

The following matrix writing can also be used:

$Y = X \beta + \epsilon$

where:

  • $Y$ is the vector of size $n$ which represents the values of the variable to be explained for all subjects,
  • $\beta = (\beta_0, \beta_1, ..., \beta_p)$ is the vector of the parameters to be estimated (so it is a vector of size $p+1$),
  • $X$ is the $n \times (p+1)$ matrix whose rows represent the subjects and whose columns represent the explanatory variables, the first column consisting only of 1s in order to include the constant term (i.e. $\beta_0$) in the model.

In general, it is assumed that the noise $\epsilon$ follows a centered normal distribution $N(0, \sigma^2 I)$, in which the variance $\sigma^2$ is to be determined. In this case, maximizing the likelihood of the model is equivalent to minimizing the sum of the squares of the components of $\epsilon$ (also called the sum of squared errors). The least squares method is then used to estimate the model parameters.
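
Under this assumption, the least squares estimate has the closed form $\hat{\beta} = (X^T X)^{-1} X^T Y$. A minimal sketch with numpy (the data below is synthetic, chosen only for illustration):

  >>> import numpy as np
  >>> rng = np.random.default_rng(0)
  >>> X = np.column_stack([np.ones(100), rng.normal(size=(100, 2))])  # first column of 1s for beta_0
  >>> beta = np.array([1.0, 2.0, -0.5])                               # "true" parameters
  >>> Y = X @ beta + rng.normal(scale=0.1, size=100)                  # add the noise epsilon
  >>> beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)                # least squares estimate of beta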

Linear regression with sklearn

  >>> from sklearn.linear_model import LinearRegression
  >>> lin_reg_model = LinearRegression()
  >>> lin_reg_model.fit(X, y)        # X: explanatory variables, y: variable to be explained
  >>> lin_reg_model.predict(X_new)   # predictions for new subjects

Logistic regression

Logistic regression is a regression model that applies when the variable to be explained is binary (sick or healthy individual, living or deceased, etc.).

The main hypothesis of logistic regression is that the state of the variable to be explained $Y$ depends on an unobserved continuous variable $Y^*$, also called a "latent trait". A linear regression can then be applied to this latent trait:

$Y_i^* = \beta_0 + \beta_1 X_{i,1} + \beta_2 X_{i,2} + ... + \beta_p X_{i,p} + \epsilon_i$.

Here $\epsilon$ is assumed to follow a standard logistic distribution: an approximation of the normal distribution that has the advantage of a cumulative distribution function with an explicit expression.

The logistic regression assumption is that $Y_i = 0 \Leftrightarrow Y_i^* < 0$ (and therefore $Y_i = 1 \Leftrightarrow Y_i^* \geq 0$). It follows that

$P(Y_i = 1) = \Phi(\beta_0 + \beta_1 X_{i,1} + \beta_2 X_{i,2} + ... + \beta_p X_{i,p})$

where $\Phi : x \rightarrow \frac{1}{1 + e^{-x}}$ is the cumulative distribution function of the standard logistic distribution. In other words, the probability that $Y_i$ equals 1 increases with $\beta_0 + \beta_1 X_{i,1} + \beta_2 X_{i,2} + ... + \beta_p X_{i,p}$, also called the linear predictor.
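
A logistic regression can be fitted with scikit-learn in the same way as a linear regression (a minimal sketch, assuming X, y and X_new are defined as in the previous example, with y containing 0/1 labels):

  >>> from sklearn.linear_model import LogisticRegression
  >>> log_reg_model = LogisticRegression()
  >>> log_reg_model.fit(X, y)              # y: binary labels (0 or 1)
  >>> log_reg_model.predict_proba(X_new)   # second column gives P(Y_i = 1)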

Some machine learning models

Decision trees

These are trees in which, at each node, a question is asked in order to split the remaining set of solutions into two disjoint parts that are as balanced as possible. The process is repeated recursively until a single solution remains, following a dichotomy (median) principle.

In machine learning, decision trees are built automatically by the algorithm:

  • each internal node describes a test on a learning parameter,
  • each branch represents a test result,
  • each leaf contains the value of the target variable: a class label (classification) or a numerical value (regression).

The algorithm is all the more effective as it finds, at each node, the parameter that best splits the remaining data.

Note: the order in which the predictor parameters are selected influences the result.
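
A minimal decision-tree sketch with scikit-learn (assuming X, y and X_new are defined as in the earlier examples):

  >>> from sklearn.tree import DecisionTreeClassifier
  >>> tree_model = DecisionTreeClassifier(max_depth=3)   # limiting the depth keeps the tree readable
  >>> tree_model.fit(X, y)
  >>> tree_model.predict(X_new)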

Random forests

Random forests consist of training multiple decision trees, each on subsets of the data that are as independent as possible.

This approach mitigates several problems of single decision trees, such as the impact of the order of the predictor parameters, or their complexity.
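
A minimal random-forest sketch with scikit-learn (same assumptions on X, y and X_new):

  >>> from sklearn.ensemble import RandomForestClassifier
  >>> forest_model = RandomForestClassifier(n_estimators=100)  # 100 trees, each trained on a bootstrap sample
  >>> forest_model.fit(X, y)
  >>> forest_model.predict(X_new)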

K nearest neighbours (KNN)

The KNN algorithm searches for the K nearest neighbours (by computing distances) of the point to be predicted among the known data. For classification it returns the majority class among those neighbours; for regression it returns the average of their values.

Relatively simple, KNN computes nothing during the learning phase (it simply stores the data). However, it is not suitable for large datasets.

  >>> from sklearn.neighbors import KNeighborsRegressor
  >>> knn_model = KNeighborsRegressor(n_neighbors=3)   # average over the 3 nearest neighbours
  >>> knn_model.fit(X, y)
  >>> knn_model.predict(X_new)

Support vector machines (SVM)

This is an extension of linear models, suited to data whose classes have more complex (non-linear) separations. These machines can work with data that have a large number of parameters.
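
A minimal SVM sketch with scikit-learn (same assumptions on X, y and X_new; the RBF kernel handles non-linear separations):

  >>> from sklearn.svm import SVC
  >>> svm_model = SVC(kernel='rbf')   # classification; sklearn.svm.SVR plays the same role for regression
  >>> svm_model.fit(X, y)
  >>> svm_model.predict(X_new)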

Neural networks

There are several types of neural networks.

  • Auto-encoders: Learn to reconstruct their input through a compressed representation, which lets them recognize patterns and imitate them.
  • Convolutional neural networks: For image processing.
  • Recurrent neural networks: Analysis of sequences of arbitrary size: time series, natural languages.

Example of a perceptron with scikit-learn

  >>> from sklearn.linear_model import Perceptron
  >>> per = Perceptron()
  >>> per.fit(X, y)                      # X, y: two np.arrays
  >>> y_pred = per.predict([[2, 0.5]])
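
Note that Perceptron is a single-layer linear classifier. For an actual multilayer perceptron, scikit-learn provides MLPClassifier; a minimal sketch (the hidden layer size below is an arbitrary choice for illustration):

  >>> from sklearn.neural_network import MLPClassifier
  >>> mlp = MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000)  # one hidden layer of 10 neurons
  >>> mlp.fit(X, y)
  >>> y_pred = mlp.predict([[2, 0.5]])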
