Machine learning can be supervised, unsupervised, semi-supervised, or based on reinforcement learning.
Humans are good at unsupervised learning, but machines have more trouble with it.
Learning can also be done online or in batches (batch learning).
For a batch-learning system to take new data into account, a new version of the system must be trained on the entire dataset. This can be done periodically, e.g. once a week, replacing the old system with the newly trained one. Batch learning requires time and computing resources, which are not always available.
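As a rough illustration, here is a minimal sketch of periodic batch retraining with a scikit-learn-style estimator; load_full_dataset is a hypothetical helper returning the complete, updated dataset.

```python
# Minimal sketch of periodic batch retraining (load_full_dataset is hypothetical).
from sklearn.linear_model import LinearRegression

def retrain_batch_model(load_full_dataset):
    """Train a fresh model on the entire (updated) dataset."""
    X, y = load_full_dataset()      # the full dataset, including the new data
    new_model = LinearRegression()
    new_model.fit(X, y)             # batch learning: one training run over everything
    return new_model                # the old model is then swapped out for this one
```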
Online learning (which, despite its name, is often done offline) must be able to adapt and evolve quickly and autonomously. It does not require storing the dataset: it can operate on a stream of data and perform out-of-core learning on data too large to fit in memory. It depends on a learning rate, the rate at which it adapts to changing data. With a high learning rate, the system adapts quickly to new data... and risks forgetting the old data just as quickly.
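A minimal sketch of online, out-of-core learning with scikit-learn's SGDRegressor; the generator of small batches stands in for a real data stream, and eta0 plays the role of the learning rate (the synthetic data are an assumption for illustration).

```python
# Minimal sketch of online (incremental) learning on a stream of small batches.
import numpy as np
from sklearn.linear_model import SGDRegressor

model = SGDRegressor(learning_rate="constant", eta0=0.01)  # eta0: the learning rate

rng = np.random.default_rng(0)
def stream_of_batches(n_batches=100, batch_size=32):
    """Hypothetical data stream: yields small (X, y) chunks, as if read from disk."""
    for _ in range(n_batches):
        X = rng.normal(size=(batch_size, 3))
        y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=batch_size)
        yield X, y

for X_batch, y_batch in stream_of_batches():
    model.partial_fit(X_batch, y_batch)   # incremental update, no full dataset in memory
```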
One of the major difficulties of online learning is that if bad data is fed into the system, its performance gradually degrades. A possible solution: run an anomaly-detection algorithm on the input data.
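One possible way to implement such a guard, sketched here with scikit-learn's IsolationForest fitted on trusted historical inputs (the data and the filtering rule are illustrative assumptions):

```python
# Minimal sketch: filter anomalous inputs before letting the online learner see them.
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(1)
X_clean = rng.normal(size=(1000, 3))                    # trusted historical inputs
detector = IsolationForest(random_state=0).fit(X_clean)

model = SGDRegressor(learning_rate="constant", eta0=0.01)

def update_if_normal(X_batch, y_batch):
    """Only learn from observations the detector considers normal (prediction +1)."""
    mask = detector.predict(X_batch) == 1
    if mask.any():
        model.partial_fit(X_batch[mask], y_batch[mask])
```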
Most machine learning tasks involve making predictions: from a number of training examples, the system must be able to generalize to examples it has never seen before.
Very different machine learning algorithms, some of them rather simple, give equally good results on complex problems such as natural-language disambiguation, provided there is a very large amount of data: this is the "unreasonable effectiveness of data".
In short: data is more important than algorithms for complex problems.
There may be sampling noise when the sample is too small, and sampling bias when the sampling method is flawed (which can happen even with very large samples, e.g. non-response bias).
Take the time to clean up your data: delete or correct outliers, and handle missing data (drop the variable, ignore the observation, or fill in the missing values).
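A minimal sketch of these cleaning options with pandas, on a small made-up DataFrame (the column names and the outlier rule are illustrative assumptions):

```python
# Minimal sketch of data cleaning: outliers and missing values.
import numpy as np
import pandas as pd

df = pd.DataFrame({"income": [1200, 1500, np.nan, 990000, 1300],
                   "age":    [34, 28, 45, 31, np.nan]})

# 1. Correct or drop outliers (here: treat incomes beyond the 99th percentile as errors).
cap = df["income"].quantile(0.99)
df = df[df["income"].isna() | (df["income"] <= cap)]

# 2. Handle missing data: drop the variable ...
df_no_age = df.drop(columns=["age"])
# ... or ignore the observations ...
df_complete = df.dropna()
# ... or fill in (impute) the missing values, e.g. with the median.
df_imputed = df.fillna(df.median(numeric_only=True))
```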
Garbage in, garbage out: the system can only learn if the training data contains enough relevant variables, and not too many irrelevant ones. Choosing a good set of variables to train on is called feature engineering.
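For example, a minimal feature-engineering sketch on a made-up housing table: derive a more informative variable and drop an irrelevant one (all column names are assumptions).

```python
# Minimal sketch of feature engineering: build a relevant feature, drop an irrelevant one.
import pandas as pd

df = pd.DataFrame({"surface_m2": [50, 80, 120],
                   "n_rooms":    [2, 3, 5],
                   "ad_id":      [101, 102, 103],    # identifier with no predictive value
                   "price":      [150000, 240000, 330000]})

# Create a derived feature that is more informative than the raw columns ...
df["m2_per_room"] = df["surface_m2"] / df["n_rooms"]
# ... and keep only the variables worth training on.
features = df.drop(columns=["ad_id", "price"])
target = df["price"]
```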
Do not overgeneralize (overfitting): the model may work well on the training data yet fail to generalize. Think of Lagrange interpolation: the curve passes through the interpolation points, but between them it can do anything.
Overfitting occurs when the model is too complex for the amount of training data and the noise it contains. We can then: simplify the model (fewer parameters or features, more regularization), gather more training data, or reduce the noise in the training data (fix errors, remove outliers).
In short, a good balance must be found between fitting the data perfectly and keeping the model simple enough to guarantee good generalization. The amount of regularization applied during training (e.g. constraining a linear regression to a small slope) can be controlled by a hyperparameter: a parameter of the learning algorithm (not of the model), which remains constant during training.
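A minimal sketch of this trade-off: a high-degree polynomial fitted on a few noisy points, first without regularization, then with Ridge regularization whose strength is set by the hyperparameter alpha (the data are synthetic and illustrative).

```python
# Minimal sketch: overfitting vs. regularization controlled by a hyperparameter (alpha).
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 3, size=(20, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=20)

overfit_model = make_pipeline(PolynomialFeatures(degree=15), LinearRegression())
regularized   = make_pipeline(PolynomialFeatures(degree=15), Ridge(alpha=1.0))

overfit_model.fit(X, y)   # passes (almost) through every point, wiggles in between
regularized.fit(X, y)     # the penalty keeps the coefficients, and the curve, tame
```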
Underfitting is the opposite of overfitting: the model is too simple to discover the underlying structure of the data. We can then: select a more powerful model (with more parameters), feed the algorithm better features, or reduce the constraints on the model (e.g. lower the regularization hyperparameter).
In short, the model must not be too simple or too complex.
The data must be split into a training set (typically 80%) and a test set (20%).
The generalization error is measured on the test set: if it is high while the training error is low, the model is overfitting.
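A minimal sketch of this diagnosis on synthetic data: an 80/20 split, an intentionally over-complex model, and a comparison of training and test errors.

```python
# Minimal sketch: 80/20 split and the overfitting signature (low train error, high test error).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 3, size=(30, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=30)

# 80% training set, 20% test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = make_pipeline(PolynomialFeatures(degree=15), LinearRegression())
model.fit(X_train, y_train)

train_error = mean_squared_error(y_train, model.predict(X_train))
test_error  = mean_squared_error(y_test,  model.predict(X_test))
# A low training error combined with a much higher test error indicates overfitting.
print(f"train MSE: {train_error:.4f}   test MSE: {test_error:.4f}")
```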