Random forests, which can be used for both regression and classification, were proposed by Leo Breiman, an eminent American statistician known for his work on decision trees and on the CART method. Decision trees indeed have some well-known flaws: they tend to overfit, and they are unstable, since a small change in the training data can produce a very different tree.
To overcome these problems, we use several trees. And to avoid building identical trees, we add randomness: each tree gets only a partial, randomly drawn view of the problem.
More precisely, an ensemble of decision trees, each built on a random draw among the observations, is the tree bagging algorithm. Random forests add to tree bagging a random sampling of the variables: random forest = tree bagging + feature sampling.
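To make this decomposition concrete, here is a minimal sketch (not taken from Breiman's work) using scikit-learn; the dataset, the 100-tree ensembles and the sqrt(n_features) subset size are illustrative choices. Plain tree bagging wraps a decision tree in a bootstrap ensemble, and adding per-split feature sampling to those trees essentially reproduces what a random forest does directly.

```python
# Illustrative sketch of: random forest = tree bagging + feature sampling.
# Dataset, hyperparameters and random_state are arbitrary choices for the demo.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Tree bagging: many trees, each trained on a bootstrap sample of the observations.
tree_bagging = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=100, random_state=0
)

# Tree bagging + feature sampling: each split only considers a random subset
# of the variables (here sqrt of the number of features).
bagging_plus_features = BaggingClassifier(
    DecisionTreeClassifier(max_features="sqrt"), n_estimators=100, random_state=0
)

# A random forest bundles both ingredients directly.
random_forest = RandomForestClassifier(
    n_estimators=100, max_features="sqrt", random_state=0
)

for name, model in [
    ("tree bagging", tree_bagging),
    ("bagging + feature sampling", bagging_plus_features),
    ("random forest", random_forest),
]:
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {score:.3f}")
```

In scikit-learn the feature sampling is controlled by max_features and is applied at every split rather than once per tree; that per-split draw is what distinguishes a random forest from bagging trees that each see a single fixed subset of variables.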