Scikit-learn is a free Python library for machine learning. It is developed by many contributors, notably in academia, including French higher education and research institutions such as Inria and Télécom ParisTech.
The Scikit-learn API is consistent, with all objects sharing a small set of interfaces:
- Estimators are objects that estimate certain parameters from a dataset.
  - Estimation is performed by the fit() method, which takes a single dataset as its argument in unsupervised learning, and two in supervised learning (the second containing the labels).
  - Any other parameter needed to guide the estimation process is a hyperparameter: it is stored as an instance variable and is generally set through a constructor argument.
- Transformers are estimators that can also transform a dataset.
  - The transformation is done, after a call to fit(), by the transform() method, which returns the transformed dataset.
  - The fit_transform() method does both in one call, sometimes in an optimized way.
- Predictors are estimators that can make predictions from a dataset (for example, the LinearRegression model).
  - Prediction is done by the predict() method, which takes a new dataset and returns the corresponding predictions.
  - A predictor also has a score() method, which uses a test set (and the labels, in the supervised case) to measure the quality of the predictions.
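The three roles listed above can be sketched in a few lines. This is only an illustration under stated assumptions: the toy data is made up, and SimpleImputer is chosen here as an example transformer (it is not mentioned in the text above); LinearRegression plays the predictor.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression

# Made-up toy data following y = 2*x + 1, with one missing feature value.
X = np.array([[1.0], [2.0], [np.nan], [4.0], [5.0]])
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])

# Estimator and transformer: the hyperparameter `strategy` is passed to the constructor.
imputer = SimpleImputer(strategy="median")
imputer.fit(X)                   # unsupervised: only the dataset is passed
X_filled = imputer.transform(X)  # returns the transformed dataset

# fit_transform() performs both steps in a single call.
X_filled_again = SimpleImputer(strategy="median").fit_transform(X)

# Predictor: supervised, so fit() receives the dataset and the labels.
model = LinearRegression()
model.fit(X_filled, y)

# predict() takes new data; score() evaluates on a (made-up) test set.
X_test = np.array([[6.0], [7.0]])
y_test = np.array([13.0, 15.0])
predictions = model.predict(X_test)
r2 = model.score(X_test, y_test)  # R^2 for regressors
```

For regressors such as LinearRegression, score() returns the coefficient of determination; classifiers return the mean accuracy instead.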
All of an estimator's hyperparameters are directly accessible as public instance variables, and so are the parameters it has learned, through public instance variables whose names end with an underscore (for example, coef_).
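As a minimal sketch of this inspection, assuming a LinearRegression model fitted on made-up data, the hyperparameters can be read before fitting and the learned parameters afterwards:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data following y = 2*x + 1.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

model = LinearRegression(fit_intercept=True)

# Hyperparameters: available as attributes, or all at once via get_params().
hyperparams = model.get_params()
uses_intercept = model.fit_intercept

# Learned parameters only exist after fit(); their names end with an underscore.
model.fit(X, y)
slope = model.coef_
intercept = model.intercept_
```

Accessing coef_ before calling fit() raises an AttributeError, which is a convenient way to tell learned parameters from hyperparameters.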
Datasets are represented as NumPy arrays or SciPy sparse matrices. Hyperparameters are plain Python strings or numbers: sklearn does not use exotic classes.
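The sketch below passes the same made-up data to a transformer once as a NumPy array and once as a SciPy sparse matrix. MaxAbsScaler is an assumption chosen for illustration because it accepts both input types; not every estimator supports sparse input.

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import MaxAbsScaler

# Made-up data, mostly zeros, in dense and sparse form.
X_dense = np.array([[0.0, 2.0], [4.0, 0.0], [0.0, -1.0]])
X_sparse = sparse.csr_matrix(X_dense)

# The hyperparameter `copy` is an ordinary Python bool.
dense_out = MaxAbsScaler(copy=True).fit_transform(X_dense)
sparse_out = MaxAbsScaler(copy=True).fit_transform(X_sparse)

# Dense input gives a NumPy array; sparse input stays sparse.
same_values = np.allclose(sparse_out.toarray(), dense_out)
```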
Finally, Scikit-learn provides reasonable default values for most parameters, so a basic working system can be created quickly and easily.
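To illustrate the role of defaults, here is a small sketch that trains a classifier without setting any hyperparameter. The choice of KNeighborsClassifier and of the bundled iris dataset is an assumption made for this example only.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Small dataset bundled with the library; the split is fixed for reproducibility.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# No hyperparameter is given: the constructor defaults are used.
clf = KNeighborsClassifier()
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)  # mean accuracy on the held-out split
default_k = clf.n_neighbors           # default number of neighbors
```

Defaults give a quick baseline; tuning the hyperparameters is a separate step once the basic system works.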