Model | Task | Properties | Main sklearn models (bold = recommended) | Should scale features | Should scale target (regression only) | Multi-target/label | Deterministic | Has partial_fit (suitable for online learning) | Has predict_proba | Has feature_importances_ | Key sklearn hyperparameters | Typical loss function | To increase regularization (or similar effect) | Typical evaluation metric | Training complexity | Prediction complexity | Space complexity | Embarrassingly parallel | Supports NaNs | Restriction | Preference | Remarks |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Linear regression | Regression | Linear, deterministic regressor | linear_model.LinearRegression (no regularization), linear_model.Lasso (L1 regularization), linear_model.Ridge (L2 regularization), linear_model.ElasticNet (L1 and L2), linear_model.ElasticNetCV, linear_model.SGDRegressor | Yes, if regularization is applied | Yes, if regularization is applied | Yes | Yes | No (use SGDRegressor instead) | No | No (inspect coef_ instead, and only if the features are scaled) | alpha | Mean squared error | Increase alpha (usually a squared L2 and/or L1 penalty) | R², MSE, RMSE | O(p²n + p³) for n examples and p features | O(p) | O(p) | No | No | Linear or linearized relationships | | Check residuals for normality and homoscedasticity |
Logistic regression | Classification | Binary classifier (multiclass via one-vs-rest); determinism depends on the solver | linear_model.LogisticRegression, linear_model.SGDClassifier | Yes, if regularization is applied | Not applicable | No | Yes | No (use SGDClassifier with log loss instead) | Yes | No | penalty, C | Cross-entropy (aka log loss, logistic loss, deviance) | Decrease C (usually an L2 penalty), or increase alpha for SGDClassifier | Likelihood ratio, weighted F1 | O(np) | O(p) | O(p) | No | No | | | |
Ridge regression classification | Classification | Linear classifier that fits a ridge regression to ±1-encoded labels (multiclass via one-vs-rest) | linear_model.RidgeClassifier | Yes | Not applicable | No | Depends on solver | No | No | No | alpha | Squared error on the ±1-encoded labels | Increase alpha | Weighted F1 | O(p²n + p³) for n examples and p features | O(p) | O(p) | No | No | Continuous features | | |
k-nearest neighbours | Classification or Regression | Instance-based, non-parametric, multiclass classifier | neighbors.KNeighborsClassifier, neighbors.KNeighborsRegressor | Yes | No | Yes | Yes | No | Yes | No | n_neighbors | None | Increase n_neighbors | Weighted F1 (classification) or R², MSE, RMSE (regression) | O(1) for brute force, or O(pn log n) to build a k-d tree, for n examples and p features | O(npk) for brute force with k neighbours, or O(k log n) for a k-d tree | O(np) (the training data is stored) | No | No | Small datasets, low dimensionality, dense data | | Fast to train, slow to predict |
Linear support vector machine | Classification or Regression | Linear decision boundary, binary classifier (multiclass via one-vs-rest) | svm.SVC, svm.SVR, svm.LinearSVC, svm.LinearSVR, svm.NuSVC, svm.NuSVR, linear_model.SGDClassifier | Yes | Yes | No | Only if probability=False | No (use the SGD variants instead) | Only if probability=True | No | C (or alpha for SGDClassifier) | Hinge loss | Decrease C (squared L2 penalty), or increase alpha for SGDClassifier | Weighted F1 (classification) or R², MSE, RMSE (regression) | O(n²p) for n examples and p features | O(sp) for s support vectors and p features | O(sp) for s support vectors and p features | No | No | Linearly separable classes | | |
Nonlinear support vector machine | Classification or Regression | Kernel-based, non-linear decision boundary, binary classifier (multiclass via one-vs-rest) | svm.SVC, svm.SVR, svm.NuSVC, svm.NuSVR | Yes | Yes | No | Only if probability=False | No (use the SGD variants instead) | Only if probability=True | No | kernel, gamma, C | Hinge loss | Decrease C (squared L2 penalty) | Weighted F1 (classification) or R², MSE, RMSE (regression) | O(n²p + n³) for n examples and p features | O(sp) for s support vectors and p features | O(sp) for s support vectors and p features | No | No | | | |
Decision tree | Classification or Regression | Non-parametric, multiclass classifier | tree.DecisionTreeClassifier, tree.DecisionTreeRegressor | No | No | Yes | Yes | No | Yes | Yes | max_features, max_depth, min_samples_leaf, min_samples_split | Gini impurity (per split, not global, so not strictly a loss function) | Decrease max_depth or max_features, or increase min_samples_split or min_samples_leaf | Weighted F1 (classification) or R², MSE, RMSE (regression) | O(nzp) for n examples, p features, and maximum depth z | O(z) for maximum depth z | O(z) | Yes | No | | | Prone to overfitting |
Random forest | Classification or Regression | Stochastic, ensemble-based multiclass classifier | ensemble.RandomForestClassifier, ensemble.RandomForestRegressor | No | No | Yes | No | No | Yes | Yes | n_estimators, max_features, max_depth, min_samples_leaf, min_samples_split | Gini impurity (per split, not global, so not strictly a loss function) | Decrease max_depth or max_features, or increase min_samples_split or min_samples_leaf | Weighted F1 (classification) or R², MSE, RMSE (regression) | O(nzpt) for n examples, p features, maximum depth z, and t trees | O(zt) | O(zt) | Yes | No | | | |
Extremely randomized trees | Classification or Regression | Stochastic, ensemble multiclass classifier (ExtraTrees is to ExtraTree as RandomForest is to DecisionTree) | ensemble.ExtraTreesClassifier, ensemble.ExtraTreesRegressor | No | No | Yes | No | No | Yes | Yes | n_estimators, max_features, max_depth, min_samples_leaf, min_samples_split | Gini impurity (per split, not global, so not strictly a loss function) | Decrease max_depth or max_features, or increase min_samples_split or min_samples_leaf | Weighted F1 (classification) or R², MSE, RMSE (regression) | O(nzpt) for n examples, p features, maximum depth z, and t trees | O(zt) | O(zt) | Yes | No | | | |
Gradient boosted trees | Classification or Regression | Stochastic ensemble of sequentially boosted trees, multiclass classifier | ensemble.GradientBoostingClassifier, ensemble.GradientBoostingRegressor, ensemble.HistGradientBoostingClassifier, ensemble.HistGradientBoostingRegressor | No | No | No | No | No | Yes | Yes | n_estimators, learning_rate, max_features, max_depth, min_samples_leaf, min_samples_split | Cross-entropy (classification) or squared error (regression) | Decrease max_depth, max_features, or learning_rate, or increase min_samples_split or min_samples_leaf | Weighted F1 (classification) or R², MSE, RMSE (regression) | O(nzpt) for n examples, p features, maximum depth z, and t trees | O(zt) | O(zt) | No (boosting rounds are sequential) | Yes (histogram-based variants only) | | | |
Gaussian process | Classification or Regression | Non-parametric, probabilistic classifier/regressor | gaussian_process.GaussianProcessClassifier, gaussian_process.GaussianProcessRegressor (not including the kernels) | Sensitivity to scaling depends on the kernel (RBF is sensitive); ensure inputs are approximately Gaussian | No | No | No | No | Yes | No | kernel, alpha | NA (fitting maximizes the log marginal likelihood) | Increase the kernel length scale, or increase the noise level alpha | Weighted F1 (classification) or R², MSE, RMSE (regression) | O(n³) for n examples | O(n²) for n examples | O(n²) for n examples | No | No | | | |
Naive Bayes | Classification | Probabilistic, maximum a posteriori classifier with a strong assumption of feature independence (the naivety) | naive_bayes.GaussianNB, naive_bayes.MultinomialNB (e.g. for text), naive_bayes.BernoulliNB (binary features), naive_bayes.CategoricalNB (discrete features), naive_bayes.ComplementNB (for imbalanced data) | No | Not applicable | No | Yes | Yes | Yes | No | priors | NA | Increase the smoothing parameter alpha, or strengthen the priors | Classification metrics, e.g. weighted F1 | O(np) | O(cp) for c classes and p parameters | O(p) | No | No | Independent features | | Often used in NLP |
Multilayer perceptron | Classification or Regression | Deep feedforward artificial neural network | neural_network.MLPClassifier, neural_network.MLPRegressor | Yes | Yes | Yes | No | Yes | Yes | No | hidden_layer_sizes, activation, alpha, learning_rate_init, max_iter | Cross-entropy (classification) or squared error (regression) | Increase alpha (L2 penalty) | Weighted F1 (classification) or R², MSE, RMSE (regression) | 😬 (roughly O(enp) for e epochs, n examples, and p weights) | O(p) for p parameters (weights) | O(p) | No | No | | | |
Stochastic gradient descent | Classification or Regression | Regularized linear models fitted with stochastic gradient descent; by default the classifier fits a linear SVM and the regressor an L2-regularized linear model | linear_model.SGDClassifier, linear_model.SGDRegressor | Yes | Yes | Yes | No | Yes | Only for loss="log_loss" or "modified_huber" | No | loss, alpha, penalty | Depends on loss; the classifier defaults to hinge loss and the regressor to squared error | Increase alpha | Weighted F1 (classification) or R², MSE, RMSE (regression) | O(knp) for k iterations over n examples with p features | O(p) | O(p) | | | | | |
Convolutional neural network | Feature extraction + classification | Deep neural network with convolutional layers | None (not available in sklearn) | Yes | Not applicable | Yes | No | NA | NA | NA | NA | Cross-entropy (classification) or squared error (regression) | | | O(n·L^d·c·k^d) per layer, for n examples of length L in each of d dimensions and c kernels of length k in each of d dimensions | O(L^d·c·k^d) per layer, per example | O(p) overall, i.e. O(c·k^d) per layer | Yes | No | Spatially correlated data | | Often coupled to an MLP for classification |
Restricted Boltzmann machine | Feature extraction | Stochastic, generative neural network with binary units | neural_network.BernoulliRBM | Yes | Not applicable (unsupervised) | | | Yes | | | n_components, learning_rate | | | | | | | No | No | | | |
Principal component analysis | Dimensionality reduction | Linear projection onto the directions of maximal variance | decomposition.PCA | Yes | Not applicable (unsupervised) | | Yes | Yes (via decomposition.IncrementalPCA.partial_fit) | | | n_components | Rayleigh quotient (maximized) | | | O(2nd² + d³ + n + nd) for n examples and d features, using truncated SVD | O(np + p²) | | No | No | | | |
Independent component analysis | Dimensionality reduction | Linear decomposition into statistically independent components | decomposition.FastICA | | Not applicable (unsupervised) | | Yes | | | | n_components | | Sparsity penalty (L1) | | | | | No | No | | | |
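A few short usage sketches follow, illustrating API behaviours referenced in the table. All of them run on synthetic data invented purely for illustration.

First, the linear-regression rows say to scale features *and* target once a penalty (alpha) is applied. A minimal sketch of that, assuming Ridge as the regularized model and StandardScaler for both scalings:

```python
# Sketch: regularized linear regression with feature and target scaling,
# as the table recommends when a penalty (alpha) is applied.
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                      # synthetic features
y = X @ np.array([1.5, -2.0, 0.0, 0.5, 3.0]) + rng.normal(scale=0.1, size=200)

# Scale features inside the pipeline; wrap the whole thing to scale the target too.
model = TransformedTargetRegressor(
    regressor=make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
    transformer=StandardScaler(),
)
model.fit(X, y)
print(model.predict(X[:3]))                        # predictions on the original y scale
```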
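The logistic-regression row points to SGDClassifier with log loss for online learning, since LogisticRegression itself has no partial_fit. A sketch of that incremental route (loss="log_loss" assumes sklearn ≥ 1.1; older versions spell it "log"):

```python
# Sketch: online learning via partial_fit on a stream of mini-batches.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
clf = SGDClassifier(loss="log_loss", alpha=1e-4)   # logistic loss => probabilistic
classes = np.array([0, 1])                         # must be declared up front

for _ in range(10):                                # pretend these arrive over time
    X_batch = rng.normal(size=(32, 4))
    y_batch = (X_batch[:, 0] + X_batch[:, 1] > 0).astype(int)
    clf.partial_fit(X_batch, y_batch, classes=classes)

print(clf.predict_proba(rng.normal(size=(2, 4))))  # available because loss is log_loss
```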
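The SVM rows note that predict_proba exists only with probability=True, and that this switch also costs determinism (an internal cross-validated calibration is fitted). A sketch with an RBF kernel and the feature scaling the table calls for:

```python
# Sketch: probability estimates from a kernel SVM require probability=True.
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=6, random_state=0)
clf = make_pipeline(
    StandardScaler(),                              # SVMs are scale-sensitive
    SVC(kernel="rbf", C=1.0, gamma="scale", probability=True, random_state=0),
)
clf.fit(X, y)
print(clf.predict_proba(X[:3]))                    # would raise without probability=True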
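For the tree-ensemble rows, a sketch showing the feature_importances_ attribute and the regularization knobs from the "to increase regularization" column; the dataset shape and hyperparameter values are arbitrary:

```python
# Sketch: a random forest regularized by depth and leaf-size constraints,
# with per-feature importances and no feature scaling needed.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=8, n_informative=3,
                           random_state=0)
forest = RandomForestClassifier(
    n_estimators=200,
    max_depth=6,                                   # shrink to regularize
    min_samples_leaf=5,                            # raise to regularize
    n_jobs=-1,                                     # trees fit embarrassingly in parallel
    random_state=0,
)
forest.fit(X, y)
print(forest.feature_importances_.round(3))
```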
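The boosting row marks the histogram-based variants as the table's only native NaN support. A sketch that deliberately punches holes in the inputs:

```python
# Sketch: HistGradientBoostingRegressor handles NaNs in its split finder,
# so no imputation step is required.
import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 4))
y = 2.0 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.1, size=400)
X[rng.random(X.shape) < 0.1] = np.nan              # knock out ~10% of entries

model = HistGradientBoostingRegressor(max_depth=4, random_state=0)
model.fit(X, y)                                    # trains despite the NaNs
print(model.predict(X[:3]))
```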
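For the Gaussian-process row, a sketch whose regularization knobs are exactly the two the table names, the kernel length scale and the noise term alpha; the O(n³) fit is kept tiny on purpose:

```python
# Sketch: GP regression with an RBF kernel; length_scale and alpha
# are the smoothing/regularization controls.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(40, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=40)

gpr = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-2)
gpr.fit(X, y)                                      # O(n³): fine for 40 points
mean, std = gpr.predict(X[:3], return_std=True)    # probabilistic output
print(mean.round(2), std.round(2))
```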
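The Naive Bayes row's "often used in NLP" remark usually means MultinomialNB on token counts. A toy sketch (the four documents are invented):

```python
# Sketch: MultinomialNB over bag-of-words counts, the classic NLP pairing;
# alpha is the additive-smoothing regularizer.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

docs = ["good great fine", "bad awful poor", "great good", "poor bad"]
labels = [1, 0, 1, 0]

clf = make_pipeline(CountVectorizer(), MultinomialNB(alpha=1.0))
clf.fit(docs, labels)
print(clf.predict(["good fine"]), clf.predict_proba(["bad poor"]))
```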
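Finally, the PCA row's partial_fit pointer refers to IncrementalPCA, which fits in chunks and so supports out-of-core data. A sketch with in-memory chunks standing in for batches read from disk:

```python
# Sketch: incremental PCA via partial_fit on successive chunks.
import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.default_rng(0)
ipca = IncrementalPCA(n_components=3)
for _ in range(20):                                # e.g. chunks streamed from disk
    chunk = rng.normal(size=(100, 10))             # chunk size must be >= n_components
    ipca.partial_fit(chunk)

print(ipca.transform(rng.normal(size=(2, 10))).shape)  # -> (2, 3)
```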