Each model below is listed with the following fields, where the source sheet filled them in: Task; Properties; Main sklearn models (bold = recommended); Should scale features; Should scale target (regression tasks); Multi-target/label; Deterministic; Has partial_fit (suitable for online learning); Has predict_proba(); Has feature_importances_; Key sklearn hyperparameters; Typical loss function; To increase regularization (or similar effect); Typical evaluation metric; Training complexity; Prediction complexity; Space complexity; Embarrassingly parallel; Supports NaNs; Restriction; Preference; Remarks. Empty fields are omitted.
Linear regression
- Task: Regression
- Properties: Linear, deterministic regressor
- Main sklearn models: linear_model.LinearRegression (no regularization), linear_model.Lasso (L1 regularization), linear_model.Ridge (L2 regularization), linear_model.ElasticNet (L1 and L2), linear_model.ElasticNetCV, linear_model.SGDRegressor
- Should scale features: Yes, if regularization is applied
- Should scale target: Yes, if regularization is applied
- Multi-target/label: Yes
- Deterministic: Yes
- Has partial_fit: No (use SGDRegressor instead)
- Has predict_proba(): No
- Has feature_importances_: No (use coef_, and only if the data is scaled)
- Key sklearn hyperparameters: alpha
- Typical loss function: Mean squared error
- To increase regularization: Increase alpha (usually a squared L2 and/or L1 penalty)
- Typical evaluation metric: R², MSE, RMSE
- Training complexity: O(np² + p³) for n examples and p model parameters
- Prediction complexity: O(p)
- Space complexity: O(p)
- Embarrassingly parallel: No
- Supports NaNs: No
- Preference: Linear or linearized relationships
- Remarks: Check residuals for normality and homoscedasticity
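A minimal sketch of the "scale features if regularization is applied" advice, using Ridge; the toy data and the alpha value are illustrative, not from the sheet:

    # Ridge (L2-regularized linear regression) behind a StandardScaler,
    # since the penalty on coef_ only makes sense on comparably scaled features.
    import numpy as np
    from sklearn.linear_model import Ridge
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

    model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))  # increase alpha to regularize more
    model.fit(X, y)
    print(model.score(X, y))  # R² on the training data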
Logistic regression
- Task: Classification
- Properties: Binary classifier (multiclass via OVR); deterministic (depends on solver)
- Main sklearn models: linear_model.LogisticRegression, linear_model.SGDClassifier
- Should scale features: Yes, if regularization is applied
- Should scale target: Not applicable
- Multi-target/label: No
- Deterministic: Yes
- Has partial_fit: No (use SGDClassifier with log loss instead)
- Has predict_proba(): Yes
- Has feature_importances_: No
- Key sklearn hyperparameters: penalty, C
- Typical loss function: Cross-entropy, aka log loss, logistic loss, or deviance
- To increase regularization: Decrease C (usually an L2 penalty), or increase alpha for SGDClassifier
- Typical evaluation metric: Likelihood ratio, weighted F1
- Training complexity: O(np)
- Prediction complexity: O(p)
- Space complexity: O(p)
- Embarrassingly parallel: No
- Supports NaNs: No
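A minimal sketch of the C/predict_proba behaviour above; the toy data (make_classification) and C=0.1 are illustrative:

    # Smaller C means stronger regularization; predict_proba returns class probabilities.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = make_classification(n_samples=200, n_features=5, random_state=0)
    clf = make_pipeline(StandardScaler(), LogisticRegression(penalty="l2", C=0.1))
    clf.fit(X, y)
    print(clf.predict_proba(X[:3]))  # one probability per class, per example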
Ridge classifier
- Task: Classification
- Properties: Linear decision boundary, binary classifier
- Main sklearn models: linear_model.RidgeClassifier
- Should scale features: Yes
- Should scale target: Not applicable
- Multi-target/label: No
- Deterministic: Depends on solver
- Has partial_fit: No
- Has predict_proba(): No
- Has feature_importances_: No
- Key sklearn hyperparameters: alpha
- Typical loss function: Penalized squared error on targets encoded as ±1
- To increase regularization: Increase alpha
- Typical evaluation metric: Weighted F1
- Training complexity: O(np² + p³) for n examples and p model parameters
- Prediction complexity: O(p)
- Space complexity: O(p)
- Embarrassingly parallel: No
- Supports NaNs: No
- Restriction: Continuous features
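A minimal sketch, with illustrative toy data; note the estimator scores classes but exposes no predict_proba:

    # RidgeClassifier: least squares on ±1-encoded targets; larger alpha = more regularization.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import RidgeClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = make_classification(n_samples=200, n_features=5, random_state=0)
    clf = make_pipeline(StandardScaler(), RidgeClassifier(alpha=1.0)).fit(X, y)
    print(clf.score(X, y))  # accuracy; no predict_proba on this estimator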
k-nearest neighbours
- Task: Classification or Regression
- Properties: Instance-based, non-parametric, multiclass classifier
- Main sklearn models: neighbors.KNeighborsClassifier, neighbors.KNeighborsRegressor
- Should scale features: Yes
- Should scale target: No
- Multi-target/label: Yes
- Deterministic: Yes
- Has partial_fit: No
- Has predict_proba(): Yes
- Has feature_importances_: No
- Key sklearn hyperparameters: n_neighbors
- Typical loss function: None
- To increase regularization: Increase n_neighbors
- Typical evaluation metric: Weighted F1 (classification) or R², MSE, RMSE (regression)
- Training complexity: O(1) for brute force, or O(np·log n) to build a k-d tree, for n examples and p features
- Prediction complexity: O(npk) for brute force, for k neighbours and n training examples of dimension p (number of features ≈ number of parameters), or O(k·log n) for a k-d tree
- Space complexity: O(1) for brute force, or O(npk) for a k-d tree
- Embarrassingly parallel: No
- Supports NaNs: No
- Preference: Small datasets, low dimensionality, dense data
- Remarks: Fast to train, slow to predict
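A minimal sketch: k-NN is distance-based, so the scaling step matters; the toy data and n_neighbors=15 are illustrative:

    # Larger n_neighbors smooths the decision boundary (the regularization knob above).
    from sklearn.datasets import make_classification
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = make_classification(n_samples=300, n_features=4, random_state=0)
    knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=15)).fit(X, y)
    print(knn.score(X, y))  # fitting was cheap; this predict step does the real work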
Linear support vector machine
- Task: Classification or Regression
- Properties: Linear decision boundary, binary classifier (multiclass via OVR)
- Main sklearn models: svm.SVC, svm.SVR, svm.LinearSVC, svm.LinearSVR, svm.NuSVC, svm.NuSVR, linear_model.SGDClassifier
- Should scale features: Yes
- Should scale target: Yes
- Multi-target/label: No
- Deterministic: Only if probability=False
- Has partial_fit: No (use SGDClassifier instead)
- Has predict_proba(): Only if probability=True
- Has feature_importances_: No
- Key sklearn hyperparameters: C (or alpha for SGDClassifier); gamma is irrelevant for a linear kernel
- Typical loss function: Hinge loss
- To increase regularization: Decrease C (squared L2 penalty), or increase alpha for SGDClassifier
- Typical evaluation metric: Weighted F1 (classification) or R², MSE, RMSE (regression)
- Training complexity: O(n²p) for n examples and p model parameters
- Prediction complexity: O(sp) for s support vectors and p model parameters
- Space complexity: O(sp) for s support vectors and p model parameters
- Embarrassingly parallel: No
- Supports NaNs: No
- Preference: Linearly separable classes
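A minimal sketch with illustrative toy data; LinearSVC is used here because it scales better with n than svm.SVC(kernel="linear"):

    # Decrease C for stronger regularization, as noted in the row above.
    from sklearn.datasets import make_classification
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import LinearSVC

    X, y = make_classification(n_samples=500, n_features=10, random_state=0)
    svm = make_pipeline(StandardScaler(), LinearSVC(C=0.5)).fit(X, y)
    print(svm.score(X, y))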
Nonlinear support vector machine
- Task: Classification or Regression
- Properties: Kernel-based, non-linear decision boundary, binary classifier (multiclass via OVR)
- Main sklearn models: svm.SVC, svm.SVR, svm.NuSVC, svm.NuSVR
- Should scale features: Yes
- Should scale target: Yes
- Multi-target/label: No
- Deterministic: Only if probability=False
- Has partial_fit: No (use SGDClassifier instead)
- Has predict_proba(): Only if probability=True
- Has feature_importances_: No
- Key sklearn hyperparameters: kernel, gamma, C
- Typical loss function: Hinge loss
- To increase regularization: Decrease C (squared L2 penalty)
- Typical evaluation metric: Weighted F1 (classification) or R², MSE, RMSE (regression)
- Training complexity: O(n²p + n³) for n examples and p model parameters
- Prediction complexity: O(sp) for s support vectors and p model parameters
- Space complexity: O(sp) for s support vectors and p model parameters
- Embarrassingly parallel: No
- Supports NaNs: No
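A minimal sketch of the kernel/gamma/C knobs and the probability=True trade-off; data and values are illustrative:

    # RBF-kernel SVC: gamma sets the kernel width, C the margin/penalty trade-off.
    # probability=True enables predict_proba but adds a randomized CV calibration step,
    # which is why determinism depends on this flag.
    from sklearn.datasets import make_classification
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=300, n_features=6, random_state=0)
    svc = make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma="scale", C=1.0, probability=True))
    svc.fit(X, y)
    print(svc.predict_proba(X[:3]))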
Decision tree
- Task: Classification or Regression
- Properties: Non-parametric, multiclass classifier
- Main sklearn models: tree.DecisionTreeClassifier, tree.DecisionTreeRegressor
- Should scale features: No
- Should scale target: No
- Multi-target/label: Yes
- Deterministic: Yes
- Has partial_fit: No
- Has predict_proba(): Yes
- Has feature_importances_: Yes
- Key sklearn hyperparameters: max_features, max_depth, min_samples_leaf, min_samples_split
- Typical loss function: Gini impurity (a per-split criterion, not a global loss function per se)
- To increase regularization: Decrease max_depth or max_features, or increase min_samples_split or min_samples_leaf
- Typical evaluation metric: Weighted F1 (classification) or R², MSE, RMSE (regression)
- Training complexity: O(nzp) for n examples and p model parameters, with depth limited to z
- Prediction complexity: O(z) for max depth z
- Space complexity: O(z)
- Embarrassingly parallel: Yes
- Supports NaNs: No
- Remarks: Prone to overfitting
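A minimal sketch of the overfitting defences named above (max_depth, min_samples_leaf); data and values are illustrative:

    # Capping depth and leaf size regularizes the tree; feature_importances_ is available after fit.
    from sklearn.datasets import make_classification
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=300, n_features=8, random_state=0)
    tree = DecisionTreeClassifier(max_depth=4, min_samples_leaf=5, random_state=0).fit(X, y)
    print(tree.feature_importances_)  # impurity-based importances, one per feature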
Random forest
- Task: Classification or Regression
- Properties: Stochastic, ensemble-based multiclass classifier
- Main sklearn models: ensemble.RandomForestClassifier, ensemble.RandomForestRegressor
- Should scale features: No
- Should scale target: No
- Multi-target/label: Yes
- Deterministic: No
- Has partial_fit: No
- Has predict_proba(): Yes
- Has feature_importances_: Yes
- Key sklearn hyperparameters: n_estimators, max_features, max_depth, min_samples_leaf, min_samples_split
- Typical loss function: Gini impurity (a per-split criterion, not a global loss function per se)
- To increase regularization: Decrease max_depth or max_features, or increase min_samples_split or min_samples_leaf
- Typical evaluation metric: Weighted F1 (classification) or R², MSE, RMSE (regression)
- Training complexity: O(nzpt) for n examples, p model parameters, max depth z, and t trees
- Prediction complexity: O(zt)
- Space complexity: O(zt)
- Embarrassingly parallel: Yes
- Supports NaNs: No
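A minimal sketch; since the trees are independent, fitting parallelizes across cores (the "embarrassingly parallel" entry), which n_jobs=-1 exploits. Data and values are illustrative:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=500, n_features=10, random_state=0)
    rf = RandomForestClassifier(n_estimators=200, max_depth=8, n_jobs=-1, random_state=0)
    rf.fit(X, y)
    print(rf.predict_proba(X[:2]))  # averaged over the t trees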
Extremely randomized trees
- Task: Classification or Regression
- Properties: Stochastic, ensemble multiclass classifier (ExtraTrees is to ExtraTree as RandomForest is to DecisionTree)
- Main sklearn models: ensemble.ExtraTreesClassifier, ensemble.ExtraTreesRegressor
- Should scale features: No
- Should scale target: No
- Multi-target/label: Yes
- Deterministic: No
- Has partial_fit: No
- Has predict_proba(): Yes
- Has feature_importances_: Yes
- Key sklearn hyperparameters: n_estimators, max_features, max_depth, min_samples_leaf, min_samples_split
- Typical loss function: Gini impurity (a per-split criterion, not a global loss function per se)
- To increase regularization: Decrease max_depth or max_features, or increase min_samples_split or min_samples_leaf
- Typical evaluation metric: Weighted F1 (classification) or R², MSE, RMSE (regression)
- Training complexity: O(nzpt) for n examples, p model parameters, max depth z, and t trees
- Prediction complexity: O(zt)
- Space complexity: O(zt)
- Embarrassingly parallel: Yes
- Supports NaNs: No
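The API mirrors the random forest above; a minimal sketch with illustrative data. The distinguishing detail is that split thresholds are drawn at random rather than optimized, trading a little bias for lower variance and faster training:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import ExtraTreesClassifier

    X, y = make_classification(n_samples=500, n_features=10, random_state=0)
    et = ExtraTreesClassifier(n_estimators=200, n_jobs=-1, random_state=0).fit(X, y)
    print(et.score(X, y))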
Gradient boosted trees
- Task: Classification or Regression
- Properties: Stochastic, ensemble multiclass classifier built from sequentially boosted trees
- Main sklearn models: ensemble.GradientBoostingClassifier, ensemble.GradientBoostingRegressor, ensemble.HistGradientBoostingClassifier, ensemble.HistGradientBoostingRegressor
- Should scale features: No
- Should scale target: No
- Multi-target/label: No
- Deterministic: No
- Has partial_fit: No
- Has predict_proba(): Yes
- Has feature_importances_: Yes
- Key sklearn hyperparameters: n_estimators, learning_rate, max_features, max_depth, min_samples_leaf, min_samples_split
- Typical loss function: Cross-entropy, aka log loss, logistic loss, or deviance (classification), or squared error (regression)
- To increase regularization: Decrease max_depth or max_features, or increase min_samples_split or min_samples_leaf
- Typical evaluation metric: Weighted F1 (classification) or R², MSE, RMSE (regression)
- Training complexity: O(nzpt) for n examples, p model parameters, max depth z, and t trees
- Prediction complexity: O(zt)
- Space complexity: O(zt)
- Embarrassingly parallel: No (trees are fitted sequentially, unlike bagging)
- Supports NaNs: Yes (histogram-based boosting only)
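A minimal sketch of the "supports NaNs" entry: histogram-based boosting handles missing values natively. The injected NaNs and hyperparameters are illustrative:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import HistGradientBoostingClassifier

    X, y = make_classification(n_samples=500, n_features=8, random_state=0)
    X[::10, 0] = np.nan  # inject missing values into one feature; no imputation needed
    hgb = HistGradientBoostingClassifier(max_depth=4, max_iter=100).fit(X, y)
    print(hgb.score(X, y))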
Gaussian process
- Task: Classification or Regression
- Properties: Non-parametric, probabilistic regressor
- Main sklearn models: gaussian_process.GaussianProcessClassifier, gaussian_process.GaussianProcessRegressor (not including the kernels)
- Should scale features: Ensure input is approximately Gaussian; sensitivity to scale depends on the kernel (RBF is sensitive)
- Should scale target: No
- Multi-target/label: No
- Deterministic: No
- Has partial_fit: No
- Has predict_proba(): Yes
- Has feature_importances_: No
- Key sklearn hyperparameters: kernel, alpha
- Typical loss function: NA (fitting maximizes the log marginal likelihood)
- To increase regularization: Increase the length scale of the kernel, or increase the noise likelihood alpha
- Typical evaluation metric: Weighted F1 (classification) or R², MSE, RMSE (regression)
- Training complexity: O(n³) for n examples
- Prediction complexity: O(n²) for n examples
- Space complexity: O(n²) for n examples
- Embarrassingly parallel: No
- Supports NaNs: No
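A minimal sketch of the kernel and alpha knobs; the 1-D toy problem is illustrative. alpha adds noise to the kernel diagonal, which both stabilizes the O(n³) solve and acts like regularization:

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    X = np.linspace(0, 10, 50)[:, None]
    y = np.sin(X).ravel() + 0.1 * np.random.default_rng(0).normal(size=50)

    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-2).fit(X, y)
    mean, std = gp.predict(X, return_std=True)  # predictive mean and uncertainty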
Naive Bayes
- Task: Classification
- Properties: Probabilistic, maximum a posteriori classifier with a strong assumption of feature independence (the "naivety")
- Main sklearn models: naive_bayes.GaussianNB, naive_bayes.MultinomialNB (e.g. for text), naive_bayes.BernoulliNB (binary values), naive_bayes.CategoricalNB (discrete values), naive_bayes.ComplementNB (for imbalanced data)
- Should scale features: No
- Should scale target: No
- Multi-target/label: No
- Deterministic: Yes
- Has partial_fit: Yes
- Has predict_proba(): Yes
- Has feature_importances_: No
- Key sklearn hyperparameters: priors
- Typical loss function: NA
- To increase regularization: Weaken the priors
- Typical evaluation metric: Classification metrics
- Training complexity: O(np)
- Prediction complexity: O(cp) for c classes and p parameters
- Space complexity: O(p)
- Embarrassingly parallel: No
- Supports NaNs: No
- Restriction: Independent features
- Remarks: Often used in NLP
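A minimal sketch of the NLP use noted above: MultinomialNB on token counts. The four-document corpus is illustrative:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    texts = ["good movie", "bad movie", "great film", "awful film"]
    labels = [1, 0, 1, 0]
    nb = make_pipeline(CountVectorizer(), MultinomialNB()).fit(texts, labels)
    print(nb.predict(["good film"]))  # MAP class under the independence assumption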
Multilayer perceptron
- Task: Classification or Regression
- Properties: Deep feedforward artificial neural network
- Main sklearn models: neural_network.MLPClassifier, neural_network.MLPRegressor
- Should scale features: Yes
- Should scale target: Yes
- Multi-target/label: Yes
- Deterministic: No
- Has partial_fit: Yes
- Has predict_proba(): Yes
- Has feature_importances_: No
- Key sklearn hyperparameters: hidden_layer_sizes, activation, alpha, learning_rate_init, max_iter
- Typical loss function: Cross-entropy, aka log loss, logistic loss, or deviance (classification), or squared error (regression)
- To increase regularization: Increase alpha (L2 penalty)
- Typical evaluation metric: Weighted F1 (classification) or R², MSE, RMSE (regression)
- Training complexity: 😬 (roughly O(inp) for i iterations over n examples with p weights, before accounting for convergence behaviour)
- Prediction complexity: O(p) for p parameters (weights)
- Space complexity: O(p)
- Embarrassingly parallel: No
- Supports NaNs: No
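A minimal sketch wiring the key hyperparameters above together; the architecture and values are illustrative:

    # alpha is the L2 penalty on the weights; increase it to regularize the network.
    from sklearn.datasets import make_classification
    from sklearn.neural_network import MLPClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = make_classification(n_samples=400, n_features=10, random_state=0)
    mlp = make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(32, 16), activation="relu",
                      alpha=1e-3, learning_rate_init=1e-3, max_iter=500, random_state=0),
    )
    mlp.fit(X, y)
    print(mlp.score(X, y))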
Stochastic gradient descent
- Task: Classification or Regression
- Properties: Regularized linear models optimized with stochastic gradient descent; by default the classifier fits a linear SVM and the regressor fits a linear model with L2 regularization
- Main sklearn models: linear_model.SGDClassifier, linear_model.SGDRegressor
- Should scale features: Yes
- Should scale target: Yes
- Multi-target/label: Yes
- Deterministic: No
- Has partial_fit: Yes
- Has predict_proba(): Yes (for loss='log_loss' or 'modified_huber')
- Has feature_importances_: No
- Key sklearn hyperparameters: loss, alpha, penalty
- Typical loss function: Depends on the chosen loss; the classifier defaults to hinge loss and the regressor to squared error
- To increase regularization: Increase alpha
- Typical evaluation metric: Weighted F1 (classification) or R², MSE, RMSE (regression)
- Training complexity: O(knp) for k iterations over n examples with p features
- Prediction complexity: O(p)
- Space complexity: O(np)
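A minimal sketch of the online-learning use of partial_fit, streaming illustrative data in mini-batches (loss="log_loss" assumes a recent scikit-learn; older versions spelled it "log"):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import SGDClassifier
    from sklearn.preprocessing import StandardScaler

    X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
    X = StandardScaler().fit_transform(X)

    clf = SGDClassifier(loss="log_loss", penalty="l2", alpha=1e-4)
    for Xb, yb in zip(np.array_split(X, 10), np.array_split(y, 10)):
        clf.partial_fit(Xb, yb, classes=np.unique(y))  # classes is required on the first call
    print(clf.predict_proba(X[:2]))  # available because loss="log_loss"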
Convolutional neural network
- Task: Feature extraction + classification
- Main sklearn models: None (not implemented in sklearn)
- Should scale features: Yes
- Should scale target: Not applicable
- Multi-target/label: Yes
- Deterministic: No
- Has partial_fit: NA
- Has predict_proba(): NA
- Has feature_importances_: NA
- Key sklearn hyperparameters: NA
- Typical loss function: Cross-entropy, aka log loss, logistic loss, or deviance (classification), or squared error (regression)
- Training complexity: O(n·L^d·c·k^d) per layer, for n examples of length L in each of d dimensions, using c kernels of length k in each of d dimensions
- Prediction complexity: O(L^d·c·k^d) per layer for one example
- Space complexity: O(p), or O(c·k^d) per layer for c kernels of length k in each of d dimensions
- Embarrassingly parallel: Yes
- Supports NaNs: No
- Preference: Spatially correlated data
- Remarks: Often coupled to an MLP for classification
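Since sklearn offers no CNN, here is a plain-numpy cost sketch of a single 2-D convolutional layer (d = 2, "valid" convolution, unit stride), making the O(L^d·c·k^d) count concrete; all sizes are illustrative:

    import numpy as np

    L, k, c = 32, 3, 8                  # image side, kernel side, number of kernels
    image = np.random.rand(L, L)
    kernels = np.random.rand(c, k, k)

    out = np.empty((c, L - k + 1, L - k + 1))
    for ci in range(c):                 # c kernels ...
        for i in range(L - k + 1):      # ... times ~L positions vertically ...
            for j in range(L - k + 1):  # ... times ~L positions horizontally ...
                out[ci, i, j] = np.sum(image[i:i+k, j:j+k] * kernels[ci])  # ... times k·k multiplies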
Restricted Boltzmann machine
- Task: Feature extraction
- Main sklearn models: neural_network.BernoulliRBM
- Should scale features: Yes
- Should scale target: Not applicable
- Embarrassingly parallel: No
- Supports NaNs: No
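A minimal sketch of the feature-extraction use; BernoulliRBM expects inputs in [0, 1], hence the scaling step, and the sizes here are illustrative:

    import numpy as np
    from sklearn.neural_network import BernoulliRBM
    from sklearn.preprocessing import MinMaxScaler

    X = np.random.rand(100, 16)
    X = MinMaxScaler().fit_transform(X)  # squash features into [0, 1]

    rbm = BernoulliRBM(n_components=8, learning_rate=0.05, random_state=0)
    features = rbm.fit_transform(X)      # hidden-unit activations as extracted features
    print(features.shape)                # (100, 8)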
Principal component analysis
- Task: Dimensionality reduction
- Main sklearn models: decomposition.PCA
- Should scale features: Yes
- Should scale target: Not applicable (unsupervised)
- Deterministic: Yes
- Has partial_fit: Via decomposition.IncrementalPCA.partial_fit
- Typical loss function: Rayleigh quotient
- Training complexity: Roughly O(nd² + d³) for n examples and d features via full SVD; truncated SVD is cheaper
- Space complexity: O(np + p²)
- Embarrassingly parallel: No
- Supports NaNs: No
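A minimal sketch of batch PCA plus the IncrementalPCA.partial_fit route mentioned above, with illustrative data:

    import numpy as np
    from sklearn.decomposition import PCA, IncrementalPCA
    from sklearn.preprocessing import StandardScaler

    X = np.random.rand(500, 20)
    X = StandardScaler().fit_transform(X)  # PCA is sensitive to feature scale

    pca = PCA(n_components=5).fit(X)
    print(pca.explained_variance_ratio_.sum())

    ipca = IncrementalPCA(n_components=5)
    for batch in np.array_split(X, 5):     # feed the data in chunks (out-of-core / online)
        ipca.partial_fit(batch)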
Independent component analysis
- Task: Dimensionality reduction
- Main sklearn models: decomposition.FastICA
- Should scale target: Not applicable (unsupervised)
- Deterministic: Yes (given a fixed random_state)
- To increase regularization: Sparsity penalty (L1)
- Embarrassingly parallel: No
- Supports NaNs: No
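A minimal sketch of the classic source-separation use of FastICA; the two synthetic sources and the mixing matrix are illustrative:

    import numpy as np
    from sklearn.decomposition import FastICA

    t = np.linspace(0, 8, 1000)
    S = np.c_[np.sin(2 * t), np.sign(np.sin(3 * t))]  # two independent sources
    A = np.array([[1.0, 0.5], [0.5, 1.0]])            # mixing matrix
    X = S @ A.T                                       # observed mixtures

    ica = FastICA(n_components=2, random_state=0)     # random_state pins the run down
    S_est = ica.fit_transform(X)                      # recovered sources (up to sign/scale)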