Fact sheet for CHALEARN AutoML challenge
Each fact sheet below is one questionnaire response and reports the following fields: Timestamp; Team name; Team leader name; Team leader address and phone number; Contact email; Other team members; Team website URL; Title of contribution; Supplementary on-line material; General description; References; Feature extraction; Normalization; Dimensionality reduction; Base predictor; Loss function; Regularizer; Ensemble method; Model selection and transfer learning; Algorithmic complexity; Qualitative advantages; Comparison with other methods; Availability; Language; Details on software implementation; Platform; Memory; Parallelism; Code URL; Total human effort; Total machine effort; Challenge duration OK?; Final evaluation time (hours). Multiple-choice answers keep the option numbers from the questionnaire.
Timestamp: 2/18/2015 0:19:38
Team name: Research Group on Learning, Optimization, and Automated Algorithm Design (aad_freiburg)
Team leader name: Frank Hutter
Team leader address and phone number: Institut für Informatik, Albert-Ludwigs-Universität Freiburg, Sekretariat Nebel/GKI, Georges-Köhler-Allee 052, 79110 Freiburg, Germany; +49 761 203-67740
Contact email: automl2015@informatik.uni-freiburg.de
Other team members: Manuel Blum, Katharina Eggensperger, Stefan Falkner, Matthias Feurer, Aaron Klein, Jost Tobias Springenberg, Farooq Zuberi
Team website URL: aad.informatik.uni-freiburg.de
Title of contribution: AutoSklearn
Supplementary on-line material: http://www.cs.ubc.ca/labs/beta/Projects/SMAC/
General description: Bayesian optimization with random forests in SMAC [Hutter et al. 2011] applied to a flexible configuration space describing scikit-learn [Pedregosa et al. 2011], as done in Auto-WEKA [Thornton et al. 2013]. We initialized SMAC with meta-learning [Feurer et al. 2015] and constructed ensembles [unpublished] with CMA-ES.
References:
Hutter, F.; Hoos, H. H.; and Leyton-Brown, K. 2011. Sequential model-based optimization for general algorithm configuration. In Proc. of LION-5, 507-523.
Feurer, M.; Springenberg, J. T.; and Hutter, F. 2015. Initializing Bayesian hyperparameter optimization via meta-learning. In Proc. of the Twenty-Ninth AAAI Conference on Artificial Intelligence.
Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; Vanderplas, J.; Passos, A.; Cournapeau, D.; Brucher, M.; Perrot, M.; and Duchesnay, E. 2011. Scikit-learn: machine learning in Python. JMLR 12:2825-2830.
Thornton, C.; Hutter, F.; Hoos, H. H.; and Leyton-Brown, K. 2013. Auto-WEKA: combined selection and hyperparameter optimization of classification algorithms. In Proc. of KDD 2013.
Hansen, N. and Ostermeier, A. 1996. Adapting arbitrary normal mutation distributions in evolution strategies: the covariance matrix adaptation. In Proc. of the 1996 IEEE International Conference on Evolutionary Computation, 312-317.
Feature extraction: 1. Application of random functions; 2. Application of filter banks
Normalization: 1. Feature standardization (for numerical variables); 3. Replacement of the missing values; feature-wise min/max scaling
Dimensionality reduction: 1. Linear manifold transformations (e.g. factor analysis, PCA, ICA); 5. Feature selection
Base predictor: 1. Decision tree, stub, or Random Forest; 2. Linear classifier (Fisher's discriminant, SVM, linear regression); 3. Non-linear kernel method (SVM, kernel ridge regression, kernel logistic regression); 10. Nearest neighbors; gradient boosting; ExtraTrees
Loss function: 1. Hinge loss (like in SVM); 2. Square loss (like in ridge regression); 3. Logistic loss or cross-entropy (like in logistic regression); deviance
Regularizer: 1. One-norm (sum of weight magnitudes, like in Lasso); 2. Two-norm (||w||^2, like in ridge regression and regular SVM); 3. None; elastic net; randomization
Ensemble method: 1. Boosting; 2. Bagging (check this if you use Random Forest); 4. Other ensemble method
Model selection and transfer learning: Train/validation split with Bayesian optimization; meta-learning
Algorithmic complexity: 11. Don't know; too difficult to evaluate
Qualitative advantages: 7. Theoretically motivated; 8. Novel / original; AutoML searching through a flexible space of machine learners
Comparison with other methods: We used an ensemble of many methods.
Availability: Combination of BSD and AGPLv3 tools; tools will be uploaded later this week
Language: 1. C/C++/C#; 2. Java or Weka; 4. Python
Details on software implementation: Collection of scripts using the aforementioned packages.
Platform: 2. Linux
Memory: 3. > 8 GB but <= 32 GB
Parallelism: 1. Multi-processor machine
Code URL: aad.informatik.uni-freiburg.de/downloads/automl_competition_2015_000.zip
Total human effort: 4. > 2 man weeks
Total machine effort: 4. > 2 weeks
Challenge duration OK?: 1. Yes
Final evaluation time (hours): 1.5 h on an eight-core machine
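Illustration: the core idea of the entry above, treating the whole scikit-learn pipeline (preprocessor choice, classifier choice, and their hyperparameters) as one configuration space searched against cross-validation score, can be sketched as follows. This is only a toy stand-in: plain random search replaces SMAC's random-forest-based Bayesian optimization, and the dataset, models and parameter ranges are made up for the example rather than taken from AutoSklearn.

```python
# Toy stand-in for SMBO over a scikit-learn configuration space: sample joint
# (preprocessor, classifier, hyperparameter) configurations and keep the one
# with the best cross-validation score. AutoSklearn replaces this random
# sampler with SMAC, meta-learning and ensembling.
import random
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = random.Random(0)
X, y = load_digits(return_X_y=True)

def sample_configuration():
    """Draw one point from a small pipeline configuration space."""
    config = {"preprocessor": rng.choice(["none", "pca"]),
              "classifier": rng.choice(["rf", "svm"])}
    if config["preprocessor"] == "pca":
        config["pca_components"] = rng.choice([10, 20, 40])
    if config["classifier"] == "rf":
        config["n_estimators"] = rng.choice([50, 100, 200])
    else:
        config["C"] = 10 ** rng.uniform(-2, 2)
    return config

def build_pipeline(config):
    steps = [("scale", StandardScaler())]
    if config["preprocessor"] == "pca":
        steps.append(("pca", PCA(n_components=config["pca_components"])))
    if config["classifier"] == "rf":
        steps.append(("clf", RandomForestClassifier(
            n_estimators=config["n_estimators"], random_state=0)))
    else:
        steps.append(("clf", SVC(C=config["C"], gamma="scale")))
    return Pipeline(steps)

best_score, best_config = -np.inf, None
for _ in range(20):                                  # budget: 20 configurations
    config = sample_configuration()
    score = cross_val_score(build_pipeline(config), X, y, cv=3).mean()
    if score > best_score:
        best_score, best_config = score, config
print(best_config, round(best_score, 4))
```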
Timestamp: 2/19/2015 8:59:18
Team name: tadej
Team leader name: Tadej Štajner
Team leader address and phone number: Tadej Štajner, Pražakova ulica 14, 1000 Ljubljana, Slovenia; +38640504984
Contact email: tadej@tdj.si
Team website URL: http://tdj.si
Title of contribution: AutoKit: pipeline selection and hyper-parameter optimization
Supplementary on-line material: https://github.com/tadejs/autokit/blob/master/DESCRIPTION.md
General description: The method is based on the hyperopt and hyperopt-sklearn [1] packages to pose the automatic machine learning problem as a hyperparameter optimization problem. The approach used in this submission extends this model with an additional learning-model selection step that is able to determine admissible learning models given the problem description. It also includes different pre-processing approaches in the hyperparameter search space. This allows us not only to tune individual approaches, but also to enrich the data with different representations, such as clustering to obtain a lower-dimensional representation, or kernel approximation to cover non-linearities in the model.
References: [1] Komer, B.; Bergstra, J.; and Eliasmith, C. "Hyperopt-sklearn: automatic hyperparameter configuration for scikit-learn." ICML workshop on AutoML, 2014. Besides the hyperopt-sklearn reference, no papers describe the particular method in this submission. URL to codebase: https://github.com/tadejs/autokit
Feature extraction: Union with features from dimensionality reduction
Normalization: 1. Feature standardization (for numerical variables)
Dimensionality reduction: 2. Non-linear dimensionality reduction (e.g. KPCA, MDS, LLE, Laplacian Eigenmaps, Kohonen maps); 3. Clustering (e.g. K-means, hierarchical clustering)
Base predictor: 1. Decision tree, stub, or Random Forest; 2. Linear classifier (Fisher's discriminant, SVM, linear regression); 6. Naïve Bayes
Loss function: 1. Hinge loss (like in SVM); 2. Square loss (like in ridge regression); 3. Logistic loss or cross-entropy (like in logistic regression)
Regularizer: 1. One-norm (sum of weight magnitudes, like in Lasso); 2. Two-norm (||w||^2, like in ridge regression and regular SVM)
Ensemble method: 1. Boosting; 2. Bagging (check this if you use Random Forest)
Model selection and transfer learning: 2. K-fold or leave-one-out cross-validation (using training data)
Algorithmic complexity: 1. Linear in number of features; 4. Linear in number of training examples; 7. Linear in number of test examples; among others, we use Random Forest, which is n log n given n training examples.
Qualitative advantages: 1. Simple method; 2. Easy to implement; 3. Easy to parallelize; 4. Require little memory; 6. Require only freeware libraries
Comparison with other methods: Compared to basic hyperopt-sklearn, the critical difference was searching across the space of preprocessing techniques.
Availability: 3. Freeware or shareware in house software; 5. Off-the-shelf third party freeware or shareware
Language: 4. Python
Details on software implementation: Based on Python 2.7, NumPy, SciPy, scikit-learn, hyperopt, hyperopt-sklearn, and the AutoML example code. The main contribution of the autokit implementation is defining the space of preprocessing and learning models and their hyperparameter spaces, which are subsequently sampled and evaluated via hyperopt.
Platform: 1. Windows; 2. Linux; 3. Mac OS
Memory: 1. <= 2 GB
Parallelism: 1. Multi-processor machine; 2. Run in parallel different algorithms on different machines; distributed operation is possible in hyperopt, but currently not enabled.
Code URL: https://github.com/tadejs/autokit
Total human effort: 2. A few man days
Total machine effort: 2. A few days
Challenge duration OK?: 1. Yes
Final evaluation time (hours): 1
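Illustration: a minimal sketch of the kind of hyperopt search space described above, a joint choice of preprocessing and learner, each with its own hyperparameters, minimized with TPE against a cross-validation score. The preprocessors, models and ranges shown are assumptions for the example, not the actual autokit space.

```python
# Joint (preprocessing, model, hyperparameter) space evaluated with hyperopt/TPE.
import numpy as np
from hyperopt import Trials, fmin, hp, tpe
from sklearn.cluster import FeatureAgglomeration
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.kernel_approximation import RBFSampler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

space = {
    "preprocessing": hp.choice("preprocessing", [
        {"kind": "none"},
        {"kind": "kernel_approx", "gamma": hp.loguniform("gamma", np.log(1e-3), np.log(1.0))},
        {"kind": "clustering", "n_clusters": hp.choice("n_clusters", [5, 10, 20])},
    ]),
    "model": hp.choice("model", [
        {"kind": "logreg", "C": hp.loguniform("C", np.log(1e-3), np.log(1e3))},
        {"kind": "rf", "n_estimators": hp.choice("n_estimators", [100, 300])},
    ]),
}

def objective(config):
    steps = [("scale", StandardScaler())]
    pre = config["preprocessing"]
    if pre["kind"] == "kernel_approx":          # kernel approximation covers non-linearities
        steps.append(("pre", RBFSampler(gamma=pre["gamma"], random_state=0)))
    elif pre["kind"] == "clustering":           # clustering gives a lower-dimensional view
        steps.append(("pre", FeatureAgglomeration(n_clusters=pre["n_clusters"])))
    model = config["model"]
    if model["kind"] == "logreg":
        steps.append(("clf", LogisticRegression(C=model["C"], max_iter=2000)))
    else:
        steps.append(("clf", RandomForestClassifier(n_estimators=model["n_estimators"],
                                                    random_state=0)))
    score = cross_val_score(Pipeline(steps), X, y, cv=3).mean()
    return 1.0 - score                          # hyperopt minimizes the returned loss

best = fmin(objective, space, algo=tpe.suggest, max_evals=25, trials=Trials())
print(best)
```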
Timestamp: 2/22/2015 14:12:18
Team name: Ideal Intel Analytics
Contact email: eugene.tuv@intel.com; igor.chikalov@intel.com
Team website URL: http://ideal.intel.com/
Title of contribution: Boosted trees with soft dynamic feature selection
General description: Gradient boosting of trees built on random subspaces dynamically adjusted to reflect learned feature relevance. The Huber loss function is used; no pre-processing was done.
References: Borisov, A.; Eruhimov, V.; and Tuv, E. Tree-based ensembles with dynamic soft feature selection. In Feature Extraction: Foundations and Applications, Studies in Fuzziness and Soft Computing, Vol. 207, Guyon, I.; Gunn, S.; Nikravesh, M.; Zadeh, L.A. (Eds.), Springer, 2006.
Feature extraction: Embedded
Normalization: None
Dimensionality reduction: Embedded
Base predictor: 1. Decision tree, stub, or Random Forest
Loss function: Huber
Regularizer: 3. None
Ensemble method: 1. Boosting
Model selection and transfer learning: 1. Leaderboard performance on validation data used for model selection
Algorithmic complexity: N log N
Qualitative advantages: 5. Self-contained (does not rely on third party libraries)
Availability: 1. Proprietary in house software
Language: 1. C/C++/C#
Details on software implementation: IDEAL is Intel's internal ML system.
Platform: 1. Windows
Memory: 3. > 8 GB but <= 32 GB
Parallelism: 1. Multi-processor machine, multithreaded
Total human effort: 1. A few man hours
Total machine effort: 1. A few hours
Challenge duration OK?: 1. Yes
Final evaluation time (hours): A few hours
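Illustration: the entry above describes Intel's proprietary IDEAL system, so only the general idea can be sketched: gradient boosting with a Huber-type loss in which each tree sees a random feature subspace whose sampling probabilities are tilted toward features the ensemble has so far found important. The code below is a simplified toy re-implementation of that idea with scikit-learn trees, not the IDEAL code, and all settings are illustrative.

```python
# Toy gradient boosting with Huber pseudo-residuals and "soft" dynamic feature
# selection: features are subsampled with probabilities that track the
# importance accumulated by earlier trees.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, n_features=40, n_informative=8, random_state=0)
rng = np.random.default_rng(0)
n_features, subspace_size, delta, lr = X.shape[1], 10, 1.0, 0.1

F = np.full(len(y), y.mean())                 # current ensemble prediction
weights = np.ones(n_features)                 # soft feature-relevance weights
trees = []

for _ in range(100):
    residual = y - F
    grad = np.where(np.abs(residual) <= delta,          # Huber pseudo-residuals
                    residual, delta * np.sign(residual))
    probs = weights / weights.sum()
    subspace = rng.choice(n_features, size=subspace_size, replace=False, p=probs)
    tree = DecisionTreeRegressor(max_depth=3, random_state=0)
    tree.fit(X[:, subspace], grad)
    F += lr * tree.predict(X[:, subspace])
    weights[subspace] += tree.feature_importances_      # reinforce useful features
    trees.append((subspace, tree))

print("mean absolute training error:", round(float(np.mean(np.abs(y - F))), 3))
```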
Timestamp: 2/23/2015 1:57:55
Team name: jrl44
Team leader name: James Lloyd
Team leader address and phone number: Trinity College, Cambridge, CB2 1TQ, UK; 00447890215148
Contact email: james.robert.lloyd@gmail.com
Title of contribution: A quick implementation of Freeze-Thaw Bayesian Optimization
General description: I quickly implemented the work described in Swersky, K., Snoek, J. & Adams, R. P. Freeze-Thaw Bayesian Optimization. arXiv preprint arXiv:1406.3896 (2014), http://arxiv.org/abs/1406.3896, to select between 4 variants of random forest and 4 variants of gradient boosting machines.
References: I implemented a slightly simplified version of: Swersky, K., Snoek, J. & Adams, R. P. Freeze-Thaw Bayesian Optimization. arXiv preprint arXiv:1406.3896 (2014), http://arxiv.org/abs/1406.3896.
Feature extraction: Example code
Normalization: Example code
Dimensionality reduction: Example code
Base predictor: 1. Decision tree, stub, or Random Forest
Loss function: AUC
Regularizer: 4. Don't know
Ensemble method: 1. Boosting; 2. Bagging (check this if you use Random Forest)
Model selection and transfer learning: 2. K-fold or leave-one-out cross-validation (using training data); 7. Bayesian model selection; Bayesian optimisation of cross-validation score
Algorithmic complexity: 11. Don't know; too difficult to evaluate
Qualitative advantages: 6. Require only freeware libraries; 7. Theoretically motivated
Availability: 5. Off-the-shelf third party freeware or shareware
Language: 4. Python
Platform: 2. Linux
Memory: 1. <= 2 GB
Parallelism: 3. None
Code URL: https://github.com/jamesrobertlloyd/automl-phase-1
Total human effort: 2. A few man days
Total machine effort: 1. A few hours
Challenge duration OK?: 3. No, please extend or run another round
Final evaluation time (hours): It runs until the time limit by design
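Illustration: the freeze-thaw idea, keeping several partially trained models "frozen" and repeatedly "thawing" the one expected to benefit most from more training, can be sketched with scikit-learn's warm_start. In the sketch below the Gaussian-process learning-curve model of Swersky et al. is replaced by a crude heuristic (thaw the current best on held-out AUC), so this illustrates the control loop only, not this submission's actual code; the candidate models and budgets are made up.

```python
# Maintain several partially trained boosted/forest models and repeatedly grow
# the most promising one, scoring by held-out AUC (cf. freeze-thaw Bayesian
# optimization, here with a greedy heuristic instead of a GP over learning curves).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=30, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, random_state=0)

candidates = [
    GradientBoostingClassifier(learning_rate=0.1, max_depth=2, warm_start=True, random_state=0),
    GradientBoostingClassifier(learning_rate=0.05, max_depth=4, warm_start=True, random_state=0),
    RandomForestClassifier(max_depth=None, warm_start=True, random_state=0),
    RandomForestClassifier(max_depth=8, warm_start=True, random_state=0),
]
n_estimators = [0] * len(candidates)
scores = [0.5] * len(candidates)
step, budget = 25, 40                        # trees added per "thaw", total thaws

for _ in range(budget):
    # Thaw untrained candidates first, then always the best-scoring one so far.
    i = n_estimators.index(0) if 0 in n_estimators else int(np.argmax(scores))
    n_estimators[i] += step
    candidates[i].set_params(n_estimators=n_estimators[i])
    candidates[i].fit(X_tr, y_tr)            # warm_start=True resumes training
    scores[i] = roc_auc_score(y_va, candidates[i].predict_proba(X_va)[:, 1])

best = int(np.argmax(scores))
print("selected model", best, "validation AUC", round(scores[best], 4))
```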
Timestamp: 2/23/2015 9:58:55
Team name: abhishek4
Team leader name: Abhishek Thakur
Team leader address and phone number: Office Q2.431, Warburger Str. 100, 33098 Paderborn, Germany; +4915254783954
Contact email: abhishek4@gmail.com
Title of contribution: Phase0
General description: Since the datasets were known in phase 0, I built separate models for each of them, mainly GBM and SVM. For the digits dataset, cross-validation was performed to select the appropriate number of PCA components.
Feature extraction: 5. Sparse coding
Normalization: 1. Feature standardization (for numerical variables); 2. Sample normalization (for numerical variables); 4. Grouping modalities (for categorical variables)
Dimensionality reduction: 1. Linear manifold transformations (e.g. factor analysis, PCA, ICA)
Base predictor: 1. Decision tree, stub, or Random Forest; 2. Linear classifier (Fisher's discriminant, SVM, linear regression); 3. Non-linear kernel method (SVM, kernel ridge regression, kernel logistic regression)
Loss function: 1. Hinge loss (like in SVM); 3. Logistic loss or cross-entropy (like in logistic regression)
Regularizer: 2. Two-norm (||w||^2, like in ridge regression and regular SVM)
Ensemble method: 1. Boosting
Model selection and transfer learning: 1. Leaderboard performance on validation data used for model selection; 2. K-fold or leave-one-out cross-validation (using training data)
Algorithmic complexity: 11. Don't know; too difficult to evaluate
Qualitative advantages: 1. Simple method; 2. Easy to implement; 3. Easy to parallelize
Availability: 1. Proprietary in house software; 5. Off-the-shelf third party freeware or shareware
Language: 4. Python
Details on software implementation: Python + scikit-learn
Platform: 2. Linux
Memory: 2. > 2 GB but <= 8 GB
Parallelism: 1. Multi-processor machine; 2. Run in parallel different algorithms on different machines
Code URL: https://github.com/abhishekkrthakur/AutoML
Total human effort: 1. A few man hours
Total machine effort: 1. A few hours
Challenge duration OK?: 1. Yes
Final evaluation time (hours): NA
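Illustration: selecting the number of PCA components by cross-validation, as described above for the digits dataset, fits in a few lines with a scikit-learn pipeline and grid search. The component grid and SVM settings below are illustrative, not the values used in the submission.

```python
# Cross-validated choice of the number of PCA components feeding an SVM.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
pipe = Pipeline([("scale", StandardScaler()),
                 ("pca", PCA()),
                 ("svm", SVC(kernel="rbf", gamma="scale"))])
grid = GridSearchCV(pipe,
                    param_grid={"pca__n_components": [10, 20, 30, 40],
                                "svm__C": [1, 10, 100]},
                    cv=5)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 4))
```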
Timestamp: 5/18/2015 15:45:17
Team name: sjahandideh
Title of contribution: RF-CLASSIFIER
Feature extraction: Not applicable
Normalization: 2. Sample normalization (for numerical variables); log
Dimensionality reduction: 5. Feature selection
Base predictor: 1. Decision tree, stub, or Random Forest
Loss function: 5. None
Regularizer: 3. None
Ensemble method: 5. None
Model selection and transfer learning: 1. Leaderboard performance on validation data used for model selection
Algorithmic complexity: 11. Don't know; too difficult to evaluate
Qualitative advantages: 2. Easy to implement; 4. Require little memory
Availability: 1. Proprietary in house software
Language: 5. R or S
Platform: 3. Mac OS
Memory: 2. > 2 GB but <= 8 GB
Parallelism: 1. Multi-processor machine
Total human effort: 1. A few man hours
Total machine effort: 1. A few hours
Challenge duration OK?: 1. Yes
Final evaluation time (hours): 3
Timestamp: 6/14/2015 21:34:55
Team name: sjahandideh
Team leader name: Samad Jahandideh
Team leader address and phone number: Sanford-Burnham Medical Research Institute, La Jolla, California; 858 646 3100 Ext. 4047
Contact email: samad_jahandideh@yahoo.com
Other team members: Single member
Title of contribution: Method developer
General description: Briefly, I have used a random-forest-based method for classification and the Gini index for feature selection.
References: Jahandideh, S.; Jaroszewski, L.; and Godzik, A. Improving the chances of successful protein structure determination with a random forest classifier. Acta Crystallographica Section D: Biological Crystallography 70(3), 627-635.
Feature extraction: 4. Trained feature extractors
Normalization: 2. Sample normalization (for numerical variables)
Dimensionality reduction: 5. Feature selection
Base predictor: 1. Decision tree, stub, or Random Forest
Loss function: 5. None
Regularizer: 3. None
Ensemble method: 2. Bagging (check this if you use Random Forest)
Model selection and transfer learning: 2. K-fold or leave-one-out cross-validation (using training data)
Algorithmic complexity: 11. Don't know; too difficult to evaluate
Qualitative advantages: 2. Easy to implement; 4. Require little memory
Availability: 6. Not ready yet, but may share later
Language: 3. Matlab or Octave; 5. R or S
Platform: 3. Mac OS
Memory: 2. > 2 GB but <= 8 GB
Parallelism: 1. Multi-processor machine
Total human effort: 1. A few man hours
Total machine effort: 1. A few hours
Challenge duration OK?: 1. Yes
Final evaluation time (hours): < 1
Timestamp: 9/11/2015 7:46:52
Team name: asml.intel.com
Team leader name: asml.intel.com
General description: I used a classical gradient boosted trees (GBT) technique with a prior feature selection step.
Feature extraction: No
Normalization: No
Dimensionality reduction: 5. Feature selection
Base predictor: 1. Decision tree, stub, or Random Forest
Loss function: 2. Square loss (like in ridge regression)
Regularizer: 3. None
Ensemble method: 1. Boosting; 2. Bagging (check this if you use Random Forest)
Model selection and transfer learning: 1. Leaderboard performance on validation data used for model selection; 2. K-fold or leave-one-out cross-validation (using training data); 4. Out-of-bag estimation (for bagging methods such as Random Forest)
Algorithmic complexity: 11. Don't know; too difficult to evaluate
Qualitative advantages: 1. Simple method; 2. Easy to implement; 6. Require only freeware libraries
Availability: 3. Freeware or shareware in house software
Language: 4. Python
Platform: 1. Windows; 2. Linux
Memory: 3. > 8 GB but <= 32 GB
Parallelism: 1. Multi-processor machine; 2. Run in parallel different algorithms on different machines
Code URL: https://github.com/vkocheganov/AutoML_Phase1
Total human effort: 2. A few man days
Total machine effort: 2. A few days
Challenge duration OK?: 1. Yes
Final evaluation time (hours): 8
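Illustration: a "GBT with a prior feature selection step" pipeline of the kind described above can be written in a few lines of scikit-learn. The selector and model settings are assumptions for the example; the actual code is in the linked repository.

```python
# Feature selection followed by gradient boosted trees, evaluated by CV.
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, GradientBoostingClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=1000, n_features=100, n_informative=10, random_state=0)
pipe = Pipeline([
    ("select", SelectFromModel(ExtraTreesClassifier(n_estimators=100, random_state=0))),
    ("gbt", GradientBoostingClassifier(n_estimators=200, random_state=0)),
])
print(round(cross_val_score(pipe, X, y, cv=5).mean(), 4))
```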
Timestamp: 9/15/2015 6:04:54
Team name: backstreet.bayes
Team leader name: James Lloyd
Team leader address and phone number: 111 York Street, Cambridge, CB1 2PZ, UK; 00447890215148
Contact email: james.robert.lloyd@gmail.com
Other team members: Emma Smith, Rowan McAllister, Natasha Latysheva, Alex Davies
Title of contribution: Rational allocation of computational resources for ensemble construction via stacking
Supplementary on-line material: https://github.com/jamesrobertlloyd/automl-phase-2
General description: I modified the freeze-thaw Bayesian optimisation algorithm (http://arxiv.org/abs/1406.3896) to be applicable to ensemble construction. Several algorithms are run, an ensemble is formed by stacking, and then various probabilistic models are used to predict which computational action will most improve the performance of the ensemble.
References: Work not yet published. Cite the GitHub repository for the moment (https://github.com/jamesrobertlloyd/automl-phase-2).
Feature extraction: Sample code
Normalization: Sample code
Dimensionality reduction: Sample code
Base predictor: 1. Decision tree, stub, or Random Forest; 2. Linear classifier (Fisher's discriminant, SVM, linear regression); 3. Non-linear kernel method (SVM, kernel ridge regression, kernel logistic regression); 6. Naïve Bayes; 10. Nearest neighbors; gradient boosting machines
Loss function: 3. Logistic loss or cross-entropy (like in logistic regression)
Regularizer: 1. One-norm (sum of weight magnitudes, like in Lasso); 2. Two-norm (||w||^2, like in ridge regression and regular SVM)
Ensemble method: 1. Boosting; 2. Bagging (check this if you use Random Forest); stacking
Model selection and transfer learning: 2. K-fold or leave-one-out cross-validation (using training data); 4. Out-of-bag estimation (for bagging methods such as Random Forest)
Algorithmic complexity: 10. Adaptive algorithmic complexity
Qualitative advantages: 7. Theoretically motivated; 8. Novel / original
Comparison with other methods: Reasoning about the benefits of different computational steps is very helpful when constrained by time. Ensembles are always a good idea.
Availability: 3. Freeware or shareware in house software
Language: 4. Python
Platform: 2. Linux
Memory: 3. > 8 GB but <= 32 GB
Parallelism: 1. Multi-processor machine
Code URL: https://github.com/jamesrobertlloyd/automl-phase-2
Total human effort: 4. > 2 man weeks
Total machine effort: 1. A few hours
Challenge duration OK?: 2. No, but I cannot spend more time
Final evaluation time (hours): However long I was given. Similarly for memory usage: it uses as much as it is allowed to!
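Illustration: the stacking step mentioned above, out-of-fold predictions from several base learners fed to a meta-learner, can be sketched as follows with scikit-learn. The base models and meta-learner are illustrative, and the resource-allocation machinery of the actual submission is omitted.

```python
# Stack heterogeneous base learners via out-of-fold predicted probabilities.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=2000, n_features=25, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

base_models = [RandomForestClassifier(n_estimators=200, random_state=0),
               GradientBoostingClassifier(random_state=0),
               GaussianNB(),
               KNeighborsClassifier(n_neighbors=15)]

# Level-1 features: out-of-fold probabilities on the training set,
# plain predicted probabilities on the test set.
Z_tr = np.column_stack([cross_val_predict(m, X_tr, y_tr, cv=5, method="predict_proba")[:, 1]
                        for m in base_models])
Z_te = np.column_stack([m.fit(X_tr, y_tr).predict_proba(X_te)[:, 1] for m in base_models])

meta = LogisticRegression().fit(Z_tr, y_tr)      # meta-learner on stacked features
print("stacked test accuracy:", round(meta.score(Z_te, y_te), 4))
```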
Timestamp: 9/17/2015 9:12:39
Team name: AAD Freiburg
Team leader name: Frank Hutter
Team leader address and phone number: Georges-Köhler-Allee 52, Sekretariat Nebel/GKI, 79110 Freiburg, Germany
Other team members: Matthias Feurer, Aaron Klein, Katharina Eggensperger, Jost Tobias Springenberg, Manuel Blum
Title of contribution: auto-sklearn
Supplementary on-line material: http://aad.informatik.uni-freiburg.de/papers/15-AUTOML-AutoML.pdf
General description: We used a predecessor of auto-sklearn. auto-sklearn combines the machine learning library scikit-learn with the state-of-the-art SMBO method SMAC to find suitable machine learning pipelines for a dataset at hand; it is basically a reimplementation of Auto-WEKA. To speed up the optimization process we employ a meta-learning technique which starts SMAC from promising configurations of scikit-learn. Furthermore, we combine the outputs of all models into an ensemble using ensemble selection.
References:
@INPROCEEDINGS{feurer-automl15a,
  author = {M. Feurer and A. Klein and K. Eggensperger and J. Springenberg and M. Blum and F. Hutter},
  title = {Methods for Improving Bayesian Optimization for AutoML},
  booktitle = {ICML 2015 AutoML Workshop},
  year = {2015},
  month = jul,
}
Other relevant references are given in the paper.
Feature extraction: 1. Application of random functions; kernel approximation
Normalization: 1. Feature standardization (for numerical variables); 2. Sample normalization (for numerical variables); 3. Replacement of the missing values; one-out-of-k encoding
Dimensionality reduction: 1. Linear manifold transformations (e.g. factor analysis, PCA, ICA); 3. Clustering (e.g. K-means, hierarchical clustering); 5. Feature selection
Base predictor: 1. Decision tree, stub, or Random Forest; 2. Linear classifier (Fisher's discriminant, SVM, linear regression); 3. Non-linear kernel method (SVM, kernel ridge regression, kernel logistic regression); 6. Naïve Bayes; 10. Nearest neighbors; quadratic classifiers
Loss function: 1. Hinge loss (like in SVM); 2. Square loss (like in ridge regression); 3. Logistic loss or cross-entropy (like in logistic regression); 4. Exponential loss (like in boosting); 5. None; 6. Don't know
Regularizer: 1. One-norm (sum of weight magnitudes, like in Lasso); 2. Two-norm (||w||^2, like in ridge regression and regular SVM)
Ensemble method: 1. Boosting; 2. Bagging (check this if you use Random Forest); 4. Other ensemble method; ensemble selection
Model selection and transfer learning: 2. K-fold or leave-one-out cross-validation (using training data)
Algorithmic complexity: 11. Don't know; too difficult to evaluate
Qualitative advantages: 6. Require only freeware libraries
Comparison with other methods: We did not try any other method.
Availability: 3. Freeware or shareware in house software; 5. Off-the-shelf third party freeware or shareware
Language: 1. C/C++/C#; 2. Java or Weka; 4. Python; Cython
Details on software implementation: Mostly Python; uses the external tools SMAC (in Java) and runsolver (C++).
Platform: 2. Linux
Memory: 4. > 32 GB
Parallelism: 1. Multi-processor machine
Code URL: http://aad.informatik.uni-freiburg.de/downloads/automl_competition_2015_001.zip
Total human effort: 4. > 2 man weeks
Total machine effort: 4. > 2 weeks
Challenge duration OK?: 1. Yes
Final evaluation time (hours): Tweakathon: ca. 9600; Auto: ?
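Illustration: ensemble selection as named above (Caruana-style greedy forward selection with replacement over stored validation predictions) is simple enough to sketch directly. Below is a generic version operating on synthetic validation predictions, not the actual auto-sklearn ensemble builder; accuracy is used as the validation metric for simplicity.

```python
# Greedy ensemble selection with replacement over per-model validation
# predictions: repeatedly add the model whose inclusion most improves the
# ensemble's validation accuracy, and return per-model weights.
import numpy as np

def ensemble_selection(preds, y_valid, n_rounds=50):
    """preds: array (n_models, n_samples) of predicted class-1 probabilities."""
    n_models = preds.shape[0]
    counts = np.zeros(n_models, dtype=int)
    ensemble_sum = np.zeros(preds.shape[1])
    for _ in range(n_rounds):
        scores = []
        for m in range(n_models):
            avg = (ensemble_sum + preds[m]) / (counts.sum() + 1)
            scores.append(np.mean((avg > 0.5) == y_valid))
        best = int(np.argmax(scores))
        counts[best] += 1
        ensemble_sum += preds[best]
    return counts / counts.sum()              # per-model ensemble weights

# Example with fake validation predictions from 5 models on 100 samples.
rng = np.random.default_rng(0)
y_valid = rng.integers(0, 2, size=100)
noise = rng.normal(0.0, [[0.3], [0.5], [0.8], [1.0], [0.4]], size=(5, 100))
preds = np.clip(y_valid + noise, 0, 1)
print(ensemble_selection(preds, y_valid))
```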
Timestamp: 11/25/2015 8:03:46
Team name: AAD Freiburg
Team leader name: Frank Hutter
Team leader address and phone number: Frank Hutter, Arbeitsgruppe für Lernen, Optimierung, und Automatisches Algorithmendesign, c/o Sekretariat Nebel/GKI, Institut für Informatik, Albert-Ludwigs-Universität Freiburg, Georges-Köhler-Allee 52, 79110 Freiburg im Breisgau, Germany; +49 761 203-67740
Contact email: automl2015@informatik.uni-freiburg.de
Other team members: Matthias Feurer, Katharina Eggensperger, Aaron Klein, Stefan Falkner, Marius Lindauer, Manuel Blum, Jost Tobias Springenberg
Team website URL: http://aad.informatik.uni-freiburg.de/
Title of contribution: 3rd place in Phase Final 2
Supplementary on-line material: http://aad.informatik.uni-freiburg.de/papers/15-NIPS-auto-sklearn-preprint.pdf
General description: We use auto-sklearn together with ensemble selection (but no meta-learning) as described in Section 6 of the paper referenced below. Instead of a simple holdout split we used 5-fold cross-validation.
References:
@inproceedings{feurer-nips2015,
  booktitle = {Proceedings of the Neural Information Processing Systems Conference (NIPS)},
  month = {December},
  title = {Efficient and Robust Automated Machine Learning},
  author = {M. Feurer and A. Klein and K. Eggensperger and J. Springenberg and M. Blum and F. Hutter},
  year = {2015},
}
Feature extraction: 1. Application of random functions; kernel approximation
Normalization: 1. Feature standardization (for numerical variables); 2. Sample normalization (for numerical variables); 3. Replacement of the missing values; min/max scaling
Dimensionality reduction: 1. Linear manifold transformations (e.g. factor analysis, PCA, ICA); 5. Feature selection; we had KPCA and hierarchical clustering in the hypothesis space, but model selection told us not to use them
Base predictor: 1. Decision tree, stub, or Random Forest; 2. Linear classifier (Fisher's discriminant, SVM, linear regression); 3. Non-linear kernel method (SVM, kernel ridge regression, kernel logistic regression); other tree-based ensemble methods
Loss function: 1. Hinge loss (like in SVM); 3. Logistic loss or cross-entropy (like in logistic regression); 5. None
Regularizer: 2. Two-norm (||w||^2, like in ridge regression and regular SVM); elastic net
Ensemble method: 2. Bagging (check this if you use Random Forest); 4. Other ensemble method
Model selection and transfer learning: 2. K-fold or leave-one-out cross-validation (using training data)
Algorithmic complexity: 11. Don't know; too difficult to evaluate
Qualitative advantages: 1. Simple method; 2. Easy to implement; 3. Easy to parallelize; 6. Require only freeware libraries
Comparison with other methods: We chose a method which encompasses a lot of 'base' methods. The reference which describes our method contains a comparison of our method to the 'base' methods.
Availability: 5. Off-the-shelf third party freeware or shareware
Language: 2. Java or Weka; 4. Python
Platform: 2. Linux
Memory: 2. > 2 GB but <= 8 GB
Parallelism: Model-parallel; can run on different machines, but can also run on a single machine
Code URL: https://github.com/automl/ChaLearn_Automatic_Machine_Learning_Challenge_2015
Total human effort: 2. A few man days
Total machine effort: 4. > 2 weeks
Challenge duration OK?: 1. Yes
Final evaluation time (hours): ~10000
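Illustration: a run like the one described above (auto-sklearn with ensemble selection, meta-learning disabled, 5-fold cross-validation instead of holdout) looks roughly as follows with the released auto-sklearn package. The time budgets and dataset are placeholders; the exact options the team used are in their linked repository, and the argument names follow the 0.x auto-sklearn API.

```python
# Rough auto-sklearn invocation matching the described setup: no meta-learning,
# 5-fold CV for internal evaluation, ensemble selection (on by default) over
# the evaluated models.
import autosklearn.classification
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=600,                 # placeholder budget (seconds)
    per_run_time_limit=60,
    initial_configurations_via_metalearning=0,   # "no meta-learning"
    resampling_strategy="cv",
    resampling_strategy_arguments={"folds": 5},
)
automl.fit(X_tr, y_tr)
automl.refit(X_tr, y_tr)                         # refit on all data after CV search
print(automl.score(X_te, y_te))
```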
Timestamp: 11/28/2015 4:24:43
Team name: djajetic
Team leader name: Damir Jajetić
Team leader address and phone number: Sveta Nedelja, Croatia
Team website URL: https://github.com/djajetic
Title of contribution: djajetic
Supplementary on-line material: https://github.com/djajetic/AutoML2
Feature extraction: 1. Application of random functions
Normalization: 3. Replacement of the missing values
Dimensionality reduction: 5. Feature selection
Base predictor: 1. Decision tree, stub, or Random Forest
Loss function: 3. Logistic loss or cross-entropy (like in logistic regression)
Regularizer: 1. One-norm (sum of weight magnitudes, like in Lasso)
Ensemble method: 1. Boosting; 2. Bagging (check this if you use Random Forest)
Model selection and transfer learning: 1. Leaderboard performance on validation data used for model selection; 11. Knowledge transfer from the development data of past phases
Algorithmic complexity: 1. Linear in number of features; 4. Linear in number of training examples; 7. Linear in number of test examples
Qualitative advantages: 1. Simple method; 2. Easy to implement; 3. Easy to parallelize; 6. Require only freeware libraries
Availability: 5. Off-the-shelf third party freeware or shareware
Language: 4. Python
Platform: 2. Linux
Memory: 3. > 8 GB but <= 32 GB
Parallelism: 1. Multi-processor machine
Code URL: https://github.com/djajetic/AutoML2
Total human effort: 3. 1-2 man weeks
Total machine effort: 3. 1-2 weeks
Challenge duration OK?: 1. Yes
Final evaluation time (hours): 6
Timestamp: 11/28/2015 4:37:19
Team name: djajetic
Team leader name: Damir Jajetić
Team leader address and phone number: Sveta Nedelja, Croatia
Team website URL: https://github.com/djajetic
Title of contribution: djajetic AutoML3
Supplementary on-line material: https://github.com/djajetic/AutoML3
Feature extraction: 1. Application of random functions
Normalization: 1. Feature standardization (for numerical variables); 2. Sample normalization (for numerical variables); 3. Replacement of the missing values
Dimensionality reduction: 1. Linear manifold transformations (e.g. factor analysis, PCA, ICA); 5. Feature selection
Base predictor: 1. Decision tree, stub, or Random Forest; 2. Linear classifier (Fisher's discriminant, SVM, linear regression); 3. Non-linear kernel method (SVM, kernel ridge regression, kernel logistic regression); 6. Naïve Bayes; 10. Nearest neighbors
Loss function: 1. Hinge loss (like in SVM); 2. Square loss (like in ridge regression); 3. Logistic loss or cross-entropy (like in logistic regression); 4. Exponential loss (like in boosting)
Regularizer: 1. One-norm (sum of weight magnitudes, like in Lasso); 2. Two-norm (||w||^2, like in ridge regression and regular SVM); 4. Don't know
Ensemble method: 1. Boosting; 2. Bagging (check this if you use Random Forest); 4. Other ensemble method
Model selection and transfer learning: 4. Out-of-bag estimation (for bagging methods such as Random Forest); 5. Bootstrap estimation (other than out-of-bag)
Algorithmic complexity: 11. Don't know; too difficult to evaluate
Qualitative advantages: 3. Easy to parallelize; 6. Require only freeware libraries
Availability: 3. Freeware or shareware in house software
Language: 4. Python
Platform: 2. Linux
Memory: 3. > 8 GB but <= 32 GB
Parallelism: 1. Multi-processor machine
Code URL: https://github.com/djajetic/AutoML3
Total human effort: 4. > 2 man weeks
Total machine effort: 4. > 2 weeks
Challenge duration OK?: 1. Yes
Final evaluation time (hours): 1.5
Timestamp: 2/26/2016 0:27:10
Team name: AAD Freiburg
Team leader name: Frank Hutter
Team leader address and phone number: Arbeitsgruppe Machine Learning for Automated Algorithm Design, Institut für Informatik, Albert-Ludwigs-Universität Freiburg, Georges-Köhler-Allee 52, 79110 Freiburg im Breisgau, Germany
Contact email: automl2015@informatik.uni-freiburg.de
Other team members: Matthias Feurer, Jost Tobias Springenberg, Katharina Eggensperger, Aaron Klein, Marius Lindauer, Manuel Blum, Stefan Falkner
Team website URL: http://aad.informatik.uni-freiburg.de/
Title of contribution: 1st place Final 3; 1st place Auto4
Supplementary on-line material: https://github.com/automl/auto-sklearn
General description: We use auto-sklearn (https://github.com/automl/auto-sklearn) with a new Python version of SMAC (http://www.cs.ubc.ca/labs/beta/Projects/SMAC/) for the auto phase. For the tweakathon we used auto-sklearn with the Java version of SMAC to tune auto-sklearn and deep neural networks implemented in Lasagne/Theano, with the following settings:
* alexis: ?
* dionis: 25 jobs; one day each; 8 GB RAM; 5-fold CV
* grigoris: 25 jobs; one day each; 8 GB RAM; 5-fold CV
* jannis: 25 jobs; one day each; 4 GB RAM; 5-fold CV
* wallis: 25 jobs; one day each; 4 GB RAM; 5-fold CV
References: Feurer, M.; Klein, A.; Eggensperger, K.; Springenberg, J.; Blum, M.; and Hutter, F. Efficient and Robust Automated Machine Learning. In: Advances in Neural Information Processing Systems 28.
Feature extraction: 1. Application of random functions; 4. Trained feature extractors
Normalization: 1. Feature standardization (for numerical variables); 2. Sample normalization (for numerical variables); 3. Replacement of the missing values; min/max scaling; none
Dimensionality reduction: 1. Linear manifold transformations (e.g. factor analysis, PCA, ICA); 2. Non-linear dimensionality reduction (e.g. KPCA, MDS, LLE, Laplacian Eigenmaps, Kohonen maps); 3. Clustering (e.g. K-means, hierarchical clustering); 4. Deep Learning (e.g. stacks of auto-encoders, stacks of RBMs); 5. Feature selection
Base predictor: 1. Decision tree, stub, or Random Forest; 2. Linear classifier (Fisher's discriminant, SVM, linear regression); 3. Non-linear kernel method (SVM, kernel ridge regression, kernel logistic regression); 8. Neural Network (or Deep Learning Method); 10. Nearest neighbors; quadratic classifiers
Loss function: 1. Hinge loss (like in SVM); 2. Square loss (like in ridge regression); 3. Logistic loss or cross-entropy (like in logistic regression); 4. Exponential loss (like in boosting)
Regularizer: 1. One-norm (sum of weight magnitudes, like in Lasso); 2. Two-norm (||w||^2, like in ridge regression and regular SVM); 3. None
Ensemble method: 1. Boosting; 2. Bagging (check this if you use Random Forest); 4. Other ensemble method
Model selection and transfer learning: 2. K-fold or leave-one-out cross-validation (using training data); auto track: knowledge transfer from data generated offline
Algorithmic complexity: 11. Don't know; too difficult to evaluate
Qualitative advantages: 1. Simple method; 2. Easy to implement; 3. Easy to parallelize; 6. Require only freeware libraries
Availability: auto-sklearn freely available; Python SMAC will be freely available by the 4th of March
Language: 1. C/C++/C#; 2. Java or Weka; 4. Python
Platform: 1. Windows
Memory: 3. > 8 GB but <= 32 GB
Parallelism: 1. Multi-processor machine; 2. Run in parallel different algorithms on different machines
Code URL: https://github.com/automl/auto-sklearn
Total human effort: 4. > 2 man weeks
Total machine effort: 4. > 2 weeks
Challenge duration OK?: 1. Yes
Final evaluation time (hours): > 2500
Timestamp: 2/28/2016 13:28:57
Team name: djajetic
Team leader name: Damir Jajetić
Team leader address and phone number: Croatia
Team website URL: https://github.com/djajetic
Title of contribution: djajetic Final3
Feature extraction: 1. Application of random functions
Normalization: None
Dimensionality reduction: None
Base predictor: 1. Decision tree, stub, or Random Forest
Loss function: 4. Exponential loss (like in boosting)
Regularizer: 1. One-norm (sum of weight magnitudes, like in Lasso); 2. Two-norm (||w||^2, like in ridge regression and regular SVM)
Ensemble method: 1. Boosting; 2. Bagging (check this if you use Random Forest)
Model selection and transfer learning: 1. Leaderboard performance on validation data used for model selection; 4. Out-of-bag estimation (for bagging methods such as Random Forest)
Algorithmic complexity: 2. Quadratic in number of features; 5. Quadratic in number of training examples
Qualitative advantages: 1. Simple method; 2. Easy to implement
Availability: 3. Freeware or shareware in house software
Language: 4. Python
Platform: 2. Linux
Memory: 4. > 32 GB
Parallelism: 1. Multi-processor machine
Code URL: https://github.com/djajetic/AutoML3Final
Total human effort: 3. 1-2 man weeks
Total machine effort: 3. 1-2 weeks
Challenge duration OK?: 1. Yes
Final evaluation time (hours): 4
Timestamp: 2/28/2016 13:34:00
Team name: djajetic
Team leader name: Damir Jajetić
Team website URL: https://github.com/djajetic
Title of contribution: djajetic AutoML4
Feature extraction: None
Normalization: 3. Replacement of the missing values
Dimensionality reduction: None
Base predictor: 1. Decision tree, stub, or Random Forest; 2. Linear classifier (Fisher's discriminant, SVM, linear regression); 6. Naïve Bayes; 10. Nearest neighbors
Loss function: 2. Square loss (like in ridge regression); 3. Logistic loss or cross-entropy (like in logistic regression); 4. Exponential loss (like in boosting)
Regularizer: 4. Don't know
Ensemble method: 1. Boosting; 2. Bagging (check this if you use Random Forest); 4. Other ensemble method
Model selection and transfer learning: 4. Out-of-bag estimation (for bagging methods such as Random Forest)
Algorithmic complexity: 11. Don't know; too difficult to evaluate
Qualitative advantages: 2. Easy to implement; 3. Easy to parallelize; 6. Require only freeware libraries
Availability: 3. Freeware or shareware in house software
Language: 4. Python
Platform: 2. Linux
Memory: 4. > 32 GB
Parallelism: 1. Multi-processor machine
Code URL: https://github.com/djajetic/AutoML4
Total human effort: 4. > 2 man weeks
Total machine effort: 4. > 2 weeks
Challenge duration OK?: 1. Yes
Final evaluation time (hours): 1.5
Timestamp: 4/14/2016 8:29:45
Team name: aad_freiburg_gpu
Team leader name: Frank Hutter
Team leader address and phone number: Institut für Informatik, Albert-Ludwigs-Universität Freiburg, Sekretariat Nebel/GKI, Georges-Köhler-Allee 052, 79110 Freiburg, Germany; +49 761 203-67740
Contact email: automl2015@informatik.uni-freiburg.de
Other team members: Hector Mendoza, Aaron Klein, Matthias Feurer
Team website URL: aad.informatik.uni-freiburg.de
Title of contribution: Autonet
General description: auto-sklearn with neural networks instead of scikit-learn.
References: Not published yet
Feature extraction: Nothing
Normalization: 1. Feature standardization (for numerical variables); 2. Sample normalization (for numerical variables); 3. Replacement of the missing values; 4. Grouping modalities (for categorical variables); nothing
Dimensionality reduction: Nothing
Base predictor: 2. Linear classifier (Fisher's discriminant, SVM, linear regression); 8. Neural Network (or Deep Learning Method)
Loss function: 1. Hinge loss (like in SVM); 2. Square loss (like in ridge regression); 3. Logistic loss or cross-entropy (like in logistic regression)
Regularizer: 2. Two-norm (||w||^2, like in ridge regression and regular SVM)
Ensemble method: 4. Other ensemble method
Model selection and transfer learning: 2. K-fold or leave-one-out cross-validation (using training data); 9. Bi-level optimization
Algorithmic complexity: 4. Linear in number of training examples; 11. Don't know; too difficult to evaluate
Qualitative advantages: Nothing
Availability: 3. Freeware or shareware in house software; 5. Off-the-shelf third party freeware or shareware
Language: 1. C/C++/C#; 4. Python
Platform: 2. Linux
Memory: 3. > 8 GB but <= 32 GB
Parallelism: 3. None
Total human effort: 4. > 2 man weeks
Total machine effort: 4. > 2 weeks
Challenge duration OK?: 2. No, but I cannot spend more time
Final evaluation time (hours): 1.67
Timestamp: 4/14/2016 13:17:59
Team name: agrigorev_GPU
Team leader name: Alexey Grigorev
Team leader address and phone number: Langhansstr. 70, 13086 Berlin, Germany; 48 177 490 5706
Contact email: alexey.s.grigoriev@gmail.com
Title of contribution: agrigorev_GPU
General description: Neural networks using Keras.
Feature extraction: None
Normalization: 1. Feature standardization (for numerical variables)
Dimensionality reduction: 1. Linear manifold transformations (e.g. factor analysis, PCA, ICA); 4. Deep Learning (e.g. stacks of auto-encoders, stacks of RBMs)
Base predictor: 8. Neural Network (or Deep Learning Method)
Loss function: 2. Square loss (like in ridge regression); 3. Logistic loss or cross-entropy (like in logistic regression)
Regularizer: 1. One-norm (sum of weight magnitudes, like in Lasso); 2. Two-norm (||w||^2, like in ridge regression and regular SVM)
Ensemble method: 5. None
Model selection and transfer learning: 1. Leaderboard performance on validation data used for model selection; 2. K-fold or leave-one-out cross-validation (using training data)
Algorithmic complexity: 11. Don't know; too difficult to evaluate
Qualitative advantages: 1. Simple method
Availability: 3. Freeware or shareware in house software
Language: 4. Python
Details on software implementation: Keras + Theano
Platform: 2. Linux
Memory: 3. > 8 GB but <= 32 GB
Parallelism: 1. Multi-processor machine
Total human effort: 3. 1-2 man weeks
Total machine effort: 1. A few hours
Challenge duration OK?: 1. Yes
Final evaluation time (hours): 0
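Illustration: a neural network of the kind referenced above, written against the Keras Sequential API (with a Theano or TensorFlow backend), can be as short as the sketch below. The layer sizes and training settings are illustrative assumptions, not the submission's configuration.

```python
# Minimal Keras multi-layer perceptron for binary classification on
# standardized features.
from keras.layers import Dense, Dropout, Input
from keras.models import Sequential
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=5000, n_features=50, random_state=0)
X = StandardScaler().fit_transform(X)            # feature standardization

model = Sequential([
    Input(shape=(X.shape[1],)),
    Dense(256, activation="relu"),
    Dropout(0.5),
    Dense(64, activation="relu"),
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=10, batch_size=128, validation_split=0.2, verbose=2)
```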
Timestamp: 5/3/2016 1:35:15
Team name: aad_freiburg-GPU
Team leader name: Dr. Frank Hutter
Team leader address and phone number: Institut für Informatik, Albert-Ludwigs-Universität Freiburg, Sekretariat Nebel/GKI, Georges-Köhler-Allee 052, 79110 Freiburg, Germany; +49 761 203-67740
Contact email: aad_freiburg@fhutter.de
Other team members: Hector Mendoza, Matthias Feurer, Aaron Klein
Team website URL: http://aad.informatik.uni-freiburg.de/
Title of contribution: Automated Configuration of Neural Networks
Supplementary on-line material: https://papers.nips.cc/paper/5872-efficient-and-robust-automated-machine-learning.pdf
General description: We use Bayesian optimization to find a good configuration (instance) of the hyper-parameters (learning rate, regularization factor, etc.) used by a neural network.
References: Feurer, M.; Klein, A.; Eggensperger, K.; Springenberg, J.; Blum, M.; and Hutter, F. Efficient and Robust Automated Machine Learning.
Feature extraction: No feature extraction
Normalization: 1. Feature standardization (for numerical variables); 2. Sample normalization (for numerical variables); 3. Replacement of the missing values; 4. Grouping modalities (for categorical variables); 5. Discretization (for numerical variables)
Dimensionality reduction: 1. Linear manifold transformations (e.g. factor analysis, PCA, ICA); 5. Feature selection
Base predictor: 8. Neural Network (or Deep Learning Method)
Loss function: 2. Square loss (like in ridge regression); 3. Logistic loss or cross-entropy (like in logistic regression)
Regularizer: 2. Two-norm (||w||^2, like in ridge regression and regular SVM)
Ensemble method: 4. Other ensemble method
Model selection and transfer learning: 2. K-fold or leave-one-out cross-validation (using training data)
Algorithmic complexity: 11. Don't know; too difficult to evaluate
Qualitative advantages: 1. Simple method; 2. Easy to implement; 3. Easy to parallelize; 6. Require only freeware libraries
Availability: 3. Freeware or shareware in house software; 6. Not ready yet, but may share later
Language: 1. C/C++/C#; 4. Python
Details on software implementation: Uses standard Theano and Lasagne code for deep networks, with an in-house wrapper to use it with the automatic configuration machinery.
Platform: 2. Linux
Memory: 4. > 32 GB
Parallelism: 1. Multi-processor machine; 2. Run in parallel different algorithms on different machines
Total human effort: 3. 1-2 man weeks
Total machine effort: 4. > 2 weeks
Challenge duration OK?: 1. Yes
Final evaluation time (hours): 1.67
Timestamp: 5/3/2016 11:56:59
Team name: djajetic_GPU
Team leader name: Damir Jajetić
Title of contribution: GPU Final4
General description: GPU neural-network-based model on the Lasagne and Theano libraries. Very simple and self-explanatory source code is available at https://github.com/djajetic/GPU_djajetic
Feature extraction: 3. Hand-crafted features
Normalization: 1. Feature standardization (for numerical variables); 2. Sample normalization (for numerical variables); 3. Replacement of the missing values
Dimensionality reduction: 1. Linear manifold transformations (e.g. factor analysis, PCA, ICA)
Base predictor: 8. Neural Network (or Deep Learning Method)
Loss function: 2. Square loss (like in ridge regression); 3. Logistic loss or cross-entropy (like in logistic regression)
Regularizer: Early stopping
Ensemble method: 5. None
Model selection and transfer learning: 1. Leaderboard performance on validation data used for model selection; 2. K-fold or leave-one-out cross-validation (using training data)
Algorithmic complexity: 11. Don't know; too difficult to evaluate
Qualitative advantages: 1. Simple method; 6. Require only freeware libraries
Availability: 3. Freeware or shareware in house software
Language: 4. Python
Platform: 2. Linux
Memory: 3. > 8 GB but <= 32 GB
Parallelism: GPU
Code URL: https://github.com/djajetic/GPU_djajetic
Total human effort: 3. 1-2 man weeks
Total machine effort: 2. A few days
Challenge duration OK?: 1. Yes
Final evaluation time (hours): 1
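Illustration: the Lasagne/Theano model referred to above is in the linked repository; for orientation, a minimal Lasagne network and training function of that general shape looks like the sketch below. The layer sizes, optimizer settings and data are made up for the example.

```python
# Minimal Lasagne/Theano MLP: build layers, then compile a Theano training
# function with the Adam update rule.
import lasagne
import numpy as np
import theano
import theano.tensor as T

n_features, n_classes = 100, 10
X_var, y_var = T.matrix("X"), T.ivector("y")

network = lasagne.layers.InputLayer((None, n_features), input_var=X_var)
network = lasagne.layers.DenseLayer(network, num_units=256,
                                    nonlinearity=lasagne.nonlinearities.rectify)
network = lasagne.layers.DropoutLayer(network, p=0.5)
network = lasagne.layers.DenseLayer(network, num_units=n_classes,
                                    nonlinearity=lasagne.nonlinearities.softmax)

prediction = lasagne.layers.get_output(network)
loss = lasagne.objectives.categorical_crossentropy(prediction, y_var).mean()
params = lasagne.layers.get_all_params(network, trainable=True)
updates = lasagne.updates.adam(loss, params, learning_rate=1e-3)
train_fn = theano.function([X_var, y_var], loss, updates=updates)

# One training step on a random mini-batch (illustration only).
Xb = np.random.rand(128, n_features).astype(theano.config.floatX)
yb = np.random.randint(0, n_classes, size=128).astype("int32")
print(train_fn(Xb, yb))
```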
Timestamp: 5/5/2016 13:50:22
Team name: djajetic
Title of contribution: AutoML5
General description: The software is simple and based on ensembled, unsynchronized local-search models with no communication between models (let's call them particles, although this is not PSO, just for easy reading) and without exploiting any properties that could be obtained by swarm intelligence; it is consequently insensitive to swarm false beliefs in non-convex search spaces.
The search space and ensembling properties of each individual particle are defined in a separate Python script and in this form can be defined dynamically. As particles are created they are unaware of any outside information except two stop signals, yielding the best results they can on the dataset and reporting precision on a training subset.
The ensembling module uses only the N best models from the particles, based on reported precision, and only the M best models from each model group (groups are defined in the model definition script by assigning similar models to the same group). That makes parallelization almost linear, with the constraint that a particle must be able to create its model and produce results on the test datasets by itself in order to be included in the ensemble. As there is no feasibility heuristic, except if one includes it in the model definition script, all unfinished models are (only) wasted computing time.
Some things must be changed slightly for parallelization with unlimited scalability (i.e. the current implementation works only on a single machine; the ensembling module should have, for example, a pruning submodule, etc.).
The software starts to produce output immediately after starting and stops at the task-defined time. For challenge purposes there is a hard-coded memory usage limitation and, to ensure output, some particles use a subsample of the data and a subsample of the features. No other preprocessing is done, but it can be created as pipeline models in the model definition script.
Feature extraction: 1. Application of random functions; 4. Trained feature extractors
Normalization: 1. Feature standardization (for numerical variables); 3. Replacement of the missing values
Dimensionality reduction: 1. Linear manifold transformations (e.g. factor analysis, PCA, ICA)
Base predictor: 1. Decision tree, stub, or Random Forest; 2. Linear classifier (Fisher's discriminant, SVM, linear regression); 6. Naïve Bayes; 10. Nearest neighbors
Loss function: 2. Square loss (like in ridge regression); 3. Logistic loss or cross-entropy (like in logistic regression); 4. Exponential loss (like in boosting)
Regularizer: 1. One-norm (sum of weight magnitudes, like in Lasso); 2. Two-norm (||w||^2, like in ridge regression and regular SVM)
Ensemble method: 1. Boosting; 2. Bagging (check this if you use Random Forest); 4. Other ensemble method
Model selection and transfer learning: 2. K-fold or leave-one-out cross-validation (using training data)
Algorithmic complexity: 11. Don't know; too difficult to evaluate
Qualitative advantages: 1. Simple method; 3. Easy to parallelize; 6. Require only freeware libraries
Availability: 3. Freeware or shareware in house software
Language: 4. Python
Platform: 2. Linux
Memory: 4. > 32 GB
Parallelism: 1. Multi-processor machine
Code URL: https://github.com/djajetic/AutoML5
Total human effort: 4. > 2 man weeks
Total machine effort: 4. > 2 weeks
Challenge duration OK?: 1. Yes
Final evaluation time (hours): 1.5
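Illustration: the "particles" design described above (independent local searches that each report a validation score, with an ensembling module averaging only the N best, capped at M per model group) is roughly the following loop. The model groups, scoring and values of N and M are placeholders, and the real implementation runs the particles as unsynchronized processes rather than a sequential loop; this is a sketch of the idea, not the AutoML5 code.

```python
# Sequential stand-in for unsynchronized "particle" searches: each particle
# samples hyperparameters within its model group, reports a validation score,
# and the ensembling module soft-votes over the N best (at most M per group).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=2000, n_features=30, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, random_state=0)

model_groups = {                               # stand-in for the "model definition script"
    "rf":  lambda: RandomForestClassifier(n_estimators=int(rng.integers(50, 300)),
                                          max_depth=int(rng.integers(3, 15)), random_state=0),
    "gbm": lambda: GradientBoostingClassifier(learning_rate=10 ** rng.uniform(-2, -0.5),
                                              random_state=0),
    "lin": lambda: LogisticRegression(C=10 ** rng.uniform(-2, 2), max_iter=2000),
}

particles = []                                 # (group, fitted model, reported score)
for group, make_model in model_groups.items():
    for _ in range(5):                         # 5 particles per group
        model = make_model().fit(X_tr, y_tr)
        particles.append((group, model, model.score(X_va, y_va)))

# Ensembling module: keep at most M=2 per group, then the N=4 best overall.
M, N = 2, 4
kept = []
for group in model_groups:
    members = sorted((p for p in particles if p[0] == group), key=lambda p: -p[2])[:M]
    kept.extend(members)
kept = sorted(kept, key=lambda p: -p[2])[:N]

proba = np.mean([m.predict_proba(X_va)[:, 1] for _, m, _ in kept], axis=0)
print("ensemble accuracy:", round(float(np.mean((proba > 0.5) == y_va)), 4))
```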
Timestamp: 5/5/2016 23:51:44
Team name: abhishek4
Team leader name: Abhishek Thakur
Team leader address and phone number: Habersaathstr. 26, 10115 Berlin, Germany
Contact email: abhishek4@gmail.com
Title of contribution: AutoCompete Again: Better selection of Algorithms
Feature extraction: None
Normalization: 1. Feature standardization (for numerical variables); 2. Sample normalization (for numerical variables)
Dimensionality reduction: SVD
Base predictor: 1. Decision tree, stub, or Random Forest; 8. Neural Network (or Deep Learning Method)
Loss function: 2. Square loss (like in ridge regression); 3. Logistic loss or cross-entropy (like in logistic regression)
Regularizer: 3. None
Ensemble method: 1. Boosting
Model selection and transfer learning: 2. K-fold or leave-one-out cross-validation (using training data)
Algorithmic complexity: 11. Don't know; too difficult to evaluate
Qualitative advantages: 1. Simple method; 2. Easy to implement; 3. Easy to parallelize
Availability: 6. Not ready yet, but may share later
Language: 4. Python
Platform: 2. Linux
Memory: 3. > 8 GB but <= 32 GB
Parallelism: 1. Multi-processor machine; 2. Run in parallel different algorithms on different machines
Code URL: https://github.com/abhishekkrthakur/AutoML
Total human effort: 1. A few man hours
Total machine effort: 1. A few hours
Challenge duration OK?: 1. Yes
Final evaluation time (hours): 2
Timestamp: 5/5/2016 23:54:56
Team name: abhhishek-GPU
Team leader name: Abhishek Thakur
Contact email: abhishek4@gmail.com
Title of contribution: AutoCompete Again: Better selection of Algorithms
Feature extraction: None
Normalization: 1. Feature standardization (for numerical variables); 2. Sample normalization (for numerical variables)
Dimensionality reduction: SVD
Base predictor: 8. Neural Network (or Deep Learning Method)
Loss function: 2. Square loss (like in ridge regression); 3. Logistic loss or cross-entropy (like in logistic regression)
Regularizer: Dropout
Ensemble method: Simple averaging
Model selection and transfer learning: 2. K-fold or leave-one-out cross-validation (using training data)
Algorithmic complexity: 11. Don't know; too difficult to evaluate
Qualitative advantages: 2. Easy to implement; 3. Easy to parallelize
Availability: 6. Not ready yet, but may share later
Language: 4. Python
Platform: 2. Linux
Memory: 3. > 8 GB but <= 32 GB
Parallelism: 1. Multi-processor machine; GPU machine
Code URL: https://github.com/abhishekkrthakur/automl_gpu
Total human effort: 1. A few man hours
Total machine effort: 1. A few hours
Challenge duration OK?: 1. Yes
Final evaluation time (hours): 3
Timestamp: 5/6/2016 0:24:15
Team name: AAD Freiburg
Team leader name: Frank Hutter
Team leader address and phone number: Institut für Informatik, Albert-Ludwigs-Universität Freiburg, Sekretariat Nebel/GKI, Georges-Köhler-Allee 052, 79110 Freiburg, Germany; phone +49 761 203-67740
Contact email: automl2015@informatik.uni-freiburg.de
Other team members: Matthias Feurer, Katharina Eggensperger, Aaron Klein, Hector D. Mendoza, Jost Tobias Springenberg, Manuel Blum, Marius Lindauer, Frank Hutter
Team website URL: http://aad.informatik.uni-freiburg.de/
Title of contribution: 1st place Final 4 & AutoML5
Supplementary on-line material: https://papers.nips.cc/paper/5872-efficient-and-robust-automated-machine-learning.pdf
References:
[1] M. Feurer, A. Klein, K. Eggensperger, J. Springenberg, M. Blum, and F. Hutter, "Efficient and Robust Automated Machine Learning," in Advances in Neural Information Processing Systems, 2015, pp. 2944-2952.
[2] J. Snoek, H. Larochelle, and R. P. Adams, "Practical Bayesian Optimization of Machine Learning Algorithms," in Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012, 2012, pp. 2951-2959.
[3] F. Hutter, H. H. Hoos, and K. Leyton-Brown, "Sequential Model-Based Optimization for General Algorithm Configuration," in Proceedings of the conference on Learning and Intelligent OptimizatioN, 2011, vol. 6683, pp. 507-523.
Feature extraction: 1. Application of random functions; kernel approximation
Normalization: 1. Feature standardization (for numerical variables); 2. Sample normalization (for numerical variables); 3. Replacement of the missing values; 4. Grouping modalities (for categorical variables)
Dimensionality reduction: 1. Linear manifold transformations (e.g. factor analysis, PCA, ICA); 2. Non-linear dimensionality reduction (e.g. KPCA, MDS, LLE, Laplacian Eigenmaps, Kohonen maps); 5. Feature selection; hierarchical clustering; univariate feature selection; feature selection by training models and using their weights/feature importances
Base predictor: 1. Decision tree, stub, or Random Forest; 2. Linear classifier (Fisher's discriminant, SVM, linear regression); 3. Non-linear kernel method (SVM, kernel ridge regression, kernel logistic regression); 4. Gaussian Process; 6. Naïve Bayes; 8. Neural Network (or Deep Learning Method); 10. Nearest neighbors
Loss function: 1. Hinge loss (like in SVM); 2. Square loss (like in ridge regression); 3. Logistic loss or cross-entropy (like in logistic regression); 4. Exponential loss (like in boosting); 5. None; 6. Don't know; Huber
Regularizer: 1. One-norm (sum of weight magnitudes, like in Lasso); 2. Two-norm (||w||^2, like in ridge regression and regular SVM); 3. None; 4. Don't know; elastic net
Ensemble method: 1. Boosting; 2. Bagging (check this if you use Random Forest); 5. None; ensemble selection
Model selection and transfer learning: 2. K-fold or leave-one-out cross-validation (using training data); 9. Bi-level optimization
Algorithmic complexity: 11. Don't know; too difficult to evaluate
Qualitative advantages: 2. Easy to implement; 3. Easy to parallelize; 6. Require only freeware libraries
Availability: 3. Freeware or shareware in house software
Language: 1. C/C++/C#; 4. Python
Details on software implementation: Software freely available at github.com/automl/auto-sklearn
Platform: 2. Linux
Memory: 2. > 2 GB but <= 8 GB
Parallelism: 1. Multi-processor machine; 2. Run in parallel different algorithms on different machines
Code URL: https://github.com/automl/auto-sklearn
Total human effort: 4. > 2 man weeks
Total machine effort: 4. > 2 weeks
Challenge duration OK?: 1. Yes
Final evaluation time (hours): Bi-level optimization: > 250 CPU days; training the models found by bi-level optimization: ~1 CPU day
Timestamp: 5/6/2016 5:29:36
Team name: postech.mlg_exbrain
Team leader name: Jungtaek Kim
Team leader address and phone number: Machine Learning Group, Department of Computer Science and Engineering, POSTECH, 77 Cheongam-ro, Nam-gu, Pohang-si 37673, Gyungsangbuk-do, Republic of Korea
Contact email: jtkim@postech.ac.kr
Other team members: Jongheon Jeong
Team website URL: https://github.com/postech-mlg-exbrain/AutoML-Challenge
Title of contribution: Automated Machine Learning Framework Using Random Space Partitioning Optimizer
References:
F. Hutter, H. H. Hoos, and K. Leyton-Brown. Sequential model-based optimization for general algorithm configuration (extended version). Technical Report 10-TR-SMAC, UBC, 2010.
B. Lakshminarayanan, D. M. Roy, and Y. W. Teh. Mondrian forests for large-scale regression when uncertainty matters. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2016.
M. Feurer, A. Klein, K. Eggensperger, J. Springenberg, M. Blum, and F. Hutter. Efficient and robust automated machine learning. In Advances in Neural Information Processing Systems (NIPS), volume 28, 2015.
C. Thornton, F. Hutter, H. H. Hoos, and K. Leyton-Brown. Auto-WEKA: combined selection and hyperparameter optimization of classification algorithms. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), pages 847-855, 2013.
Feature extraction: 1. Application of random functions
Normalization: 3. Replacement of the missing values; 4. Grouping modalities (for categorical variables)
Dimensionality reduction: 1. Linear manifold transformations (e.g. factor analysis, PCA, ICA); 2. Non-linear dimensionality reduction (e.g. KPCA, MDS, LLE, Laplacian Eigenmaps, Kohonen maps)
Base predictor: 1. Decision tree, stub, or Random Forest; 2. Linear classifier (Fisher's discriminant, SVM, linear regression); 3. Non-linear kernel method (SVM, kernel ridge regression, kernel logistic regression); 6. Naïve Bayes; 10. Nearest neighbors
Loss function: 1. Hinge loss (like in SVM); 2. Square loss (like in ridge regression); 3. Logistic loss or cross-entropy (like in logistic regression); 4. Exponential loss (like in boosting)
Regularizer: 3. None
Ensemble method: 5. None
Model selection and transfer learning: 2. K-fold or leave-one-out cross-validation (using training data)
Algorithmic complexity: 11. Don't know; too difficult to evaluate
Qualitative advantages: 2. Easy to implement; 3. Easy to parallelize; 7. Theoretically motivated
Availability: 6. Not ready yet, but may share later
Language: 4. Python
Platform: 2. Linux
Memory: 3. > 8 GB but <= 32 GB
Parallelism: 1. Multi-processor machine
Code URL: https://github.com/postech-mlg-exbrain/AutoML-Challenge
Total human effort: 3. 1-2 man weeks
Total machine effort: 1. A few hours
Challenge duration OK?: 1. Yes
Final evaluation time (hours): 20