Algorithm benchmarks are a beacon for machine learning research. They allow us, as a community, to track progress over time, identify challenging problems, raise the bar, and learn how to do better. The OpenML platform already serves thousands of datasets, together with tasks, in a machine-readable way. OpenML is also integrated into many machine learning libraries, so that fine details about machine learning models (or pipelines) and their performance evaluations can be collected automatically. This integration allows experiments to be shared and organized on the platform, linked to the underlying datasets and tasks. A first benchmark suite, the OpenML-CC18, brings together 72 curated classification datasets and has been used extensively in scientific papers.
To further advance benchmarking in machine learning research, we are opening a call for additional classification and regression datasets, to be included in a newly formed benchmark suite: the OpenML curated classification & regression 2023 benchmark suite.
Please register your dataset using the following form, so we can assess it and get back to you with feedback.
If you have any questions, please contact: jvrijn@liacs.nl