Advances in ML: Theory meets practice
Workshop at the Applied Machine Learning Days 2018
January 28, 2018, Lausanne, Switzerland
“Can machine learning help to improve this application?”
After this question pops up in the mind of a user -- a biologist, an astrophysicist, or a social scientist -- how long would it take for her to get an answer? Our research studies how to answer this question as rapidly as possible, by accelerating the whole machine learning process.
Making a deep learning system train faster is indispensable for this purpose, but there is far more to it than that. Our research focuses on: (1) applications, (2) systems, and (3) abstractions. For applications, I will talk about machine learning applications that we enabled by supporting a range of users, none of whom had backgrounds in computer science. For systems, we focus on understanding the system trade-offs of distributed training and inference for a diverse set of machine learning models, and on how to co-design machine learning algorithms and modern hardware so as to unleash the full potential of both. I will talk in detail about our recent results and their application to FPGA-based acceleration. For abstractions, I will introduce ease.ml, a high-level declarative system for machine learning, which makes it possible to code many of the applications we built in just four lines.
Like writing and speaking, software development is an act of human communication. Humans need to understand, maintain and extend code. To achieve this efficiently, developers write code using implicit and explicit syntactic and semantic conventions that aim to ease human communication. The existence of these conventions has raised the exciting opportunity of creating machine learning models that learn from existing code and are embedded within software engineering tools.
This nascent area of "big code" or "code naturalness" lies at the intersection of the software engineering, programming languages and machine learning communities. The core challenge rests on finding methods that learn from highly structured and discrete objects with formal constraints and semantics. In this talk, I will give a brief overview of the research area, highlight a few interesting findings and discuss some of the emerging challenges for machine learning.
We introduce an exact distributed algorithm to train Random Forest models, as well as other decision forest models, without relying on approximate best-split search. We present the proposed algorithm and compare it to related approaches on several complexity measures (time, RAM, disk, and network). We report its running performance on artificial and real-world datasets of up to 17 billion examples. This figure is several orders of magnitude larger than the datasets tackled in the existing literature. Finally, we show empirically that Random Forest benefits from being trained on more data, even in the case of already gigantic datasets. [...] decision trees. Sprint is particularly suitable for the distributed setting, but we show that Sliq becomes better in the balanced case and/or when working with randomly drawn subsets of features, and we derive a rule for automatically switching between the two methods. Given a dataset of 17.3B examples with 71 features, our implementation trains a tree in 22h.
Joint work with Mathieu Guillame-Bert.
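As a rough illustration of what "exact" best-split search means in the abstract above, the following single-machine sketch evaluates every boundary between distinct sorted values of one numeric feature, rather than a histogram or quantile summary. It is not the authors' distributed algorithm; the function names `gini` and `best_exact_split` and the choice of Gini impurity are assumptions made for this example.

```python
from collections import Counter


def gini(counts, total):
    """Gini impurity of a label histogram holding `total` examples."""
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def best_exact_split(values, labels):
    """Return (threshold, weighted_impurity) for the best split `value <= threshold`.

    Every boundary between two distinct sorted values is scored, so no
    candidate threshold is skipped. If the feature is constant, the
    threshold is None and the impurity of the unsplit node is returned.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    left = Counter()          # label counts of examples sent left so far
    right = Counter(labels)   # label counts of the remaining examples
    best_threshold = None
    best_score = gini(right, n)
    for rank in range(n - 1):
        i = order[rank]
        left[labels[i]] += 1
        right[labels[i]] -= 1
        next_value = values[order[rank + 1]]
        if values[i] == next_value:
            continue  # no valid threshold between equal values
        n_left = rank + 1
        n_right = n - n_left
        score = (n_left * gini(left, n_left) + n_right * gini(right, n_right)) / n
        if score < best_score:
            best_score = score
            best_threshold = (values[i] + next_value) / 2.0
    return best_threshold, best_score


if __name__ == "__main__":
    print(best_exact_split([2.0, 1.0, 4.0, 3.0], [0, 0, 1, 1]))
```

The sort dominates the cost of this sketch; the scan itself is a single pass with running label counts.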
This talk focuses on techniques to accelerate the distributed training of large-scale machine learning models in heterogeneous compute environments. Such techniques are particularly important for applications where the training time is a severe bottleneck. They enable more agile development and thus allow better exploration of the parameter and model space, which in turn leads to higher-quality predictions. In this talk I will give insight into recent advances in distributed optimization and primal-dual optimization methods. I will focus on how such methods can be combined with novel techniques to accelerate machine learning algorithms on heterogeneous compute resources. Putting it all together, I will demonstrate the training of a linear classifier on the Criteo click prediction dataset, consisting of 1 billion training examples, in a few seconds.
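For readers unfamiliar with primal-dual methods for linear classifiers, here is a minimal single-machine sketch of stochastic dual coordinate ascent (SDCA) for a hinge-loss SVM without a bias term. It is one well-known member of this family and is not necessarily the method presented in the talk; the function name `sdca_hinge`, the regularization value, and the toy data are assumptions made for this example.

```python
import random


def sdca_hinge(X, y, lam=0.1, epochs=50, seed=0):
    """Train a linear SVM (no bias) with stochastic dual coordinate ascent.

    Primal:  min_w  lam/2 * ||w||^2 + (1/n) * sum_i max(0, 1 - y_i * w.x_i)
    Each example i has one dual variable alpha_i in [0, 1], and the primal
    weights are kept in sync as w = (1 / (lam * n)) * sum_i alpha_i * y_i * x_i.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    alpha = [0.0] * n
    w = [0.0] * d
    sq_norms = [sum(v * v for v in x) for x in X]
    for _ in range(epochs):
        indices = list(range(n))
        rng.shuffle(indices)
        for i in indices:
            if sq_norms[i] == 0.0:
                continue  # an all-zero example cannot change w
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # Closed-form maximizer of the dual along coordinate i,
            # clipped to the feasible interval [0, 1].
            new_alpha = alpha[i] + (1.0 - margin) * lam * n / sq_norms[i]
            new_alpha = min(1.0, max(0.0, new_alpha))
            step = (new_alpha - alpha[i]) * y[i] / (lam * n)
            w = [wj + step * xj for wj, xj in zip(w, X[i])]
            alpha[i] = new_alpha
    return w, alpha


if __name__ == "__main__":
    # Hypothetical, linearly separable toy data.
    X = [[2.0, 1.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]]
    y = [1, 1, -1, -1]
    w, alpha = sdca_hinge(X, y)
    print(w, alpha)
```

Each update touches a single example and has a closed form, which is part of why coordinate methods of this kind are attractive to distribute: workers can update disjoint sets of dual variables and periodically combine their changes to the shared weight vector.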