1 of 11

Understanding your code with Machine Learning

The Data Platform for your Software Development Life Cycle

2 of 11

Intro

Hugo, Data Scientist @ source{d}

Alex, Engineer @ source{d}

3 of 11

Agenda

  • Goals of this workshop
  • MLonCode field overview
  • Data Science project
  • Workshop tasks
  • Hands On: preparations

4 of 11

Goals of the workshop

What to expect from the next 2 hours:

  • An example of running a data science project
  • Specifics of the Machine Learning on Code field
  • OSS tooling for processing source code
  • The dataset and all required software are on the USB drive*

*Start copying the data!

5 of 11

Machine Learning on Code

6 of 11

Data Science project

Main activities during a data science project

  1. Problem statement
  2. Data collection
  3. Exploration
  4. Evaluation
  5. Communication

7 of 11

Workshop tasks

  • Developer & project similarity

  • Function name suggestion

8 of 11

Workshop tasks

Developer and project similarity

  • How to find similar developers, based on their contributions?

  • How to find similar projects, based on their topics?

Both can be done in an unsupervised way, by building a vector representation for each developer and each project.
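
To make the idea concrete, here is a minimal sketch of similarity over vector representations. It assumes each developer is represented as a bag of identifiers taken from their contributions and compared with cosine similarity. The developer names, tokens, and helper functions are made up for illustration and are not the workshop's actual pipeline.

```python
import math
from collections import Counter


def vectorize(tokens):
    """Build a sparse bag-of-words vector (token -> count)."""
    return Counter(tokens)


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(name, vectors):
    """Rank every other entity by cosine similarity to `name`, best first."""
    target = vectors[name]
    scores = [
        (other, cosine_similarity(target, vec))
        for other, vec in vectors.items()
        if other != name
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy, invented data: identifiers seen in each developer's contributions.
contributions = {
    "alice": ["http", "request", "json", "parse", "http"],
    "bob": ["http", "response", "json", "encode"],
    "carol": ["matrix", "gradient", "tensor", "matrix"],
}
vectors = {dev: vectorize(tokens) for dev, tokens in contributions.items()}
ranking = most_similar("alice", vectors)
print(ranking)
```

The same functions would apply to projects if each project is described by a bag of topic words instead of identifiers. No labels are needed at any point, which is what makes the approach unsupervised.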

9 of 11

Workshop tasks

Function name suggestion

How to predict the name of a function, based on its body text?

We are going to use a simple Machine Translation baseline* model

* See github.com/src-d/awesome-machine-learning-on-source-code/ for state-of-the-art (SOTA) models
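
Framing name suggestion as translation means turning each function into a (source sequence, target sequence) pair: tokens from the body on one side, subtokens of the name on the other. The sketch below shows one possible way to build such pairs for Python code using the standard `ast` module. The helper names and the choice of identifiers as body tokens are assumptions for illustration, not the workshop's actual preprocessing.

```python
import ast
import re


def split_identifier(name):
    """Split snake_case and camelCase identifiers into lowercase subtokens."""
    parts = []
    for chunk in name.split("_"):
        parts.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", chunk))
    return [part.lower() for part in parts]


def body_tokens(func):
    """Collect identifier subtokens from a function body, in source order."""
    nodes = [
        node
        for stmt in func.body
        for node in ast.walk(stmt)
        if isinstance(node, (ast.Name, ast.Attribute))
    ]
    # Sorting by end position keeps `json` before `load` in `json.load`.
    nodes.sort(key=lambda node: (node.end_lineno, node.end_col_offset))
    tokens = []
    for node in nodes:
        ident = node.id if isinstance(node, ast.Name) else node.attr
        tokens.extend(split_identifier(ident))
    return tokens


def make_pairs(source):
    """Return (body tokens, name subtokens) for every function in `source`."""
    tree = ast.parse(source)
    return [
        (body_tokens(func), split_identifier(func.name))
        for func in ast.walk(tree)
        if isinstance(func, ast.FunctionDef)
    ]


# Hypothetical input function, used only to illustrate the pair format.
example = '''
def readJsonFile(path):
    with open(path) as handle:
        return json.load(handle)
'''
pairs = make_pairs(example)
for source_seq, target_seq in pairs:
    print(source_seq, "->", target_seq)
```

A sequence-to-sequence model would then be trained to map the first sequence to the second, exactly as it would map a sentence in one language to a sentence in another.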

10 of 11

Hands On: preparations

Before we begin: steps to prepare your local environment

  • After you have copied the data
  • Check README.md for instructions on how to proceed
  • Load the 3 required Docker images
  • Open the first notebook

11 of 11

Let’s get coding!