Open Source and Reproducible MIR Research
Brian McFee
Thor Kell
Slides licensed CC BY-SA 4.0. Full tutorial at bmcfee.github.io/ismir2018-oss-tutorial
Introduction
Welcome!
This tutorial will leave you better able to manage and reproduce your research, as well as write code that’s easier to maintain and work with.
It will (hopefully) also be fun!
Introduction
Brian teaches at NYU and likes dogs.
Introduction
Thor works at Spotify and likes cats.
Introduction
This tutorial is in 8 parts! Please follow along with us at http://bit.ly/ossmir
Each part has checkpoints. We’ll go through the checkpoints together.
When you’ve got a checkpoint working, please put one of your blue stickies on your laptop, where we can see it.
If something’s not working, that’s cool! Put a red sticky on your machine, and we’ll come help.
Part 1: Installing Tools
In the first part, we’ll get a bunch of local tools installed.
Part 1: Installing Tools
First is a Python distribution called Anaconda.
It uses a package and environment manager called conda.
Part 1: Installing Tools
We also need a version control system called git.
Part 2: Setting Up Services
On to part 2! We’ll use a bunch of web services as well, so let’s get them set up.
Part 2: Setting Up Services
GitHub is a site for hosting your code and collaborating with others.
It’s also where other services access your code, so they can run your tests or build your docs for you.
Part 2: Setting Up Services
Travis CI is a service that automatically runs your tests when you update your code on GitHub. We’ll get to it in Part 5, but it is super useful.
Part 3: Working With The Code
This is where we’ll start working with the code and see how it interacts with the tools we’ve installed!
Part 3: Getting The Code
We’re going to get you your own copy of the tutorial repository (called a fork), and then get that code on to your machine (a process called cloning).
Part 3: Installing Requirements
Here, we’ll use conda to create a new environment and install all the packages we need into it.
Part 3: Run the Tests
Let’s run some tests, and make sure that everything works!
Part 3: Save Our Work
Hooray, now everything’s working! Let’s commit our changes to our local repository, then push them to GitHub.
Part 3: GitHub workflow
Always make a new branch and pull request when you want to change things.
This makes it easy to test incrementally and collaborate with others.
[Figure: branching diagram with labels MASTER, BRANCH, COMMIT, COMMIT, MERGE]
Part 4: Python programming
How are Python projects organized?
What’s a module? A package?
When should I use scripts?
Part 4: Scripts or packages?
Scripts are great for repeating a process exactly, like running a specific experiment.
Packages are great for components that can be used by many projects.
Both are important for reproducibility!
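To make the split concrete, here is a minimal sketch of both styles in one file. The names `summarize` and `run_experiment`, the seed, and the sample count are made up for illustration and do not come from the tutorial repository. The reusable function is the kind of thing that belongs in a package; the guarded block at the bottom is the script, which pins every setting so the run repeats exactly.

```python
import random
import statistics


# --- Package-style code: reusable, no side effects, importable anywhere ---
def summarize(values):
    """Return (mean, population standard deviation) of a sequence of numbers."""
    return statistics.mean(values), statistics.pstdev(values)


# --- Script-style code: pins every setting of one specific experiment ---
def run_experiment(seed=20180923, n_samples=100):
    """Draw n_samples pseudo-random numbers from a fixed seed and summarize them.

    The seed and sample count are hypothetical values chosen for this sketch.
    """
    rng = random.Random(seed)  # fixed seed so the run repeats exactly
    samples = [rng.random() for _ in range(n_samples)]
    return summarize(samples)


if __name__ == "__main__":
    # Only runs when the file is executed as a script, not when it is imported
    mean, std = run_experiment()
    print(f"mean={mean:.4f} std={std:.4f}")
```

Importing this file gives other projects access to `summarize` without triggering the experiment, while running it directly reproduces the same numbers every time.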
Part 5: Tests
Let’s start with a question: why should we, as researchers, write tests for our software?
Part 5: Fix a Failing Test
We’ve seen all our tests fail before – now let’s fix a specific function that has specific failing tests.
Part 5: Tests Aren’t Everything
Testing is great, but it only reveals the bugs you could predict.
Use this to keep your software simple!
If you can’t design a test for a function, the function is probably too complex.
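As a rough illustration of a function that is simple enough to test, here is a sketch with three small tests written in the style that pytest discovers (functions named `test_*`). The function `frequency_to_midi` is a hypothetical stand-in written for this example, not a function from the tutorial repository or from any library.

```python
import math


def frequency_to_midi(frequency_hz):
    """Convert a frequency in Hz to a (fractional) MIDI note number."""
    if frequency_hz <= 0:
        raise ValueError("frequency_hz must be positive")
    return 69.0 + 12.0 * math.log2(frequency_hz / 440.0)


def test_reference_pitch():
    # A4 = 440 Hz corresponds to MIDI note 69
    assert math.isclose(frequency_to_midi(440.0), 69.0)


def test_octave():
    # Doubling the frequency raises the pitch by 12 semitones
    assert math.isclose(frequency_to_midi(880.0), 81.0)


def test_rejects_nonpositive_frequency():
    # Checked by hand so the sketch runs without pytest installed
    try:
        frequency_to_midi(0.0)
    except ValueError:
        return
    raise AssertionError("expected ValueError for a non-positive frequency")
```

Each test checks one predictable property: a known reference value, a known relationship, and a known failure mode. If a function needs far more setup than this before it can be checked, that is a hint it is doing too much.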
Part 5: Get Travis Working
We set up our Travis account in Part 2! Now we’re going to configure it to run our tests whenever we push a branch to GitHub.
Part 6: Documentation
Let’s talk about documentation! Docs are important for software in general, so that other people can use it. They are twice as important for scientific software, since you need to be able to reproduce your results and other people need to be able to use your software.
Part 6: Build Some Docs
We’re using a tool called Sphinx to make our documentation. It creates docs based on the docstrings in our code.
Part 6: Write Some Docs
Now that we’ve got things running, let’s add some documentation to a function that has none.
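For a sense of what that looks like, here is a sketch of a documented function. The function `frames_to_seconds` and its parameters are hypothetical, and the section layout follows the NumPy docstring convention; whether the tutorial repository uses the same convention is an assumption.

```python
def frames_to_seconds(frames, sample_rate=22050, hop_length=512):
    """Convert a frame index to a time in seconds.

    Parameters
    ----------
    frames : int
        Index of the analysis frame, counted from zero.
    sample_rate : int
        Number of audio samples per second.
    hop_length : int
        Number of samples between the starts of consecutive frames.

    Returns
    -------
    float
        Time of the start of the frame, in seconds.

    Examples
    --------
    >>> frames_to_seconds(10, sample_rate=1000, hop_length=100)
    1.0
    """
    return frames * hop_length / sample_rate
```

The docstring states what goes in, what comes out, and in which units, which is usually what a reader needs before trusting a result computed with the function.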
Part 6: Merge Some Docs
Now that your docs are updated, let’s send those changes back to the original repository, so we can see how to build our docs on the internet.
Part 6: Licensing
A quick aside about open source and licensing – your code on GitHub is not open source unless you add a license. We’re big fans of open source, and we’d highly recommend adding a license to your code.
http://choosealicense.com can help.
Part 7: Ecosystem
OK, we’re done with the details! This next section will walk through other parts of the music/science/Python ecosystem.
Part 7: Jupyter Notebooks
Jupyter Notebooks are a great way to explore and prototype your code! Let’s try them out.
Part 7: The Core
numpy, scipy, matplotlib
Part 7: MIR Tools
librosa, mir_eval, jams
Part 7: midi / symbolic / audio
pretty_midi, music21, pysoundfile
Part 8: Next Steps
We did it! Hooray! We’ll close by going over some other resources for both engineering and MIR research.
Thanks!
You’re great! Please don’t hesitate to ask us more questions over the rest of the conference, or on the internet: