1 of 25

Installing Python with Conda

Dr Andrew J. Stewart

E: drandrewjstewart@gmail.com

T: @ajstewart_lang

G: ajstewartlang

2 of 25

Goal of this session

The main goal of this session is to ensure you have a working Python installation using Conda that you can access via Jupyter Notebooks.

This installation might be on your own computer, or it might be on a University-managed device. Ideally, this will also be an Anaconda installation of Python - this will give you all the tools you need (plus the ability to manage Python libraries and dependencies) for the Python sessions in this unit.

If you don’t have access to a working Python installation by the end of this session, let me know!

3 of 25

Why learn Python?

  • Python is a general purpose programming language increasingly used for machine learning and scientific programming (as well as by companies such as Instagram, Netflix and Spotify).

  • Many data science/analysis roles require a mix of Python and R - it’s great if you know one of these languages, even better if you know both!

  • Python and R are both interpreted languages - and code can be used together (you can run a Python script from within an R session, and vice versa).

  • You may find some things are easier/quicker to code in one language, and others in the other. So why not use both?


7 of 25

Installing Conda

8 of 25

More detailed instructions can be found here: https://docs.anaconda.com/anaconda/install/

9 of 25

Activating Conda and Running Python

Once you have installed Conda, you can verify it’s all working as expected. We will be working with Conda and Python via the command line, which you can access via “Anaconda Prompt” on Windows, or via the Terminal on macOS and Linux.

Detailed instructions for verifying your installation can be found here: https://docs.anaconda.com/anaconda/install/verify-install/

Let’s now look at how we install Anaconda on Windows and Mac - and then check to make sure everything is working ok...
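Once Python starts, a quick sanity check from inside Python itself can be handy. The following is a minimal sketch, not part of the official verification steps: describe_installation is a hypothetical helper name of my own, and it only reports which Python is running and whether a conda executable can be found on your PATH.

```python
import shutil
import sys


def describe_installation():
    """Report the running Python and whether a conda executable is on PATH.

    Hypothetical helper for illustration only.
    """
    return {
        "python_version": "{}.{}.{}".format(*sys.version_info[:3]),
        "python_executable": sys.executable,
        # shutil.which returns None when no conda executable is found on PATH
        "conda_on_path": shutil.which("conda"),
    }


for key, value in describe_installation().items():
    print(f"{key}: {value}")
```

If conda_on_path comes back as None, you are probably not in the Anaconda Prompt (Windows) or your Terminal has not picked up the Conda installation yet.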

10 of 25

Jupyter Notebooks

Jupyter Notebooks run in your web browser and allow you to run chunks of Python code and see the output in the same window. You can save Jupyter Notebooks to share with others (so they can interact with your code too) and you can also export Jupyter Notebooks as .html files (amongst other formats) so people can see your code and output in one document (cf. R Markdown generated .html files).

11 of 25

Creating a new environment

Rather than just use the default environment, it’s better practice from a reproducibility point of view to create a separate environment for each type of project. That way, you can have different packages - and different versions of each package - in different environments. Updating a package in one environment won’t affect the package in other environments - all this is managed via conda.

Below I use the conda create command to create a new virtual environment that I’m calling data_science. If I want just a few specific packages, I add the package names after data_science (I can also pin package versions here if I want, e.g., numpy=1.19.2).

$ conda create --name data_science conda numpy pandas scipy matplotlib
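To make the name=version syntax concrete, here is a small sketch that assembles a conda create command from a dictionary of packages. The build_create_command function is a hypothetical helper for illustration (it is not part of conda), and it only builds the command text rather than running it.

```python
def build_create_command(env_name, packages):
    """Build a `conda create` command as a list of arguments.

    `packages` maps a package name to a version string, or to None when
    no particular version is required. Hypothetical helper, not a conda API.
    """
    specs = [
        name if version is None else f"{name}={version}"
        for name, version in packages.items()
    ]
    return ["conda", "create", "--name", env_name, *specs]


command = build_create_command(
    "data_science",
    {"numpy": "1.19.2", "pandas": None, "scipy": None, "matplotlib": None},
)
print(" ".join(command))
```

This prints a command of the same shape as the one above, with numpy pinned to version 1.19.2 and the other packages left unpinned.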

12 of 25

Creating a new environment

We can check what environments we now have on our machine by typing:

$ conda info -e

# conda environments:
#
base                  *  /home/andrew/anaconda3
data_science             /home/andrew/anaconda3/envs/data_science

The * indicates which environment we are currently in. Note, if we wanted to, we could delete the data_science environment using:

$ conda remove -n data_science --all
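If you ever want to work with the list of environments from inside Python, the text printed by conda info -e is easy to pick apart. The sketch below is one way to do it, assuming the output looks like the listing on this slide; parse_conda_envs is a hypothetical helper name, and the sample text is simply copied from the output shown above.

```python
SAMPLE_OUTPUT = """\
# conda environments:
#
base                  *  /home/andrew/anaconda3
data_science             /home/andrew/anaconda3/envs/data_science
"""


def parse_conda_envs(text):
    """Turn the text printed by `conda info -e` into a list of dicts.

    Assumes each non-comment line is `name [*] path`, as in the sample.
    """
    envs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the comment header
        parts = line.split()
        envs.append(
            {
                "name": parts[0],
                "path": parts[-1],
                # the * marker, when present, sits between name and path
                "active": "*" in parts[1:-1],
            }
        )
    return envs


for env in parse_conda_envs(SAMPLE_OUTPUT):
    marker = "(active)" if env["active"] else ""
    print(env["name"], env["path"], marker)
```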

13 of 25

Finding what packages (and version numbers) are in our environment

Typing conda list will list the packages and version numbers in the currently active environment. Note, here we’re now in the data_science environment.

$ conda activate data_science

$ conda list

# packages in environment at /home/andrew/anaconda3/envs/data_science:
#
# Name              Version   Build           Channel
_libgcc_mutex       0.1       main
alabaster           0.7.12    py_0
anaconda            2020.11   py38_0
anaconda-client     1.7.2     py38_0
anaconda-project    0.8.4     py_0
argh                0.26.2    py38_0
argon2-cffi         20.1.0    py38h7b6447c_1
asn1crypto          1.4.0     py_0
...
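You can also ask Python itself which version of a library it can see, which is a useful cross-check against conda list. This is a minimal sketch using importlib.metadata from the standard library (Python 3.8 and later); installed_version is a hypothetical helper name. Note that it only knows about Python packages visible to the interpreter that is running, not non-Python packages such as _libgcc_mutex.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a Python package, or None if absent.

    Hypothetical helper wrapping importlib.metadata.version.
    """
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


for name in ["numpy", "pandas", "scipy", "matplotlib"]:
    print(name, installed_version(name) or "not installed")
```

Run this in the data_science environment and the versions should match what conda list reports for the same packages.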

14 of 25

Exporting our environment

If we need to export all the information about our environment (perhaps to share with others or to build a Docker image), we can do that using the following to produce output in YAML format:

$ conda env export

name: data_science
channels:
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - alabaster=0.7.12=py_0
  - anaconda=2020.11=py38_0
  - anaconda-client=1.7.2=py38_0
  - anaconda-project=0.8.4=py_0
  - argh=0.26.2=py38_0
  - argon2-cffi=20.1.0=py38h7b6447c_1
  - asn1crypto=1.4.0=py_0
  ...
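Each dependency line in the export has the form name=version=build. As a small illustration of how to read those entries, the sketch below splits one into its three parts; split_spec is a hypothetical helper name, and the example strings are taken from the output above.

```python
def split_spec(spec):
    """Split a dependency entry such as 'alabaster=0.7.12=py_0'.

    Returns (name, version, build); version and build are None when the
    entry does not include them. Hypothetical helper for illustration.
    """
    parts = spec.split("=")
    name = parts[0]
    version = parts[1] if len(parts) > 1 else None
    build = parts[2] if len(parts) > 2 else None
    return name, version, build


for entry in ["alabaster=0.7.12=py_0", "argon2-cffi=20.1.0=py38h7b6447c_1"]:
    name, version, build = split_spec(entry)
    print(f"{name}: version {version}, build {build}")
```

The build string is what ties the package to a particular Python version and platform, which is why the full export is more specific than a list of names and versions alone.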

15 of 25

Creating a new Python shell and Jupyter kernel...

In the following, I activate my new environment, install the ipykernel package (if it’s not already present) and create the kernel for use in Jupyter.

(base)$ conda activate data_science

(data_science)$ conda install ipykernel

(data_science)$ python -m ipykernel install --user --name data_science --display-name "Python (data_science)"

(data_science)$ conda deactivate

We’re now ready to fire up a Jupyter Notebook. To do that, in the base environment type:

(base)$ jupyter notebook

16 of 25

Select a folder that you want to create your new Jupyter Notebook in - and click on it. I have a folder called my_jupyter_notebooks which is where I put all mine - you might want to create a new folder for this...

17 of 25

Now let’s create a new Jupyter Notebook file in our data_science environment - click on the ‘New’ tab in the top right...

18 of 25

Click here and come up with a filename for your notebook - I’m calling mine “hello_world”.

In this cell, type: print("hello world!") and then press CTRL-RETURN to run that cell - you should then see...

You’ll see we’re running in our data_science kernel.
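If you want to double-check this from inside the notebook, you can ask Python where it is running from. The following is a minimal sketch to paste into a cell; environment_name is a hypothetical helper, and it assumes the default Conda layout where environments live in a folder called envs (as in the paths shown earlier).

```python
import sys
from pathlib import Path


def environment_name(prefix):
    """Guess the conda environment name from an interpreter prefix.

    Assumes the default layout .../envs/<name>; returns None when the
    prefix is not inside an envs folder (e.g. the base installation).
    """
    path = Path(prefix)
    return path.name if path.parent.name == "envs" else None


print("Python executable:", sys.executable)
print("Environment name:", environment_name(sys.prefix))
```

With the data_science kernel selected, the executable path should sit inside the data_science environment folder.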

19 of 25

Cells in Jupyter Notebooks can be of various types - let’s click on Insert -> Insert Cell Below to add a new cell.

In addition to Python code, we can also have a cell that’s written in Markdown.

20 of 25

And now if you press CTRL-RETURN the cell will be rendered - by adding various other bits of code you can build up a document like this.

21 of 25

Keyboard shortcuts

There are lots of Jupyter Notebook keyboard shortcuts that you can use. To see the list of them all, click on the icon here:

22 of 25

Downloading/Sharing your notebook

If you wanted to submit your Jupyter Notebook as a document alongside a journal article (say), you can download it in various formats, including as an .html file.

To share with others (and allow them to interact with it), you could download it as a Notebook (.ipynb) file.
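It can help to know that a .ipynb file is just a JSON text file containing a list of cells. The sketch below builds a tiny notebook-like structure in memory and reads the cell types back out. The field names follow the version 4 notebook format as I understand it, and the cell contents and the cell_types helper are made up for illustration, so treat this as a rough picture rather than a complete notebook.

```python
import json

# A stripped-down, illustrative notebook with one Markdown and one code cell
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Hello\n", "Some *Markdown* text."],
        },
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ['print("hello world!")'],
        },
    ],
}

notebook_text = json.dumps(notebook, indent=1)  # what would be saved to disk


def cell_types(text):
    """List the type of each cell in notebook JSON text."""
    return [cell["cell_type"] for cell in json.loads(text)["cells"]]


print(cell_types(notebook_text))
```

Because the file is plain text, it also works reasonably well with version control, although saved outputs can make the differences between versions noisy.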

23 of 25

Downloading as a Python script

You can also download the notebook as a Python (.py) script so you can run it outside the Jupyter Notebook. I’ll download it and save it to a folder on my machine that I’ve called python_scripts.

24 of 25

Running the .py script from the Command Line

We can run Python scripts from the command line - just make sure you’re in the right conda environment. We can tell that we’re in the data_science environment because (data_science) appears at the start of the prompt. From the folder where you’ve saved the hello_world.py script, type python hello_world.py as shown below.

(data_science)$ python hello_world.py

hello world!

Obviously this is a trivial example, but it’s useful to remember that you can run Python scripts like this rather than having to manually run things in Jupyter Notebooks or another interactive environment.
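The downloaded file may look a little different depending on your Jupyter version (the export typically adds a few comment lines), so treat the following as a minimal sketch of what hello_world.py could contain rather than the exact file. The greeting function and the __main__ guard are my own additions; they are a common way to lay out a script so it can be both run directly and imported.

```python
def greeting():
    """Return the message the script prints."""
    return "hello world!"


# This block runs when the file is executed with `python hello_world.py`,
# but not when the file is imported from another script or notebook.
if __name__ == "__main__":
    print(greeting())
```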

25 of 25

Summary

Hopefully I’ve convinced you that Python is a language worth learning - it’s a general purpose programming language and is used widely in data science.

You should understand why it’s important to write your Python code in conda environments: library versions are managed there, which helps keep your Python scripts reproducible.

You should now have a working Python installation on a machine you can access and are ready to start writing some Python code in a Jupyter Notebook!