Installing Python with Conda
Dr Andrew J. Stewart
E: drandrewjstewart@gmail.com
T: @ajstewart_lang
G: ajstewartlang
Goal of this session
The main goal of this session is to ensure you have a working Python installation using Conda that you can access via Jupyter Notebooks.
This installation might be on your own computer, or it might be on a University-managed device. Ideally, this will be an Anaconda installation of Python - this will give you all the tools you need (plus the ability to manage Python libraries and dependencies) for the Python sessions in this unit.
If you don’t have access to a working Python installation by the end of this session, let me know!
Why learn Python?
Installing Conda
More detailed instructions can be found here: https://docs.anaconda.com/anaconda/install/
Activating Conda and Running Python
Once you have installed Conda, you can verify it's all working as expected. We will be working with Conda and Python via the Command Line, which you can access via the "Anaconda Prompt" on Windows, or via the Terminal on macOS and Linux.
Detailed instructions can be found here: https://docs.anaconda.com/anaconda/install/verify-install/
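As a quick sanity check that Conda is on your path, you can ask Conda and Python for their version numbers at the prompt (the exact numbers you see will depend on what you installed):
$ conda --version
$ python --version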
Let’s now look at how we install Anaconda on Windows and Mac - and then check to make sure everything is working ok...
Jupyter Notebooks
Jupyter Notebooks run in your web browser and allow you to run chunks of Python code and see the output in the same window. You can save Jupyter Notebooks to share with others (so they can interact with your code too), and you can also export them as .html files (amongst other formats) so people can see your code and output in a single document (cf. .html files generated from R Markdown).
Creating a new environment
Rather than just using the default environment, it's better practice from a reproducibility point of view to create a separate environment for each type of project. That way, you can have different packages - and different versions of each package - in different environments. Updating a package in one environment won't affect that package in other environments - all of this is managed via Conda.
Below I use the conda create command to create a new virtual environment that I'm calling data_science. The names of the packages I want installed go after the environment name (I can also pin package versions here if I want, e.g., numpy=1.19.2).
$ conda create --name data_science conda numpy pandas scipy matplotlib
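If you wanted to pin exact versions (as mentioned above), they go in the same command - for example, a hypothetical variant pinning the Python and NumPy versions might look like this:
$ conda create --name data_science python=3.8 numpy=1.19.2 pandas scipy matplotlib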
Creating a new environment
We can check what environments we now have on our machine by typing:
$ conda info -e
# conda environments:
#
base * /home/andrew/anaconda3
data_science /home/andrew/anaconda3/envs/data_science
The * indicates which environment we are currently in. Note that, if we wanted to, we could delete the data_science environment using:
$ conda remove -n data_science --all
Finding what packages (and version numbers) are in our environment
Typing conda list will list the packages and version numbers in the currently active environment. Note, here we’re now in the data_science environment.
$ conda activate data_science
$ conda list
# packages in environment at /home/andrew/anaconda3/envs/data_science:
#
# Name Version Build Channel
_libgcc_mutex 0.1 main
alabaster 0.7.12 py_0
anaconda 2020.11 py38_0
anaconda-client 1.7.2 py38_0
anaconda-project 0.8.4 py_0
argh 0.26.2 py38_0
argon2-cffi 20.1.0 py38h7b6447c_1
asn1crypto 1.4.0 py_0
...
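If you only want to check one package, conda list also accepts a filter, so you can look up a single version without scrolling through the whole listing:
$ conda list numpy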
Exporting our environment
If we need to export all the information about our environment (perhaps to share with others or to build a Docker image), we can do that using the following to produce output in YAML format:
$ conda env export
name: data_science
channels:
- defaults
dependencies:
- _libgcc_mutex=0.1=main
- alabaster=0.7.12=py_0
- anaconda=2020.11=py38_0
- anaconda-client=1.7.2=py38_0
- anaconda-project=0.8.4=py_0
- argh=0.26.2=py38_0
- argon2-cffi=20.1.0=py38h7b6447c_1
- asn1crypto=1.4.0=py_0
...
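To actually share the environment, you would typically redirect that output into a file (environment.yml is the conventional name), which someone else - or future you - can use to rebuild the same environment:
$ conda env export > environment.yml
$ conda env create -f environment.yml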
Creating a new Python shell and Jupyter kernel...
In the following, I activate my new environment, install the ipykernel package (if it’s not already present) and create the kernel for use in Jupyter.
(base)$ conda activate data_science
(data_science)$ conda install ipykernel
(data_science)$ python -m ipykernel install --user --name data_science --display-name "Python (data_science)"
(data_science)$ conda deactivate
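If you want to check the kernel was registered, you can list the kernels Jupyter knows about - you should see data_science alongside the default python3 kernel:
(base)$ jupyter kernelspec list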
We’re now ready to fire up a Jupyter Notebook. To do that, in the base environment type:
(base)$ jupyter notebook
Navigate to the folder where you want to create your new Jupyter Notebook and click on it. I have a folder called my_jupyter_notebooks which is where I put all mine - you might want to create a new folder for this...
Now let’s create a new Jupyter Notebook file in our data_science environment - click on the ‘New’ tab in the top right...
Click here and come up with a filename for your notebook - I'm calling mine "hello_world".
In this cell, type print("hello world!") and then press CTRL-RETURN to run that cell - you should then see...
You’ll see we’re running in our data_science kernel.
Cells in Jupyter Notebooks can be of various types - let’s click on Insert -> Insert Cell Below to add a new cell.
In addition to Python code, we can also have a cell that's written in Markdown.
And now if you press CTRL-RETURN the cell will be rendered - by adding various other bits of code you can build up a document like this.
Keyboard shortcuts
There are lots of Jupyter Notebook keyboard shortcuts that you can use. To see the list of them all, click on the icon here:
Downloading/Sharing your notebook
If you wanted to submit your Jupyter Notebook as a document alongside a journal article (say), you can download it in various formats - including as an .html file.
To share with others (and allow them to interact with it), you could download it as a Notebook (.ipynb) file.
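If you prefer the command line, the same export can be done with jupyter nbconvert (which ships with the Anaconda distribution) - for example, to produce an .html version of the notebook from this session:
(base)$ jupyter nbconvert --to html hello_world.ipynb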
Downloading as a Python script
You can download the script itself so you can run it outside the Jupyter Notebook. I'll download it and save it to a folder on my machine that I've called python_scripts.
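The menu option is equivalent to nbconvert's script exporter, so you could also do this from the command line (again assuming nbconvert is available):
(base)$ jupyter nbconvert --to script hello_world.ipynb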
Running the .py script from the Command Line
We can run Python scripts from the command line - just make sure you're in the right Conda environment. We can tell we're in the data_science environment because (data_science) appears at the start of the prompt. From the folder where you've saved the hello_world.py script, type python hello_world.py as shown below.
(data_science)$ python hello_world.py
hello world!
Obviously this is a trivial example, but it’s useful to remember that you can run Python scripts like this rather than having to manually run things in Jupyter Notebooks or another interactive environment.
Summary
Hopefully I've convinced you that Python is a language worth learning - it's a general-purpose programming language and is used widely in data science.
You should understand why it's important to write your Python code in Conda environments, where Python library versions are managed, so that your scripts are reproducible.
You should now have a working Python installation on a machine you can access and are ready to start writing some Python code in a Jupyter Notebook!