Some best practices for executable documents


When authoring an executable document – such as the data analysis report we will explore today – the goal is to make the code as readable as possible, highlighting what is computed, and abstracting away the technical details of how it is computed.

With Jupyter notebooks, this is typically achieved by implementing a collection of utilities in separate Python code files. If you want to explore the code in more depth, you can use introspection (by appending “?” to the function name, see examples below) to quickly look at the documentation and code of the utilities.

For this assignment, several utilities are provided in the Python module utilities (imported below). Some of them are incomplete: you will be prompted to implement them as you progress through this assignment. We do not expect you to be able to rewrite all the others on your own; however, you should definitely check them out and try to understand them.

Making data available for analysis may also require some care. In general, it’s a whole subject in itself; see for example the FAIR principles. At the scale of the analysis we will conduct in this course, the main concern is to make the data easily accessible from the Jupyter notebook without duplicating it everywhere – and in particular in your submissions on GitLab – to save space. For this purpose, the datasets come preinstalled with the software stack of the course.

In practice

The following cell configures IPython to automatically reload modules whenever they are changed, so you won’t need to restart the kernel each time you modify them:

%load_ext autoreload                  
%autoreload 2
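
To see what autoreload saves you from doing by hand, here is a minimal sketch of a manual reload using the standard importlib module. The module name demo_reload_module and its function answer are hypothetical: they are created in a temporary directory purely for illustration and are not part of the course utilities.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demo_manual_reload():
    """Import a throwaway module, edit its file, then reload it by hand."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical module, written to a temporary directory
        module_path = Path(tmp) / "demo_reload_module.py"
        module_path.write_text("def answer():\n    return 1\n")
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            module = importlib.import_module("demo_reload_module")
            before = module.answer()

            # Simulate editing the file; the new source has a different
            # length, so stale cached bytecode cannot be reused
            module_path.write_text("def answer():\n    return 1000\n")
            stale = module.answer()   # still the old code

            importlib.reload(module)  # what autoreload does automatically
            after = module.answer()   # now the new code
        finally:
            # Clean up so the sketch leaves no trace in the session
            sys.path.remove(tmp)
            sys.modules.pop("demo_reload_module", None)
    return before, stale, after


before, stale, after = demo_manual_reload()
print(before, stale, after)
```

Without the explicit reload, the running kernel keeps using the code it imported first, which is why editing a module normally seems to have no effect until the kernel is restarted.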

Let’s import all the provided utilities:

from utilities import *

We can now use introspection to look at the documentation of the utility load_images:


or even at its code:
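
In a notebook cell, appending “?” to a name shows its documentation, and IPython also accepts “??” to show the source code when it is available. Outside IPython, a rough equivalent can be sketched with the standard inspect module. The example below inspects glob.glob only as a stand-in, because it ships with Python; the same calls should work on any function written in Python, such as the course utilities.

```python
import glob
import inspect

# Stand-in target: glob.glob is used only because it ships with Python
target = glob.glob

signature = inspect.signature(target)  # parameters of the function
doc = inspect.getdoc(target)           # cleaned-up docstring
source = inspect.getsource(target)     # source code, if written in Python

print(f"{target.__name__}{signature}")
print(doc)
print(source.splitlines()[0])          # first line of the definition
```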


Datasets for this course are provided in the following directory:

from intro_science_donnees import data_dir

Here is the list of the currently available data sets:


Today, we will be interested in the dataset apples_and_bananas_simple:

import os.path
dataset_dir = os.path.join(data_dir, 'apples_and_bananas_simple')

It consists of a collection of images:


Your task now is to load all these images into a variable images and to display them.
Hint: look at the documentation of load_images and image_grid.

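import pandas as pd  # makes pd available for the check below even if utilities does not export it

# Load every PNG image of the dataset and display them with their file names as titles
images = load_images(dataset_dir, '*.png')
image_grid(images, titles=images.index)

# Sanity checks: a Series of 20 images, the first one being a01.png
assert isinstance(images, pd.Series)
assert len(images) == 20
assert images.index[0] == "a01.png"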

Recall from the documentation that images is a pandas Series indexed by the names of the images:


So you can recover an individual image either by its name or by its number:
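
As a minimal sketch of these two kinds of access, the example below builds a small pandas Series in which short strings stand in for the actual images. The values and the three file names are hypothetical. It uses the explicit accessors .loc (by label) and .iloc (by position), which stay unambiguous whatever the index contains.

```python
import pandas as pd

# Hypothetical stand-in: short strings replace the actual images
series = pd.Series(
    ["first apple", "second apple", "first banana"],
    index=["a01.png", "a02.png", "b01.png"],
)

by_name = series.loc["a02.png"]  # access by label (file name)
by_position = series.iloc[1]     # access by position (0-based number)

print(by_name)
print(by_position)
```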


Now load just the images whose name starts with b into the variable bimages, and display them using their file names as titles.
Hint: if needed, look again at the documentation!

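# Load only the images whose file name starts with "b" and display them
bimages = load_images(dataset_dir, 'b*.png')
image_grid(bimages, titles=bimages.index)

# Sanity checks: a Series of 10 images, the first one being b01.png
assert isinstance(bimages, pd.Series)
assert len(bimages) == 10
assert bimages.index[0] == "b01.png"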
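
The second argument in the calls above looks like a shell-style wildcard pattern. Assuming load_images interprets it that way (an assumption, not something confirmed here), the sketch below shows how such patterns select file names, using the standard fnmatch module. The list of file names is hypothetical, chosen only to mirror the counts checked above.

```python
from fnmatch import fnmatchcase

# Hypothetical file names: a01.png ... a10.png, then b01.png ... b10.png
names = [f"{prefix}{i:02d}.png" for prefix in "ab" for i in range(1, 11)]

# "*" matches any run of characters, so "*.png" keeps every PNG file
all_png = [name for name in names if fnmatchcase(name, "*.png")]

# "b*.png" additionally requires the name to start with "b"
b_only = [name for name in names if fnmatchcase(name, "b*.png")]

print(len(all_png), len(b_only), b_only[0])
```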

Collapsible sections

When manipulating a long document like a report, it soon becomes tedious to scroll back and forth to navigate the document. To help with this, the Jupyter extension «Collapsible Headings» allows for collapsing (folding) sections and subsections. This extension should be enabled by default in the course environment: look for little grey triangles to the left of section titles, and try clicking on them.

All the cells below (code or text) are contained in the collapsible section.



Now that you have seen some of the best practices that we will follow when authoring reports, you may return to exploring today’s data analysis.