Some best practices for executable documents
When authoring an executable document – such as the data analysis report we will explore today – the goal is to make the code as readable as possible, highlighting what is computed, and abstracting away the technical details of how it is computed.
With Jupyter notebooks, this is typically achieved by implementing a collection of utilities in separate Python code files. If you want to explore the code in more depth, you can use introspection (append “?” to the function name; see the examples below) to quickly look at the documentation and code of the utilities.
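Outside of IPython and Jupyter, where the “?” syntax is not available, similar information can be obtained with the standard `inspect` module. The following is a minimal sketch; `textwrap.shorten` is only a stand-in for a course utility (an assumption made so that the example runs on its own), and any Python function with available source could be used instead.

```python
import inspect
import textwrap

# Stand-in for a course utility: any function written in Python works here.
target = textwrap.shorten

signature = inspect.signature(target)  # the call signature
doc = inspect.getdoc(target)           # the docstring, similar to `target?`
source = inspect.getsource(target)     # the source code, similar to `target??`

print(f"{target.__name__}{signature}")
print(doc.splitlines()[0])      # first line of the documentation
print(source.splitlines()[0])   # first line of the source code
```

In a notebook, the “?” form remains the most convenient; the `inspect` calls are mainly useful in scripts and tests.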
For this assignment, several utilities are provided in the Python module utilities.py. Some of them are incomplete: you will be prompted to implement them as you progress through this assignment. We do not expect you to be able to rewrite all the others on your own; however, you should definitely check them out and try to understand them.
Making data available for analysis may also require some care. In general, it’s a whole subject in itself; see for example the FAIR principles. At the scale of the analysis we will conduct in this course, the main concern is to make the data easily accessible from the Jupyter notebook without duplicating it everywhere – and in particular in your submissions on GitLab – to save space. For this purpose, the datasets come preinstalled with the software stack of the course.
This configures Python to automatically reload modules whenever they are changed. That way, you won’t need to restart the kernel whenever you modify one of them.
%load_ext autoreload
%autoreload 2
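To see what this saves you from doing, here is a rough sketch of reloading a module by hand with `importlib.reload`. The module name `demo_utilities` and its `answer` function are hypothetical and are created in a temporary directory just for the demonstration; this illustrates the idea only, not how the autoreload extension is implemented.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Assumption for this demo: skip bytecode caches so that the edited
# source file is always recompiled when the module is reloaded.
sys.dont_write_bytecode = True

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical module, written to a temporary directory.
    module_path = Path(tmp) / "demo_utilities.py"
    module_path.write_text("def answer():\n    return 1\n")

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    import demo_utilities
    before = demo_utilities.answer()

    # Simulate editing the file, then reload it without restarting Python.
    module_path.write_text("def answer():\n    return 42\n")
    importlib.reload(demo_utilities)
    after = demo_utilities.answer()

    sys.path.remove(tmp)

print(before, after)
```

With autoreload enabled, the explicit `importlib.reload` call is not needed: edited modules are reloaded before each cell is executed.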
Let’s import all the provided utilities:
from utilities import *
We can now use introspection to look at the documentation of the utility
or even at its code:
Datasets for this course are provided in the following directory:
from intro_science_donnees import data_dir
data_dir
Here is the list of the currently available datasets:
Today, we will be interested in the following dataset:
import os.path
dataset_dir = os.path.join(data_dir, 'apples_and_bananas_simple')
dataset_dir
It consists of a collection of images:
Your task now is to load all these images into the variable images and to display them.
Hint: look at the documentation of
### BEGIN SOLUTION
images = load_images(dataset_dir, '*.png')
image_grid(images, titles=images.index)
### END SOLUTION
assert isinstance(images, pd.Series)
assert len(images) == 20
# The first image, in index order, should be a01.png
assert images.index[0] == "a01.png"
Recall from the documentation that images is a pandas Series indexed by the names of the images:
So you can recover an individual image either by its name or by its number:
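As a small self-contained illustration of both access modes, the sketch below uses short strings as stand-ins for the images and hypothetical file names, so that it runs without the course datasets. It uses `.iloc` for access by number, which stays unambiguous when the index consists of names.

```python
import pandas as pd

# Stand-in data: strings instead of images, with hypothetical file names.
images = pd.Series(
    ["first apple", "second apple", "first banana"],
    index=["a01.png", "a02.png", "b01.png"],
)

by_name = images["a01.png"]  # access by label (the file name)
by_number = images.iloc[0]   # access by position (the first element)

print(by_name == by_number)
```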
Now load just the images whose name starts with b into the variable bimages and display them using their filenames as titles.
Hint: if needed, look again at the documentation!
### BEGIN SOLUTION
bimages = load_images(dataset_dir, 'b*.png')
image_grid(bimages, titles=bimages.index)
### END SOLUTION
assert isinstance(bimages, pd.Series)
assert len(bimages) == 10
# The first selected image, in index order, should be b01.png
assert bimages.index[0] == "b01.png"
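The patterns '*.png' and 'b*.png' used above look like shell-style wildcards. Assuming that load_images interprets them that way (its implementation is not shown here), the selection can be tried on plain file names with the standard `fnmatch` module. The names below are hypothetical; they only mirror the a01.png and b01.png naming checked by the assertions.

```python
import fnmatch

# Hypothetical file names: a01.png ... a10.png and b01.png ... b10.png.
names = [f"{prefix}{i:02d}.png" for prefix in "ab" for i in range(1, 11)]
names.append("README.txt")  # a non-image file, to show that it is filtered out

all_pngs = fnmatch.filter(names, "*.png")  # every name ending in .png
b_pngs = fnmatch.filter(names, "b*.png")   # only the names starting with b

print(len(all_pngs), len(b_pngs), b_pngs[0])
```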
When manipulating a long document like a report, it soon becomes tedious to scroll back and forth to navigate the document. To help with this, the Jupyter extension «Collapsible Headings» allows for collapsing (folding) sections and subsections. This extension should be enabled by default in the course environment: look for little grey triangles to the left of section titles, and try clicking on them.
All the cells below (code or text) are contained in this collapsible section.