# Feature extraction: redness and elongation

In this worksheet, you will implement the extraction of the redness and elongation features of the images.

```python
from PIL import Image
import matplotlib.pyplot as plt
from matplotlib.figure import Figure
import numpy as np

from utilities import *
import os.path

from intro_science_donnees import data_dir
dataset_dir = os.path.join(data_dir, 'apples_and_bananas_simple')
```

## Exercise 1: Extraction of the image foreground

To compute features of our images, we first need to extract the foreground of each picture, that is, separate the object from its background. In most images of this simple dataset, the object lies on a light background. A simple strategy is therefore to choose a threshold `theta` and decide that any pixel whose red, green or blue value is below the threshold belongs to the foreground.

Let’s take an apple:

```python
img = images['a10.png']
img
```

For each pixel, we compute the minimum of its red, green and blue values:

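```python
M = np.array(img)
# Minimum of the red, green and blue values of each pixel
G = np.min(M[:,:,0:3], axis=2)
fig = Figure()
ax = fig.subplots()
imgg = ax.imshow(G, cmap='gray')
fig.colorbar(imgg, ax=ax)
fig
```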

and derive a boolean array `F` (or black and white image) where `F[i,j]` is `True` (white) whenever the pixel with coordinates `i`, `j` is in the foreground:

```python
theta = 150
F = G < theta
plt.imshow(F, cmap='gray');
```

Try again with other values for the threshold `theta`.

Using the above as inspiration, implement in `utilities.py` the function `foreground_filter(img, theta = 150)` that takes a numpy array or PIL image `img` as argument together with a threshold `theta`, and returns the boolean foreground array. Check it on our image:

```python
plt.imshow(foreground_filter(img, 150), cmap='gray');
```
```python
show_source(foreground_filter)
```
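For reference, here is one possible way to write such a filter. It is a minimal sketch, not necessarily the version in `utilities.py`: the name `foreground_filter_sketch` is hypothetical, and it assumes `img` is a PIL image or a numpy array of shape (height, width, channels) with at least three color channels. `np.asarray` accepts both kinds of input.

```python
import numpy as np

def foreground_filter_sketch(img, theta=150):
    """Return a boolean array that is True on foreground pixels.

    Hypothetical sketch: assumes `img` is a PIL image or a numpy array of
    shape (height, width, channels) with at least three color channels.
    """
    M = np.asarray(img)
    # Minimum over the red, green and blue channels (any alpha channel is ignored)
    G = np.min(M[:, :, 0:3], axis=2)
    # A pixel is in the foreground when at least one channel is below theta
    return G < theta
```

On a 2×2 white image with a single red pixel in the top left corner, only that pixel is kept with `theta=150`.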

Now, apply the filter with a threshold of 200 to all images in the dataset and display the result.
Hints:

• Use a comprehension `[f(x) for x in ...]` to apply the filter to all images

• Use `image_grid` to display the result

```python
### BEGIN SOLUTION
image_grid([foreground_filter(img, theta=200) for img in images],
           titles=images.index)
### END SOLUTION
```

`utilities.py` provides a filter `transparent_background_filter` that calls `foreground_filter` and makes all pixels in the background transparent. Apply it to all images in the dataset, and try different thresholds `theta`:

```python
### BEGIN SOLUTION
image_grid([transparent_background_filter(img, theta=130) for img in images],
           titles=images.index)
### END SOLUTION
```
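To make the idea concrete, here is a minimal sketch of how a background can be made transparent: add an alpha channel that is 255 (opaque) on the foreground and 0 (transparent) elsewhere. The function `transparent_background_sketch` is hypothetical and need not match `transparent_background_filter` in `utilities.py`; it assumes 8-bit RGB or RGBA input and uses the same min-of-RGB threshold rule as above.

```python
import numpy as np

def transparent_background_sketch(img, theta=150):
    """Return an RGBA array whose background pixels are fully transparent.

    Hypothetical helper: assumes an 8-bit image with at least three color
    channels; any existing alpha channel is replaced.
    """
    M = np.asarray(img)
    rgb = M[:, :, 0:3]
    # Same rule as the foreground filter: min of R, G, B below theta
    foreground = np.min(rgb, axis=2) < theta
    # Alpha channel: 255 (opaque) on the foreground, 0 (transparent) elsewhere
    alpha = np.where(foreground, 255, 0).astype(np.uint8)
    return np.dstack([rgb, alpha])
```

The result is an RGBA numpy array; for 8-bit input, `Image.fromarray(result)` turns it back into a PIL image.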

## Exercise 2: Extraction of the `redness` feature

We now want to define the `redness` as the mean of the `red` channel over the foreground pixels (those that are `True` in `F`) minus the mean of the `green` channel over the same pixels.
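Written as a formula, with $F$ standing for the set of foreground pixels (the positions where `F` is `True`), and $R_{ij}$, $G_{ij}$ for the red and green values of pixel $(i,j)$; since both means run over the same pixels, this is also the mean of the per-pixel difference:

$$
\mathrm{redness} = \frac{1}{|F|}\sum_{(i,j)\in F} R_{ij} \;-\; \frac{1}{|F|}\sum_{(i,j)\in F} G_{ij} = \frac{1}{|F|}\sum_{(i,j)\in F} \left(R_{ij} - G_{ij}\right)
$$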

Implement the function `redness(img)` in `utilities.py`.

Hints:

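• To compute the mean, it’s best to work with floating point numbers: pixel values are usually stored as 8-bit unsigned integers (`uint8`), and subtracting such values wraps around modulo 256 instead of giving negative numbers. For example, extract the green channel with `G = M[:, :, 1] * 1.0`.

• Recall that `np.mean(R)` computes the mean of all values of an array `R`.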

• Given an array `R` and a boolean array (such as `F`) of same dimensions, `R[F]` returns an array of all values `R[i,j]` such that `F[i,j]` is True. For example:

```python
R = np.array([[1,2], [3,4]])
R
```
```python
F = np.array([[True, False], [True, True]])
F
```
```python
R[F]
```
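The floating point hint matters because `np.array(img)` typically yields 8-bit unsigned integers (`uint8`) for PNG images. The small sketch below, on made-up values, shows that subtracting `uint8` arrays silently wraps around modulo 256, while converting to floats first gives the expected signed result:

```python
import numpy as np

# Two 8-bit channels with made-up values
R = np.array([10, 200], dtype=np.uint8)
G = np.array([20, 100], dtype=np.uint8)

# Integer subtraction wraps around modulo 256: 10 - 20 becomes 246, not -10
wrapped = R - G

# Multiplying by 1.0 converts to floating point, so differences can be negative
diff = R * 1.0 - G * 1.0
```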
```python
show_source(redness)
```
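If you want to compare with your own attempt, here is a minimal sketch of the computation. The name `redness_sketch` is hypothetical; it assumes an 8-bit image with at least three channels and recomputes the foreground with the same min-of-RGB rule and `theta = 150`, which may differ from what `utilities.py` does.

```python
import numpy as np

def redness_sketch(img, theta=150):
    """Mean red minus mean green over the foreground pixels (hypothetical sketch)."""
    M = np.asarray(img)
    # Foreground mask: min of R, G, B below theta
    F = np.min(M[:, :, 0:3], axis=2) < theta
    # Extract the channels as floats to avoid uint8 wrap-around
    R = M[:, :, 0] * 1.0
    G = M[:, :, 1] * 1.0
    # Boolean indexing keeps only the foreground values
    return np.mean(R[F]) - np.mean(G[F])
```

If no pixel is in the foreground, `R[F]` is empty and `np.mean` returns `nan` with a warning; a robust version would handle that case.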

Visually check your `redness` function on the images of the dataset:

```python
image_grid(images,
           titles=["{0:.2f}".format(redness(img)) for img in images])
```

Check your `redness` function with these automated tests:

```python
assert abs(redness(images['b01.png']) - 0) < 0.1
assert abs(redness(images['a01.png']) - 41.48) < 0.1
assert abs(redness(images['a09.png']) - -3.66) < 0.1
```

## Exercise 3: Extraction of the `elongation` feature

As a second feature to distinguish apples from bananas, we extract the elongation of the fruit: the ratio of its length to its width. But how can we measure these in the first place, when the fruits can have any orientation and the picture may contain noise?

We take this opportunity to show a nifty trick, implemented in the `elongation` function. Display the elongation of every fruit in the dataset as computed by this function, and visually check that the values are plausible. You may want to use a ruler!

```python
### BEGIN SOLUTION
image_grid(images,
           titles=["{0:.2f}".format(elongation(img)) for img in images])
### END SOLUTION
```

So, how does this work?

We convert the black and white image into a cloud of points: each point holds the coordinates of one foreground pixel (similar to a matrix in sparse format). Then we find the principal axes of the cloud of points using a well-known algorithm called singular value decomposition (SVD). The first principal axis is the direction of largest variance of the cloud of points; the second one is the direction orthogonal to the first. The elongation is then simply defined as the ratio of the standard deviations of the cloud along these two principal directions.
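Here is a minimal sketch of this computation in numpy, under stated assumptions: the input is a boolean mask such as `F`, the function name `elongation_sketch` is hypothetical, and no noise filtering is done (the `elongation` function in `utilities.py` may differ in such details). The singular values of the centered cloud are the standard deviations along the principal axes multiplied by $\sqrt{n}$, where $n$ is the number of points, so their ratio is directly the elongation.

```python
import numpy as np

def elongation_sketch(F):
    """Ratio of the standard deviations of a point cloud along its principal axes.

    Hypothetical sketch: `F` is a 2D boolean mask whose foreground pixels are
    not all on one line (otherwise the second singular value is 0).
    """
    # Cloud of points: one (row, column) pair per foreground pixel
    xy = np.argwhere(F).astype(float)
    # Center the cloud so that the principal axes pass through its mean
    xy -= xy.mean(axis=0)
    # Singular values, sorted in decreasing order; s_k = sqrt(n) * std_k
    s = np.linalg.svd(xy, compute_uv=False)
    return s[0] / s[1]
```

On an axis-aligned rectangle of 4 × 20 pixels, the principal axes are the rows and columns, so the result is the ratio of the standard deviations of the column and row indices, about 5.16. Transposing the mask (swapping rows and columns) leaves it unchanged, which illustrates that the measure does not depend on the orientation of the object.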

Let’s illustrate the process on the synthetic banana:

```python
img = images['b01.png']
```
```python
# Build the cloud of points defined by the foreground image pixels
F = foreground_filter(img)
xy = np.argwhere(F)
# Build the picture: three panels side by side
fig = Figure(figsize=(20, 5))
(subplot1, subplot2, subplot3) = fig.subplots(1, 3)
# Original image
subplot1.imshow(img)
subplot1.set_title("Original image", fontsize=18)
# The foreground as a black and white picture
subplot2.imshow(F, cmap='gray')
subplot2.set_title("Foreground", fontsize=18)
# The cloud of points, as a scatter plot (x = column, y = row), with the principal axes
subplot3.scatter(xy[:,1], xy[:,0])
elongation_plot(img, subplot3)
subplot3.set_xlim(0, 31)
# Invert the y axis so that row 0 is at the top, as in the image
subplot3.set_ylim(31, 0)
subplot3.set_title("Cloud of points and principal axes", fontsize=18)
fig
```
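The principal axes drawn on the scatter plot can be obtained from the same SVD. The sketch below is hypothetical (`principal_axes_sketch` and `axis_segments` are illustration names, and `elongation_plot` in `utilities.py` may draw the axes differently): it returns the center of the cloud, the unit direction of each principal axis and the standard deviation along it, then turns them into segments of one standard deviation on each side of the center, in the (x = column, y = row) convention of the scatter plot.

```python
import numpy as np

def principal_axes_sketch(F):
    """Center, unit principal directions and standard deviations of a point cloud.

    Hypothetical helper: `F` is a 2D boolean mask; points are (row, column)
    pairs as returned by np.argwhere.
    """
    xy = np.argwhere(F).astype(float)
    center = xy.mean(axis=0)
    # Rows of Vt are the unit principal directions, by decreasing variance
    _, s, Vt = np.linalg.svd(xy - center, full_matrices=False)
    # Standard deviation along each axis (dividing by n)
    stds = s / np.sqrt(len(xy))
    return center, Vt, stds

def axis_segments(center, Vt, stds):
    """Endpoints of each axis, one std on each side, as (xs, ys) = (columns, rows)."""
    segments = []
    for direction, std in zip(Vt, stds):
        start = center - std * direction
        end = center + std * direction
        # Swap (row, column) into (x, y) to match scatter(xy[:,1], xy[:,0])
        segments.append(((start[1], end[1]), (start[0], end[0])))
    return segments
```

Each segment `(xs, ys)` could then be drawn with `subplot3.plot(xs, ys)`.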

Try again with other pictures!

The trick we just used to extract features from a cloud of points is a mainstream method in machine learning called PCA (Principal Component Analysis). See the appendix for another example.

You will learn the mathematics behind PCA in later linear algebra courses. However, thanks to the existing libraries, you can apply it right now in just a few lines:

```python
show_source(elongation)
```
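PCA can equivalently be phrased in terms of the covariance matrix of the cloud: the variances along the principal axes are its eigenvalues. The sketch below, with the hypothetical name `elongation_from_covariance`, computes the elongation that way for comparison; it is not claimed to be how `elongation` is written. Since `np.cov` divides both variances by $n-1$, that normalization cancels in the ratio.

```python
import numpy as np

def elongation_from_covariance(F):
    """Elongation from the eigenvalues of the 2x2 covariance matrix of the cloud.

    Hypothetical sketch: `F` is a 2D boolean mask whose foreground pixels are
    not all on one line.
    """
    xy = np.argwhere(F).astype(float)
    # 2x2 covariance matrix of the (row, column) coordinates
    C = np.cov(xy, rowvar=False)
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order
    lam_small, lam_large = np.linalg.eigvalsh(C)
    # Ratio of standard deviations = square root of the ratio of variances
    return np.sqrt(lam_large / lam_small)
```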

## Conclusion

We now have utilities to compute two features of our images: redness and elongation. Let’s go back to the data analysis and see whether these features suffice to distinguish apples from bananas!