# Feature extraction: redness and elongation

In this worksheet, you will implement the extraction of the redness and elongation features of the images.

```
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.figure import Figure
%load_ext autoreload
%autoreload 2
from utilities import *
import os.path
from intro_science_donnees import data_dir
dataset_dir = os.path.join(data_dir, 'apples_and_bananas_simple')
images = load_images(dataset_dir, "*.png")
```

## Exercise 1: Extraction of the image foreground

To compute features of our images, we first need to extract the
foreground of the picture by separating the object from its
background. For most images in this simple dataset, the object lies on
a light background. So a simple strategy is to choose a threshold
`theta` (*seuil* in French) and decide that any pixel whose red, green
or blue value is below the threshold belongs to the foreground.

Let’s take an apple:

```
img = images['a10.png']
img
```

We compute, for each pixel, the minimum of the red, green and blue values:

```
M = np.array(img)
G = np.min(M[:,:,0:3], axis=2)
fig = Figure()
imgg = fig.add_subplot().imshow(G, cmap='Greys_r')
fig.colorbar(imgg)
fig
```

and derive a boolean array `F` (or black and white image) where
`F[i,j]` is `True` (white) whenever the pixel of coordinates `i`, `j`
is in the foreground:

```
theta = 150
F = G < theta
plt.imshow(F);
```

Try again with other values for the threshold `theta`.

Using the above as inspiration, implement in `utilities.py` the
function `foreground_filter(img, theta=150)` that takes a numpy
array or PIL image `img` as argument together with a threshold
`theta`, and returns a thresholded image. Check it on our image:

```
plt.imshow(foreground_filter(img, 150));
```

```
show_source(foreground_filter)
```
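For reference, one possible implementation could look like the following minimal sketch. The name `foreground_filter_sketch` is hypothetical and the actual code in `utilities.py` may differ. The sketch packages the min-and-threshold steps shown above and tries them on a tiny synthetic image with two thresholds.

```python
import numpy as np

def foreground_filter_sketch(img, theta=150):
    """Return a boolean array that is True on foreground pixels.

    Hypothetical sketch: a pixel counts as foreground when the minimum
    of its red, green and blue values is below the threshold `theta`.
    """
    M = np.asarray(img)                # accepts a PIL image or a numpy array
    G = np.min(M[:, :, 0:3], axis=2)   # per-pixel min over R, G, B (alpha ignored)
    return G < theta

# Toy 1x3 image: a white pixel, a red pixel and a light grey pixel
toy = np.array([[[255, 255, 255], [200, 30, 30], [180, 180, 180]]],
               dtype=np.uint8)
mask_150 = foreground_filter_sketch(toy, theta=150)  # only the red pixel passes
mask_200 = foreground_filter_sketch(toy, theta=200)  # the grey pixel passes too
```

Raising the threshold makes the filter more inclusive: light grey pixels that were treated as background at `theta=150` become foreground at `theta=200`.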

Now, apply the filter with a threshold of 200 to all images in the dataset and display the result.

**Hints:**

- Use a comprehension `[f(x) for x in ...]` to apply the filter to all images.
- Use `image_grid` to display the result.

```
### BEGIN SOLUTION
image_grid([foreground_filter(img, theta=200) for img in images],
           titles=images.index)
### END SOLUTION
```

`utilities.py` provides a filter `transparent_background_filter` that
calls `foreground_filter` and makes all pixels in the background
transparent. Apply it to all images in the dataset, and try different
thresholds `theta`:

```
### BEGIN SOLUTION
image_grid([transparent_background_filter(img, theta=130) for img in images],
           titles=images.index)
### END SOLUTION
```
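To give an idea of what such a filter might do internally, here is a minimal sketch that works on numpy arrays only. The name `transparent_background_sketch` is hypothetical, and the real `transparent_background_filter` in `utilities.py` may be written differently (for instance, it may return a PIL image). The sketch assumes that transparency is encoded in a fourth (alpha) channel, with 0 meaning fully transparent and 255 fully opaque.

```python
import numpy as np

def transparent_background_sketch(img, theta=150):
    """Return an RGBA array whose background pixels are fully transparent.

    Hypothetical sketch: assumes `img` is an RGB or RGBA image (PIL image
    or numpy array) and rebuilds the alpha channel from the foreground mask.
    """
    M = np.asarray(img)
    rgb = M[:, :, 0:3]
    foreground = np.min(rgb, axis=2) < theta             # same rule as the foreground filter
    alpha = np.where(foreground, 255, 0).astype(np.uint8)
    return np.dstack([rgb, alpha])                       # shape (height, width, 4)

# Toy 1x2 image: a white (background) pixel and a red (foreground) pixel
toy = np.array([[[255, 255, 255], [200, 30, 30]]], dtype=np.uint8)
rgba = transparent_background_sketch(toy)
alpha_channel = rgba[:, :, 3]
```

The resulting array can be passed directly to `plt.imshow`, which interprets a fourth channel as alpha.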

## Exercise 2: Extraction of the `redness` feature

We now want to extract the `redness` as the average (mean) of the
foreground pixels of the `red` channel (those that are `True` in `F`)
minus the average of the foreground pixels in the `green` channel.

Implement the function `redness(img)` in `utilities.py`.

**Hints:**

- To compute the mean, it’s best to work with floating point numbers. Make sure to extract, for example, the green channel with `G = M[:, :, 1] * 1.0`.
- Recall that `np.mean(R)` computes the mean of all values of an array `R`.
- Given an array `R` and a boolean array (such as `F`) of the same dimensions, `R[F]` returns an array of all values `R[i,j]` such that `F[i,j]` is `True`. For example:

```
R = np.array([[1,2], [3,4]])
R
```

```
F = np.array([[True, False], [True, True]])
F
```

```
R[F]
```
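Putting these hints together, the computation could be sketched as follows on a tiny synthetic image. The function name `redness_sketch` and its `theta` parameter are assumptions made for this illustration; the `redness` function you write in `utilities.py` may be organized differently, for example by calling `foreground_filter`.

```python
import numpy as np

def redness_sketch(img, theta=150):
    """Mean of the red channel minus mean of the green channel,
    both taken over the foreground pixels only (hypothetical sketch)."""
    M = np.asarray(img)
    F = np.min(M[:, :, 0:3], axis=2) < theta   # boolean foreground mask
    R = M[:, :, 0] * 1.0                       # red channel as floats
    G = M[:, :, 1] * 1.0                       # green channel as floats
    return np.mean(R[F]) - np.mean(G[F])

# Toy 2x2 image: two light background pixels and two darker, reddish pixels
toy = np.array([[[255, 255, 255], [200, 30, 30]],
                [[100, 60, 20], [250, 250, 250]]], dtype=np.uint8)
toy_redness = redness_sketch(toy)
# Foreground red values are 200 and 100 (mean 150);
# foreground green values are 30 and 60 (mean 45).
```

Note that if no pixel falls below the threshold, `R[F]` is empty and `np.mean` returns `nan` with a warning, so a real implementation may want to handle that case.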

```
show_source(redness)
```

Visually check your `redness` function on the images of the dataset:

```
image_grid(images,
           titles=["{0:.2f}".format(redness(img)) for img in images])
```

Check your `redness` function with these automated tests:

```
assert abs(redness(images['b01.png']) - 0 ) < 0.1
assert abs(redness(images['a01.png']) - 41.48) < 0.1
assert abs(redness(images['a09.png']) - (-3.66)) < 0.1
```

## Exercise 3: Extraction of the `elongation` feature

As a second feature to distinguish apples from bananas, we extract
the **elongation** of the fruit: the ratio of the length to the
width of the object. But how can we measure these in the first
place, when the fruits can have any orientation and there can be
noise in the picture?

We will take this opportunity to show a nifty trick, implemented
in the `elongation` function. Display the elongation
for all the fruits in the dataset as computed by this function,
and check visually that it’s plausible. You may want to use a ruler!

```
### BEGIN SOLUTION
image_grid(images,
           titles=["{0:.2f}".format(elongation(img)) for img in images])
### END SOLUTION
```

So, how does this work?

We convert the black and white image into a cloud of points:
*each point represents the coordinates of one of
the foreground pixels (similar to a matrix in sparse format)*. Then we
find the **principal axes** of the cloud of points, using a well-known
algorithm called **singular value decomposition**. The first principal
axis is the direction of **largest variance** of the cloud of
points. The second one is the direction orthogonal to the first
one. The elongation is then simply defined as the ratio of the
standard deviations in the two principal directions.
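The steps above can be sketched in a few lines of numpy. This is an illustrative sketch under stated assumptions, not necessarily how `elongation` is written in `utilities.py`: the name `elongation_sketch` is hypothetical, and it takes a boolean foreground array directly rather than an image. It relies on the fact that the singular values of the centered cloud of points are proportional to the standard deviations along the principal axes, so their ratio equals the ratio of the standard deviations.

```python
import numpy as np

def elongation_sketch(F):
    """Ratio of the spreads of the foreground pixels along the two
    principal axes (hypothetical sketch; F is a boolean array)."""
    xy = np.argwhere(F)                    # (n, 2) array of foreground coordinates
    centered = xy - np.mean(xy, axis=0)    # move the cloud's center to the origin
    # Singular values come sorted in decreasing order; each one is
    # proportional to the standard deviation along a principal axis.
    s = np.linalg.svd(centered, compute_uv=False)
    return s[0] / s[1]

# A 4x30 rectangle (elongated) and a 10x10 square (not elongated)
rectangle = np.zeros((20, 40), dtype=bool)
rectangle[8:12, 5:35] = True
square = np.zeros((20, 40), dtype=bool)
square[5:15, 5:15] = True

rectangle_elongation = elongation_sketch(rectangle)   # well above 1
square_elongation = elongation_sketch(square)         # close to 1
```

Because the principal axes follow the cloud of points, rotating the shape should leave the result essentially unchanged, which is what makes the trick robust to the orientation of the fruit. If all foreground pixels lie on a single line, the second singular value is zero and the ratio is undefined.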

Let’s illustrate the process on the synthetic banana:

```
img = images['b01.png']
```

```
# Build the cloud of points defined by the foreground image pixels
F = foreground_filter(img)
xy = np.argwhere(F)
# Build the picture
fig = Figure(figsize=(20, 5))
# Original image
subplot = fig.add_subplot(1, 3, 1)
subplot.imshow(img)
subplot.set_title("Original image", fontsize=18)
# The foreground as a black and white picture
subplot = fig.add_subplot(1, 3, 2)
subplot.imshow(foreground_filter(img))
subplot.set_title("Foreground", fontsize=18)
# The cloud of points, as a scatter plot, together with the principal axes
subplot = fig.add_subplot(1, 3, 3)
subplot.scatter(xy[:,1], xy[:,0])
elongation_plot(img, subplot)
subplot.set_xlim(0, 31)
subplot.set_ylim(31, 0)
subplot.set_aspect('equal', adjustable='box')
subplot.set_title("Cloud of points and principal axes", fontsize=18)
fig
```

Try again with other pictures!

The trick we just used to extract features from a cloud of points is a mainstream method in machine learning called PCA (Principal Component Analysis). See the appendix for another example.

You will learn the mathematics behind PCA in later linear algebra courses. However, thanks to the existing libraries, you can apply it right now in just a few lines:

```
show_source(elongation)
```

## Conclusion

We now have utilities to compute two features for our images: redness and elongation. Let’s come back to the data analysis and see if these features are sufficient to distinguish between apples and bananas!