Python Tutorial: Image Processing with NumPy and matplotlib¶

by Pascal Attwenger
SIP 2019S, University of Vienna

For this tutorial we'll be using Python 3.x with the packages NumPy and matplotlib.
If you don't already have them installed you can get them with pip install numpy, matplotlib.

After installing we have to import them. We could just use import numpy, but then we'd have to type the full name every single time we call a function. Instead we can give them a shorter name with as:

import numpy as np
import matplotlib.pyplot as plt

Alternatively we can directly import functions into our namespace. Don't use from somemodule import * as that will pollute your namespace with stuff you might not be aware of. List only your required functions instead:

from matplotlib.image import imread

Now we can load our first image. Get the images from the course website and unzip them into your working directory, then call:

im = imread('lena.png')

This will load the image as a NumPy array; you can check that with:

type(im)

numpy.ndarray

If we print out the image we see that it is now represented as 32-bit floats, with 0 being black and 1 being white:

im

array([[0.63529414, 0.63529414, 0.63529414, ..., 0.6666667 , 0.60784316,
        0.5019608 ],
       [0.63529414, 0.63529414, 0.63529414, ..., 0.6666667 , 0.60784316,
        0.5019608 ],
       [0.63529414, 0.63529414, 0.63529414, ..., 0.6666667 , 0.60784316,
        0.5019608 ],
       ...,
       [0.16862746, 0.16862746, 0.19607843, ..., 0.40784314, 0.39215687,
        0.38431373],
       [0.17254902, 0.17254902, 0.21568628, ..., 0.40784314, 0.4117647 ,
        0.42352942],
       [0.17254902, 0.17254902, 0.21568628, ..., 0.40784314, 0.4117647 ,
        0.42352942]], dtype=float32)

To display the image for human eyes we can plot it with matplotlib:

plt.imshow(im)

<matplotlib.image.AxesImage at 0x119542630>

The default color map might not be what you're expecting. You can set it to grayscale with one of the following calls:

plt.imshow(im, cmap='gray')
# or
plt.imshow(im, cmap=plt.cm.gray)

<matplotlib.image.AxesImage at 0x11c350eb8>

If you don't want to repeat that for every single plot you can set it once like this:

plt.set_cmap('gray')
# or
plt.set_cmap(plt.cm.gray)
# or even
plt.gray()

plt.imshow(im)

<matplotlib.image.AxesImage at 0x11ea696d8>

Normally, when dealing with plain Python objects, we'd use len() to get the size of a list. For NumPy arrays that gives us only the size of the first dimension, so it's not very useful. Instead, use shape to get the dimensions of the array as a tuple:

len(im)

512

im.shape

(512, 512)

There's also a property called size that contains the number of elements:

im.size

262144

Now we're going to load a second image. If you run the code below, you might get an error saying that it cannot be loaded. That's because matplotlib on its own only can deal with PNG files, not with TIFFs.
For it to work you'll first need to install a more powerful imaging library called Pillow with pip install Pillow.

im_tiff = imread('cameraman.tif')

If we again look at the data, we see that this time it's not represented as floating point numbers but as 8-bit integers between 0 and 255:

im_tiff

array([[156, 159, 158, ..., 151, 152, 152],
       [160, 154, 157, ..., 154, 155, 153],
       [156, 159, 158, ..., 151, 152, 152],
       ...,
       [114, 132, 123, ..., 135, 137, 114],
       [121, 126, 130, ..., 133, 130, 113],
       [121, 126, 130, ..., 133, 130, 113]], dtype=uint8)

For many use cases that's perfectly fine, but often we have to be aware of the difference and be able to convert between the formats. One way to do that would be the following function:

def im2float(im):
    return np.array(im / 255, dtype='float32')

im2float(im_tiff)

array([[0.6117647 , 0.62352943, 0.61960787, ..., 0.5921569 , 0.59607846,
        0.59607846],
       [0.627451  , 0.6039216 , 0.6156863 , ..., 0.6039216 , 0.60784316,
        0.6       ],
       [0.6117647 , 0.62352943, 0.61960787, ..., 0.5921569 , 0.59607846,
        0.59607846],
       ...,
       [0.44705883, 0.5176471 , 0.48235294, ..., 0.5294118 , 0.5372549 ,
        0.44705883],
       [0.4745098 , 0.49411765, 0.50980395, ..., 0.52156866, 0.50980395,
        0.44313726],
       [0.4745098 , 0.49411765, 0.50980395, ..., 0.52156866, 0.50980395,
        0.44313726]], dtype=float32)

This just divides every value by 255 and casts the resulting doubles down to float32. Be aware that this assumes Python 3.x; Python 2.x had a different way of doing integer division (cp. PEP 238)

For optimal rescaling results I'd recommend to use a properly tested package like scikit-image for these kinds of operations instead:

import skimage
skimage.img_as_float32(im_tiff)

array([[0.6117647 , 0.62352943, 0.61960787, ..., 0.5921569 , 0.59607846,
        0.59607846],
       [0.627451  , 0.6039216 , 0.6156863 , ..., 0.6039216 , 0.60784316,
        0.6       ],
       [0.6117647 , 0.62352943, 0.61960787, ..., 0.5921569 , 0.59607846,
        0.59607846],
       ...,
       [0.44705886, 0.5176471 , 0.48235297, ..., 0.5294118 , 0.5372549 ,
        0.44705886],
       [0.47450984, 0.49411768, 0.50980395, ..., 0.52156866, 0.50980395,
        0.4431373 ],
       [0.47450984, 0.49411768, 0.50980395, ..., 0.52156866, 0.50980395,
        0.4431373 ]], dtype=float32)

Use Case: Find the Difference¶

We'll now load two more images that look very similar, and we'll try to visualize their differences, thereby learning more about NumPy and matplotlib.

im1 = imread('sushi1.png')
im2 = imread('sushi2.png')

It would be helpful to display those two images next to each other; we can do that with subplots:

fig, ax = plt.subplots(1, 2)

Here we created 2 subplots (1 row, 2 columns). We got two return values from the function call: the Figure object, and an array of Axes objects that we can use to plot into:

fig, ax = plt.subplots(1, 2)
ax[0].imshow(im1)
ax[1].imshow(im2)

<matplotlib.image.AxesImage at 0x12299ba20>

We can add titles to the subplots but also to the figure itself:

fig, ax = plt.subplots(1, 2)
ax[0].imshow(im1)
ax[1].imshow(im2)

ax[0].set_title("original")
ax[1].set_title("modified")
fig.suptitle("Find the difference")

Text(0.5, 0.98, 'Find the difference')

If we take a look at shape of one of the new new images, we see that it has one more dimension compared to the grayscale images from before. That's because of its different color channels. Here we used an RGBA image, so it has four channels: red, green, blue, and alpha (transparency):

im1.shape

(628, 500, 4)

To first get rid of the alpha channel, we can use array slicing: Take all the rows and all the columns, but only the first three color channels:

im1_no_alpha = im1[0:628, 0:500, 0:3]
im1_no_alpha.shape

(628, 500, 3)

If we want the entire range of values over an axis we can omit the indices and only put the "empty" colon operator there, so a better way to write it would be:

im1_no_alpha = im1[:, :, :3]

Alternatively, instead of explicitly selecting the first three channels, we could also use a negative index to select "all but the last" channel, which we can use on the other image to get the same results:

im2_no_alpha = im2[:, :, :-1]

Please be aware that array slices don't copy the data but provide a view onto the original array. So if we were to change any values in im1_no_alpha, those changes would be reflected in the original im1.

Now that we've reduced the images to RGB with 3 channels, we can simply take the mean over those channels to get a grayscale image.
NumPy has a function mean, but by default it will return the single mean value of the array:

np.mean(im1_no_alpha)

0.4479883

If we want the function to be applied over a specific axis, in our case the third one, we have to specify that:

im1_gray = np.mean(im1_no_alpha, axis=2)
im1_gray.shape

(628, 500)

For the other image we do the same. The plot shows that we have indeed achieved a grayscale image.
(In this tutorial we're doing it this way to show off some of NumPy's capabilities. In a real world scenario you should probably use something like skimage.color.rgb2gray instead, which will also do a better job of weighting the three channels.)

im2_gray = np.mean(im2_no_alpha, axis=2)
plt.imshow(im2_gray)

<matplotlib.image.AxesImage at 0x12400aac8>

Now we can very easily calculate and plot the difference between the two images:

diff = im1_gray - im2_gray
plt.imshow(diff)

<matplotlib.image.AxesImage at 0x1244ae240>

If you're dealing with image diffs, you can always end up with negative values in your result:

(diff.min(), diff.max())

(-0.9189543, 0.82745105)

imshow automatically deals with this by scaling the lightness values appropriately, so the smallest (negative) values will be displayed black, the values close to zero will be gray, and the highest values will be white.
Often, this is exactly what you want. If you want to ignore the negative values instead, you can specify clipping limits: plt.imshow(im, clim=(0, 1)).

In our example we will deal with the negative values another way: By doing a power transform (i.e. squaring) on the images, we get rid of the small noise values close to zero and map all the values to the positive range at the same time:

diff_squared = diff**2
plt.imshow(diff_squared)

<matplotlib.image.AxesImage at 0x12476bb38>

The differences are not very visible here, so we'll also apply a threshold:

diff_thresholded = diff_squared > 0.05
plt.imshow(diff_thresholded)

<matplotlib.image.AxesImage at 0x124a306d8>

This actually returns an array with dtype('bool'), which will help us later.

diff_thresholded

array([[False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       ...,
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False]])

Our next goal is to mark the differences we found as a red overlay on the grayscale image.
For that, we'll have to bring it back to three color channels. One way to to that is with np.tile(A, reps) which will "Construct an array by repeating A the number of times given by reps."

For that to work, we'll first have to convert the image from a 2D ($x \times y$) to a 3D ($x \times y \times 1$) array. We can use np.reshape:

xdim, ydim = im1_gray.shape
im1_gray_3d = np.reshape(im1_gray, (xdim, ydim, 1))
im1_gray_3d.shape

(628, 500, 1)

Alternatively we can use another form of array slicing which introduces a new axis. If you want to be more explicit you could use np.newaxis instead of None:

im1_gray_3d = im1_gray[:, :, None]
im1_gray_3d.shape

(628, 500, 1)

Now we can finally use np.tile to replicate the image on three channels:

diff_rgb = np.tile(im1_gray_3d, (1, 1, 3))
diff_rgb.shape

(628, 500, 3)

The final step is to color all the pixels where there's a difference red.
We can use the boolean diff_thresholded array from before to address all the pixels where it is True ("logical indexing"). Since we don't only want to change one value but all three color values, we again use slicing over the third axis, this time to set the values:

red = [1, 0, 0]
diff_rgb[diff_thresholded, :] = red
plt.imshow(diff_rgb)

<matplotlib.image.AxesImage at 0x124b58400>

On to the final plot!
Instead of just accepting an array of Axes we can also use "unwrapping" to assign all three Axes to their own variable like this:

fig, (ax1, ax2, ax3) = plt.subplots(1, 3)
ax1.imshow(im1)
ax2.imshow(im2)
ax3.imshow(diff_rgb)

<matplotlib.image.AxesImage at 0x12589d7b8>

The layout of this figure does not really fit. You could use a constrained layout or a tight layout to remedy this.
But why not just make the figure bigger? A figsize of 20 by 7 inches works well here:

fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(20, 7))
ax1.imshow(im1)
ax2.imshow(im2)
ax3.imshow(diff_rgb)

<matplotlib.image.AxesImage at 0x1266a7940>

Now all that's missing is some titles. We could set them line by line as we did before, but once your plots start getting more complex it's a good idea to solve it in a loop. We can put our images and titles in a list:

images = [im1, im2, diff_rgb]
titles = ["original", "modified", "diff"]

Now we can use the in-built Python method zip() to iterate over our two lists and the Axes array together as if they were a list of tuples:

fig, axes = plt.subplots(1, 3, figsize=(20, 7))
for ax, im, title in zip(axes, images, titles):
    ax.imshow(im)
    ax.set_title(title)

fig.suptitle("Found the difference!", fontsize=16)
fig.savefig('sushi_diff.png', bbox_inches='tight')

At the end we set the supertitle to a readable font size and save the figure as a PNG. With bbox_inches='tight' we suppress the big white borders of the PNG's bounding box.