by Pascal Attwenger
SIP 2019S, University of Vienna
For this tutorial we'll be using Python 3.x with the packages NumPy and matplotlib.
If you don't already have them installed you can get them with pip install numpy, matplotlib
.
After installing we have to import them. We could just use import numpy
, but then we'd have to type the full name every single time we call a function. Instead we can give them a shorter name with as
:
import numpy as np
import matplotlib.pyplot as plt
Alternatively we can directly import functions into our namespace. Don't use from somemodule import *
as that will pollute your namespace with stuff you might not be aware of. List only your required functions instead:
from matplotlib.image import imread
Now we can load our first image. Get the images from the course website and unzip them into your working directory, then call:
im = imread('lena.png')
This will load the image as a NumPy array; you can check that with:
type(im)
If we print out the image we see that it is now represented as 32-bit float
s, with 0 being black and 1 being white:
im
To display the image for human eyes we can plot it with matplotlib:
plt.imshow(im)
The default color map might not be what you're expecting. You can set it to grayscale with one of the following calls:
plt.imshow(im, cmap='gray')
# or
plt.imshow(im, cmap=plt.cm.gray)
If you don't want to repeat that for every single plot you can set it once like this:
plt.set_cmap('gray')
# or
plt.set_cmap(plt.cm.gray)
# or even
plt.gray()
plt.imshow(im)
Normally, when dealing with plain Python objects, we'd use len()
to get the size of a list. For NumPy arrays that gives us only the size of the first dimension, so it's not very useful. Instead, use shape
to get the dimensions of the array as a tuple:
len(im)
im.shape
There's also a property called size
that contains the number of elements:
im.size
Now we're going to load a second image. If you run the code below, you might get an error saying that it cannot be loaded. That's because matplotlib on its own only can deal with PNG files, not with TIFFs.
For it to work you'll first need to install a more powerful imaging library called Pillow with pip install Pillow
.
im_tiff = imread('cameraman.tif')
If we again look at the data, we see that this time it's not represented as floating point numbers but as 8-bit integers between 0 and 255:
im_tiff
For many use cases that's perfectly fine, but often we have to be aware of the difference and be able to convert between the formats. One way to do that would be the following function:
def im2float(im):
return np.array(im / 255, dtype='float32')
im2float(im_tiff)
This just divides every value by 255 and casts the resulting double
s down to float32
. Be aware that this assumes Python 3.x; Python 2.x had a different way of doing integer division (cp. PEP 238)
For optimal rescaling results I'd recommend to use a properly tested package like scikit-image for these kinds of operations instead:
import skimage
skimage.img_as_float32(im_tiff)
We'll now load two more images that look very similar, and we'll try to visualize their differences, thereby learning more about NumPy and matplotlib.
im1 = imread('sushi1.png')
im2 = imread('sushi2.png')
It would be helpful to display those two images next to each other; we can do that with subplots
:
fig, ax = plt.subplots(1, 2)
Here we created 2 subplots (1 row, 2 columns). We got two return values from the function call: the Figure
object, and an array of Axes
objects that we can use to plot into:
fig, ax = plt.subplots(1, 2)
ax[0].imshow(im1)
ax[1].imshow(im2)
We can add titles to the subplots but also to the figure itself:
fig, ax = plt.subplots(1, 2)
ax[0].imshow(im1)
ax[1].imshow(im2)
ax[0].set_title("original")
ax[1].set_title("modified")
fig.suptitle("Find the difference")
If we take a look at shape
of one of the new new images, we see that it has one more dimension compared to the grayscale images from before. That's because of its different color channels. Here we used an RGBA image, so it has four channels: red, green, blue, and alpha (transparency):
im1.shape
To first get rid of the alpha channel, we can use array slicing: Take all the rows and all the columns, but only the first three color channels:
im1_no_alpha = im1[0:628, 0:500, 0:3]
im1_no_alpha.shape
If we want the entire range of values over an axis we can omit the indices and only put the "empty" colon operator there, so a better way to write it would be:
im1_no_alpha = im1[:, :, :3]
Alternatively, instead of explicitly selecting the first three channels, we could also use a negative index to select "all but the last" channel, which we can use on the other image to get the same results:
im2_no_alpha = im2[:, :, :-1]
Please be aware that array slices don't copy the data but provide a view onto the original array. So if we were to change any values in im1_no_alpha
, those changes would be reflected in the original im1
.
Now that we've reduced the images to RGB with 3 channels, we can simply take the mean over those channels to get a grayscale image.
NumPy has a function mean
, but by default it will return the single mean value of the array:
np.mean(im1_no_alpha)
If we want the function to be applied over a specific axis, in our case the third one, we have to specify that:
im1_gray = np.mean(im1_no_alpha, axis=2)
im1_gray.shape
For the other image we do the same. The plot shows that we have indeed achieved a grayscale image.
(In this tutorial we're doing it this way to show off some of NumPy's capabilities. In a real world scenario you should probably use something like skimage.color.rgb2gray
instead, which will also do a better job of weighting the three channels.)
im2_gray = np.mean(im2_no_alpha, axis=2)
plt.imshow(im2_gray)
Now we can very easily calculate and plot the difference between the two images:
diff = im1_gray - im2_gray
plt.imshow(diff)
If you're dealing with image diffs, you can always end up with negative values in your result:
(diff.min(), diff.max())
imshow
automatically deals with this by scaling the lightness values appropriately, so the smallest (negative) values will be displayed black, the values close to zero will be gray, and the highest values will be white.
Often, this is exactly what you want. If you want to ignore the negative values instead, you can specify clipping limits: plt.imshow(im, clim=(0, 1))
.
In our example we will deal with the negative values another way: By doing a power transform (i.e. squaring) on the images, we get rid of the small noise values close to zero and map all the values to the positive range at the same time:
diff_squared = diff**2
plt.imshow(diff_squared)
The differences are not very visible here, so we'll also apply a threshold:
diff_thresholded = diff_squared > 0.05
plt.imshow(diff_thresholded)
This actually returns an array with dtype('bool')
, which will help us later.
diff_thresholded
Our next goal is to mark the differences we found as a red overlay on the grayscale image.
For that, we'll have to bring it back to three color channels. One way to to that is with np.tile(A, reps)
which will "Construct an array by repeating A the number of times given by reps."
For that to work, we'll first have to convert the image from a 2D ($x \times y$) to a 3D ($x \times y \times 1$) array. We can use np.reshape
:
xdim, ydim = im1_gray.shape
im1_gray_3d = np.reshape(im1_gray, (xdim, ydim, 1))
im1_gray_3d.shape
Alternatively we can use another form of array slicing which introduces a new axis. If you want to be more explicit you could use np.newaxis
instead of None
:
im1_gray_3d = im1_gray[:, :, None]
im1_gray_3d.shape
Now we can finally use np.tile
to replicate the image on three channels:
diff_rgb = np.tile(im1_gray_3d, (1, 1, 3))
diff_rgb.shape
The final step is to color all the pixels where there's a difference red.
We can use the boolean diff_thresholded
array from before to address all the pixels where it is True
("logical indexing"). Since we don't only want to change one value but all three color values, we again use slicing over the third axis, this time to set the values:
red = [1, 0, 0]
diff_rgb[diff_thresholded, :] = red
plt.imshow(diff_rgb)
On to the final plot!
Instead of just accepting an array of Axes
we can also use "unwrapping" to assign all three Axes
to their own variable like this:
fig, (ax1, ax2, ax3) = plt.subplots(1, 3)
ax1.imshow(im1)
ax2.imshow(im2)
ax3.imshow(diff_rgb)
The layout of this figure does not really fit. You could use a constrained layout or a tight layout to remedy this.
But why not just make the figure bigger? A figsize
of 20 by 7 inches works well here:
fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(20, 7))
ax1.imshow(im1)
ax2.imshow(im2)
ax3.imshow(diff_rgb)
Now all that's missing is some titles. We could set them line by line as we did before, but once your plots start getting more complex it's a good idea to solve it in a loop. We can put our images and titles in a list:
images = [im1, im2, diff_rgb]
titles = ["original", "modified", "diff"]
Now we can use the in-built Python method zip()
to iterate over our two lists and the Axes
array together as if they were a list of tuples:
fig, axes = plt.subplots(1, 3, figsize=(20, 7))
for ax, im, title in zip(axes, images, titles):
ax.imshow(im)
ax.set_title(title)
fig.suptitle("Found the difference!", fontsize=16)
fig.savefig('sushi_diff.png', bbox_inches='tight')
At the end we set the supertitle to a readable font size and save the figure as a PNG. With bbox_inches='tight'
we suppress the big white borders of the PNG's bounding box.