Autoencoder: Denoise image using UpSampling2D and Conv2DTranspose Layers (Part: 2)
For better understanding, this post is divided into three parts:
Part 1: GAN, Autoencoders: UpSampling2D and Conv2DTranspose
In this introductory part, I will cover the fundamental terms and procedures used in this tutorial, so that the following parts are easier to follow.
Part 2: Denoising image with Upsampling Layer
This part will demonstrate how we can use the upsampling method to denoise images. It will be implemented using the notMNIST dataset.
Part 3: Denoising image with Transposed Convolution Layer
This part is similar to the previous one, but I will use transposed convolution for denoising. It will be covered using the famous MNIST dataset.
Let’s start …
Part 2: Denoising image with Upsampling Layer
In this part, we will use the UpSampling2D layer to denoise sample images from the notMNIST dataset. First we will make noisy images by adding noise to the dataset, and then train our model on these images.
Dataset and related libraries
First, we need to import the necessary libraries for this project. We will use Keras with TensorFlow as a backend.
# importing libraries
import gzip

import numpy as np
import matplotlib.pyplot as plt

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Conv2DTranspose, MaxPooling2D, UpSampling2D
from tensorflow.keras.constraints import max_norm

%matplotlib inline
We can download the notMNIST dataset directly from the GitHub repository. We will use this repository for downloading notMNIST dataset.
# importing dataset from github
# link: https://github.com/davidflanagan/notMNIST-to-MNIST
!wget -O train-images.gz https://github.com/davidflanagan/notMNIST-to-MNIST/raw/master/train-images-idx3-ubyte.gz
!wget -O test-images.gz https://github.com/davidflanagan/notMNIST-to-MNIST/raw/master/t10k-images-idx3-ubyte.gz
!wget -O train-labels.gz https://github.com/davidflanagan/notMNIST-to-MNIST/raw/master/train-labels-idx1-ubyte.gz
!wget -O test-labels.gz https://github.com/davidflanagan/notMNIST-to-MNIST/raw/master/t10k-labels-idx1-ubyte.gz
As you can see, all of the downloaded files are compressed in .gz format. We could extract the images manually and use them in the project, but instead we will define two functions to extract and load data from the compressed files: the first extracts images and the second loads labels. We don't need labels to train the model, but we extract them anyway for visualization purposes.
# function for extracting images
def image_data(filename, num_images):
    with gzip.open(filename) as f:
        f.read(16)  # skip the 16-byte IDX image-file header
        buf = f.read(28 * 28 * num_images)
        data = np.frombuffer(buf, dtype=np.uint8).astype(np.float32)
        data = data.reshape(num_images, 28, 28)
    return data

# function for extracting labels
def image_labels(filename, num_images):
    with gzip.open(filename) as f:
        f.read(8)  # skip the 8-byte IDX label-file header
        buf = f.read(1 * num_images)
        labels = np.frombuffer(buf, dtype=np.uint8).astype(np.int64)
    return labels
Now let’s import all train and test images.
# import train and test data
train_data = image_data('train-images.gz', 60000)
test_data = image_data('test-images.gz', 10000)

# import train and test labels
train_labels = image_labels('train-labels.gz', 60000)
test_labels = image_labels('test-labels.gz', 10000)
After loading, we can check the shape of train and test images:
train_data.shape, test_data.shape
Output:
((60000, 28, 28), (10000, 28, 28))
So we have 60000 images in the train set and 10000 images in the test set. With our predefined functions, they are already converted into NumPy arrays.
img_width, img_height = 28, 28
input_shape = (img_width, img_height, 1)
batch_size = 120
no_epochs = 50
validation_splits = 0.2
max_norm_value = 2.0
noise_factor = 0.5
Here we defined some variables for model building. These variables give us flexibility and will make it easier to fine-tune our model and data later. Now we can proceed to the next step.
Exploring Dataset
The notMNIST dataset labels each image with a digit from 0 to 9, representing the letters A to J respectively. So we will define a dictionary that maps each label to its original letter.
label_dict = {i: a for i, a in zip(range(10), 'ABCDEFGHIJ')}
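As a quick, purely illustrative sanity check, we can print the resulting mapping:
print(label_dict)
# {0: 'A', 1: 'B', 2: 'C', 3: 'D', 4: 'E', 5: 'F', 6: 'G', 7: 'H', 8: 'I', 9: 'J'}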
Now we can visualize some random characters with their labels:
# visualize 10 random characters with their labels
digits = [[train_data[idx], train_labels[idx]] for idx in
          np.random.randint(len(train_data), size=10)]
plt.figure(figsize=(len(digits), 1))
for i, data in enumerate(digits):
    plt.subplot(1, len(digits), i + 1)
    plt.imshow(data[0])
    plt.title(label_dict[data[1]])
    plt.xticks([])
    plt.yticks([])
plt.show()
This dataset is a collection of single-channel greyscale images. It is better to scale each image into the [0, 1] range, because normalized values are easier to work with. This process is known as "normalization" (or "transformation") and is part of the feature engineering process.
# scaling train and test images
train_data = train_data.reshape(-1, 28, 28, 1)
test_data = test_data.reshape(-1, 28, 28, 1)

train_data = train_data / np.max(train_data)
test_data = test_data / np.max(test_data)
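As an optional check (not in the original post), we can verify that the scaled pixel values now lie in the expected range:

# confirm that pixel values lie in [0, 1] after scaling
print(train_data.min(), train_data.max())  # expected: 0.0 1.0
print(test_data.min(), test_data.max())    # expected: 0.0 1.0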
Generating Noisy Images
For training, we need noisy images. We draw random noise samples from a normal (Gaussian) distribution using the NumPy function np.random.normal. Here noise_factor determines how much noise we add to each image. As defined previously, we will use noise_factor = 0.5 in this example.
# create noisy images from dataset
noise_train = train_data + noise_factor * np.random.normal(0, 1, train_data.shape)
noise_test = test_data + noise_factor * np.random.normal(0, 1, test_data.shape)
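One caveat: adding Gaussian noise pushes some pixel values outside the [0, 1] range. The model can still train on such inputs, but if you prefer strictly valid pixel intensities, you can optionally clip the noisy images; a minimal sketch:

# optional: clip noisy images back into the valid [0, 1] pixel range
noise_train = np.clip(noise_train, 0.0, 1.0)
noise_test = np.clip(noise_test, 0.0, 1.0)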
Let’s check noisy images against their original images:
# some random original images for visualization
fig, ax = plt.subplots(1, 15)
fig.set_size_inches(20, 4)
for i in range(15):
    curr_img = np.reshape(train_data[i], (28, 28))
    curr_lbl = train_labels[i]
    ax[i].imshow(curr_img)
    # ax[i].set_title(f'Label: {label_dict[curr_lbl]}')
    ax[i].set_xticks([])
    ax[i].set_yticks([])
plt.show()

# the same images with noise added
fig, ax = plt.subplots(1, 15)
fig.set_size_inches(20, 4)
for i in range(15):
    curr_img = np.reshape(noise_train[i], (28, 28))
    curr_lbl = train_labels[i]
    ax[i].imshow(curr_img)
    # ax[i].set_title(f'Label: {label_dict[curr_lbl]}')
    ax[i].set_xticks([])
    ax[i].set_yticks([])
plt.show()
Output:
It seems they are noisy enough to train our model. Now we will define our model.
Defining Model
Our model consists of several Conv2D layers and two UpSampling2D layers. We use MaxPooling2D layers in the encoder part, and a final Conv2D layer produces the output. The layer parameters are self-explanatory.
# model layers for autoencoder
model = Sequential()

# encoder: convolutions with max pooling
model.add(Conv2D(32, (3, 3),
                 activation='relu',
                 kernel_initializer='he_uniform',
                 padding='same',
                 input_shape=input_shape))
model.add(MaxPooling2D((2, 2), padding='same'))
model.add(Conv2D(64, (3, 3),
                 activation='relu',
                 kernel_initializer='he_uniform',
                 padding='same'))
model.add(MaxPooling2D((2, 2), padding='same'))
model.add(Conv2D(128, (3, 3),
                 activation='relu',
                 kernel_initializer='he_uniform',
                 padding='same'))

# decoder: convolutions with bilinear upsampling
model.add(Conv2D(128, (3, 3),
                 activation='relu',
                 kernel_initializer='he_uniform',
                 padding='same'))
model.add(UpSampling2D((2, 2), interpolation='bilinear'))
model.add(Conv2D(64, (3, 3),
                 activation='relu',
                 kernel_initializer='he_uniform',
                 padding='same'))
model.add(UpSampling2D((2, 2), interpolation='bilinear'))
model.add(Conv2D(1, (3, 3),
                 activation='sigmoid',
                 padding='same'))

model.summary()
Output:
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_28 (Conv2D)           (None, 28, 28, 32)        320
_________________________________________________________________
max_pooling2d_6 (MaxPooling2 (None, 14, 14, 32)        0
_________________________________________________________________
conv2d_29 (Conv2D)           (None, 14, 14, 64)        18496
_________________________________________________________________
max_pooling2d_7 (MaxPooling2 (None, 7, 7, 64)          0
_________________________________________________________________
conv2d_30 (Conv2D)           (None, 7, 7, 128)         73856
_________________________________________________________________
conv2d_31 (Conv2D)           (None, 7, 7, 128)         147584
_________________________________________________________________
up_sampling2d_9 (UpSampling2 (None, 14, 14, 128)       0
_________________________________________________________________
conv2d_32 (Conv2D)           (None, 14, 14, 64)        73792
_________________________________________________________________
up_sampling2d_10 (UpSampling (None, 28, 28, 64)        0
_________________________________________________________________
conv2d_33 (Conv2D)           (None, 28, 28, 1)         577
=================================================================
Total params: 314,625
Trainable params: 314,625
Non-trainable params: 0
_________________________________________________________________
A dot plot of this model shows its structure.
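The original post shows the model graph as an image. Assuming pydot and Graphviz are installed, you can reproduce it with Keras' plot_model utility:

# generate a graph image of the model (requires pydot and Graphviz)
from tensorflow.keras.utils import plot_model
plot_model(model, to_file='model.png', show_shapes=True)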
We will use adam as the optimizer and binary_crossentropy as the loss function, and train for 50 epochs.
# compiling and fitting model
model.compile(optimizer='adam',
              loss='binary_crossentropy')

model.fit(noise_train,
          train_data,
          validation_split=validation_splits,
          epochs=no_epochs,
          batch_size=batch_size)
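model.fit returns a History object. If you capture it (e.g. history = model.fit(...)), you can plot the training and validation loss to see how the model converges; a minimal sketch:

# assumes you captured the fit result: history = model.fit(...)
plt.plot(history.history['loss'], label='train loss')
plt.plot(history.history['val_loss'], label='validation loss')
plt.xlabel('epoch')
plt.ylabel('binary cross-entropy loss')
plt.legend()
plt.show()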
Prediction and Visualization
Now we can predict some test samples and visualize them.
# model prediction
fig_samples = noise_test[:10]
fig_original = test_data[:10]
fig_denoise = model.predict(fig_samples)
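Beyond visual inspection, we can also get a quantitative measure of reconstruction quality by evaluating the average loss over the whole noisy test set; an optional check:

# average reconstruction loss on the full (noisy -> clean) test set
test_loss = model.evaluate(noise_test, test_data, batch_size=batch_size)
print('test loss:', test_loss)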
Let’s compare our predicted denoised images with original test images for better understanding.
# output visualization
for i in range(0, 5):
    noisy_img = noise_test[i]
    original_img = test_data[i]
    denoise_img = fig_denoise[i]

    fig, axes = plt.subplots(1, 3)
    fig.set_size_inches(6, 2.8)
    axes[0].imshow(noisy_img.reshape(28, 28))
    axes[0].set_xticks([])
    axes[0].set_yticks([])
    axes[0].set_title('Noisy')
    axes[1].imshow(original_img.reshape(28, 28))
    axes[1].set_xticks([])
    axes[1].set_yticks([])
    axes[1].set_title('Original')
    axes[2].imshow(denoise_img.reshape(28, 28))
    axes[2].set_xticks([])
    axes[2].set_yticks([])
    axes[2].set_title('Denoised')
    plt.show()
Output
The output is quite good in this case: the denoised images are very similar to the originals. We can fine-tune the model and try it on other datasets as well. This is the basic concept behind the model, and it is up to you how you want to experiment with it.
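If you want to reuse the trained denoiser later without retraining, you can optionally save and reload it with Keras' standard saving API; a minimal sketch (the filename here is just a placeholder):

# save the trained denoiser to disk and reload it later
model.save('denoising_autoencoder.h5')  # hypothetical filename

from tensorflow.keras.models import load_model
restored = load_model('denoising_autoencoder.h5')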
I hope you now get the idea of autoencoders and image denoising. In the next part of the tutorial, we will build another model using the Conv2DTranspose layer on a different dataset.
All code samples for this part can be found here: Colab Link
🅽🅴🆇🆃 ⫸ Part 3: Denoising image using Transposed Convolution Layer
Happy coding!