Autoencoder: Denoise image using UpSampling2D and Conv2DTranspose Layers (Part: 1)

Photo by Bekky Bekks on Unsplash

For better understanding, this post is divided into three parts:

Part 1: GAN, Autoencoders: UpSampling2D and Conv2DTranspose

In this introductory part, I will discuss some basic terms and processes used in this tutorial. This will help us grasp the concepts and better understand the other parts of the tutorial.

Part 2: Denoising image with Upsampling Layer

This part will demonstrate how we can use the upsampling method to remove noise from input images. It will be implemented using the notMNIST dataset.

Part 3: Denoising image with Transposed Convolution Layer

This part is similar to the previous one, but I will use transposed convolution for denoising. It will be covered using the famous MNIST dataset.

Let’s start …

GAN (Generative Adversarial Network)

GANs were designed by Ian Goodfellow and other researchers at the University of Montreal in 2014. A GAN is an unsupervised learning approach that involves two sub-models: the generator, which is trained to generate new examples, and the discriminator, which tries to classify examples as real or generated. The process operates in terms of data distributions. Typically, the generative network learns to map from a latent space to a data distribution of interest, while the discriminative network distinguishes candidates produced by the generator from the true data distribution. The two neural networks compete with each other in a game: the generative network’s training objective is to increase the error rate of the discriminative network by producing realistic candidates that the discriminator judges to be real rather than generated.
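To make the two competing objectives concrete, here is a minimal NumPy sketch of the loss functions commonly used for this game. It is only an illustration: the discriminator scores are made-up numbers, the function names are hypothetical, and the generator loss shown is the widely used non-saturating variant rather than anything specific to this tutorial.

```python
import numpy as np

def discriminator_loss(d_real, d_fake, eps=1e-12):
    """Binary cross-entropy: real samples should score 1, generated samples 0."""
    return -np.mean(np.log(d_real + eps)) - np.mean(np.log(1.0 - d_fake + eps))

def generator_loss(d_fake, eps=1e-12):
    """Non-saturating generator loss: low when the discriminator scores fakes near 1."""
    return -np.mean(np.log(d_fake + eps))

# Hypothetical discriminator outputs (probability that a sample is real).
d_real = np.array([0.9, 0.8, 0.95])     # scores on real samples
d_fooled = np.array([0.7, 0.8, 0.9])    # scores on fakes when the generator fools D
d_caught = np.array([0.1, 0.2, 0.05])   # scores on fakes when D spots them

print("generator loss when fooling D: ", generator_loss(d_fooled))
print("generator loss when caught:    ", generator_loss(d_caught))
print("discriminator loss when fooled:", discriminator_loss(d_real, d_fooled))
print("discriminator loss when right: ", discriminator_loss(d_real, d_caught))
```

The generator’s loss falls exactly when the discriminator’s loss rises, which is the sense in which the two networks play against each other.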

Yann LeCun on GANs


Basic Autoencoder (Source)

The simplest form of an autoencoder is a feedforward, non-recurrent neural network similar to a multilayer perceptron (MLP). It consists of an input layer and an output layer connected by one or more hidden layers. The output layer has the same number of nodes (neurons) as the input layer. Its purpose is to reconstruct its inputs by minimizing the difference between the input and the output. Because the input itself serves as the training target, autoencoders are unsupervised learning models that need no labeled data.
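As a rough sketch of that structure, the NumPy snippet below runs a single forward pass through a one-hidden-layer autoencoder and measures the reconstruction error. The layer sizes (784 inputs, 32 hidden units), the activations, and the random stand-in data are assumptions chosen for illustration. No training happens here, so the weights are just random.

```python
import numpy as np

rng = np.random.default_rng(0)

n_inputs, n_hidden = 784, 32           # e.g. a flattened 28x28 image and a 32-unit code
x = rng.random((5, n_inputs))          # 5 stand-in "images" with values in [0, 1)

# Randomly initialised weights; training would adjust these to reduce the loss.
w_enc = rng.normal(0, 0.01, (n_inputs, n_hidden))
w_dec = rng.normal(0, 0.01, (n_hidden, n_inputs))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

code = np.maximum(0.0, x @ w_enc)      # encoder: ReLU hidden layer
x_hat = sigmoid(code @ w_dec)          # decoder: output has as many units as the input

loss = np.mean((x - x_hat) ** 2)       # reconstruction error; the input is its own target
print(code.shape, x_hat.shape, loss)
```

Note that no labels appear anywhere: the loss compares the output only with the input.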

Two common types of layers that can be used for upsampling in the generator model are the upsampling layer (UpSampling2D), which by default doubles the dimensions of the input using nearest-neighbor or bilinear interpolation, and the transposed convolutional layer (Conv2DTranspose), which performs a convolutional upscaling operation whose weights are learned during training, similar to a regular Conv2D layer.

1. UpSampling2D

A simple version of an unpooling (the opposite of pooling) layer is called an upsampling layer. It works by repeating the rows and columns of the input. Several such layers can be stacked in a GAN generator to perform the upsampling needed to transform a small input into a large image output.

For example, with the default size of (2, 2), an upsampling layer turns a 3x3 pixel input into a 6x6 pixel output; a size of (3, 3) would give 9x9. We can choose the interpolation method used to fill in the new rows and columns. By default, the UpSampling2D layer uses the nearest neighbor algorithm, which simply repeats each row and column. Alternatively, bilinear interpolation can be used, which fills each new pixel with a weighted average of the nearest pixels.
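The nearest neighbor behaviour is easy to reproduce by hand. The sketch below uses plain NumPy on a single 2D array to mimic what the layer does to each channel. The helper name upsample_nearest is hypothetical, and the real Keras layer works on 4D batches of shape (batch, height, width, channels) rather than on one 2D array.

```python
import numpy as np

def upsample_nearest(image, size=2):
    """Repeat every row and every column `size` times."""
    return np.repeat(np.repeat(image, size, axis=0), size, axis=1)

image = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])

doubled = upsample_nearest(image)           # 3x3 -> 6x6
tripled = upsample_nearest(image, size=3)   # 3x3 -> 9x9
print(doubled.shape, tripled.shape)
print(doubled)
```

Each original pixel becomes a size x size block of identical values, so the layer has nothing to learn: it has no trainable weights.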

2. Conv2DTranspose

The Conv2DTranspose layer takes images as input directly and outputs the result of the operation. It both upsamples and performs a convolution, so we must specify the number of filters and the size of the filters, as we do for Conv2D layers, as well as a stride size, because the upsampling is achieved by the stride behavior of the convolution on the input.
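To see how the stride produces the upsampling, here is a naive single-channel transposed convolution in NumPy. It is a sketch under simplifying assumptions: one filter, no padding, no bias, and a fixed kernel of ones in place of learned weights. The function name is hypothetical and this is not how Keras implements the layer internally.

```python
import numpy as np

def conv2d_transpose(image, kernel, stride=2):
    """Naive single-channel transposed convolution without padding."""
    in_h, in_w = image.shape
    k_h, k_w = kernel.shape
    out = np.zeros(((in_h - 1) * stride + k_h, (in_w - 1) * stride + k_w))
    for i in range(in_h):
        for j in range(in_w):
            # Each input pixel stamps a scaled copy of the kernel onto the output,
            # shifted by `stride` pixels per input step; overlapping stamps are summed.
            out[i * stride:i * stride + k_h,
                j * stride:j * stride + k_w] += image[i, j] * kernel
    return out

image = np.array([[1.0, 2.0],
                  [3.0, 4.0]])
kernel = np.ones((3, 3))    # in a real layer these weights are learned

result = conv2d_transpose(image, kernel, stride=2)
print(result.shape)
print(result)
```

With a 2x2 input, a 3x3 kernel and a stride of 2, the output is 5x5, following (input - 1) * stride + kernel. With padding set to "same", the Keras layer instead produces an output of input * stride, so a stride of 2 exactly doubles the height and width.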

Transposed convolutions are widely used in modern segmentation and super-resolution algorithms. Because their weights are learned during training, they can adapt the upsampling of abstract representations to the task, which a fixed interpolation cannot do.


1. MNIST Dataset

The MNIST database contains 60,000 training images and 10,000 testing images. Half of the training set and half of the test set were taken from NIST’s training dataset, while the other half of the training set and the other half of the test set were taken from NIST’s testing dataset.

MNIST dataset sample

2. notMNIST Dataset

notMNIST dataset sample

“Judging by the examples, one would expect this to be a harder task than MNIST. This seems to be the case — logistic regression on top of stacked auto-encoder with fine-tuning gets about 89% accuracy whereas same approach gives got 98% on MNIST. Dataset consists of small hand-cleaned part, about 19k instances, and large uncleaned dataset, 500k instances. Two parts have approximately 0.5% and 6.5% label error rate. I got this by looking through glyphs and counting how often my guess of the letter didn’t match it’s unicode value in the font file.”

— Yaroslav Bulatov

I think that’s enough theory. Now we will dive into the coding parts.

🅽🅴🆇🆃 ⫸ Part 2: Denoising image with Upsampling Layer

Happy coding!
