The basics of Autoencoders

Unsupervised learning techniques have become increasingly popular. One such technique is the autoencoder. In this tutorial, we will learn the basics of autoencoders.

An autoencoder is a “bottleneck” architecture that consists of an encoder, a code and a decoder. It converts the input data to a smaller representation and then uses that representation to reconstruct the input. Data flows from the input to the internal representation (the code) to the output. The following diagram visualizes this flow.

What does bottleneck mean?

The “bottleneck” is a set of constraints imposed on the encoder such that it is forced to represent the input using limited resources. For example, given an image with 100 pixels, a constraint can be imposed so that the encoder must represent the image using 50 values instead of 100. This forces the autoencoder to learn a different, more compact way to represent the input data. The result of passing input data through the bottleneck is a compressed version of the input, which is why this is also referred to as dimensionality reduction.
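To make the 100-to-50 example concrete, here is a minimal NumPy sketch. The random weights are an assumption for illustration only (a real encoder learns its weights during training), and the names `image`, `encoder_weights` and `code` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# A hypothetical flattened 10x10 image: 100 pixel values in [0, 1).
image = rng.random(100)

# Untrained, randomly initialised encoder weights (illustration only).
# Shape (50, 100): maps 100 input values to 50 code values.
encoder_weights = rng.normal(scale=0.1, size=(50, 100))

# Passing the image through the bottleneck halves its dimensionality.
code = encoder_weights @ image

print(image.shape)  # (100,)
print(code.shape)   # (50,)
```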

Components of an autoencoder

We already know that an autoencoder consists of an encoder, an internal representation and a decoder, but what are their purposes?

The encoder uses a compression technique to convert high dimensional input to a low dimensional internal representation.

The code is the compressed input, i.e. the internal representation.

The decoder then reconstructs the input from the low dimensional representation.

However, the output is not exactly the same as the input; it is degraded. This is a result of the information lost during the dimensionality reduction stage (the bottleneck), which ideally retains the important features and discards noise.
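The three components and the lossy round trip can be sketched without a neural network. The example below is an assumption-laden stand-in: it uses PCA (via NumPy's SVD) in place of a trained encoder and decoder, on synthetic data, and the helper names `encode` and `decode` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): 200 samples, 20 features,
# built from 5 underlying factors plus a little noise.
factors = rng.normal(size=(200, 5))
mixing = rng.normal(size=(5, 20))
data = factors @ mixing + 0.05 * rng.normal(size=(200, 20))

mean = data.mean(axis=0)

# Principal directions from the SVD; the rows of vt are orthonormal.
_, _, vt = np.linalg.svd(data - mean, full_matrices=False)
components = vt[:5]  # keep a 5-dimensional code


def encode(x):
    """Encoder: project inputs onto the 5 retained directions."""
    return (x - mean) @ components.T


def decode(code):
    """Decoder: map a code back to the original 20-dimensional space."""
    return code @ components + mean


code = encode(data)              # the compressed internal representation
reconstruction = decode(code)    # the degraded copy of the input
error = np.mean((data - reconstruction) ** 2)

print(code.shape)            # (200, 5)
print(reconstruction.shape)  # (200, 20)
print(error > 0)             # True: the reconstruction is not exact
```

The reconstruction error is small but not zero, because the noise that does not fit into the 5-dimensional code is discarded.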

How is this implemented?

Autoencoders are implemented using artificial neural networks. In fact, they are neural networks! They use unsupervised learning to learn the underlying representation of the data from many unlabelled examples. A simple autoencoder can be visualized as a typical feedforward neural network.

The input layer receives the high-dimensional input and feeds it to the encoder, which is the hidden layer. The output of the encoder is the low-dimensional code, i.e. the compressed version of the input. This then becomes the input to the decoder, which uses it to reconstruct the original input.

With weights W, biases b and input x, the idea is to learn a function h_{W,b}(x) whose output y = x’ ≈ x. Backpropagation is used to update the weights and biases during training so that the outputs are as close as possible to the inputs, i.e. y⁽ⁱ⁾ ≈ x⁽ⁱ⁾ for every training example i.
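The training idea can be sketched in plain NumPy. This is a minimal illustration under several assumptions that are not from the tutorial: the data is synthetic, both layers are linear (no activation functions), the loss is the mean squared error, and the learning rate and step count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumption): 256 samples with 8 features that depend on
# only 3 hidden factors, rescaled to unit standard deviation.
x = rng.normal(size=(256, 3)) @ rng.normal(size=(3, 8))
x = x / x.std()

n_samples, n_features = x.shape
code_dim = 3
learning_rate = 0.05

# Small random initial weights and zero biases.
w_enc = rng.normal(scale=0.1, size=(n_features, code_dim))
b_enc = np.zeros(code_dim)
w_dec = rng.normal(scale=0.1, size=(code_dim, n_features))
b_dec = np.zeros(n_features)


def forward(inputs):
    """Return the code and the reconstruction for a batch of inputs."""
    code = inputs @ w_enc + b_enc
    output = code @ w_dec + b_dec
    return code, output


def mse(inputs, output):
    """Mean squared reconstruction error over all samples and features."""
    return np.mean((output - inputs) ** 2)


initial_loss = mse(x, forward(x)[1])

for step in range(2000):
    code, y = forward(x)
    # Gradient of the mean squared error with respect to the output.
    grad_y = 2.0 * (y - x) / (n_samples * n_features)
    # Backpropagate through the decoder, then through the encoder.
    grad_w_dec = code.T @ grad_y
    grad_b_dec = grad_y.sum(axis=0)
    grad_code = grad_y @ w_dec.T
    grad_w_enc = x.T @ grad_code
    grad_b_enc = grad_code.sum(axis=0)
    # Gradient descent update of the weights and biases.
    w_dec -= learning_rate * grad_w_dec
    b_dec -= learning_rate * grad_b_dec
    w_enc -= learning_rate * grad_w_enc
    b_enc -= learning_rate * grad_b_enc

final_loss = mse(x, forward(x)[1])
print(initial_loss, final_loss)  # the loss shrinks as y approaches x
```

Note that the target passed to the loss is the input itself, which is what makes the training unsupervised.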

Uses of Autoencoders

Autoencoders are mostly used for image denoising, dimensionality reduction and generative models. Although they perform compression, they cannot be used the same way general-purpose compression methods are used. They are lossy, unlike lossless compression techniques, and they are data-specific: they do not generalize well to data unlike what they were trained on. For example, an autoencoder trained on images of cats will perform poorly on images of houses.
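The data-specific behaviour can be illustrated with a small experiment. As a loudly labelled assumption, the sketch below again uses PCA as a stand-in for a trained autoencoder, and the two synthetic datasets (hypothetical "class A" and "class B") simply live in different subspaces.

```python
import numpy as np

rng = np.random.default_rng(1)


def make_dataset(basis, n_samples):
    """Sample points that lie (almost) in the subspace spanned by `basis`."""
    coefficients = rng.normal(size=(n_samples, basis.shape[0]))
    noise = 0.01 * rng.normal(size=(n_samples, basis.shape[1]))
    return coefficients @ basis + noise


# Two hypothetical classes of data living in different 4-dimensional
# subspaces of a 30-dimensional space.
basis_a = rng.normal(size=(4, 30))
basis_b = rng.normal(size=(4, 30))
train_a = make_dataset(basis_a, 500)
test_a = make_dataset(basis_a, 100)
test_b = make_dataset(basis_b, 100)

# "Train" on class A only: keep its top 4 principal directions.
mean_a = train_a.mean(axis=0)
_, _, vt = np.linalg.svd(train_a - mean_a, full_matrices=False)
components = vt[:4]


def reconstruction_error(x):
    """Encode and decode x with the class-A model; return the mean squared error."""
    code = (x - mean_a) @ components.T
    reconstruction = code @ components + mean_a
    return np.mean((x - reconstruction) ** 2)


error_same = reconstruction_error(test_a)   # unseen data from class A
error_other = reconstruction_error(test_b)  # data from a different class
print(error_same < error_other)  # True
```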

Types of autoencoders

There are various types of autoencoders, which include:

Simple autoencoders
Sparse autoencoders
Deep fully connected autoencoders
Convolutional autoencoders
Sequence-to-sequence autoencoders
Variational autoencoders

Let’s create an autoencoder

Before we create an autoencoder, let us first identify some important concepts:

A smaller encoding layer results in more compression of the input data
Autoencoders can have many dense layers
The number of units per layer decreases as we move from the input layer to the compressed code and then increases again up to the output layer
The input and output layers have the same dimension
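The shape described by these points can be sketched as a small helper that mirrors the encoder's layer widths to obtain the decoder's. The function name `symmetric_layer_sizes` and the hidden sizes 128 and 64 are hypothetical choices for illustration.

```python
def symmetric_layer_sizes(input_dim, hidden_dims):
    """Return layer widths that shrink to the code and then mirror back."""
    encoder = [input_dim] + list(hidden_dims)
    decoder = encoder[-2::-1]  # mirror every layer except the code itself
    return encoder + decoder


sizes = symmetric_layer_sizes(784, [128, 64, 32])
print(sizes)  # [784, 128, 64, 32, 64, 128, 784]
```

The first and last entries are equal, which reflects the requirement that the input and output layers have the same dimension.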

Simple autoencoder

A widely used dataset in autoencoder tutorials is the MNIST dataset of handwritten digits, and we are going to use it here as well. We will start with a simple autoencoder that has a single fully connected layer as its encoder and another as its decoder.

We start with importing the neural network layers:

import keras
from keras import layers

It is good practice to keep all imports in one place, so we will also import NumPy and the MNIST dataset here and then load the data.

from keras.datasets import mnist
import numpy as np

# Load the images and discard the labels
(x_train, _), (x_test, _) = mnist.load_data()

We only keep x_train and x_test and discard the labels (y_train and y_test), because we do not need labels for unsupervised learning. Now we have to create the autoencoder. First, we choose the size of the encoded representation. We can try 32:

encoding_dim = 32  # 32 floats

We define the input placeholder, whose size matches our flattened input images (28 × 28 = 784 pixels):

input_img = keras.Input(shape=(784,))
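As a quick sanity check on these numbers, the arithmetic behind the input size and the resulting compression factor can be written out. The variable names other than `encoding_dim` are hypothetical.

```python
image_height, image_width = 28, 28      # size of one MNIST image
input_dim = image_height * image_width  # values per flattened image
encoding_dim = 32                       # size of the code, as set earlier

compression_factor = input_dim / encoding_dim
print(input_dim)           # 784
print(compression_factor)  # 24.5
```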

Now, we define the encoded representation of the input:

encoded = layers.Dense(encoding_dim, activation='relu')(input_img)

This line takes an input with the size of input_img and stores in the variable encoded a representation of size encoding_dim, computed with the ReLU activation function.

Now, we define the lossy reconstruction of the input:

decoded = layers.Dense(784, activation='sigmoid')(encoded)

This line takes an input with the size of encoded and stores in the variable decoded a representation of size 784 (the size of the input), computed with the sigmoid activation function.
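For reference, the two activation functions named here can be sketched in NumPy. These are the standard textbook definitions, written out for illustration rather than taken from Keras itself.

```python
import numpy as np


def relu(x):
    """Rectified linear unit: negative values become 0."""
    return np.maximum(0.0, x)


def sigmoid(x):
    """Logistic sigmoid: squashes any real value into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


values = np.array([-2.0, 0.0, 3.0])
print(relu(values))     # [0. 0. 3.]
print(sigmoid(values))  # roughly [0.119, 0.5, 0.953]
```

Because the sigmoid keeps every output between 0 and 1, it is a natural fit for reconstructing pixel values that have been scaled to that range.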

We also need to define a model that maps an input to its reconstruction:

autoencoder = keras.Model(input_img, decoded)

Now, we define the encoder and decoder. First, the encoder:

encoder = keras.Model(input_img, encoded)

It maps an input image to its encoded representation.

Then, the decoder:

Just like for the encoder, we need to define an input placeholder, this time with the size of the compressed representation. We also need to retrieve the last layer of the autoencoder model, i.e. the decoder layer:

encoded_input = keras.Input(shape=(encoding_dim,))
decoder_layer = autoencoder.layers[-1]
decoder = keras.Model(encoded_input, decoder_layer(encoded_input))

Now, we compile our model:

autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
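To give a feel for what the binary_crossentropy loss measures, here is a minimal NumPy sketch of a per-pixel binary cross-entropy. It is an illustration under assumptions, not Keras's exact implementation: the function name, the `eps` clipping value and the toy arrays are all hypothetical.

```python
import numpy as np


def binary_crossentropy(targets, predictions, eps=1e-7):
    """Mean per-pixel binary cross-entropy; `eps` avoids taking log(0)."""
    predictions = np.clip(predictions, eps, 1.0 - eps)
    losses = -(targets * np.log(predictions)
               + (1.0 - targets) * np.log(1.0 - predictions))
    return losses.mean()


pixels = np.array([0.0, 0.2, 0.9, 1.0])          # normalized "image"
good_guess = np.array([0.05, 0.25, 0.85, 0.95])  # close reconstruction
bad_guess = np.array([0.9, 0.8, 0.1, 0.2])       # poor reconstruction

loss_good = binary_crossentropy(pixels, good_guess)
loss_bad = binary_crossentropy(pixels, bad_guess)
print(loss_good < loss_bad)  # True
```

The loss is lower when the reconstructed pixel values are closer to the originals, which is what the optimizer tries to achieve.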

We need to normalize the input data to the range [0, 1] and flatten each 28 × 28 image into a vector of 784 values:

x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train = x_train.reshape((len(x_train), np.prod(x_train.shape[1:])))
x_test = x_test.reshape((len(x_test), np.prod(x_test.shape[1:])))
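The effect of these two steps can be checked on a small stand-in array without downloading MNIST. The random `fake_images` array is an assumption for illustration; it only mimics the shape and dtype of the real images.

```python
import numpy as np

rng = np.random.default_rng(0)

# Five fake 28x28 grayscale images with integer pixel values 0..255.
fake_images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Normalize to [0, 1], then flatten each image into one row.
scaled = fake_images.astype('float32') / 255.
flat = scaled.reshape((len(scaled), -1))

print(flat.shape)                            # (5, 784)
print(flat.min() >= 0.0, flat.max() <= 1.0)  # True True
```

Passing `-1` to `reshape` lets NumPy infer the 784 from the remaining dimensions, which is equivalent to the `np.prod(...)` expression used in the tutorial code.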

We can view the shape of our data:

print(x_train.shape)
print(x_test.shape)

Now we can train our autoencoder. We need to set the number of epochs and the batch size, and can optionally set other parameters. Note that the input and the target are both x_train, because the network learns to reconstruct its own input. We could have separated the data into training, validation and test sets, but for this tutorial we only have training and test sets, so we use the test data as the validation set. You can also try this using the three splits.

autoencoder.fit(x_train, x_train, epochs=50, batch_size=256,
                 shuffle=True, validation_data=(x_test, x_test))
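For the three-way split mentioned above, one possible approach is to hold out part of the training data for validation and keep the test set untouched until the end. The sketch below uses a small stand-in array instead of the real data, and the names `x_train_demo`, `x_train_part` and `x_val` are hypothetical.

```python
import numpy as np

# Stand-in for the flattened training set (kept small for illustration).
x_train_demo = np.zeros((600, 784), dtype='float32')

rng = np.random.default_rng(0)
indices = rng.permutation(len(x_train_demo))

# Hold out 100 shuffled samples for validation; train on the rest.
val_size = 100
val_idx, train_idx = indices[:val_size], indices[val_size:]
x_val = x_train_demo[val_idx]
x_train_part = x_train_demo[train_idx]

print(x_train_part.shape, x_val.shape)  # (500, 784) (100, 784)
```

With such a split, (x_val, x_val) would be passed as validation_data and x_test would only be used for the final evaluation. Keras's fit method also accepts a validation_split argument that holds out a fraction of the training data automatically.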

Now, we want to visualize the original images and the reconstructed input. We create the compressed representation using the encoder, and then recreate the input using the decoder:

encoded_imgs = encoder.predict(x_test)
decoded_imgs = decoder.predict(encoded_imgs)
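Besides looking at the images, the reconstruction quality can be summarized as a number. The helper below is a hypothetical addition, not part of the original tutorial; it computes the mean squared error per image and is shown here on two tiny toy arrays.

```python
import numpy as np


def reconstruction_mse(originals, reconstructions):
    """Mean squared error per image for arrays of shape (n_images, n_pixels)."""
    return np.mean((originals - reconstructions) ** 2, axis=1)


originals = np.array([[0.0, 1.0], [1.0, 1.0]])
reconstructions = np.array([[0.0, 1.0], [0.0, 1.0]])

errors = reconstruction_mse(originals, reconstructions)
print(errors.tolist())  # [0.0, 0.5]
```

With the tutorial's arrays, the call would be `reconstruction_mse(x_test, decoded_imgs)`, which gives one error value per test image.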

We will use matplotlib.pyplot for plotting. The import is shown here for convenience, but it is good practice to place it at the beginning with the other imports.

import matplotlib.pyplot as plt

n = 10 # How many digits we will display
plt.figure(figsize=(20, 4))
for i in range(n):
    # Display original
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(x_test[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

    # Display reconstruction
    ax = plt.subplot(2, n, i + 1 + n)
    plt.imshow(decoded_imgs[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()

This is the result:

The first row shows the original images and the second row shows the reconstructions. We can see that the output is degraded. There are ways of improving this, but since this tutorial is about the basics, we will stick to the simple model.

Conclusion

There are various ways to improve the simple autoencoder, and the other types of autoencoders listed above build on it. We can add more layers, add dropout or other regularization, or use convolutional layers instead of fully connected ones. Hopefully, in the next tutorials, we can explore these improvements and other types of autoencoders.