
In-depth understanding of autoencoders (with code implementation)


Autoencoders can be considered a type of data compression algorithm or feature extraction algorithm. This article, authored by Nathan Hubens, introduces the basic architecture of autoencoders. It begins by introducing the concepts of encoders and decoders, then discusses "what can autoencoders do?", and finally explains four different types of autoencoders: ordinary autoencoders, multilayer autoencoders, convolutional autoencoders, and regularized autoencoders.

Deep inside: Autoencoders

Autoencoders are neural networks that attempt to make the output identical to the input. They work by compressing the input into a latent-space representation and then reconstructing the output from this representation. Such networks consist of two parts:

1. Encoder: the part of the network that compresses the input into a latent-space representation. It can be represented by an encoding function h = f(x).

2. Decoder: the part that aims to reconstruct the input from the latent-space representation. It can be represented by a decoding function r = g(h).

The autoencoder as a whole can thus be described by the function g(f(x)) = r, where we want r to be as close as possible to the original input x.

Why copy the input to the output?

If the sole purpose of an autoencoder were to copy the input to the output, it would be useless. In fact, we hope that by training the autoencoder to copy the input to the output, the latent representation h will take on useful properties.

This can be achieved by placing constraints on the copying task. One way to obtain useful features from an autoencoder is to constrain h to have a smaller dimension than x; in this case the autoencoder is called undercomplete. By training an undercomplete representation, we force the autoencoder to learn the most salient features of the training data. If the autoencoder is given too much capacity, it can learn to perform the copying task without extracting any useful information about the distribution of the data. This can also happen if the latent representation has the same dimension as the input, or if it has a greater dimension than the input (the overcomplete case). In these cases, even a linear encoder and a linear decoder can learn to copy the input to the output without learning anything useful about the data distribution. Ideally, any autoencoder architecture could be trained successfully by choosing the code dimension and the capacity of the encoder and decoder according to the complexity of the distribution to be modeled.

What can autoencoders be used for?

Currently, data denoising and dimensionality reduction for data visualization are considered the two main practical applications of autoencoders. With appropriate dimensionality and sparsity constraints, autoencoders can learn more interesting data projections than PCA or other basic techniques.

Autoencoders learn automatically from data examples. This means that it is easy to train specialized instances of the algorithm that perform well on a specific type of input, without requiring any new engineering, only appropriate training data.

However, autoencoders are not good at general-purpose image compression. Because an autoencoder is trained on a given dataset, it will achieve reasonable compression on data similar to that training set, but it will perform poorly on anything else; general-purpose compression techniques like JPEG do this much better.

Autoencoders are trained to retain as much information as possible after the input passes through the encoder and decoder, but they are also trained to give the new representation a variety of desirable properties. Different types of autoencoders are designed to achieve different types of properties. We will focus on four types of autoencoders.

Types of autoencoders:

This article will introduce the following four types of autoencoders:

1. Vanilla autoencoder

2. Multilayer autoencoder

3. Convolutional autoencoder

4. Regularized autoencoder

To demonstrate the different types of autoencoders, I created examples of each type of autoencoder using the Keras framework and the MNIST dataset.

Vanilla autoencoder

In its simplest form, the autoencoder is a three-layer network, i.e., a neural network with one hidden layer. The input and output are the same, and we learn to reconstruct the input, for example using the Adam optimizer and mean squared error as the loss function.

Here we have an undercomplete autoencoder, because the hidden layer dimension (64) is smaller than that of the input (784). This constraint forces our neural network to learn a compressed representation of the data.
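As a minimal sketch (not the author's exact code), such an undercomplete autoencoder can be written in Keras as follows; random data stands in for the normalized MNIST pixels so the example stays self-contained:

```python
import numpy as np
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

# 784-dimensional input (a flattened 28x28 MNIST image),
# compressed to a 64-dimensional latent representation.
input_img = Input(shape=(784,))
encoded = Dense(64, activation='relu')(input_img)
decoded = Dense(784, activation='sigmoid')(encoded)

autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adam', loss='mse')

# Random data stands in for normalized MNIST pixels here.
x_train = np.random.rand(128, 784).astype('float32')
autoencoder.fit(x_train, x_train, epochs=1, batch_size=32, verbose=0)
```

In practice you would load MNIST, normalize the pixels to [0, 1], and train for many more epochs.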

Multilayer autoencoder

If one hidden layer is not enough, we can obviously extend the autoencoder to more hidden layers.

Our implementation now uses three hidden layers instead of one. Any of the hidden layers could serve as the feature representation, but we will make the network symmetrical and use the middle hidden layer.
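A symmetric three-hidden-layer version might look like the sketch below (again with random stand-in data; the exact layer sizes are an assumption), with a separate encoder model exposing the middle layer as the feature representation:

```python
import numpy as np
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

# Symmetric stack: 784 -> 128 -> 64 -> 128 -> 784.
# The middle 64-unit layer is the feature representation.
input_img = Input(shape=(784,))
hidden_1 = Dense(128, activation='relu')(input_img)
code = Dense(64, activation='relu')(hidden_1)
hidden_2 = Dense(128, activation='relu')(code)
output_img = Dense(784, activation='sigmoid')(hidden_2)

autoencoder = Model(input_img, output_img)
autoencoder.compile(optimizer='adam', loss='mse')

x_train = np.random.rand(128, 784).astype('float32')  # stand-in for MNIST
autoencoder.fit(x_train, x_train, epochs=1, batch_size=32, verbose=0)

# A second model that shares the encoder layers extracts the middle
# representation for downstream use.
encoder = Model(input_img, code)
```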

Convolutional autoencoder

We might also ask ourselves: can autoencoders be built with convolutional layers instead of fully connected layers?

The answer is yes, and the principle is the same, but using images (3D tensors) instead of flattened 1D vectors. The input image is downsampled to give a smaller latent representation, forcing the autoencoder to learn a compressed version of the image.
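A convolutional version can be sketched as below: pooling layers downsample 28x28x1 images to a 7x7x8 code, and upsampling layers restore the resolution (filter counts are illustrative assumptions; random data stands in for MNIST):

```python
import numpy as np
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
from tensorflow.keras.models import Model

# Encoder: two conv + pooling stages shrink 28x28x1 down to 7x7x8.
input_img = Input(shape=(28, 28, 1))
x = Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)

# Decoder: upsampling stages restore the original 28x28 resolution.
x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(16, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)

autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adam', loss='mse')

x_train = np.random.rand(64, 28, 28, 1).astype('float32')  # stand-in for MNIST
autoencoder.fit(x_train, x_train, epochs=1, batch_size=32, verbose=0)
```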

Regularized autoencoder

There are other ways to constrain the reconstruction of an autoencoder than simply imposing a hidden layer of smaller dimension than the input. Rather than limiting model capacity through the size of the encoder and decoder, regularized autoencoders use a loss function that encourages the model to learn properties other than simply copying the input to the output. In practice, we usually encounter two types of regularized autoencoders: sparse autoencoders and denoising autoencoders.

Sparse autoencoders: Sparse autoencoders are commonly used to learn features for other tasks such as classification. A sparse autoencoder must respond to unique statistical features of the dataset it has been trained on, rather than simply acting as an identity function. In this way, training to perform the copying task with a sparsity penalty can yield a model that has learned useful features.

Another way we can constrain the autoencoder's reconstruction is by imposing a constraint in the loss function. For example, we can add a regularization term to the loss. Doing so causes the autoencoder to learn a sparse representation of the data.

Note that in our hidden layer we added an L1 activity regularizer, which applies a penalty to the loss function during the optimization phase. As a result, the representation is sparser compared to that of the vanilla autoencoder.
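A sparse variant of the earlier undercomplete sketch might look like this, with Keras's L1 activity regularizer on the code layer (the penalty weight 1e-5 is an illustrative assumption; random data again stands in for MNIST):

```python
import numpy as np
from tensorflow.keras import regularizers
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

input_img = Input(shape=(784,))
# The L1 activity regularizer penalizes large activations in the code
# layer, pushing most of them toward zero (a sparse representation).
encoded = Dense(64, activation='relu',
                activity_regularizer=regularizers.l1(1e-5))(input_img)
decoded = Dense(784, activation='sigmoid')(encoded)

autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adam', loss='mse')

x_train = np.random.rand(128, 784).astype('float32')  # stand-in for MNIST
autoencoder.fit(x_train, x_train, epochs=1, batch_size=32, verbose=0)
```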

Denoising autoencoders: Rather than adding a penalty to the loss function, we can obtain an autoencoder that learns something useful by changing the reconstruction error term of the loss. This is done by adding noise to the input image and training the autoencoder to recover the original, noise-free image. In this way, the encoder extracts the most important features and learns a more robust representation of the data.
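The key change is in the training pair: the corrupted image is the input, and the clean image is the target. A sketch under the same stand-in-data assumption (the noise factor 0.4 is illustrative):

```python
import numpy as np
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

input_img = Input(shape=(784,))
encoded = Dense(64, activation='relu')(input_img)
decoded = Dense(784, activation='sigmoid')(encoded)

autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adam', loss='mse')

# Corrupt the inputs with Gaussian noise; the training target is the
# clean image, so the network must learn to remove the noise.
x_train = np.random.rand(128, 784).astype('float32')  # stand-in for MNIST
noise_factor = 0.4
x_noisy = x_train + noise_factor * np.random.normal(size=x_train.shape)
x_noisy = np.clip(x_noisy, 0.0, 1.0).astype('float32')

autoencoder.fit(x_noisy, x_train, epochs=1, batch_size=32, verbose=0)
```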

Summary

In this article, we introduced the basic architecture of autoencoders and examined four different types: vanilla, multilayer, convolutional, and regularized autoencoders. Depending on the constraints imposed (reducing the size of the hidden layers or adding a penalty term to the loss function), different properties can be learned by the encoding.
