Stacked autoencoders are a commonly used deep learning model consisting of multiple autoencoders stacked in series. The purpose of stacking is to extract higher-order features from the input data layer by layer: the dimensionality of the data is reduced at each layer, transforming a complex input into a series of simpler higher-order features. These higher-order features are then fed into a classifier or clustering algorithm for classification or clustering.
1. The autoencoder model and its variants
An autoencoder (AE) is a feedforward, non-recurrent neural network with an input layer, a hidden layer, and an output layer. A typical autoencoder structure is shown in Figure 1. The input layer receives the vector X, and the output layer produces the vector Z. Adjacent layers are mapped using a sigmoid activation function.
Figure 1. Typical structure of an autoencoder
The mapping from the input layer to the hidden layer can be viewed as an encoding process, in which the input vector x is mapped to the hidden-layer output y through a mapping function f. The mapping from the hidden layer to the output layer is the corresponding decoding process, in which y is mapped through a function g to "reconstruct" the vector z. Each input sample x(i) is therefore transformed by the autoencoder into an output vector z(i) = g[f(x(i))]. The autoencoder is trained so that the output Z reproduces the input X as closely as possible; once trained, the hidden-layer output can be regarded as an abstract representation of the input X, and it can therefore be used to extract features from the input data. In addition, because the hidden layer has fewer nodes than the input layer, the autoencoder can also be used for dimensionality reduction and data compression. The network parameters are trained with backpropagation; however, the autoencoder requires a large number of training samples, and the computational cost increases as the network structure becomes more complex.
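To make the encode/decode relationship z(i) = g[f(x(i))] concrete, the following is a minimal NumPy sketch of a single-hidden-layer autoencoder with sigmoid activations, trained by backpropagation on the squared reconstruction error. All class and function names here are illustrative, not from the text; real implementations would use a deep learning framework.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

class Autoencoder:
    """Single-hidden-layer autoencoder: y = f(x) = sigmoid(W1 x + b1)
    encodes, z = g(y) = sigmoid(W2 y + b2) decodes, trained so z ~= x."""
    def __init__(self, n_in, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0, 0.1, (n_hidden, n_in))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0, 0.1, (n_in, n_hidden))
        self.b2 = np.zeros(n_in)

    def encode(self, x):          # f: input -> hidden
        return sigmoid(self.W1 @ x + self.b1)

    def decode(self, y):          # g: hidden -> reconstruction
        return sigmoid(self.W2 @ y + self.b2)

    def train_step(self, x, lr=0.5):
        # forward pass
        y = self.encode(x)
        z = self.decode(y)
        # backpropagate the squared reconstruction error (1/2)||z - x||^2
        dz = (z - x) * z * (1 - z)            # sigmoid derivative at output
        dy = (self.W2.T @ dz) * y * (1 - y)   # sigmoid derivative at hidden
        self.W2 -= lr * np.outer(dz, y)
        self.b2 -= lr * dz
        self.W1 -= lr * np.outer(dy, x)
        self.b1 -= lr * dy

# toy usage: compress 6-dimensional inputs to 3 hidden units
ae = Autoencoder(n_in=6, n_hidden=3)
X = np.random.default_rng(1).random((50, 6))
for _ in range(1000):
    for x in X:
        ae.train_step(x)
```

After training, `ae.encode(x)` gives the hidden representation (the extracted feature), and because the hidden layer is narrower than the input, it also serves as a compressed encoding of x.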
Modifying the basic autoencoder structure yields other types of autoencoders, most notably sparse autoencoders and denoising autoencoders. A denoising autoencoder (DAE) partially "destroys" (corrupts) the input data and trains the model to reconstruct the original input; this corruption is equivalent to adding noise, and training on noisy inputs improves the autoencoder's robustness and generalization ability. A sparse autoencoder, on the other hand, adds a sparsity regularization term that constrains most hidden-layer neurons to output 0, with only a small portion producing non-zero values. Sparse autoencoders significantly reduce the number of parameters that effectively need to be trained, lowering the training difficulty, while also mitigating the autoencoder's tendency to become trapped in local minima and to overfit.
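The "destruction" step of a denoising autoencoder is often implemented as masking noise. The sketch below shows one plausible corruption function (the name `corrupt` and the masking scheme are illustrative assumptions, not prescribed by the text); the DAE is then trained to reconstruct the clean x from the corrupted version.

```python
import numpy as np

def corrupt(x, frac=0.3, rng=None):
    """Masking noise for a denoising autoencoder: zero out a random
    fraction `frac` of the input components ("destroying" the input)."""
    rng = rng or np.random.default_rng()
    mask = rng.random(x.shape) >= frac   # keep roughly (1 - frac) of entries
    return x * mask

# The DAE training objective is to reconstruct the *clean* input:
#   minimize || g(f(corrupt(x))) - x ||^2
x = np.ones(1000)
x_noisy = corrupt(x, frac=0.3, rng=np.random.default_rng(0))
```

A sparse autoencoder would instead leave the input intact and add a penalty (for example, a KL-divergence term on the mean hidden activation) to the reconstruction loss so that most hidden units stay near 0.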
2. SAE stacking process
A stacked autoencoder (SAE) is trained greedily, layer by layer, in the same way as a DBN (deep belief network). The procedure is as follows:
(1) Given the initial input, train the first autoencoder in an unsupervised manner until the reconstruction error falls below a set value.
(2) Use the hidden-layer output of the first autoencoder as the input to the second autoencoder, and train it in the same way.
(3) Repeat step (2) until all autoencoders have been initialized.
(4) Use the hidden-layer output of the last autoencoder as the input to a classifier, and train the classifier's parameters in a supervised manner.
Figure 2 shows the generation process of a stacked autoencoder with three AE layers.
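Steps (1)–(3) above can be sketched as a greedy layer-wise pretraining loop in NumPy. The helper `train_ae` below (an illustrative name, not from the text) trains one autoencoder layer by backpropagation and returns its encoder; each layer's hidden output becomes the next layer's input, as in step (2).

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_ae(X, n_hidden, epochs=200, lr=0.5, seed=0):
    """Train one autoencoder layer on data X (n_samples x n_in);
    return (encode_fn, hidden representation of X)."""
    rng = np.random.default_rng(seed)
    n_in = X.shape[1]
    W1 = rng.normal(0, 0.1, (n_hidden, n_in)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0, 0.1, (n_in, n_hidden)); b2 = np.zeros(n_in)
    for _ in range(epochs):
        for x in X:
            y = sigmoid(W1 @ x + b1)           # encode
            z = sigmoid(W2 @ y + b2)           # decode (reconstruct)
            dz = (z - x) * z * (1 - z)         # backprop of (1/2)||z - x||^2
            dy = (W2.T @ dz) * y * (1 - y)
            W2 -= lr * np.outer(dz, y); b2 -= lr * dz
            W1 -= lr * np.outer(dy, x); b1 -= lr * dy
    encode = lambda v: sigmoid(W1 @ v + b1)
    return encode, np.array([encode(x) for x in X])

# Greedy layer-wise pretraining: each AE's hidden output feeds the next AE.
X = np.random.default_rng(1).random((30, 8))
layers, H = [], X
for n_hidden in (6, 4, 2):        # three stacked AE layers
    enc, H = train_ae(H, n_hidden)
    layers.append(enc)
# H (30 x 2) is the top-level feature representation that would be fed
# to a supervised classifier in step (4).
```

The decoder weights of each layer are discarded after pretraining; only the encoders are kept and stacked, which is why the final representation H shrinks from 8 to 2 dimensions here.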
Figure 2. Generation process of a stacked autoencoder with three AE layers