An iconic paper on autoencoders and layer-by-layer pretraining of deep neural networks. Layer-by-layer pretraining finds a good initialization for the subsequent backprop fine-tuning of the whole network. It nicely concludes: "It has been obvious since the 1980s that backpropagation through deep autoencoders would be very effective for nonlinear dimensionality reduction, provided that computers were fast enough, data sets were big enough, and the initial weights were close enough to a good solution. All three conditions are now satisfied."

### Scales quadratically with the dimensionality of the input?

I read: "autoencoders [...] can be applied to very large data sets because both the pretraining and the fine-tuning scale linearly in time and space with the number of training cases."

Indeed, it does scale linearly with the number of training cases. However, I think the method scales quadratically with the dimensionality of the input data, since the first two layers seem to have similar sizes: for an input of dimensionality $d$, a fully connected second layer of comparable size needs on the order of $d^2$ weights. Is there any way to adapt the method to very high-dimensional data? If the input dimensionality is 1M, the method may not scale, as it would require on the order of $(1M)^2 = 10^{12}$ connections.

Is it OK if the second layer is already small compared to the dimensionality of the input data? Or is it OK if the connections between the first two layers are sparse (not all connections are present, just a linear number of them)?
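To make the scaling concern concrete, here is a rough back-of-the-envelope sketch of the number of first-layer connections in the three cases raised in the question. The function names, the second-layer size of 1,000, and the fan-in of 100 are illustrative assumptions of mine, not values from the paper.

```python
def dense_connections(n_in: int, n_hidden: int) -> int:
    """Weights in a fully connected layer (biases ignored)."""
    return n_in * n_hidden


def sparse_connections(n_hidden: int, fan_in: int) -> int:
    """Weights when each hidden unit sees only `fan_in` inputs."""
    return n_hidden * fan_in


d = 1_000_000  # input dimensionality from the question (1M)

# Second layer as wide as the input: quadratic in d.
same_size = dense_connections(d, d)

# Hypothetical small second layer (1,000 units): linear in d.
small_second = dense_connections(d, 1_000)

# Hypothetical sparse first layer: d hidden units, 100 inputs each.
sparse = sparse_connections(d, 100)

print(f"same-size dense layer:  {same_size:.0e} connections")
print(f"small second layer:     {small_second:.0e} connections")
print(f"sparse, fan-in of 100:  {sparse:.0e} connections")
```

Under these assumptions, both alternatives bring the count down from $10^{12}$ to something linear in the input dimensionality. Whether pretraining still finds a good initialization in either setting is exactly the open question above.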