Wednesday, 28 July 2021

Deep Learning in Mind: A Gentle Introduction to Spectral Ergodicity

Preamble

Figure: Mona Lisa on eigenvector grids (Wikipedia)

In the post A New Matrix Mathematics for Deep Learning: Random Matrix Theory of Deep Learning, we outlined new mathematical concepts that are aimed at deep learning but belong more generally to applied mathematics. Here, we dive into one of these concepts, spectral ergodicity. We aim to convey what it means and how to compute spectral ergodicity for a set of matrices, i.e., an ensemble. We will use a visual aid and verbal descriptions of the steps to produce a quantitative measure of spectral ergodicity.

The idea of spectral ergodicity comes from quantum statistical physics, but it has recently been revived for deep learning as a new concept to meet the mathematical needs of explaining and understanding the complexity of deep learning architectures.

Understanding Spectral Ergodicity

The concept of ergodicity can get quite mathematical, even for a professional mathematician. Statistically speaking, a practical understanding of ergodicity leads to the law of large numbers. However, ergodicity observed for an ensemble of matrices, i.e., over their eigenvalue spectra, has not been formally defined before in the literature and has only appeared in statistical quantum mechanics in a specialised case. Here we give a formal definition gently.

The spectral ergodicity of a snapshot of values from $M$ matrices, each of size $N \times N$, denoted by $\Omega$, can be produced with the following steps:
1. Compute the eigenvalues of the $M$ matrices separately.
2. Produce equidistant spectra of the matrices from the eigenvalues, i.e., histograms over bins $b_{k}$. Each cell in the Figure corresponds to a bin in the spectra of the matrices.
3. Compute the average value of each bin across the $M$ matrices.
4. Compute the root mean square deviation of the values that went into each bin from the $M$ matrices with respect to the corresponding ensemble-averaged value, and average over $M$ and $N$. This gives a distribution, $\Omega=\Omega(b_{k})$, which represents the spectral ergodicity value; think of it as a snapshot value of a dynamical process.

The attentive reader will notice that measures of ergodicity normally lead to a single value, such as in spin glasses, but here we obtain ergodicity as a distribution. This stems from the fact that our observable is not univariate but a multivariate measure over the spectra of the matrices, i.e., the bins in the histogram of eigenvalues.
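
The four steps can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `spectral_ergodicity` and the parameter `n_bins` are hypothetical, the matrices are assumed to be real and symmetric so that the eigenvalues are real, and step 4 is read here as the squared deviation from the ensemble average, summed over the matrices and divided by $M N$ (take a square root if a root mean square value is preferred).

```python
import numpy as np


def spectral_ergodicity(matrices, n_bins=50):
    """Sketch of Omega(b_k) for an ensemble of M real symmetric N x N matrices."""
    matrices = np.asarray(matrices, dtype=float)
    M, N, _ = matrices.shape

    # Step 1: eigenvalues of each matrix separately (symmetry is an assumption).
    eigenvalues = np.linalg.eigvalsh(matrices)  # shape (M, N)

    # Step 2: equidistant spectra, one histogram per matrix on shared bin edges.
    edges = np.linspace(eigenvalues.min(), eigenvalues.max(), n_bins + 1)
    densities = np.stack(
        [np.histogram(ev, bins=edges, density=True)[0] for ev in eigenvalues]
    )  # shape (M, n_bins)

    # Step 3: ensemble average of each bin across the M matrices.
    mean_density = densities.mean(axis=0)

    # Step 4: squared deviation from the ensemble average per bin,
    # summed over matrices and normalised by M * N (assumed reading).
    omega = ((densities - mean_density) ** 2).sum(axis=0) / (M * N)

    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, omega


# Usage on a toy ensemble of random symmetric matrices.
rng = np.random.default_rng(0)
M, N = 20, 64
a = rng.normal(size=(M, N, N))
ensemble = (a + a.transpose(0, 2, 1)) / 2.0
centers, omega = spectral_ergodicity(ensemble, n_bins=30)
print(omega.shape)
```

The result is one value per bin rather than a single number, which matches the distribution $\Omega(b_{k})$ described above. An ensemble made of identical matrices gives zero in every bin, since each spectrum coincides with the ensemble average.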

Why is spectral ergodicity important for deep learning?

The reason this measure is so important lies in the dynamics and the consistency of measuring observables (this has nothing to do with quantum mechanics; it concerns time and ensemble averages in the classical sense). Normally we can't measure ensemble averages. Under experimental conditions, the measurement we make is usually a time-averaged value. This is exactly what happens when we train a deep neural network, i.e., the ergodicity of the weight matrices. Essentially, spectral ergodicity would capture a deep neural network's characteristics.
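
To make the distinction between time and ensemble averages concrete, here is a toy sketch. It is unrelated to any specific network: the autoregressive process, its parameters `phi` and `c`, and the helper `simulate` are all assumptions chosen only for illustration. For an ergodic process, the average along one long trajectory and the average across many independent trajectories at a fixed time approach the same value.

```python
import numpy as np

rng = np.random.default_rng(42)
phi, c = 0.8, 1.0            # hypothetical AR(1) parameters
true_mean = c / (1.0 - phi)  # stationary mean of x_t = c + phi * x_{t-1} + noise


def simulate(n_paths, n_steps):
    """Simulate n_paths independent AR(1) trajectories of length n_steps."""
    x = np.full(n_paths, true_mean)
    out = np.empty((n_paths, n_steps))
    for t in range(n_steps):
        x = c + phi * x + rng.normal(size=n_paths)
        out[:, t] = x
    return out


# Time average: one system observed for a long time.
time_average = simulate(1, 100_000)[0].mean()

# Ensemble average: many independent systems observed at a single time.
ensemble_average = simulate(10_000, 50)[:, -1].mean()

print(time_average, ensemble_average, true_mean)
```

Both estimates land close to the stationary mean, which is the sense in which a time-averaged measurement can stand in for an ensemble average.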

Outlook

The way we express spectral ergodicity here only considers the case where all layers have the same size. One would need a more advanced computation of spectral ergodicity for more realistic architectures, called the cascading Periodic Spectral Ergodicity measure, which is suitable as a complexity measure for deep learning. The computation of such a measure is more involved, and the spectral ergodicity we cover here is the first step.

Cite this post with  Deep Learning in Mind Very Gentle Introduction to Spectral Ergodicity, Mehmet Süzen, (2021) https://science-memo.blogspot.com/2021/07/deep-learning-random-matrix-theory-spectral-ergodicity.html