Figure: Definition of Randomness (Compagner 1991, Delft University)
- Network architecture (topology).
- Learning algorithm.
- Data sizes and type.
- Training procedure.
Why weight matrices rather than the Hessian or the loss landscape?
There are studies that take the Hessian matrix as the primary object, i.e., the second derivative of the loss with respect to the network's parameters, and associate it with random matrices. However, this approach covers only properties of the learning algorithm rather than the architecture's inference or learning capacity. For this reason, weight matrices should be taken as the primary object in any study of random matrix theory in deep learning, as they encode depth. Similarly, the loss landscape cannot capture the capacity of a deep learning architecture.
Conclusion and outlook
Further Reading
Papers introducing new mathematical concepts in deep learning are listed here; they come with associated Python code for reproducing the concepts.
- Spectral Ergodicity in Deep Learning Architectures via Surrogate Random Matrices
- Periodic Spectral Ergodicity: A Complexity Measure for Deep Neural Networks and Neural Architecture Search
- Equivalence in Deep Neural Networks via Conjugate Matrix Ensembles
Citing this post
Glossary of New Mathematical Concepts of Deep Learning
A summary of the definitions of the new mathematical concepts for this new matrix mathematics of deep learning.
Spectral Ergodicity A measure of ergodicity in the spectra of a random matrix ensemble of a given size. Given a set of equally sized matrices drawn from the same ensemble, it is the average deviation of the spectral densities of individual eigenvalues from the ensemble-averaged eigenvalue density. This mimics standard ergodicity: instead of averaging over states of an observable, it measures ergodicity over eigenvalue densities. It is denoted $\Omega_{k}^{N}$ for the $k$-th eigenvalue and matrix size $N$.
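To make this concrete, here is a minimal sketch following the prose definition above: estimate each matrix's spectral density on a common grid, average over the ensemble, and take the mean squared deviation from that average. The function name, binning, and normalisation are illustrative assumptions; the exact quantity $\Omega_{k}^{N}$ used in the papers (and in Bristol) may be normalised differently.

import numpy as np

def spectral_ergodicity(matrices, bins=50):
    # spectra of each (symmetrised) matrix in the ensemble
    spectra = [np.linalg.eigvalsh((m + m.T) / 2.0) for m in matrices]
    lo = min(s.min() for s in spectra)
    hi = max(s.max() for s in spectra)
    # spectral density of each matrix on a common grid
    densities = np.array([
        np.histogram(s, bins=bins, range=(lo, hi), density=True)[0]
        for s in spectra
    ])
    ensemble_density = densities.mean(axis=0)             # ensemble-averaged density
    return ((densities - ensemble_density) ** 2).mean()   # Omega-like scalar

ensemble = [np.random.normal(size=(64, 64)) for _ in range(20)]
print(spectral_ergodicity(ensemble))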
Spectral Ergodicity Distance A symmetric distance constructed from two Kullback-Leibler distances between matrix ensembles of two different sizes, taken in both directions: $D = KL(N_{a}|N_{b}) + KL(N_{b}|N_{a})$.
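A minimal sketch of this symmetrised construction, assuming the two inputs are already discretised spectral densities from the $N_{a}$- and $N_{b}$-sized ensembles; the function names and the smoothing constant are illustrative assumptions.

import numpy as np

def kl(p, q, eps=1e-12):
    # Kullback-Leibler divergence between two discretised densities
    p = p / p.sum()
    q = q / q.sum()
    return np.sum(p * np.log((p + eps) / (q + eps)))

def spectral_ergodicity_distance(density_a, density_b):
    # D = KL(N_a | N_b) + KL(N_b | N_a), as in the glossary entry
    return kl(density_a, density_b) + kl(density_b, density_a)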
Mixed Random Matrix Ensemble (MME) A set of matrices constructed from a random ensemble but with different matrix sizes, from $N$ down to 2, with the sizes determined randomly via a coefficient of mixture.
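A hedged sketch of how such a mixed ensemble could be generated, assuming a Gaussian base ensemble; the exact role of the mixture coefficient in the papers may differ from the simple choice made here.

import numpy as np

def mixed_ensemble(n_max, n_matrices, mixture_coeff=0.5, seed=None):
    rng = np.random.default_rng(seed)
    # the mixture coefficient fixes how many distinct sizes between 2 and N are mixed in
    n_sizes = max(2, int(mixture_coeff * (n_max - 1)))
    sizes = rng.choice(np.arange(2, n_max + 1), size=n_sizes, replace=False)
    return [rng.normal(size=(s, s)) for s in rng.choice(sizes, size=n_matrices)]

mme = mixed_ensemble(n_max=64, n_matrices=10, mixture_coeff=0.5, seed=42)
print([m.shape for m in mme])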
Periodic Spectral Ergodicity (PSE) A measure of spectral ergodicity for MMEs whereby the spectrum of a smaller matrix is placed under periodic boundary conditions, i.e., its eigenvalues form a cyclic list that is simply repeated up to $N$ eigenvalues.
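The periodic-boundary trick can be sketched in one line: np.resize repeats an array cyclically, which is exactly the "cyclic list of eigenvalues" described above. The helper name is an illustrative assumption.

import numpy as np

def periodic_pad(eigenvalues, n_target):
    # repeat the smaller spectrum cyclically until it has n_target eigenvalues
    return np.resize(np.asarray(eigenvalues), n_target)

small_spectrum = np.array([0.1, 0.4, 0.9])
print(periodic_pad(small_spectrum, 7))  # [0.1 0.4 0.9 0.1 0.4 0.9 0.1]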
Layer Matrices The set of learned weight matrices up to a given layer in a deep learning architecture. Convolutional layers are mapped into a single matrix, i.e., their filters are stacked.
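As an illustration, a 4-dimensional convolutional weight tensor of shape (out_channels, in_channels, kH, kW) can be flattened into a single 2-D layer matrix by stacking its filters; the exact reshaping convention used in the papers and in Bristol may differ.

import numpy as np

conv_weights = np.random.normal(size=(16, 3, 3, 3))     # a toy convolutional layer
layer_matrix = conv_weights.reshape(conv_weights.shape[0], -1)
print(layer_matrix.shape)                                # (16, 27)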
Cascading Periodic Spectral Ergodicity (cPSE) PSE measured in a feedforward manner over a deep neural network; at each layer the ensemble consists of the layer matrices up to that layer.
Circular Spectral Deviation (CSD) A measure of the fluctuations in spectral density between two ensembles.
Matrix Ensemble Equivalence If the CSDs vanish for conjugate MMEs, the ensembles are said to be equivalent.
Appendix: Practical Python Example
Complexity measure for deep architectures and random matrix ensembles: cPSE.cpse_measure_vanilla. The Python package Bristol (>= v0.2.12) now supports computing cPSE directly from a list of matrices; by default there is no need to put things into a torch model format.
!pip install bristol==0.2.12
An example case:
from bristol import cPSE
import numpy as np

np.random.seed(42)

# ten 64x64 Gaussian matrices standing in for layer matrices
matrices = [np.random.normal(size=(64, 64)) for _ in range(10)]

# d_layers: layer-by-layer cPSE values; cpse: the resulting complexity measure
(d_layers, cpse) = cPSE.cpse_measure_vanilla(matrices)
d_layers is a decreasing vector; it saturates at some point, and that saturation point is where adding more layers will no longer improve performance. This is a data-, learning-, and architecture-independent measure.
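One hedged way to read off that saturation point from d_layers; the helper name and tolerance are illustrative assumptions, not part of the Bristol API.

import numpy as np

def saturation_layer(d_layers, tol=1e-3):
    # first layer index at which the drop in the cPSE vector falls below the tolerance
    drops = -np.diff(np.asarray(d_layers))
    idx = np.where(drops < tol)[0]
    return int(idx[0]) + 1 if idx.size else len(d_layers)

print(saturation_layer(d_layers))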
Only a French word can explain the excitement here: Voilà!