Showing posts with label simulations.

Saturday, 29 February 2020

Freeman Dyson's contribution to deep learning: Circular ensembles mimic trained deep neural networks

In memory of Professor Dyson, also see the paper Equivalence in Deep Neural Networks via Conjugate Matrix Ensembles

Preamble 
[Photo: Freeman Dyson, 2007 (Wikipedia)]
Freeman Dyson was a polymath scientist: theoretical physicist, mathematician and visionary thinker, among other things. In this post, we briefly summarise his contribution to deep learning, i.e., deep neural networks. An obscure use of his circular ensembles as a simulation tool, in conjunction with the concept of ergodicity, helps explain why deep learning systems learn with such high accuracy.

A simulation tool for deep learning: Circular (Random Matrix) Ensembles

Circular ensembles [1,2,3] were developed by Dyson in 1962 to explain quantum statistical mechanics systems, as a modification of basic random matrix theory. Circular ensembles can also be used to simulate deep learning architectures [4]: his three ensembles can generate the weights of a "trained deep neural network". Together with colleagues from Hamburg and Mallorca, I showed that for networks generated with Dyson's ensembles, the deeper they are, the lower a quantity called spectral ergodicity becomes [4]; this was recently demonstrated on real networks as well [5].

How to generate a simulated trained deep neural network in Python

Using the Bristol Python package [6], one can generate a set of weight matrices corresponding to each layer's connections. As a simple example using the Circular Unitary Ensemble (CUE), suppose we have 4 hidden layers of 64, 64, 128 and 256 units. This corresponds to learned weight matrices of sizes 64x64, 64x128 and 128x256. One possible set of trained network weights can be generated as follows; note that the non-square matrices can be formed by simple multiplication with a transpose.


from bristol.ensembles import circular

ce = circular()
seed_v = 997123
# Square base matrices from the circular ensemble, one per layer width
W1 = ce.gue(64,  set_seed=True, seed=seed_v)
W2 = ce.gue(128, set_seed=True, seed=seed_v)
W3 = ce.gue(256, set_seed=True, seed=seed_v)

These are complex matrices; one could take their arguments (phases), or use them as they are if only eigenvalues are needed. An example of trained-network generation can be found on Zenodo. Any one of the circular ensembles can be used.
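For readers without the Bristol package, a CUE-distributed matrix can also be sampled directly with numpy, via the QR decomposition of a complex Ginibre matrix with a phase correction so the result is Haar-distributed. This is a minimal sketch of that standard recipe, not Bristol's implementation:

```python
import numpy as np

def sample_cue(n, rng):
    """Sample an n x n Haar-random unitary (CUE) via QR of a complex
    Ginibre matrix, with the standard diagonal-phase correction."""
    z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    # Fix the phases of R's diagonal so Q is Haar-distributed
    q = q * (d / np.abs(d))
    return q

rng = np.random.default_rng(997123)
U = sample_cue(64, rng)
# Eigenvalues of a unitary matrix lie on the unit circle
eig = np.linalg.eigvals(U)
```

The eigenvalue phases of such matrices are what enter the spectral-ergodicity computations discussed above.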

Conclusion

Dyson's contributions shine so brightly that even his mathematical tools appear in modern deep learning research. He will be remembered for many generations to come as a brilliant scientist and a polymath.

References 


[1] Freeman Dyson, Journal of Mathematical Physics 3, 1199 (1962) [link]
[2] Michael Berry, New Journal of Physics 15 (2013) 013026 [link]
[3] Mehmet Süzen (2017), Summary Notebook on Circular ensembles [link]
[4] Mehmet Süzen, Cornelius Weber, Joan J. Cerdà, Spectral Ergodicity in Deep Learning Architectures via Surrogate Random Matrices, arXiv:1704.08693 [link]
[5] Mehmet Süzen, Cornelius Weber, Joan J. Cerdà, Periodic Spectral Ergodicity: A Complexity Measure for Deep Neural Networks and Neural Architecture Search, arXiv:1911.07831 [link]
[6] Bristol Python package [link]


Tuesday, 9 April 2013

Matrix Cumulative Coherence: Fourier Bases, Random and Sensing Matrices

Compressive sampling (CS) is revolutionizing the way we perform analog-to-digital conversion, our understanding of linear systems and the limits of information theory. One of the key concepts in CS is that a signal can be represented in a sparse basis, or is already sparse. The power of a sparsity basis is that when the signal is randomly sampled it can, in a nutshell, be recovered (or a solution can be found to a linear problem) with fewer samples. A landmark example is sparse MRI.

The choice of a "more" sparse basis or CS sensing matrix rests on the measure of mutual coherence. It is defined as follows for a given matrix at order k:
$$ M(A, k) = \max_{p} \; \max_{\Omega,\, |\Omega| = k,\, p \notin \Omega} \; \sum_{q \in \Omega} \frac{ | \langle a_{p}, a_{q} \rangle | }{ \lVert a_{p} \rVert \, \lVert a_{q} \rVert } $$ When k=1 the meaning is easy to see: we find the largest normalized inner product among the columns of the given matrix. The lower the value, the better the incoherence. However, a single number may not be very informative; after all, how low is low enough? Following David Donoho and Joel Tropp, if M grows slowly with k the matrix is said to enhance sparsity. A larger value of k forms a set of columns $\Omega$, and the second column is selected from this set, i.e., the second max argument in the definition above.
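The definition above can be sketched directly in Python with numpy (a hypothetical `mutual_coherence` helper for real-valued columns, not R1magic's implementation):

```python
import numpy as np

def mutual_coherence(A, k):
    """Cumulative mutual coherence M(A, k): for each column p, sum the k
    largest normalized inner products with the other columns, then take
    the maximum over p."""
    A = np.asarray(A, dtype=float)
    n = A.shape[1]
    norms = np.linalg.norm(A, axis=0)
    C = np.abs(A.T @ A) / np.outer(norms, norms)  # normalized Gram matrix
    np.fill_diagonal(C, 0.0)                      # exclude q = p
    vals = []
    for p in range(n):
        row = np.sort(C[p])[::-1]                 # cross-coherences, descending
        vals.append(row[:k].sum())
    return max(vals)
```

For an orthonormal basis every cross-coherence vanishes, so M(A, k) = 0 for all k; two identical columns immediately give M(A, 1) = 1.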

In a recent post I briefly reviewed my R package for CS, called R1magic. Its recent version 0.2 adds functionality to compute $M(A, k)$, and there is now a public GitHub repository for the package. The mutualCoherence function is written in a fully functional style: all operations for computing $M(A,k)$ are performed in a vectorized fashion in R, using function closures and apply. For much larger matrices, however, a lower-level implementation may be required.

Example

Here we briefly investigate the coherence of Fourier, random and mixed bases in R.

require("R1magic")
set.seed(42)
A <- DFTMatrix0(10)              # Fourier basis
B <- matrix(rnorm(100), 10, 10)  # Gaussian random matrix
C <- A %*% B                     # A sensing matrix built from A and B
aa <- mutualCoherence(A, 8)
bb <- mutualCoherence(B, 8)
cc <- mutualCoherence(C, 8)
aa
[1] 1 1 1 1 1 1 1 1
bb
[1] 0.6784574 1.2011489 1.7001046 2.1713561 2.4664608 2.7302690 2.7908302
[8] 2.9623327
cc
[1] 0.7506222 1.3448452 1.8047043 2.1105348 2.3350516 2.4703822 2.5898766
[8] 2.6882250
We observe that the mixed basis, the so-called CS matrix $C$ above built from the Fourier basis ($A$) and the random basis ($B$), has a mutual coherence that increases more slowly than that of the pure random matrix. This may signify a better incoherent basis. However, since this is synthetic data, it does not reveal any universal behaviour. A real data or signal measurement matrix should be tested for a better judgement of which basis is preferable for sparse recovery. A recent paper in optical tomography uses this concept to identify a better sensing matrix for tomographic image reconstruction.
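This comparison can also be reproduced outside R with a short numpy sketch. The `cum_coherence` helper below is hypothetical, following the definition of $M(A,k)$ given earlier in the post; the exact numbers will differ from R1magic's `DFTMatrix0` normalization:

```python
import numpy as np

def cum_coherence(A, k_max):
    """M(A, k) for k = 1..k_max: max over columns of the sum of the k
    largest normalized cross-inner-products."""
    norms = np.linalg.norm(A, axis=0)
    C = np.abs(A.conj().T @ A) / np.outer(norms, norms)
    np.fill_diagonal(C, 0.0)
    top = -np.sort(-C, axis=1)   # each row sorted in descending order
    return np.array([top[:, :k].sum(axis=1).max() for k in range(1, k_max + 1)])

rng = np.random.default_rng(42)
A = np.fft.fft(np.eye(10))           # Fourier basis (DFT matrix)
B = rng.standard_normal((10, 10))    # Gaussian random matrix
C = A @ B                            # mixed "sensing" matrix
curves = {name: cum_coherence(M, 8)
          for name, M in [("fourier", A), ("random", B), ("mixed", C)]}
```

Since each term in the sum is non-negative, every curve is non-decreasing in k; the comparison of interest is how fast each one grows.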



Friday, 23 December 2011

Crowd dynamics : Fluid Dynamics to Statistical Physics

The study of crowd dynamics with techniques imported from the physical sciences, particularly statistical physics, is not a very new idea, but it is a relatively young field (see this Economist article). A recent article reviews advances in crowd dynamics in analogy with hydrodynamics: Visual crowd surveillance through a hydrodynamics lens.

Sunday, 23 October 2011

Fokker-Planck Equation with logarithmic potential

An Israeli group has recently obtained an exact solution of the Fokker-Planck equation with a 1D logarithmic potential [article]. The group explicitly constructs the probability density function of a particle diffusing in a potential $a \ln x$, $a > 0$.
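For context, the overdamped (Smoluchowski) form of the Fokker-Planck equation for a density $P(x,t)$ in such a potential $U(x) = a \ln x$, $x > 0$, reads (a standard textbook form, sketched here, not necessarily the article's exact notation):
$$ \frac{\partial P(x,t)}{\partial t} = D \frac{\partial}{\partial x}\left( \frac{\partial P(x,t)}{\partial x} + \frac{1}{k_B T}\frac{a}{x}\, P(x,t) \right) $$
The drift force $-U'(x) = -a/x$ decays only as $1/x$, which is what makes the logarithmic potential a nontrivial case.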

Saturday, 29 January 2011

Need of physics in cell biology

Physics and cell biology used to be very distinct fields, but nowadays more and more of the theoretically oriented physics workforce has moved into this domain and the life sciences, especially from the statistical mechanics and complex systems points of view, and even people from string theory! A recent article reviewing this perspective has appeared [link].
Along similar lines, the article [link] explores the role of physics in biology in generic terms. It humorously summarizes current funding policies: "all science is either biology or tool-making for biology or not fundable".

Tuesday, 25 January 2011

Random Walk in Confined Diffusion

A recent work develops a theory of random walks in a confined environment [link].

Equivalence of replica and cavity methods

Replica and cavity methods appear [link] in tackling problems involving spin glasses and related optimization. A recent work showed the equivalence of these two methods in the context of random networks [link].

Friday, 7 January 2011

Optimal random search

A recent work shows how to exploit an a priori distribution in random search. It is shown that the square of the distribution must be selected as the search distribution [pre][arxiv].

Thursday, 23 December 2010

Quantum Monte Carlo: Fast force computation

A recent article [doi] by an Italian group proposes the use of algorithmic differentiation for force computations in the Quantum Monte Carlo technique.

Sunday, 24 October 2010

Pair potentials are not that bad: Water & Silica

One of the challenges in modeling water or silica is figuring out the correct description of the inter-atomic potential. Commonly used models for water incorporate 3-body interactions and charges; B. Guillot has a review on the subject [doi]. However, there are attempts to use only a pair potential for water [doi], as well as for amorphous silica [doi].
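As an illustration of what a pair potential is, here is a generic 12-6 Lennard-Jones form in Python, a minimal sketch only, not the actual water or silica potentials from the cited papers:

```python
def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Generic 12-6 Lennard-Jones pair potential V(r) = 4*eps*((s/r)^12 - (s/r)^6).

    epsilon: well depth; sigma: distance where V crosses zero."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 ** 2 - sr6)

# The well minimum sits at r = 2**(1/6) * sigma, with depth -epsilon
r_min = 2.0 ** (1.0 / 6.0)
```

A pair potential like this depends only on inter-atomic distances, which is what makes it so much cheaper than 3-body or polarizable models.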
(c) Copyright 2008-2024 Mehmet Suzen (suzen at acm dot org)

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.