Sunday, 27 December 2020

Statistical Physics Origins of Connectionist Learning:
Cooperative Phenomenon to Ising-Lenz Architectures

This is an informal essay aiming at raising awareness that statistical physics played a foundational role in deep learning and neural networks in general: beyond being a mere analogy, it is their origin.

An article version of this post is available here: doi, and on HAL Open Science.

Preamble

A short account of the origins of the mathematical formalism of neural networks is presented informally for physicists and computer scientists in a basic discrete mathematical setting. The mathematical formalisms for the dynamics of lattice models in statistical physics and for learning internal representations in neural networks as discrete architectures evolved as quantitative tools in two almost distinct fields for more than half a century, with limited overlap. We aim at bridging the gap by claiming that the analogy between the two approaches is not artificial but naturally occurring, due to how the modelling of cooperative phenomena is constructed. We define Lenz-Ising architectures (ILAs) for this purpose.

Introduction


Figure: Tartan Ising Model (Linas Viptas, Wikipedia)
Understanding natural or artificial phenomena in the language of discrete mathematics is probably one of the most powerful toolboxes scientists use [1]. A large portion of computer science and statistical physics deals with such finite structures. One of the most prominent successful usages of such an approach was Lenz and Ising's work on modelling ferromagnetic materials [2–5] and neural networks as models of biological neuronal structures [6–8].

The analogy between these two distinct areas of research has been pointed out by many researchers [9–13]. However, the discourse and evolution of these approaches were kept within two distinct research fields, and many innovative approaches were rediscovered under different names.

Cooperative Phenomenon

The statistical definition of cooperative phenomena was pioneered by Kramers and Wannier [14–16]. Even though their technical work focused on the extension of the Ising model to 2D with cyclic boundary conditions and the introduction of exact solutions with matrix algebra, they were the first to document that the Lenz-Ising model actually represents a more generic system than merely a model of ferromagnets: anything that falls under cooperative phenomena can be addressed with a Lenz-Ising type model, as summarised in Definition 1.

Definition 1: Cooperative phenomenon of Wannier type [14]: A set of $N$ discrete units, $\mathscr{U}$, each identified with a function $s_{i}$, $i=1,\dots,N$, forms a collection or assembly. The function that identifies a unit is a mapping $s_{i}: \mathbb{R} \rightarrow \mathbb{R}$. A statistic $\mathscr{S}$ applied on $\mathscr{U}$ is called a cooperative phenomenon of Wannier type $\mathscr{W}$.

A statistic $\mathscr{S}$ can be any mapping or set of operations on the assembly of units $\mathscr{U}$. For example, inducing an ordering on the assembly of units and summing over the $s_{i}$ values would correspond to a non-interacting magnetic system with a unit external field, or to a non-connected set of neurons with a capacity for inhibition or excitation. Remarkably, Definition 1 is so generic that Rosenblatt's perceptron [17], current deep learning systems [18] and complex networks [19] fall into this category as well.
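As a toy illustration of Definition 1, a minimal Python sketch could look as follows; the names and values are hypothetical and purely illustrative, not taken from Wannier's paper: an assembly of N units and a statistic applied to it.

import numpy as np

rng = np.random.default_rng(42)

N = 100
# Units s_i: here binary, spin-like values in {-1, +1}; any real-valued map would do.
units = rng.choice([-1, 1], size=N)

# A statistic S on the assembly: e.g. the plain sum, i.e. the magnetisation of a
# non-interacting spin system, or the total activity of a set of unconnected neurons.
def statistic(s):
    return s.sum()

print("S(U) =", statistic(units))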

The originality of the cooperative phenomenon of Wannier type comes with a secondary concept, so-called event propagation, as given in Definition 2.

Definition 2: Event propagation [14]: An event is defined as a snapshot of a cooperative phenomenon of Wannier type $\mathscr{W}$. If an event takes place at one unit of the assembly $\mathscr{U}$, the same event will be favored by other units. This is expressed as an event propagation $\mathscr{E}(u_{1}, u_{2})$ between two disjoint sets of units, with $u_{1} \cap u_{2} = \varnothing$ and $u_{1}, u_{2} \subset \mathscr{U}$, defined together with an additional statistic $\mathscr{S}$.

The parallels between Wannier's event propagation and the neural network formalism defined by McCulloch-Pitts-Kleene [6,7] are remarkable: not only is the correspondence conceptual, the mathematical treatment is identical and originates from the Lenz-Ising model's treatment of discrete units. As we mentioned, this is beyond doubt not a simple analogy but forms a generic framework, as envisioned by Wannier. The similarity between ferromagnetic systems and neural networks was probably first documented directly by Little [8]: the states of magnetic spins correspond to the firing states of neurons. Unfortunately, Little only saw this as a simple analogy, and missed the opportunity provided by Wannier's view of cooperation as a generic natural phenomenon.

The conceptual similarity and inference behind Wannier's event propagation appear to be quite close to Hebbian learning [20] and give a natural justification for backpropagation in multilayered networks. The history of backpropagation is exhaustively studied elsewhere [18].

Lenz-Ising Architectures (ILAs): Ferromagnets to Nerve Nets


Figure: Ernst Ising (image owner: APS, Physics Today obituary)
Having established two basic definitions of cooperative phenomena, we can now define a generic setting of the Lenz-Ising model that captures both the physics literature, which has extensively used it in so-called spin-glass research, and neural networks. The guiding principle will be Wannier's definition of cooperative phenomena.

Definition: Lenz-Ising Architectures (ILAs) 
Given a Wannier-type cooperative phenomenon $\mathscr{W}$, we impose constraints on the discrete units, $\mathscr{U}^{c}$: they should be spatially correlated along the edges $E$ of an arbitrary graph $\mathscr{G}(E, V)$ with an ordering, and the vertices $V$ of the graph carry the coupling weights between two connected units, together with biases. A set of event propagations $\mathscr{E}^{c}$ defined on the cooperative phenomenon can induce dynamics on the coupling weights, or vice versa. ILAs are defined as a statistic $\mathscr{S}$ applied to $\mathscr{U}^{c}$ with propagations $\mathscr{E}^{c}$.

Lenz-Ising architectures (ILAs) should not be confused with graph neural networks, as they do not model data structures. They could be seen as a subset of graph dynamical systems in some sense, but formal connections should be established elsewhere. The primary characteristic of ILAs is that they are a conceptual and mathematical representation of spin-glass systems (including Lenz-Ising, Anderson, Sherrington-Kirkpatrick and Potts systems) and neural networks (including recurrent and convolutional networks) under the same umbrella.

 Learning representations inherent in Metropolis-Glauber dynamics

The primary originality in neural network research papers lies in so-called learning of representations from data and generalisation. However, it isn't obvious to that community that spin-glasses are capable of learning representations inherently, by construction, via induced dynamics such as Metropolis or Glauber dynamics, posed as an inverse problem.

In the physics literature this appears as the problem of how to express the free energy and minimise it with respect to the weights or coupling coefficients. This is nothing but learning representations. Usually a simulation approach is taken as a route, for example Monte Carlo techniques [5,21,22] via Metropolis or Glauber dynamics. The intimate connection between the concepts of ergodicity and learning in deep learning has recently been shown in this context [13,23,24].
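As a rough illustration of such induced dynamics, here is a minimal Python sketch of Metropolis single-spin-flip updates on a fully connected spin-glass-like system. The couplings, biases and temperature are hypothetical illustration values; this shows only the forward sampling step, while the learning (inverse) problem would additionally adjust the couplings against data.

import numpy as np

rng = np.random.default_rng(0)

N, beta = 20, 1.0                        # number of units, inverse temperature
J = rng.normal(size=(N, N))              # random couplings (spin-glass style)
J = (J + J.T) / 2.0                      # symmetrise
np.fill_diagonal(J, 0.0)                 # no self-coupling
h = rng.normal(size=N)                   # biases / external fields
s = rng.choice([-1, 1], size=N)          # initial spin configuration

def energy(s):
    return -0.5 * s @ J @ s - h @ s

for step in range(1000):
    i = rng.integers(N)                          # pick a unit at random
    dE = 2.0 * s[i] * (J[i] @ s + h[i])          # energy change of flipping s[i]
    if dE <= 0 or rng.random() < np.exp(-beta * dE):
        s[i] = -s[i]                             # accept the flip (Metropolis rule)

print("final energy:", energy(s))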

Figure: Roy J. Glauber (Wikipedia), Glauber dynamics

As we argued earlier via the generic definition provided by Wannier for cooperative phenomena and via ILAs, there is an intimate connection between learning and the so-called solving of spin-glasses, which usually boils down to computing free energies as mentioned. As a link between the two distinct fields, computing backpropagation and computing free energies are natural candidates for establishing equivalence relations.

Conclusions and Outlook

Apart from honouring the physicists Lenz and Ising, and based on an understanding of the origins of cooperative phenomena, naming the research outputs of spin-glasses and neural networks under the umbrella term Lenz-Ising architectures (ILAs) is historically accurate and technically a reasonable naming scheme, given the overwhelming evidence in the literature. This is akin to naming current computers von Neumann architectures. This forms the origins of connectionist learning from statistical physics, an approach that is enjoying vast engineering success today.

The rich connection between the two areas in computer science and statistical physics should be celebrated. For more fruitful collaborations, both literatures, embracing the large statistics literature as well, should converge much more closely. This would help communities avoid the awkward situation of reinventing the wheel and of hindering recognition of the work done by physicists decades earlier, i.e., Ising and Lenz.

 Notes

No competing or other kind of conflict of interest exists. This work is produced solely as scholarly work and does not have any personal nature at all. This essay is dedicated to the memory of Ernst Ising for his contribution to the physics of ferromagnetic materials, which now seems to have far wider implications.

References

[1] Kenneth H Rosen. Handbook of Discrete and Combinatorial Mathematics. CRC Press, 1999. 

[2] W. Lenz. Beitrag zum Verständnis der magnetischen Erscheinungen in festen Körpern. Phys. Z., 21:613, 1920.

[3] Ernst Ising. Beitrag zur Theorie des Ferromagnetismus. Zeitschrift für Physik, 31(1):253–258, 1925.

[4] Thomas Ising, Reinhard Folk, Ralph Kenna, Bertrand Berche, and Yurij Holovatch. The fate of Ernst Ising and the fate of his model. arXiv preprint arXiv:1706.01764, 2017.

[5] David P Landau and Kurt Binder. A guide to Monte Carlo Simulations in Statistical Physics. Cambridge University Press, 2014.

[6] W.S. McCulloch and W.H. Pitts. A Logical Calculus of the Ideas Immanent in Nervous Activity. Bull. Math. Biophys., 5:115–133, 1943.

[7] Stephen Cole Kleene. Representation of Events in Nerve Nets and Finite Automata. Technical report, RAND Project, Santa Monica, 1951.

[8] W. A. Little. The Existence of Persistent States in the Brain. Mathematical Biosciences, 19(1-2):101–120, 1974.

[9] P Peretto. Collective Properties of Neural Networks: a Statistical Physics Approach. Biological Cybernetics, 50(1):51–62, 1984.

[10] Jan L van Hemmen. Spin-glass Models of a Neural Network. Physical Review A, 34(4):3435, 1986.

[11] Haim Sompolinsky. Statistical Mechanics of Neural Networks. Physics Today, 41(12):70–80, 1988.

[12] David Sherrington. Neural Networks: the Spin Glass Approach. In North-Holland Mathematical Library, volume 51, pages 261–291. Elsevier, 1993.

[13] Yasaman Bahri, Jonathan Kadmon, Jeffrey Pennington, Sam S Schoenholz, Jascha Sohl-Dickstein, and Surya Ganguli. Statistical Mechanics of Deep Learning. Annual Review of Condensed Matter Physics, 2020.

[14] Gregory H Wannier. The Statistical Problem in Cooperative Phenomena. Reviews of Modern Physics, 17(1):50, 1945.

[15] Hendrik A Kramers and Gregory H Wannier. Statistics of the Two-Dimensional Ferromagnet. Part I. Physical Review, 60(3):252, 1941.

[16] Hendrik A Kramers and Gregory H Wannier. Statistics of the Two-Dimensional Ferromagnet. Part II. Physical Review, 60(3):263, 1941.

[17] C van der Malsburg. Frank Rosenblatt: principles of neurodynamics: perceptrons and the theory of brain mechanisms. In Brain theory, pages 245–248. Springer, 1986.

[18] J. Schmidhuber. Deep Learning in Neural Networks: An Overview. Neural Networks, 61:85–117, 2015; and Yoshua Bengio, Yann LeCun, Geoffrey Hinton. Communications of the ACM, 64(7):58–65, July 2021. link

[19] Duncan J Watts and Steven H Strogatz. Collective Dynamics of 'Small-World' Networks. Nature, 393(6684):440, 1998.

[20] Donald Olding Hebb. The Organization of Behavior: a Neuropsychological Theory. J. Wiley; Chapman & Hall, 1949.

[21] Mehmet Suezen. Effective ergodicity in single-spin-flip dynamics. Physical Review E, 90(3):032141, 2014.

[22] Mehmet Suezen. Anomalous diffusion in convergence to effective ergodicity. arXiv preprint arXiv:1606.08693, 2016.

[23] Mehmet Suezen, Cornelius Weber, and Joan J Cerda. Spectral ergodicity in deep learning architectures via surrogate random matrices. arXiv preprint arXiv:1704.08303, 2017.

[24] Mehmet Suezen, JJ Cerda, and Cornelius Weber. Periodic Spectral Ergodicity: A Complexity Measure for Deep Neural Networks and Neural Architecture Search. arXiv preprint arXiv:1911.07831, 2019.


Postscript 1:

(Deep) Machine learning as a subfield of statistical physics

Often researchers consider some machine learning methods under different umbrella terms compared to established statistical physics. However, beyond being a mere analogy, the application of these methods is quite striking. Consequently, there is a great tradition of machine learning practice being a sub-field of statistical physics, with explicit classification within PACS. Some well-known correspondences:

Hopfield Networks <- Ising-Lenz model
Boltzmann Machines <- Sherrington-Kirkpatrick model
Diffusion Models <- Langevin Dynamics, Fokker-Planck Dynamics
Softmax <- Boltzmann-Gibbs connection to partition function (see the sketch after this list)
Energy Based Models <- Spin-glasses, Hamiltonian dynamics
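As one concrete instance from the list above (the softmax item), a minimal Python sketch shows that softmax is exactly a Boltzmann-Gibbs distribution over negative energies, with the normalising sum playing the role of the partition function Z; beta = 1 is the usual machine learning choice.

import numpy as np

def softmax(logits):
    z = np.exp(logits - logits.max())      # shift for numerical stability
    return z / z.sum()

def gibbs(energies, beta=1.0):
    w = np.exp(-beta * (energies - energies.min()))
    return w / w.sum()                     # w.sum() plays the role of Z

logits = np.array([2.0, 0.5, -1.0])
print(softmax(logits))                     # identical to a Gibbs distribution
print(gibbs(-logits, beta=1.0))            # with energies = -logits, beta = 1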

For this reason, we provide semi-formal mathematical definitions in the recent article, establishing that deep learning architectures should be called Ising-Lenz Architectures (ILAs), akin to calling current computers von Neumann architectures.

Thursday, 3 December 2020

Resolution of the dilemma in explainable Artificial Intelligence:
Who is going to explain the explainer?

Figure: Infinite Regress (Wikipedia)
Preamble 

The surge in usage of artificial intelligence (AI) systems is now standard practice for mid to large scale industries. These systems cannot reason by construction, and legal requirements dictate that if a machine learning/AI model made a decision, such as granting a loan or not, the people affected by this decision have the right to know the reason. However, it is well known that machine learning models cannot reason or provide reasoning out of the box. Apart from modifying conventional machine learning systems to include some form of reasoning as a research exercise, practicing or building so-called explainable or interpretable machine learning solutions on top of conventional models is very popular. Though there is no accepted definition of what an explanation of a machine learning system should entail, in general this field of study is called explainable artificial intelligence.

One of the most used or popularised sets of techniques essentially builds a secondary model on top of the primary model's behaviour and tries to come up with a story on how the primary model, the AI system, arrived at its answers. Although this approach sounds like a good solution at first glance, it actually traps us in an infinite regress, a dilemma: who is going to explain the explainer?

Avoiding 'Who is going to explain the explainer?' dilemma

The resolution of this lies in completely avoiding explainer models, or techniques relying on optimisations of a similar sort. We should rely solely on so-called counterfactual generators. These generators rely on repetitive queries to the system to generate data on the behaviour of the AI system, in order to answer a what-if scenario or a set of what-if scenarios, corresponding to a set of reasoning statements.

What are counterfactual generators?

Figure: Counterfactual generator, instance based.

These are techniques that can generate a counterfactual statement about a predicted machine learning decision. For example, for a loan approval model, a counterfactual statement would be 'If the applicant's income were 10K more, the model would have approved the loan'. The simplest form of counterfactual generator one can think of is Individual Conditional Expectation (ICE) curves [ Goldstein2013 ]. ICE curves show what would happen to the model decision if one of the features, such as income, varied over a set of values. The idea is simple but so powerful that one can generate a dataset for counterfactual reasoning, hence the name counterfactual generator. These are classified as model-agnostic methods in general [ Du2020, Molnar ], but the distinction we are trying to make here is avoiding building another model to explain the primary model: we rely solely on queries to the model. This rules out LIME, as it relies on building models to explain the model, and we question whether linear regression is intrinsically explainable here [ Lipton ]. One extension to ICE is generating falling rule list [ wang14 ] outputs without building models.
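As an illustration of the idea, the following minimal Python sketch generates an ICE-style set of counterfactual queries by only querying a fitted model over a grid of one feature; no secondary explainer model is built. The feature names, synthetic data and model choice are hypothetical, purely for illustration.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Hypothetical loan data: features = [income, debt], target = approved or not.
X = rng.normal(loc=[50.0, 20.0], scale=[15.0, 5.0], size=(500, 2))
y = (X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=5.0, size=500) > 15.0).astype(int)
model = LogisticRegression().fit(X, y)

# One applicant instance and a grid of counterfactual incomes.
applicant = np.array([40.0, 25.0])
income_grid = np.linspace(20.0, 90.0, 15)

# ICE: repeat the instance, vary only the income feature, query the model.
queries = np.tile(applicant, (len(income_grid), 1))
queries[:, 0] = income_grid
approval_prob = model.predict_proba(queries)[:, 1]

for inc, p in zip(income_grid, approval_prob):
    print(f"income={inc:5.1f}  P(approved)={p:.2f}")

Reading off the grid point where the predicted approval probability crosses 0.5 yields exactly the kind of counterfactual statement quoted above, using only queries to the primary model.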
 
Outlook

We rule out using secondary machine learning models, or any models including simple linear regression, in building an explanation for a machine learning system. Instead we claim that reasoning can be achieved at the simplest level with counterfactual generators based on the system's behaviour under different query sets. This seems to be a good direction, as reasoning can be defined as "algebraically manipulating previously acquired knowledge in order to answer a new question" by Léon Bottou [ Bottou ], and it is of course partly in line with Judea Pearl's causal inference revolution, though replacing the machine learning model completely with a causal model would be closer to the causal inference recommendation.

References and further reading

[ Goldstein2013 ] Peeking Inside the Black Box: Visualising Statistical Learning with Plots of Individual Conditional Expectation, Goldstein et al. arXiv
[ Lipton ] The Mythos of Model Interpretability, Z. Lipton arXiv
[ Molnar ] Interpretable ML book, C. Molnar url
[ Bottou ] From Machine Learning to Machine Reasoning: An Essay, Léon Bottou doi
[ Du2020 ] Techniques for Interpretable Machine Learning, Du et al. doi
[ wang14 ] Falling Rule Lists, Wang-Rudin arXiv


Monday, 30 November 2020

Re-discovery of Inverse problems: What is underspecification for machine learning models?

Figure: Radon, founder of inverse problems (Wikipedia)

This has been a very well known concept in communities from geophysics to image reconstruction for many decades. Underspecification stems from Hadamard's definition of a well-posed problem; it isn't a new problem. If you do research on underspecification for machine learning, please make sure that the relevant literature on ill-posed problems is studied well before making strong statements. It would be helpful and would prevent reinventing the wheel.
  

One technique everyone is aware of is L2 regularisation; it is there to reduce the ill-posedness of machine learning models. In the context of why a deployed model's performance degrades over time, ill-posedness plays a role but it isn't the sole reason. There is a large literature on inverse problems dedicated to solving these issues, and if underspecification were the sole issue behind deployed machine learning systems degrading over time, we would have reduced the performance degradation by applying strong L1 regularisation to reduce "the feature selection bias", hence lowering the effect of underspecification. Especially in deep learning models, underspecification shouldn't be an issue, due to the representation learning deep learning models bring naturally, given that the inputs cover the basic learning space.
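As a small reminder of how L2 regularisation tames ill-posedness, a minimal numpy sketch (all numbers are hypothetical illustration values) contrasts a plain least-squares solution of an ill-conditioned problem with its Tikhonov (ridge) regularised counterpart.

import numpy as np

rng = np.random.default_rng(7)

n, p = 50, 10
A = rng.normal(size=(n, p))
A[:, 1] = A[:, 0] + 1e-6 * rng.normal(size=n)   # nearly collinear columns -> ill-conditioned
x_true = rng.normal(size=p)
b = A @ x_true + 0.01 * rng.normal(size=n)

# Plain normal equations: the near-null direction amplifies the noise.
x_ls = np.linalg.solve(A.T @ A, A.T @ b)

# Tikhonov / ridge regularisation: (A^T A + lambda I) x = A^T b.
lam = 1.0
x_ridge = np.linalg.solve(A.T @ A + lam * np.eye(p), A.T @ b)

print("cond(A^T A)          :", np.linalg.cond(A.T @ A))
print("||x_ls - x_true||    :", np.linalg.norm(x_ls - x_true))
print("||x_ridge - x_true|| :", np.linalg.norm(x_ridge - x_true))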





Saturday, 14 November 2020

Shannon's Entropy: Why it is called Entropy?

 

Figure: Ludwig Boltzmann
The story (legend) goes like this: Shannon asked von Neumann what he thought, and von Neumann suggested Shannon call it entropy.

Shannon's entropy is actually a toy version of Boltzmann's entropy. It is a toy version because it only considers the configurational entropy of discrete objects, without actually describing microstates. The more interesting connection, which almost no one knows, is that Birkhoff's ergodic theory legitimised Shannon's entropy, as his version of ergodicity is the toy version of Boltzmann's. Gibbs's contribution has a different angle, and why von Neumann omitted it is an interesting question.
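A one-glance way to see the "toy version" claim: for W equally likely configurations, Shannon's entropy in nats reduces to ln W, which is Boltzmann's S = k_B ln W with k_B set to 1. A minimal sketch (the value of W is arbitrary, purely illustrative):

import numpy as np

def shannon_entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))          # entropy in nats

W = 16
uniform = np.full(W, 1.0 / W)              # W equiprobable configurations
print(shannon_entropy(uniform))            # equals ln W
print(np.log(W))                           # Boltzmann's ln W (k_B = 1)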

Shannon's entropy should be called the von Neumann-Boltzmann-Shannon entropy, not only Shannon's, maybe adding Birkhoff to the team.


Cite as 

 @misc{suezen20sew, 
     title = {Shannon's Entropy: Why it is called Entropy? }, 
     howpublished = {\url{https://science-memo.blogspot.com/2020/11/shannons-entropy-why-it-is-called.html}}, 
     author = {Mehmet Süzen},
     year = {2020}
}  


Postscripts

  • Ergodicity is an intricate subject: Boltzmann's and Birkhoff's differing approaches.
  • Jaynes extensively studied the connection, and his interpretation was similar: he described the von Neumann-Shannon expression as "a more primitive concept" and used statistical mechanical ideas to bring in a mathematical tool for statistical inference. See his papers I and II.


Monday, 17 August 2020

Can we simulate quantum state and qubits with conventional computers?

Figure: Proper simulation of a quantum computer is possible with another quantum system. Quantum lattice, NIST (Wikipedia)


The short answer is no. 


Certainly, the quantum state isn't merely a vector of complex numbers. It is inherent to the physical system in which one observes quantum properties, i.e., the observer cannot be removed from the observed. When we try to measure things, we affect the result of the measurement. Unfortunately, contrary to popular belief, quantum systems cannot be simulated with conventional computers. Numerical solutions to equations representing quantum systems are not simulations, both philosophically and physically.

Similarly, a qubit is not merely a linear combination of two complex vectors, or something that is 1 and 0 at the same time. A qubit is also a property of a physical system and cannot be simulated with a classical computer.
Q.E.D.




Postscript Notes
  • This is not new, of course. Feynman expressed the same in his landmark 1982 paper, Simulating physics with computers: "... No! This is called the hidden-variable problem: it is impossible to represent the results of quantum mechanics with a classical universal device..." Richard Feynman. Feynman, R.P. Simulating physics with computers. Int J Theor Phys 21, 467–488 (1982). doi
  • It is not about the hardness of simulating qubits, i.e., the dynamical evolution of quantum states; the behaviour that would prove quantum advantage is a physical effect, not a computational one. Entanglement is a physical process that provides a computational advantage over classical computers; if it could be replicated with a numerical procedure, we wouldn't have difficulty building a quantum computer, albeit a simulated one.
  • Let's ask a similar question: can we prove or conceptually show that there is a quantum advantage with a simulation on classical hardware? Unfortunately it is the same answer: no, quantum advantage cannot be simulated. If we could, then we would have a simulated quantum computer on classical hardware that solves things much faster than the host hardware.
  • Simulation is not only about numerical solutions; it means the physically intrinsic properties occur within the simulator. This means one can simulate quantum systems or qubits only with another quantum system; a recent example is using a quantum system of ion traps to simulate another quantum system.
  • The difficulty of simulating a quantum state is not about quantum dynamics: one of the hard problems in quantum computing is simulating a quantum mechanical computing device on classical hardware. However, this is not about solving the dynamics of a quantum system but rather about having a quantum effect on a classical system.
  • A misconception in quantum computing frameworks: they don't mean to simulate a qubit in the sense of having its behaviour replicated on classical hardware. If you see a computational framework claiming that it can simulate a qubit, it doesn't mean that classical hardware can replicate the qubit's behaviour, even if it solves the full quantum Hamiltonian dynamical evolution. Simulation in those frameworks implies that, given the parameter settings, the outcome is also set; one could think of "simulation" in that setting as validation of an already happened quantum measurement.
  • Elusive quantum state simulation: no, not possible on "classical machines".
    Even one of the pioneers in quantum computing expresses his puzzlement about what a quantum state implies, i.e., Nielsen (see What does the quantum state mean?). Furthermore, current quantum computing libraries present something called a simulation mode or a quantum virtual machine. Those novel works do not claim that they can simulate quantum effects on a classical machine; rather they mimic quantum states' known behaviour at the time of measurement.
  • Quantum machine learning models can't be mapped onto classical ML models.
    A misconception is still repeated that we can somehow simulate or replicate artefacts of quantum computation with a classical counterpart via an approximation. This is not possible due to the very nature of the quantum mechanical process: it can't be replicated with a classical counterpart. Quantum states can't be replicated on classical hardware, as in producing a quantum advantage.
    "..it is impossible to represent the results of quantum mechanics with a classical universal device..." Richard Feynman
    cf. Feynman, R.P. Simulating physics with computers. Int J Theor Phys 21, 467–488 (1982).
    A more pessimistic interpretation of this statement, unfortunately, is that we can't even translate data from classical hardware to quantum hardware or vice versa.
  • Quantum states cannot be replicated on "classical machines": Quantum Virtual Machines (QVMs) do not claim to replicate quantum effects on classical hardware. It is a misconception to think otherwise, leading to the paradox that we could have a quantum advantage on classical devices, albeit a simulated one.
  • Simulating quantum computers with LLMs and classical hardware:
    It doesn't matter if we use LLMs: it isn't possible to simulate quantum computers on classical hardware. The difficulty is not about exponential computational complexity. Replication of the effects of quantum systems on classical machines is akin to a perpetual motion machine.
  • Classical hardware can't hold qubits: whether we use LLMs or not, it isn't possible to simulate quantum states (a quantum computer) on classical hardware. The difficulty is not about the exponential computational complexity of quantum Hamiltonian evolution.


Tuesday, 7 April 2020

Short Opinion: Extending General Relativity to N-dimensions is not even wrong leading to inventor's paradox

Figure: Bohr and Einstein, 1926 (Wikipedia)
It is original to have a solution for the generic case, the so-called arbitrary N in mathematics, such as the N-dimensional case. This is considered a generic solution in computer science as well, the inventor's paradox.

However, such generalisation to higher-order objects may not be needed for reality. Mathematical beauty does not bring reality with it by default. An example is trying to generalise General Relativity [1]. I think this is novel work in mathematics, but it may not reflect our physical world as it is. This opinion is not new and is probably the reason why, for many decades, the community resisted attempts to lower the status of General Relativity to a special case of some higher-dimensional object [2] that cannot be tested. Einstein's theory of GR is good enough to explain our universe and is supported by observations [3].


Trying to extend any physical theory to higher dimensions may be not even wrong if it cannot be observed.


[1]  A Generalization of Gravity, arXiv:1409.6757 

[2] Huggett, Nick and Vistarini, Tiziana (2014) Deriving General Relativity From String Theory.
[3]  On Experimental Tests of the General Theory of Relativity, American Journal of Physics 28, 340 (1960); https://doi.org/10.1119/1.1935800

PS: Links added on 13 April 2020

Saturday, 29 February 2020

Freeman Dyson's contribution to deep learning: Circular ensembles mimics trained deep neural networks

In memory of Professor Dyson, also see the paper Equivalence in Deep Neural Networks via Conjugate Matrix Ensembles

Preamble 
Figure: Dyson, 2007 (Wikipedia)
Freeman Dyson was a polymath scientist: theoretical physicist, mathematician and visionary thinker, among other things. In this post, we briefly summarise his contribution to deep learning, i.e., deep neural networks. An obscure usage of his circular ensembles as a simulation tool, in conjunction with the concept of ergodicity, explains why deep learning systems learn with such high accuracy.

A simulation tool for deep learning: Circular (Random Matrix) Ensembles

Circular ensembles [1,2,3] were developed by Dyson in 1962 for explaining quantum statistical mechanics systems, as a modification of basic random matrix theory. Circular ensembles can be used in simulating deep learning architectures [4]. Basically, his three ensembles can be used to generate a "trained deep neural network". It was shown by myself with colleagues from Hamburg and Mallorca that, for networks generated with Dyson's ensembles, the deeper they are the lower the so-called spectral ergodicity goes [4]; this was recently shown on real networks as well [5].

How to generate a simulated trained deep neural network in Python

Using the Bristol Python package [6] one can generate a set of weight matrices corresponding to each layer's connections. As a simple example using the Circular Unitary Ensemble (CUE), let's say we have 4 hidden layers of 64, 64, 128 and 256 units. This would require learned weight matrices of sizes 64x64, 64x128 and 128x256. One possible set of trained network weights can be generated as follows; note that we make the non-square ones by simply multiplying by a transpose.


from bristol.ensembles import circular

# Circular ensemble generator from the Bristol package
ce = circular()
seed_v = 997123  # fixed seed for reproducibility

# Random matrices standing in for the trained weights of each layer
W1 = ce.gue(64, set_seed=True, seed=seed_v)
W2 = ce.gue(128, set_seed=True, seed=seed_v)
W3 = ce.gue(256, set_seed=True, seed=seed_v)

These are complex matrices; one could take their arguments (phases) or use them as they are if only eigenvalues are needed. An example of trained network generation can be found on Zenodo. One can use any of the circular ensembles.
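If only the spectra are needed, a self-contained sketch along the following lines (numpy only, not the Bristol API) samples a Haar-random unitary matrix, a standard stand-in for a CUE draw, and takes the arguments of its eigenvalues, which lie on the unit circle; the same phase extraction applies to the W1, W2, W3 matrices above.

import numpy as np

rng = np.random.default_rng(997123)

def haar_unitary(n):
    # QR decomposition of a complex Gaussian matrix, with column phases fixed,
    # yields a Haar-distributed unitary (a CUE sample).
    z = (rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))) / np.sqrt(2.0)
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))

U = haar_unitary(64)
phases = np.angle(np.linalg.eigvals(U))    # eigenvalue arguments in (-pi, pi]
print(np.sort(phases)[:5])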

Conclusion

Dyson's contributions are so broad that even his mathematical tools appear in modern deep learning research. He will be remembered for many generations to come as a brilliant scientist and a polymath.

References 


[1] Freeman Dyson, Journal of Mathematical Physics 3, 1199 (1962) [link]
[2] Michael Berry, New Journal of Physics 15 (2013) 013026 [link]
[3] Mehmet Süzen (2017), Summary Notebook on Circular ensembles [link]
[4] Spectral Ergodicity in Deep Learning Architectures via Surrogate Random Matrices,
Mehmet Süzen, Cornelius Weber, Joan J. Cerdà, arXiv:1704.08303 [link]
[5] Periodic Spectral Ergodicity: A Complexity Measure for Deep Neural Networks and Neural Architecture Search,
 Mehmet Süzen, Cornelius Weber, Joan J. Cerdà, arXiv:1911.07831 [link]
[6] Bristol Python package [link]


(c) Copyright 2008-2024 Mehmet Suzen (suzen at acm dot org)

This work is licensed under a Creative Commons Attribution 4.0 International License.