
Friday, 11 February 2022

Physics origins of the most important statistical ideas of recent times

Figure: Maxwell's handwriting, state diagram (Wikipedia)


Preamble

Modern statistics is now moving into an emerging field called data science, which amalgamates many different fields, from high-performance computing to control engineering. However, an emergent behaviour among researchers in machine learning and statistics is that they sometimes omit, naïvely and probably unknowingly, the fact that some of the most important ideas in data science actually originated from discoveries in physics and were specifically developed by physicists. In this short exposition we review these physics origins for the areas identified by Gelman and Vehtari (doi). An additional section covers other possible areas that are currently the focus of active research in data science.

Bootstrapping and simulation-based inference: Gibbs's ensemble theory and Metropolis's simulations


Bootstrapping is a novel idea for estimation with uncertainty from a given set of samples. It was largely popularised by Efron, and his contribution is immense, making this tool available to all researchers doing quantitative analysis. However, the origins of bootstrapping can be traced back to the idea of ensembles in statistical physics, introduced by J. W. Gibbs. Ensembles in physics allow us to do exactly what bootstrapping helps with, estimating a quantity of interest by sub-sampling; in statistical physics this appears as sampling a set of different microstates. Using this idea, Metropolis devised an inference scheme in 1953 to compute ensemble averages for liquids on a computer. Note that the use of the Monte Carlo approach for purely mathematical purposes, i.e., solving integrals, appeared much earlier, with von Neumann's efforts.
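
A minimal sketch of the resampling idea in NumPy: each bootstrap resample plays a role loosely analogous to an ensemble member, and the spread of the resampled statistic quantifies the uncertainty. The data and the bootstrap_mean helper below are hypothetical illustrations, not the historical method.

import numpy as np

rng = np.random.default_rng(42)

# Hypothetical observable: 100 noisy measurements of some quantity.
samples = rng.normal(loc=1.0, scale=0.5, size=100)

def bootstrap_mean(samples, n_resamples=5000):
    """Estimate the mean and its uncertainty by resampling with
    replacement; each resample acts like one ensemble member."""
    n = len(samples)
    means = np.array([
        rng.choice(samples, size=n, replace=True).mean()
        for _ in range(n_resamples)
    ])
    return means.mean(), means.std()

estimate, uncertainty = bootstrap_mean(samples)
print(f"mean = {estimate:.3f} +/- {uncertainty:.3f}")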

Causality: Hamiltonian systems to thermodynamic potentials

Figure: Maxwell relations as causal diagrams.
Even though the historical roots of causal analysis in the early 20th century are attributed to Wright (1923) for his definition of path analysis, causality was a core tenet of Newtonian mechanics, which distinguishes the left- and right-hand sides of the equations of motion written as differential equations; the set of differential equations that follows from Hamiltonian mechanics actually forms a graph, i.e., relationships between generalised coordinates, momenta and positions. This connection was never acknowledged in the early statistical literature; probably the causal constructions of classical physics were not well known in that community, or did not find their way into data-driven modelling. Similarly, the causal construction of the thermodynamic potentials appears as a directed graph, the Born wheel. It is usually presented as a mnemonic, but it is actually causally constructed via Legendre transformations. Of course causality, philosophically speaking, has been discussed since Ancient Greece, but here we restrict the discussion solely to quantitative theories after Newton.
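
To make the Legendre-transformation reading of the Born wheel concrete, each potential is derived from the internal energy $U(S,V)$ by swapping a variable for its conjugate, inducing directed edges from $U$ to the derived potentials:

$$ F(T,V) = U - TS, \qquad H(S,P) = U + PV, \qquad G(T,P) = U - TS + PV. $$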

Overparametrised models and regularisation: Poincaré classifications and astrophysical dynamics

Current deep learning systems are classified as massively overparametrised systems. However, a lower-dimensional understanding of this phenomenon was already well studied in Poincaré's classification of classical dynamics, namely the measurement problem of an overdetermined system of differential equations; such inverse problems are well known in astrophysics and theoretical mechanics.
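
A minimal sketch of why regularisation is needed in the overparametrised regime, using ridge (Tikhonov) regularisation as one standard choice; the synthetic data, dimensions and regularisation strength below are hypothetical.

import numpy as np

rng = np.random.default_rng(0)

n_obs, n_params = 20, 100          # far more parameters than observations
X = rng.normal(size=(n_obs, n_params))
w_true = np.zeros(n_params)
w_true[:3] = [2.0, -1.0, 0.5]      # only a few directions actually matter
y = X @ w_true + 0.1 * rng.normal(size=n_obs)

# The least-squares problem is ill-posed: infinitely many w fit the data.
# Ridge regularisation selects one: w = (X^T X + lam I)^{-1} X^T y.
lam = 1.0
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(n_params), X.T @ y)

print("training residual:", np.linalg.norm(X @ w_ridge - y))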

High-performance computing: big data to GPUs

Similarly, the use of supercomputers, or what we now call high-performance computing, together with big data-generating processes, can actually be traced back to the Manhattan Project and ENIAC, which aimed at solving scattering equations, and to the almost 50 years of development in this direction before the 2000s.

Conclusion

The impressive development of the new emergent field of data science, as a larger perspective of statistics reaching into computer science, has strong origins in the core physics literature and research. These connections are not sufficiently cited or acknowledged. Our aim in this short exposition is to bring these aspects to the attention of data science practitioners and researchers alike.

Further reading
Some of the works mentioned above and a related reading list of papers and books.

Please cite as follows:

 @misc{suezen22pom, 
     title = { Physics origins of the most important statistical ideas of recent times }, 
     howpublished = {\url{http://science-memo.blogspot.com/2022/02/physics-origins-of-most-important.html}}, 
     author = {Mehmet Süzen},
     year = {2022}
  }
Appendix: Pearson correlation and Lattices

Auguste Bravais is famous for his foundational work on the mathematical theory of crystallography, which now seems to reach far beyond periodic solids. Unknown to many, he was actually the first to derive the expression for what we know today as the correlation coefficient, or Pearson's correlation, less commonly the Pearson-Galton coefficient. Interestingly, Wright, one of the grandfathers of causal analysis, mentioned this in his seminal 1921 work titled "Correlation and causation", acknowledging Bravais's 1849 work as the first derivation of correlation.
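
For reference, the coefficient in its modern form, for paired samples $(x_i, y_i)$ with means $\bar{x}$ and $\bar{y}$:

$$ r = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N} (x_i - \bar{x})^{2}} \, \sqrt{\sum_{i=1}^{N} (y_i - \bar{y})^{2}}}. $$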

Appendix: Partition function and set theoretic probability

Long before Kolmogorov set out his formal foundations of probability, Boltzmann, Maxwell and Gibbs built theories of statistical mechanics using probabilistic language and even defined settings for set-theoretic foundations by introducing ensembles for thermodynamics. For example, the partition function $Z$ appeared as a normalisation factor ensuring that the densities sum to 1. Apparently Kolmogorov and his contemporaries drew considerable inspiration from the physics and mechanics literature.
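
In modern notation, the probability of a microstate $i$ with energy $E_i$ at inverse temperature $\beta$ reads

$$ p_i = \frac{e^{-\beta E_i}}{Z}, \qquad Z = \sum_{i} e^{-\beta E_i}, \qquad \sum_{i} p_i = 1, $$

so $Z$ is precisely the normalisation that turns the Boltzmann weights into a probability distribution.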

Appendix: Generative AI

Of course, generative AI has now taken over the hype. Indeed, the physics of diffusion, from the Fokker-Planck equation to basic Langevin dynamics, is leveraged directly.
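
A minimal sketch of the underlying mechanism, assuming a simple quadratic potential so that the stationary density is a standard Gaussian; score-based diffusion models replace the known gradient below with one learned from data. The step size and chain count are hypothetical.

import numpy as np

rng = np.random.default_rng(1)

def grad_log_density(x):
    # Score of a standard Gaussian target; diffusion models learn
    # this quantity from data instead of writing it down.
    return -x

# Overdamped Langevin dynamics:
# x_{t+1} = x_t + eps * score(x_t) + sqrt(2 * eps) * noise
eps, n_steps = 0.01, 5000
x = np.zeros(1000)                 # 1000 chains, all started at the origin
for _ in range(n_steps):
    x += eps * grad_log_density(x) + np.sqrt(2 * eps) * rng.normal(size=x.shape)

print("sample mean/std:", x.mean(), x.std())   # approximately 0 and 1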
 
Appendix: Physics is fundamental for the advancement of AI research and practice 


AI as a phenomenon appears to be in the domain of core physics. For this reason, studying physics, as a (post)graduate degree or via self-study modules, will give students and practitioners alike definitive cutting-edge insights.

  • Statistical models based on correlations originate from the physics of periodic solids and astrophysical n-body dynamics.
  • Neural networks originate from the modelling of magnetic materials in discrete states, later named cooperative phenomena; their training dynamics closely follows free-energy minimisation.
  • Causality has roots in the ensemble theory of physical entropy.
  • Almost all sampling-based techniques rest on the idea of sampling the physics of energy surfaces, i.e., potential energy surfaces (PES).
  • Generative AI originates from the physics of diffusion in fluids: the classical Liouville description of classical mechanics, i.e., phase-space flows, and generalised Fokker-Planck dynamics.
  • Language models based on attention are actually coarse-grained entropy dynamics as introduced by Gibbs: attention layers behave as a coarse-graining procedure, i.e., a mapping to compressed causal graphs.

This is not about building analogies to physics; these are foundational topics for AI.


Sunday, 27 December 2020

Statistical Physics Origins of Connectionist Learning:
Cooperative Phenomenon to Ising-Lenz Architectures

This is an informal essay aiming at raising awareness that statistical physics played a foundational role in deep learning and neural networks in general, beyond being a mere analogy: it is their origin.

An article version of this post is available here: doi, and on HAL Open Science.

Preamble

A short account of the origins of the mathematical formalism of neural networks is presented informally, for physicists and computer scientists, in a basic discrete mathematical setting. The mathematical formalisms for the dynamics of lattice models in statistical physics and for learning internal representations in neural networks as discrete architectures evolved as quantitative tools in two almost distinct fields for more than half a century, with limited overlap. We aim at bridging this gap by claiming that the analogy between the two approaches is not artificial but naturally occurring, due to how the modelling of cooperative phenomena is constructed. We define the Lenz-Ising architectures (ILAs) for this purpose.

Introduction


Figure: Tartan Ising Model (Linas Viptas, Wikipedia)
Understanding natural or artificial phenomena in the language of discrete mathematics is probably one of the most powerful toolboxes scientists use [1]. A large portion of computer science and statistical physics deals with such finite structures. Among the most prominent and successful uses of this approach are Lenz and Ising's work on modelling ferromagnetic materials [2–5] and neural networks as models of biological neuronal structures [6–8].

The analogy between the two distinct areas of research has been pointed out by many researchers [9–13]. However, the discourse and evolution of these approaches were kept within two distinct research fields, and many innovative approaches were rediscovered under different names.

Cooperative Phenomenon

The statistical definition of cooperative phenomena was pioneered by Wannier and Kramers [14–16]. Even though their technical work focused on the extension of the Ising model to 2D with cyclic boundary conditions and the introduction of exact solutions via matrix algebra, they were the first to document how the Lenz-Ising model actually represents a more generic system than merely a model of ferromagnets: anything that falls under cooperative phenomena can be addressed with a Lenz-Ising type model, as summarised in Definition 1.

Definition 1: Cooperative phenomenon of Wannier type [14]. A set of $N$ discrete units, $\mathscr{U}$, each identified with a function $s_{i}$, $i=1,\dots,N$, forms a collection or assembly. The function that identifies a unit is a mapping $s_{i}: \mathbb{R} \rightarrow \mathbb{R}$. A statistic $\mathscr{S}$ applied on $\mathscr{U}$ is called a cooperative phenomenon of Wannier type, $\mathscr{W}$.

A statistic $\mathscr{S}$ can be any mapping or set of operations on the assembly of units $\mathscr{U}$. For example, inducing an ordering on the assembly of units and summing over the $s_{i}$ values would correspond to a non-interacting magnetic system in a unit external field, or to a non-connected set of neurons with a capacity for inhibition or excitation. However, amazingly, Definition 1 is so generic that Rosenblatt's perceptron [17], current deep learning systems [18] and complex networks [19] fall into this category as well.
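
A minimal sketch instantiating the non-interacting example above, with the statistic chosen as the sum over units (the magnetisation); the unit values below are hypothetical.

import numpy as np

rng = np.random.default_rng(7)

# Assembly U: N discrete units s_i, here spin-like values in {-1, +1}.
N = 50
units = rng.choice([-1, 1], size=N)

# A statistic S applied on U: for the non-interacting magnet in a unit
# external field, the natural choice is the total magnetisation.
def magnetisation(units):
    return units.sum()

print("S(U) =", magnetisation(units))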

The originality of the cooperative phenomenon of Wannier type lies in a secondary concept, so-called event propagation, as given in Definition 2.

Definition 2: Event propagation [14]. An event is defined as a snapshot of a cooperative phenomenon of Wannier type $\mathscr{W}$. If an event takes place in one unit of the assembly $\mathscr{U}$, the same event will be favoured by the other units; this is expressed as an event propagation $\mathscr{E}(u_{1}, u_{2})$ between two disjoint sets of units, with $u_{1} \cap u_{2} = \varnothing$ and $u_{1}, u_{2} \subset \mathscr{U}$, defined together with an additional statistic $\mathscr{S}$.

The parallels between Wannier's event propagation and the neural network formalism defined by McCulloch-Pitts-Kleene [6,7] are remarkable: not only conceptually, the mathematical treatment is identical and originates from the Lenz-Ising model's treatment of discrete units. As we mentioned, this is beyond doubt not a simple analogy but a generic framework, as envisioned by Wannier. The similarity between ferromagnetic systems and neural networks was probably first documented directly by Little [8]: the spin states of magnetic spins correspond to the firing states of a neuron. Unfortunately, Little saw it only as a simple analogy, and missed the opportunity, provided by Wannier, of a generic natural phenomenon of cooperation.

The conceptual similarity to, and the inference within, Wannier's event propagation appears quite close to Hebb's learning [20], and gives a natural justification for backpropagation in multilayered networks. The history of backpropagation is exhaustively studied elsewhere [18].

Lenz-Ising Architectures (ILAs): Ferromagnets to Nerve Nets


Figure: Ernst Ising (image: APS, Physics Today obituary)
Having established the two basic definitions of cooperative phenomena, we can now define a generic setting of the Lenz-Ising model that captures both the physics literature, where it is used extensively in so-called spin-glass research, and neural networks. The guiding principle is Wannier's definition of cooperative phenomena.

Definition: Lenz-Ising Architectures (ILAs)
Given a Wannier type cooperative phenomenon $\mathscr{W}$, impose the constraint on the discrete units, $\mathscr{U}^{c}$, that they be spatially correlated as the vertices $V$ of an arbitrary graph $\mathscr{G}(V, E)$ with an ordering, the edges $E$ of the graph carrying the coupling weights between connected units, together with unit biases. A set of event propagations $\mathscr{E}^{c}$ defined on the cooperative phenomenon can induce dynamics that defines the edge weights, or vice versa. ILAs are defined as a statistic $\mathscr{S}$ applied to $\mathscr{U}^{c}$ with propagations $\mathscr{E}^{c}$.

Lenz-Ising Architectures (ILAs) should not be confused with graph neural networks, as ILAs do not model data structures. They could be seen as a subset of graph dynamical systems in some sense, but formal connections should be established elsewhere. The primary characteristic of ILAs is that they are a conceptual and mathematical representation of spin-glass systems (including Lenz-Ising, Anderson, Sherrington-Kirkpatrick and Potts systems) and neural networks (including recurrent and convolutional networks) under the same umbrella. A minimal sketch of an ILA-type energy is given below.
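
The sketch assumes the standard Ising energy form, with couplings on the edges of an arbitrary graph and biases on the units; the graph, weights and units below are hypothetical and do not capture the full formal definition.

import numpy as np

rng = np.random.default_rng(3)

# Arbitrary graph: units on vertices, couplings J_ij on edges, biases h_i.
n = 6
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (1, 4)]
J = {e: rng.normal() for e in edges}   # coupling weights on edges
h = rng.normal(size=n)                 # biases on units
s = rng.choice([-1, 1], size=n)        # discrete units (spins / neurons)

def energy(s):
    """Ising-type energy: E = -sum_(i,j) J_ij s_i s_j - sum_i h_i s_i."""
    pair = sum(J[(i, j)] * s[i] * s[j] for (i, j) in edges)
    return -pair - h @ s

print("E(s) =", energy(s))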

 Learning representations inherent in Metropolis-Glauber dynamics

The primary originality in any neural network research paper lies in so-called learning of representations from data, and in generalisation. However, it is not obvious to that community that spin-glasses are actually capable of learning representations inherently, by construction, through induced dynamics such as Metropolis or Glauber dynamics, posed as an inverse problem.

In the physics literature this appears as the problem of how to express the free energy and minimise it with respect to the weights, or coupling coefficients. This is nothing but learning representations. Usually a simulation approach is taken as the route, for example Monte Carlo techniques [5, 21, 22] via Metropolis or Glauber dynamics. The intimate connection between the concepts of ergodicity and learning in deep learning has recently been shown in this context [13, 23, 24].
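
A minimal, self-contained sketch of Metropolis single-spin-flip dynamics on a 1D Lenz-Ising chain; the chain length, coupling and temperature are hypothetical, and Glauber dynamics would differ only in the acceptance rule.

import numpy as np

rng = np.random.default_rng(5)

# 1D Ising chain with periodic boundary: E = -J * sum_i s_i s_{i+1}
n, J, beta = 64, 1.0, 0.8
s = rng.choice([-1, 1], size=n)

def delta_energy(s, i):
    """Energy change from flipping spin i (only its neighbours matter)."""
    left, right = s[(i - 1) % n], s[(i + 1) % n]
    return 2.0 * J * s[i] * (left + right)

for _ in range(20000):
    i = rng.integers(n)
    dE = delta_energy(s, i)
    # Metropolis rule: accept downhill moves, else with probability e^{-beta dE}.
    if dE <= 0 or rng.random() < np.exp(-beta * dE):
        s[i] = -s[i]

print("magnetisation per spin:", s.mean())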

Figure: Roy J. Glauber, of Glauber dynamics (Wikipedia)

As we argued earlier with Wannier's generic definition of cooperative phenomena and with ILAs, there is an intimate connection between learning and the so-called solving of spin-glasses, which usually boils down to computing free energies, as mentioned. As a link between the two distinct fields, computing backpropagation and computing free energies are natural candidates for establishing equivalence relations.

Conclusions and Outlook

Apart from honouring the physicists Lenz and Ising, and based on an understanding of the origins of cooperative phenomena, naming the research outputs of spin-glasses and neural networks under the umbrella term Lenz-Ising architectures (ILAs) is historically accurate and technically a reasonable naming scheme, given the overwhelming evidence in the literature. This is akin to calling current computers von Neumann architectures. This constitutes the statistical physics origin of connectionist learning, an approach currently enjoying vast engineering success.

The rich connection between the two areas, in computer science and statistical physics, should be celebrated. For more fruitful collaborations, both literatures, embracing the large statistics literature as well, should converge much more closely. This would help the communities avoid the awkward situation of reinventing the wheel, and the hindering of recognition for work done by physicists decades earlier, i.e., Ising and Lenz.

 Notes

No competing or any other kind of conflict of interest exists. This work is produced solely as scholarship and does not have any personal nature at all. This essay is dedicated to the memory of Ernst Ising, for his contribution to the physics of ferromagnetic materials, which now seems to have far wider implications.

References

[1] Kenneth H Rosen. Handbook of Discrete and Combinatorial Mathematics. CRC Press, 1999. 

[2] W. Lenz. Beitrag zum Verständnis der magnetischen Erscheinungen in festen Körpern. Phys. Z., 21:613, 1920.

[3] Ernst Ising. Beitrag zur Theorie des Ferromagnetismus. Zeitschrift für Physik, 31(1):253–258, 1925.

[4] Thomas Ising, Reinhard Folk, Ralph Kenna, Bertrand Berche, and Yurij Holovatch. The fate of Ernst Ising and the fate of his model. arXiv preprint arXiv:1706.01764, 2017.

[5] David P Landau and Kurt Binder. A guide to Monte Carlo Simulations in Statistical Physics. Cambridge University Press, 2014.

[6] W. S. McCulloch and W. H. Pitts. A Logical Calculus of the Ideas Immanent in Nervous Activity. Bull. Math. Biophys., 5:115–133, 1943.

[7] Stephen Cole Kleene. Representation of Events in Nerve Nets and Finite Automata. Technical report, RAND Project, Santa Monica, 1951.

[8] W. A. Little. The Existence of Persistent States in the Brain. Mathematical Biosciences, 19(1-2):101–120, 1974.

[9] P Peretto. Collective Properties of Neural Networks: a Statistical Physics Approach. Biological Cybernetics, 50(1):51–62, 1984.

[10] Jan L van Hemmen. Spin-glass Models of a Neural Network. Physical Review A, 34(4):3435, 1986.

[11] Haim Sompolinsky. Statistical Mechanics of Neural Networks. Physics Today, 41(12):70–80, 1988.

[12] David Sherrington. Neural Networks: the Spin Glass Approach. In North-Holland Mathematical Library, volume 51, pages 261–291. Elsevier, 1993.

[13] Yasaman Bahri, Jonathan Kadmon, Jeffrey Pennington, Sam S Schoenholz, Jascha Sohl-Dickstein, and Surya Ganguli. Statistical Mechanics of Deep Learning. Annual Review of Condensed Matter Physics, 2020.

[14] Gregory H Wannier. The Statistical Problem in Cooperative Phenomena. Reviews of Modern Physics, 17(1):50, 1945.

[15] Hendrik A Kramers and Gregory H Wannier. Statistics of the two-dimensional ferromagnet. Part I. Physical Review, 60(3):252, 1941.

[16] Hendrik A Kramers and Gregory H Wannier. Statistics of the two-dimensional ferromagnet. Part II. Physical Review, 60(3):263, 1941.

[17] C van der Malsburg. Frank Rosenblatt: principles of neurodynamics: perceptrons and the theory of brain mechanisms. In Brain theory, pages 245–248. Springer, 1986.

[18] J. Schmidhuber. Deep learning in Neural Networks: An overview. Neural Networks, 61:85–117, 2015; and Yoshua Bengio, Yann Lecun, Geoffrey Hinton. Communications of the ACM, 64(7):58–65, July 2021. link

[19] Duncan J Watts and Steven H Strogatz. Collective dynamics of 'small-world' networks. Nature, 393(6684):440, 1998.

[20] Donald Olding Hebb. The Organization of Behavior: a Neuropsychological Theory. J. Wiley; Chapman & Hall, 1949.

[21] Mehmet Suezen. Effective ergodicity in single-spin-flip dynamics. Physical Review E, 90(3):032141, 2014.

[22] Mehmet Suezen. Anomalous diffusion in convergence to effective ergodicity. arXiv preprint arXiv:1606.08693, 2016.

[23] Mehmet Suezen, Cornelius Weber, and Joan J Cerda. Spectral ergodicity in deep learning architectures via surrogate random matrices. arXiv preprint arXiv:1704.08303, 2017.

[24] Mehmet Suezen, JJ Cerda, and Cornelius Weber. Periodic Spectral Ergodicity: A Complexity Measure for Deep Neural Networks and Neural Architecture Search. arXiv preprint arXiv:1911.07831, 2019.


Postscript 1:

(Deep) Machine learning as a subfield of statistical physics

Researchers often place some machine learning methods under different umbrella terms compared to established statistical physics. However, beyond being a mere analogy, the correspondences between these methods are quite striking, as the mapping below shows. Consequently, there is a great tradition of machine learning practice as a sub-field of statistical physics, with explicit classification within PACS.

Hopfield Networks <- Ising-Lenz model
Boltzmann Machines <- Sherrington-Kirkpatrick model
Diffusion Models <- Langevin dynamics, Fokker-Planck dynamics
Softmax <- Boltzmann-Gibbs distribution and its partition function (see the sketch after this list)
Energy-Based Models <- Spin-glasses, Hamiltonian dynamics
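
As one concrete instance of the mapping above, a minimal sketch showing that softmax over scores $z_i$ is a Boltzmann-Gibbs distribution with energies $E_i = -z_i$ at $\beta = 1$, the denominator playing the role of the partition function $Z$; the scores are hypothetical.

import numpy as np

z = np.array([2.0, 1.0, 0.1])          # hypothetical logits / scores

def softmax(z, beta=1.0):
    """Boltzmann-Gibbs weights with energies E_i = -z_i; the
    normalising denominator plays the role of the partition function Z."""
    w = np.exp(beta * z - beta * z.max())   # shift for numerical stability
    return w / w.sum()                      # w.sum() is Z (up to the shift)

p = softmax(z)
print(p, p.sum())                      # probabilities summing to 1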

For this reason, we provide semi-formal mathematical definitions in the recent article, establishing that deep learning architectures should be called Ising-Lenz Architectures (ILAs), akin to calling current computers von Neumann architectures.

(c) Copyright 2008-2024 Mehmet Suzen (suzen at acm dot org)

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.