
Friday, 11 February 2022

Physics origins of the most important statistical ideas of recent times

Figure: Maxwell's handwriting, state diagram (Wikipedia)


Preamble

Modern statistics is now moving into an emerging field called data science, which amalgamates many different disciplines, from high-performance computing to control engineering. However, researchers in machine learning and statistics sometimes omit, naïvely and probably unknowingly, the fact that some of the most important ideas in data science actually originated in physics discoveries and were developed specifically by physicists. In this short exposition we review these physics origins for the areas identified by Gelman and Vehtari (doi). An additional section covers other areas that are currently the focus of active research in data science.

Bootstrapping and simulation-based inference: Gibbs's ensemble theory and Metropolis's simulations


Bootstrapping is a novel idea for estimating a quantity together with its uncertainty from a given set of samples. It was largely popularised by Efron, and his contribution in making this tool available to all researchers doing quantitative analysis is immense. However, the origins of bootstrapping can be traced back to the idea of ensembles in statistical physics, introduced by J. W. Gibbs. Ensembles in physics allow us to do just what bootstrapping helps with: estimating a quantity of interest via sub-sampling, which in statistical physics appears as sampling a set of different microstates. Using this idea, Metropolis devised a simulation-based inference scheme in 1953 to compute ensemble averages for liquids on a computer. Note that the use of the Monte Carlo approach for purely mathematical purposes, i.e., solving integrals, appears much earlier with von Neumann's efforts.
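
As an illustration of both ideas, here is a minimal sketch (my own toy example, not taken from the references above) of a bootstrap estimate and a bare-bones Metropolis ensemble average in Python with NumPy:

import numpy as np

rng = np.random.default_rng(42)

# Bootstrap: estimate the mean and its uncertainty by resampling with replacement.
def bootstrap_mean(sample, n_resamples=5000):
    n = len(sample)
    means = np.array([rng.choice(sample, size=n, replace=True).mean()
                      for _ in range(n_resamples)])
    return means.mean(), means.std(ddof=1)

data = rng.normal(loc=1.0, scale=0.5, size=200)   # stand-in for real measurements
print("bootstrap mean = %.3f +/- %.3f" % bootstrap_mean(data))

# Metropolis: ensemble average of an observable at inverse temperature beta,
# in the spirit of Metropolis et al. (1953), here for a harmonic toy potential.
def metropolis_average(energy, observable, beta=1.0, n_steps=20000, step=0.5):
    x, e, total = 0.0, energy(0.0), 0.0
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step, step)
        e_new = energy(x_new)
        if rng.random() < np.exp(-beta * (e_new - e)):   # accept/reject move
            x, e = x_new, e_new
        total += observable(x)
    return total / n_steps

# For E(x) = x^2/2 and beta = 1, the ensemble average <x^2> should be close to 1.
print("Metropolis <x^2> =", metropolis_average(lambda x: 0.5 * x**2, lambda x: x**2))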

Causality: Hamiltonian systems to thermodynamic potentials

Figure: Maxwell relations as causal diagrams.
Even though the historical roots of causal analysis in the early 20th century are attributed to Wright (1923) for his definition of path analysis, causality was a core tenet of Newtonian mechanics, which distinguishes the left- and right-hand sides of the equations of motion written as differential equations; and the set of differential equations that follows from Hamiltonian mechanics actually forms a graph, i.e., relationships between generalised coordinates, momenta and positions. This connection was never acknowledged in the early statistical literature; causal constructions from classical physics were probably not well known in that community, or did not find their way into data-driven mechanics. Similarly, the causal construction of thermodynamic potentials appears as a directed graph in the Born wheel. It is usually presented as a mnemonic, but it is actually constructed causally via Legendre transformations. Of course, causality, philosophically speaking, has been discussed since Ancient Greece, but here we restrict the discussion solely to quantitative theories after Newton.
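
To make the graph view concrete, here is a small hypothetical sketch (my own, not from the post) that writes the Legendre-transform relationships of the Born wheel as a directed graph using only the Python standard library; the potential names and edge labels are the usual textbook ones:

# Thermodynamic potentials related by Legendre transforms, as a directed graph
# (parent -> children). Each edge label names the conjugate pair being swapped,
# as read off the Born wheel / thermodynamic square.
legendre_graph = {
    "U(S,V)": {"F(T,V)": "S -> T", "H(S,P)": "V -> P"},
    "F(T,V)": {"G(T,P)": "V -> P"},
    "H(S,P)": {"G(T,P)": "S -> T"},
    "G(T,P)": {},
}

for potential, edges in legendre_graph.items():
    for child, swapped_pair in edges.items():
        print(f"{potential} --[{swapped_pair}]--> {child}")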

Overparametrised models and regularisation: Poincaré classifications and astrophysical dynamics

Current deep learning systems are classified as massively overparametrised systems. However, the lower-dimensional understanding of this phenomenon was well studied in Poincaré's classification of classical dynamics, namely the measurement problem of having an overdetermined system of differential equations; such inverse problems are well known in astrophysics and theoretical mechanics.
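
As a toy illustration of regularising an ill-posed inverse problem (my own sketch, with made-up dimensions and parameters), Tikhonov/ridge regularisation selects a unique, stable solution when there are far more parameters than observations:

import numpy as np

rng = np.random.default_rng(0)

# Ill-posed toy inverse problem: 10 observations, 50 parameters.
n_obs, n_params = 10, 50
A = rng.normal(size=(n_obs, n_params))        # forward model / design matrix
x_true = np.zeros(n_params)
x_true[:3] = [1.0, -2.0, 0.5]                 # only a few "physical" parameters
y = A @ x_true + 0.01 * rng.normal(size=n_obs)

# Tikhonov (ridge) regularisation: minimise ||A x - y||^2 + lam * ||x||^2.
lam = 0.1
x_hat = np.linalg.solve(A.T @ A + lam * np.eye(n_params), A.T @ y)
print("data residual:", np.linalg.norm(A @ x_hat - y))
print("parameter norm:", np.linalg.norm(x_hat))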

High-performance computing: Big-data to GPUs

Similarly, the use of supercomputers, or what we now call high-performance computing, with big-data-generating processes can actually be traced back to the Manhattan Project and ENIAC, which aimed at solving scattering equations, and to almost 50 years of development in this direction before the 2000s.

Conclusion

The impressive development of the new emergent field of data science, as a broadening of statistics towards computer science, has strong origins in the core physics literature and research. These connections are not sufficiently cited or acknowledged. Our aim in this short exposition is to bring these aspects to the attention of data science practitioners and researchers alike.

Further reading
Some of the works mentioned above and a related reading list of papers and books.

Please cite as follows:

 @misc{suezen22pom, 
     title = { Physics origins of the most important statistical ideas of recent times }, 
     howpublished = {\url{http://science-memo.blogspot.com/2022/02/physics-origins-of-most-important.html}}, 
     author = {Mehmet Süzen},
     year = {2022}
  }
Appendix: Pearson correlation and Lattices

Auguste Bravais is famous for his foundational contributions to the mathematical theory of crystallography, which now seems to go far beyond periodic solids. Unknown to many, he was actually the first to derive the expression for what we know today as the correlation coefficient, or Pearson's correlation, or less commonly the Pearson-Galton coefficient. Interestingly, Wright, one of the grandfathers of causal analysis, mentioned this in his seminal 1921 work "Correlation and causation", acknowledging Bravais's 1849 work as the first derivation of correlation.
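
For completeness, a minimal sketch (my own, with toy numbers) computing the coefficient directly from its definition as covariance over the product of standard deviations:

import numpy as np

def correlation(x, y):
    # Pearson (Bravais-Pearson) correlation: cov(x, y) / (std(x) * std(y)).
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return (xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc))

# Nearly linear toy data gives a coefficient close to +1.
x = np.arange(10.0)
y = 2.0 * x + np.random.default_rng(1).normal(scale=0.5, size=10)
print(correlation(x, y))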

Appendix: Partition function and set theoretic probability

Long before Kolmogorov set out his formal foundations of probability, Boltzmann, Maxwell and Gibbs built theories of statistical mechanics using probabilistic language and even defined settings for set-theoretic foundations by introducing ensembles for thermodynamics. For example, the partition function (Z) appeared as a normalisation factor ensuring that the densities sum to 1. Apparently Kolmogorov and his contemporaries drew much inspiration from the physics and mechanics literature.
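
As a concrete reminder of this role of Z, a minimal sketch (with toy energies of my own choosing) of Boltzmann weights normalised by the partition function:

import numpy as np

def boltzmann_probabilities(energies, beta=1.0):
    # p_i = exp(-beta * E_i) / Z, where Z is simply the normalisation factor.
    weights = np.exp(-beta * np.asarray(energies, float))
    Z = weights.sum()                 # partition function
    return weights / Z

p = boltzmann_probabilities([0.0, 1.0, 2.0], beta=1.0)
print(p, p.sum())                     # the probabilities sum to 1 by construction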

Appendix: Generative AI

Of course, generative AI has now taken over the hype. Indeed, the physics of diffusion, from the Fokker-Planck equation to basic Langevin dynamics, is leveraged here.
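
A minimal sketch (my own toy example, not a production diffusion model) of overdamped Langevin dynamics, the kind of stochastic dynamics that score-based diffusion models build on, assuming the score (gradient of the log-density) is known:

import numpy as np

rng = np.random.default_rng(7)

def langevin_samples(grad_log_p, n_steps=10000, step=0.01, x0=0.0):
    # Unadjusted overdamped Langevin dynamics:
    # x_{t+1} = x_t + step * grad_log_p(x_t) + sqrt(2 * step) * noise.
    x, samples = x0, np.empty(n_steps)
    for t in range(n_steps):
        x = x + step * grad_log_p(x) + np.sqrt(2.0 * step) * rng.normal()
        samples[t] = x
    return samples

# Target: standard normal, so grad log p(x) = -x; sample variance should be ~1.
s = langevin_samples(lambda x: -x)
print(s.mean(), s.var())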
 
Appendix: Physics is fundamental for the advancement of AI research and practice 


AI as a phenomenon appears to lie in the domain of core physics. For this reason, studying physics as a (post-)degree or through self-study modules will give students and practitioners alike definitive, cutting-edge insights.

  • Statistical models based on correlations originate from the physics of periodic solids and astrophysical n-body dynamics.
  • Neural networks originate from the modelling of magnetic materials with discrete states, later named a cooperative phenomenon. Their training dynamics closely follow free-energy minimisation.
  • Causality has its roots in the ensemble theory of physical entropy.
  • Almost all sampling-based techniques are based on the idea of sampling the physics of energy surfaces, i.e., Potential Energy Surfaces (PES).
  • Generative AI originates from the physics of diffusion in fluids: the classical Liouville description of classical mechanics, i.e., phase-space flows, and generalised Fokker-Planck dynamics.
  • Language models based on attention are actually the coarse-grained entropy dynamics introduced by Gibbs: 'attention layers' behave as a coarse-graining procedure, i.e., a mapping to compressed causal graphs.

This is not about building analogies to physics; these are foundational topics for AI.


Tuesday, 13 May 2014

Is ergodicity a reasonable hypothesis? Understanding Boltzmann's ergodic hypothesis

Figure: Ergodic vs. non-ergodic trajectories (Wikipedia)
Many undergraduate physics students barely study the ergodic hypothesis in detail. It is usually presented as ensemble averages being equal to time averages. While the concept of the statistical ensemble may be accessible to students, when it comes to ergodic theory and its theorems, where higher-level mathematical jargon kicks in, it may be confusing for the novice reader, and even for practising physicists and educators, what ergodicity really means. For example, a recent pre-print titled "Is ergodicity a reasonable hypothesis?" defines ergodicity as follows:
...In the physics literature "ergodicity" is taken to mean that a system, including a macroscopic one, visits all microscopic states in a relatively short time...[link]
Visiting all microscopic states is not a pre-condition for ergodicity from a statistical physics standpoint. This form of the theory is a manifestation of the strong ergodic hypothesis, via the Birkhoff theorem, and may not reflect the physical meaning of ergodicity. However, the originator of the ergodic hypothesis, Boltzmann, had something different in mind when explaining how a system approaches thermodynamic equilibrium. One of the best explanations is given in J. R. Dorfman's book, An Introduction to Chaos in Nonequilibrium Statistical Mechanics [link]; in section 1.3, Dorfman explains what Boltzmann had in mind:
...Boltzmann then made the hypothesis that a mechanical system's trajectory in phase-space will spend equal times in regions of equal phase-space measure. If this is true, then any dynamical system will spend most of its time in phase-space region where the values of the interesting macroscopic properties are extremely close to the equilibrium values...[link]
That said, Boltzmann did not suggest that a system should visit ALL microscopic states. His argument only suggests that states close to equilibrium are the most likely to be visited.
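
To illustrate the opening statement that ensemble averages equal time averages, here is a minimal toy sketch (my own example, with an arbitrarily chosen two-state transition matrix) in Python: for an ergodic Markov chain, the time average of an observable along one long trajectory converges to its stationary, ensemble average.

import numpy as np

rng = np.random.default_rng(3)

# A toy two-state Markov chain: two coarse-grained regions of "phase space".
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])              # transition probabilities
f = np.array([0.0, 1.0])                # an observable defined on the states

# Ensemble average: weight the observable by the stationary distribution.
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi = pi / pi.sum()
ensemble_average = pi @ f

# Time average: follow a single long trajectory.
state, total, n_steps = 0, 0.0, 200000
for _ in range(n_steps):
    state = rng.choice(2, p=P[state])
    total += f[state]
time_average = total / n_steps

print(ensemble_average, time_average)   # the two averages agree for this chain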

Postscript (June 2022)

The sufficiency of sparse visits: physical states are rarely fine-grained

A requirement for attaining ergodicity, following the ergodic theorems of Birkhoff and von Neumann, is visiting all possible states or regions. This requirement is not needed in physics. The key concepts here are coarse-graining and the sufficiency of sparse visits. Most physical systems have equally likely states.


The generated dynamics would rarely need to visit all accessible states or regions. Physical systems are rarely fine-grained and have a degree of sparseness, reducing their astronomically large number of states to a handful. In summary, visiting all physical states or regions in time averages is not strictly needed for the physics definition of ergodicity.


A collection of regions, or multiple states with higher probability, needs to be covered to achieve thermodynamic equilibrium: a concept of "sufficiency of sparse visits". This approach makes physical experiments possible over a finite time, consistent with thermodynamics.




(c) Copyright 2008-2024 Mehmet Suzen (suzen at acm dot org)

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.