Figure: Maxwell's handwriting, state diagram (Wikipedia)

**Preamble**

Modern statistics is moving into an emerging field called data science, which amalgamates many different disciplines, from high-performance computing to control engineering. However, researchers in machine learning and statistics sometimes omit, naïvely and probably unknowingly, the fact that some of the most important ideas in data science actually originated in physics discoveries and were developed specifically by physicists. In this short exposition we review these physics origins for the areas identified by Gelman and Vehtari (doi). An additional section covers other possible areas that are currently the focus of active research in data science.

**Bootstrapping and simulation-based inference: Gibbs's ensemble theory and Metropolis's simulations**

Bootstrapping is a novel idea for estimation with uncertainty from a given set of samples. It was popularised mostly by Efron, whose contribution in making this tool available to all researchers doing quantitative analysis is immense. However, the origins of bootstrapping can be traced back to the idea of ensembles in statistical physics, introduced by J. Gibbs. Ensembles in physics allow us to do just what bootstrapping does: estimating a quantity of interest by sub-sampling, which in statistical physics appears as sampling a set of different microstates. Using this idea, Metropolis devised an inference scheme in 1953 to compute ensemble averages for liquids on a computer. Note that the use of the Monte Carlo approach for purely mathematical purposes, i.e., solving integrals, appeared much earlier with von Neumann's efforts.
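These two ideas can be sketched in a few lines of plain Python. The data, resample counts, and step size below are hypothetical, chosen only to illustrate bootstrap resampling and a random-walk Metropolis accept/reject step:

```python
import math
import random

random.seed(0)

# --- Bootstrapping: standard error of the mean by resampling with replacement ---
data = [2.1, 2.5, 3.0, 1.8, 2.9, 3.3, 2.2, 2.7]  # hypothetical measurements

def bootstrap_se(sample, n_boot=2000):
    """Bootstrap estimate of the standard error of the sample mean."""
    n = len(sample)
    means = []
    for _ in range(n_boot):
        resample = [random.choice(sample) for _ in range(n)]
        means.append(sum(resample) / n)
    mu = sum(means) / n_boot
    var = sum((m - mu) ** 2 for m in means) / (n_boot - 1)
    return var ** 0.5

print("bootstrap SE of the mean:", round(bootstrap_se(data), 3))

# --- Metropolis: sample from an unnormalised density p(x) ∝ exp(-x²/2) ---
def metropolis(n_samples=5000, step=1.0):
    """Random-walk Metropolis targeting a standard normal."""
    x, samples = 0.0, []
    for _ in range(n_samples):
        proposal = x + random.uniform(-step, step)
        # accept with probability min(1, p(proposal) / p(x))
        if random.random() < math.exp((x ** 2 - proposal ** 2) / 2):
            x = proposal
        samples.append(x)
    return samples

samples = metropolis()
print("Metropolis sample mean:", round(sum(samples) / len(samples), 3))
```

The Metropolis step needs only density *ratios*, which is precisely why it works for ensembles whose normalisation constant is intractable.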

**Causality: Hamiltonian systems to thermodynamic potentials**

Figure: Maxwell relations as causal diagrams.

**Overparametrised models and regularisation: Poincaré classifications and astrophysical dynamics**

Current deep learning systems are classified as massively overparametrised systems. However, a lower-dimensional understanding of this phenomenon was already well studied in Poincaré's classification of classical dynamics, namely the measurement problem of an overdetermined system of differential equations; such inverse problems are well known in astrophysics and theoretical mechanics.
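As a toy numerical illustration (not from the original text): a linear inverse problem with more unknowns than observations has infinitely many exact solutions, and Tikhonov (ridge) regularisation singles one out. The coefficients and λ below are arbitrary; with one observation the matrix algebra reduces to scalars:

```python
# One observation, two unknowns: y = a1*x1 + a2*x2, so the system is
# underdetermined -- every point on a line of (x1, x2) fits the data exactly.
a1, a2, y = 1.0, 2.0, 5.0
lam = 1e-3  # ridge (Tikhonov) regularisation strength

# Ridge solution x = Aᵀ (A Aᵀ + λ)⁻¹ y; here A Aᵀ = a1² + a2² is a scalar.
scale = y / (a1 ** 2 + a2 ** 2 + lam)
x1, x2 = a1 * scale, a2 * scale

print("regularised solution:", (round(x1, 3), round(x2, 3)))
print("fitted value:", round(a1 * x1 + a2 * x2, 3))  # close to y = 5.0
```

As λ → 0 this converges to the minimum-norm solution among all exact fits, which is one way regularisation tames an otherwise ill-posed problem.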

**High-performance computing: Big-data to GPUs**

Similarly, the use of supercomputers, or what we now call high-performance computing, with big-data-generating processes can actually be traced back to the Manhattan Project and ENIAC, which aimed at solving scattering equations, with almost 50 years of development in this direction before the 2000s.

**Conclusion**

The impressive development of the emerging field of data science, as a broadening of statistics into computer science, has strong origins in the core physics literature and research. These connections are not sufficiently cited or acknowledged. Our aim in this short exposition is to bring these aspects to the attention of data science practitioners and researchers alike.

**Further reading**

Some of the works mentioned above, and a related reading list of papers and books.

- What are the Most Important Statistical Ideas of the Past 50 Years? Gelman & Vehtari (2021)
- A Leisurely Look at the Bootstrap, the Jackknife, and Cross-Validation, Bradley Efron and Gail Gong (1983)
- Elementary Principles in Statistical Mechanics, Gibbs (1902)
- Equation of State Calculations by Fast Computing Machines, Metropolis et al. (1953)
- Generalized statistical mechanics: connection with thermodynamics, Curado-Tsallis (1992)
- Poincaré sections of Hamiltonian systems (1996)
- Statistical mechanics of ensemble learning, Anders Krogh and Peter Sollich (1997)

**Appendix: Pearson correlation and Lattices**

Auguste Bravais is famous for his foundational work on the mathematical theory of crystallography, which now seems to reach far beyond periodic solids. Unknown to many, he was actually the first to derive the expression for what we know today as the correlation coefficient, Pearson's correlation, or less commonly the Pearson-Galton coefficient. Interestingly, Wright, one of the grandfathers of causal analysis, mentioned this in his seminal 1921 work titled "Correlation and causation", acknowledging Bravais's 1849 work as the first derivation of correlation.
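The coefficient Bravais first derived can be computed in a few lines; the data below are made up purely for illustration:

```python
def pearson_r(xs, ys):
    """Pearson (Bravais-Galton) correlation of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# A perfectly linear relationship gives r = 1
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # → 1.0
```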

**Appendix: Partition function and set theoretic probability**

Long before Kolmogorov set forth his formal foundations of probability, Boltzmann, Maxwell and Gibbs built theories of statistical mechanics using probabilistic language and even defined settings for set-theoretic foundations by introducing ensembles for thermodynamics. For example, the partition function (Z) appeared as a normalisation factor ensuring that the densities sum to 1. Apparently Kolmogorov and his contemporaries drew a great deal of inspiration from the physics and mechanics literature.
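As a minimal numerical illustration (the energy levels and temperature are chosen arbitrarily): Z is exactly the factor that turns Boltzmann weights into a probability distribution.

```python
import math

beta = 1.0                  # inverse temperature
energies = [0.0, 1.0, 2.0]  # hypothetical discrete energy levels E_i

# Partition function: Z = sum_i exp(-beta * E_i)
Z = sum(math.exp(-beta * E) for E in energies)

# Boltzmann probabilities p_i = exp(-beta * E_i) / Z
probs = [math.exp(-beta * E) / Z for E in energies]

print("Z =", round(Z, 4))
print("sum of probabilities =", sum(probs))  # normalised to 1
```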
