Tuesday 25 October 2022

Overfitting is about complexity ranking of inductive biases: An algorithmic recipe

Preamble

    Figure: Moon patterns the human brain invents. (Wikipedia)
Detecting overfitting is inherently a comparison problem over the complexity of multiple objects capable of making predictions, i.e., models or algorithms. A model can only be judged overfitted (or underfitted) in comparison to another model. Model selection involves comparing multiple models of different complexities. A summary of this approach, with basic mathematical definitions, is given here.

Misconceptions: Poor generalisation is not synonymous with overfitting. 

None of these techniques prevents overfitting: cross-validation, gathering more data, early stopping, and comparing test-train learning curves are all about generalisation. Their purpose is not to detect overfitting.

We need at least two different models, i.e., two different inductive biases, to judge which one is overfitted. One distinct approach in deep learning, dropout, prevents overfitting precisely because it alternates between multiple models, i.e., multiple inductive biases, during training. To render a judgment about overfitting, a dropout implementation would still have to compare the test performance of those alternating models.

What is an inductive bias? 

There are multiple conceptions of inductive bias. Here we concentrate on a parametrised model $\mathscr{M}(\theta)$ trained on a dataset $\mathscr{D}$: the selection of a model type or modelling approach, usually manifested as a functional form $\mathscr{M}=f(x)$ or as a function approximator, for example a neural network, is a manifestation of an inductive bias. Different parametrisations of the same model, learned on subsets of the dataset, still constitute the same inductive bias.
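To make the distinction concrete, here is a minimal, illustrative sketch using scikit-learn; the dataset and the particular model choices are assumptions for illustration only. Two fits of the same linear model on different subsets share one inductive bias, while a decision tree is a different inductive bias altogether.

```python
# Minimal sketch: different parametrisations of the same functional form
# share one inductive bias; a different functional form is a different one.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
X = rng.uniform(-1, 1, size=(200, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.normal(size=200)  # illustrative data

# Same inductive bias (linear model), two parametrisations from two subsets
theta_a = LinearRegression().fit(X[:100], y[:100])
theta_b = LinearRegression().fit(X[100:], y[100:])

# A different inductive bias: a different functional form altogether
tree = DecisionTreeRegressor(max_depth=5).fit(X, y)

# Typically slightly different parameters, but the same inductive bias
print(theta_a.coef_, theta_b.coef_)
```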

Complexity ranking of inductive biases: An algorithmic recipe 

We sketch out an algorithmic recipe for the complexity ranking of inductive biases via informal steps:
  1. Define a complexity measure $\mathscr{C}(\mathscr{M})$ over an inductive bias.
  2. Define a generalisation measure $\mathscr{G}(\mathscr{M}, \mathscr{D})$ over an inductive bias and a dataset.
  3. Select a set of inductive biases, at least two, $\mathscr{M}_{1}$ and $\mathscr{M}_{2}$.
  4. Produce complexity and generalisation measures on ($\mathscr{M}$, $\mathscr{D}$); here for two inductive biases: $\mathscr{C}_{1}$, $\mathscr{C}_{2}$, $\mathscr{G}_{1}$, $\mathscr{G}_{2}$.
  5. Rank $\mathscr{M}_{1}$ and $\mathscr{M}_{2}$: $\arg\max \{ \mathscr{G}_{1}, \mathscr{G}_{2}\}$ and $\arg\min \{ \mathscr{C}_{1}, \mathscr{C}_{2}\}$.
The core idea is that when the generalisation measures are close enough, we pick the inductive bias that is less complex, as sketched below.
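Here is a minimal sketch of the recipe under stated assumptions: the complexity measure $\mathscr{C}$ is taken as the number of polynomial coefficients, the generalisation measure $\mathscr{G}$ as a mean cross-validated $R^2$ via scikit-learn, and the closeness threshold epsilon is illustrative; none of these choices are canonical.

```python
# Illustrative recipe: rank two inductive biases (polynomial degrees 1 and 9)
# by complexity C and generalisation G, preferring lower C when G is close.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 1))
y = 1.0 + 2.0 * X[:, 0] + 0.1 * rng.normal(size=200)  # illustrative data

def complexity(degree):
    return degree + 1  # C(M): number of polynomial coefficients

def generalisation(degree):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    return cross_val_score(model, X, y, cv=5).mean()  # G(M, D): mean CV R^2

biases = [1, 9]                       # two inductive biases: degree-1 and degree-9
C = {d: complexity(d) for d in biases}
G = {d: generalisation(d) for d in biases}

# Step 5: if generalisations are close enough, prefer the lower complexity
epsilon = 0.01                        # illustrative closeness threshold
best_g = max(G, key=G.get)
candidates = [d for d in biases if G[best_g] - G[d] <= epsilon]
chosen = min(candidates, key=lambda d: C[d])
print(f"G={G}, C={C}, chosen inductive bias: degree {chosen}")
```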

Conclusion & Outlook

In practice, probably due to hectic delivery constraints or mere laziness, we still rely on the simple holdout method to build models, a single test-train split, not even learning curves, especially for deep learning models, without practicing Occam's razor. A major insight in this direction is that the holdout approach can only help us judge generalisation, not overfitting. We clarified this via the concept of inductive bias, noting that different parametrisations of the same model do not change the inductive bias introduced by the modelling choice.

In fact, due to the resource constraints of the model life-cycle, i.e., energy consumption and the cognitive load of introducing a complex model, practicing a proper Occam's razor, that is, the complexity ranking of inductive biases, is more important than ever for a sustainable environment and for human capital.

Further reading

Some earlier posts on this blog, in reverse chronological order, have tried to convey what overfitting entails and its general implications.


Tuesday 4 October 2022

Heavy matter-wave and ultra-sensitive interferometry: An opportunity for quantum gravity to become evidence-based research

    Figure: Solar Eclipse of 1919 (Wikipedia)

Preamble

Cool ideas in theoretical physics are often opaque to the general reader as to whether they are backed up by any experimental evidence in the real world. The success of LIGO (Laser Interferometer Gravitational-wave Observatory) has definitely proven the value of interferometry for advancing cool ideas of theoretical physics with real-world, measurable evidence. Another type of interferometry that could be used to test multiple different ideas from theoretical physics is called matter-wave interferometry, or atom interferometry. It has been around for decades, but new developments and increased sensitivity, with measurements on heavy atomic-system waves, will provide the technical capability to test several ideas of theoretical physics.

Basic mathematical principle of interferometry

Usually interferometry is explained with device and experimental-setting details that can be confusing. However, one can explain the very principle without introducing any experimental setup. The basic idea of interferometry is that a simple wave, such as $\omega(t)=\sin\Theta(t)$, is first split into two waves that are reflected over the same distance, one of them shifted by a constant phase, in vacuum without any interactions. A linear combination of the returned waves $\omega_{1}(t)=\sin\Theta(t)$ and $\omega_{2}(t)=\sin(\Theta(t) + \pi)$ will yield zero, i.e., an interference pattern generated by $\omega_{1}(t)+\omega_{2}(t)=0$. This very basic principle can be used to detect interactions, and the characteristics of those interactions, that the wave encounters over the time it travels out and back. Of course, the basic wave used in many interferometry experiments is laser light, and the interaction we measure could be a gravitational wave interacting with the laser light, i.e., LIGO's setup.
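A small numerical check of this principle follows; the extra phase phi standing in for an interaction is an illustrative assumption, not tied to any particular experiment.

```python
# Numerical check: sin(Theta) + sin(Theta + pi) = 0 (complete destructive
# interference), while an extra interaction-induced phase phi makes the
# combined signal non-zero and hence detectable.
import numpy as np

t = np.linspace(0.0, 1.0, 1000)
theta = 2 * np.pi * 5 * t              # an arbitrary simple wave phase

w1 = np.sin(theta)                      # reference arm
w2 = np.sin(theta + np.pi)              # split arm, shifted by a constant pi
print(np.max(np.abs(w1 + w2)))          # ~0: destructive interference

phi = 0.01                              # illustrative extra phase from an interaction
w2_shifted = np.sin(theta + np.pi + phi)
print(np.max(np.abs(w1 + w2_shifted)))  # ~phi for small phi: interference signal
```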

Detection of matter-waves: What is heavy, and what is ultra-sensitivity?

Each atomic system exhibits quantum wave properties, i.e., matter waves. This implies that a given molecular system has wave signatures and characteristics which could be extracted in an experimental setting. Instead of laser light, one could use an atomic system that is reflected, following the same basic principle. The primary difference, however, is that increasing mass requires orders of magnitude more sensitive wave detectors for atomic interferometers. Currently, heavy usually means above ~$10^{9}$ Da (compare Helium-4, which is about ~4 Da). Thanks to the ultra-sensitive precision achieved, these new heavy-atom interferometers might be able to detect gravitational interactions at the quantum-wave level. This sounds trivial, but an experimental connection to theories of quantum gravity, one of the unsolved puzzles in theoretical physics, would be a potential breakthrough. Prominent examples in this direction are entropic gravity and wave-function collapse theories.
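One way to see why heavier systems demand much more sensitive detectors is the de Broglie relation $\lambda = h/(mv)$: the matter-wave wavelength shrinks as the mass grows. A rough, illustrative comparison is sketched below; the common beam velocity is an assumed value, not taken from any specific experiment.

```python
# Rough, illustrative comparison of de Broglie wavelengths lambda = h / (m v)
# for Helium-4 (~4 Da) vs a hypothetical ~1e9 Da particle, at an assumed
# common velocity (the velocity is an assumption for illustration only).
from scipy.constants import h, atomic_mass  # Planck constant, 1 Da in kg

velocity = 100.0  # m/s, illustrative beam velocity (assumption)

for name, mass_da in [("Helium-4", 4.0), ("heavy particle", 1e9)]:
    wavelength = h / (mass_da * atomic_mass * velocity)
    print(f"{name}: lambda ~ {wavelength:.2e} m")
```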

Conclusion

Recent developments in heavy matter-wave interferometry could be leveraged to test quantum-gravity arguments and theoretical proposals. We have tried to bring this idea to general attention without resorting to describing experimental details.

Further Reading & Notes
  • Dalton (Da), the mass unit used in matter-wave interferometry.
  • Atom Interferometry by Prof. Pritchard, YouTube.
  • Newton-Schrödinger equation.
  • A roadmap for universal high-mass matter-wave interferometry, Kiałka et al., AVS Quantum Sci. 4, 020502 (2022). doi
    • Current capabilities as of 2022: atom interferometers can reach up to ~300 kDa.
  • Testing entropic gravity, arXiv
  • NASA early-stage ideas workshops: web-archive