Tuesday 25 October 2022

Overfitting is about complexity ranking of inductive biases: Algorithmic recipe

Preamble

    Figure: Moon patterns the human brain invents. (Wikipedia)
Detecting overfitting is inherently a comparison problem over the complexity of multiple objects, i.e., models or algorithms capable of making predictions. A model can be called overfitted (or underfitted) only in comparison with another model. Model selection involves comparing multiple models of different complexities. A summary of this approach with basic mathematical definitions is given here.

Misconceptions: Poor generalisation is not synonymous with overfitting. 

None of the following techniques prevents overfitting by itself: cross-validation, having more data, early stopping, and comparing test-train learning curves. They are all about generalisation; their purpose is not to detect overfitting.

We need at least two different models, i.e., two different inductive biases, to judge which model is overfitted. One distinct approach in deep learning, called dropout, prevents overfitting by alternating between multiple models, i.e., multiple inductive biases. To judge overfitting, a dropout implementation has to compare the test performances of those alternating models during training.

What is an inductive bias? 

There are multiple conceptions of inductive bias. Here, we concentrate on a parametrised model $\mathscr{M}(\theta)$ on a dataset $\mathscr{D}$. The selection of a model type or modelling approach, which usually manifests as a functional form $\mathscr{M}=f(x)$ or as a function approximator such as a neural network, is a manifestation of inductive bias. Different parameterisations of a model learned on subsets of the dataset still share the same inductive bias.
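To make the last point concrete, here is a minimal sketch, assuming a straight-line functional form and synthetic data (both are illustrative choices, not taken from the post). The same functional form is fitted on two disjoint subsets of $\mathscr{D}$: the learned $\theta$ values differ, yet the inductive bias is unchanged.

```python
import numpy as np

# Synthetic dataset D (an assumption for illustration only).
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=200)
y = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=x.size)

# One inductive bias: the functional form f(x) = theta_1 * x + theta_0.
degree = 1

# Fit the same functional form on two disjoint subsets of D.
theta_a = np.polyfit(x[:100], y[:100], degree)
theta_b = np.polyfit(x[100:], y[100:], degree)

# The learned parameters differ, but the functional form (and hence the
# number of parameters) is identical: still a single inductive bias.
same_form = theta_a.shape == theta_b.shape
print("theta on subset A:", theta_a)
print("theta on subset B:", theta_b)
print("same functional form:", same_form)
```

Comparing `theta_a` with `theta_b` therefore says nothing about overfitting; for that, a second functional form would be needed.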

Complexity ranking of inductive biases: An Algorithmic recipe 

We are sketching out an algorithmic recipe for complexity ranking of inductive biases via informal steps:
  1. Define a complexity measure $\mathscr{C}(\mathscr{M})$ over an inductive bias.
  2. Define a generalisation measure $\mathscr{G}(\mathscr{M}, \mathscr{D})$ over an inductive bias and a dataset.
  3. Select a set of at least two inductive biases, $\mathscr{M}_{1}$ and $\mathscr{M}_{2}$.
  4. Produce the complexity and generalisation measures on $(\mathscr{M}, \mathscr{D})$. Here, for two inductive biases: $\mathscr{C}_{1}$, $\mathscr{C}_{2}$, $\mathscr{G}_{1}$, $\mathscr{G}_{2}$.
  5. Rank $\mathscr{M}_{1}$ and $\mathscr{M}_{2}$: $\arg\max \{ \mathscr{G}_{1}, \mathscr{G}_{2}\}$ and $\arg\min \{ \mathscr{C}_{1}, \mathscr{C}_{2}\}$.
The core idea is that when the generalisation measures are close enough, we pick the inductive bias that is less complex.
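The recipe can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the inductive biases are polynomial degrees, the complexity measure is the number of coefficients, the generalisation measure is the negative mean held-out squared error over k folds, and "close enough" is an absolute tolerance `tol`. None of these choices comes from the post, and the function names are hypothetical.

```python
import numpy as np

def complexity(degree):
    # Assumed C(M): number of free coefficients of the polynomial.
    return degree + 1

def generalisation(degree, x, y, n_folds=5, seed=0):
    # Assumed G(M, D): negative mean held-out squared error over k folds,
    # so that a larger value means better generalisation.
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), n_folds)
    errors = []
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        coeffs = np.polyfit(x[train], y[train], degree)
        pred = np.polyval(coeffs, x[test])
        errors.append(np.mean((pred - y[test]) ** 2))
    return -float(np.mean(errors))

def rank_inductive_biases(degrees, x, y, tol=0.005):
    # Step 4: produce (C, G) for every candidate inductive bias.
    scores = {d: (complexity(d), generalisation(d, x, y)) for d in degrees}
    # Step 5: keep the candidates whose G is within tol of the best G,
    # then pick the least complex one among them.
    best_g = max(g for _, g in scores.values())
    close = [d for d, (_, g) in scores.items() if g >= best_g - tol]
    chosen = min(close, key=lambda d: scores[d][0])
    return chosen, scores

# Demo on synthetic data generated from a straight line plus noise.
rng = np.random.default_rng(42)
x_demo = rng.uniform(-1.0, 1.0, size=80)
y_demo = 1.0 + 2.0 * x_demo + rng.normal(scale=0.1, size=x_demo.size)
chosen, scores = rank_inductive_biases([1, 3, 9], x_demo, y_demo)
print("chosen degree:", chosen)
for d, (c, g) in scores.items():
    print(f"degree={d} complexity={c} generalisation={g:.5f}")
```

Note that the verdict is always relative: a degree can only be called overfitted with respect to another candidate in the list, and the outcome depends on the assumed measures and on `tol`.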

Conclusion & Outlook

In practice, probably due to hectic delivery constraints or mere laziness, we still rely on the simple holdout method to build models: a single train-test split, without even learning curves, especially for deep learning models, and without practising Occam's razor. A major insight in this direction is that the holdout approach can only help us detect generalisation, not overfitting. We clarify this via the concept of inductive bias, noting that a different parametrisation of the same model does not change the inductive bias introduced by the modelling choice.

In fact, due to the resource constraints of the model life-cycle, i.e., energy consumption and the cognitive load of introducing a complex model, practising proper Occam's razor, that is, complexity ranking of inductive biases, is more important than ever for a sustainable environment and human capital.

Further reading

Some of the posts, in reverse chronological order, in which this blog has tried to convey what overfitting entails and its general implications.


(c) Copyright 2008-2024 Mehmet Suzen (suzen at acm dot org)

This work is licensed under a Creative Commons Attribution 4.0 International License.