**Preamble**

*Image: Occam (Wikipedia)*

The misconception that the *generalisation gap* between a model's training and test performance, read off its learning curves, indicates overfitting is still out there. Even in some prominent online lectures and blog posts, this misconception is repeated without a critical look. The practice has unfortunately diffused into academic papers and industry, where practitioners attribute poor generalisation to overfitting. We provide a resolution via a new conceptual identification of complexity plots, so-called *Occam's curves*, which we differentiate from learning curves. Accessible mathematical definitions will clarify the confusion.

**Learning Curve Setting: Generalisation Gap**

Learning curves describe how a given algorithm's generalisation improves over time or experience; the concept originates from Ebbinghaus's work on human memory. We use *inductive bias* to express a model, since a model can manifest itself in different forms, from differential equations to deep learning.

__Definition__: Given an inductive bias $\mathscr{M}$ formed by $n$ datasets with monotonically increasing sizes $\mathbb{T} = \{|\mathbb{T}_{0}| < |\mathbb{T}_{1}| < ... < |\mathbb{T}_{n}| \}$, a learning curve $\mathscr{L}$ for $\mathscr{M}$ is expressed by the performance measures of the model over these datasets, $\mathbb{p} = \{ p_{0}, p_{1}, ..., p_{n} \}$; hence $\mathscr{L}$ is a curve on the plane $(\mathbb{T}, \mathbb{p})$.

By this definition, we deduce that $\mathscr{M}$ learns if $\mathscr{L}$ increases monotonically.
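The definition above can be sketched numerically. The following is a minimal illustration, not part of the original text: it assumes a hypothetical linear ground truth, takes a degree-1 polynomial fit as the inductive bias $\mathscr{M}$, dataset sizes $|\mathbb{T}_{i}|$ as in the definition, and $R^{2}$ on a fixed unseen set as the performance measure $p_{i}$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground truth for illustration: y = 2x + noise.
# The inductive bias M is a degree-1 (linear) polynomial model.
def make_data(n):
    x = rng.uniform(-1, 1, n)
    return x, 2 * x + rng.normal(0, 0.3, n)

x_test, y_test = make_data(500)          # fixed unseen evaluation set

sizes = [10, 30, 100, 300, 1000]         # monotonically increasing |T_i|
curve = []                               # performance p_i for each |T_i|
for n in sizes:
    x_tr, y_tr = make_data(n)
    w = np.polyfit(x_tr, y_tr, deg=1)    # fit M on dataset T_i
    resid = y_test - np.polyval(w, x_test)
    r2 = 1 - resid.var() / y_test.var()  # p_i: R^2 on the unseen set
    curve.append(r2)

# The learning curve L is the sequence of points (|T_i|, p_i).
print(list(zip(sizes, [round(p, 3) for p in curve])))
```

A monotonically increasing `curve` here corresponds to the statement that $\mathscr{M}$ learns.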

*A generalisation gap* is defined as follows.

__Definition__: The generalisation gap for an inductive bias $\mathscr{M}$ is the difference between its learning curve $\mathscr{L}$, measured on unseen (test) datasets, and the learning curve measured on the seen (training) datasets, $\mathscr{L}^{train}$. The difference can be a simple difference, or any measure quantifying the gap.
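The gap in this definition can be computed directly as the simple difference between the two curves. The sketch below is illustrative only: it assumes a hypothetical linear ground truth, a moderately flexible degree-5 polynomial as $\mathscr{M}$, and $R^{2}$ as the performance measure.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_data(n):
    x = rng.uniform(-1, 1, n)
    return x, 2 * x + rng.normal(0, 0.3, n)

def r2(w, x, y):
    resid = y - np.polyval(w, x)
    return 1 - resid.var() / y.var()

x_test, y_test = make_data(500)          # unseen (test) data
sizes = [10, 30, 100, 300, 1000]
gap = []
for n in sizes:
    x_tr, y_tr = make_data(n)            # seen (training) data T_i
    w = np.polyfit(x_tr, y_tr, deg=5)    # a moderately flexible model
    # Generalisation gap at |T_i|: simple difference L^train_i - L_i.
    gap.append(r2(w, x_tr, y_tr) - r2(w, x_test, y_test))

print([round(g, 3) for g in gap])
```

Note that the gap typically shrinks as $|\mathbb{T}_{i}|$ grows, yet this single-model quantity says nothing about whether a *simpler* inductive bias would have done as well, which is the point of the conjecture below.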

We conjecture the following.

__Conjecture__: The generalisation gap cannot identify whether $\mathscr{M}$ is an overfitted model. Overfitting is about Occam's razor, and requires a pairwise comparison between two inductive biases of different complexities.

As the conjecture suggests, the generalisation gap is not about overfitting, despite the common misconception. Then why the misconception? It lies in the confusion over which curve one should produce in order to judge overfitting.

**Occam Curves: Overfitting Gap [Occam's Gap]**

__Definition__: Given $n$ inductive biases $\mathscr{M}_{i}$ formed by $n$ datasets with monotonically increasing sizes $\mathbb{T} = \{|\mathbb{T}_{0}| < |\mathbb{T}_{1}| < ... < |\mathbb{T}_{n}| \}$, an Occam curve $\mathscr{O}$ is expressed by the performance measure of each inductive bias over complexity-dataset-size functions $\mathbb{F} = \{ f_{0}(|\mathbb{T}_{0}|, \mathscr{C}_{0}), f_{1}(|\mathbb{T}_{1}|, \mathscr{C}_{1}), ..., f_{n}(|\mathbb{T}_{n}|, \mathscr{C}_{n}) \}$, where $\mathscr{C}_{i}$ is the complexity of $\mathscr{M}_{i}$. The performance of each inductive bias reads $\mathbb{p} = \{ p_{0}, p_{1}, ..., p_{n} \}$; hence the Occam curve $\mathscr{O}$ is a curve on the plane $(\mathbb{F}, \mathbb{p})$.
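A minimal sketch of an Occam curve, under assumptions not in the original text: polynomial degree stands in for the complexity $\mathscr{C}_{i}$, the complexity-dataset-size function $f_{i}$ is taken, purely for illustration, as the ratio $\mathscr{C}_{i}/|\mathbb{T}_{i}|$, the ground truth is a hypothetical sinusoid, and $R^{2}$ on unseen data is the performance $p_{i}$.

```python
import numpy as np

rng = np.random.default_rng(2)

def make_data(n):
    x = rng.uniform(-1, 1, n)
    return x, np.sin(3 * x) + rng.normal(0, 0.2, n)

x_test, y_test = make_data(1000)         # unseen evaluation data

# Each point pairs a dataset size |T_i| with an inductive bias M_i of
# complexity C_i (polynomial degree); f_i = C_i / |T_i| is one
# hypothetical choice of complexity-dataset-size function.
sizes   = [50, 50, 50, 50, 50]
degrees = [1, 3, 5, 9, 15]               # increasing complexity C_i
occam = []
for n, d in zip(sizes, degrees):
    x_tr, y_tr = make_data(n)
    w = np.polyfit(x_tr, y_tr, deg=d)    # fit M_i on T_i
    resid = y_test - np.polyval(w, x_test)
    occam.append((d / n, 1 - resid.var() / y_test.var()))

# The Occam curve O lives on the plane (F, p): overfitting appears as a
# drop in p as f(|T|, C) grows, a pairwise comparison across biases.
for f, p in occam:
    print(f, round(p, 3))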

**Summary and take-home**

**Further reading & notes**

- Further posts and a glossary: the concepts of overgeneralisation and goodness of rank.
- The double descent phenomenon uses Occam's curves, not learning curves.
- We use dataset size as an interpretation of *increasing experience*; there could be other ways of expressing gained experience, but we take the most obvious one.