Tuesday, 3 June 2025

Compressive algorithmic randomness:
Gibbs-randomness proposition for massively energy efficient deep learning

Figure: Dual Tomographic Compression Performance, Süzen, 2025.
Preamble

Randomness is elusive, and it is probably one of the outstanding concepts of human scientific endeavour, along with gravity. Kolmogorov complexity appears to be a remarkably novel attempt at answering "what is randomness?". The idea that the length of the smallest model (program) that can generate a random sequence determines its complexity was a turning point in the history of science. Similarly, it implies choosing the simplest model to explain a phenomenon. That is why Kolmogorov's work was also supported by the ideas of Solomonoff and Chaitin. A recent work explores this algorithmic information from a compression perspective, using Gibbs entropy.

A strange tale of a path from applied research to a fundamental proposition.

While studying model compression algorithm development, I noticed an intriguing behaviour: information, entropy, and compression interact over the compression process in a deeper way than expected.

New concepts in compression and randomness via train-compress cycles

Here, we explain the new concepts both for deep learning model compression and for the interplay between compression and algorithmic randomness.

Inverse compressed sensing (iCS): Normally, the CS procedure is applied to reconstruct an unknown signal from fewer measurements. In the deep learning train-compress case, the weights are known at a given point in the training cycle. If we create hypothetical measurements, then using CS formulations we can reconstruct a sparse projection of the weights (see the toy sketch after these definitions).

Dual Tomographic Compression (DTC): Applying iCS to the inputs and outputs at the neuronal level, layer-wise and simultaneously.

Weight rays: The reconstructed output vectors of DTC, i.e., the weights at a given sparsity level; they are not generated in isolation but within the train-compress cycle.
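As a minimal, illustrative sketch of the iCS idea (my own toy reconstruction, not the DTC implementation released with the paper): take a known weight vector, generate hypothetical random measurements of it, and recover a sparse projection by an l1-regularised reconstruction, here with plain iterative soft-thresholding (ISTA). In the DTC setting the same recovery would be applied to the input and output sides of each neuron, layer-wise, with the recovered sparse vectors playing the role of weight rays.

import numpy as np

def ista(A, y, lam=0.05, n_iter=500):
    # Iterative soft-thresholding for: min_x 0.5*||A x - y||^2 + lam*||x||_1
    L = np.linalg.norm(A, 2) ** 2                  # Lipschitz constant of the smooth part
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - A.T @ (A @ x - y) / L              # gradient step
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return x

rng = np.random.default_rng(0)
w = rng.normal(size=256)                           # known weights of one neuron (toy stand-in)
A = rng.normal(size=(128, w.size)) / np.sqrt(128)  # hypothetical measurement matrix
y = A @ w                                          # hypothetical CS measurements of the known weights

w_ray = ista(A, y)                                 # sparse projection of the weights, a "weight ray"
print("non-zeros:", np.count_nonzero(w_ray), "of", w.size)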

Gibbs-randomness proposition: An extension of Kolmogorov complexity to a compression process, stating that directed randomness is the same as complexity reduction, i.e., compression.
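The proposition is stated precisely in the paper; as a crude numerical illustration of the direction it points at (my own toy example, with a general-purpose compressor standing in for the uncomputable Kolmogorov complexity), one can track a compressed-size proxy while weights are progressively sparsified: the directed compression process drives the complexity proxy down.

import zlib
import numpy as np

def complexity_proxy(arr):
    # Compressed byte length: a crude, computable stand-in for Kolmogorov complexity.
    return len(zlib.compress(arr.astype(np.float32).tobytes(), 9))

rng = np.random.default_rng(0)
w = rng.normal(size=4096)  # dense random "weights": nearly incompressible

# A toy directed compression process: zero out ever more of the smallest-magnitude weights.
for sparsity in (0.0, 0.5, 0.9, 0.99):
    w_s = w.copy()
    k = int(sparsity * w.size)
    w_s[np.argsort(np.abs(w_s))[:k]] = 0.0
    print(f"sparsity={sparsity:.2f}  complexity proxy ~ {complexity_proxy(w_s)} bytes")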

Conclusion 

A new technique called DTC can be used to train deep learning models with model compression on the fly. This gives rise to massively energy-efficient deep learning, reaching a reduction of almost 98% in energy use. Moreover, the technique also demonstrates an extended version of Kolmogorov complexity, the Gibbs-randomness proposition.

Further reading 

Paper & codes are released:


Cite as 

 @misc{suzen25car, 
     title = {Compressive algorithmic randomness: Gibbs-randomness proposition for massively energy efficient deep learning}, 
     howpublished = {\url{https://science-memo.blogspot.com/2025/06/compressive-algorithmic-randomness.html}}, 
     author = {Mehmet Süzen},
     year = {2025}
}  

Tuesday, 22 April 2025

Ultimate physical limit of data storage:
Connecting Bekenstein Bound to Landauer Principle

Preamble 
Figure: Bekenstein's Information (Wikipedia)

Black holes are not too esoteric anymore after LIGO's success and the successful imaging efforts. Their entropy behaves much differently than the entropy of ordinary matter. This led to the incredible discovery of the so-called holographic principle, which states that we live on a projection of a higher-dimensional manifestation of the universe. On the other hand, Landauer quietly made a discovery about the energy expenditure of keeping information processing reversible. A similar bound, put forward by Bekenstein for black holes, makes a Landauer-Bekenstein bound a natural avenue to study for quantum gravity.

What is the Bekenstein Bound?

Basically, this limits the size of a black hole via a bound on its entropy. Essentially, it states that the entropy of a black hole $S$ is bounded by the radius of the black hole $R$ and its energy $E$, with the other constants set to 1,

$$S \le R \cdot E$$

What is Landauer Principle?

A limit stating that any process that wants to delete 1 bit of information has to dissipate an energy proportional to its temperature,

$$ E \ge T \ln 2 $$

Again, we set the constants to 1.

Physical limit of data storage: 1 BekensteinBytes

Using both the Bekenstein bound and the Landauer principle, one can compute a physical limit for data storage on a unit sphere,

$$S \sim T$$
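One way to read how the two bounds combine (a rough sketch only: the Landauer cost is taken at saturation, and $R$ together with all remaining constants is set to 1):

$$ S \;\le\; R\,E \;\approx\; R\,T \ln 2 \;\sim\; T. $$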

If we scale this with the inverse of the Planck area $\ell_{p}^{2}$, taking one bit of information to be proportional to one Planck area, and scale the result with the maximum attainable temperature of $10^{32}$ K, the final value corresponds to $\approx 10^{100}$ bits. This again corresponds to about 10 Giga-Quetta-Quetta-Quetta bytes (1 Quetta byte is $10^{30}$ bytes).
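As a back-of-the-envelope orientation only (my own rough numbers, with every order-one constant dropped, so only the exponent is meaningful):

$$ N \;\sim\; \frac{T_{\max}}{\ell_{p}^{2}} \;\approx\; \frac{10^{32}}{(1.6 \times 10^{-35})^{2}} \;\approx\; 4 \times 10^{101}, $$

which, at this level of crudeness, sits at the same rough scale as the $\approx 10^{100}$ bits quoted above.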

At this point in time, we propose to call $10^{100}$ bits 1 BekensteinBytes. However, with more fine-grained computations the number may change. Our purpose here is to give a very rough idea of the scale of the ultimate physical limit of data storage, in BekensteinBytes.

Bekenstein Information Conjecture: One cannot compress more than 1 BekensteinBytes into the smallest patch of space.

Outlook

An interesting connection between quantum gravity and computation provides a physical limitation on how much information can be stored in a given unit of space. We call this the Bekenstein Information Conjecture.

Further reading

Cite as 

 @misc{suezen25datalimit, 
     title = {Ultimate physical limit of data storage: Connecting Bekenstein Bound to Landauer Principle}, 
     howpublished = {\url{https://science-memo.blogspot.com/2025/04/ultimate-physical-limit-of-data-storage.html}}, 
     author = {Mehmet Süzen},
     year = {2025}
}  


(c) Copyright 2008-2024 Mehmet Suzen (suzen at acm dot org)

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.