Monday, 1 August 2016

Understanding the empirical law of large numbers and the gambler's fallacy

    Platonic dice (Wikipedia).


One of the misconceptions in our understanding of
statistics, a counter-intuitive fallacy, is the assumed existence of a law of averages. Imagine we toss a fair coin many times; most people would expect the counts of heads and tails to balance out as the number of trials increases, which is wrong. If you did not, you may have very good statistical intuition. Briefly, we will illustrate this kind of gambler's fallacy with a simple simulation approach and discuss the empirical law of large numbers.

Empirical law of large numbers

If we repeat an experiment long enough, the outcome approaches the expected one. The simplest example is a coin-toss experiment: a
fair coin toss is equally likely to give head or tail, 1 or 0. This implies that the ratio of the counts of heads to tails approaches one as the number of repeats increases. Say we toss the coin $N$ times, producing $n_{1}$ heads and $n_{0}$ tails. The empirical law of large numbers states

$ \frac{n_{1}}{n_{0}} \to 1.0 $ for very large $N$, i.e., $N \gg 1$.


But note that the absolute difference, $|n_{1}-n_{0}|$, does not approach any constant; on the contrary, it increases (on average, roughly like $\sqrt{N}$) with an increasing number of repeats. The belief that the outcomes must balance out over more repeats is a classic example of the gambler's fallacy.
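Both effects can be seen in a minimal base-R simulation; the seed, sample sizes, and the helper `mean_abs_diff` below are arbitrary choices for illustration, not part of the appendix code:

```r
# Minimal sketch (assumed seed and sample sizes, chosen for illustration):
# the ratio n1/n0 converges to 1, while E|n1 - n0| grows roughly like sqrt(N).
set.seed(1)
N  <- 100000
x  <- rbinom(N, 1, 0.5)      # fair coin: 1 = head, 0 = tail
n1 <- sum(x); n0 <- N - n1
ratio <- n1 / n0             # close to 1 for large N

# Mean absolute difference |n1 - n0| over many runs, short vs long runs;
# theoretically E|n1 - n0| = sqrt(2 N / pi)
mean_abs_diff <- function(N, runs = 200) {
  mean(replicate(runs, abs(2 * sum(rbinom(N, 1, 0.5)) - N)))
}
d_small <- mean_abs_diff(100)    # around sqrt(200/pi)   ~  8
d_large <- mean_abs_diff(10000)  # around sqrt(20000/pi) ~ 80
```

The ratio settles near one while the mean absolute difference grows by roughly a factor of ten when the number of trials grows by a hundred.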


Fair coin-toss: Evolution of Bernoulli process

The Bernoulli process expresses binary outcomes, 1 or 0, i.e., success or failure, true or false. The Bernoulli distribution reads,

$Ber(p, k) = p^{k} (1-p)^{1-k}$ for $k \in \{0,1\}$.
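As a quick sanity check, the pmf above can be evaluated directly and compared with base R's binomial pmf with size one; `ber_pmf` is a helper name introduced here for illustration:

```r
# The Bernoulli pmf written above; ber_pmf is a helper name introduced here
ber_pmf <- function(k, p) p^k * (1 - p)^(1 - k)

ber_pmf(1, 0.5)  # 0.5
ber_pmf(0, 0.3)  # 0.7
# agrees with base R's binomial pmf with size = 1
all.equal(dbinom(1, size = 1, prob = 0.3), ber_pmf(1, 0.3))
```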

Figure 1: Empirical law of large numbers; the ratio of
occurrences approaches a constant.

Figure 2: No law of averages; the absolute difference
of occurrences increases over repeats.

$p$ is the probability of success. We draw 20K samples from this distribution to get a Bernoulli process with $p=0.5$ and repeat the experiment 50 times, in order to obtain a "generalised" behaviour with uncertainty. This corresponds to a fair coin-toss experiment.


Results


Empirical results for the ratio of the two outcomes and their absolute difference over repeats are reported in Figures 1 and 2, respectively.



Appendix: Source code

The R and Rcpp functions used to reproduce the plots in this post are shown in this section. Source files are also available on github (here).
#'
#'  Misconception of the law of averages: Understanding the empirical law of large numbers
#' 
#'  (c) 2016 Mehmet Suzen 
#'  

rm(list=ls())
set.seed(987231)
library(Rcpp)          # low-level implementations
library(LaplacesDemon) # for drawing samples from Bernoulli distribution
library(matlab)        # for tic/toc
# Set working directory to script directory 
Rcpp::sourceCpp("empirical_law_of_large_numbers.cpp") # diff_binary_vec2, ratio_binary_vec2
#'
#' Calculate the running difference in a binary vector 
#' 
diff_binary_vec  <- function(bvec) {
 ll <- length(bvec) 
 diff_vec  <- vector("double", ll)
 diff_vec  <- sapply(1:ll, function(i) { abs(i-2.0*sum(bvec[1:i])) } )
 diff_vec
}
#'
#' Calculate the running ratio in a binary vector 
#'
ratio_binary_vec  <- function(bvec) {
 ll         <- length(bvec) 
 ratio_vec  <- vector("double", ll)
 ratio_vec  <- sapply(1:ll, function(i) { abs(i/sum(bvec[1:i])-1.0) } )
 ratio_vec  <- sapply(ratio_vec, function(rv) {if(is.infinite(rv)) { rv <- 0; }; rv })
 ratio_vec
}

#' Wall-clock timing comparison of the R and Rcpp implementations
tb <- rbern(20000, 0.5) # a fair-coin
tic()
t1 <- diff_binary_vec(tb)
toc()
tic()
t2 <- diff_binary_vec2(tb)
toc()
tic()
r1 <- ratio_binary_vec(tb)
toc()
tic()
r2 <- ratio_binary_vec2(tb)
toc()
#' 
#' Generate Bernoulli Process
#' 
nr        <- 50    # repeats
nt        <- 20000 # Bernoulli trials
tic()
bern_df <- data.frame(trail=c(), diff=c(), ratio=c())
for(i in 1:nr) {
  cat("repeat:",i, "\n")
  trail   <- rbern(nt, 0.5) # a fair-coin
  diff    <- diff_binary_vec2(trail)
  ratio   <- ratio_binary_vec2(trail)
  bern_df <- rbind(bern_df, cbind(1:nt, diff, ratio))
}

#' Now plot ratio and diff evolution with local regression in ggplot2
names(bern_df) <- c("trail", "diff", "ratio")
library(ggplot2)
p_diff <- ggplot(data=bern_df, aes(x=trail, y=diff)) + geom_smooth(formula="y~x") + 
          theme(
                panel.background = element_blank(), 
                axis.text.x      = element_text(face="bold", color="#000000", size=11),
                axis.text.y      = element_text(face="bold", color="#000000", size=11),
                axis.title.x     = element_text(face="bold", color="#000000", size=11),
                axis.title.y     = element_text(face="bold", color="#000000", size=11))  +
           xlab("Bernoulli Trials") + ylab("Difference between occurrences of the two outcomes") +
           ggtitle("No Law of Averages: Tails/Heads do not balance out!")

png(file="no_law_of_averages.png")
p_diff
dev.off()

p_ratio <- ggplot(data=bern_df, aes(x=trail, y=ratio)) + geom_smooth(formula="y~x") + 
          theme(
                panel.background = element_blank(), 
                axis.text.x      = element_text(face="bold", color="#000000", size=11),
                axis.text.y      = element_text(face="bold", color="#000000", size=11),
                axis.title.x     = element_text(face="bold", color="#000000", size=11),
                axis.title.y     = element_text(face="bold", color="#000000", size=11))  +
           xlab("Bernoulli Trials") + ylab("Ratio of occurrences of the two outcomes") +
           ggtitle("Empirical Law of Large Numbers: Ratio of Tails/Heads approaches one")

png(file="law_of_large_numbers.png")
p_ratio
dev.off()

// empirical_law_of_large_numbers.cpp : Rcpp implementations sourced above
#include <Rcpp.h>
#include <stdlib.h>
using namespace Rcpp;

//
// Sum a numeric vector up to an index
//
double sum_nv(NumericVector x, int s) {
  int i=0;
  double ss=0.0;
  for(i=0;i<(s+1);i++) {
   ss=ss+x[i];
  }
  return(ss);
}

//
// Calculate the running difference in a Bernoulli process (binary_vec)
//
// [[Rcpp::export]]
//
NumericVector diff_binary_vec2(NumericVector binary_vec) {
  int ll = binary_vec.size();
  int i  = 0;
  NumericVector diff_vec(ll, 0.0);
  for(i=0;i<ll;i++) {
    diff_vec[i] = std::abs(i+1.0-2.0*sum_nv(binary_vec, i));
  }
  return(diff_vec);
}

//
// Calculate the running ratio in a Bernoulli process (binary_vec)
//
// [[Rcpp::export]]
NumericVector ratio_binary_vec2(NumericVector binary_vec) {
  int ll = binary_vec.size();
  int i  = 0;
  NumericVector ratio_vec(ll, 0.0);
  for(i=0;i<ll;i++) {
    ratio_vec[i] = std::abs(((i+1.0)/sum_nv(binary_vec, i))-1.0);
  }
  return(ratio_vec);
}

Sunday, 3 May 2015

Constants or integrals of motion: Invariants of a dynamical flow

In this post we briefly review a concept from dynamical systems, namely invariants of a dynamical flow, with a simple derivation using the famous Lotka-Volterra system as an example, due to Lotka (1925) and Volterra (1927).

Concept

A dynamical flow associated with an observation vector ${\bf y}(t)$ may admit functions $I({\bf y})$ that are time independent, i.e., $dI/dt=0$. The number of invariants and the length of the observation vector affect the overall dynamics.

Lotka-Volterra (LV) System

The LV dynamics describes the interaction between the population of prey $v$ and the population of predators $u$, a predator-prey model. We will use a special case of the LV dynamics; recall that the dot notation means a time derivative. For the predators,
$$ \dot{u}  =  u (v-2) $$
and for the prey,
$$ \dot{v}  = v (1-u) $$
The observation vector is ${\bf y}=(u,v)$.

If we divide these equations, hoping that we can collect $u$ and $v$ in separate terms,

$$
\begin{eqnarray}
\frac{\dot{u}}{\dot{v}}            & = & \frac{u (v-2)}{v(1-u)} \\
\dot{u} v (1-u)                            & = & \dot{v} u (v-2) \\
\dot{u} v (1-u) - \dot{v} u (v-2)  & = & 0 \\
\dot{u} (1-u) - \dot{v} u/v (v -2) & = & 0\\
\dot{u} (1-u)/u - \dot{v}(v-2)/v   & = & 0
\end{eqnarray}
$$

If we integrate both sides over time $dt$,
$$
\begin{eqnarray}
\int \frac{1-u}{u} \frac{du}{dt} dt - \int \frac{v-2}{v} \frac{dv}{dt} dt & = &0 \\
\int \frac{1-u}{u}du - \int \frac{v-2}{v} dv & = &0 \\
\end{eqnarray}
$$
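Evaluating the two antiderivatives term by term,
$$
\int \frac{1-u}{u} du = \ln u - u, \qquad \int \frac{v-2}{v} dv = v - 2 \ln v.
$$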

Solving these indefinite integrals yields an invariant of the LV dynamics,
$$ I({\bf y}) = \ln u - u + 2 \ln v - v, \qquad \frac{dI}{dt} = 0 $$
We have shown one invariant of the system. Invariants are important for determining the structure of the dynamics, for example whether it is volume preserving, i.e., Hamiltonian dynamics.
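Conservation of this invariant along the flow can be checked numerically. The sketch below uses a hand-rolled RK4 integrator in base R; the initial state and step size are arbitrary choices for illustration:

```r
# Numerical check that I(u, v) = ln(u) - u + 2 ln(v) - v is conserved along
# the LV flow; plain RK4 with an assumed initial state and step size.
lv <- function(s) c(s[1] * (s[2] - 2), s[2] * (1 - s[1]))  # (du/dt, dv/dt)

rk4_step <- function(s, h) {
  k1 <- lv(s)
  k2 <- lv(s + h / 2 * k1)
  k3 <- lv(s + h / 2 * k2)
  k4 <- lv(s + h * k3)
  s + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
}

I_inv <- function(s) log(s[1]) - s[1] + 2 * log(s[2]) - s[2]

s  <- c(1.5, 1.0)   # initial (u, v), away from the fixed point (1, 2)
h  <- 0.001
I0 <- I_inv(s)
for (i in 1:10000) s <- rk4_step(s, h)  # integrate up to t = 10
drift <- abs(I_inv(s) - I0)             # stays at truncation-error level
```

The drift in $I$ over the whole trajectory remains at the level of the integrator's truncation error, consistent with $dI/dt = 0$.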


Tuesday, 13 May 2014

Is ergodicity a reasonable hypothesis? Understanding Boltzmann's ergodic hypothesis

Figure: Ergodic vs. non-ergodic trajectories (Wikipedia).
Many undergraduate physics students barely study the ergodic hypothesis in detail. It is usually presented as ensemble averages being equal to time averages. While the concept of the statistical ensemble may be accessible to students, when it comes to ergodic theory and its theorems, where higher-level mathematical jargon kicks in, it may be confusing for the novice reader, or even for practicing physicists and educators, what ergodicity really means. For example, a recent preprint titled "Is ergodicity a reasonable hypothesis?" defines ergodicity as follows:
...In the physics literature "ergodicity" is taken to mean that a system, including a macroscopic one, visits all microscopic states in a relatively short time...[link]
Visiting all microscopic states is not a precondition for ergodicity from the statistical physics standpoint. That formulation is a manifestation of the strong ergodic hypothesis, because of the Birkhoff theorem, and may not reflect the physical meaning of ergodicity. However, the originator of the ergodic hypothesis, Boltzmann, had a different thing in mind when explaining how a system approaches thermodynamic equilibrium. One of the best explanations is given in J. R. Dorfman's book, An Introduction to Chaos and Nonequilibrium Statistical Mechanics [link]; in section 1.3, Dorfman explains what Boltzmann had in mind:
...Boltzmann then made the hypothesis that a mechanical system's trajectory in phase-space will spend equal times in regions of equal phase-space measure. If this is true, then any dynamical system will spend most of its time in phase-space region where the values of the interesting macroscopic properties are extremely close to the equilibrium values...[link]
In saying this, Boltzmann did not suggest that a system should visit ALL microscopic states. His argument only suggests that states close to equilibrium are more likely to be visited.

Postscript (June 2022)

The sufficiency of sparse visits: Physical states are rarely fine-grained

The ergodic theorems of Birkhoff and von Neumann require visiting all possible states or regions for ergodicity to hold. This requirement is not appropriate for physics. The key concepts here are coarse-graining and the sufficiency of sparse visits: most physical systems have large sets of equally likely states.


The generated dynamics rarely needs to visit all accessible states or regions. Physical systems are rarely fine-grained and have a degree of sparseness, reducing an astronomically large number of states to a handful of relevant regions. In summary, visiting all physical states or regions in time averages is not strictly needed for the physics definition of ergodicity.


Only a collection of regions, or multiple states with higher probability, needs to be covered to achieve thermodynamic equilibrium; this is the concept of the "sufficiency of sparse visits". It makes physical experiments over a finite time consistent with thermodynamics.




Friday, 17 January 2014

Particle approximation to probability density functions: Dirac delta function representation

In the previous post, I briefly showed the idea of using the Dirac delta function for discrete data representation. In the second example there, histogram locations for a given set of points are presented as spike trains, whereas the heights are given in a second sum. This is hard to follow and visualise if you are not well practiced at reading formulations with multiple indices. For pedagogical reasons, an easier representation of an arbitrary probability density function (PDF), $p(x)$, simply couples each discrete point with a corresponding weight.

Hence, a set $\{x_{i}, \omega^{i}\}_{i=1}^{N}$ is an estimate of the PDF, $\hat{p}(x)$. At this point we can invoke the Dirac delta function,

$ \hat{p}(x) = \sum_{i=1}^{N} \omega^{i} \delta(x-x_{i})$

Let's revisit the R code given there; this time let's draw uniform numbers in $[-2, 2]$ to get 100 $x_{i}$ values. These numbers simply indicate locations on the x-axis, a spike train. For simplicity, let's use a Gaussian distribution for the target PDF, $\mathcal{N}(0, 1)$. Then, for the weights, we evaluate this density at the spike locations. This approach is easier to understand compared with my previous double-index notation.

R Example code

The procedure explained above is trivial to implement in R.

# Generate 100 x locations sampled from
# a regular grid in [-2.0, 2.0]
set.seed(42)
# Domain where Dirac comb operates
Xj = seq(-2,2,0.002) 
Xi = sample(Xj, 100)
# Weights: evaluate the N(0,1) density at those locations
Wi = dnorm(Xi)
# Now visualise
plot(Xi, Wi, type="h",xlim=c(-2.0,2.0),ylim=c(0,0.6),lwd=2,col="blue",ylab="p")

Conclusion

The notation above introduces a second abuse of notation: in practice there must be a secondary regular grid that picks the $x_{i}$ values via the Dirac delta, because the argument of $\hat{p}(x)$ is in the discrete domain. A slightly better notation that reflects the above code would be

$ \hat{p}(x_j) = \sum_{i=1}^{N} \omega^{i} \delta(x_{j}-x_{i})$

The set $x_j$ is simply defined on a certain domain, for example regularly spaced. Hence I recommend not introducing the Dirac delta to explain a particle approximation to PDFs to novice students in class; it will only confuse them even more.
Figure: Spike trains with weights $\hat{p}(x) = \sum_{i=1}^{N} \omega^{i} \delta(x-x_{i})$
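In R, the grid-based notation above can be made concrete by treating $\delta(x_{j}-x_{i})$ as a Kronecker delta on grid indices; variable names follow the code above, and `p_hat` is a helper name introduced here:

```r
# Sketch: on the regular grid x_j, delta(x_j - x_i) acts as a Kronecker
# delta on grid indices, so p_hat is nonzero only at the sampled locations
set.seed(42)
Xj <- seq(-2, 2, 0.002)     # regular grid, the x_j
Xi <- sample(Xj, 100)       # particle locations, the x_i (grid points)
Wi <- dnorm(Xi)             # weights from N(0, 1)

p_hat <- numeric(length(Xj))
p_hat[match(Xi, Xj)] <- Wi  # place each weight at its own grid point
```

Because the $x_{i}$ are drawn without replacement from the grid, exactly 100 entries of `p_hat` are nonzero, which is precisely the spike train plotted above.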

(c) Copyright 2008-2024 Mehmet Suzen (suzen at acm dot org)

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.