Showing posts with label Bernoulli Trails. Show all posts
Showing posts with label Bernoulli Trails. Show all posts

Wednesday, 11 May 2022

A misconception in ergodicity: Identify ergodic regime not ergodic process

Preamble 

    Figure 1: Two observable's approach to
 ergodicity for Bernoulli Trials. 
Ergodicity appears in many fields, in physics, chemistry and natural sciences but in economics to machine learning as well. Recall that, ergodicity in physics and mathematical definition diverges significantly due to Birkhoff's statistical definition against Boltzmann's physical approach. Here we will follow Birkhoff's definition of ergodicity which is a statistical one. The basic notion of ergodicity is confusing even among experienced  academic circles. The primary misconception is that ergodicity is attributed to a process, a given process being ergodic. We address this by pointing out that ergodicity appears as a regime or a window so to speak for a given process's time-evolution and it can't be attributed to an entire generating process.    

No such thing as ergodic process but ergodic regime given observable

A process being ergodic is not entirely true identification. Ergodicity is a regime over a given time window for a given observable derived from the process. This is the basis of ensemble theory from statistical physics.  Most of the processes generates initially a non-ergodic regime given an observable.  In order to identify an ergodic regime,  we need to define for a discrete setting : 

  1. the ensemble (sample space) : In discrete dynamics we also have an alphabet that ensemble is composed of.
  2. an observable defined over the sample space.
  3. a process (usually dynamics on the sample space evolving over-time).
  4. a measure and threshold to discriminate ergodic to non-ergodic regimes. 
Interesting thing is that different observables on the  same ensemble and the process may generate different ergodic regimes.  

 What are the processes and regime mathematically?

A process is essentially a dynamical system mathematically.  this includes stochastic models and as well as deterministic systems sensitive to initial conditions. Prominently these both combined in Statistical Physics. A regime mathematically implies a range of parameters or a time-window that a system behaves very differently. 

 Identification of ergodic regime

Figure 2: Evolution of
time-averaged OR observable.
The main objective of finding out if dynamics produced by the process on our observable enters or in an ergodic regime is to measure if ensemble-averaged observable is equivalent to time-averaged observable value. Here equivalence is a difficult concept to address quantitatively. The simplest measure would be to check if $\Omega = \langle A \rangle_{ensemble} - \langle A \rangle_{time}$ is close to zero, i.e., vanishing. $ \Omega$ being the ergodicity measure and $A$ is the observable with different averaging procedure. This is the definition we will use here. However, beware that in the physics literature there are more advanced measures to detect ergodicity, such as considering diffusion-like behaviour, meaning that the transition from non-ergodic to ergodic regime is not abrupt but have a diffusing approach to ergodicity.  

Figure 3: Evolution of
time-averaged mean.

In some other academic fields approach to ergodic regime has different names not strictly but closely related, such as in chemical physics or molecular dynamics, equilibration time, relaxation time, equilibrium, steady-state for a given observable, in statistics Monte Carlo simulations, it is usually called burn out period. Not always, but in ergodic regime, observable is stationary and time-independent. In Physics, this is much easier to distinguish because time-dependence, equilibrium and stationarity are tied to energy transfer to the system. 

Ergodic regime not ergodic process : An example of Bernoulli Trials

Apart from real physical processes such as Ising Model, a basic process we can use to understand how ergodic regime could be detected using Bernoulli Trials. 

Here for a Bernoulli trials/process, we will use random number generators for a binary outcome, i.e., RNG Marsenne-Twister  to generate time evolution of an observables on two sites:  Let's say we have two sites  $x, y \in \{1, 0\}$.  The ensemble of this two site system $xy$ is simply the sample space of all possible outcomes $S=\{10, 11, 01, 00\}$. Time evolution of such two site system is formulated here as choosing $\{0,1\}$  for a given site at a given time, see Appendix Python notebook. 

Now, the most important part of checking ergodic regime is that we need to define an observable over two side trials. We denote two observable as $O_{1}$, which is an OR operation between sites, and $O_{2}$ is averaged over two sites. Since our sample space is small, we can compute the ensemble average observables analytically:

  • $O_{1}  =  (x+y)/2$  then  $10, 11, 01, 00  ;  (1/2 + 2/2 + 1/2 + 0 ) /4 = 0.5$
  • $O_{2}  =  x OR y$ then   $10, 11, 01, 00  ;  ( 1  + 1 + 1 + 0 )/4 = 0.75$   

We can compute the time-averaged observables over time via simulations, but their formulation are know as follows: 

  •  Time average for $O_{1}$ at time $t$  (current step)  is   $ \frac{1}{t} \sum_{i=0}^{t} (x_{i}+y_{i})/2.0$
  • Time average for $O_{2}$ at time $t$  (current step)  is   $ \frac{1}{t} \sum_{i=0}^{t} (x_{i} OR y_{i})$.

One of the possible trajectories are shown in Figure 2 and 3. For approach to ergodicity measure, we shown this at Figure 1. Even though, we should run multiple trajectories to have error estimates, we can clearly see that ergodicity regime starts after 10K steps, at least. Moreover, different observables have different decay rates to ergodic regime.  From preliminary simulation, it appears to be OR observable converges slower, though this is a single trajectory.

Conclusion

We have shown that manifestation of the ergodic regime depends on the time-evolution of the observable given a measure of ergodicity, i.e., a condition how ergodicity is detected. This exposition should clarify that a generating process does not get an attribute of "ergodic process" rather we talk about "ergodic regime" depending on observable and the process over temporal evolution. Interestingly, from Physics point of view, it is perfectly possible that an observable attains ergodic regime and then falls back to non-ergodic regime.

Further reading

Appendix: Code

Bernoulli Trial example we discussed is available as a Python notebook on github here

Please cite as follows:

 @misc{suezen22ergoreg, 
     title = {A misconception in ergodicity: Identify ergodic regime not ergodic process}, 
     howpublished = {\url{http://science-memo.blogspot.com/2022/05/ergodic-regime-not-process.html}, 
     author = {Mehmet Süzen},
     year = {2022}
}  

Monday, 1 August 2016

Understanding the empirical law of large numbers and the gambler's fallacy

    Platonic dices (Wikipedia).


One of the misconceptions in our understanding of 
statistics, or a counter-intuitive guess, fallacy, appears in the assumption of the existence of the law of averages. Imagine we toss a fair coin many times, most people would think that the number of heads and tails would be balanced over the increasing number of trails, which is wrong. If you don't, then you might have a very good statistical intuition. Briefly,  we will illustrate this, a kind of gambler's fallacy with a simple simulation approach and discuss the empirical law of large numbers.

Empirical law of large numbers

If we repeat an experiment long enough, we would approach to expected outcome. The simplest example is a coin-toss experiment, that an expected 
fair coin toss would lead to equal likelihood for head and tail, 1 or 0.  This implies that, the ratio of head and tails will approach to one with increasing number of repeats. Let's say, we toss the coin $N$ times. The number of  heads and tails would be $n_{1}$ and $n_{0}$. The emprical law of large numbers states

$ \frac{n_{1}}{n_{0}} \to 1.0 $ if $N$ is very large, $N >>1$


But note that, the absolute difference, $|n_{1}-n_{0}|$ does not approach to any constant, on the contrary, it will increase with increasing number of repeats. This is classic example of gambler's fallacy that an outcome would balance out as there are more repeats. 


Fair coin-toss: Evolution of Bernoulli process

The Bernoulli process expresses binary outcomes, 1 or 0, i.e., success or failure, true or false. Bernoulli distribution reads,

$Ber(p, k) = p^{k} (1-p)^{1-k}$ for $k \in \{0,1\}$.

Figure 2: No law of averages. Absolute difference
of occurances
increases over repeats.


Figure 1: Empirical law of large numbers, ratio of
occurances approach to a constant.
$p$ is the probability of success. We draw 50K samples from this distribution to get a Bernoulli process with $p=0.5$ and repeat the experiment 50 times, in order to obtain a "generalised" behaviour with uncertainty. This situation corresponds to a fair coin-toss experiment.  


Results


Empirical results of ratio of two outcomes and their absolute difference over repeats are reported in Figure 1 and 2 respectively..



Appendix: Source codes

R and Rcpp functions are shown in this section to reproduce the plots in this post. Source files are also available on github (here).
#'
#'  Misconception of the law of averages: Understanding the empirical law of large numbers
#' 
#'  (c) 2016 Mehmet Suzen 
#'  

rm(list=ls())
set.seed(987231)
library(Rcpp)          # law level implementations
library(LaplacesDemon) # for drawing samples from Bernoulli distribution
library(matlab)        # for tic/toc
# Set working directory to script directory 
Rcpp:::sourceCpp("empirical_law_of_large_numbers.cpp") # diff_binary_vec2
#'
#' Calculate the running difference in a binary vector 
#' 
diff_binary_vec  <- function(bvec) {
 ll <- length(bvec) 
 diff_vec  <- vector("double", ll)
 diff_vec  <- sapply(1:ll, function(i) { abs(i-2.0*sum(bvec[1:i])) } )
 diff_vec
}
#'
#' Calculate the running ratio in a binary vector 
#'
ratio_binary_vec  <- function(bvec) {
 ll         <- length(bvec) 
 ratio_vec  <- vector("double", ll)
 ratio_vec  <- sapply(1:ll, function(i) { abs(i/sum(bvec[1:i])-1.0) } )
 ratio_vec  <- sapply(ratio_vec, function(rv) {if(is.infinite(rv)) { rv <- 0; }; rv })
 ratio_vec
}

#' Wall-call timing difference 
tb <- rbern(20000, 0.5) # a fair-coin
tic()
t1 <- diff_binary_vec(tb)
toc()
tic()
t2 <- diff_binary_vec2(tb)
toc()
tic()
r1 <- ratio_binary_vec(tb)
toc()
tic()
r2 <- ratio_binary_vec2(tb)
toc()
#' 
#' Generate Bernoulli Process
#' 
nr        <- 50    # repeats
nt        <- 20000 # Bernoulli trails
tic()
bern_df <- data.frame(trail=c(), diff=c(), ratio=c())
for(i in 1:nr) {
  cat("repeat:",i, "\n")
  trail   <- rbern(nt, 0.5) # a fair-coin
  diff    <- diff_binary_vec2(trail)
  ratio   <- ratio_binary_vec2(trail)
  bern_df <- rbind(bern_df, cbind(1:nt, diff, ratio))
}

#' Now plot ratio and diff evolution with local regression in ggplot2
names(bern_df) <- c("trail", "diff", "ratio")
library(ggplot2)
p_diff <- ggplot(data=bern_df, aes(x=trail, y=diff)) + geom_smooth(formula="y~x") + 
          theme(
                panel.background = element_blank(), 
                axis.text.x      = element_text(face="bold", color="#000000", size=11),
                axis.text.y      = element_text(face="bold", color="#000000", size=11),
                axis.title.x     = element_text(face="bold", color="#000000", size=11),
                axis.title.y     = element_text(face="bold", color="#000000", size=11))  +
           xlab("Bernoulli Trails") + ylab("Difference between occurance of two outcomes") +
           ggtitle("No Law of Averages: Tail/Heads do not balanced out!")

png(file="no_law_of_averages.png")
p_diff
dev.off()

p_ratio <- ggplot(data=bern_df, aes(x=trail, y=ratio)) + geom_smooth(formula="y~x") + 
          theme(
                panel.background = element_blank(), 
                axis.text.x      = element_text(face="bold", color="#000000", size=11),
                axis.text.y      = element_text(face="bold", color="#000000", size=11),
                axis.title.x     = element_text(face="bold", color="#000000", size=11),
                axis.title.y     = element_text(face="bold", color="#000000", size=11))  +
           xlab("Bernoulli Trails") + ylab("Difference between occurance of two outcomes") +
           ggtitle("Empirical Law of Large Numbers: Ratio of Tails/Heads approach to one")

png(file="law_of_large_numbers.png")
p_ratio
dev.off()

#include <Rcpp.h>
#include <stdlib.h>
using namespace Rcpp;

//
// Sum a numeric vector up to an index
//
double sum_nv(NumericVector x, int s) {
  int i=0;
  double ss=0.0;
  for(i=0;i<(s+1);i++) {
   ss=ss+x[i];
  }
  return(ss);
}

//
// Calculate the running difference in a Bernoulli process (binary_vec)
//
// [[Rcpp::export]]
//
NumericVector diff_binary_vec2(NumericVector binary_vec) {
  int ll = binary_vec.size();
  int i  = 0;
  NumericVector diff_vec(ll, 0.0);
  for(i=0;i<ll;i++) {
    diff_vec[i] = std::abs(i+1.0-2.0*sum_nv(binary_vec, i));
  }
  return(diff_vec);
}

//
// Calculate the running ratio in a Bernoulli process (binary_vec)
//
// [[Rcpp::export]]
NumericVector ratio_binary_vec2(NumericVector binary_vec) {
  int ll = binary_vec.size();
  int i  = 0;
  NumericVector ratio_vec(ll, 0.0);
  for(i=0;i<ll;i++) {
    ratio_vec[i] = std::abs(((i+1.0)/sum_nv(binary_vec, i))-1.0);
  }
  return(ratio_vec);
}
(c) Copyright 2008-2024 Mehmet Suzen (suzen at acm dot org)

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.