Wednesday, 20 November 2013

Demystify Dirac delta function for data representation on discrete space

Dirac delta function is an important tool in Fourier Analysis. It is used specially in electrodynamics and signal processing routinely.  A function over set of data points
is often shown with a delta function representation. A novice reader relying on integral properties of the delta function may found this notation quite confusing.  Probably, the notation itself is an example of  abuse of notation.

One dimensional function/distribution: Sum of delta functions

Let's define a one dimensional function, $f(x)$ as follows, $x \in \mathbb{R}$ and $a$ being constant:

$ f(x) = a \sum_{i=-n}^{n} \delta(x - x_{i})$

This representation is inspired from Dirac comb and used in spike trains.  Note that set of data points in one dimension $\{x_{i} \}$ will determine the graph of this function. Using the shifting property of delta function, the value of the function will be zero every where except on data points. The constant $a$ will simply be the height of the graph at the data point.

Figure: A spike train.

Numeric Example

Let's plot $f(x)$ for some specific values of the set $\{x_{i} \} = {-0.5, -0.2, -0.1, 0.2, 0.4}$ and $a=0.5$. Here is the R code for plotting this spike train. 

x_i = c(-0.5, -0.2, -0.1, 0.2, 0.4)
a   = c(0.5, 0.5, 0.5, 0.5, 0.5)

Representing Histograms: One dimensional example

Particularly convenient representation of histograms can be developed similarly. Consider set of points $\{x_{i}\}_{i=1}^{n}$ where we would like to establish a histogram out of this set, let's say $h(x)$. If we set our histogram intervals as $\{x_{j}\}_{j=1}^{m}$. The histogram $h(x)$ then can be written as

$h(x_{j}) =  \sum_{i=1}^{n} \sum_{j=1}^{m} \delta(x_{j}- x_{i}^{min})$
where set $x_{i}^{min}$  represents the value from set $\{x_{j}\}_{j=1}^{m}$ that is closest to given $x_{i}$. Where as, second sum determines the height at a given point, i.e., frequency. This is just a confusing mathematical representation and practical implementation only counts the frequency of $x_{i}^{min}$ directly.


However it is quite trivial, the above usage of sum of delta functions appear in mathematical physics as well, not limited to statistics.

Tuesday, 18 June 2013

N-body wave function has no meaning but shut up and calculate: quantum bayesianism

During my graduate school years I was lucky to be assigned on reporting a technique used in computational quantum chemistry, namely resolution of identity.  It is a simple trick to approximate many integrals with a smaller one. I don't want to discuss about this trick but rather a peculiar idea, if not controversial,  I come up during preparation of that report (here). It is about the legitimacy of wave function in many electron, let's be brave and say N-body, physical systems. Argument is posed by Walter Kohn, Nobel Physicist, in his lecture (here) attributed to Van Vleck:
I begin with a provocative statement. In general the
many-electron wave function $\Psi(r_{1} , . . . ,r_{N} )$ for a system
of $N$ electrons is not a legitimate scientific concept, when
$N \ge N_{0}$ , where $N_{0} = 10^{3}$ .
I believe this argument has a profound implication in interpretation of quantum mechanics. However,  sadly, perception of quantum mechanics in many circles of scientific community is limited to Schrödinger equation and interpretations based on wave function. Specially the concept of wave function collapse. In my view these approaches does not make any sense for macroscopic systems such as, humans. Considering Kohn's statement. I can only see the relevance in quantum computing where really small physical systems are in consideration. (Also see my previous post, pointing out recent works in constructing quantum mechanics without wave function)

One prominent figure in recent times is distinguished Professor Mermin who popularised the short version of Copenhagen interpretation as : 'shut up and calculate' and developer of the ithaka interpretation . Very recently discusses about quantum Bayesianism (here). Again, using Kohn's statement we have to be careful not to extent this concept, again, to macroscopic systems like Humans' or even larger atomic systems, like measurement devices. So, replacing a quantum measurement device with a human observer is a mistake, if not a sin. Doesn't matter even if they have a fancy names like Bob or Alice. Because a measurement device big enough has no defined or meaningful wave function hence any kind of  "quantum probability",  i.e., no superposition to observed systems can be established. Probably, problem lies in the transition from microscopic to macroscopic system. There was a large effort in this direction (here).

What ever you believe and read about quantum mechanics. I can only suggest that your crap detectors must function fully all the time when you hear some one talk about quantum mechanics and its interpretation, including this post. I recommend you Neil Postman's article Bullshit and the Art of Crap-Detection.

PS: Recently, there was an interesting cartoon on xkcd saying that we can safely ignore any phrase that starts with "According to quantum mechanics...".

Tuesday, 9 April 2013

Matrix Cumulative Coherence: Fourier Bases, Random and Sensing Matrices

Compressive sampling (CS) is revolutionizing the way we process analog to digital conversion, our understanding of linear systems and the limits of information theory. One of the key concept in CS is that a signal can be represented in a sparse bases or it is already sparse. The novelty of sparsity bases is that when signal is randomly sampled, in a nutshell it can be recovered (or a solution can be found to a linear problem) with fewer samples. A land mark example is being sparse MRI.

The concept of choosing "more" sparse bases or CS sensing matrix lies in the measure of mutual coherence. It is defined as follows for a given matrix at order k.
$$  M(A, k) = max_{p} max_{p \ne q, q \in \Omega } \sum_{q} | <a_{p}, a_{q}> | / ( |a_{p}| |a_{q}|)$$ When k=1, it is easy to understand what it means. Basically we find the largest inner product among columns of the given matrix. Lower the value better the sparsity i.e. incoherence. However a single number may not be so informative, after all how low is better.  With the definition of David Donoho and Joel Tropp,  if M is slowly increasing then matrix said to be enchances sparsity. Larger value of k forms a set of columns $\Omega$, and the second colums are selected from this set i.e. second max argument in the above definition.

In a recent post I have shortly reviewed my R package for CS called R1magic. Its recent version 0.2 contains a functionality to compute $M(A, k)$.  Also now there is a public Github repository of the package. mutualCoherence function is written fully functional way. All operations for computing $M(A,k)$ performed in vectorial fashion in R, using function closures and apply. However, for much larger matrices, a low level implementation may be required.


Here we shortly investigate coherence of Fourier, random and mixed bases in R.

A <- DFTMatrix0(10) # Fourier Bases
B <- matrix(rnorm(100), 10, 10) # Gaussian Random Matrix
C <- A %*% B # A sensing matrix with A and B as above
aa<-mutualCoherence(A, 8)
bb<-mutualCoherence(A, 8)
bb<-mutualCoherence(A, 8)
[1] 1 1 1 1 1 1 1 1
[1] 0.6784574 1.2011489 1.7001046 2.1713561 2.4664608 2.7302690 2.7908302
[8] 2.9623327
[1] 0.7506222 1.3448452 1.8047043 2.1105348 2.3350516 2.4703822 2.5898766
[8] 2.6882250
We observe that mixed bases, where we define so called a CS matrix, matrix $C$ in above notation, with Fourier ($A$) and Random basis ($B$). Its mutual coherence increases slower than the pure random matrix with unit measure. This may signify a better incoherent basis.  However, since this is a "syntetic data', it does not tell any universal behaviour at all. A real data or signal measurement matrix  shall be tested for a better judgement of which bases is better to use in sparse recovery.  A recent paper in optical tomography uses this concept to identify a better sensing matrix in tomographic image reconstruction.

Friday, 25 January 2013

Tensor Algebra: Efficient Operations on Multidimensional Arrays with R

 Multidimensional arrays are ubiquitous. Any complex problem having multivariate observables would easily generate a need to represent corresponding data in multidimensional arrays. Most of the practitioners would choose to apply operations on these objects with an index notation.  For example a 3-D array would have 3 indices A[i, j, k] or A[i][j][k] depending on the syntax. However there are languages which takes array operations as their native component, for example APL. More familiar examples are matrix operations in MATLAB, Fortran 90 and alike. Of course this concept is noting more than vectorisation.

Mathematical objects called tensors can be used to represent multidimensional objects. In reality a scalar is rank 0 tensor,  so scalar is the simplest tensor. Rank of a tensor can be thought as objects spatial dimension.  Let's consider matrix multiplication, where there are two indices, tensor of rank 2. If we have two matrices $A_{ij}$ and $B_{ij}$. We can multiply them with index notation as follows using an obscure language R with a naive algorithm:

> set.seed(42)
> A <- matrix(rnorm(4), 2, 2)
> B <- matrix(rnorm(4), 2, 2)
> C <- matrix(0, 2, 2)
> for(i in 1:2) {
+  for(j in 1:2) {
+   for(k in 1:2) {
+     C[i,j] = C[i,j] + A[i, k] * B[k,j]
+    }
+   }
+ }
> C
          [,1]       [,2]
[1,]  6.413566 -2.5178879
[2,] -2.249789  0.4908884
> A %*% B
           [,1]       [,2]
[1,]  0.5156982  2.0378605
[2,] -0.2954518 -0.9134599

You should have noticed that we introduce a third index. That is called a dummy index which we actually do the sum. This approach is technically called Einstein summation rule. So we represent the multiplication as follows: $C_{ij} = \sum_{k=1}^{m} A_{ik} B_{kj}$. Normally sum is dropped for short notation. It is quite intuitive from this notation that number of columns of A should be equal to number of rows in B to perform the sum consistently.

There are software packages that can do symbolic and numerical tensor algebra . Here we demonstrate an R package called tensorA developed by Professor van den Boogaart of TU Freiberg. We can repeat the matrix multiplication by using this package as follows.

> library("tensorA")
> set.seed(42)
> At <- to.tensor(rnorm(4), c(i=2, k=2))
> Bt <- to.tensor(rnorm(4), c(k=2, j=2))
> At %e% Bt
i            [,1]       [,2]
  [1,]  0.5156982  2.0378605
  [2,] -0.2954518 -0.9134599
[1] "tensor" "matrix"

Note that here $\%e\%$ implies tensor multiplication with Einstein summation rule. This is pretty trivial example but if you imagine higher dimensional objects, tracking indices would be cumbersome so multi-linear algebra makes life easy.

In conclusion,  I think,  using tensor arithmetic for multidimensional arrays looks more compacts and efficient (~ 2-3 times).  A discussion related to this appeared in R help list.
(c) Copyright 2008-2015 Mehmet Suzen (suzen at acm dot org)

Creative Commons Licence
This work is licensed under a Creative Commons Attribution 3.0 Unported License.