This is an old revision of the document!


Estimate entropy of a finite discrete system using limited samples

Following idea of:

  • Ma, “Calculation of Entropy from Data of Motion”, Journal of Statistical Physics 26, 221–240 (1981) URL. zotero

Consider a system with a state space of the size $|\mathbb{X}| = \Gamma$ with equal probability. The entropy is given by \begin{align} H & = - \sum_{x\in\mathbb{X}} \frac{1}{\Gamma} \ln\frac{1}{\Gamma} \\ & = \ln \Gamma . \end{align} The objective is to estimate $\Gamma$ by sampling the space $\mathbb{X}$. The only information we have is whether a sampled point is the same as the other: $\delta_{x_i,x_j}$.

The consideration of Ma is that the number of repeats or the number of coincident pairs in the sequence of length $N$ is dependent on the space size $\Gamma$. If the sequence has zero correlation time and the element are drawn independently from the space, the chance of finding a pair to be coincident ($x_i=x_j$) is estimated by $P_\mathrm{c} = \Gamma^{-1}$. Since there are $N(N-1)/2$ pairs in a sequence of length $N$, the expected number of coincident pairs is given by \begin{equation} N_\mathrm{c} = \frac{N(N-1)}{2\Gamma}. \end{equation}

From the perspective of states, we assume that there are $k_n$ states out of $\Gamma$ that occur $n$ times in the sequence of size $N$. The total number of states is given by \begin{equation} \Gamma = \sum_n k_n \end{equation} while the length of the sequence is by \begin{equation} N = \sum_n n k_n . \end{equation} Since each element of the sequence is draw from the states, we have the total number of possible sequences \begin{equation} N_\mathrm{seq} = \Gamma^N. \end{equation} The number of of the sequences among $N_\mathrm{seq}$ that will give rise to a given combination of $\{k_n\}$ can be calculated from the product of two factors. First is the number of ways to pick the states with $n$ repeats for all $n$ values, \begin{equation} \frac{\Gamma !}{\prod_n k_n!}. \end{equation} Second is the number of ways assigned the orders in the sequence to the occurrences of the states, \begin{equation} \frac{N!}{\prod_n (n!)^{k_n}}. \end{equation}