
Estimate entropy of a finite discrete system using limited samples

Following the idea of:

  • Ma, “Calculation of Entropy from Data of Motion”, Journal of Statistical Physics 26, 221–240 (1981).

Consider a system with a state space of size $|\mathbb{X}| = \Gamma$ in which every state is equally probable. The entropy is \begin{align} H & = - \sum_{x\in\mathbb{X}} \frac{1}{\Gamma} \ln\frac{1}{\Gamma} \\ & = \ln \Gamma . \end{align} The objective is to estimate $\Gamma$ by sampling the space $\mathbb{X}$. The only information we use is whether two sampled points coincide: $\delta_{x_i,x_j}$.

Ma's observation is that the number of repeats, or coincident pairs, in a sequence of length $N$ depends on the space size $\Gamma$. If the sequence has zero correlation time and the elements are drawn independently from the space, the probability that a given pair coincides ($x_i = x_j$) is $P_\mathrm{c} = \Gamma^{-1}$. Since a sequence of length $N$ contains $N(N-1)/2$ pairs, the expected number of coincident pairs is \begin{equation} N_\mathrm{c} = \frac{N(N-1)}{2\Gamma}. \end{equation}
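Inverting the relation above gives the estimator $\Gamma \approx N(N-1)/(2 N_\mathrm{c})$ and hence $H \approx \ln\Gamma$. A minimal sketch of this coincidence-counting estimator (the function name and the uniform test system are illustrative choices, not from the source):

```python
import random
from collections import Counter
from math import log

def ma_entropy_estimate(samples):
    """Estimate H = ln(Gamma) from coincidences in a sequence of samples.

    Uses N_c = N(N-1)/(2*Gamma), i.e. Gamma ~ N(N-1)/(2*N_c).
    Returns None when no coincidence is observed (Gamma cannot be bounded).
    """
    n = len(samples)
    counts = Counter(samples)
    # total coincident pairs: sum over observed states of C(c, 2)
    n_c = sum(c * (c - 1) // 2 for c in counts.values())
    if n_c == 0:
        return None
    gamma_est = n * (n - 1) / (2 * n_c)
    return log(gamma_est)

# Usage: a uniform system with Gamma = 1000 states, sampled 500 times
random.seed(0)
gamma = 1000
samples = [random.randrange(gamma) for _ in range(500)]
h_est = ma_entropy_estimate(samples)
print(h_est, log(gamma))  # estimate should lie near ln(1000) ~ 6.91
```

Note that the estimate is useful only once $N \gtrsim \sqrt{\Gamma}$, so that at least a few coincidences occur; this is precisely the regime Ma's method is designed for.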

Number of states with given number of repeats

From the perspective of states, we assume that there are $k_n$ states out of $\Gamma$ that occur $n$ times in the sequence of length $N$. The total number of states is given by \begin{equation} \Gamma = \sum_n k_n \end{equation} while the length of the sequence is given by \begin{equation} N = \sum_n n k_n . \end{equation} Here, the value of $k_0$ is not directly known from the observed sequence and is linked to the total number of states $\Gamma$. Since each element of the sequence is drawn from the states, the total number of possible sequences is \begin{equation} N_\mathrm{seq} = \Gamma^N. \end{equation} The number of sequences among $N_\mathrm{seq}$ that give rise to a given combination $\{k_n\}$ can be calculated as the product of two factors. The first is the number of ways to pick the states with $n$ repeats for all $n$ values, \begin{equation} \frac{\Gamma !}{\prod_n k_n!}. \end{equation} The second is the number of ways of assigning the positions in the sequence to the occurrences of the states, \begin{equation} \frac{N!}{\prod_n (n!)^{k_n}}. \end{equation} Overall, the likelihood of obtaining the combination $\{k_n| n \in \mathbb{N} \}$ ($\mathbb{N}$ is the set of positive integers) is \begin{equation} \frac{\Gamma! N!}{\Gamma^N \prod_n k_n! (n!)^{k_n}}. \end{equation} In this expression, the number of unobserved states is $k_0 = \Gamma - \sum_{n\in\mathbb{N}}k_n$.
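The combinatorial likelihood above can be checked by brute force on a tiny system: enumerate all $\Gamma^N$ sequences, tally which combination $\{k_n\}$ each produces, and compare the empirical frequencies with the formula. A sketch (the helper names are illustrative):

```python
from math import factorial
from itertools import product
from collections import Counter

def combination_probability(k, gamma, seq_len):
    """Probability of the occupation-number combination {k_n} (k maps n >= 1
    to k_n) among the Gamma^N equally likely sequences; k_0 is implied."""
    k = dict(k)
    k[0] = gamma - sum(kn for n, kn in k.items() if n >= 1)  # unobserved states
    assert seq_len == sum(n * kn for n, kn in k.items())
    num = factorial(gamma) * factorial(seq_len)
    den = gamma ** seq_len
    for n, kn in k.items():
        den *= factorial(kn) * factorial(n) ** kn
    return num / den

# Brute-force check: Gamma = 3 states, all sequences of length N = 3
gamma, seq_len = 3, 3
tally = Counter()
for seq in product(range(gamma), repeat=seq_len):
    occ = Counter(Counter(seq).values())       # the {k_n} of this sequence
    tally[frozenset(occ.items())] += 1
for key, count in tally.items():
    p = combination_probability(dict(key), gamma, seq_len)
    assert abs(p - count / gamma ** seq_len) < 1e-12
```

The factorials grow quickly, so for realistic $\Gamma$ and $N$ one would work with logarithms (e.g. `math.lgamma`) instead.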

Distribution

For a given state, the probability that it is hit by $n$ points is \begin{align} P_n & = C^N_n \Gamma^{-n}\left(1-\Gamma^{-1}\right)^{N-n} \\ & = \frac{N!}{n!(N-n)!} \Gamma^{-n}\left(1-\Gamma^{-1}\right)^{N-n} \\ & = \frac{1}{n!}\frac{N(N-1)\ldots(N-n+1)}{\Gamma^n}\left(1-\Gamma^{-1}\right)^{N-n} \\ & \approx \frac{\lambda^n}{n!}\left(1-\frac{\lambda}{N}\right)^N \\ & \approx \frac{\lambda^n}{n!}e^{-\lambda} \end{align} approaching the Poisson distribution with $\lambda \equiv N/\Gamma$ in the limit of large $N$ and $\Gamma$.
