   * Ma, "Calculation of Entropy from Data of Motion", //Journal of Statistical Physics// **26**, 221–240 (1981) [[https://doi.org/10.1007/BF01013169|URL]]. [[zotero://select/library/items/8QT36S3B|zotero]]   * Ma, "Calculation of Entropy from Data of Motion", //Journal of Statistical Physics// **26**, 221–240 (1981) [[https://doi.org/10.1007/BF01013169|URL]]. [[zotero://select/library/items/8QT36S3B|zotero]]
  
Consider a system with a state space $\mathbb{X}$ of size $|\mathbb{X}| = \Gamma$, in which every state occurs with equal probability. The entropy is given by
\begin{align}
H & = - \sum_{x\in\mathbb{X}} \frac{1}{\Gamma} \ln\frac{1}{\Gamma} \\
& = \ln \Gamma .
\end{align}
The objective is to estimate $\Gamma$ by sampling the space $\mathbb{X}$. The only information available is whether two sampled points are identical: $\delta_{x_i,x_j}$.

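As a sanity check, the entropy of the uniform distribution can be evaluated by direct summation; a minimal Python sketch (the value of $\Gamma$ is an arbitrary illustrative choice) confirming that the sum reduces to $\ln\Gamma$:
<code python>
import math

# Direct evaluation of H = -sum_x (1/Gamma) ln(1/Gamma) for a uniform
# distribution; Gamma = 1000 is an arbitrary illustrative choice.
Gamma = 1000
H = -sum((1.0 / Gamma) * math.log(1.0 / Gamma) for _ in range(Gamma))
print(H, math.log(Gamma))  # both are ~6.9078
</code>
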
Ma's observation is that the number of repeats, or the number of coincident pairs, in a sequence of length $N$ depends on the size $\Gamma$ of the space. If the sequence has zero correlation time and the elements are drawn independently from the space, the probability of a given pair being coincident ($x_i=x_j$) is $P_\mathrm{c} = \Gamma^{-1}$. Since there are $N(N-1)/2$ pairs in a sequence of length $N$, the expected number of coincident pairs is given by
\begin{equation}
N_\mathrm{c} = \frac{N(N-1)}{2\Gamma}.
\end{equation}
=====Number of states with given number of repeats=====
From the perspective of the states, assume that $k_n$ of the $\Gamma$ states occur exactly $n$ times in the sequence of length $N$. The total number of states is given by
\begin{equation} \Gamma = \sum_n k_n \end{equation}
while the length of the sequence is given by
\begin{equation} N = \sum_n n k_n . \end{equation}
Here, the value of $k_0$ is not directly known from the observed sequence and is linked to the total number of states $\Gamma$. Since each element of the sequence is drawn independently from the states, the total number of possible sequences is
\begin{equation} N_\mathrm{seq} = \Gamma^N. \end{equation}
The number of sequences among $N_\mathrm{seq}$ that give rise to a given combination $\{k_n\}$ can be calculated as the product of two factors. The first is the number of ways to pick which states have $n$ repeats, for all values of $n$,
\begin{equation} \frac{\Gamma !}{\prod_n k_n!}. \end{equation}
The second is the number of ways to assign the positions in the sequence to the occurrences of the states,
\begin{equation} \frac{N!}{\prod_n (n!)^{k_n}}. \end{equation}
Overall, the likelihood of obtaining the combination $\{k_n \mid n \in \mathbb{N} \}$ ($\mathbb{N}$ is the set of positive integers) is
\begin{equation} \frac{\Gamma! N!}{\Gamma^N \prod_n k_n! (n!)^{k_n}}. \end{equation}
In this expression, the number of unobserved states is $k_0 = \Gamma - \sum_{n\in\mathbb{N}}k_n$.
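For small $\Gamma$ and $N$, this counting can be verified by brute force. The following sketch (with illustrative values $\Gamma = 3$, $N = 4$) enumerates all $\Gamma^N$ sequences, tallies the combinations $\{k_n\}$ they realize, and checks each tally against $\Gamma!\,N!/\prod_n k_n!\,(n!)^{k_n}$:
<code python>
import math
from collections import Counter
from itertools import product

Gamma, N = 3, 4  # small illustrative values, so Gamma**N is enumerable

def profile(seq):
    """Return the combination {n: k_n}, including k_0, as a hashable key."""
    k = Counter(Counter(seq).values())  # k[n] = number of states seen n times
    k[0] = Gamma - sum(k.values())      # unobserved states
    return frozenset(k.items())

# Tally how many of the Gamma**N sequences realize each combination {k_n}.
observed = Counter(profile(seq) for seq in product(range(Gamma), repeat=N))

for combo, count in observed.items():
    k = dict(combo)
    predicted = (math.factorial(Gamma) * math.factorial(N)
                 // math.prod(math.factorial(kn) * math.factorial(n) ** kn
                              for n, kn in k.items()))
    assert predicted == count
print("all", len(observed), "combinations match;",
      sum(observed.values()), "sequences =", Gamma ** N)
</code>
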
=====Distribution=====
For a given state, the probability that it is hit by $n$ of the $N$ points is
\begin{align}
P_n & = C^N_n \Gamma^{-n}\left(1-\Gamma^{-1}\right)^{N-n} \\
& = \frac{N!}{n!(N-n)!} \Gamma^{-n}\left(1-\Gamma^{-1}\right)^{N-n} \\
& = \frac{1}{n!}\frac{N(N-1)\ldots(N-n+1)}{\Gamma^n}\left(1-\Gamma^{-1}\right)^{N-n} \\
& \approx \frac{\lambda^n}{n!}\left(1-\frac{\lambda}{N}\right)^N \\
& \approx \frac{\lambda^n}{n!}e^{-\lambda}
\end{align}
which approaches the Poisson distribution with $\lambda \equiv N/\Gamma$ in the limit of large $N$ and $\Gamma$.
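
The accuracy of the Poisson limit is easy to check numerically. A minimal sketch comparing the exact binomial $P_n$ with $\lambda^n e^{-\lambda}/n!$ (the values of $N$ and $\Gamma$ are illustrative):
<code python>
import math

N, Gamma = 10_000, 5_000  # illustrative large-N, large-Gamma regime
lam = N / Gamma           # expected hits per state; lambda = 2 here

for n in range(6):
    # Exact binomial: C(N, n) * Gamma^{-n} * (1 - 1/Gamma)^{N-n}
    binom = math.comb(N, n) * Gamma ** (-n) * (1 - 1 / Gamma) ** (N - n)
    # Poisson limit: lambda^n e^{-lambda} / n!
    poisson = lam ** n * math.exp(-lam) / math.factorial(n)
    print(f"n={n}: binomial={binom:.6f}  poisson={poisson:.6f}")
</code>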