The objective is to estimate $\Gamma$ by sampling the space $\mathbb{X}$. The only information we have is whether two sampled points are the same: $\delta_{x_i,x_j}$.

The consideration of Ma is that the number of repeats, or the number of coincident pairs, in a sequence of length $N$ depends on the space size $\Gamma$. If the sequence has zero correlation time and the elements are drawn independently from the space, the chance of finding a pair to be coincident ($x_i=x_j$) is estimated by $P_\mathrm{c} = \Gamma^{-1}$. Since there are $N(N-1)/2$ pairs in a sequence of length $N$, the expected number of coincident pairs is given by
\begin{equation}
N_\mathrm{c} = \frac{N(N-1)}{2\Gamma}.
\end{equation}
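The relation above can be checked numerically, and inverted to recover $\Gamma \approx N(N-1)/(2N_\mathrm{c})$ from an observed sequence. A quick sketch (the helper names `coincident_pairs` and `estimate_gamma` are illustrative, not from the text):

```python
import random
from collections import Counter

def coincident_pairs(seq):
    """Count pairs (i, j) with i < j and seq[i] == seq[j]."""
    return sum(c * (c - 1) // 2 for c in Counter(seq).values())

def estimate_gamma(seq):
    """Invert N_c = N(N-1)/(2*Gamma) to estimate the space size."""
    n, nc = len(seq), coincident_pairs(seq)
    return n * (n - 1) / (2 * nc) if nc else float('inf')

random.seed(0)
GAMMA, N, TRIALS = 1000, 200, 2000
# Average N_c over many independent uniform sequences.
avg_nc = sum(
    coincident_pairs([random.randrange(GAMMA) for _ in range(N)])
    for _ in range(TRIALS)
) / TRIALS
expected = N * (N - 1) / (2 * GAMMA)  # = 19.9
print(avg_nc, expected)
```

With $N = 200$ and $\Gamma = 1000$, the average coincidence count settles near the predicted $19.9$, and `estimate_gamma` applied to a single long-enough sequence recovers $\Gamma$ to the same accuracy.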
=====Number of states with given number of repeats=====
From the perspective of states, we assume that there are $k_n$ states out of $\Gamma$ that occur $n$ times in the sequence of length $N$. The total number of states is given by
\begin{equation} \Gamma = \sum_n k_n \end{equation}
while the length of the sequence is given by
\begin{equation} N = \sum_n n k_n . \end{equation}
Here, the value of $k_0$ is not directly known from the observed sequence and is linked to the total number of states $\Gamma$. Since each element of the sequence is drawn from the states, the total number of possible sequences is
\begin{equation} N_\mathrm{seq} = \Gamma^N. \end{equation}
The number of sequences among $N_\mathrm{seq}$ that give rise to a given combination $\{k_n\}$ can be calculated as the product of two factors. The first is the number of ways to pick the states with $n$ repeats for all values of $n$,
\begin{equation} \frac{\Gamma !}{\prod_n k_n!}. \end{equation}
The second is the number of ways to assign the positions in the sequence to the occurrences of the states,
\begin{equation} \frac{N!}{\prod_n (n!)^{k_n}}. \end{equation}
Overall, the likelihood of obtaining the combination $\{k_n \mid n \in \mathbb{N} \}$ ($\mathbb{N}$ being the set of positive integers) is
\begin{equation} \frac{\Gamma! N!}{\Gamma^N \prod_n k_n! (n!)^{k_n}}. \end{equation}
In this expression, the number of unobserved states is $k_0 = \Gamma - \sum_{n\in\mathbb{N}}k_n$.
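For small $\Gamma$ and $N$ the counting formula can be verified by brute force: enumerate all $\Gamma^N$ sequences, bucket them by their repeat profile $\{k_n\}$, and compare each bucket size against $\Gamma!\,N! / \prod_n k_n!\,(n!)^{k_n}$ (with the product running over $n \ge 0$, i.e. including $k_0!$). A sketch, with the helper name `n_sequences` chosen here for illustration:

```python
import math
from collections import Counter
from itertools import product

def n_sequences(gamma, N, ks):
    """Number of length-N sequences over gamma states whose repeat
    profile is ks = {n: k_n for n >= 1}; k_0 is implied by gamma."""
    k0 = gamma - sum(ks.values())
    denom = math.factorial(k0)
    for n, k in ks.items():
        denom *= math.factorial(k) * math.factorial(n) ** k
    return math.factorial(gamma) * math.factorial(N) // denom

GAMMA, N = 3, 4
profiles = Counter()
for seq in product(range(GAMMA), repeat=N):  # all Gamma^N sequences
    occ = Counter(seq)                        # hits per observed state
    ks = Counter(occ.values())                # k_n for n >= 1
    profiles[frozenset(ks.items())] += 1

for key, observed in profiles.items():
    assert observed == n_sequences(GAMMA, N, dict(key))
print(sum(profiles.values()), GAMMA ** N)     # both 81
```

For example, with $\Gamma=3$, $N=4$, the profile $k_2=1$, $k_1=2$ (one state twice, two states once) accounts for $3!\,4!/(0!\,1!\,2!\cdot 2!) = 36$ of the $81$ sequences.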
=====Distribution=====
For a given state, the probability that it is hit by $n$ points is
\begin{align}
P_n & = C^N_n \Gamma^{-n}\left(1-\Gamma^{-1}\right)^{N-n} \\
& = \frac{N!}{n!(N-n)!} \Gamma^{-n}\left(1-\Gamma^{-1}\right)^{N-n} \\
& = \frac{1}{n!}\frac{N(N-1)\cdots(N-n+1)}{\Gamma^n}\left(1-\Gamma^{-1}\right)^{N-n} \\
& \approx \frac{\lambda^n}{n!}\left(1-\frac{\lambda}{N}\right)^N \\
& \approx \frac{\lambda^n}{n!}e^{-\lambda},
\end{align}
approaching the Poisson distribution with $\lambda \equiv N/\Gamma$ in the large-$N$, large-$\Gamma$ limit.
