\hypertarget{context-of-this-analysis}{%
\subsubsection{Context of this
analysis}\label{context-of-this-analysis}}

After performing a network clustering on the raw data, we check whether
the structure detected by the clustering is an artefact of the sampling
effort. To test this, we use the CoOPLBM model of
\cite{anakokDisentanglingStructureEcological2022} to complete the data.

The CoOPLBM model assumes that the observed incidence matrix \(R\) is the
element-wise product of a matrix \(M\) following a latent block model
(LBM) and a matrix \(N\) whose elements follow Poisson distributions
independent of \(M\).

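
In symbols, and writing \(\odot\) for the element-wise product (a
notation introduced here for illustration), the assumption above reads:

\[R = M \odot N, \qquad R_{ij} = M_{ij} N_{ij}\]

Since \(M_{ij} \in \{0, 1\}\), a non-zero entry \(R_{ij}\) implies
\(M_{ij} = 1\), whereas a zero entry is ambiguous: either the
interaction does not exist (\(M_{ij} = 0\)) or it exists but was not
sampled (\(M_{ij} = 1\) and \(N_{ij} = 0\)).
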
The model gives us the matrix \(\widehat{M}\), whose elements are:

\[\widehat{M}_{ij} = \mathbb{P}(M_{ij} = 1)\]

Note that if \(R_{ij} = 1\) then \(\widehat{M}_{ij} = 1\). Each element
\(\widehat{M}_{ij}\) is therefore:

\begin{itemize}
\tightlist
\item
  1 if the interaction was observed
\item
  the probability that an interaction exists although it was not
  observed
\end{itemize}

This \emph{completed matrix} can be fed to the colSBM model in several
ways.

\hypertarget{threshold-based-completions}{%
\subsubsection{Threshold based
completions}\label{threshold-based-completions}}

For the threshold-based completions, the inferred matrix
\(\widehat{M}\) obtained by CoOPLBM is used to generate a completed
incidence matrix \(X\) by the following procedure:
\[X_{ij} = \begin{cases}
1 & \text{if } \widehat{M}_{ij} \text{ is over the threshold} \\
0 & \text{otherwise}
\end{cases}\]

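
As a worked illustration with made-up values (not taken from the data),
consider a \(2 \times 2\) matrix of inferred probabilities. A threshold
of \(0.5\) and a threshold of \(0.2\) then give:

\[\begin{pmatrix} 1 & 0.6 \\ 0.3 & 0.1 \end{pmatrix}
\xrightarrow{\;0.5\;}
\begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix},
\qquad
\begin{pmatrix} 1 & 0.6 \\ 0.3 & 0.1 \end{pmatrix}
\xrightarrow{\;0.2\;}
\begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}\]

Lowering the threshold can only add interactions to the completed
matrix, never remove them, and observed interactions (value \(1\)) are
kept for any threshold below \(1\).
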
\hypertarget{completed-threshold}{%
\paragraph{0.5 completed threshold}\label{completed-threshold}}

Here, the completion threshold is set to \(0.5\).

First, we compute the adjusted Rand index (ARI) between the collection
ids obtained from the raw data and those obtained from the completed
matrix.

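
For reference, and assuming the standard definition of Hubert and
Arabie is the one used here, the ARI between two partitions of the same
\(n\) networks is computed from their contingency table, where
\(n_{uv}\) counts the networks placed in group \(u\) by the first
partition and in group \(v\) by the second, with row sums \(a_u\) and
column sums \(b_v\) (symbols introduced here for illustration):

\[\mathrm{ARI} =
\frac{\sum_{u,v} \binom{n_{uv}}{2}
      - \left[\sum_{u} \binom{a_u}{2} \sum_{v} \binom{b_v}{2}\right] \Big/ \binom{n}{2}}
     {\frac{1}{2}\left[\sum_{u} \binom{a_u}{2} + \sum_{v} \binom{b_v}{2}\right]
      - \left[\sum_{u} \binom{a_u}{2} \sum_{v} \binom{b_v}{2}\right] \Big/ \binom{n}{2}}\]

An ARI of \(1\) means the two partitions are identical, while values
close to \(0\) indicate agreement no better than expected by chance.
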
\begin{longtable}[]{@{}lr@{}}
\toprule
Model & ARI with uncompleted data\tabularnewline
\midrule
\endhead
iid & 0.1142823\tabularnewline
pi & 0.0263660\tabularnewline
rho & 0.0933340\tabularnewline
pirho & 0.2158747\tabularnewline
\bottomrule
\end{longtable}

The table above shows that the network clustering obtained after
applying CoOPLBM has little in common with the clustering of the
uncompleted data: every ARI is below \(0.22\).

\hypertarget{number-of-sub-collections-and-details-of-each-sub-collection}{%
\subparagraph{Number of sub-collections and details of each
sub-collection}\label{number-of-sub-collections-and-details-of-each-sub-collection}}

\hypertarget{completed-threshold-1}{%
\subsubsection{0.2 completed threshold}\label{completed-threshold-1}}

The \(0.2\) threshold adds many interactions compared to the raw matrix.

\begin{longtable}[]{@{}lr@{}}
\toprule
Model & ARI with uncompleted data\tabularnewline
\midrule
\endhead
iid & 0.0429465\tabularnewline
pi & 0.0330057\tabularnewline
rho & 0.0187305\tabularnewline
pirho & 0.0357728\tabularnewline
\bottomrule
\end{longtable}

As with the \(0.5\) threshold, the clustering obtained after applying
CoOPLBM does not match that of the uncompleted data.

\hypertarget{sample-based-completions}{%
\subsubsection{Sample based
completions}\label{sample-based-completions}}

The matrix \(\widehat{M}\) is used to sample a new matrix \(X\) whose
elements are realisations of Bernoulli distributions with probability
\(\widehat{M}_{ij}\):
\[\mathbb{P}(X_{ij} = 1) = \widehat{M}_{ij}\]

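
Writing \(p_{ij}\) for the Bernoulli probability of entry \((i, j)\) (a
shorthand introduced here), each sampled entry satisfies:

\[\mathbb{E}[X_{ij}] = p_{ij}, \qquad
\operatorname{Var}(X_{ij}) = p_{ij}\,(1 - p_{ij})\]

Entries with \(p_{ij} = 1\), which is the case for observed
interactions, are therefore always kept, and only the unobserved
entries vary from one draw to the next. Because the completion is
random, results computed from it can differ between draws.
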
\begin{longtable}[]{@{}lr@{}}
\toprule
Model & ARI with uncompleted data\tabularnewline
\midrule
\endhead
iid & 0.0148172\tabularnewline
pi & 0.0265793\tabularnewline
rho & 0.0051536\tabularnewline
pirho & 0.0152299\tabularnewline
\bottomrule
\end{longtable}