135 lines
4.3 KiB
TeX
135 lines
4.3 KiB
TeX
\subsection{Completing raw data using CoOPLBM~\parencite{anakokDisentanglingStructureEcological2022}}
|
|
|
|
\hypertarget{context-of-this-analysis}{%
|
|
\subsubsection{Context of this
|
|
analysis}\label{context-of-this-analysis}}
|
|
|
|
After performing a netclustering on the raw data, we will see if the
|
|
detect structure resulting in the clustering comes from the sampling
|
|
effort. To test this we will use the CoOPLBM model
|
|
by\textasciitilde{}\cite{anakokDisentanglingStructureEcological2022} to
|
|
complete the data.
|
|
|
|
\emph{Note:}\textasciitilde{}\cite{anakokDisentanglingStructureEcological2022}
|
|
provided data for the networks for which the method was applicable, this
|
|
explains that there are fewer networks in the collections.
|
|
|
|
The CoOPLBM model assumes that the observed incidence matrix \(R\) is an
|
|
element-wise product of an \(M\) matrix following an LBM and an \(N\)
|
|
matrix which elements follow Poisson distributions independent on \(M\).
|
|
|
|
The model gives us the \(\widehat{M}\) matrix, the elements of which
|
|
are:
|
|
|
|
\[\widehat{M_{ij}} = \mathbb{P}(M_{ij} = 1)\]
|
|
|
|
Note that if \(R_{ij} = 1\) then \(\widehat{M_{ij}} = 1\)
|
|
|
|
\begin{itemize}
|
|
\tightlist
|
|
\item
|
|
1 if the interaction was observed
|
|
\item
|
|
a probability, that there should be an interaction but it wasn't
|
|
observed
|
|
\end{itemize}
|
|
|
|
This \emph{completed matrix} can be used in different manners to be fed
|
|
to the colSBM model.
|
|
|
|
\hypertarget{threshold-based-completions}{%
|
|
\subsubsection{Threshold based
|
|
completions}\label{threshold-based-completions}}
|
|
|
|
With the thresholds, the infered incidence matrix obtained by CoOPLBM is
|
|
used to generate a completed incidence matrix by the following procedure
|
|
: \[X_{ij} = \begin{cases}
|
|
1 & \text{if the value is over the threshold} \\
|
|
0 & \text{else} \\
|
|
\end{cases}\]
|
|
|
|
\hypertarget{completed-threshold}{%
|
|
\paragraph{0.5 completed threshold}\label{completed-threshold}}
|
|
|
|
Here, the completion threshold is set to \(0.5\).
|
|
|
|
First we will compute an ARI on the collection id given by the raw data
|
|
and the completed matrix.
|
|
|
|
\begin{table}[h!]
|
|
|
|
\caption{\label{tab:0.5_ARI}\label{tab:ari-table-0-5-completed} Table of ARI between 0.5 completed data and uncompleted data}
|
|
\centering
|
|
\begin{tabular}[t]{lr}
|
|
\toprule
|
|
& ARI with uncompleted data\\
|
|
\midrule
|
|
$iid\text{-}colSBM$ & 0.11\\
|
|
$\pi\text{-}colSBM$ & 0.03\\
|
|
$\rho\text{-}colSBM$ & 0.09\\
|
|
$\pi\rho\text{-}colSBM$ & 0.22\\
|
|
\bottomrule
|
|
\end{tabular}
|
|
\end{table}
|
|
|
|
In the table \ref{tab:ari-table-0-5-completed}, one can see the network
|
|
clustering obtained after applying CoOPLBM has not much in common with
|
|
the clustering of the uncompleted data. Thus we can think that the
|
|
completion changed significantly the interactions in the collections.
|
|
|
|
\hypertarget{number-of-sub-collections-and-details-of-each-sub-collection}{%
|
|
\subparagraph{Number of sub-collections and details of each
|
|
sub-collection}\label{number-of-sub-collections-and-details-of-each-sub-collection}}
|
|
|
|
\hypertarget{supplementary-information}{%
|
|
\subparagraph{Supplementary
|
|
information}\label{supplementary-information}}
|
|
|
|
\hypertarget{completed-threshold-1}{%
|
|
\subsubsection{0.2 completed threshold}\label{completed-threshold-1}}
|
|
|
|
The \(0.2\) threshold adds a lot of interactions compared to raw matrix.
|
|
|
|
\begin{table}[h!]
|
|
|
|
\caption{\label{tab:0.2_ARI}\label{tab:ari-table-0-2-completed} Table of ARI between 0.2 completed data and uncompleted data}
|
|
\centering
|
|
\begin{tabular}[t]{lr}
|
|
\toprule
|
|
& ARI with uncompleted data\\
|
|
\midrule
|
|
$iid\text{-}colSBM$ & 0.04\\
|
|
$\pi\text{-}colSBM$ & 0.03\\
|
|
$\rho\text{-}colSBM$ & 0.02\\
|
|
$\pi\rho\text{-}colSBM$ & 0.04\\
|
|
\bottomrule
|
|
\end{tabular}
|
|
\end{table}
|
|
|
|
Same as for \(0.5\), after applying CoOPLBM the obtained clustering
|
|
doesn't match the uncompleted data.
|
|
|
|
\hypertarget{sample-based-completions}{%
|
|
\subsubsection{Sample based
|
|
completions}\label{sample-based-completions}}
|
|
|
|
The \(M\) matrix is used to sample a new \(X\) matrix which elements are
|
|
the realisation of Bernoulli distributions of probability \(M_{i,j}\).
|
|
\[\mathbb{P}(X_{i,j} = 1) = M_{i,j} \]
|
|
|
|
\begin{table}[h!]
|
|
|
|
\caption{\label{tab:random_ARI}\label{tab:ari-table-random-completed} Table of ARI between
|
|
randomly completed data and uncompleted data}
|
|
\centering
|
|
\begin{tabular}[t]{lr}
|
|
\toprule
|
|
& ARI with uncompleted data\\
|
|
\midrule
|
|
$iid\text{-}colSBM$ & 0.01\\
|
|
$\pi\text{-}colSBM$ & 0.03\\
|
|
$\rho\text{-}colSBM$ & 0.01\\
|
|
$\pi\rho\text{-}colSBM$ & 0.02\\
|
|
\bottomrule
|
|
\end{tabular}
|
|
\end{table}
|