rapport-CEI-MIA-2023/Rcodes/real_data/CoOPLBM_completion_analyze.tex

135 lines
4.3 KiB
TeX

\subsection{Completing raw data using CoOPLBM~\parencite{anakokDisentanglingStructureEcological2022}}
\hypertarget{context-of-this-analysis}{%
\subsubsection{Context of this
analysis}\label{context-of-this-analysis}}
After performing a netclustering on the raw data, we will see if the
detect structure resulting in the clustering comes from the sampling
effort. To test this we will use the CoOPLBM model
by\textasciitilde{}\cite{anakokDisentanglingStructureEcological2022} to
complete the data.
\emph{Note:}\textasciitilde{}\cite{anakokDisentanglingStructureEcological2022}
provided data for the networks for which the method was applicable, this
explains that there are fewer networks in the collections.
The CoOPLBM model assumes that the observed incidence matrix \(R\) is an
element-wise product of an \(M\) matrix following an LBM and an \(N\)
matrix which elements follow Poisson distributions independent on \(M\).
The model gives us the \(\widehat{M}\) matrix, the elements of which
are:
\[\widehat{M_{ij}} = \mathbb{P}(M_{ij} = 1)\]
Note that if \(R_{ij} = 1\) then \(\widehat{M_{ij}} = 1\)
\begin{itemize}
\tightlist
\item
1 if the interaction was observed
\item
a probability, that there should be an interaction but it wasn't
observed
\end{itemize}
This \emph{completed matrix} can be used in different manners to be fed
to the colSBM model.
\hypertarget{threshold-based-completions}{%
\subsubsection{Threshold based
completions}\label{threshold-based-completions}}
With the thresholds, the infered incidence matrix obtained by CoOPLBM is
used to generate a completed incidence matrix by the following procedure
: \[X_{ij} = \begin{cases}
1 & \text{if the value is over the threshold} \\
0 & \text{else} \\
\end{cases}\]
\hypertarget{completed-threshold}{%
\paragraph{0.5 completed threshold}\label{completed-threshold}}
Here, the completion threshold is set to \(0.5\).
First we will compute an ARI on the collection id given by the raw data
and the completed matrix.
\begin{table}[h!]
\caption{\label{tab:0.5_ARI}\label{tab:ari-table-0-5-completed} Table of ARI between 0.5 completed data and uncompleted data}
\centering
\begin{tabular}[t]{lr}
\toprule
& ARI with uncompleted data\\
\midrule
$iid\text{-}colSBM$ & 0.11\\
$\pi\text{-}colSBM$ & 0.03\\
$\rho\text{-}colSBM$ & 0.09\\
$\pi\rho\text{-}colSBM$ & 0.22\\
\bottomrule
\end{tabular}
\end{table}
In the table \ref{tab:ari-table-0-5-completed}, one can see the network
clustering obtained after applying CoOPLBM has not much in common with
the clustering of the uncompleted data. Thus we can think that the
completion changed significantly the interactions in the collections.
\hypertarget{number-of-sub-collections-and-details-of-each-sub-collection}{%
\subparagraph{Number of sub-collections and details of each
sub-collection}\label{number-of-sub-collections-and-details-of-each-sub-collection}}
\hypertarget{supplementary-information}{%
\subparagraph{Supplementary
information}\label{supplementary-information}}
\hypertarget{completed-threshold-1}{%
\subsubsection{0.2 completed threshold}\label{completed-threshold-1}}
The \(0.2\) threshold adds a lot of interactions compared to raw matrix.
\begin{table}[h!]
\caption{\label{tab:0.2_ARI}\label{tab:ari-table-0-2-completed} Table of ARI between 0.2 completed data and uncompleted data}
\centering
\begin{tabular}[t]{lr}
\toprule
& ARI with uncompleted data\\
\midrule
$iid\text{-}colSBM$ & 0.04\\
$\pi\text{-}colSBM$ & 0.03\\
$\rho\text{-}colSBM$ & 0.02\\
$\pi\rho\text{-}colSBM$ & 0.04\\
\bottomrule
\end{tabular}
\end{table}
Same as for \(0.5\), after applying CoOPLBM the obtained clustering
doesn't match the uncompleted data.
\hypertarget{sample-based-completions}{%
\subsubsection{Sample based
completions}\label{sample-based-completions}}
The \(M\) matrix is used to sample a new \(X\) matrix which elements are
the realisation of Bernoulli distributions of probability \(M_{i,j}\).
\[\mathbb{P}(X_{i,j} = 1) = M_{i,j} \]
\begin{table}[h!]
\caption{\label{tab:random_ARI}\label{tab:ari-table-random-completed} Table of ARI between
randomly completed data and uncompleted data}
\centering
\begin{tabular}[t]{lr}
\toprule
& ARI with uncompleted data\\
\midrule
$iid\text{-}colSBM$ & 0.01\\
$\pi\text{-}colSBM$ & 0.03\\
$\rho\text{-}colSBM$ & 0.01\\
$\pi\rho\text{-}colSBM$ & 0.02\\
\bottomrule
\end{tabular}
\end{table}