Context of this analysis

After performing a netclustering on the raw data, we test whether the structure detected by the clustering is driven by the sampling effort. To do so, we use the CoOPLBM model of Anakok et al. (2022) to complete the data.

The CoOPLBM model assumes that the observed incidence matrix \(R\) is the element-wise product \(R = M \odot N\) of a matrix \(M\) following a latent block model (LBM) and a matrix \(N\) whose elements follow Poisson distributions, independently of \(M\).
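A small simulation of this generative model may help fix ideas. It is a sketch under stated assumptions, not the CoOPLBM implementation: the block labels, the connection probabilities `alpha` and the Poisson parametrisation \(N_{ij} \sim \mathcal{P}(\lambda_i \mu_j)\), with hypothetical row and column sampling efforts \(\lambda_i\) and \(\mu_j\), are illustrative choices, and `simulate_coop_lbm` is a hypothetical name.

```python
import numpy as np


def simulate_coop_lbm(row_groups, col_groups, alpha, lam, mu, rng):
    """Simulate R = M * N under a CoOPLBM-like generative model (illustrative).

    row_groups, col_groups : integer block labels of the rows and columns
    alpha                  : block connection probabilities of the LBM
    lam, mu                : hypothetical row/column sampling efforts,
                             assuming N_ij ~ Poisson(lam_i * mu_j)
    """
    # Connection probability of each (i, j) pair, read from its pair of blocks
    p = alpha[np.ix_(row_groups, col_groups)]
    # Latent "true" incidence matrix M, drawn from the LBM
    M = (rng.random(p.shape) < p).astype(int)
    # Poisson observation counts, drawn independently of M
    N = rng.poisson(np.outer(lam, mu))
    # Observed matrix: element-wise (Hadamard) product
    R = M * N
    return M, N, R


rng = np.random.default_rng(0)
M, N, R = simulate_coop_lbm(
    row_groups=np.array([0, 0, 1]),
    col_groups=np.array([0, 1, 1, 0]),
    alpha=np.array([[0.9, 0.1], [0.2, 0.8]]),
    lam=np.array([1.0, 2.0, 0.5]),
    mu=np.array([1.0, 1.0, 2.0, 0.5]),
    rng=rng,
)
# Binary observed incidence: an interaction is seen only if it exists and N_ij > 0
R_binary = (R > 0).astype(int)
```

The simulation makes the key point of the model visible: a zero in the observed matrix can come either from a true absence (\(M_{ij} = 0\)) or from an existing interaction that was never sampled (\(N_{ij} = 0\)).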

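The model provides the matrix \(\widehat{M}\), whose elements are the probabilities that each interaction exists given the observed data:

\[\widehat{M}_{ij} = \mathbb{P}(M_{ij} = 1 \mid R)\]

Note that if \(R_{ij} = 1\) then \(\widehat{M}_{ij} = 1\), since an interaction can only be observed if it exists.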
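To see where these probabilities come from, consider a single entry under a simplified version of the model in which its LBM connection probability \(p\) and the Poisson mean \(\lambda\) of \(N_{ij}\) are assumed known (CoOPLBM itself must also infer the blocks and parameters). Bayes' rule then gives probability 1 when \(R_{ij} > 0\) and \(\mathbb{P}(M_{ij} = 1 \mid R_{ij} = 0) = \frac{p e^{-\lambda}}{p e^{-\lambda} + 1 - p}\) otherwise. A minimal sketch, with a hypothetical function name:

```python
import math


def posterior_link_probability(r, p, lam):
    """P(M_ij = 1 | R_ij = r) for a single entry.

    Simplifying assumption: the LBM connection probability p of the entry and
    the Poisson mean lam of N_ij are known (CoOPLBM has to estimate them).
    """
    if r > 0:
        # A positive observation is only possible when M_ij = 1
        return 1.0
    # R_ij = 0 arises either from M_ij = 0, or from M_ij = 1 with N_ij = 0
    missed = p * math.exp(-lam)  # P(M_ij = 1, N_ij = 0)
    return missed / (missed + (1.0 - p))


print(posterior_link_probability(2, p=0.5, lam=1.0))  # 1.0
print(posterior_link_probability(0, p=0.5, lam=1.0))  # 1 / (1 + e), about 0.269
print(posterior_link_probability(0, p=0.5, lam=5.0))  # about 0.0067
```

With little sampling effort (small \(\lambda\)) the posterior for an unobserved pair stays close to the prior \(p\); with heavy sampling, an unobserved pair is most likely a true absence.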

This completed matrix can be turned into an input for the colSBM model in several ways.

Threshold-based completions

In the threshold-based approach, the inferred matrix \(\widehat{M}\) obtained by CoOPLBM is binarised with a threshold \(t\) to give a completed incidence matrix \(X\):

\[X_{ij} = \begin{cases} 1 & \text{if } \widehat{M}_{ij} > t \\ 0 & \text{otherwise} \end{cases}\]

Since \(\widehat{M}_{ij} = 1\) whenever \(R_{ij} = 1\), every observed interaction is kept for any \(t < 1\).
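The thresholding rule amounts to one vectorised comparison. The sketch below assumes \(\widehat{M}\) is available as a NumPy array; the function name and the toy values are hypothetical.

```python
import numpy as np


def threshold_completion(M_hat, t):
    """Completed incidence matrix: X_ij = 1 if M_hat_ij > t, else 0."""
    return (np.asarray(M_hat) > t).astype(int)


# Toy inferred matrix (made-up values); entries equal to 1 are observed interactions
M_hat = np.array([[1.0, 0.3, 0.6],
                  [0.1, 1.0, 0.25]])
X_05 = threshold_completion(M_hat, 0.5)  # [[1, 0, 1], [0, 1, 0]]
X_02 = threshold_completion(M_hat, 0.2)  # [[1, 1, 1], [0, 1, 1]]
```

The strict inequality follows the "over the threshold" wording; whether the original pipeline uses \(>\) or \(\geq\) only matters for entries exactly equal to \(t\).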

0.5 completion threshold

Here, the completion threshold is set to \(0.5\).

We first compute, for each colSBM model, the adjusted Rand index (ARI) between the collection ids obtained on the raw data and those obtained on the completed matrix.

colSBM model   ARI with uncompleted data
iid            0.1142823
pi             0.0263660
rho            0.0933340
pirho          0.2158747
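The ARI measures the agreement between two partitions, here the collection ids assigned to the networks, corrected for chance: it equals 1 for identical partitions (up to relabelling) and is around 0, possibly negative, for unrelated ones. The sketch below implements the standard formula in plain Python; the label vectors are hypothetical, and `sklearn.metrics.adjusted_rand_score` offers an equivalent ready-made function.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same items."""
    n = len(labels_a)
    total = comb(n, 2)
    # Pairs of items grouped together in both partitions (contingency table cells)
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs grouped together in each partition separately (row / column sums)
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    if total == 0:
        return 1.0  # fewer than two items: nothing to compare
    expected = sum_a * sum_b / total
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are all singletons
    return (sum_ij - expected) / (max_index - expected)


# Hypothetical collection ids of six networks, from raw and completed data
raw_ids = [0, 0, 1, 1, 2, 2]
completed_ids = [1, 1, 0, 0, 0, 2]
print(adjusted_rand_index(raw_ids, [1, 1, 2, 2, 0, 0]))  # 1.0: same partition, relabelled
print(adjusted_rand_index(raw_ids, completed_ids))        # about 0.444
```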

0.2 completion threshold

The lower \(0.2\) threshold adds many interactions compared to the raw matrix.
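Because every entry with \(\widehat{M}_{ij} > 0.5\) also satisfies \(\widehat{M}_{ij} > 0.2\), lowering the threshold can only add interactions. A small sketch counting the pairs absent from the raw matrix but set to 1 by the completion, on made-up values and with a hypothetical function name:

```python
import numpy as np


def added_interactions(R, M_hat, t):
    """Count pairs absent from the raw matrix R but set to 1 by thresholding M_hat at t."""
    X = np.asarray(M_hat) > t
    return int(np.sum(X & (np.asarray(R) == 0)))


# Made-up raw incidence matrix and inferred probabilities
R = np.array([[1, 0, 0],
              [0, 1, 0]])
M_hat = np.array([[1.0, 0.35, 0.6],
                  [0.25, 1.0, 0.1]])
print(added_interactions(R, M_hat, 0.5))  # 1
print(added_interactions(R, M_hat, 0.2))  # 3
```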

colSBM model   ARI with uncompleted data
iid            0.0429465
pi             0.0330057
rho            0.0187305
pirho          0.0357728

Sample-based completions

Here, the inferred matrix \(\widehat{M}\) is used to sample a new matrix \(X\) whose elements are realisations of Bernoulli distributions with parameters \(\widehat{M}_{ij}\):

\[\mathbb{P}(X_{ij} = 1) = \widehat{M}_{ij}\]
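A minimal sketch of this sampling step, assuming the entries are drawn independently of one another (the text does not state this explicitly) and using a hypothetical function name. Since \(\widehat{M}_{ij} = 1\) for observed interactions, every draw keeps them.

```python
import numpy as np


def sample_completion(M_hat, rng):
    """Draw X with X_ij ~ Bernoulli(M_hat_ij), assuming independent entries."""
    M_hat = np.asarray(M_hat, dtype=float)
    # U < p with U ~ Uniform[0, 1) is a Bernoulli(p) draw: p = 1 always gives 1, p = 0 always 0
    return (rng.random(M_hat.shape) < M_hat).astype(int)


rng = np.random.default_rng(42)
M_hat = np.array([[1.0, 0.3],
                  [0.0, 0.7]])
X = sample_completion(M_hat, rng)  # one random completion; observed links stay 1
# Averaging many draws recovers M_hat approximately
mean_X = np.mean([sample_completion(M_hat, rng) for _ in range(10_000)], axis=0)
```

Unlike thresholding, each draw yields a different completed matrix, so results obtained from a single draw may vary from one draw to the next.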

colSBM model   ARI with uncompleted data
iid            0.0148172
pi             0.0265793
rho            0.0051536
pirho          0.0152299