
\paragraph{Simulation settings} We want to assess how well the node block
memberships are recovered when some edges are missing (such edges are labeled
\texttt{NA} in the incidence matrix).

For this purpose we generate collections of networks with the following
parameters:
\begin{align*}
    \bm{\pi}^m = \begin{cases}
                     \bm{\pi} = \left( 0.5, 0.3, 0.2 \right) & \text{for } iid\text{-colBiSBM}                                      \\
                     \sigma_1^m(\bm{\pi})                    & \text{for } \pi\text{-colBiSBM} \text{ and } \pi\rho\text{-colBiSBM}
                 \end{cases} \\
    \bm{\rho}^m =
    \begin{cases}
        \bm{\rho}  = \left( 0.5, 0.3, 0.2 \right) & \text{for } iid\text{-colBiSBM}                                        \\
        \sigma_2^m(\bm{\rho})                     & \text{for } \rho\text{-colBiSBM} \text{ and } \pi\rho\text{-colBiSBM},
    \end{cases}
\end{align*}
for the block proportions, and two different structures with the corresponding
$\bm{\alpha}$,
\begin{align*}
    \bm{\alpha}^{modular} = \begin{pmatrix}
                                0.9  & 0.05 & 0.05 \\
                                0.05 & 0.2  & 0.05 \\
                                0.05 & 0.05 & 0.8
                            \end{pmatrix}, &
    \bm{\alpha}^{nested} = \begin{pmatrix}
                               0.9 & 0.25 & 0.1  \\
                               0.3 & 0.15 & 0.05 \\
                               0.1 & 0.05 & 0.05
                           \end{pmatrix},
\end{align*}

where $\bm{\alpha}^{modular}$ represents networks made of communities of
similar nodes that interact preferentially within their own community and
less with the other communities, and $\bm{\alpha}^{nested}$ represents a
structure commonly detected in ecology, with generalist and specialist species
forming a \enquote{nested} pattern.
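
As a rough order of magnitude, and assuming binary networks with Bernoulli
edges (an assumption made here for illustration only), the expected density of
a network in the $iid$ setting is $\bm{\pi}^\top \bm{\alpha} \bm{\rho}$. With
$\bm{\pi} = \bm{\rho} = \left( 0.5, 0.3, 0.2 \right)$ this gives
\begin{align*}
    \bm{\pi}^\top \bm{\alpha}^{modular} \bm{\rho} & = 0.5 \times 0.475 + 0.3 \times 0.095 + 0.2 \times 0.2 = 0.306, \\
    \bm{\pi}^\top \bm{\alpha}^{nested} \bm{\rho}  & = 0.5 \times 0.545 + 0.3 \times 0.205 + 0.2 \times 0.075 = 0.349,
\end{align*}
where the second factor of each product is the corresponding entry of
$\bm{\alpha} \bm{\rho}$. Under this assumption the two structures yield
networks of comparable density.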

Each collection contains two networks ($M=2$) of sizes $n^{m=1}_1 =
    n^{m=1}_2 = 40$ and
$n^{m=2}_1 = n^{m=2}_2 = 120$. One collection is generated for each colBiSBM
model, and the node block memberships (i.e., the row and column blocks each
node belongs to) are saved.

This generation is repeated 10 times per colBiSBM model, and the results are
averaged over the 10 collections.

In network $m=1$ (i.e., the smaller one), a proportion $p_{\texttt{NA}}$ of the
edges have their values replaced by \texttt{NA}; the \enquote{forgotten} values
are stored so that the predictions can later be compared with them.
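
For illustration, the incidence matrix of network $m=1$ has
$n^{m=1}_1 \times n^{m=1}_2 = 40 \times 40 = 1600$ entries. Taking the
hypothetical value $p_{\texttt{NA}} = 0.2$ (chosen here only as an example, not
a value used in the experiments), and assuming that the masked entries are
drawn uniformly among all row-column pairs, the number of masked values is
\begin{align*}
    p_{\texttt{NA}} \times n^{m=1}_1 \times n^{m=1}_2 = 0.2 \times 1600 = 320,
\end{align*}
which leaves $1280$ observed entries in the smaller network.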

\paragraph{Test procedure} An LBM is fitted on the first network alone, and the
predicted block memberships are saved, along with the links predicted from the
inferred parameters. This serves as a baseline to assess whether using the
collection improves the predictions.

A colBiSBM is then fitted, using the model that matches the one used to
generate the dataset, and the same predictions are stored.
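
The exact prediction rule is not detailed in this section. A common choice,
given here as a sketch under the assumption of Bernoulli edges, is the plug-in
estimate of the connection probability of a masked entry $(i, j)$ of network
$m=1$:
\begin{align*}
    \widehat{X}_{ij} = \sum_{q} \sum_{r} \widehat{\tau}^{(1)}_{iq} \, \widehat{\tau}^{(2)}_{jr} \, \widehat{\alpha}_{qr},
\end{align*}
where $\widehat{\tau}^{(1)}_{iq}$ (resp.\ $\widehat{\tau}^{(2)}_{jr}$) denotes
the estimated probability that row node $i$ belongs to row block $q$ (resp.\
that column node $j$ belongs to column block $r$), and $\widehat{\bm{\alpha}}$
is the estimated connectivity matrix. The symbols $\widehat{X}$ and
$\widehat{\tau}$ are notation introduced for this illustration. With hard
block assignments, the sum reduces to the single entry of
$\widehat{\bm{\alpha}}$ indexed by the predicted blocks of $i$ and $j$.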

\paragraph{Quality metrics} To benchmark performance, we use the
\emph{Area Under the Curve} (AUC) for predicted versus true link values and the
ARI for predicted versus true block memberships.
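
As a reminder, and assuming that the AUC is the area under the ROC curve
computed on the masked entries, both metrics can be written explicitly. Let
$P$ and $N$ be the sets of masked entries whose true value is $1$ and $0$
respectively, and let $s_e$ be the predicted score of entry $e$ (these symbols
are introduced here for illustration). Then
\begin{align*}
    \mathrm{AUC} = \frac{1}{|P| \, |N|} \sum_{e \in P} \sum_{e' \in N}
    \left( \mathbf{1}\{ s_e > s_{e'} \} + \tfrac{1}{2} \, \mathbf{1}\{ s_e = s_{e'} \} \right),
\end{align*}
so that $\mathrm{AUC} = 0.5$ corresponds to random predictions and
$\mathrm{AUC} = 1$ to a perfect ranking. For the ARI, let $n_{kl}$ be the
number of nodes placed in block $k$ by the true clustering and in block $l$ by
the predicted one, with margins $a_k = \sum_l n_{kl}$ and
$b_l = \sum_k n_{kl}$, and $n = \sum_{k,l} n_{kl}$. Then
\begin{align*}
    \mathrm{ARI} = \frac{\sum_{k,l} \binom{n_{kl}}{2} - \left. \sum_k \binom{a_k}{2} \sum_l \binom{b_l}{2} \middle/ \binom{n}{2} \right.}
    {\frac{1}{2} \left[ \sum_k \binom{a_k}{2} + \sum_l \binom{b_l}{2} \right] - \left. \sum_k \binom{a_k}{2} \sum_l \binom{b_l}{2} \middle/ \binom{n}{2} \right.},
\end{align*}
which equals $1$ when the two clusterings coincide up to label switching and is
close to $0$ for independent clusterings.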

\begin{figure}[ht]
    \centering
    \includestandalone{tikz/simulations/na_robustness/auc-model}
    \caption{}
    \label{fig:auc-plot}
\end{figure}

\begin{figure}[ht]
    \centering
    \includestandalone{tikz/simulations/na_robustness/ari-dim-model}
    \caption{}
    \label{fig:ari-dim-plot-na}
\end{figure}


\paragraph{Results}
In Figures~\ref{fig:auc-plot} and~\ref{fig:ari-dim-plot-na}, each box plot
named \enquote{sep-\emph{model}} corresponds to the results given by an LBM
fitted on data generated with the corresponding \emph{model}. These
\enquote{sep} box plots serve as a baseline for model-by-model comparisons.