\paragraph{Simulation settings} We want to compare how well the models recover
the node blocks when some edges are missing (i.e., labeled as \texttt{NA} in the
incidence matrix).
For this purpose, we generate collections of networks with the following
parameters:
\begin{align*}
\bm{\pi}^m = \begin{cases}
\bm{\pi} = \left( 0.5, 0.3, 0.2 \right) & \text{for } iid\text{-}colBiSBM \\
\sigma_1^m(\bm{\pi}) & \text{for } \pi\text{-}colBiSBM \text{ and } \pi\rho\text{-}colBiSBM
\end{cases} \\
\bm{\rho}^m =
\begin{cases}
\bm{\rho} = \left( 0.5, 0.3, 0.2 \right) & \text{for } iid\text{-}colBiSBM \\
\sigma_2^m(\bm{\rho}) & \text{for } \rho\text{-}colBiSBM \text{ and } \pi\rho\text{-}colBiSBM,
\end{cases}
\end{align*}
for the block proportions, and two different structures with the corresponding
$\bm{\alpha}$,
\begin{align*}
\bm{\alpha}^{modular} = \begin{pmatrix}
0.9 & 0.05 & 0.05 \\
0.05 & 0.2 & 0.05 \\
0.05 & 0.05 & 0.8
\end{pmatrix}, &
\bm{\alpha}^{nested} = \begin{pmatrix}
0.9 & 0.25 & 0.1 \\
0.3 & 0.15 & 0.05 \\
0.1 & 0.05 & 0.05
\end{pmatrix},
\end{align*}
where $\bm{\alpha}^{modular}$ represents networks made of look-alike
communities, whose members tend to interact preferentially within their own
community and less with the other communities, and $\bm{\alpha}^{nested}$
represents a structure commonly detected in ecology, with generalist and
specialist species forming a \enquote{nested} pattern.
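As a rough sanity check on these two structures, one can compute the expected
density they induce. The following worked example assumes that the rows of
$\bm{\alpha}$ index the row blocks (proportions $\bm{\pi}$) and its columns the
column blocks (proportions $\bm{\rho}$), and it only covers the $iid$ setting
where $\bm{\pi} = \bm{\rho} = (0.5, 0.3, 0.2)$. With
$\bm{\alpha}^{modular} \bm{\rho} = (0.475, 0.095, 0.2)^\top$ and
$\bm{\alpha}^{nested} \bm{\rho} = (0.545, 0.205, 0.075)^\top$, we get
\begin{align*}
\bm{\pi}^\top \bm{\alpha}^{modular} \bm{\rho} &= 0.5 \times 0.475 + 0.3 \times 0.095 + 0.2 \times 0.2 = 0.306, \\
\bm{\pi}^\top \bm{\alpha}^{nested} \bm{\rho} &= 0.5 \times 0.545 + 0.3 \times 0.205 + 0.2 \times 0.075 = 0.349,
\end{align*}
so, under these assumptions, both structures would yield networks in which
roughly a third of the possible interactions are present.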
Each collection contains two networks, of sizes $n^{m=1}_1 = n^{m=1}_2 = 40$ and
$n^{m=2}_1 = n^{m=2}_2 = 120$. One collection is generated for each $colBiSBM$
model, and the node block memberships (i.e., the row and column blocks each
node belongs to) are saved.
In network $m=1$ (i.e., the smaller one), a proportion $p_{\texttt{NA}}$ of the
edges have their values replaced by \texttt{NA}; the \enquote{forgotten} values
are stored.
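To give an order of magnitude, and assuming that the masking is applied
uniformly at random over all entries of the incidence matrix (an assumption, as
the description above only states that a proportion is replaced), the number of
masked entries in network $m=1$ would be
\begin{align*}
N_{\texttt{NA}} = \left\lfloor p_{\texttt{NA}} \, n^{m=1}_1 \, n^{m=1}_2 \right\rfloor
= \left\lfloor 1600 \, p_{\texttt{NA}} \right\rfloor,
\end{align*}
so that an illustrative value $p_{\texttt{NA}} = 0.1$ (not taken from the
study) would hide $160$ of the $1600$ entries. The symbol $N_{\texttt{NA}}$ and
the rounding convention are introduced here for illustration only.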
\paragraph{Test procedure} An LBM is fitted on the first network, and the
predicted block memberships are saved, along with the links predicted using the
inferred parameters. This serves as a baseline to assess whether using the
collection improves the predictions.
A $colBiSBM$ model is then fitted (the variant matching the one used to generate
the dataset), and the same predictions are stored.
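One common way to turn the inferred parameters into link predictions, given
here as a sketch and not necessarily the exact rule used in this study, is to
average the connectivity parameters over the estimated block memberships.
Writing $\hat{\tau}^1_{ik}$ for the estimated probability that row node $i$
belongs to row block $k$, and $\hat{\tau}^2_{jl}$ for the estimated probability
that column node $j$ belongs to column block $l$ (notation assumed for this
example), the predicted probability of a link for a masked entry $(i,j)$ would
be
\begin{align*}
\hat{p}_{ij} = \sum_{k} \sum_{l} \hat{\tau}^1_{ik} \, \hat{\alpha}_{kl} \, \hat{\tau}^2_{jl},
\end{align*}
and the predicted memberships would be
$\hat{Z}^1_i = \arg\max_k \hat{\tau}^1_{ik}$ and
$\hat{Z}^2_j = \arg\max_l \hat{\tau}^2_{jl}$.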
\paragraph{Quality metrics} To benchmark the performance, we use the
\emph{Area Under the Curve} (AUC) for predicted versus true link values and the
\emph{Adjusted Rand Index} (ARI) for predicted versus true block memberships.
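For reference, both metrics can be written explicitly. The formulas below are
the standard definitions and assume that the AUC is the area under the ROC
curve computed on the masked entries (an assumption about the setup). Let
$\mathcal{P}$ be the set of masked entries whose true value is $1$,
$\mathcal{N}$ the set of those whose true value is $0$, and $\hat{p}_e$ the
predicted score of entry $e$ (notation introduced for this example). Then
\begin{align*}
\mathrm{AUC} = \frac{1}{|\mathcal{P}| \, |\mathcal{N}|}
\sum_{e \in \mathcal{P}} \sum_{e' \in \mathcal{N}}
\left( \mathbf{1}\{\hat{p}_e > \hat{p}_{e'}\}
+ \tfrac{1}{2} \, \mathbf{1}\{\hat{p}_e = \hat{p}_{e'}\} \right),
\end{align*}
which is $1$ when every true link is scored above every non-link and $0.5$ for
uninformative scores. For the ARI, let $n_{ab}$ be the number of nodes in true
block $a$ and predicted block $b$, with row sums $r_a$, column sums $c_b$ and
total $n$. Then
\begin{align*}
\mathrm{ARI} = \frac{\sum_{a,b} \binom{n_{ab}}{2}
- \left[ \sum_{a} \binom{r_a}{2} \sum_{b} \binom{c_b}{2} \right] \Big/ \binom{n}{2}}
{\frac{1}{2} \left[ \sum_{a} \binom{r_a}{2} + \sum_{b} \binom{c_b}{2} \right]
- \left[ \sum_{a} \binom{r_a}{2} \sum_{b} \binom{c_b}{2} \right] \Big/ \binom{n}{2}},
\end{align*}
which equals $1$ for a perfect recovery of the blocks (up to label switching)
and has expectation $0$ under random labeling.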
\paragraph{Results}