\paragraph{Simulation settings} We want to compare how well the node blocks
are retrieved when some edges are missing (missing edges are labeled
\texttt{NA} in the incidence matrix).

For this purpose, we generate collections of networks with the following
parameters:
\begin{align*}
\bm{\pi}^m = \begin{cases}
\bm{\pi} = \left( 0.5, 0.3, 0.2 \right) & \text{for } iid\text{-colBiSBM} \\
\sigma_1^m(\bm{\pi}) & \text{for } \pi\text{-colBiSBM} \text{ and } \pi\rho\text{-colBiSBM}
\end{cases} \\
\bm{\rho}^m =
\begin{cases}
\bm{\rho} = \left( 0.5, 0.3, 0.2 \right) & \text{for } iid\text{-colBiSBM} \\
\sigma_2^m(\bm{\rho}) & \text{for } \rho\text{-colBiSBM} \text{ and } \pi\rho\text{-colBiSBM},
\end{cases}
\end{align*}
for the block proportions, and two different structures with the corresponding
$\bm{\alpha}$,
\begin{align*}
\bm{\alpha}^{modular} = \begin{pmatrix}
0.9 & 0.05 & 0.05 \\
0.05 & 0.2 & 0.05 \\
0.05 & 0.05 & 0.8
\end{pmatrix}, &
\bm{\alpha}^{nested} = \begin{pmatrix}
0.9 & 0.25 & 0.1 \\
0.3 & 0.15 & 0.05 \\
0.1 & 0.05 & 0.05
\end{pmatrix},
\end{align*}
where $\bm{\alpha}^{modular}$ represents networks with look-alike
communities, whose members tend to interact preferentially within their own
community and less with the other communities, and $\bm{\alpha}^{nested}$
represents a structure commonly detected in ecology, with generalist and
specialist species forming a \enquote{nested} structure.

The collections contain two networks ($M=2$) of sizes $n^{m=1}_1 =
n^{m=1}_2 = 40$ and
$n^{m=2}_1 = n^{m=2}_2 = 120$. Collections are generated under each colBiSBM
model, and the block memberships of the nodes (i.e., the row and column blocks
they belong to) are saved.

For each colBiSBM model, 10 collections are generated and their results are
averaged.

In network $m=1$ (i.e., the smaller one), a proportion $p_{\texttt{NA}}$ of
the edges have their values replaced by \texttt{NA}; the \enquote{forgotten}
values are stored.
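As a rough order of magnitude, and assuming that the proportion
$p_{\texttt{NA}}$ is taken over all the entries of the incidence matrix of
network $m=1$ (whether non-edges are masked as well is not specified here, so
this is only an assumption), the number of masked entries would be
\begin{align*}
p_{\texttt{NA}} \times n^{m=1}_1 \times n^{m=1}_2
= p_{\texttt{NA}} \times 40 \times 40
= 1600 \, p_{\texttt{NA}},
\end{align*}
that is, for instance, 160 entries for $p_{\texttt{NA}} = 0.1$ and 1440 entries
for $p_{\texttt{NA}} = 0.9$, to be compared with the $120 \times 120 = 14400$
entries of network $m=2$.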
\paragraph{Test procedure} An LBM is fitted on the first network, and the
predicted block memberships are saved, along with the links predicted using the
inferred parameters. This serves as a baseline to assess whether using the
collection benefits the predictions.

A colBiSBM is then fitted (with the model matching the dataset considered)
and the same predictions are stored.
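How links are predicted from the inferred parameters is not spelled out here.
One common choice for block models, given only as a sketch under assumptions
(the notations $\widehat{p}_{ij}$, $\widehat{\tau}^{row}_{ik}$ and
$\widehat{\tau}^{col}_{jl}$ are hypothetical and introduced for illustration,
and $\bm{\alpha}$ is assumed to hold the block-to-block connection
probabilities), is to predict the probability of a link between row node $i$
and column node $j$ as
\begin{align*}
\widehat{p}_{ij} = \sum_{k} \sum_{l}
\widehat{\tau}^{row}_{ik} \, \widehat{\alpha}_{kl} \, \widehat{\tau}^{col}_{jl},
\end{align*}
where $\widehat{\tau}^{row}_{ik}$ (resp. $\widehat{\tau}^{col}_{jl}$) would be
the estimated probability that row node $i$ (resp. column node $j$) belongs to
row block $k$ (resp. column block $l$), and $\widehat{\alpha}_{kl}$ the
estimated connection probability between these two blocks.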
\paragraph{Quality metrics} To benchmark the performance, we use the
\emph{Area Under the Curve} (AUC) for predicted versus real link values and the
ARI for predicted versus real block memberships.

For the comparison, we subtract the metric obtained with the LBM from the one
obtained with colBiSBM and denote the difference $\Delta\mbox{metric}$.
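Written out for the two metrics, this difference reads
\begin{align*}
\Delta\mbox{AUC} &= \mbox{AUC}_{colBiSBM} - \mbox{AUC}_{LBM}, &
\Delta\mbox{ARI} &= \mbox{ARI}_{colBiSBM} - \mbox{ARI}_{LBM},
\end{align*}
so that a positive value indicates that colBiSBM outperforms the LBM, a
negative value that the LBM does better, and 0 that both perform equally.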
\begin{figure}[ht]
\centering
\input{../tikz/simulations/na_robustness/auc-model}
\caption{$\Delta\mbox{AUC}$ as a function of $p_{\texttt{NA}}$. The dashed red
lines indicate the value 0, for which
$\mbox{AUC}_{LBM} = \mbox{AUC}_{colBiSBM}$.}
\label{fig:auc-plot}
\end{figure}
\begin{figure}[ht]
\centering
\input{../tikz/simulations/na_robustness/ari-dim-model}
\caption{$\Delta\mbox{ARI}$ as a function of $p_{\texttt{NA}}$. The dashed red
lines indicate the value 0, for which
$\mbox{ARI}_{LBM} = \mbox{ARI}_{colBiSBM}$.}
\label{fig:ari-dim-plot-na}
\end{figure}
\paragraph{Results}
In figure~\ref{fig:auc-plot}, one can see that overall the nested structure
seems to benefit most from the collection model, with a generally slightly
higher $\Delta$AUC than the modular one.
In general, however, for $p_{\texttt{NA}}\in[0.1,0.7]$ there is no clear
difference between the LBM and colBiSBM regarding link prediction. The
collection model seems to be most effective for
$p_{\texttt{NA}}\in[0.8,0.9]$.

For the ARI, figure~\ref{fig:ari-dim-plot-na} suggests that the collection model
does at least as well as the LBM and improves the recovery of node memberships
for the modular structure starting from $p_{\texttt{NA}} = 0.7$. Again, the
nested structure benefits from the collection model at smaller
$p_{\texttt{NA}}$ values, but the increases in $\Delta$ARI are also smaller
than those observed for the modular structure.
\clearpage