\paragraph{Simulation settings} We want to compare how well the node blocks are
retrieved when some edges are missing (missing edges are labeled \texttt{NA} in
the incidence matrix).

For this purpose we generate collections of networks with the following
parameters:
\begin{align*}
  \bm{\pi}^m = \begin{cases}
    \bm{\pi} = \left( 0.5, 0.3, 0.2 \right) & \text{for } iid\text{-colBiSBM} \\
    \sigma_1^m(\bm{\pi}) & \text{for } \pi\text{-colBiSBM} \text{ and } \pi\rho\text{-colBiSBM}
  \end{cases} \\
  \bm{\rho}^m =
  \begin{cases}
    \bm{\rho} = \left( 0.5, 0.3, 0.2 \right) & \text{for } iid\text{-colBiSBM} \\
    \sigma_2^m(\bm{\rho}) & \text{for } \rho\text{-colBiSBM} \text{ and } \pi\rho\text{-colBiSBM},
  \end{cases}
\end{align*}
for the block proportions, and two different structures with the corresponding
$\bm{\alpha}$,
\begin{align*}
  \bm{\alpha}^{modular} = \begin{pmatrix}
    0.9  & 0.05 & 0.05 \\
    0.05 & 0.2  & 0.05 \\
    0.05 & 0.05 & 0.8
  \end{pmatrix}, &
  ~\bm{\alpha}^{nested} = \begin{pmatrix}
    0.9  & 0.65 & 0.1  \\
    0.35 & 0.15 & 0.05 \\
    0.1  & 0.05 & 0.05
  \end{pmatrix},
\end{align*}
where $\bm{\alpha}^{modular}$ represents networks made of look-alike
communities, which tend to interact preferentially within their own community
and less with the other communities, and $\bm{\alpha}^{nested}$ represents a
structure commonly detected in ecology, with generalist and specialist species
forming a \enquote{nested} pattern.

The collections contain two networks ($M=2$) of sizes $n^{m=1}_1 =
n^{m=1}_2 = 20$ and
$n^{m=2}_1 = n^{m=2}_2 = 120$. A collection is generated under each colBiSBM
model, and the node block memberships (i.e., the row and column blocks the nodes
belong to) are saved.

Per colBiSBM model, 10 such collections are generated and their results are
averaged.

In network $m=1$ (i.e., the smaller one), a proportion $p_{\texttt{NA}}$ of the
edges have their values replaced by \texttt{NA}; the \enquote{forgotten} values
are stored so that predictions can later be compared with them.
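
To give an order of magnitude for these settings, here is a rough worked
example; it is an illustration under stated assumptions, not part of the
simulation protocol. It assumes Bernoulli emissions, i.e., that an edge between
a row node in block $k$ and a column node in block $l$ is present with
probability $\alpha_{kl}$, and it ignores the permutations $\sigma_1^m$ and
$\sigma_2^m$, so it applies as written to the $iid$ model only. Under these
assumptions the expected density of a network is
$\bm{\pi}^\top \bm{\alpha} \bm{\rho}$. With
$\bm{\alpha}^{modular} \bm{\rho} = (0.475, 0.095, 0.2)^\top$ and
$\bm{\alpha}^{nested} \bm{\rho} = (0.665, 0.23, 0.075)^\top$, this gives
\begin{align*}
  \bm{\pi}^\top \bm{\alpha}^{modular} \bm{\rho}
    &= 0.5 \times 0.475 + 0.3 \times 0.095 + 0.2 \times 0.2 = 0.306, \\
  \bm{\pi}^\top \bm{\alpha}^{nested} \bm{\rho}
    &= 0.5 \times 0.665 + 0.3 \times 0.23 + 0.2 \times 0.075 = 0.4165.
\end{align*}
If, as a further assumption, $p_{\texttt{NA}}$ is taken over all
$n^{m=1}_1 \times n^{m=1}_2 = 400$ entries of the incidence matrix of the
smaller network, then $p_{\texttt{NA}} = 0.1$ would hide $40$ entries, of which
about $0.306 \times 40 \approx 12$ would be actual edges in the modular case.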
\paragraph{Test procedure} An LBM is fitted on the first network alone, and the
predicted block memberships are saved, along with the links predicted using the
inferred parameters. This serves as a baseline to assess whether using the
collection benefits the predictions.

A colBiSBM is then fitted on the whole collection (with the model matching the
one used to generate the dataset), and the same predictions are stored.
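
The text does not spell out how links are predicted from the inferred
parameters. One common choice for block models, given here purely as an
assumption, is to score a hidden entry by its expected value under the fitted
model. Writing $\hat{\tau}^m_{ik}$ and $\hat{\nu}^m_{jl}$ (hypothetical
notation, not taken from the rest of the document) for the estimated
probabilities that row node $i$ belongs to row block $k$ and that column node
$j$ belongs to column block $l$, and assuming Bernoulli emissions, the score of
entry $(i, j)$ of network $m$ would be
\begin{align*}
  \hat{X}^m_{ij} = \sum_{k} \sum_{l} \hat{\tau}^m_{ik} \, \hat{\nu}^m_{jl} \,
  \hat{\alpha}_{kl}.
\end{align*}
With hard block assignments this reduces to the entry of $\hat{\bm{\alpha}}$
indexed by the two predicted blocks. Under this sketch, the LBM baseline and the
colBiSBM fit would differ only through the estimates plugged into the formula.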
\paragraph{Quality metrics} To benchmark the performance we use the
\emph{Area Under the Curve} (AUC) for predicted versus real link values and the
\emph{Adjusted Rand Index} (ARI) for predicted versus real block memberships.
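
For reference, both metrics have standard definitions, recalled below. The
notation ($N$, $n_{kl}$, $a_k$, $b_l$, $E$, $s^{+}$, $s^{-}$) is introduced
here for illustration only. Let $(n_{kl})$ be the contingency table between the
predicted and the real memberships of $N$ nodes, with row sums $a_k$ and column
sums $b_l$. The ARI is then
\begin{align*}
  \mathrm{ARI} = \frac{\sum_{k,l} \binom{n_{kl}}{2} - E}
  {\frac{1}{2} \left[ \sum_{k} \binom{a_k}{2} + \sum_{l} \binom{b_l}{2} \right] - E},
  \qquad
  E = \frac{\sum_{k} \binom{a_k}{2} \sum_{l} \binom{b_l}{2}}{\binom{N}{2}}.
\end{align*}
It equals $1$ when the two partitions coincide up to a relabeling of the blocks
and is close to $0$ for a random assignment. The AUC can be read as a ranking
probability: if $s^{+}$ denotes the predicted score of a randomly chosen entry
whose real value is $1$ and $s^{-}$ that of a randomly chosen entry whose real
value is $0$, then
\begin{align*}
  \mathrm{AUC} = \Pr\left( s^{+} > s^{-} \right)
  + \frac{1}{2} \Pr\left( s^{+} = s^{-} \right),
\end{align*}
so that $0.5$ corresponds to chance level and $1$ to a perfect ranking. We
assume here that the AUC is computed on the entries hidden as \texttt{NA}, which
is consistent with the \enquote{forgotten} values being stored.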
\begin{figure}[H]
  \centering
  \includestandalone{tikz/simulations/na_robustness/ari-dim-model}
  \caption{ARI as a function of $p_{\texttt{NA}}$, the proportion of missing
    links, for various colBiSBM models and their LBM counterparts}
  \label{fig:ari-dim-plot-na}
\end{figure}

\begin{figure}[H]
  \centering
  \includestandalone{tikz/simulations/na_robustness/auc-model}
  \caption{AUC as a function of $p_{\texttt{NA}}$, the proportion of missing
    links, for various colBiSBM models and their LBM counterparts}
  \label{fig:auc-plot}
\end{figure}
\paragraph{Results}
Figures~\ref{fig:auc-plot} and~\ref{fig:ari-dim-plot-na} show box plots named
\enquote{sep-\emph{model}}, which correspond to the results given by an LBM
fitted on data generated with the corresponding \emph{model}. We compare the box
plot of each model to the corresponding sep-\emph{model} box plot, which serves
as a baseline.

% TODO the ARI interpretation
In Figure~\ref{fig:ari-dim-plot-na}, our models almost always do at least as
well as their sep counterparts. The $iid$ model is the only one for which the
sep baseline performs better on the column block memberships.
The nested structure seems to make the block membership attribution harder,
with all ARI values below 0.75.

In Figure~\ref{fig:auc-plot}, in almost all cases and for almost all models the
differences are not significant, but our models seem to perform marginally
better and fall below their LBM counterparts only a few times.
This suggests that information is transferred from the bigger network when
estimating the parameters and predicting link values.

Regarding the differences between the nested and modular structures, the latter
shows a smaller variance in the AUC, with the values for our models contained
between 0.7 and 0.9. For the nested structure, the $iid$ and $\pi$ models lie in
quite similar value ranges with small variances, whereas the $\rho$ and
$\pi\rho$ models present smaller values and larger variances.

A possible explanation for the cases in which our models return lower values
than expected may lie in our simulation parameters: combined with the $\rho$
model, they may constitute a difficult case for the estimation.
As we currently do not have identifiability results, this is just a
hypothesis.