\section[Capacity to distinguish models]{Capacity to distinguish
$\pi\rho\text{-}colBiSBM$~from\newline
$iid\text{-}colBiSBM$ and other
models}\label{sec:capacity-to-distinguish-pirhotext-colbisbm-from-iidtext-colbisbm-and-other-variants}
The aim of these model selection simulations is to assess how well the
procedure selects the correct \emph{colBiSBM} model among the possible ones:
$iid$, $\pi$, $\rho$ and $\pi\rho$. These models differ in their assumptions
on the row and column block proportions.
\paragraph{Simulation settings} For this task we use the same simulation settings as
\cite{chabert-liddellLearningCommonStructures2024a},
namely $n_{1}^{m} = n_{2}^{m} = 90$ and $Q_1 = Q_2 = 3$, with
$\bm{\alpha}$, $\bm{\pi}$ and $\bm{\rho}$ set as follows:\\
\begin{minipage}[l]{0.4\linewidth}
\begin{align*}
\bm{\alpha} =.25 + \begin{pmatrix}
3 \eps[\alpha] & 2 \eps[\alpha] & \eps[\alpha] \\
2 \eps[\alpha] & 2 \eps[\alpha] & - \eps[\alpha] \\
\eps[\alpha] & - \eps[\alpha] & \eps[\alpha]
\end{pmatrix},
\end{align*}
\end{minipage}
\hfill
\begin{minipage}[r]{0.4\linewidth}
\begin{align*}
\bm{\pi}^1 = \begin{pmatrix}
\frac{1}{3}, & \frac{1}{3}, & \frac{1}{3}
\end{pmatrix}, & & \bm{\pi}^2 = \sigma\begin{pmatrix}
\frac{1}{3} - \eps[\pi], & \frac{1}{3}, & \frac{1}{3} + \eps[\pi]
\end{pmatrix}, \\
\bm{\rho}^1 = \begin{pmatrix}
\frac{1}{3}, & \frac{1}{3}, & \frac{1}{3}
\end{pmatrix}, & & \bm{\rho}^2 = \sigma\begin{pmatrix}
\frac{1}{3} - \eps[\rho], & \frac{1}{3}, & \frac{1}{3} + \eps[\rho]
\end{pmatrix},
\end{align*}
\end{minipage}
with $\eps[\alpha] = 0.16$, $\eps[\pi]$ and
$\eps[\rho]$ taking 9 values equally spaced in
$\left[ 0, .28\right]$.\newline
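As a worked instance of these settings (plain arithmetic on the values above;
the grid assumes that both endpoints of the interval are included),
$\eps[\alpha] = 0.16$ gives
\begin{align*}
\bm{\alpha} = \begin{pmatrix}
0.73 & 0.57 & 0.41 \\
0.57 & 0.57 & 0.09 \\
0.41 & 0.09 & 0.41
\end{pmatrix},
\end{align*}
and the 9 equally spaced values in $\left[ 0, .28\right]$ are separated by a
step of $0.28 / 8 = 0.035$:
\begin{align*}
\eps[\pi], \eps[\rho] \in \left\{ 0,\ 0.035,\ 0.07,\ 0.105,\ 0.14,\ 0.175,\ 0.21,\ 0.245,\ 0.28 \right\}.
\end{align*}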
We simulate 324 different collections for each
value of $\eps[\pi]$ and $\eps[\rho]$.
The models $\pi\rho\text{-}colBiSBM$, $\pi\text{-}colBiSBM$,
$\rho\text{-}colBiSBM$, $iid\text{-}colBiSBM$ and
$sep\text{-}BiSBM$ are put in competition, and the model with the
highest BIC-L is selected as the \emph{preferred model}.
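Using notation introduced here for illustration only ($\mathcal{M}$ for the
set of candidate models and $\widehat{M}$ for the preferred one, neither of
which is defined elsewhere in this section), this selection rule can be
sketched as
\begin{align*}
\widehat{M} = \operatorname*{arg\,max}_{M \in \mathcal{M}} \mathrm{BIC\text{-}L}(M),
\qquad \mathcal{M} = \left\{ \pi\rho,\ \pi,\ \rho,\ iid,\ sep \right\}.
\end{align*}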
When $\eps[\pi] = \eps[\rho] = 0$, we have $\bm{\pi}^1 = \bm{\pi}^2$
and $\bm{\rho}^1 = \bm{\rho}^2$, so the generated collection is an
$iid\text{-}colBiSBM$. When only $\eps[\pi] > 0$, i.e.\
$\bm{\pi}^1 \neq \bm{\pi}^2$ while $\bm{\rho}^1 = \bm{\rho}^2$, the model
is a $\pi\text{-}colBiSBM$.
Symmetrically, when only $\eps[\rho] > 0$, i.e.\
$\bm{\rho}^1 \neq \bm{\rho}^2$ while $\bm{\pi}^1 = \bm{\pi}^2$, the model
is a $\rho\text{-}colBiSBM$. Finally, when both $\eps[\pi] > 0$ and
$\eps[\rho] > 0$, i.e.\ $\bm{\pi}^1 \neq \bm{\pi}^2$ and
$\bm{\rho}^1 \neq \bm{\rho}^2$, the model is a
$\pi\rho\text{-}colBiSBM$.
\begin{figure}[!ht]
\centering
\input{../tikz/simulations/model_selection/eps-pi-rho-preferred.tex}
\caption{\label{fig:pref_model_func_eps}Proportion of datasets for which each
model is selected, as a function of $\eps[\pi]$ and $\eps[\rho]$}
\end{figure}
\paragraph{Results}
In Figure~\ref{fig:pref_model_func_eps} and Table~\ref{tab:model-selection}, one can see
a turning point around $\eps[\pi] = 0.2$ (resp.
$\eps[\rho] = 0.2$): below it, $iid\text{-}colBiSBM$ and
$\rho\text{-}colBiSBM$ (resp. $\pi\text{-}colBiSBM$) are selected
very often, whereas above it $\pi\text{-}colBiSBM$ (resp.
$\rho\text{-}colBiSBM$) and $\pi\rho\text{-}colBiSBM$ are selected
increasingly often. Moreover, the number of blocks is correctly detected in
most cases.
These two results highlight our capacity to recover the simulated
structure.
Since $\eps[\pi]$ and $\eps[\rho]$ both need to exceed $0.2$ for the $\pi\rho$
model to be preferred, a strong difference between block proportions may be
needed for this model to be selected.