
\section[Capacity to distinguish models]{Capacity to distinguish
  $\pi\rho$-colBiSBM~from\newline
  $iid$-colBiSBM and other
  models}\label{sec:capacity-to-distinguish-pirhotext-colbisbm-from-iidtext-colbisbm-and-other-variants}
The aim of these model selection simulations is to assess how well the
procedure selects the correct colBiSBM model among the possible ones:
\textit{$iid, \pi, \rho, \pi\rho$}. These models differ in whether the row and
column block proportions are allowed to vary between networks.
\paragraph{Simulation settings} For this task, we use the same simulation settings as
\cite{chabert-liddellLearningCommonStructures2024a}.\\
Namely, $n_{1}^{m} = 90$, $n_{2}^{m} = 90$, $Q_1 = Q_2 = 3$, and
$\bm{\alpha}$, $\bm{\pi}$ and $\bm{\rho}$ are set as follows:\\
\begin{minipage}[l]{0.4\linewidth}
    \begin{align*}
        \bm{\alpha} =.25 +  \begin{pmatrix}
                                3 \eps[\alpha] & 2 \eps[\alpha] & \eps[\alpha]   \\
                                2 \eps[\alpha] & 2 \eps[\alpha] & - \eps[\alpha] \\
                                \eps[\alpha]   & - \eps[\alpha] & \eps[\alpha]
                            \end{pmatrix},
    \end{align*}
\end{minipage}
\hfill
\begin{minipage}[r]{0.4\linewidth}
    \begin{align*}
        \bm{\pi}^1 = \begin{pmatrix}
                         \frac{1}{3}, & \frac{1}{3}, & \frac{1}{3}
                     \end{pmatrix}, &  & \bm{\pi}^2 = \sigma\begin{pmatrix}
                                                                \frac{1}{3} - \eps[\pi], & \frac{1}{3}, & \frac{1}{3} + \eps[\pi]
                                                            \end{pmatrix},     \\
        \bm{\rho}^1 = \begin{pmatrix}
                          \frac{1}{3}, & \frac{1}{3}, & \frac{1}{3}
                      \end{pmatrix}, &  & \bm{\rho}^2 = \sigma\begin{pmatrix}
                                                                  \frac{1}{3} - \eps[\rho], & \frac{1}{3}, & \frac{1}{3} + \eps[\rho]
                                                              \end{pmatrix},
    \end{align*}
\end{minipage}
with $\eps[\alpha] = 0.16$, $\eps[\pi]$ and
$\eps[\rho]$ taking 9 values equally spaced in
$\left[ 0, .28\right]$.\newline
We simulate 324 different collections for each
value of $\eps[\pi]$ and $\eps[\rho]$.
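
As a worked illustration of these settings (plain arithmetic on the values
above, not an additional result), substituting $\eps[\alpha] = 0.16$ gives the
connectivity matrix
\begin{align*}
    \bm{\alpha} = \begin{pmatrix}
                      0.73 & 0.57 & 0.41 \\
                      0.57 & 0.57 & 0.09 \\
                      0.41 & 0.09 & 0.41
                  \end{pmatrix},
\end{align*}
and 9 equally spaced values in $\left[ 0, .28\right]$ correspond to a step of
$.28 / 8 = .035$, that is
\begin{align*}
    \eps[\pi], \eps[\rho] \in \left\{ 0, .035, .07, .105, .14, .175, .21, .245, .28 \right\}.
\end{align*}
At the largest value $.28$, the proportions before applying $\sigma$ are
approximately $(0.053, 0.333, 0.613)$, which for $90$ nodes corresponds to
expected block sizes of about $4.8$, $30$ and $55.2$.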

$\pi\rho$-colBiSBM, $\pi$-colBiSBM,
$\rho$-colBiSBM, $iid$-colBiSBM and
$sep\text{-}BiSBM$ are put in competition and the model with the
highest BIC-L is selected as the \emph{preferred model}.
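
Written as a formula, this selection rule can be sketched as follows, where
the symbols $\mathcal{M}$, $\widehat{M}$ and $\text{BIC-L}(M)$ are notation
introduced here for illustration only:
\begin{align*}
    \widehat{M} = \operatorname*{arg\,max}_{M \in \mathcal{M}} \text{BIC-L}(M),
    \qquad
    \mathcal{M} = \left\{ iid, \pi, \rho, \pi\rho, sep \right\},
\end{align*}
with $\text{BIC-L}(M)$ denoting the criterion value obtained when fitting
model $M$ to a given simulated collection.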

When $\eps[\pi] = 0$ and $\eps[\rho] = 0$, that is
$\bm{\pi}^1 = \bm{\pi}^2$ and $\bm{\rho}^1 = \bm{\rho}^2$, the generated
collection is an $iid$-colBiSBM. When only $\eps[\pi] > 0$, that is
$\bm{\pi}^1 \neq \bm{\pi}^2$ and $\bm{\rho}^1 = \bm{\rho}^2$, the model is a
$\pi$-colBiSBM. When only $\eps[\rho] > 0$, that is
$\bm{\rho}^1 \neq \bm{\rho}^2$ and $\bm{\pi}^1 = \bm{\pi}^2$, the model
is a $\rho$-colBiSBM. Finally, when both $\eps[\pi] > 0$ and
$\eps[\rho] > 0$, that is $\bm{\pi}^1 \neq \bm{\pi}^2$ and
$\bm{\rho}^1 \neq \bm{\rho}^2$, the model is a
$\pi\rho$-colBiSBM.


\begin{figure}[!ht]
    \centering
    \includestandalone{tikz/simulations/model_selection/eps-pi-rho-preferred}
    \caption{\label{fig:pref_model_func_eps}Proportion of simulated datasets
        for which each model is selected, as a
        function of $\eps[\pi]$ and $\eps[\rho]$}
\end{figure}

\paragraph{Results}

In Figure~\ref{fig:pref_model_func_eps} and Table~\ref{tab:model-selection}, one can see
a turning point around $\eps[\pi] = 0.2$ (resp.
$\eps[\rho] = 0.2$): below it, $iid$-colBiSBM and
$\rho$-colBiSBM (resp. $\pi$-colBiSBM) are selected
very often, whereas above it $\pi$-colBiSBM (resp.
$\rho$-colBiSBM) and $\pi\rho$-colBiSBM are selected increasingly
often. Moreover, the number of blocks is correctly detected in most
cases.
These two results highlight our capacity to recover the simulated
structure.

Since both $\eps[\pi]$ and $\eps[\rho]$ need to exceed $0.2$ for the $\pi\rho$ model
to be preferred, a strong difference between the block proportions of the
networks may be needed for this model to be selected.