
\addtocounter{customchapter}{1}
\chapter{Conclusions and future work}
\label{chap:conclusions-and-future-work}
\section{Conclusion}
\label{sec:conclusion}

\subsection{Difficulties encountered}
\label{ssec:difficulties-encountered}
\paragraph{Seed dependence} Applying our clustering to the data
from~\cite{doreRelativeEffectsAnthropogenic2021} gave interesting
results, but further investigation showed that the clustering of such large
collections ($M=123$) is not fully reproducible: the outcome depends strongly
on the seed of the random number generator. Because collections cannot be
merged back together\footnote{
    Computing the distance requires $\bm{\alpha}$ of the same size, which
    means that the networks must have been fitted together
    in the same collection.},
the clustering dendrograms do not stabilize on large collections.
For now, this prevents us from clustering large collections.

\paragraph{Large penalties with free mixture models}
While testing the clustering with the different models, we observed that
the $\pi$, $\rho$ and $\pi\rho$ models, which have more block membership
parameters, tend to give smaller BIC-L values than the \emph{iid} model
even though they reach a higher Evidence Lower Bound (ELBO).
This happens because the penalties on the block memberships and supports
increase significantly and exceed the combined benefit of the gain in ELBO and
of the reduction in the number of connectivity parameters.
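
Schematically, and assuming the BIC-L has the usual penalized form in which
all constants are absorbed into the penalty terms (the notation below is
introduced here for illustration only), the criterion decomposes as
\[
  \mathrm{BIC\mbox{-}L} = \mathrm{ELBO}
    - \mathrm{pen}_{\mathrm{memb}}
    - \mathrm{pen}_{\mathrm{supp}}
    - \mathrm{pen}_{\mathrm{conn}},
\]
where the three penalties are those on the block memberships, the supports and
the connectivity parameters. Writing
$\Delta X = X_{\mathrm{free}} - X_{\mathrm{iid}}$ for the difference between a
free mixture model and the \emph{iid} model, the free mixture model has the
smaller BIC-L exactly when
\[
  \Delta\mathrm{pen}_{\mathrm{memb}} + \Delta\mathrm{pen}_{\mathrm{supp}}
  > \Delta\mathrm{ELBO} - \Delta\mathrm{pen}_{\mathrm{conn}},
\]
with $\Delta\mathrm{ELBO} > 0$ and $\Delta\mathrm{pen}_{\mathrm{conn}} \le 0$
in the situation described above.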

\section{Future work}
\label{sec:future-work}

\paragraph{Fixing seed dependence}
We are currently reviewing the procedure and the code to determine whether
seed dependence can be reduced or eliminated.

\paragraph{Identifiability}
As stated in section~\ref{sec:model-identifiability}, identifiability is
currently established only for the \emph{iid}-colBiSBM. We will work on
establishing it for the $\pi$, $\rho$ and $\pi\rho$ models, which are the most
challenging in this respect.

\paragraph{Finding a trade-off between \emph{iid} and $\pi\rho$}
One idea to tackle the problem of large penalties with the $\pi$, $\rho$ and
$\pi\rho$ models is to assume that the block memberships
of network $m$ are themselves realizations of random variables, which
leads to a kind of mixed-effects model. This may provide a form of
self-penalization while keeping the flexibility intended in these models.
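
As a purely illustrative sketch of this idea, and not a model studied in this
report, one could assume that the block proportions of each network are drawn
from a common Dirichlet distribution. The symbols $\bm{\pi}^{m}$,
$\bm{\pi}^{0}$ and $\kappa$ are hypothetical notation introduced here:
\[
  \bm{\pi}^{m} \mid \bm{\pi}^{0}, \kappa
  \sim \mathrm{Dirichlet}\left(\kappa\, \bm{\pi}^{0}\right),
  \qquad m = 1, \dots, M,
\]
where $\bm{\pi}^{0}$ is a shared vector of mean proportions and $\kappa > 0$
is a concentration parameter. Under this assumption, $\kappa \to \infty$
forces every $\bm{\pi}^{m}$ to equal $\bm{\pi}^{0}$, which corresponds to the
\emph{iid} setting, while smaller values of $\kappa$ let the $\bm{\pi}^{m}$
deviate more from $\bm{\pi}^{0}$.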

\paragraph{Comparison to other graph clustering methods}
Recent work has compared
\texttt{colSBM}~\parencite{chabert-liddellLearningCommonStructures2024a} and
\texttt{graphclust}~\parencite{rebafkaModelbasedClusteringMultiple2023}, assessing various
capabilities of the models with a particular focus on network clustering.
We will reproduce and adapt this analysis to test simulation settings that
were not considered in that work.

\section*{Thank you for reading this work}