From 4198af647613f2b5e54e5993eae2eebf8d4918c9 Mon Sep 17 00:00:00 2001
From: Louis Lacoste
Date: Fri, 16 Aug 2024 18:03:34 +0200
Subject: [PATCH] =?UTF-8?q?conclu=20:=20d=C3=A9placement=20et=20ajout?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 rapport/conclusions.tex | 51 ++++++++++++++++++++++++++++++-----------
 1 file changed, 35 insertions(+), 16 deletions(-)

diff --git a/rapport/conclusions.tex b/rapport/conclusions.tex
index 6eaf639..70c5453 100644
--- a/rapport/conclusions.tex
+++ b/rapport/conclusions.tex
@@ -5,33 +5,52 @@
 \label{sec:conclusion}
 
 \subsection{Difficulties encountered}
-\paragraph{Seed dependance} While using our clustering on
+\label{ssec:difficulties-encountered}
+\paragraph{Seed dependence} While using our clustering on data
+from~\cite{doreRelativeEffectsAnthropogenic2021} we obtained quite interesting
+results but investigating further, we noticed that the clustering on such big
+collections ($M=123$) was not fully reproducible. It depends a lot on the random
+generator seed and as there is no possibility to merge back
+collections\footnote{
+This is due to the need of having same-sized $\bm{\alpha}$ to be able to compute
+the distance, meaning that the networks must have been fitted together
+in the same collection.}
+the clustering dendrograms do not stabilize on large collections.
+This currently prevents us from clustering large collections.
-
-\section{Future work}
-\label{sec:future-work}
-
-\paragraph{Identifiability}
-As stated in section~\ref{sec:model-identifiability}, we only have
-identifiability for the \emph{iid}-colBiSBM and we will work on establishing
-identifiability for $\pi$, $\rho$ and $\pi\rho$ models.
 
-
-\paragraph{Finding a trade-off between \emph{iid} and $\pi\rho$}
+\paragraph{Large penalties with free mixture models}
 We observed while testing clustering with the different models that the $\pi$,
 $\rho$ and $\pi\rho$ model, with their increased number of parameters for block
 memberships parameters tends to give smaller BIC-L criterion values while having
 a higher Evidence Lower Bound than the \emph{iid}.
-This arises because of the penalties on the block memberships and support that
+This arises because of the penalties on the block memberships and supports that
 increase significantly and exceeds the gain on the ELBO and the diminution of
 the connectivity parameters.
-An idea to tackle this problem could be to suppose that the block memberships
+
+\section{Future work}
+\label{sec:future-work}
+
+\paragraph{Fixing seed dependence}
+We are currently investigating the procedure and code to see if reducing or
+escaping seed dependence is possible.
+
+\paragraph{Identifiability}
+As stated in section~\ref{sec:model-identifiability}, we only have
+identifiability for the \emph{iid}-colBiSBM and we will work on establishing
+identifiability for $\pi$, $\rho$ and $\pi\rho$ models, which are the most
+challenging with regard to identifiability.
+
+\paragraph{Finding a trade-off between \emph{iid} and $\pi\rho$}
+An idea to tackle the problem of large penalties with $\pi$, $\rho$ and
+$\pi\rho$ could be to suppose that the block memberships
 for network $m$ are themselves the realizations of random variables and
-thus introduce sort of a mixed effect model.
+thus introduce a sort of mixed-effects model. This may allow a self-penalization
+that could keep the flexibility intended in these models.
 
 \paragraph{Comparison to other graphs clustering methods}
 Recent work have been comparing
-colSBM~\parencite{chabert-liddellLearningCommonStructures2024a} and
-graphclust~\parencite{rebafkaModelbasedClusteringMultiple2023} assessing various
+\texttt{colSBM}~\parencite{chabert-liddellLearningCommonStructures2024a} and
+\texttt{graphclust}~\parencite{rebafkaModelbasedClusteringMultiple2023} assessing various
 capabilities of the models and particularly focusing on networks clustering. We
 will reproduce and adapt the analysis to test other simulation settings that
 were not considered in this work.