From cd793e3094288f733c061fd7e9ea5a685b63b167 Mon Sep 17 00:00:00 2001 From: Louis Lacoste Date: Thu, 11 Jul 2024 23:52:59 +0200 Subject: [PATCH] report: text edits --- rapport/chapter2-context.tex | 14 ++++---- rapport/chapter3-structure-detection.tex | 45 ++++++++++++++++-------- 2 files changed, 38 insertions(+), 21 deletions(-) diff --git a/rapport/chapter2-context.tex b/rapport/chapter2-context.tex index 8de8272..718f07b 100644 --- a/rapport/chapter2-context.tex +++ b/rapport/chapter2-context.tex @@ -81,15 +81,15 @@ interactions, the rows are pollinator species and the columns are plant species, and the intersection is a value, binary if it is a presence/absence or a value if it is an abundance count. -Bipartite graphs are widely used in biology, in various fields, among which the -previously cited ecological networks, but also in medicine with biomedical -networks, biomolecular networks or epidemiological networks. +Bipartite graphs are widely used across biology, in fields that include the +previously cited ecological networks, but also in medicine with +biomedical networks, biomolecular networks or epidemiological networks. \parencite{pavlopoulosBipartiteGraphsSystems2018} Some interesting results can arise when applying a tool widely used on a particular kind of interactions is used on another kind of interactions. -Companies like Netflix use recommender system, to recommend another product to -consumers based on their previous interactions. In +Companies like Netflix or Amazon use recommender systems to recommend other +products to consumers based on their previous interactions. In ~\cite{desjardins-proulxEcologicalInteractionsNetflix2017} the authors use the \emph{K-nearest neighbour} (KNN) algorithm as a Recommender to predict missing preys for predators in a predator-prey network. 
@@ -101,9 +101,9 @@ adapts the Stochastic Block Model (SBM) \parencite{hollandStochasticBlockmodelsFirst1983, snijdersEstimationPredictionStochastic1997} to bipartite graphs. -\begin{small} +\textit{Note:}\begin{small} Please note that we prefer the term ``BiSBM`` and will use both LBM and BiSBM to - designate the Stochastic Block model applied on bipartite networks. + designate the Stochastic Block Model applied to bipartite networks. \end{small} This model supposes that: diff --git a/rapport/chapter3-structure-detection.tex b/rapport/chapter3-structure-detection.tex index a3e5cf8..4222875 100644 --- a/rapport/chapter3-structure-detection.tex +++ b/rapport/chapter3-structure-detection.tex @@ -112,14 +112,14 @@ the same problems as~\cite{chabert-liddellLearningCommonStructures2024a} and adapt the support $S$ they define for the $\pi$-colSBM to the bipartite case by having $S^1$ of size $M\times Q_1$ the support for the rows and $S^2$ of size $M\times Q_2$ the support for the columns. Thus -$S^1_{mq} = \mathbb{1}_{\pi^m_q > 0}$ and -$S^2_{mr} = \mathbb{1}_{\rho^m_r > 0}$. In this case, $S^2 = \bm{1}$, because +$S^1_{mq} = \mathbbb{1}_{\pi^m_q > 0}$ and +$S^2_{mr} = \mathbbb{1}_{\rho^m_r > 0}$. In this case, $S^2 = \bm{1}$, because there is no freedom on the column dimension. For a given number of blocks $Q_1$, $Q_2$ and matrix $S^1$ ($S^2$ being in this case the matrix full of ones), the number of parameters is: \begin{equation*} - \text{NP}(\pi\text{-}colBiSBM) = \sum_{m=1}^{M}\Bigg( \sum_{q=1}^{Q_1} S^1_{mq} - 1 \Bigg) + (Q_2 - 1) + \sum_{\substack{q=1,\dots,Q_1 \\ r=1,\dots,Q_2}} \mathbb{1}_{{(S^{1\prime}S^2)}_{qr}>0} + \text{NP}(\pi\text{-}colBiSBM) = \sum_{m=1}^{M}\Bigg( \sum_{q=1}^{Q_1} S^1_{mq} - 1 \Bigg) + (Q_2 - 1) + \sum_{\substack{q=1,\dots,Q_1 \\ r=1,\dots,Q_2}} \mathbbb{1}_{{(S^{1\prime}S^2)}_{qr}>0} \end{equation*} The first term corresponds to the non-null block proportions in each network. 
The third quantity accounts for the fact that some blocks may never be @@ -147,7 +147,7 @@ the column dimension. For a given number of blocks $Q_1$, $Q_2$ and matrix $S^2$ ($S^1$ being in this case the matrix full of ones), the number of parameters is: \begin{equation*} - \text{NP}(\rho\text{-}colBiSBM) = (Q_1 - 1) + \sum_{m=1}^{M}\Bigg( \sum_{r=1}^{Q_2} S^2_{mr} - 1 \Bigg) + \sum_{\substack{q=1,\dots,Q_1 \\ r=1,\dots,Q_2}} \mathbb{1}_{{(S^{1\prime}S^2)}_{qr}>0} + \text{NP}(\rho\text{-}colBiSBM) = (Q_1 - 1) + \sum_{m=1}^{M}\Bigg( \sum_{r=1}^{Q_2} S^2_{mr} - 1 \Bigg) + \sum_{\substack{q=1,\dots,Q_1 \\ r=1,\dots,Q_2}} \mathbbb{1}_{{(S^{1\prime}S^2)}_{qr}>0} \end{equation*} $\pi\rho$-colBiSBM model still assumes that the networks share a common connectivity @@ -165,7 +165,7 @@ $\rho^m_r \in \left[ 0,1 \right], \sum_{r=1}^{Q_2} \rho^m_r = 1 $. For a given number of blocks $Q_1$, $Q_2$ and matrices $S^1$, $S^2$, the number of parameters is: \begin{equation*} - \text{NP}(\pi\rho\text{-}colBiSBM) = \sum_{m=1}^{M}\Bigg( \sum_{q=1}^{Q_1} S^1_{mq} - 1 \Bigg) + \sum_{m=1}^{M}\Bigg( \sum_{r=1}^{Q_2} S^2_{mr} - 1 \Bigg) + \sum_{\substack{q=1,\dots,Q_1 \\ r=1,\dots,Q_2}} \mathbb{1}_{{(S^{1\prime}S^2)}_{qr}>0} + \text{NP}(\pi\rho\text{-}colBiSBM) = \sum_{m=1}^{M}\Bigg( \sum_{q=1}^{Q_1} S^1_{mq} - 1 \Bigg) + \sum_{m=1}^{M}\Bigg( \sum_{r=1}^{Q_2} S^2_{mr} - 1 \Bigg) + \sum_{\substack{q=1,\dots,Q_1 \\ r=1,\dots,Q_2}} \mathbbb{1}_{{(S^{1\prime}S^2)}_{qr}>0} \end{equation*} \section{Variational estimation of the parameters}\label{sec:variational-estimation-of-the-parameters} @@ -289,6 +289,10 @@ all networks over the number of number of possible interactions: % Adapt bicl, methode explo car defi % 1 bicl 2 model exploration % Citer la conclusion de l'article de St Clair discussion sur bipartite +Section~\ref{sec:variational-estimation-of-the-parameters} explains how we +estimate the parameters of the model for a \emph{fixed} number of blocks +$Q_1$ and $Q_2$. 
But as these are generally unknown, we need to explore the +latent space to find the \emph{best} values. As discussed in~\cite{chabert-liddellLearningCommonStructures2024a}, the algorithmic aspect becomes complex when dealing with the bipartite case. Due to the size of the latent space being $\mathbb{N}^2$, conducting a complete @@ -299,8 +303,14 @@ challenge involved making significant choices, which are outlined below. The below procedures are implemented in the \emph{colSBM} package, available on \url{https://github.com/Chabert-Liddell/colSBM}. -\subsection{The BIC-L criterion for model selection} +\subsection{The \emph{Bayesian Information Criterion like} (BIC-L) criterion for model selection} \label{ssec:the-bic-l-criterion-for-model-selection} +To select the best number of blocks, we need a criterion that +measures the adequacy between our model and the data. The ELBO might seem a good +criterion at first but, like the likelihood, it increases with model +complexity. A good criterion should therefore make a \emph{trade-off} between +fit to the data and model complexity. + The Integrated Classified Likelihood (ICL) is a well-established tool in the SBM and LBM domains for selecting the appropriate number of blocks. It was introduced by~\cite{biernackiAssessingMixtureModel2000, @@ -322,8 +332,9 @@ well-separated blocks by imposing a penalty on the entropy of node grouping. However, the objective of our study extends beyond grouping nodes into coherent blocks. We also aim to assess the similarity of connectivity patterns across different networks. Consequently, we aim to permit models that offer more -flexible node grouping without penalizing entropy. This leads us to formulate a -BIC-like criterion in the following manner: +flexible node grouping without penalizing entropy. 
+ +This leads us to formulate a BIC-like criterion in the following manner: \[ \text{BIC-L} = \max_{\bm{\theta}} \mathbb{E}_{\widehat{\mathcal{R}}} [\ell(\bm{X,Z,W;\theta})] + \mathcal{H(\widehat{R})} - \frac{1}{2}\text{pen} = \max_{\bm{\theta}} \mathcal{J(\widehat{R}, \bm{\theta})} - \frac{1}{2}\text{pen} @@ -364,7 +375,7 @@ propose. \log n_{2}^{m}. \] Penalties for the $\bm\alpha$ \[ \text{pen}_{\alpha}(Q_1, Q_2, S_1, S_2) = (\sum_{q=1}^{Q_1} - \sum_{r=1}^{Q_2} \mathbb{1}_{(S_1)'S_2 > 0}) \log (N_M). \] + \sum_{r=1}^{Q_2} \mathbbb{1}_{(S_1)'S_2 > 0}) \log (N_M). \] And the corresponding BIC-L formula, \[ \begin{aligned} @@ -380,11 +391,16 @@ propose. \subsection{Initialization and pairing of the models} \label{ssec:initialization-and-pairing-of-the-models} -First to combine the information from the $M$ networks we fit a collection model +The row (resp. column) block memberships are the labels of row (resp. column) +nodes corresponding to the group to which they were assigned based on their +connection patterns. This adds another layer of complexity to the model +selection as we need to find the best $Q_1, Q_2$ and the best memberships for +each vertex. + +First, to combine the information from the $M$ networks, we fit an LBM for each network at the two points $Q = (1, 2)$ and $Q = (2, 1)$. Using the previously described VEM algorithm we obtain for each network its parameters ($\bm{\rho,\pi,\alpha}$). - We then compute the marginal laws for each dimension, for each network. Then we order the network blocks by the probabilities obtained in decreasing order. @@ -395,10 +411,10 @@ For the memberships on the rows: $row~order_m = order\left(\rho_m \times ~^{t}(\alpha_m)\right)$. Using this order we relabel the memberships for the $M$ fitted collection of a -single network. Then we use the $M$ memberships to fit a collection containing +single network. +We then use the $M$ memberships to fit a collection containing the $M$ networks. 
\subsection{Greedy exploration to find an estimation of the mode}\label{ssec:greedy-exploration-to-find-an-estimation-of-the-mode} - Using the previously fitted models for $Q = (1,2)$ and $Q = (2,1)$ we choose to perform a greedy exploration to find a first mode. @@ -408,7 +424,7 @@ memberships for the points $Q \in \{(Q_1 + 1, Q_2),(Q_1, Q_2 + 1),(Q_1 - 1, maximizes the BIC-L as the next point from which to repeat the procedure. We repeat the procedure until the BIC-L stops increasing $2$ times in a row. -\begin{algorithm}[t] +\begin{algorithm}[H] \caption{Greedy Exploration for Mode Estimation} \SetAlgoLined \SetKwInOut{Input}{Input} @@ -447,6 +463,7 @@ repeat the procedure until the BIC-L stops increasing $2$ times in a row. When this first estimation of the BIC-L mode has been find we apply the moving window on it. + \subsection{Moving window to update the block memberships and the BIC-L} \label{ssec:moving-window-to-update-the-block-memberships-and-the-bic-l} The \emph{moving window} is used to update the block memberships on rows and