structure detection: proofreading

Louis Lacoste 2024-08-18 22:17:15 +02:00
parent 563ef8ede3
commit 57410d1b4a
3 changed files with 105 additions and 98 deletions


@ -7,8 +7,8 @@
We define a collection of bipartite networks as We define a collection of bipartite networks as
$\bm{X} = (X^1,\dots, X^m,\dots, X^M)$ $\bm{X} = (X^1,\dots, X^m,\dots, X^M)$
the collection of incidence matrices. Moreover, all the networks in the the collection of incidence matrices. Moreover, all the networks in the
collection have the same type of interaction (e.g., all interactions are collection have the same valuation of the interactions (e.g., they are
binary). all binary).
\section{Separate BiSBM (sep-BiSBM)}\label{sec:separate-bisbm-sepbisbm} \section{Separate BiSBM (sep-BiSBM)}\label{sec:separate-bisbm-sepbisbm}
@ -51,21 +51,21 @@ Equations~\eqref{eqn:lbm-block-membership-prob},
\eqref{eqn:lbm-conditional-to-latent} and \eqref{eqn:lbm-emission} define the \eqref{eqn:lbm-conditional-to-latent} and \eqref{eqn:lbm-emission} define the
BiSBM model and we will now use a short notation: BiSBM model and we will now use a short notation:
\begin{equation} \begin{align}
\tag{\emph{sep-BiSBM}} \tag{\emph{sep-BiSBM}}
X^m \sim \mathcal{F}\text{-BiSBM}_{n_1^m,n_2^m}(Q_1^m, Q_2^m, \bm{\pi^m}, \bm{\rho^m}, \bm{\alpha^m}) X^m \sim \mathcal{F}\text{-BiSBM}_{n_1^m,n_2^m}(Q_1^m, Q_2^m, \bm{\pi^m}, \bm{\rho^m}, \bm{\alpha^m}) & & \forall m = 1, \dots, M
\end{equation} \end{align}
where $\mathcal{F}$ encodes the emission distribution, $n_1^m,n_2^m$ are the row where $\mathcal{F}$ encodes the emission distribution, $n_1^m,n_2^m$ are the numbers of row
and column nodes, $Q_1^m, Q_2^m$ are the number of row and column blocks in and column nodes, $Q_1^m, Q_2^m$ are the number of row and column blocks in
network $m$, $\bm{\pi}^m~=~{(\pi^m_q)}_{q=1,\dots,Q_1^m}$ and network $m$, $\bm{\pi}^m~=~{(\pi^m_q)}_{q=1,\dots,Q_1^m}$ and
$\bm{\rho}^m~=~{(\rho^m_r)}_{r=1,\dots,Q_2^m}$ are the vectors of their $\bm{\rho}^m~=~{(\rho^m_r)}_{r=1,\dots,Q_2^m}$ are the vectors of their
proportions. The $Q_1^m \times Q_2^m$ matrix proportions. The $Q_1^m \times Q_2^m$ matrix
$\bm{\alpha}^m = {(\alpha^m_{qr})}_{\substack{q = 1,\dots,Q_1^m \\ r = 1,\dots,Q_2^m}}$ $\bm{\alpha}^m = {(\alpha^m_{qr})}_{\substack{q = 1,\dots,Q_1^m \\ r = 1,\dots,Q_2^m}}$
are the connectivity parameters, the parameters of the emission distribution. are the connectivity parameters, i.e.~the parameters of the emission distribution.
$\alpha^m_{qr}\in\mathcal{A}_{\mathcal{F}}$ where, for the Bernoulli $\alpha^m_{qr}\in\mathcal{A}_{\mathcal{F}}$ where, for the Bernoulli
(resp. Poisson) emission distribution, $\mathcal{A}_{\mathcal{F}} = (0,1)$ (resp. (resp. Poisson) emission distribution, $\mathcal{A}_{\mathcal{F}} = (0,1)$ (resp.
$\mathcal{A}_{\mathcal{F}} = \mathbb{R}^{*+}$). In this $sep$-$BiSBM$ each $\mathcal{A}_{\mathcal{F}} = \mathbb{R}^{*+}$). In this \emph{sep}-BiSBM model each
network $m$ is assumed to follow a $BiSBM$ with its own parameters ($\bm{\pi}^m, network $m$ is assumed to follow a BiSBM with its own parameters ($\bm{\pi}^m,
\bm{\rho}^m, \bm{\alpha}^m$). \bm{\rho}^m, \bm{\alpha}^m$).
% DONE Finish explaining % DONE Finish explaining
@ -76,7 +76,7 @@ network $m$ is assumed to follow a $BiSBM$ with its own parameters ($\bm{\pi}^m,
\subsection{A collection of iid bipartite SBM}\label{ssec:a-collection-of-i-i-d-bipartite-sbm} \subsection{A collection of iid bipartite SBM}\label{ssec:a-collection-of-i-i-d-bipartite-sbm}
As for \emph{colSBM} this first model is the most constrained. It assumes that As for \emph{colSBM} this first model is the most constrained. It assumes that
all the networks are the independent realizations of the same $Q_1$-$Q_2$-BiSBM all the networks are the independent realizations of the same $Q_1$-$Q_2$-BiSBM
with identical parameters. The \emph{iid-colBiSBM} is defined as follows: with identical parameters. The \emph{iid}-colBiSBM is defined as follows:
\begin{align} \begin{align}
\tag{\emph{iid}-colBiSBM} \tag{\emph{iid}-colBiSBM}
@ -85,7 +85,7 @@ with identical parameters. The \emph{iid-colBiSBM} is defined as follows:
where $\forall (q,r) \in \{1,\dots,Q_1\}\times\{1,\dots,Q_2\}$, $\alpha_{qr} \in \mathcal{A}_{\mathcal{F}}$, where $\forall (q,r) \in \{1,\dots,Q_1\}\times\{1,\dots,Q_2\}$, $\alpha_{qr} \in \mathcal{A}_{\mathcal{F}}$,
$\pi_q \in \left( 0,1 \right], \sum_{q=1}^{Q_1} \pi_q = 1 $ and $\rho_r \in \left( 0,1 \right], \sum_{r=1}^{Q_2} \rho_r = 1 $. $\pi_q \in \left( 0,1 \right], \sum_{q=1}^{Q_1} \pi_q = 1 $ and $\rho_r \in \left( 0,1 \right], \sum_{r=1}^{Q_2} \rho_r = 1 $.
This model involves $(Q_1 - 1) + (Q_2 - 1) + Q_1\times Q_2$ parameters, the two This model involves $(Q_1 - 1) + (Q_2 - 1) + Q_1\times Q_2$ parameters, the two
first terms corresponding to block proportions on the row and column dimensions first terms corresponding to block proportions on the row and column
and the third term to connectivity parameters. and the third term to connectivity parameters.
But the assumption that block proportions are the same among the networks is a But the assumption that block proportions are the same among the networks is a
@ -106,9 +106,9 @@ block proportions. For $m \in \{1,\dots,M\}$, the $X^m$ are independent and
\end{align} \end{align}
where $\forall (q,r) \in \{1,\dots,Q_1\}\times\{1,\dots,Q_2\}$, $\alpha_{qr} \in \mathcal{A}_{\mathcal{F}}$, where $\forall (q,r) \in \{1,\dots,Q_1\}\times\{1,\dots,Q_2\}$, $\alpha_{qr} \in \mathcal{A}_{\mathcal{F}}$,
$\pi^m_q \in \left[ 0,1 \right], \sum_{q=1}^{Q_1} \pi^m_q~=~1, \forall m \in \{1,\dots,M\}$ and $\rho_r \in \left( 0,1 \right], \sum_{r=1}^{Q_2} \rho_r = 1 $. $\pi^m_q \in \left[ 0,1 \right], \sum_{q=1}^{Q_1} \pi^m_q~=~1, \forall m \in \{1,\dots,M\}$ and $\rho_r \in \left( 0,1 \right], \sum_{r=1}^{Q_2} \rho_r = 1 $.
This model is more flexible than the iid-colBiSBM as it allows some row block This model is more flexible than the iid-colBiSBM as it allows the row block
proportions to be null proportions to vary between networks and even to be null
in certain networks ($\pi^m_q\in\left[ 0,1 \right]$): if $\pi_q^m = 0$ then the ($\pi^m_q\in\left[ 0,1 \right]$): if $\pi_q^m = 0$ then the
block $q$ is not represented in the network $m$. The connectivity structure is block $q$ is not represented in the network $m$. The connectivity structure is
thus a subset of a large connectivity structure common to all networks. We face thus a subset of a large connectivity structure common to all networks. We face
the same problems as~\cite{chabert-liddellLearningCommonStructures2024a} and the same problems as~\cite{chabert-liddellLearningCommonStructures2024a} and
@ -139,9 +139,9 @@ block proportions. For $m \in \{1,\dots,M\}$, the $X^m$ are independent and
where $\forall (q,r) \in \{1,\dots,Q_1\}\times\{1,\dots,Q_2\}$, $\alpha_{qr} \in \mathcal{A}_{\mathcal{F}}$, where $\forall (q,r) \in \{1,\dots,Q_1\}\times\{1,\dots,Q_2\}$, $\alpha_{qr} \in \mathcal{A}_{\mathcal{F}}$,
$\pi_q \in \left( 0,1 \right], \sum_{q=1}^{Q_1} \pi_q = 1 $ and $\pi_q \in \left( 0,1 \right], \sum_{q=1}^{Q_1} \pi_q = 1 $ and
$\rho^m_r \in \left[ 0,1 \right], \sum_{r=1}^{Q_2} \rho^m_r = 1 $. $\rho^m_r \in \left[ 0,1 \right], \sum_{r=1}^{Q_2} \rho^m_r = 1 $.
This model is more flexible than the iid-colBiSBM as it allows some column block This model is more flexible than the iid-colBiSBM as it allows the column block
proportions to be proportions to vary between networks and even to be null
null in certain networks ($\rho^m_r\in\left[ 0,1 \right]$): if $\rho_r^m = 0$ ($\rho^m_r\in\left[ 0,1 \right]$): if $\rho_r^m = 0$
then the column block $r$ is not represented in the network $m$. then the column block $r$ is not represented in the network $m$.
\enquote{Mirroring} the formulas for the $\pi$-colBiSBM we relax the constraints on \enquote{Mirroring} the formulas for the $\pi$-colBiSBM we relax the constraints on
@ -155,7 +155,7 @@ case the matrix full of ones), the number of parameters is:
$\pi\rho$-colBiSBM model still assumes that the networks share a common connectivity $\pi\rho$-colBiSBM model still assumes that the networks share a common connectivity
structure represented by $\bm{\alpha}$ but that each network has its own row and structure represented by $\bm{\alpha}$ but that each network has its own row and
column block proportions, it is the less constrained model. column block proportions, it is the least constrained model.
For $m \in \{1,\dots,M\}$, the $X^m$ are independent and For $m \in \{1,\dots,M\}$, the $X^m$ are independent and
\begin{align} \begin{align}
\tag{\emph{$\pi\rho$}-colBiSBM} \tag{\emph{$\pi\rho$}-colBiSBM}
@ -204,22 +204,23 @@ we have: $\mathbb{P}_{\mathcal{R}_m} (Z_{iq}^m = 1, W_{jr}^m = 1|X^m) =
The formula for the entropy per network is thus: The formula for the entropy per network is thus:
\begin{equation*} \begin{equation*}
\mathcal{H}(\mathcal{R}_m) = - \sum_{i=1}^{n_1^m} \tau^{1,m}_{i,q} \log \tau^{1,m}_{i,q} - \sum_{j=1}^{n_2^m} \tau^{2,m}_{j,r} \log \tau^{2,m}_{j,r} \mathcal{H}(\mathcal{R}_m) = - \sum_{i=1}^{n_1^m} \sum_{q \in \mathcal{Q}_1^m} \tau_{iq}^{1,m} \log \tau_{iq}^{1,m} - \sum_{j=1}^{n_2^m} \sum_{r \in \mathcal{Q}_2^m} \tau_{jr}^{2,m} \log \tau_{jr}^{2,m}
\end{equation*} \end{equation*}
And the expectation of the completed log-likelihood under the $\mathcal{R}_m$ And the expectation of the completed log-likelihood under the $\mathcal{R}_m$
variational distribution for network $m$ is: variational distribution for network $m$ is:
\begin{align*} \begin{align*}
\mathbb{E}_{\mathcal{R}_m}[\ell(X^m,Z^m,W^m;\bm{\theta})] = \sum_{i = 1}^{n_1^m}\sum_{j=1}^{n_2^m}\sum_{q \in \mathcal{Q}_{1,m}} \sum_{r \in \mathcal{Q}_{2,m}} \tau^{1,m}_{i,q} \tau^{2,m}_{j,r} \log f(X^{m}_{ij}; \alpha_{qr}) \\ \mathbb{E}_{\mathcal{R}_m}[\ell(X^m,Z^m,W^m;\bm{\theta})] = \sum_{i = 1}^{n_1^m}\sum_{j=1}^{n_2^m}\sum_{q \in \mathcal{Q}_1^m} \sum_{r \in \mathcal{Q}_2^m} \tau_{iq}^{1,m} \tau_{jr}^{2,m} \log f(X^{m}_{ij}; \alpha_{qr}) \\
+ \sum_{i=1}^{n_1^m} \sum_{q \in \mathcal{Q}_{1,m}} \tau^{1,m}_{i,q} \log \pi_{\color{black}q}^{\color{gray}m} + \sum_{j=1}^{n_2^m} \sum_{r \in \mathcal{Q}_{2,m}} \tau^{2,m}_{j,r} \log \rho_{\color{black}r}^{\color{gray}m} + \sum_{i=1}^{n_1^m} \sum_{q \in \mathcal{Q}_1^m} \tau_{iq}^{1,m} \log \pi_{\color{black}q}^{\color{gray}m} + \sum_{j=1}^{n_2^m} \sum_{r \in \mathcal{Q}_2^m} \tau_{jr}^{2,m} \log \rho_{\color{black}r}^{\color{gray}m}
\end{align*} \end{align*}
with $\mathcal{Q}_1^m = \{q\in \{1, \dots, Q_1\}|\pi_q^m > 0\}$ and
$\mathcal{Q}_2^m = \{r\in \{1, \dots, Q_2\}|\rho_r^m > 0\}$.
And thus the lower bound becomes: And thus the lower bound becomes:
\begin{align*} \begin{align*}
\mathcal{J}(\bm{\tau};\bm{\theta}) \coloneqq \sum_{m=1}^{M} \bigg(\sum_{i = 1}^{n_1^m}\sum_{j=1}^{n_2^m}\sum_{q \in \mathcal{Q}_{1,m}} \sum_{r \in \mathcal{Q}_{2,m}} \tau^{1,m}_{i,q} \tau^{2,m}_{j,r} \log f(X^{m}_{ij}; \alpha_{qr}) \\ \mathcal{J}(\bm{\tau};\bm{\theta}) \coloneqq \sum_{m=1}^{M} \bigg(\sum_{i = 1}^{n_1^m}\sum_{j=1}^{n_2^m}\sum_{q \in \mathcal{Q}_1^m} \sum_{r \in \mathcal{Q}_2^m} \tau_{iq}^{1,m} \tau_{jr}^{2,m} \log f(X^{m}_{ij}; \alpha_{qr}) \\
+ \sum_{i=1}^{n_1^m} \sum_{q \in \mathcal{Q}_{1,m}} \tau^{1,m}_{i,q} \log \pi_{\color{black}q}^{\color{gray}m} + \sum_{j=1}^{n_2^m} \sum_{r \in \mathcal{Q}_{2,m}} \tau^{2,m}_{j,r} \log \rho_{\color{black}r}^{\color{gray}m} \\ + \sum_{i=1}^{n_1^m} \sum_{q \in \mathcal{Q}_1^m} \tau_{iq}^{1,m} \log \pi_{\color{black}q}^{\color{gray}m} + \sum_{j=1}^{n_2^m} \sum_{r \in \mathcal{Q}_2^m} \tau_{jr}^{2,m} \log \rho_{\color{black}r}^{\color{gray}m} \\
- \sum_{i=1}^{n_1^m} \tau^{1,m}_{i,q} \log \tau^{1,m}_{i,q} - \sum_{j=1}^{n_2^m} \tau^{2,m}_{j,r} \log \tau^{2,m}_{j,r} \bigg) \color{black} - \sum_{i=1}^{n_1^m} \sum_{q \in \mathcal{Q}_1^m} \tau_{iq}^{1,m} \log \tau_{iq}^{1,m} - \sum_{j=1}^{n_2^m} \sum_{r \in \mathcal{Q}_2^m} \tau_{jr}^{2,m} \log \tau_{jr}^{2,m} \bigg) \color{black}
\end{align*} \end{align*}
where we identify the variational distribution $\mathcal{R}$ with its parameter where we identify the variational distribution $\mathcal{R}$ with its parameter
@ -284,8 +285,8 @@ while on the other hand,
\end{align*} \end{align*}
the parameters take into account all the networks at the same time. the parameters take into account all the networks at the same time.
The connectivity parameters $\alpha_{qr}$ for all models are estimated as the The connectivity parameters $\alpha_{qr}$ for all models are estimated as the
ratio of the number of interactions between row block $q$ and column block $r$ ratio of the number of observed interactions between row block $q$ and column block $r$
among all networks over the number of number of possible interactions: among all networks over the number of possible interactions:
\begin{align*} \begin{align*}
\widehat{\alpha}_{qr} = \frac{\sum_{m=1}^{M} e^{m}_{qr}}{\sum_{m=1}^{M} n^{m}_{qr}} \widehat{\alpha}_{qr} = \frac{\sum_{m=1}^{M} e^{m}_{qr}}{\sum_{m=1}^{M} n^{m}_{qr}}
\end{align*} \end{align*}
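The pooled estimator above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the hunk does not show how $e^m_{qr}$ and $n^m_{qr}$ are defined, so the sketch assumes the usual variational form $e^m_{qr} = \sum_{i,j} \tau^{1,m}_{iq} \tau^{2,m}_{jr} X^m_{ij}$ and $n^m_{qr} = \sum_{i,j} \tau^{1,m}_{iq} \tau^{2,m}_{jr}$ with no missing entries. The function name `estimate_alpha` and its arguments are hypothetical and are not taken from the \emph{colSBM} package.

```python
import numpy as np


def estimate_alpha(networks, tau_rows, tau_cols):
    """Pooled estimate of the Q1 x Q2 connectivity matrix (illustrative sketch).

    networks : list of (n1_m, n2_m) incidence matrices X^m
    tau_rows : list of (n1_m, Q1) row membership probabilities
    tau_cols : list of (n2_m, Q2) column membership probabilities
    """
    q1 = tau_rows[0].shape[1]
    q2 = tau_cols[0].shape[1]
    observed = np.zeros((q1, q2))  # sum over m of e^m_qr (assumed definition)
    possible = np.zeros((q1, q2))  # sum over m of n^m_qr (assumed definition)
    for x, t1, t2 in zip(networks, tau_rows, tau_cols):
        observed += t1.T @ x @ t2
        possible += np.outer(t1.sum(axis=0), t2.sum(axis=0))
    # Block pairs that are represented in no network keep the value 0.
    return np.divide(observed, possible,
                     out=np.zeros_like(observed), where=possible > 0)
```

With hard memberships, each entry reduces to the fraction of observed interactions among all possible row-column pairs of the two blocks, pooled over the networks.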
@ -303,7 +304,7 @@ $Q_1$ and $Q_2$. But as they are in general not known we need to explore the
latent space to find the \emph{best} values. latent space to find the \emph{best} values.
As discussed in~\cite{chabert-liddellLearningCommonStructures2024a}, the As discussed in~\cite{chabert-liddellLearningCommonStructures2024a}, the
algorithmic aspect becomes complex when dealing with the bipartite case. Due to algorithmic aspect becomes complex when dealing with the bipartite case. Due to
the size of the latent space being $\mathbb{N}^2$, conducting a complete the latent space being $\mathbb{N}^2$, conducting a complete
exploration of the latent space is practically infeasible. Therefore, in exploration of the latent space is practically infeasible. Therefore, in
addition to adapting the existing formulas, our contribution to addressing this addition to adapting the existing formulas, our contribution to addressing this
challenge involved making significant choices, which are outlined below. challenge involved making significant choices, which are outlined below.
@ -315,7 +316,7 @@ The below procedures are implemented in the \emph{colSBM} package, available on
\label{ssec:the-bic-l-criterion-for-model-selection} \label{ssec:the-bic-l-criterion-for-model-selection}
To select the best number of blocks we need a criterion to To select the best number of blocks we need a criterion to
measure adequacy between our model and data. The ELBO might seem a good measure adequacy between our model and data. The ELBO might seem a good
criterion at first but as for the likelihood, the more complex a model the criterion at first but as for the likelihood, the more complex the model, the
higher it gets. And thus a good criterion should make a \emph{trade-off} between higher it gets. And thus a good criterion should make a \emph{trade-off} between
fitting to data and model complexity. fitting to data and model complexity.
@ -340,7 +341,7 @@ well-separated blocks by imposing a penalty on the entropy of node grouping.
However, the objective of our study extends beyond grouping nodes into coherent However, the objective of our study extends beyond grouping nodes into coherent
blocks. We also aim to assess the similarity of connectivity patterns across blocks. We also aim to assess the similarity of connectivity patterns across
different networks. Consequently, we aim to permit models that offer more different networks. Consequently, we aim to permit models that offer more
flexible node grouping without penalizing entropy. flexible node grouping by not penalizing the entropy.
This leads us to formulate a BIC-like criterion in the following manner: This leads us to formulate a BIC-like criterion in the following manner:
@ -352,49 +353,49 @@ We provide below the expression for the penalties for the 4 models that we
propose. propose.
\begin{description} \begin{description}
\item[\textit{iid}-colBiSBM] For the $\bm\pi$ and $\bm\rho$: \item[\textit{iid}-colBiSBM] For the $\bm\pi$ and $\bm\rho$:
\begin{align*} \begin{align*}
\text{pen}_{\pi}(Q_1) = (Q_1 - 1)\log(\sum_{m=1}^{M}n_{1}^{m}) & , & \text{pen}_{\pi}(Q_1) = (Q_1 - 1)\log(\sum_{m=1}^{M}n_{1}^{m}) & , &
\text{pen}_{\rho}(Q_2) = (Q_2 - 1)\log(\sum_{m=1}^{M}n_{2}^{m}) \text{pen}_{\rho}(Q_2) = (Q_2 - 1)\log(\sum_{m=1}^{M}n_{2}^{m})
\end{align*} \end{align*}
For the $\bm\alpha$: For the $\bm\alpha$:
\[\text{pen}_{\alpha}(Q_1, Q_2) = Q_1 \times Q_2 \log(N_M)\] \[\text{pen}_{\alpha}(Q_1, Q_2) = Q_1 \times Q_2 \log(N_M)\]
with with
\[ N_M = \sum_{m = 1}^{M} n_{1}^{m} \times n_{2}^{m} \] \[ N_M = \sum_{m = 1}^{M} n_{1}^{m} \times n_{2}^{m} \]
And thus the $\text{BIC-L}$ formula is the following: And thus the $\text{BIC-L}$ formula is the following:
\[ \text{BIC-L}(\bm{X},Q_1, Q_2) = \max_{\theta} \[ \text{BIC-L}(\bm{X},Q_1, Q_2) = \max_{\theta}
\mathcal{J} (\mathcal{\hat{R}}, \bm{\theta}) \mathcal{J} (\mathcal{\hat{R}}, \bm{\theta})
- \frac{1}{2} [\text{pen}_{\pi}(Q_1) + \text{pen}_{\rho}(Q_2) + - \frac{1}{2} [\text{pen}_{\pi}(Q_1) + \text{pen}_{\rho}(Q_2) +
\text{pen}_{\alpha}(Q_1, Q_2)]\] \text{pen}_{\alpha}(Q_1, Q_2)]\]
\item[$\bm{\pi\rho}$-colBiSBM] The support penalties are \item[$\bm{\pi\rho}$-colBiSBM] The support penalties are
\begin{align*} \begin{align*}
\text{pen}_{S_1}(Q_1) = -2 \log p_{Q_1} (S_1) & , & \text{pen}_{S_1}(Q_1) = -2 \log p_{Q_1} (S_1) & , &
\text{pen}_{S_2}(Q_2) = -2 \log p_{Q_2} (S_2) \text{pen}_{S_2}(Q_2) = -2 \log p_{Q_2} (S_2)
\end{align*} \end{align*}
with \begin{align*} with \begin{align*}
\textstyle \log p_{Q_1}(S_1) = - M \log(Q_1) - \sum_{m=1}^{M} \log {Q_1 \textstyle \log p_{Q_1}(S_1) = - M \log(Q_1) - \sum_{m=1}^{M} \log {Q_1
\choose Q_1^{(m)}}, \\ \choose Q_1^{(m)}}, \\
\textstyle \log p_{Q_2}(S_2) = - M \log(Q_2) - \sum_{m=1}^{M} \log {Q_2 \textstyle \log p_{Q_2}(S_2) = - M \log(Q_2) - \sum_{m=1}^{M} \log {Q_2
\choose Q_2^{(m)}}. \choose Q_2^{(m)}}.
\end{align*} \end{align*}
And penalties for the $\bm\rho$ and $\bm\pi$ are And penalties for the $\bm\rho$ and $\bm\pi$ are
\[ \text{pen}_{\pi}(Q_1, S_1) = \sum_{m=1}^{M} (Q_{1}^{(m)} - 1) \[ \text{pen}_{\pi}(Q_1, S_1) = \sum_{m=1}^{M} (Q_{1}^{(m)} - 1)
\log n_{1}^{m}, \log n_{1}^{m},
~\text{pen}_{\rho}(Q_2, S_2) = \sum_{m=1}^{M} (Q_{2}^{(m)} - 1) ~\text{pen}_{\rho}(Q_2, S_2) = \sum_{m=1}^{M} (Q_{2}^{(m)} - 1)
\log n_{2}^{m}. \] \log n_{2}^{m}. \]
Penalties for the $\bm\alpha$ are Penalties for the $\bm\alpha$ are
\[ \text{pen}_{\alpha}(Q_1, Q_2, S_1, S_2) = (\sum_{q=1}^{Q_1} \[ \text{pen}_{\alpha}(Q_1, Q_2, S_1, S_2) = (\sum_{q=1}^{Q_1}
\sum_{r=1}^{Q_2} \mathbbb{1}_{(S_1)'S_2 > 0}) \log (N_M). \] \sum_{r=1}^{Q_2} \mathbbb{1}_{(S_1)'S_2 > 0}) \log (N_M). \]
And the corresponding BIC-L formula, And the corresponding BIC-L formula,
\[ \[
\begin{aligned} \begin{aligned}
\text{BIC-L}(\bm{X},Q_1, Q_2) = \text{BIC-L}(\bm{X},Q_1, Q_2) =
\max_{S_1,S_2} [ \max_{S_1,S_2} [
& \max_{\theta_{S_1,S_2} \in \Theta_{S_1,S_2}} \mathcal{J}(\mathcal{\hat{R}},\theta_{S_1,S_2}) \\ & \max_{\theta_{S_1,S_2} \in \Theta_{S_1,S_2}} \mathcal{J}(\mathcal{\hat{R}},\theta_{S_1,S_2}) \\
- \frac{1}{2} & (\text{pen}_{\pi}(Q_1, S_1) + \text{pen}_{\rho}(Q_2, S_2) \\ - \frac{1}{2} & (\text{pen}_{\pi}(Q_1, S_1) + \text{pen}_{\rho}(Q_2, S_2) \\
& + \text{pen}_{\alpha}(Q_1, Q_2, S_1, S_2) \\ & + \text{pen}_{\alpha}(Q_1, Q_2, S_1, S_2) \\
& + \text{pen}_{S_1}(Q_1) + \text{pen}_{S_2}(Q_2))] \\ & + \text{pen}_{S_1}(Q_1) + \text{pen}_{S_2}(Q_2))] \\
\end{aligned} \end{aligned}
\] \]
\end{description} \end{description}
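As a worked illustration of the \textit{iid}-colBiSBM penalties listed above, the following Python sketch evaluates the BIC-L from a given value of the maximized lower bound. The function name `bicl_iid` and its arguments are hypothetical; the value `elbo` stands for $\max_{\theta} \mathcal{J}(\hat{\mathcal{R}}, \bm{\theta})$ and is assumed to have been computed elsewhere.

```python
import math


def bicl_iid(elbo, n1, n2, q1, q2):
    """BIC-L of the iid model from the penalties stated in the text.

    elbo : maximized variational lower bound (assumed given)
    n1   : list of row node counts n_1^m, one per network
    n2   : list of column node counts n_2^m, one per network
    """
    pen_pi = (q1 - 1) * math.log(sum(n1))
    pen_rho = (q2 - 1) * math.log(sum(n2))
    n_m = sum(a * b for a, b in zip(n1, n2))  # N_M = sum_m n_1^m * n_2^m
    pen_alpha = q1 * q2 * math.log(n_m)
    return elbo - 0.5 * (pen_pi + pen_rho + pen_alpha)
```

For example, with two networks of sizes $10 \times 5$ and $20 \times 15$, $N_M = 350$, so moving from $(Q_1, Q_2) = (2, 3)$ to $(3, 3)$ costs an extra $\frac{1}{2}(\log 30 + 3 \log 350)$ that the lower bound has to gain back.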
\subsection{Initialization and pairing of the models} \subsection{Initialization and pairing of the models}
@ -420,11 +421,11 @@ For the memberships on the rows: $row~order_m = order\left(\rho_m \times
Using this order we relabel the memberships of the $M$ fitted single-network Using this order we relabel the memberships of the $M$ fitted single-network
collections. collections.
We then use the $M$ memberships to fit a collection containing We then use the $M$ memberships to compute an initial $\bm{\tau}$, from which we fit a
the $M$ networks. collection containing the $M$ networks.
\subsection{Greedy exploration to find an estimation of the mode}\label{ssec:greedy-exploration-to-find-an-estimation-of-the-mode} \subsection{Greedy exploration to find an estimation of the mode}\label{ssec:greedy-exploration-to-find-an-estimation-of-the-mode}
Using the previously fitted models for $Q = (1,2)$ and $Q = (2,1)$ we choose to Using the previously fitted models for $Q = (1,2)$ and $Q = (2,1)$ we choose to
perform a greedy exploration to find a first mode. perform a greedy exploration from each of those points to find a first mode.
Meaning that for a given $Q = (Q_1, Q_2)$ we will compute all the possible Meaning that for a given $Q = (Q_1, Q_2)$ we will compute all the possible
memberships for the points $Q \in \{(Q_1 + 1, Q_2),(Q_1, Q_2 + 1),(Q_1 - 1, memberships for the points $Q \in \{(Q_1 + 1, Q_2),(Q_1, Q_2 + 1),(Q_1 - 1,
@ -432,6 +433,10 @@ memberships for the points $Q \in \{(Q_1 + 1, Q_2),(Q_1, Q_2 + 1),(Q_1 - 1,
maximizes the BIC-L as the next point from which to repeat the procedure. We maximizes the BIC-L as the next point from which to repeat the procedure. We
repeat the procedure until the BIC-L fails to increase twice in a row. repeat the procedure until the BIC-L fails to increase twice in a row.
Let us denote the neighborhood in the latent space of a point $Q$ by
$\mathcal{N}(Q) = Q + \{(1,0), (0,1), (-1,0), (0,-1)\}$, the four neighbors of $Q$
in the grid.
\begin{algorithm}[H] \begin{algorithm}[H]
\small \small
\caption{Greedy Exploration for Mode Estimation} \caption{Greedy Exploration for Mode Estimation}
@ -443,28 +448,31 @@ repeat the procedure until the BIC-L stops increasing $2$ times in a row.
\Output{Estimation of the mode using greedy exploration} \Output{Estimation of the mode using greedy exploration}
\BlankLine \BlankLine
Initialize $Q = (1,2)$ as the starting point\\ \For{$Q_{\text{start}} \in \{(1,2), (2,1)\}$}{ % and $Q = (2,1)$ as starting point
Initialize $\text{BIC-L}_{\text{max}}$ as the maximum achieved BIC-L value\\ \BlankLine
Initialize $\text{BIC-L}_{\text{max}} \leftarrow \text{BIC-L}(Q_{\text{start}})$\\
Initialize $consecutive\_count$ as 0 Initialize $consecutive\_count$ as 0
\BlankLine \BlankLine
$Q_{\text{curr}} \leftarrow Q_{\text{start}}$
\While{$consecutive\_count < 2$}{ \While{$consecutive\_count < 2$}{
Compute possible memberships for $Q \in \{(Q_1 + 1, Q_2), (Q_1, Q_2 + 1), (Q_1 - 1, Q_2), (Q_1, Q_2 - 1)\}$\; Fit models in $\mathcal{N}(Q_{\text{curr}})$\;
Fit models with the computed memberships
Choose the model with the maximum BIC-L as the next point
\BlankLine \BlankLine
\If{$\text{BIC-L} > \text{BIC-L}_{\text{max}}$}{ $Q \leftarrow \arg\max_{Q \in \mathcal{N}(Q_{\text{curr}})} \text{BIC-L}(Q)$
$\text{BIC-L}_{\text{max}} \leftarrow \text{BIC-L}$\\
$\text{BIC-L}_{\text{curr}} \leftarrow \text{BIC-L}(Q)$\;
$Q_{\text{curr}} \leftarrow Q$\;
\BlankLine
\If{$\text{BIC-L}_{\text{curr}} > \text{BIC-L}_{\text{max}}$}{
$\text{BIC-L}_{\text{max}} \leftarrow \text{BIC-L}_{\text{curr}}$\\
$consecutive\_count \leftarrow 0$ $consecutive\_count \leftarrow 0$
} }
\Else{ \Else{
$consecutive\_count \leftarrow consecutive\_count + 1$ $consecutive\_count \leftarrow consecutive\_count + 1$
} }
\BlankLine
$Q \leftarrow$ Next selected point
} }
}
\BlankLine \BlankLine
\textbf{Output:} Estimation of the mode using greedy exploration \textbf{Output:} Estimation of the mode using greedy exploration
\end{algorithm} \end{algorithm}
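The greedy exploration can be sketched in Python as follows. This is a minimal sketch, not the \emph{colSBM} implementation: `bicl` is a hypothetical callable that stands in for fitting a model at $(Q_1, Q_2)$ and returning its BIC-L, the search moves to the best neighbor at each step whether or not it improves the running maximum, and points with fewer than one block on either dimension are skipped (an assumption the text does not spell out).

```python
def greedy_mode_search(bicl, starts=((1, 2), (2, 1)), patience=2):
    """Greedy search of the (Q1, Q2) grid for a BIC-L mode (illustrative)."""
    cache = {}

    def score(q):
        # Each point is "fitted" once; later visits reuse the stored value.
        if q not in cache:
            cache[q] = bicl(*q)
        return cache[q]

    best_q, best_score = None, float("-inf")
    for start in starts:
        current, run_best, stalled = start, score(start), 0
        if run_best > best_score:
            best_q, best_score = start, run_best
        while stalled < patience:
            q1, q2 = current
            neighbors = [(q1 + 1, q2), (q1, q2 + 1), (q1 - 1, q2), (q1, q2 - 1)]
            neighbors = [q for q in neighbors if min(q) >= 1]
            # Move to the best neighbor, even if it does not improve.
            current = max(neighbors, key=score)
            if score(current) > run_best:
                run_best, stalled = score(current), 0
                if run_best > best_score:
                    best_q, best_score = current, run_best
            else:
                stalled += 1
    return best_q, best_score
```

The counter `stalled` plays the role of $consecutive\_count$: the walk from a starting point stops once two successive moves fail to improve the best BIC-L seen from that start.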
@ -512,8 +520,7 @@ consists of two alternating steps:
\For{$Q_1 \in \left[ Q_{1,\text{center}} - \text{depth} ; Q_{1,\text{center}} + \text{depth} \right]$}{ \For{$Q_1 \in \left[ Q_{1,\text{center}} - \text{depth} ; Q_{1,\text{center}} + \text{depth} \right]$}{
\For{$Q_2 \in \left[ Q_{2,\text{center}} - \text{depth}; Q_{2,\text{center}} + \text{depth} \right] $}{ \For{$Q_2 \in \left[ Q_{2,\text{center}} - \text{depth}; Q_{2,\text{center}} + \text{depth} \right] $}{
Compute possible splits from predecessors $(Q_1 - 1, Q_2)$ and $(Q_1, Q_2 - 1)$\\ Compute possible splits from predecessors $(Q_1 - 1, Q_2)$ and $(Q_1, Q_2 - 1)$\\
Fit models with the block membership changes Among the models generated from the splits, keep the best with respect to the BIC-L
Compare and keep the best model based on BIC-L
} }
} }
@ -523,13 +530,12 @@ consists of two alternating steps:
\For{$Q_1 \in \left[ Q_{1,\text{center}} + \text{depth} ; Q_{1,\text{center}} - \text{depth} \right]$}{ \For{$Q_1 \in \left[ Q_{1,\text{center}} + \text{depth} ; Q_{1,\text{center}} - \text{depth} \right]$}{
\For{$Q_2 \in \left[ Q_{2,\text{center}} + \text{depth}; Q_{2,\text{center}} - \text{depth} \right] $}{ \For{$Q_2 \in \left[ Q_{2,\text{center}} + \text{depth}; Q_{2,\text{center}} - \text{depth} \right] $}{
Compute possible merges from predecessors $(Q_1 + 1, Q_2)$ and $(Q_1, Q_2 + 1)$\\ Compute possible merges from predecessors $(Q_1 + 1, Q_2)$ and $(Q_1, Q_2 + 1)$\\
Fit models with the block membership changes Among the models generated from the merges, keep the best with respect to the BIC-L
Compare and keep the best model based on BIC-L
} }
} }
\BlankLine \BlankLine
Update the best model based on the maximum BIC-L Choose the mode as the one that maximizes the BIC-L
} }
\BlankLine \BlankLine
@ -637,6 +643,7 @@ The procedure then repeats for the point at $(Q_1 + 1, Q_2)$ until it reaches
$(Q_{1,center} + depth, Q_2)$ from which it repeats from $(Q_{1,center} + depth, Q_2)$ from which it repeats from
$(Q_{1,center} - depth, Q_2 + 1)$. This repeats until computing the best model $(Q_{1,center} - depth, Q_2 + 1)$. This repeats until computing the best model
for $(Q_{1,center} + depth, Q_{2,center} + depth)$. for $(Q_{1,center} + depth, Q_{2,center} + depth)$.
\textit{Note on the initialization:} The forward pass starts from the point \textit{Note on the initialization:} The forward pass starts from the point
$(Q_{1,center} + depth, Q_{2,center} + depth)$, so this point needs to have at $(Q_{1,center} + depth, Q_{2,center} + depth)$, so this point needs to have at
least a model fitted. In the best case, the greedy exploration will have visited least a model fitted. In the best case, the greedy exploration will have visited
@ -663,7 +670,7 @@ $(Q_{1,center} + depth, Q_{2,center} + depth)$, we know it was initialized at
least by the forward pass, no special case here.\\ least by the forward pass, no special case here.\\
At the end of the moving window pass, the model of max BIC-L is the new best At the end of the moving window pass, the model of max BIC-L is the new best
fit and the procedure can repeat until convergence. fit and the procedure repeats until convergence.
\section{Networks clustering} \section{Networks clustering}
\label{sec:networks-clustering} \label{sec:networks-clustering}
@ -752,7 +759,7 @@ trivial partition in a unique group.
Then using the \emph{Kmeans} we split the collection in two sub-collections Then using the \emph{Kmeans} we split the collection in two sub-collections
with the dissimilarity matrix. The two sub-collections are fitted and we with the dissimilarity matrix. The two sub-collections are fitted and we
compute the score of this new partition $\mathcal{G}^{*} = \{G_1, G_2\}$. compute the score of this new partition $\mathcal{G}^{*} = \{G_1, G_2\}$.
If $Sc(\mathcal{G}^{*}) > Sc(\mathcal{G})$ then we repeat the same procedure on If $Sc(\mathcal{G}^{*}) > Sc(\mathcal{G})$, we repeat the same procedure on
$G_1$ and $G_2$. Else we return $\mathcal{G}$. $G_1$ and $G_2$. Else we return $\mathcal{G}$.
We illustrate our capacity to perform a partition of a collection for all We illustrate our capacity to perform a partition of a collection for all
colBiSBM models in~\ref{sec:network-clustering-of-simulated-networks}. colBiSBM models in~\ref{sec:network-clustering-of-simulated-networks}.
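The recursive splitting scheme described above can be sketched in Python. The sketch only shows the control flow: `score` and `split` are hypothetical callables standing in for $Sc(\cdot)$ and for the two-group split of the dissimilarity matrix, and the toy versions below (one-dimensional values, largest-gap split, variance-plus-penalty score) are assumptions made purely so that the example runs.

```python
def recursive_partition(group, score, split):
    """Split a collection in two as long as the score improves (illustrative)."""
    if len(group) < 2:
        return [group]
    left, right = split(group)
    if not left or not right:
        return [group]
    if score([left, right]) > score([group]):
        return (recursive_partition(left, score, split)
                + recursive_partition(right, score, split))
    return [group]


def toy_split(values):
    # Stand-in for the two-group split: cut the sorted values at the largest gap.
    ordered = sorted(values)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    cut = gaps.index(max(gaps)) + 1
    return ordered[:cut], ordered[cut:]


def toy_score(partition):
    # Stand-in for Sc: within-group spread plus a fixed cost per group.
    spread = 0.0
    for g in partition:
        mean = sum(g) / len(g)
        spread += sum((v - mean) ** 2 for v in g)
    return -spread - 0.5 * len(partition)
```

On the toy data `[5.0, 1.0, 5.2, 1.1]` the first split separates the values near 1 from the values near 5, and the recursion then stops because splitting either pair further lowers the score.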
@ -772,11 +779,11 @@ we obtain the following result of identifiability\footnote{The proof is in appen
\begin{itemize} \begin{itemize}
\item[(1.1)] $\exists m^*\in\{1,\dots,M\} : n_1^{m^*} \geq 2 Q_2 - 1~\text{and}~n_2^{m^*} \geq 2 Q_1 - 1$. \item[(1.1)] $\exists m^*\in\{1,\dots,M\} : n_1^{m^*} \geq 2 Q_2 - 1~\text{and}~n_2^{m^*} \geq 2 Q_1 - 1$.
\item[(1.2)] $\forall 1\leq q \leq Q_1, \pi_q > 0$ \item[(1.2)] $\forall 1\leq q \leq Q_1, \pi_q > 0$
and the coordinates of vector $\bm{\rho} and the coordinates of vector $\bm{\rho}
{X^{m^*}}^T$ are distinct (where ${X^{m^*}}^T$ is the transpose of $X^{m^*}$). {X^{m^*}}^T$ are distinct (where ${X^{m^*}}^T$ is the transpose of $X^{m^*}$).
\item[(1.3)] $\forall 1\leq r \leq Q_2, \rho_r > 0$ \item[(1.3)] $\forall 1\leq r \leq Q_2, \rho_r > 0$
and the coordinates of vector $\bm{\pi} and the coordinates of vector $\bm{\pi}
X^{m^*}$ are distinct. X^{m^*}$ are distinct.
\end{itemize} \end{itemize}
\end{theorem} \end{theorem}

Binary file not shown.


@ -22,8 +22,8 @@ Maxime.
Thanks to all the permanent staff on the 3rd floor, including: Christophe, Thanks to all the permanent staff on the 3rd floor, including: Christophe,
Stéphane and Vincent. Stéphane and Vincent.
Thanks to Hugo, Théodore, Éric, Jean-Benoist, Nicolas, Tristan, Sarah, Jade and Thanks to Liliane, Isabelle, Hugo, Théodore, Éric, Jean-Benoist, Nicolas, Lucia,
Pierre Gloaguen. Tristan, Sarah, Jade and Pierre Gloaguen.
Many thanks to everyone who contributed, directly or indirectly, to the smooth
running of this internship.