structure detection: proofreading
parent
563ef8ede3
commit
57410d1b4a
3 changed files with 105 additions and 98 deletions
|
|
@ -7,8 +7,8 @@
|
||||||
We define a collection of bipartite networks as
|
We define a collection of bipartite networks as
|
||||||
$\bm{X} = (X^1,\dots, X^m,\dots, X^M)$
|
$\bm{X} = (X^1,\dots, X^m,\dots, X^M)$
|
||||||
the collection of incidence matrices. Moreover, all the networks in the
|
the collection of incidence matrices. Moreover, all the networks in the
|
||||||
collection have the same type of interaction (e.g., all interactions are
|
collection have the same valuation of the interactions (e.g., they are
|
||||||
binary).
|
all binary).
|
||||||
|
|
||||||
\section{Separate BiSBM (sep-BiSBM)}\label{sec:separate-bisbm-sepbisbm}
|
\section{Separate BiSBM (sep-BiSBM)}\label{sec:separate-bisbm-sepbisbm}
|
||||||
|
|
||||||
|
|
@ -51,21 +51,21 @@ Equations~\eqref{eqn:lbm-block-membership-prob},
|
||||||
\eqref{eqn:lbm-conditional-to-latent} and \eqref{eqn:lbm-emission} define the
|
\eqref{eqn:lbm-conditional-to-latent} and \eqref{eqn:lbm-emission} define the
|
||||||
BiSBM model and we will now use a short notation:
|
BiSBM model and we will now use a short notation:
|
||||||
|
|
||||||
\begin{equation}
|
\begin{align}
|
||||||
\tag{\emph{sep-BiSBM}}
|
\tag{\emph{sep-BiSBM}}
|
||||||
X^m \sim \mathcal{F}\text{-BiSBM}_{n_1^m,n_2^m}(Q_1^m, Q_2^m, \bm{\pi^m}, \bm{\rho^m}, \bm{\alpha^m})
|
X^m \sim \mathcal{F}\text{-BiSBM}_{n_1^m,n_2^m}(Q_1^m, Q_2^m, \bm{\pi^m}, \bm{\rho^m}, \bm{\alpha^m}) & & \forall m = 1, \dots, M
|
||||||
\end{equation}
|
\end{align}
|
||||||
where $\mathcal{F}$ encodes the emission distribution, $n_1^m,n_2^m$ are the row
|
where $\mathcal{F}$ encodes the emission distribution, $n_1^m,n_2^m$ are the number of row
|
||||||
and column nodes, $Q_1^m, Q_2^m$ are the number of row and column blocks in
|
and column nodes, $Q_1^m, Q_2^m$ are the number of row and column blocks in
|
||||||
network $m$, $\bm{\pi}^m~=~{(\pi^m_q)}_{q=1,\dots,Q_1^m}$ and
|
network $m$, $\bm{\pi}^m~=~{(\pi^m_q)}_{q=1,\dots,Q_1^m}$ and
|
||||||
$\bm{\rho}^m~=~{(\rho^m_r)}_{r=1,\dots,Q_2^m}$ are the vectors of their
|
$\bm{\rho}^m~=~{(\rho^m_r)}_{r=1,\dots,Q_2^m}$ are the vectors of their
|
||||||
proportions. The $Q_1^m \times Q_2^m$ matrix
|
proportions. The $Q_1^m \times Q_2^m$ matrix
|
||||||
$\bm{\alpha}^m = {(\alpha^m_{qr})}_{\substack{q = 1,\dots,Q_1^m \\ r = 1,\dots,Q_2^m}}$
|
$\bm{\alpha}^m = {(\alpha^m_{qr})}_{\substack{q = 1,\dots,Q_1^m \\ r = 1,\dots,Q_2^m}}$
|
||||||
are the connectivity parameters, the parameters of the emission distribution.
|
are the connectivity parameters, i.e.~the parameters of the emission distribution.
|
||||||
$\alpha^m_{qr}\in\mathcal{A}_{\mathcal{F}}$ where, for the Bernoulli
|
$\alpha^m_{qr}\in\mathcal{A}_{\mathcal{F}}$ where, for the Bernoulli
|
||||||
(resp. Poisson) emission distribution, $\mathcal{A}_{\mathcal{F}} = (0,1)$ (resp.
|
(resp. Poisson) emission distribution, $\mathcal{A}_{\mathcal{F}} = (0,1)$ (resp.
|
||||||
$\mathcal{A}_{\mathcal{F}} = \mathbb{R}^{*+}$). In this $sep$-$BiSBM$ each
|
$\mathcal{A}_{\mathcal{F}} = \mathbb{R}^{*+}$). In this $sep$-BiSBM model each
|
||||||
network $m$ is assumed to follow a $BiSBM$ with its own parameters ($\bm{\pi}^m,
|
network $m$ is assumed to follow a BiSBM with its own parameters ($\bm{\pi}^m,
|
||||||
\bm{\rho}^m, \bm{\alpha}^m$).
|
\bm{\rho}^m, \bm{\alpha}^m$).
|
||||||
% DONE Finish explaining
|
% DONE Finish explaining
|
||||||
|
|
||||||
|
|
@ -76,7 +76,7 @@ network $m$ is assumed to follow a $BiSBM$ with its own parameters ($\bm{\pi}^m,
|
||||||
\subsection{A collection of iid bipartite SBM}\label{ssec:a-collection-of-i-i-d-bipartite-sbm}
|
\subsection{A collection of iid bipartite SBM}\label{ssec:a-collection-of-i-i-d-bipartite-sbm}
|
||||||
As for \emph{colSBM} this first model is the most constrained. It assumes that
|
As for \emph{colSBM} this first model is the most constrained. It assumes that
|
||||||
all the networks are independent realizations of the same $Q_1$-$Q_2$-BiSBM
|
all the networks are independent realizations of the same $Q_1$-$Q_2$-BiSBM
|
||||||
with identical parameters. The \emph{iid-colBiSBM} is defined as follows:
|
with identical parameters. The \emph{iid}-colBiSBM is defined as follows:
|
||||||
|
|
||||||
\begin{align}
|
\begin{align}
|
||||||
\tag{\emph{iid}-colBiSBM}
|
\tag{\emph{iid}-colBiSBM}
|
||||||
|
|
@ -85,7 +85,7 @@ with identical parameters. The \emph{iid-colBiSBM} is defined as follows:
|
||||||
where $\forall (q,r) \in \{1,\dots,Q_1\}\times\{1,\dots,Q_2\}$, $\alpha_{qr} \in \mathcal{A}_{\mathcal{F}}$,
|
where $\forall (q,r) \in \{1,\dots,Q_1\}\times\{1,\dots,Q_2\}$, $\alpha_{qr} \in \mathcal{A}_{\mathcal{F}}$,
|
||||||
$\pi_q \in \left( 0,1 \right], \sum_{q=1}^{Q_1} \pi_q = 1 $ and $\rho_r \in \left( 0,1 \right], \sum_{r=1}^{Q_2} \rho_r = 1 $.
|
$\pi_q \in \left( 0,1 \right], \sum_{q=1}^{Q_1} \pi_q = 1 $ and $\rho_r \in \left( 0,1 \right], \sum_{r=1}^{Q_2} \rho_r = 1 $.
|
||||||
This model involves $(Q_1 - 1) + (Q_2 - 1) + Q_1\times Q_2$ parameters, the two
|
This model involves $(Q_1 - 1) + (Q_2 - 1) + Q_1\times Q_2$ parameters, the two
|
||||||
first terms corresponding to block proportions on the row and column dimensions
|
first terms corresponding to the row and column block proportions
|
||||||
and the third term to connectivity parameters.
|
and the third term to connectivity parameters.
|
||||||
|
|
||||||
But the assumption that block proportions are the same among the networks is a
|
But the assumption that block proportions are the same among the networks is a
|
||||||
|
|
@ -106,9 +106,9 @@ block proportions. For $m \in \{1,\dots,M\}$, the $X^m$ are independent and
|
||||||
\end{align}
|
\end{align}
|
||||||
where $\forall (q,r) \in \{1,\dots,Q_1\}\times\{1,\dots,Q_2\}$, $\alpha_{qr} \in \mathcal{A}_{\mathcal{F}}$,
|
where $\forall (q,r) \in \{1,\dots,Q_1\}\times\{1,\dots,Q_2\}$, $\alpha_{qr} \in \mathcal{A}_{\mathcal{F}}$,
|
||||||
$\pi^m_q \in \left[ 0,1 \right], \sum_{q=1}^{Q_1} \pi^m_q~=~1, \forall m \in \{1,\dots,M\}$ and $\rho_r \in \left( 0,1 \right], \sum_{r=1}^{Q_2} \rho_r = 1 $.
|
$\pi^m_q \in \left[ 0,1 \right], \sum_{q=1}^{Q_1} \pi^m_q~=~1, \forall m \in \{1,\dots,M\}$ and $\rho_r \in \left( 0,1 \right], \sum_{r=1}^{Q_2} \rho_r = 1 $.
|
||||||
This model is more flexible than the iid-colBiSBM as it allows some row block
|
This model is more flexible than the iid-colBiSBM as it allows the row block
|
||||||
proportions to be null
|
proportions to vary between networks and even to be null
|
||||||
in certain networks ($\pi^m_q\in\left[ 0,1 \right]$): if $\pi_q^m = 0$ then the
|
($\pi^m_q\in\left[ 0,1 \right]$): if $\pi_q^m = 0$ then the
|
||||||
block $q$ is not represented in the network $m$. The connectivity structure is
|
block $q$ is not represented in the network $m$. The connectivity structure is
|
||||||
thus a subset of a large connectivity structure common to all networks. We face
|
thus a subset of a large connectivity structure common to all networks. We face
|
||||||
the same problems as~\cite{chabert-liddellLearningCommonStructures2024a} and
|
the same problems as~\cite{chabert-liddellLearningCommonStructures2024a} and
|
||||||
|
|
@ -139,9 +139,9 @@ block proportions. For $m \in \{1,\dots,M\}$, the $X^m$ are independent and
|
||||||
where $\forall (q,r) \in \{1,\dots,Q_1\}\times\{1,\dots,Q_2\}$, $\alpha_{qr} \in \mathcal{A}_{\mathcal{F}}$,
|
where $\forall (q,r) \in \{1,\dots,Q_1\}\times\{1,\dots,Q_2\}$, $\alpha_{qr} \in \mathcal{A}_{\mathcal{F}}$,
|
||||||
$\pi_q \in \left( 0,1 \right], \sum_{q=1}^{Q_1} \pi_q = 1 $ and
|
$\pi_q \in \left( 0,1 \right], \sum_{q=1}^{Q_1} \pi_q = 1 $ and
|
||||||
$\rho^m_r \in \left[ 0,1 \right], \sum_{r=1}^{Q_2} \rho^m_r = 1 $.
|
$\rho^m_r \in \left[ 0,1 \right], \sum_{r=1}^{Q_2} \rho^m_r = 1 $.
|
||||||
This model is more flexible than the iid-colBiSBM as it allows some column block
|
This model is more flexible than the iid-colBiSBM as it allows the column block
|
||||||
proportions to be
|
proportions to vary between networks and even to be null
|
||||||
null in certain networks ($\rho^m_r\in\left[ 0,1 \right]$): if $\rho_r^m = 0$
|
($\rho^m_r\in\left[ 0,1 \right]$): if $\rho_r^m = 0$
|
||||||
then the column block $r$ is not represented in the network $m$.
|
then the column block $r$ is not represented in the network $m$.
|
||||||
|
|
||||||
\enquote{Mirroring} the formulas for the $\pi$-colBiSBM we relax the constraints on
|
\enquote{Mirroring} the formulas for the $\pi$-colBiSBM we relax the constraints on
|
||||||
|
|
@ -155,7 +155,7 @@ case the matrix full of ones), the number of parameters is:
|
||||||
|
|
||||||
$\pi\rho$-colBiSBM model still assumes that the networks share a common connectivity
|
$\pi\rho$-colBiSBM model still assumes that the networks share a common connectivity
|
||||||
structure represented by $\bm{\alpha}$ but that each network has its own row and
|
structure represented by $\bm{\alpha}$ but that each network has its own row and
|
||||||
column block proportions, it is the less constrained model.
|
column block proportions; it is the least constrained model.
|
||||||
For $m \in \{1,\dots,M\}$, the $X^m$ are independent and
|
For $m \in \{1,\dots,M\}$, the $X^m$ are independent and
|
||||||
\begin{align}
|
\begin{align}
|
||||||
\tag{\emph{$\pi\rho$}-colBiSBM}
|
\tag{\emph{$\pi\rho$}-colBiSBM}
|
||||||
|
|
@ -204,22 +204,23 @@ we have: $\mathbb{P}_{\mathcal{R}_m} (Z_{iq}^m = 1, W_{jr}^m = 1|X^m) =
|
||||||
|
|
||||||
The formula for the entropy per network is thus:
|
The formula for the entropy per network is thus:
|
||||||
\begin{equation*}
|
\begin{equation*}
|
||||||
\mathcal{H}(\mathcal{R}_m) = - \sum_{i=1}^{n_1^m} \tau^{1,m}_{i,q} \log \tau^{1,m}_{i,q} - \sum_{j=1}^{n_2^m} \tau^{2,m}_{j,r} \log \tau^{2,m}_{j,r}
|
\mathcal{H}(\mathcal{R}_m) = - \sum_{i=1}^{n_1^m} \sum_{q \in \mathcal{Q}_1^m} \tau_{iq}^{1,m} \log \tau_{iq}^{1,m} - \sum_{j=1}^{n_2^m} \sum_{r \in \mathcal{Q}_2^m} \tau_{jr}^{2,m} \log \tau_{jr}^{2,m}
|
||||||
\end{equation*}
|
\end{equation*}
|
||||||
|
|
||||||
And the expectation of the completed log-likelihood under the $\mathcal{R}_m$
|
And the expectation of the completed log-likelihood under the $\mathcal{R}_m$
|
||||||
variational distribution for network $m$ is:
|
variational distribution for network $m$ is:
|
||||||
\begin{align*}
|
\begin{align*}
|
||||||
\mathbb{E}_{\mathcal{R}_m}[\ell(X^m,Z^m,W^m;\bm{\theta})] = \sum_{i = 1}^{n_1^m}\sum_{j=1}^{n_2^m}\sum_{q \in \mathcal{Q}_{1,m}} \sum_{r \in \mathcal{Q}_{2,m}} \tau^{1,m}_{i,q} \tau^{2,m}_{j,r} \log f(X^{m}_{ij}; \alpha_{qr}) \\
|
\mathbb{E}_{\mathcal{R}_m}[\ell(X^m,Z^m,W^m;\bm{\theta})] = \sum_{i = 1}^{n_1^m}\sum_{j=1}^{n_2^m}\sum_{q \in \mathcal{Q}_1^m} \sum_{r \in \mathcal{Q}_2^m} \tau_{iq}^{1,m} \tau_{jr}^{2,m} \log f(X^{m}_{ij}; \alpha_{qr}) \\
|
||||||
+ \sum_{i=1}^{n_1^m} \sum_{q \in \mathcal{Q}_{1,m}} \tau^{1,m}_{i,q} \log \pi_{\color{black}q}^{\color{gray}m} + \sum_{j=1}^{n_2^m} \sum_{r \in \mathcal{Q}_{2,m}} \tau^{2,m}_{j,r} \log \rho_{\color{black}r}^{\color{gray}m}
|
+ \sum_{i=1}^{n_1^m} \sum_{q \in \mathcal{Q}_1^m} \tau_{iq}^{1,m} \log \pi_{\color{black}q}^{\color{gray}m} + \sum_{j=1}^{n_2^m} \sum_{r \in \mathcal{Q}_2^m} \tau_{jr}^{2,m} \log \rho_{\color{black}r}^{\color{gray}m}
|
||||||
\end{align*}
|
\end{align*}
|
||||||
|
with $\mathcal{Q}_1^m = \{q\in \{1, \dots, Q_1\} \mid \pi_q^m > 0\}$ and
|
||||||
|
$\mathcal{Q}_2^m = \{r\in \{1, \dots, Q_2\} \mid \rho_r^m > 0\}$.
|
||||||
And thus the lower bound becomes:
|
And thus the lower bound becomes:
|
||||||
|
|
||||||
\begin{align*}
|
\begin{align*}
|
||||||
\mathcal{J}(\bm{\tau};\bm{\theta}) \coloneqq \sum_{m=1}^{M} \bigg(\sum_{i = 1}^{n_1^m}\sum_{j=1}^{n_2^m}\sum_{q \in \mathcal{Q}_{1,m}} \sum_{r \in \mathcal{Q}_{2,m}} \tau^{1,m}_{i,q} \tau^{2,m}_{j,r} \log f(X^{m}_{ij}; \alpha_{qr}) \\
|
\mathcal{J}(\bm{\tau};\bm{\theta}) \coloneqq \sum_{m=1}^{M} \bigg(\sum_{i = 1}^{n_1^m}\sum_{j=1}^{n_2^m}\sum_{q \in \mathcal{Q}_1^m} \sum_{r \in \mathcal{Q}_2^m} \tau_{iq}^{1,m} \tau_{jr}^{2,m} \log f(X^{m}_{ij}; \alpha_{qr}) \\
|
||||||
+ \sum_{i=1}^{n_1^m} \sum_{q \in \mathcal{Q}_{1,m}} \tau^{1,m}_{i,q} \log \pi_{\color{black}q}^{\color{gray}m} + \sum_{j=1}^{n_2^m} \sum_{r \in \mathcal{Q}_{2,m}} \tau^{2,m}_{j,r} \log \rho_{\color{black}r}^{\color{gray}m} \\
|
+ \sum_{i=1}^{n_1^m} \sum_{q \in \mathcal{Q}_1^m} \tau_{iq}^{1,m} \log \pi_{\color{black}q}^{\color{gray}m} + \sum_{j=1}^{n_2^m} \sum_{r \in \mathcal{Q}_2^m} \tau_{jr}^{2,m} \log \rho_{\color{black}r}^{\color{gray}m} \\
|
||||||
- \sum_{i=1}^{n_1^m} \tau^{1,m}_{i,q} \log \tau^{1,m}_{i,q} - \sum_{j=1}^{n_2^m} \tau^{2,m}_{j,r} \log \tau^{2,m}_{j,r} \bigg) \color{black}
|
- \sum_{i=1}^{n_1^m} \sum_{q \in \mathcal{Q}_1^m} \tau_{iq}^{1,m} \log \tau_{iq}^{1,m} - \sum_{j=1}^{n_2^m} \sum_{r \in \mathcal{Q}_2^m} \tau_{jr}^{2,m} \log \tau_{jr}^{2,m} \bigg) \color{black}
|
||||||
\end{align*}
|
\end{align*}
|
||||||
|
|
||||||
where we identify the variational distribution $\mathcal{R}$ with its parameter
|
where we identify the variational distribution $\mathcal{R}$ with its parameter
|
||||||
|
|
@ -284,8 +285,8 @@ while on the other hand,
|
||||||
\end{align*}
|
\end{align*}
|
||||||
the parameters take into account all the networks at the same time.
|
the parameters take into account all the networks at the same time.
|
||||||
The connectivity parameters $\alpha_{qr}$ for all models are estimated as the
|
The connectivity parameters $\alpha_{qr}$ for all models are estimated as the
|
||||||
ratio of the number of interactions between row block $q$ and column block $r$
|
ratio of the number of observed interactions between row block $q$ and column block $r$
|
||||||
among all networks over the number of number of possible interactions:
|
among all networks over the number of possible interactions:
|
||||||
\begin{align*}
|
\begin{align*}
|
||||||
\widehat{\alpha}_{qr} = \frac{\sum_{m=1}^{M} e^{m}_{qr}}{\sum_{m=1}^{M} n^{m}_{qr}}
|
\widehat{\alpha}_{qr} = \frac{\sum_{m=1}^{M} e^{m}_{qr}}{\sum_{m=1}^{M} n^{m}_{qr}}
|
||||||
\end{align*}
|
\end{align*}
|
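As a minimal sketch of the pooled estimator above for the Bernoulli case, the following pure-Python function accumulates numerator and denominator over all networks before dividing. It assumes (this is not stated in the lines shown here) that $e^m_{qr} = \sum_{i,j} \tau^{1,m}_{iq}\tau^{2,m}_{jr} X^m_{ij}$ and $n^m_{qr} = \sum_{i,j} \tau^{1,m}_{iq}\tau^{2,m}_{jr}$; the name `pooled_alpha_hat` and the list-of-lists data layout are hypothetical and are not taken from the colSBM package.

```python
def pooled_alpha_hat(networks, tau1, tau2):
    """Pooled connectivity estimate alpha_hat[q][r] over M networks.

    networks[m] : n1 x n2 incidence matrix (list of lists)
    tau1[m]     : n1 x Q1 row-membership probabilities
    tau2[m]     : n2 x Q2 column-membership probabilities
    Assumption: e_qr and n_qr are tau-weighted sums, as stated in the lead-in.
    """
    q1 = len(tau1[0][0])
    q2 = len(tau2[0][0])
    num = [[0.0] * q2 for _ in range(q1)]  # sum over m of e^m_qr
    den = [[0.0] * q2 for _ in range(q1)]  # sum over m of n^m_qr
    for x, t1, t2 in zip(networks, tau1, tau2):
        for i, row in enumerate(x):
            for j, x_ij in enumerate(row):
                for q in range(q1):
                    for r in range(q2):
                        w = t1[i][q] * t2[j][r]
                        num[q][r] += w * x_ij
                        den[q][r] += w
    # A block pair that is empty in every network has no defined estimate.
    return [[num[q][r] / den[q][r] if den[q][r] > 0 else float("nan")
             for r in range(q2)] for q in range(q1)]
```

Because the sums over $m$ are taken before the division, a network with many nodes in a block pair weighs more in $\widehat{\alpha}_{qr}$ than a small one, which is what distinguishes this estimator from averaging per-network estimates.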
||||||
|
|
@ -303,7 +304,7 @@ $Q_1$ and $Q_2$. But as they are in general not known we need to explore the
|
||||||
latent space to find the \emph{best} values.
|
latent space to find the \emph{best} values.
|
||||||
As discussed in~\cite{chabert-liddellLearningCommonStructures2024a}, the
|
As discussed in~\cite{chabert-liddellLearningCommonStructures2024a}, the
|
||||||
algorithmic aspect becomes complex when dealing with the bipartite case. Due to
|
algorithmic aspect becomes complex when dealing with the bipartite case. Due to
|
||||||
the size of the latent space being $\mathbb{N}^2$, conducting a complete
|
the latent space being $\mathbb{N}^2$, conducting a complete
|
||||||
exploration of the latent space is practically infeasible. Therefore, in
|
exploration of the latent space is practically infeasible. Therefore, in
|
||||||
addition to adapting the existing formulas, our contribution to addressing this
|
addition to adapting the existing formulas, our contribution to addressing this
|
||||||
challenge involved making significant choices, which are outlined below.
|
challenge involved making significant choices, which are outlined below.
|
||||||
|
|
@ -315,7 +316,7 @@ The below procedures are implemented in the \emph{colSBM} package, available on
|
||||||
\label{ssec:the-bic-l-criterion-for-model-selection}
|
\label{ssec:the-bic-l-criterion-for-model-selection}
|
||||||
To select the best number of blocks we need a criterion to
|
To select the best number of blocks we need a criterion to
|
||||||
measure adequacy between our model and data. The ELBO might seem a good
|
measure adequacy between our model and data. The ELBO might seem a good
|
||||||
criterion at first but as for the likelihood, the more complex a model the
|
criterion at first but as for the likelihood, the more complex the model, the
|
||||||
higher it gets. And thus a good criterion should make a \emph{trade-off} between
|
higher it gets. And thus a good criterion should make a \emph{trade-off} between
|
||||||
fitting to data and model complexity.
|
fitting to data and model complexity.
|
||||||
|
|
||||||
|
|
@ -340,7 +341,7 @@ well-separated blocks by imposing a penalty on the entropy of node grouping.
|
||||||
However, the objective of our study extends beyond grouping nodes into coherent
|
However, the objective of our study extends beyond grouping nodes into coherent
|
||||||
blocks. We also aim to assess the similarity of connectivity patterns across
|
blocks. We also aim to assess the similarity of connectivity patterns across
|
||||||
different networks. Consequently, we aim to permit models that offer more
|
different networks. Consequently, we aim to permit models that offer more
|
||||||
flexible node grouping without penalizing entropy.
|
flexible node grouping by not penalizing the entropy.
|
||||||
|
|
||||||
This leads us to formulate a BIC-like criterion in the following manner:
|
This leads us to formulate a BIC-like criterion in the following manner:
|
||||||
|
|
||||||
|
|
@ -352,49 +353,49 @@ We provide below the expression for the penalties for the 4 models that we
|
||||||
propose.
|
propose.
|
||||||
\begin{description}
|
\begin{description}
|
||||||
\item[\textit{iid}-colBiSBM] For the $\bm\pi$ and $\bm\rho$:
|
\item[\textit{iid}-colBiSBM] For the $\bm\pi$ and $\bm\rho$:
|
||||||
\begin{align*}
|
\begin{align*}
|
||||||
\text{pen}_{\pi}(Q_1) = (Q_1 - 1)\log(\sum_{m=1}^{M}n_{1}^{m}) & , &
|
\text{pen}_{\pi}(Q_1) = (Q_1 - 1)\log(\sum_{m=1}^{M}n_{1}^{m}) & , &
|
||||||
\text{pen}_{\rho}(Q_2) = (Q_2 - 1)\log(\sum_{m=1}^{M}n_{2}^{m})
|
\text{pen}_{\rho}(Q_2) = (Q_2 - 1)\log(\sum_{m=1}^{M}n_{2}^{m})
|
||||||
\end{align*}
|
\end{align*}
|
||||||
For the $\bm\alpha$:
|
For the $\bm\alpha$:
|
||||||
\[\text{pen}_{\alpha}(Q_1, Q_2) = Q_1 \times Q_2 \log(N_M)\]
|
\[\text{pen}_{\alpha}(Q_1, Q_2) = Q_1 \times Q_2 \log(N_M)\]
|
||||||
with
|
with
|
||||||
\[ N_M = \sum_{m = 1}^{M} n_{1}^{m} \times n_{2}^{m} \]
|
\[ N_M = \sum_{m = 1}^{M} n_{1}^{m} \times n_{2}^{m} \]
|
||||||
And thus the $\text{BIC-L}$ formula is the following:
|
And thus the $\text{BIC-L}$ formula is the following:
|
||||||
\[ \text{BIC-L}(\bm{X},Q_1, Q_2) = \max_{\theta}
|
\[ \text{BIC-L}(\bm{X},Q_1, Q_2) = \max_{\theta}
|
||||||
\mathcal{J} (\mathcal{\hat{R}}, \bm{\theta})
|
\mathcal{J} (\mathcal{\hat{R}}, \bm{\theta})
|
||||||
- \frac{1}{2} [\text{pen}_{\pi}(Q_1) + \text{pen}_{\rho}(Q_2) +
|
- \frac{1}{2} [\text{pen}_{\pi}(Q_1) + \text{pen}_{\rho}(Q_2) +
|
||||||
\text{pen}_{\alpha}(Q_1, Q_2)]\]
|
\text{pen}_{\alpha}(Q_1, Q_2)]\]
|
||||||
\item[$\bm{\pi\rho}$-colBiSBM] The support penalties are
|
\item[$\bm{\pi\rho}$-colBiSBM] The support penalties are
|
||||||
\begin{align*}
|
\begin{align*}
|
||||||
\text{pen}_{S_1}(Q_1) = -2 \log p_{Q_1} (S_1) & , &
|
\text{pen}_{S_1}(Q_1) = -2 \log p_{Q_1} (S_1) & , &
|
||||||
\text{pen}_{S_2}(Q_2) = -2 \log p_{Q_2} (S_2)
|
\text{pen}_{S_2}(Q_2) = -2 \log p_{Q_2} (S_2)
|
||||||
\end{align*}
|
\end{align*}
|
||||||
with \begin{align*}
|
with \begin{align*}
|
||||||
\textstyle \log p_{Q_1}(S_1) = - M \log(Q_1) - \sum_{m=1}^{M} \log {Q_1
|
\textstyle \log p_{Q_1}(S_1) = - M \log(Q_1) - \sum_{m=1}^{M} \log {Q_1
|
||||||
\choose Q_1^{(m)}}, \\
|
\choose Q_1^{(m)}}, \\
|
||||||
\textstyle \log p_{Q_2}(S_2) = - M \log(Q_2) - \sum_{m=1}^{M} \log {Q_2
|
\textstyle \log p_{Q_2}(S_2) = - M \log(Q_2) - \sum_{m=1}^{M} \log {Q_2
|
||||||
\choose Q_2^{(m)}}.
|
\choose Q_2^{(m)}}.
|
||||||
\end{align*}
|
\end{align*}
|
||||||
And penalties for the $\bm\rho$ and $\bm\pi$ are
|
And penalties for the $\bm\rho$ and $\bm\pi$ are
|
||||||
\[ \text{pen}_{\pi}(Q_1, S_1) = \sum_{m=1}^{M} (Q_{1}^{(m)} - 1)
|
\[ \text{pen}_{\pi}(Q_1, S_1) = \sum_{m=1}^{M} (Q_{1}^{(m)} - 1)
|
||||||
\log n_{1}^{m},
|
\log n_{1}^{m},
|
||||||
~\text{pen}_{\rho}(Q_2, S_2) = \sum_{m=1}^{M} (Q_{2}^{(m)} - 1)
|
~\text{pen}_{\rho}(Q_2, S_2) = \sum_{m=1}^{M} (Q_{2}^{(m)} - 1)
|
||||||
\log n_{2}^{m}. \]
|
\log n_{2}^{m}. \]
|
||||||
Penalties for the $\bm\alpha$
|
Penalties for the $\bm\alpha$
|
||||||
\[ \text{pen}_{\alpha}(Q_1, Q_2, S_1, S_2) = (\sum_{q=1}^{Q_1}
|
\[ \text{pen}_{\alpha}(Q_1, Q_2, S_1, S_2) = (\sum_{q=1}^{Q_1}
|
||||||
\sum_{r=1}^{Q_2} \mathbbb{1}_{(S_1)'S_2 > 0}) \log (N_M). \]
|
\sum_{r=1}^{Q_2} \mathbbb{1}_{(S_1)'S_2 > 0}) \log (N_M). \]
|
||||||
And the corresponding BIC-L formula,
|
And the corresponding BIC-L formula,
|
||||||
\[
|
\[
|
||||||
\begin{aligned}
|
\begin{aligned}
|
||||||
\text{BIC-L}(\bm{X},Q_1, Q_2) =
|
\text{BIC-L}(\bm{X},Q_1, Q_2) =
|
||||||
\max_{S_1,S_2} [
|
\max_{S_1,S_2} [
|
||||||
& \max_{\theta_{S_1,S_2} \in \Theta_{S_1,S_2}} \mathcal{J}(\mathcal{\hat{R}},\theta_{S_1,S_2}) \\
|
& \max_{\theta_{S_1,S_2} \in \Theta_{S_1,S_2}} \mathcal{J}(\mathcal{\hat{R}},\theta_{S_1,S_2}) \\
|
||||||
- \frac{1}{2} & (\text{pen}_{\pi}(Q_1, S_1) + \text{pen}_{\rho}(Q_2, S_2) \\
|
- \frac{1}{2} & (\text{pen}_{\pi}(Q_1, S_1) + \text{pen}_{\rho}(Q_2, S_2) \\
|
||||||
& + \text{pen}_{\alpha}(Q_1, Q_2, S_1, S_2) \\
|
& + \text{pen}_{\alpha}(Q_1, Q_2, S_1, S_2) \\
|
||||||
& + \text{pen}_{S_1}(Q_1) + \text{pen}_{S_2}(Q_2))] \\
|
& + \text{pen}_{S_1}(Q_1) + \text{pen}_{S_2}(Q_2))] \\
|
||||||
\end{aligned}
|
\end{aligned}
|
||||||
\]
|
\]
|
||||||
\end{description}
|
\end{description}
|
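The \textit{iid}-colBiSBM penalties listed above translate directly into code. The sketch below is only illustrative: the function names are hypothetical, and `elbo` stands for the maximized lower bound $\mathcal{J}$, which is assumed to be computed elsewhere.

```python
import math


def iid_penalties(n1, n2, q1, q2):
    """Penalties of the iid-colBiSBM.

    n1[m], n2[m] : number of row / column nodes of network m
    q1, q2       : number of row / column blocks
    """
    pen_pi = (q1 - 1) * math.log(sum(n1))
    pen_rho = (q2 - 1) * math.log(sum(n2))
    n_m = sum(a * b for a, b in zip(n1, n2))  # N_M: total number of dyads
    pen_alpha = q1 * q2 * math.log(n_m)
    return pen_pi, pen_rho, pen_alpha


def bic_l_iid(elbo, n1, n2, q1, q2):
    """BIC-L = maximized lower bound minus half the sum of the penalties."""
    return elbo - 0.5 * sum(iid_penalties(n1, n2, q1, q2))
```

For instance, with two networks of sizes $10 \times 5$ and $20 \times 5$, $N_M = 150$, so the connectivity penalty for $(Q_1, Q_2) = (2, 3)$ is $6 \log 150$.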
||||||
|
|
||||||
\subsection{Initialization and pairing of the models}
|
\subsection{Initialization and pairing of the models}
|
||||||
|
|
@ -420,11 +421,11 @@ For the memberships on the rows: $row~order_m = order\left(\rho_m \times
|
||||||
|
|
||||||
Using this order we relabel the memberships of the $M$ fitted single-network
|
Using this order we relabel the memberships of the $M$ fitted single-network
|
||||||
collections.
|
collections.
|
||||||
We then use the $M$ memberships to fit a collection containing
|
We then use the $M$ memberships to compute an initial $\bm{\tau}$, from which we fit a collection
|
||||||
the $M$ networks.
|
containing the $M$ networks.
|
||||||
\subsection{Greedy exploration to find an estimation of the mode}\label{ssec:greedy-exploration-to-find-an-estimation-of-the-mode}
|
\subsection{Greedy exploration to find an estimation of the mode}\label{ssec:greedy-exploration-to-find-an-estimation-of-the-mode}
|
||||||
Using the previously fitted models for $Q = (1,2)$ and $Q = (2,1)$ we choose to
|
Using the previously fitted models for $Q = (1,2)$ and $Q = (2,1)$ we choose to
|
||||||
perform a greedy exploration to find a first mode.
|
perform a greedy exploration from each of those points to find a first mode.
|
||||||
|
|
||||||
Meaning that for a given $Q = (Q_1, Q_2)$ we will compute all the possible
|
Meaning that for a given $Q = (Q_1, Q_2)$ we will compute all the possible
|
||||||
memberships for the points $Q \in \{(Q_1 + 1, Q_2),(Q_1, Q_2 + 1),(Q_1 - 1,
|
memberships for the points $Q \in \{(Q_1 + 1, Q_2),(Q_1, Q_2 + 1),(Q_1 - 1,
|
||||||
|
|
@ -432,6 +433,10 @@ memberships for the points $Q \in \{(Q_1 + 1, Q_2),(Q_1, Q_2 + 1),(Q_1 - 1,
|
||||||
maximizes the BIC-L as the next point from which to repeat the procedure. We
|
maximizes the BIC-L as the next point from which to repeat the procedure. We
|
||||||
repeat the procedure until the BIC-L stops increasing $2$ times in a row.
|
repeat the procedure until the BIC-L stops increasing $2$ times in a row.
|
||||||
|
|
||||||
|
Let us denote the neighborhood in the latent space of a point $Q$ by
|
||||||
|
$\mathcal{N}(Q) = Q + \{(1,0), (0,1), (-1,0), (0,-1)\}$, the four neighbors of $Q$
|
||||||
|
in the grid.
|
||||||
|
|
||||||
\begin{algorithm}[H]
|
\begin{algorithm}[H]
|
||||||
\small
|
\small
|
||||||
\caption{Greedy Exploration for Mode Estimation}
|
\caption{Greedy Exploration for Mode Estimation}
|
||||||
|
|
@ -443,28 +448,31 @@ repeat the procedure until the BIC-L stops increasing $2$ times in a row.
|
||||||
\Output{Estimation of the mode using greedy exploration}
|
\Output{Estimation of the mode using greedy exploration}
|
||||||
|
|
||||||
\BlankLine
|
\BlankLine
|
||||||
Initialize $Q = (1,2)$ as the starting point\\
|
\For{$Q_{\text{start}} \in \{(1,2), (2,1)\}$}{ % and $Q = (2,1)$ as starting point
|
||||||
Initialize $\text{BIC-L}_{\text{max}}$ as the maximum achieved BIC-L value\\
|
\BlankLine
|
||||||
|
Initialize $\text{BIC-L}_{\text{max}} \leftarrow \text{BIC-L}(Q_{\text{start}})$\\
|
||||||
Initialize $consecutive\_count$ as 0
|
Initialize $consecutive\_count$ as 0
|
||||||
|
|
||||||
\BlankLine
|
\BlankLine
|
||||||
|
$Q_{\text{curr}} \leftarrow Q_{\text{start}}$
|
||||||
|
|
||||||
\While{$consecutive\_count < 2$}{
|
\While{$consecutive\_count < 2$}{
|
||||||
Compute possible memberships for $Q \in \{(Q_1 + 1, Q_2), (Q_1, Q_2 + 1), (Q_1 - 1, Q_2), (Q_1, Q_2 - 1)\}$\;
|
Fit models in $\mathcal{N}(Q_{\text{curr}})$\;
|
||||||
Fit models with the computed memberships
|
|
||||||
Choose the model with the maximum BIC-L as the next point
|
|
||||||
|
|
||||||
\BlankLine
|
\BlankLine
|
||||||
\If{$\text{BIC-L} > \text{BIC-L}_{\text{max}}$}{
|
$Q_{\text{next}} \leftarrow \arg\max_{Q \in \mathcal{N}(Q_{\text{curr}})} \text{BIC-L}(Q)$\;
|
||||||
$\text{BIC-L}_{\text{max}} \leftarrow \text{BIC-L}$\\
|
|
||||||
|
$\text{BIC-L}_{\text{curr}} \leftarrow \text{BIC-L}(Q_{\text{next}})$; $Q_{\text{curr}} \leftarrow Q_{\text{next}}$\;
|
||||||
|
\BlankLine
|
||||||
|
\If{$\text{BIC-L}_{\text{curr}} > \text{BIC-L}_{\text{max}}$}{
|
||||||
|
$\text{BIC-L}_{\text{max}} \leftarrow \text{BIC-L}_{\text{curr}}$\\
|
||||||
$consecutive\_count \leftarrow 0$
|
$consecutive\_count \leftarrow 0$
|
||||||
}
|
}
|
||||||
\Else{
|
\Else{
|
||||||
$consecutive\_count \leftarrow consecutive\_count + 1$
|
$consecutive\_count \leftarrow consecutive\_count + 1$
|
||||||
}
|
}
|
||||||
\BlankLine
|
|
||||||
$Q \leftarrow$ Next selected point
|
|
||||||
}
|
}
|
||||||
|
}
|
||||||
\BlankLine
|
\BlankLine
|
||||||
\textbf{Output:} Estimation of the mode using greedy exploration
|
\textbf{Output:} Estimation of the mode using greedy exploration
|
||||||
\end{algorithm}
|
\end{algorithm}
|
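The greedy exploration can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the colSBM implementation: `bic_l` is a hypothetical callable standing in for "fit the model at $(Q_1, Q_2)$ and return its BIC-L", neighbors with fewer than one block are discarded, and the search always moves to the best neighbor, stopping after `patience` consecutive moves without improvement.

```python
def greedy_mode(bic_l, starts=((1, 2), (2, 1)), patience=2):
    """Greedy search for a mode of bic_l over the grid of (Q1, Q2) >= (1, 1).

    bic_l : callable mapping (Q1, Q2) to a score (hypothetical stand-in
            for fitting a model and returning its BIC-L).
    Returns the best point visited and its score.
    """
    best_q, best_score = None, float("-inf")
    for start in starts:
        q_curr = start
        local_best_q, local_best = start, bic_l(start)
        stalls = 0  # consecutive moves without improvement
        while stalls < patience:
            q1, q2 = q_curr
            neighbors = [(a, b) for a, b in
                         ((q1 + 1, q2), (q1, q2 + 1), (q1 - 1, q2), (q1, q2 - 1))
                         if a >= 1 and b >= 1]
            # Move to the best neighbor, even if it does not improve the maximum.
            q_curr = max(neighbors, key=bic_l)
            score = bic_l(q_curr)
            if score > local_best:
                local_best_q, local_best = q_curr, score
                stalls = 0
            else:
                stalls += 1
        if local_best > best_score:
            best_q, best_score = local_best_q, local_best
    return best_q, best_score
```

In practice each call to `bic_l` is a full model fit, so a real implementation would cache the fitted models per grid point rather than re-evaluating them as this sketch does.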
||||||
|
|
@ -512,8 +520,7 @@ consists of two alternating steps:
|
||||||
\For{$Q_1 \in \left[ Q_{1,\text{center}} - \text{depth} ; Q_{1,\text{center}} + \text{depth} \right]$}{
|
\For{$Q_1 \in \left[ Q_{1,\text{center}} - \text{depth} ; Q_{1,\text{center}} + \text{depth} \right]$}{
|
||||||
\For{$Q_2 \in \left[ Q_{2,\text{center}} - \text{depth}; Q_{2,\text{center}} + \text{depth} \right] $}{
|
\For{$Q_2 \in \left[ Q_{2,\text{center}} - \text{depth}; Q_{2,\text{center}} + \text{depth} \right] $}{
|
||||||
Compute possible splits from predecessors $(Q_1 - 1, Q_2)$ and $(Q_1, Q_2 - 1)$\\
|
Compute possible splits from predecessors $(Q_1 - 1, Q_2)$ and $(Q_1, Q_2 - 1)$\\
|
||||||
Fit models with the block membership changes
|
Among the models generated from the splits, choose the best with respect to the BIC-L
|
||||||
Compare and keep the best model based on BIC-L
|
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
@ -523,13 +530,12 @@ consists of two alternating steps:
|
||||||
\For{$Q_1 \in \left[ Q_{1,\text{center}} + \text{depth} ; Q_{1,\text{center}} - \text{depth} \right]$}{
|
\For{$Q_1 \in \left[ Q_{1,\text{center}} + \text{depth} ; Q_{1,\text{center}} - \text{depth} \right]$}{
|
||||||
\For{$Q_2 \in \left[ Q_{2,\text{center}} + \text{depth}; Q_{2,\text{center}} - \text{depth} \right] $}{
|
\For{$Q_2 \in \left[ Q_{2,\text{center}} + \text{depth}; Q_{2,\text{center}} - \text{depth} \right] $}{
|
||||||
Compute possible merges from predecessors $(Q_1 + 1, Q_2)$ and $(Q_1, Q_2 + 1)$\\
|
Compute possible merges from predecessors $(Q_1 + 1, Q_2)$ and $(Q_1, Q_2 + 1)$\\
|
||||||
Fit models with the block membership changes
|
Among the models generated from the merges, choose the best with respect to the BIC-L
|
||||||
Compare and keep the best model based on BIC-L
|
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
\BlankLine
|
\BlankLine
|
||||||
Update the best model based on the maximum BIC-L
|
Choose the mode as the one that maximizes the BIC-L
|
||||||
}
|
}
|
||||||
|
|
||||||
\BlankLine
|
\BlankLine
|
||||||
|
|
@ -637,6 +643,7 @@ The procedure then repeats for the point at $(Q_1 + 1, Q_2)$ until it reaches
|
||||||
$(Q_{1,center} + depth, Q_2)$ from which it repeats from
|
$(Q_{1,center} + depth, Q_2)$ from which it repeats from
|
||||||
$(Q_{1,center} - depth, Q_2 + 1)$. This repeats until computing the best model
|
$(Q_{1,center} - depth, Q_2 + 1)$. This repeats until computing the best model
|
||||||
for $(Q_{1,center} + depth, Q_{2,center} + depth)$.
|
for $(Q_{1,center} + depth, Q_{2,center} + depth)$.
|
||||||
|
|
||||||
\textit{Note on the initialization:} The forward pass starts from the point
|
\textit{Note on the initialization:} The forward pass starts from the point
|
||||||
$(Q_{1,center} + depth, Q_{2,center} + depth)$, so this point needs to have at
|
$(Q_{1,center} + depth, Q_{2,center} + depth)$, so this point needs to have at
|
||||||
least one model fitted. In the best case, the greedy exploration will have visited
|
least one model fitted. In the best case, the greedy exploration will have visited
|
||||||
|
|
@ -663,7 +670,7 @@ $(Q_{1,center} + depth, Q_{2,center} + depth)$, we know it was initialized at
|
||||||
least by the forward pass, no special case here.\\
|
least by the forward pass, no special case here.\\
|
||||||
|
|
||||||
At the end of the moving window pass, the model of max BIC-L is the new best
|
At the end of the moving window pass, the model of max BIC-L is the new best
|
||||||
fit and the procedure can repeat until convergence.
|
fit and the procedure repeats until convergence.
|
||||||
|
|
||||||
\section{Networks clustering}
|
\section{Networks clustering}
|
||||||
\label{sec:networks-clustering}
|
\label{sec:networks-clustering}
|
||||||
|
|
@ -752,7 +759,7 @@ trivial partition in a unique group.
|
||||||
Then using the \emph{Kmeans} we split the collection in two sub-collections
|
Then using the \emph{Kmeans} we split the collection in two sub-collections
|
||||||
with the dissimilarity matrix. The two sub-collections are fitted and we
|
with the dissimilarity matrix. The two sub-collections are fitted and we
|
||||||
compute the score of this new partition $\mathcal{G}^{*} = \{G_1, G_2\}$.
|
compute the score of this new partition $\mathcal{G}^{*} = \{G_1, G_2\}$.
|
||||||
If $Sc(\mathcal{G}^{*}) > Sc(\mathcal{G})$ then we repeat the same procedure on
|
If $Sc(\mathcal{G}^{*}) > Sc(\mathcal{G})$, we repeat the same procedure on
|
||||||
$G_1$ and $G_2$. Else we return $\mathcal{G}$.
|
$G_1$ and $G_2$. Else we return $\mathcal{G}$.
|
||||||
We illustrate our capacity to perform a partition of a collection for all
|
We illustrate our capacity to perform a partition of a collection for all
|
||||||
colBiSBM models in~\ref{sec:network-clustering-of-simulated-networks}.
|
colBiSBM models in~\ref{sec:network-clustering-of-simulated-networks}.
|
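The recursive bipartition described above can be sketched generically. The functions `score` and `split` are hypothetical placeholders: `split` stands for the \emph{Kmeans} step on the dissimilarity matrix and `score` for $Sc$. Both are assumed to be supplied by the caller. The sketch also assumes that the score can be compared locally, on a sub-collection and its two-way split, which the text above does not state explicitly.

```python
def recursive_partition(group, score, split):
    """Recursively split a collection while the score improves.

    group         : list of networks (any objects)
    score(groups) : score of a partition given as a list of groups (assumed)
    split(group)  : proposes a two-way split (g1, g2) of a group (assumed)
    Returns the final partition as a list of groups.
    """
    if len(group) < 2:
        return [group]
    g1, g2 = split(group)
    if not g1 or not g2:
        return [group]
    if score([g1, g2]) > score([group]):
        # The split is accepted: repeat the procedure on each half.
        return (recursive_partition(g1, score, split)
                + recursive_partition(g2, score, split))
    # The split does not improve the score: keep the group whole.
    return [group]
```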
||||||
|
|
@ -772,11 +779,11 @@ we obtain the following result of identifiability\footnote{The proof is in appen
|
||||||
\begin{itemize}
|
\begin{itemize}
|
||||||
\item[(1.1)] $\exists m^*\in\{1,\dots,M\} : n_1^{m^*} \geq 2 Q_2 - 1~\text{and}~n_2^{m^*} \geq 2 Q_1 - 1$.
|
\item[(1.1)] $\exists m^*\in\{1,\dots,M\} : n_1^{m^*} \geq 2 Q_2 - 1~\text{and}~n_2^{m^*} \geq 2 Q_1 - 1$.
|
||||||
\item[(1.2)] $\forall 1\leq q \leq Q_1, \pi_q > 0$
|
\item[(1.2)] $\forall 1\leq q \leq Q_1, \pi_q > 0$
|
||||||
and the coordinates of vector $\bm{\rho}
|
and the coordinates of vector $\bm{\rho}
|
||||||
{X^{m^*}}^T$ are distinct (where ${X^{m^*}}^T$ is the transpose of $X^{m^*}$).
|
{X^{m^*}}^T$ are distinct (where ${X^{m^*}}^T$ is the transpose of $X^{m^*}$).
|
||||||
\item[(1.3)] $\forall 1\leq r \leq Q_2, \rho_r > 0$
|
\item[(1.3)] $\forall 1\leq r \leq Q_2, \rho_r > 0$
|
||||||
and the coordinates of vector $\bm{\pi}
|
and the coordinates of vector $\bm{\pi}
|
||||||
X^{m^*}$ are distinct.
|
X^{m^*}$ are distinct.
|
||||||
\end{itemize}
|
\end{itemize}
|
||||||
\end{theorem}
|
\end{theorem}
|
||||||
|
|
||||||
|
|
|
||||||
Binary file not shown.
|
|
@ -22,8 +22,8 @@ Maxime.
|
||||||
Thanks to all the permanent staff of the 3rd floor, including Christophe,
|
Thanks to all the permanent staff of the 3rd floor, including Christophe,
|
||||||
Stéphane and Vincent.
|
Stéphane and Vincent.
|
||||||
|
|
||||||
Thanks to Hugo, Théodore, Éric, Jean-Benoist, Nicolas, Tristan, Sarah, Jade and
|
Thanks to Liliane, Isabelle, Hugo, Théodore, Éric, Jean-Benoist, Nicolas, Lucia,
|
||||||
Pierre Gloaguen.
|
Tristan, Sarah, Jade and Pierre Gloaguen.
|
||||||
|
|
||||||
Many thanks to everyone who contributed, directly or indirectly, to the smooth
|
Many thanks to everyone who contributed, directly or indirectly, to the smooth
|
||||||
running of this internship.
|
running of this internship.
|
||||||
|
|
|
||||||