Refactored rapport.tex

2023-06-26 22:09:00 +02:00 · 2023-06-26 22:09:00 +02:00 · 00c926bf1b
commit 00c926bf1b
parent dd1ed631d3
2 changed files with 155 additions and 145 deletions
--- a/rapport.pdf
+++ b/rapport.pdf
--- a/rapport.tex
+++ b/rapport.tex
@@ -70,7 +70,7 @@
 \chapter{Context}
 \section{Usage and importance of bipartite graphs}
-
+\label{sec:usage-and-importance-of-bipartite-graphs}
 Bipartite graphs are denoted as $G = (U,V,E)$, with $U$ and $V$ two disjoint and
 independent sets of vertices and $E$ the set of edges connecting $U$ vertices to
 $V$ vertices.
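 In set notation, and treating each edge as an ordered pair $(u, v)$ (a
 convention assumed here for illustration), this definition amounts to:
 \[
    U \cap V = \emptyset, \qquad E \subseteq U \times V.
 \]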
@@ -121,7 +121,7 @@ $\\
 This representation can be used to represent various forms of interactions where
-two kinds of ''actors'' interact. Those interactions can be binary or valued and
+two kinds of "actors" interact. Those interactions can be binary or valued and
 a numeric representation is the incidence matrix, in the above example $B$.\\
 Among the use cases of bipartite graphs one can find the Netflix Problem, which
@@ -147,19 +147,19 @@ Some interesting results can arise when applying a tool widely used on a particu
 kind of interactions to another kind of interactions. Companies like
 Netflix use recommender systems to recommend other products to consumers based
 on their previous interactions.
-In \cite{desjardins-proulxEcologicalInteractionsNetflix2017} the authors use the
+In~\cite{desjardins-proulxEcologicalInteractionsNetflix2017}, the authors use the
 \emph{K-nearest neighbour} (KNN) algorithm as a recommender to predict missing
 prey for predators in a predator-prey network.
 \section{Latent Block Model}
-
+\label{sec:latent-block-model}
-The Latent Block Model (LBM) introduced by \cite{govaertLatentBlockModel2010} 
+The Latent Block Model (LBM) introduced by~\cite{govaertLatentBlockModel2010}
 adapts the Stochastic Block Model (SBM)
-(\cite{hollandStochasticBlockmodelsFirst1983};\cite{snijdersEstimationPredictionStochastic1997})
+(\cite{hollandStochasticBlockmodelsFirst1983}; \cite{snijdersEstimationPredictionStochastic1997})
 to bipartite graphs.
 \begin{small}
-Please note that we prefer the term ''BiSBM'' and will use both LBM and BiSBM to
+    Please note that we prefer the term "BiSBM" and will use both LBM and BiSBM to
    designate the Stochastic Block Model applied to bipartite networks.
 \end{small}
@@ -258,13 +258,13 @@ varied structures. But when trying to determine the structure of a given network
 we need to find those parameters.
 For this, a common approach is to use a VEM algorithm
-(proposed for SBM in \cite{daudinMixtureModelRandom2008} and for LBM in \cite{govaertEMAlgorithmBlock2005}) 
+(proposed for SBM in~\cite{daudinMixtureModelRandom2008} and for LBM in~\cite{govaertEMAlgorithmBlock2005}):
 the groups and the required parameters can be inferred by maximizing a lower
 bound of the likelihood minus a penalty.
 \section{colSBM model, a joint model for a collection of networks}
-
+\label{sec:colsbm-model-a-joint-model-for-a-collection-of-networks}
-The \emph{colSBM} model introduced by \cite{chabert-liddellLearningCommonStructures2023}
+The \emph{colSBM} model introduced by~\cite{chabert-liddellLearningCommonStructures2023}
 proposes an extension of the SBM model to collections of SBMs. A collection is a
 set of networks whose nodes are neither shared nor linked across networks, and
 whose interactions have the same valuations and are of the same type.
@@ -279,11 +279,12 @@ it to the bipartite case.
 \chapter{Adjustment of colSBM to the bipartite case: colBiSBM}
 \section{Definition of the model}
 \label{sec:definition-of-the-model}
 Here are some common notations and conventions that we will use in the following
 sections.
 \subsection{A collection of i.i.d Bipartite SBM}
-
+\label{ssec:a-collection-of-i-i-d-bipartite-sbm}
 As for \emph{colSBM}, this first model is the most constrained. It assumes
 that all the networks are independent realizations of the same $Q_1$-$Q_2$-BiSBM
 with identical parameters. The \emph{iid-colBiSBM} is defined as follows:
@@ -295,6 +296,7 @@ with identical parameters. The \emph{iid-colBiSBM} is defined as follows:
 \section{Variational Expectation step}
 \label{sec:variational-expectation-step}
 Fixed point formula for the Bernoulli distribution:
 \begin{itemize}
    \item[-] \textit{iid} :
@@ -317,13 +319,14 @@ with $\text{Mask}^{m}$ the matrix containing $0$ if the value is NA and $1$
 otherwise.
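 Entrywise, and assuming that $X^{m}$ denotes the incidence matrix of network
 $m$ (a notation introduced here only for illustration), the mask can be
 sketched as:
 \[
    \text{Mask}^{m}_{ij} =
    \begin{cases}
        0 & \text{if } X^{m}_{ij} \text{ is NA}, \\
        1 & \text{otherwise.}
    \end{cases}
 \]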
 \section{M step of the algorithm}
-
+\label{sec:m-step-of-the-algorithm}
 Incorporate the equations from \parencite{chabert-liddellLearningCommonStructures2023}
 \section{Computation of the variational bound}
 \label{sec:computation-of-the-variational-bound}
 \section{Penalties}
-
+\label{sec:penalties}
 \paragraph*{\textit{iid-colBiSBM}}
 For the \textit{iid-colBiSBM}, the penalties were modified in the following way:
@@ -369,15 +372,18 @@ And the corresponding BIC-L formula:
 \]
 \section{Latent space exploration and model selection}
 \label{sec:latent-space-exploration-and-model-selection}
 In order to explore the two-dimensional latent space $(Q_1,Q_2)$,
 we use the following strategies.
 \subsection{Model selection}
 \label{ssec:model-selection}
 In the following steps, model selection relies on the BIC-L criterion: among
 the proposed models, we choose the one that
 maximizes the BIC-L.
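
 As a sketch, writing $\mathcal{Q}$ for the set of pairs $(Q_1, Q_2)$ whose models
 have been fitted (a notation assumed here, not taken from the original text),
 this selection rule can be written as:
 \[
    (\widehat{Q}_1, \widehat{Q}_2) = \operatorname*{arg\,max}_{(Q_1, Q_2) \in \mathcal{Q}} \text{BIC-L}(Q_1, Q_2).
 \]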
 \subsection{Initialization and pairing of the models}
 \label{ssec:initialization-and-pairing-of-the-models}
 First, to combine the information from the $M$ networks, we fit a collection model
 for each network at the two points $Q = (1, 2)$ and $Q = (2, 1)$. Using the
 previously described VEM algorithm, we obtain for each network its parameters
@@ -396,6 +402,7 @@ Using this order we relabel the memberships for the $M$ fitted collection of a
 single network.
 Then we use the $M$ memberships to fit a collection containing the $M$ networks.
 \subsection{Greedy exploration to find an estimation of the mode}
 \label{ssec:greedy-exploration-to-find-an-estimation-of-the-mode}
 Using the previously fitted models for $Q = (1,2)$ and $Q = (2,1)$ we choose to
 perform a greedy exploration to find a first mode.
@@ -446,6 +453,7 @@ BIC-L stops increasing $2$ times in a row.
 Once this first estimate of the BIC-L mode has been found, we apply the moving
 window to it.
 \subsection{Moving window to update the block memberships and the BIC-L}
 \label{ssec:moving-window-to-update-the-block-memberships-and-the-bic-l}
 The \emph{moving window} is used to update the block memberships on rows and
 columns and fit new models with those changes.
 To define the window, we use a center point and a \emph{depth}, giving us the
@@ -550,11 +558,13 @@ At the end of the moving window pass, the model of max BIC-L is the new best
 fit and the procedure can repeat until convergence.
 \section{Networks clustering}
 \label{sec:networks-clustering}
 As in \parencite{chabert-liddellLearningCommonStructures2023} we use a recursive
 algorithm to determine the best clustering of the given networks. The procedure
 being the same, only the technical modifications for the bipartite case will be
 explained below.
 \subsection{Distance between two networks}
 \label{ssec:distance-between-two-networks}
 The distance is weighted by $\pi$ and $\rho$.
 \[
    D_{\mathcal{M}}(m,m') = \sum_{q = 1}^{Q_1} \sum_{r = 1}^{Q_2} \max(\widetilde{\pi}_{q}^{m}, \widetilde{\pi}_{q}^{m'}) \left( \frac{\widetilde{\alpha}_{qr}^{m}}{\widehat{\delta}_{m}} - \frac{\widetilde{\alpha}_{qr}^{m'}}{\widehat{\delta}_{m'}}\right)^{2} \max(\widetilde{\rho}_{r}^{m}, \widetilde{\rho}_{r}^{m'})