
\addtocounter{customchapter}{1}
\chapter{Introduction}
\section{Usage and importance of bipartite graphs}\label{sec:usage-and-importance-of-bipartite-graphs}
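A bipartite graph, denoted $G = (U,V,E)$, consists of two disjoint sets of
vertices $U$ and $V$ and a set of edges $E$ in which every edge connects a vertex
of $U$ to a vertex of $V$: $U$ and $V$ are independent sets, i.e.\ no edge links
two vertices of the same set. The example below shows a small bipartite network
and its incidence matrix.

\noindent\begin{minipage}{0.5\linewidth}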
\centering
Bipartite network\\
\begin{tikzpicture}[scale=.6]
\tikzstyle{every edge}=[-,>=stealth',shorten >=1pt,auto,draw,line width=1.5pt]
\tikzstyle{every state}=[draw, text=black,scale=0.95, transform shape]
\tikzstyle{every state}=[draw=none,text=black,scale=0.75, transform shape]
\tikzstyle{every node}=[fill=blueind]
\node[state, draw=black!50] (A1) at (0,5) {\textbf{R1}};
\node[state, draw=black!50] (A2) at (2.5,5) {\textbf{R2}};
\node[state, draw=black!50] (A3) at (5,5) {\textbf{R3}};
\tikzstyle{every node}=[fill=greenind, shape=rectangle]
\tikzstyle{every state}=[draw=none,text=black,scale=0.75, transform shape, shape=rectangle]
\node[state, draw=black!50] (B1) at (0,0) {\textbf{C1}};
\node[state, draw=black!50] (B2) at (1.25,0) {\textbf{C2}};
\node[state, draw=black!50] (B3) at (2.5,0) {\textbf{C3}};
\node[state, draw=black!50] (B4) at (3.75,0) {\textbf{C4}};
\node[state, draw=black!50] (B5) at (5,0) {\textbf{C5}};
\path (A1) edge [] (B1);
\path (A1) edge (B2);
\path (A1) edge (B3);
\path (A1) edge (B4);
\path (A2) edge (B3);
\path (A2) edge (B4);
\path (A3) edge (B5);
\path (A2) edge (B5);
\end{tikzpicture}
\end{minipage}%
\begin{minipage}{0.5\linewidth}
\begin{center}
$X=
\begin{pmatrix}
1 & 1 & 1 & 1 & 0 \\
0 & 0 & 1 & 1 & 1 \\
0 & 0 & 0 & 0 & 1 \\
\end{pmatrix}
$\\
\vspace*{\baselineskip}
Incidence matrix
\end{center}
\end{minipage}
\vspace*{\baselineskip}

The \emph{incidence matrix} $X$ is the mathematical object on which
computations are performed: its rows correspond to the vertices of $U$ and its
columns to the vertices of $V$. It is filled according to the following rule:
\begin{equation*}
\begin{cases}
X_{ij} = 0 & \text{if no interaction is observed between row node }i\text{ and column node }j, \\
X_{ij} \neq 0 & \text{otherwise.}
\end{cases}
\end{equation*}
If the network represents binary observations (such as presence/absence), then
$X_{ij}\in\mathcal{K}=\{0,1\},\forall(i,j)$; if the interactions are weighted
(such as abundance counts), $X_{ij}\in\mathcal{K}=\mathbb{N},\forall(i,j)$.
This representation applies to any setting in which two kinds of ``actors''
interact, whether the interactions are binary or valued; in both cases they are
encoded numerically by the incidence matrix, $X$ in the example above.

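
As a worked example on the network shown above, index the row nodes
R1, R2, R3 by $i$ and the column nodes C1 to C5 by $j$, so that row $i$ of $X$
corresponds to node R$i$ and column $j$ to node C$j$. In the binary case, the
rule above reads as follows, and the row and column sums of $X$ give the
degrees $d(\cdot)$ of the nodes (a notation introduced here for illustration):
\begin{equation*}
X_{ij} =
\begin{cases}
1 & \text{if } (\mathrm{R}i, \mathrm{C}j) \in E,\\
0 & \text{otherwise,}
\end{cases}
\qquad
d(\mathrm{R}i) = \sum_{j=1}^{5} X_{ij},
\qquad
d(\mathrm{C}j) = \sum_{i=1}^{3} X_{ij}.
\end{equation*}
For instance, R2 and C3 are linked, so $X_{23} = 1$, whereas R3 and C1 are not,
so $X_{31} = 0$. The row sums give $d(\mathrm{R}1) = 4$, $d(\mathrm{R}2) = 3$ and
$d(\mathrm{R}3) = 1$, and the column sums give $d(\mathrm{C}1) = d(\mathrm{C}2) = 1$
and $d(\mathrm{C}3) = d(\mathrm{C}4) = d(\mathrm{C}5) = 2$; both sets of degrees
add up to the $8$ edges of the network.
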
Among the use cases of bipartite graphs is the Netflix Prize, a competition
organized by Netflix to improve its recommender system. The row nodes are the
movies and the column nodes are the users, and the entry $X_{ij}$ is the rating
given by user $j$ to movie $i$. Only a small fraction of the (movie, user) pairs
have a rating, and the task is to predict the missing ones.

Another use is the representation of ecological interactions such as
plant-pollinator \parencite{ramos-jilibertoTopologicalChangeAndean2010},
bird-plant seed dispersal, prey-predator or host-parasite
\parencite{kaszewska-gilasGlobalStudiesHostParasite2021} interactions. For
plant-pollinator interactions, the rows are pollinator species and the columns
are plant species, and each entry is binary for presence/absence data or a
nonnegative integer for abundance counts.
More generally, bipartite graphs are widely used in biology: besides the
ecological networks cited above, they appear in medicine and in the study of
biomedical, biomolecular and epidemiological networks
\parencite{pavlopoulosBipartiteGraphsSystems2018}.

Interesting results can arise when a tool widely used on one kind of
interaction is applied to another. Companies such as Netflix or Amazon use
recommender systems to suggest products to consumers based on their previous
interactions. In~\cite{desjardins-proulxEcologicalInteractionsNetflix2017}, the
authors use the \emph{$K$-nearest neighbours} (KNN) algorithm as a recommender
to predict the missing prey of predators in a predator-prey network.
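
The cited work's exact similarity measure and scoring rule are not reproduced
here. As a minimal sketch of the idea, assuming a binary incidence matrix $X$
with predators in rows and prey in columns, a Jaccard similarity between rows
and an unweighted average over the neighbours (all notation below is
introduced for illustration), a KNN recommender scores each pair $(i,j)$ as
\begin{equation*}
s(i,i') = \frac{\sum_{j} X_{ij} X_{i'j}}{\sum_{j} \max(X_{ij}, X_{i'j})},
\qquad
\hat{X}_{ij} = \frac{1}{K} \sum_{i' \in \mathcal{N}_K(i)} X_{i'j},
\end{equation*}
where $\mathcal{N}_K(i)$ is the set of the $K$ rows most similar to row $i$
according to $s$. Pairs with $X_{ij} = 0$ but a high score $\hat{X}_{ij}$ are
proposed as missing interactions. On the incidence matrix $X$ of the example
above (row $i$ being node R$i$), R1 and R2 share 2 of the 5 columns linked to
either of them, so $s(2,1) = 2/5$, whereas $s(2,3) = 1/3$; with $K = 1$ the
nearest neighbour of R2 is therefore R1, and C1 and C2 are proposed as
candidate partners of R2.
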
\section{Latent Block Model}
\label{sec:latent-block-model}
The Latent Block Model (LBM), introduced by~\cite{govaertLatentBlockModel2010},
adapts the Stochastic Block Model (SBM)
\parencite{hollandStochasticBlockmodelsFirst1983, snijdersEstimationPredictionStochastic1997}
to bipartite graphs.
\textit{Note:} \begin{small}
We prefer the term ``BiSBM'' but use both LBM and BiSBM to
designate the Stochastic Block Model applied to bipartite networks.
\end{small}
This model assumes that:
\begin{itemize}
\item Row nodes belong to row blocks and column nodes belong to column blocks.
\item The probability that a row node and a column node interact depends only
on their block memberships.
\item Interactions can only occur between a row node and a column node.
\end{itemize}
\begin{figure}[H]
\centering
\begin{tikzpicture}[scale=.6]
\input{../tikz/lbm.tex}
\end{tikzpicture}
\caption{Visualization of the Latent Block Model}
\label{fig:LBMvisu}
\end{figure}
\begin{itemize}
\item $Q_1 = |\{{\color{blueind}\bullet},{\color{cyanind}\bullet},{\color{electricblue}\bullet}\}| = 3$: number of row blocks, assumed \emph{given}
\item $Q_2 = |\{{\color{burntorange}\bullet},{\color{goldenyellow}\bullet},{\color{peach}\bullet}\}| = 3$: number of column blocks, assumed \emph{given}
\end{itemize}
Parameters:
\begin{itemize}
\item $\pi_{\bullet} = \mathbb{P}(Z_i = \bullet)$ for rows and $\rho_{\bullet} = \mathbb{P}(W_j = \bullet)$ for columns, where $Z_i$ is the (latent) block of row node $i$ and $W_j$ the (latent) block of column node $j$
\item $\alpha_{{\color{blueind}\bullet}{\color{burntorange}\bullet}} = \mathbb{P}(X_{ij} = 1 | Z_i = {\color{blueind}\bullet}, W_j = {\color{burntorange}\bullet})$, the probability that row node $i$ and column node $j$ interact given their block memberships
\end{itemize}
In Figure~\ref{fig:LBMvisu}, $\bm{\pi}$ gives the probabilities for a row node to
belong to the row block of the corresponding color, $\bm{\rho}$ gives the
probabilities for a column node to belong to the column block of the
corresponding color, and $\bm{\alpha}$ is the $Q_1 \times Q_2$ matrix of the
connectivity parameters between the row and column blocks.
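
Writing the row blocks as $q \in \{1,\dots,Q_1\}$ and the column blocks as
$r \in \{1,\dots,Q_2\}$ instead of colors, and denoting by $n_1 = |U|$ and
$n_2 = |V|$ the numbers of row and column nodes (notation introduced here), the
binary LBM can be summarized as the following generative model, where
$\mathcal{M}(1,\cdot)$ is the multinomial distribution with a single trial and
$\mathcal{B}$ the Bernoulli distribution:
\begin{align*}
Z_i &\overset{\text{iid}}{\sim} \mathcal{M}(1, \bm{\pi}), \quad i = 1,\dots,n_1,\\
W_j &\overset{\text{iid}}{\sim} \mathcal{M}(1, \bm{\rho}), \quad j = 1,\dots,n_2,\\
X_{ij} \mid Z_i = q,\, W_j = r &\sim \mathcal{B}(\alpha_{qr}),
\end{align*}
with $(Z_i)_i$ independent of $(W_j)_j$ and the $X_{ij}$ conditionally
independent given the memberships. Summing over the blocks gives the marginal
probability that a given pair $(i,j)$ interacts:
\begin{equation*}
\mathbb{P}(X_{ij} = 1) = \sum_{q=1}^{Q_1} \sum_{r=1}^{Q_2} \pi_q \, \rho_r \, \alpha_{qr}
= \bm{\pi}^{\top} \bm{\alpha} \, \bm{\rho}.
\end{equation*}
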
This model makes it easy to generate bipartite graphs with complex and varied
structures. Conversely, determining the structure of an observed network
requires estimating these parameters, while the row and column block
memberships are \emph{latent}, i.e.\ they are not observed and must be inferred
as well. The exact EM algorithm is intractable here, because the conditional
distribution of the row and column memberships given $X$ does not factorize.
A common approach is therefore the \emph{variational} EM algorithm (proposed
for the SBM in~\cite{daudinMixtureModelRandom2008} and for the LBM
in~\cite{govaertEMAlgorithmBlock2005}), which infers the block memberships and
the parameters by maximizing a lower bound of the likelihood.
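
For the binary LBM, labelling the row blocks $q \in \{1,\dots,Q_1\}$ and the
column blocks $r \in \{1,\dots,Q_2\}$, with $n_1$ row nodes, $n_2$ column nodes
and $\theta = (\bm{\pi}, \bm{\rho}, \bm{\alpha})$, the usual mean-field version of
this algorithm can be sketched as follows (standard formulas written out for
illustration; the notation $\bm{\tau}$, $\bm{\nu}$ and $\mathcal{J}$ is introduced
here). The posterior distribution of the memberships is approximated by a
factorized distribution with $\tau_{iq} \approx \mathbb{P}(Z_i = q \mid X)$ and
$\nu_{jr} \approx \mathbb{P}(W_j = r \mid X)$, which gives the lower bound
\begin{align*}
\log p_{\theta}(X) \geq \mathcal{J}(\bm{\tau}, \bm{\nu}; \theta)
&= \sum_{i,q} \tau_{iq} \log \pi_q + \sum_{j,r} \nu_{jr} \log \rho_r \\
&\quad + \sum_{i,j} \sum_{q,r} \tau_{iq} \nu_{jr}
\bigl[ X_{ij} \log \alpha_{qr} + (1 - X_{ij}) \log (1 - \alpha_{qr}) \bigr] \\
&\quad - \sum_{i,q} \tau_{iq} \log \tau_{iq} - \sum_{j,r} \nu_{jr} \log \nu_{jr}.
\end{align*}
The algorithm alternates two steps until $\mathcal{J}$ stabilizes. In the VE-step,
$\bm{\tau}$ is updated with $\bm{\nu}$ and $\theta$ fixed (and symmetrically for
$\bm{\nu}$):
\begin{equation*}
\tau_{iq} \propto \pi_q \exp\Bigl( \sum_{j,r} \nu_{jr}
\bigl[ X_{ij} \log \alpha_{qr} + (1 - X_{ij}) \log (1 - \alpha_{qr}) \bigr] \Bigr),
\qquad \sum_{q} \tau_{iq} = 1.
\end{equation*}
In the M-step, $\mathcal{J}$ is maximized in closed form with respect to $\theta$:
\begin{equation*}
\hat{\pi}_q = \frac{1}{n_1} \sum_{i} \tau_{iq}, \qquad
\hat{\rho}_r = \frac{1}{n_2} \sum_{j} \nu_{jr}, \qquad
\hat{\alpha}_{qr} = \frac{\sum_{i,j} \tau_{iq} \nu_{jr} X_{ij}}{\sum_{i,j} \tau_{iq} \nu_{jr}}.
\end{equation*}
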
\section{colSBM, a joint model for a collection of networks}
\label{sec:colsbm-model-a-joint-model-for-a-collection-of-networks}
The \emph{colSBM} model, introduced by~\cite{chabert-liddellLearningCommonStructures2024a},
extends the SBM to collections of simple (or unipartite) networks. Here, a
collection is a set of networks whose nodes are neither shared nor linked
across networks, and whose interactions are of the same type and have the same
valuation (e.g.\ all binary).
The model can retrieve the structure shared by the networks of a collection,
indicate whether a set of networks should be grouped into a single collection
and, within a large pool of networks, find collections of networks with common
structures.
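
To fix ideas, consider $M$ binary unipartite networks $X^{(1)},\dots,X^{(M)}$,
network $m$ having $n_m$ nodes. A minimal sketch of a joint model, assuming the
simplest form of sharing, in which all networks have the same block proportions
$\bm{\pi}$ and the same connectivity matrix $\bm{\alpha}$ (notation introduced
here; variants letting some parameters depend on the network are not written
out), is the following, where $\mathcal{M}(1,\cdot)$ denotes the single-trial
multinomial distribution and $\mathcal{B}$ the Bernoulli distribution:
\begin{align*}
Z^{(m)}_i &\overset{\text{iid}}{\sim} \mathcal{M}(1, \bm{\pi}), \quad i = 1,\dots,n_m,\\
X^{(m)}_{ij} \mid Z^{(m)}_i = q,\, Z^{(m)}_j = r &\sim \mathcal{B}(\alpha_{qr}), \quad i \neq j,
\end{align*}
with the networks independent of each other, so that the log-likelihood of the
collection is the sum of the log-likelihoods of its networks:
\begin{equation*}
\log p_{\bm{\pi}, \bm{\alpha}}\bigl(X^{(1)}, \dots, X^{(M)}\bigr)
= \sum_{m=1}^{M} \log p_{\bm{\pi}, \bm{\alpha}}\bigl(X^{(m)}\bigr).
\end{equation*}
The parameters common to all terms encode the shared structure and are
estimated from all the networks at once, whereas the memberships $Z^{(m)}$ are
specific to each network, since nodes are not shared across networks.
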
The next step after designing this collection model for unipartite networks was
to extend it to the bipartite case.