\addtocounter{customchapter}{1}
\chapter{Introduction}

\section{Usage and importance of bipartite graphs}\label{sec:usage-and-importance-of-bipartite-graphs}
A bipartite graph, denoted $G = (U,V,E)$, consists of two disjoint and
independent sets of vertices $U$ and $V$ and a set $E$ of edges, each connecting
a vertex of $U$ to a vertex of $V$.

\begin{minipage}{0.5\linewidth}
\centering
Bipartite network\\
\begin{tikzpicture}[scale=.6]
\tikzstyle{every edge}=[-,>=stealth',shorten >=1pt,auto,draw,line width=1.5pt]
\tikzstyle{every state}=[draw, text=black,scale=0.95, transform shape]
\tikzstyle{every state}=[draw=none,text=black,scale=0.75, transform shape]
\tikzstyle{every node}=[fill=blueind]

\node[state, draw=black!50] (A1) at (0,5) {\textbf{R1}};
\node[state, draw=black!50] (A2) at (2.5,5) {\textbf{R2}};
\node[state, draw=black!50] (A3) at (5,5) {\textbf{R3}};

\tikzstyle{every node}=[fill=greenind, shape=rectangle]
\tikzstyle{every state}=[draw=none,text=black,scale=0.75, transform shape, shape=rectangle]
\node[state, draw=black!50] (B1) at (0,0) {\textbf{C1}};
\node[state, draw=black!50] (B2) at (1.25,0) {\textbf{C2}};
\node[state, draw=black!50] (B3) at (2.5,0) {\textbf{C3}};
\node[state, draw=black!50] (B4) at (3.75,0) {\textbf{C4}};
\node[state, draw=black!50] (B5) at (5,0) {\textbf{C5}};
\path (A1) edge [] (B1);
\path (A1) edge (B2);
\path (A1) edge (B3);
\path (A1) edge (B4);
\path (A2) edge (B3);
\path (A2) edge (B4);
\path (A3) edge (B5);
\path (A2) edge (B5);
\end{tikzpicture}
\end{minipage}
\begin{minipage}{0.5\linewidth}
\begin{center}
$X=
\begin{pmatrix}
1 & 1 & 1 & 1 & 0 \\
0 & 0 & 1 & 1 & 1 \\
0 & 0 & 0 & 0 & 1 \\
\end{pmatrix}
$\\
\vspace*{\baselineskip}
Incidence matrix
\end{center}
\end{minipage}

\vspace*{\baselineskip}
$X$ is the \emph{incidence matrix}, the mathematical object on which
computations are performed. It is filled according to the following rule:
\begin{equation*}
\begin{cases}
X_{ij} = 0 & \text{if no interaction is observed between species }i\text{ and }j \\
X_{ij} \neq 0 & \text{otherwise}
\end{cases}
\end{equation*}
If the network represents binary observations (such as presence-absence), then
$X_{ij}\in\mathcal{K}=\{0,1\},\ \forall(i,j)$; if the interactions are weighted
(such as an abundance count), $X_{ij}\in\mathcal{K}=\mathbb{N},\ \forall(i,j)$.
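
As a worked reading of the example network above, $X_{14} = 1$ because R1 is
linked to C4, whereas $X_{15} = 0$ because R1 and C5 are not linked. In the
binary case, summing a row or a column of $X$ gives the degree of the
corresponding node (the notation $d$ is introduced here only for illustration):
\begin{align*}
d_{\mathrm{R1}} &= \sum_{j=1}^{5} X_{1j} = 4, & d_{\mathrm{R2}} &= 3, & d_{\mathrm{R3}} &= 1, \\
d_{\mathrm{C1}} &= \sum_{i=1}^{3} X_{i1} = 1, & d_{\mathrm{C3}} &= 2, & d_{\mathrm{C5}} &= 2.
\end{align*}
The row degrees sum to $4 + 3 + 1 = 8$, the number of edges drawn in the
network.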

This representation can describe various forms of interactions in which two
kinds of ``actors'' interact. Those interactions can be binary or valued, and
their numeric representation is the incidence matrix, $X$ in the above
example.\\

Among the use cases of bipartite graphs is the Netflix Problem, which stems from
a prize organized by Netflix to improve its recommender system. The rows are
the movies and the columns are the users; the value at the intersection is the
rating given by user $j$ to movie $i$.\\

Another use is the representation of ecological interactions such as
plant-pollinator \parencite{ramos-jilibertoTopologicalChangeAndean2010},
bird-seed dispersal, prey-predator or host-parasite
\parencite{kaszewska-gilasGlobalStudiesHostParasite2021}. For plant-pollinator
interactions, the rows are pollinator species and the columns are plant species,
and each entry is either binary, for presence/absence data, or a count, for
abundance data.

Bipartite graphs are widely used in biology, in various fields, among which the
previously cited ecological networks, but also in medicine with biomedical
networks, biomolecular networks or epidemiological networks
\parencite{pavlopoulosBipartiteGraphsSystems2018}.

Interesting results can arise when a tool widely used on one kind of
interaction is applied to another kind. Companies like Netflix use recommender
systems to recommend other products to consumers based on their previous
interactions. In~\cite{desjardins-proulxEcologicalInteractionsNetflix2017}, the
authors use the \emph{K-nearest neighbour} (KNN) algorithm as a recommender to
predict missing prey for predators in a predator-prey network.

\section{Latent Block Model}
\label{sec:latent-block-model}
The Latent Block Model (LBM), introduced by~\cite{govaertLatentBlockModel2010},
adapts the Stochastic Block Model (SBM)
\parencite{hollandStochasticBlockmodelsFirst1983, snijdersEstimationPredictionStochastic1997}
to bipartite graphs.

\begin{small}
Please note that we prefer the term ``BiSBM'' and will use both LBM and BiSBM to
designate the Stochastic Block Model applied to bipartite networks.
\end{small}

This model supposes that:
\begin{itemize}
\item Row nodes are members of row blocks and column nodes are members of column
blocks.
\item The connectivity of two nodes is determined by their block memberships.
\item An interaction can only occur between a row node and a column node.
\end{itemize}

\begin{figure}[H]
\centering
\begin{tikzpicture}[scale=.6]
\input{../tikz/lbm.tex}
\end{tikzpicture}
\caption{A visualization of the LBM}
\label{fig:LBMvisu}
\end{figure}

\begin{itemize}
\item $Q_1 = |\{{\color{blueind}\bullet},{\color{cyanind}\bullet},{\color{electricblue}\bullet}\}|$ \emph{given} blocks in rows
\item $Q_2 = |\{{\color{burntorange}\bullet},{\color{goldenyellow}\bullet},{\color{peach}\bullet}\}|$ \emph{given} blocks in columns
\end{itemize}
Parameters:
\begin{itemize}
\item $\pi_{\bullet} = \mathbb{P}(Z_i = \bullet)$ for rows and $\rho_{\bullet} = \mathbb{P}(W_j = \bullet)$ for columns
\item $\alpha_{{\color{blueind}\bullet}{\color{burntorange}\bullet}} = \mathbb{P}(X_{ij} = 1 \mid Z_i = {\color{blueind}\bullet}, W_j = {\color{burntorange}\bullet})$, the probability of connection given the block memberships of the two nodes.
\end{itemize}

In Figure~\ref{fig:LBMvisu}, $\bm{\pi}$ are the probabilities for a row node to
belong to the row block of the corresponding color, $\bm{\rho}$ are the
probabilities for a column node to belong to the column block of the
corresponding color, and $\bm{\alpha}$ is a $Q_1 \times Q_2$ matrix of the
connectivity parameters between the row and column blocks.
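
Under these definitions, the generative process of the binary LBM can be
sketched as follows. The block indices $q$ and $r$, the numbers of row and
column nodes $n_1$ and $n_2$, and the symbols $\mathcal{M}$ (multinomial) and
$\mathcal{B}$ (Bernoulli) are notations assumed here for illustration. The
Bernoulli emission matches the binary definition of $\bm{\alpha}$ given above;
valued networks would call for another emission distribution.
\begin{align*}
Z_i &\sim \mathcal{M}(1; \bm{\pi}), & i &= 1, \dots, n_1, \\
W_j &\sim \mathcal{M}(1; \bm{\rho}), & j &= 1, \dots, n_2, \\
X_{ij} \mid Z_i = q, W_j = r &\sim \mathcal{B}(\alpha_{qr}), & q &= 1, \dots, Q_1,\ r = 1, \dots, Q_2,
\end{align*}
where the memberships are drawn independently and the entries $X_{ij}$ are
independent conditionally on them. Under these assumptions, the complete-data
likelihood factorizes as
\begin{equation*}
p(X, Z, W; \bm{\pi}, \bm{\rho}, \bm{\alpha}) =
\prod_{i=1}^{n_1} \pi_{Z_i} \prod_{j=1}^{n_2} \rho_{W_j}
\prod_{i=1}^{n_1} \prod_{j=1}^{n_2} \alpha_{Z_i W_j}^{X_{ij}} (1 - \alpha_{Z_i W_j})^{1 - X_{ij}}.
\end{equation*}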

This model can easily generate bipartite graphs with complex and varied
structures. However, when trying to determine the structure of a given network,
we need to estimate those parameters, and the row and column block memberships
are \emph{latent}, i.e.,\ they are not known and must be inferred as well.
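
As an illustration of how $\bm{\alpha}$ shapes the generated structure, note
that, in the binary case, the marginal probability of an interaction is
\begin{equation*}
\mathbb{P}(X_{ij} = 1) = \sum_{q=1}^{Q_1} \sum_{r=1}^{Q_2} \pi_q \rho_r \alpha_{qr},
\end{equation*}
where $q$ and $r$ index the row and column blocks (a notation assumed here).
Take the hypothetical values $Q_1 = Q_2 = 2$ and
$\bm{\pi} = \bm{\rho} = (0.5, 0.5)$, chosen only as an example, with either of
the two matrices
\begin{equation*}
\bm{\alpha}^{\text{mod}} = \begin{pmatrix} 0.9 & 0.1 \\ 0.1 & 0.9 \end{pmatrix}
\quad\text{or}\quad
\bm{\alpha}^{\text{nest}} = \begin{pmatrix} 0.9 & 0.5 \\ 0.5 & 0.1 \end{pmatrix}.
\end{equation*}
Both give $\mathbb{P}(X_{ij} = 1) = 0.25 \times 2 = 0.5$, yet the first
concentrates interactions within matched row and column blocks (a modular
structure), while the second lets the first row block interact broadly and the
second one mostly with the first column block (a nested, core-periphery-like
structure).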

A common approach is to use a \emph{variational} EM algorithm (proposed for the
SBM in~\cite{daudinMixtureModelRandom2008} and for the LBM
in~\cite{govaertEMAlgorithmBlock2005}), in which the groups and the required
parameters are inferred by maximizing a lower bound of the log-likelihood.
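
A generic form of such a lower bound can be sketched as follows. The notations
$\theta = (\bm{\pi}, \bm{\rho}, \bm{\alpha})$, $\mathcal{R}$, $\mathcal{J}$ and
$\mathcal{H}$ are assumptions introduced here for illustration, and the cited
papers should be consulted for the exact criteria they use. For any
distribution $\mathcal{R}$ over the latent memberships $(Z, W)$,
\begin{align*}
\mathcal{J}(\mathcal{R}, \theta)
&= \log p(X; \theta) - \mathrm{KL}\bigl(\mathcal{R}(Z, W) \,\|\, p(Z, W \mid X; \theta)\bigr) \\
&= \mathbb{E}_{\mathcal{R}}\bigl[\log p(X, Z, W; \theta)\bigr] + \mathcal{H}(\mathcal{R})
\;\leq\; \log p(X; \theta),
\end{align*}
where $\mathcal{H}$ denotes the entropy; the inequality holds because the
Kullback-Leibler divergence is non-negative. A usual choice restricts
$\mathcal{R}$ to factorized distributions,
$\mathcal{R}(Z, W) = \prod_i \mathcal{R}(Z_i) \prod_j \mathcal{R}(W_j)$. The
algorithm then alternates between updating $\mathcal{R}$ with $\theta$ fixed
(variational E step) and updating $\theta$ with $\mathcal{R}$ fixed (M step).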

\section{colSBM model, a joint model for a collection of networks}
\label{sec:colsbm-model-a-joint-model-for-a-collection-of-networks}
The \emph{colSBM} model, introduced
by~\cite{chabert-liddellLearningCommonStructures2024a}, extends the SBM to
collections of simple (or unipartite) networks. A collection is a set of
networks whose nodes are neither shared nor linked across networks, and whose
interactions have the same valuation and are of the same type.

The model can retrieve the structure shared within a collection, indicate
whether networks should be grouped into a collection and, in a large pool of
networks, identify collections with common structures.

The next step after designing this collection model for unipartite networks was
to extend it to the bipartite case.