\addtocounter{customchapter}{1}
\chapter{Introduction}
\section{Usage and importance of bipartite graphs}\label{sec:usage-and-importance-of-bipartite-graphs}
A bipartite graph is denoted $G = (U,V,E)$, where $U$ and $V$ are two disjoint and independent sets of vertices and $E$ is the set of edges connecting vertices of $U$ to vertices of $V$.
\begin{minipage}{0.5\linewidth}
\centering
Bipartite network\\
\begin{tikzpicture}[scale=.6]
\tikzstyle{every edge}=[-,>=stealth',shorten >=1pt,auto,draw,line width=1.5pt]
\tikzstyle{every state}=[draw, text=black,scale=0.95, transform shape]
\tikzstyle{every state}=[draw=none,text=black,scale=0.75, transform shape]
\tikzstyle{every node}=[fill=blueind]
\node[state, draw=black!50] (A1) at (0,5) {\textbf{R1}};
\node[state, draw=black!50] (A2) at (2.5,5) {\textbf{R2}};
\node[state, draw=black!50] (A3) at (5,5) {\textbf{R3}};
\tikzstyle{every node}=[fill=greenind, shape=rectangle]
\tikzstyle{every state}=[draw=none,text=black,scale=0.75, transform shape, shape=rectangle]
\node[state, draw=black!50] (B1) at (0,0) {\textbf{C1}};
\node[state, draw=black!50] (B2) at (1.25,0) {\textbf{C2}};
\node[state, draw=black!50] (B3) at (2.5,0) {\textbf{C3}};
\node[state, draw=black!50] (B4) at (3.75,0) {\textbf{C4}};
\node[state, draw=black!50] (B5) at (5,0) {\textbf{C5}};
\path (A1) edge [] (B1);
\path (A1) edge (B2);
\path (A1) edge (B3);
\path (A1) edge (B4);
\path (A2) edge (B3);
\path (A2) edge (B4);
\path (A3) edge (B5);
\path (A2) edge (B5);
\end{tikzpicture}
\end{minipage}
\begin{minipage}{0.5\linewidth}
\begin{center}
$X= \begin{pmatrix}
1 & 1 & 1 & 1 & 0 \\
0 & 0 & 1 & 1 & 1 \\
0 & 0 & 0 & 0 & 1 \\
\end{pmatrix} $\\
\vspace*{\baselineskip}
Incidence matrix
\end{center}
\end{minipage}
\vspace*{\baselineskip}
$X$ is the \emph{incidence matrix}, the mathematical object on which computations are performed.
It is filled according to the following rule:
\begin{equation*}
\begin{cases}
X_{ij} = 0 & \text{if no interaction is observed between species }i\text{ and }j \\
X_{ij} \neq 0 & \text{otherwise}
\end{cases}
\end{equation*}
If the network represents binary observations (such as presence-absence), then $X_{ij}\in\mathcal{K}=\{0,1\},\forall(i,j)$; if the interactions are weighted (such as abundance counts), then $X_{ij}\in\mathcal{K}=\mathbb{N},\forall(i,j)$.
This representation can describe various forms of interactions where two kinds of ``actors'' interact.
Those interactions can be binary or valued, and their numeric representation is the incidence matrix, $X$ in the above example.\\
Among the use cases of bipartite graphs is the Netflix Problem, a prize competition organized by Netflix to improve its recommender system.
The rows are the movies and the columns are the users; the entry at row $i$ and column $j$ is the review given by user $j$ to movie $i$.\\
Another use is the representation of ecological interactions such as plant-pollinator \parencite{ramos-jilibertoTopologicalChangeAndean2010}, bird-seed dispersal, prey-predator or host-parasite \parencite{kaszewska-gilasGlobalStudiesHostParasite2021}.
For plant-pollinator interactions, the rows are pollinator species and the columns are plant species; each entry is binary for presence/absence data or a count for abundance data.
Bipartite graphs are widely used across biology: in the ecological networks cited above, but also in medicine with biomedical, biomolecular or epidemiological networks \parencite{pavlopoulosBipartiteGraphsSystems2018}.
Interesting results can arise when a tool widely used on one kind of interaction is applied to another kind.
Companies like Netflix or Amazon use recommender systems to recommend other products to consumers based on their previous interactions.
In~\cite{desjardins-proulxEcologicalInteractionsNetflix2017}, the authors use the \emph{K-nearest neighbour} (KNN) algorithm as a recommender to predict missing prey for predators in a predator-prey network.

\section{Latent Block Model}
\label{sec:latent-block-model}
The Latent Block Model (LBM), introduced by~\cite{govaertLatentBlockModel2010}, adapts the Stochastic Block Model (SBM) \parencite{hollandStochasticBlockmodelsFirst1983, snijdersEstimationPredictionStochastic1997} to bipartite graphs.
\textit{Note:}\begin{small}
We prefer the term ``BiSBM'' and will use both LBM and BiSBM to designate the Stochastic Block Model applied to bipartite networks.
\end{small}
This model supposes that:
\begin{itemize}
\item Row nodes are members of row blocks and column nodes are members of column blocks.
\item The connectivity of two individuals is determined by their block memberships.
\item An interaction can only occur between a row node and a column node.
\end{itemize}
\begin{figure}[H]
\centering
\begin{tikzpicture}[scale=.6]
\input{../tikz/lbm.tex}
\end{tikzpicture}
\caption{Visualization of an LBM}
\label{fig:LBMvisu}
\end{figure}
\begin{itemize}
\item $Q_1 = |\{{\color{blueind}\bullet},{\color{cyanind}\bullet},{\color{electricblue}\bullet}\}|$ \emph{given} blocks in rows
\item $Q_2 = |\{{\color{burntorange}\bullet},{\color{goldenyellow}\bullet},{\color{peach}\bullet}\}|$ \emph{given} blocks in columns
\end{itemize}
Parameters:
\begin{itemize}
\item $\pi_{\bullet} = \mathbb{P}(Z_i = \bullet)$ for rows and $\rho_{\bullet} = \mathbb{P}(W_j = \bullet)$ for columns
\item $\alpha_{{\color{blueind}\bullet}{\color{burntorange}\bullet}} = \mathbb{P}(X_{ij} = 1 \mid Z_i = {\color{blueind}\bullet}, W_j = {\color{burntorange}\bullet})$, the probability of connection given the block memberships of the two nodes.
\end{itemize}
In Figure~\ref{fig:LBMvisu}, $\bm{\pi}$ contains the probabilities for a row node to belong to the row block of the corresponding color, $\bm{\rho}$ contains the probabilities for a column node to belong to the column block of the corresponding color, and $\bm{\alpha}$ is the $Q_1 \times Q_2$ matrix of connectivity parameters between the row and column blocks.
This model can easily generate bipartite graphs with complex and highly varied structures.
When trying to determine the structure of a given network, however, these parameters must be estimated, and the row and column block memberships are \emph{latent}, i.e.,\ they are not observed and must be inferred.
A common approach is to use a \emph{variational} EM algorithm (proposed for the SBM in~\cite{daudinMixtureModelRandom2008} and for the LBM in~\cite{govaertEMAlgorithmBlock2005}), which infers the block memberships and the parameters by maximizing a lower bound of the log-likelihood.

\section{colSBM model, a joint model for a collection of networks}
\label{sec:colsbm-model-a-joint-model-for-a-collection-of-networks}
The \emph{colSBM} model, introduced by~\cite{chabert-liddellLearningCommonStructures2024a}, proposes an extension of the SBM to collections of simple (or unipartite) networks.
A collection is a set of networks whose nodes are neither shared nor linked across networks, and whose interactions are of the same type and have the same valuation.
The model can retrieve the structure shared within a collection, indicate whether networks should be grouped into a collection and, in a large pool of networks, identify collections with common structures.
The next step after designing this collection model for unipartite networks was to extend it to the bipartite case.
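
As a reference point for that extension, the single-network LBM of Section~\ref{sec:latent-block-model} can be summarized in the binary case $\mathcal{K}=\{0,1\}$.
The formulas below are only a sketch in a standard form, not a formulation taken from the cited works: the numbers of row and column nodes $n_1$ and $n_2$, the block indices $q$ and $l$, and the variational parameters $\tau^{1}_{iq}$ and $\tau^{2}_{jl}$ are notation assumed here for illustration.
Assuming that all memberships are independent and that the entries $X_{ij}$ are independent conditionally on the memberships, the model reads
\begin{align*}
\mathbb{P}(Z_i = q) &= \pi_q, & q &= 1,\dots,Q_1, \quad i = 1,\dots,n_1,\\
\mathbb{P}(W_j = l) &= \rho_l, & l &= 1,\dots,Q_2, \quad j = 1,\dots,n_2,\\
\mathbb{P}(X_{ij} = 1 \mid Z_i = q, W_j = l) &= \alpha_{ql}.
\end{align*}
If the distribution of the memberships given $X$ is approximated by a fully factorized distribution, with $\tau^{1}_{iq}$ and $\tau^{2}_{jl}$ the approximate probabilities that $Z_i = q$ and $W_j = l$, then the lower bound of the log-likelihood maximized by a variational EM algorithm would take the form
\begin{align*}
\mathcal{J}(\bm{\tau},\bm{\pi},\bm{\rho},\bm{\alpha})
={}& \sum_{i=1}^{n_1}\sum_{q=1}^{Q_1} \tau^{1}_{iq}\log\frac{\pi_q}{\tau^{1}_{iq}}
+ \sum_{j=1}^{n_2}\sum_{l=1}^{Q_2} \tau^{2}_{jl}\log\frac{\rho_l}{\tau^{2}_{jl}} \\
&+ \sum_{i=1}^{n_1}\sum_{j=1}^{n_2}\sum_{q=1}^{Q_1}\sum_{l=1}^{Q_2} \tau^{1}_{iq}\,\tau^{2}_{jl}
\bigl[X_{ij}\log\alpha_{ql} + (1-X_{ij})\log(1-\alpha_{ql})\bigr].
\end{align*}
Under these assumptions, the first two sums gather the expected log-probabilities of the memberships together with the entropy of the approximation, and the last sum is the expected Bernoulli log-likelihood of the incidence matrix $X$.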