report: edit the report

Louis Lacoste 2024-07-05 16:59:31 +02:00
parent 5921c7fa60
commit 43485078a7
10 changed files with 138 additions and 5110 deletions


@ -1,5 +1,6 @@
\addtocounter{customchapter}{1}
\chapter{L'UMR MIA Paris-Saclay}
% \addtocounter{customchapter}{1}
\chapter*{L'UMR MIA Paris-Saclay}
\pagestyle{intro}
The UMR MIA Paris-Saclay is a research unit that brings together
statisticians and computer scientists specialized in modeling and
@ -37,25 +38,25 @@ La figure \ref{fig:organigramme-umr} présente l'organigramme complet de l'unit
\newline
\emph{Source:~\cite{AccueilMIAParisSaclay}}\\
\begin{sidewaysfigure}[h!]
\begin{sidewaysfigure}
\begin{center}
% \includegraphics[scale=0.4]{img/Organigramme_MIA-Paris-Saclay}
\includegraphics[scale=0.45]{Organigramme_MIA-Paris-Saclay_GS 06-2024.jpg}
\includegraphics[scale=0.37]{Organigramme_MIA-Paris-Saclay_GS 06-2024.jpg}
\caption{Organizational chart of the UMR}
\label{fig:organigramme-umr}
\end{center}
\end{sidewaysfigure}
\section[Supervision]{Supervision and internship life}
\section*{Supervision and internship life}
During my internship, I was supervised by Pierre Barbillon and Sophie Donnet
and was frequently in discussion with them and with Saint-Clair Chabert-Liddell, whose
work I continued.
The working environment, among research engineers, PhD students,
researchers and associate professors, was very enriching for me. This
internship is part of building my career path, confirming
my desire to do research.
researchers and associate professors, was very enriching for me.
% This internship is part of building my career path, confirming
% my desire to do research.
In addition, various projects undertaken within the laboratory made it possible to
form friendships outside working hours. For example, the


@ -1,5 +1,5 @@
\addtocounter{customchapter}{1}
\chapter{Context of the study}
\chapter{Introduction}
\section{Usage and importance of bipartite graphs}\label{sec:usage-and-importance-of-bipartite-graphs}
Bipartite graphs, denoted as $G = (U,V,E)$ with $U$ and $V$ two disjoint and
@ -38,17 +38,19 @@ $V$ vertices.
\end{minipage}
\begin{minipage}{0.5\linewidth}
\begin{center}
Incidence matrix
$X=\left(
\begin{array}{rrrrr}
$X=
\begin{pmatrix}
1 & 1 & 1 & 1 & 0 \\
0 & 0 & 1 & 1 & 1 \\
0 & 0 & 0 & 0 & 1 \\
\end{array}\right)
\end{pmatrix}
$\\
\vspace*{\baselineskip}
Incidence matrix
\end{center}
\end{minipage}
\vspace*{\baselineskip}
$X$ is the \emph{incidence matrix}, the mathematical object on which
computations are performed. It is filled according to the following rule:
\begin{equation*}
@ -57,7 +59,7 @@ computations are performed. It is filled with the following rule:
X_{ij} \neq 0 & \text{otherwise}
\end{cases}
\end{equation*}
If the network represents binary observation (like presence-absence observation) then
If the network represents binary observations (like presence-absence) then
$X_{ij}\in\mathcal{K}=\{0,1\},\forall(i,j)$; if the interactions are weighted
(like an abundance count), $X_{ij}\in\mathcal{K}=\mathbb{N},\forall(i,j)$.
@ -74,10 +76,10 @@ value is the review of the user $j$ for the movie $i$.\\
Another use is the representation of ecological interactions like
plant-pollinator \parencite{ramos-jilibertoTopologicalChangeAndean2010},
birds-seed dispersion, prey-predator or host-parasite
\parencite{kaszewska-gilasGlobalStudiesHostParasite2021}. In those cases, the
rows are pollinator species and the columns are plant species, and the
intersection is a value, binary if it is a presence/absence or a value if it is
an abundance count.
\parencite{kaszewska-gilasGlobalStudiesHostParasite2021}. For plant-pollinator
interactions, the rows are pollinator species, the columns are plant species,
and each entry is either binary (presence/absence) or an abundance count.
Bipartite graphs are widely used in biology, in various fields, among which the
previously cited ecological networks, but also in medicine with biomedical
@ -134,29 +136,30 @@ Parameters
In Figure~\ref{fig:LBMvisu}, $\bm{\pi}$ are the probabilities for a row node to belong
to the row block of corresponding color, $\bm{\rho}$ are the probabilities for
a column node to belong to the column block of corresponding color and
$\bm{\alpha}$ are the connectivity parameters between the row and column
blocks.
$\bm{\alpha}$ is the $Q_1 \times Q_2$ matrix of connectivity parameters
between the row and column blocks.
This model can be used to easily generate bipartite graphs with complex and
very varied structures. However, to determine the structure of a given
network, these parameters must be estimated, and the row and column block
memberships are \emph{latent}, i.e.,\ they are unknown and must be inferred.
For this a common approach is to use a VEM algorithm (proposed for SBM in
~\cite{daudinMixtureModelRandom2008} and for LBM in
To this end, a common approach is to use a \emph{variational} EM (VEM) algorithm (proposed
for SBM in~\cite{daudinMixtureModelRandom2008} and for LBM
in~\cite{govaertEMAlgorithmBlock2005}): those groups and the required parameters
can be inferred by maximizing a lower bound of the likelihood minus a penalty.
can be inferred by maximizing a lower bound of the likelihood.
\section{colSBM model, a joint model for a collection of networks}
\label{sec:colsbm-model-a-joint-model-for-a-collection-of-networks}
The \emph{colSBM} model introduced by~\cite{chabert-liddellLearningCommonStructures2024a}
propose an extension of the SBM model to collections of SBMs. A collection is a
set of networks which nodes are not common or linked between different networks,
the interactions have the same valuations and are of the same type.
is an extension of the SBM model to collections of simple (or unipartite)
networks. A collection is a set of networks whose nodes are neither shared nor
linked across networks, and whose interactions have the same valuations and
are of the same type.
The model can retrieve the structure shared across a collection, indicate whether
networks should be grouped in a collection and, within a large pool of networks,
identify collections with common structures.
The next step after designing this collection model for unipartite was to adapt
it to the bipartite case.
The next step after designing this collection model for unipartite networks was
to extend it to the bipartite case.


@ -247,28 +247,6 @@ And we obtain the following formulae for the $\bm{\tau^m}$:
which are used to iteratively update the values with a single step of a
fixed-point algorithm.
% TODO move to technical.tex
% From the above formulae we obtain for the Bernoulli distribution:
% \begin{itemize}
% \item[-] \textit{iid} :
% \[ \bm{\tau}^{m,1} = ~^{t}\pi + \exp((\text{Mask}^{m} \odot A^{m})
% \bm{\tau}^{m,2} ~^{t}(\text{logit}(\alpha)) + \text{Mask}^{m}
% \bm{\tau}^{m,2} ~^{t}\log(\bm{1} - \alpha)) \]
% \[ \bm{\tau}^{m,2} = ~^{t}\rho + \exp(~^{t}(\text{Mask}^{m} \odot A^{m})
% \bm{\tau}^{m,1} \text{logit}(\alpha) + ~^{t}\text{Mask}^{m}
% \bm{\tau}^{m,1} \log(\bm{1} - \alpha)) \]
% \item[-] $\rho\pi$ :
% \[ \bm{\tau}^{m,1} = ~^{t}\pi^{m} + \exp((\text{Mask}^{m} \odot A^{m})
% \bm{\tau}^{m,2} ~^{t}(\text{logit}(\alpha)) + \text{Mask}^{m}
% \bm{\tau}^{m,2} ~^{t}\log(\bm{1} - \alpha)) \]
% \[ \bm{\tau}^{m,2} = ~^{t}\rho^{m} + \exp(~^{t}(\text{Mask}^{m} \odot A^{m})
% \bm{\tau}^{m,1} \text{logit}(\alpha) + ~^{t}\text{Mask}^{m}
% \bm{\tau}^{m,1} \log(\bm{1} - \alpha)) \]
% \end{itemize}
% with $\text{Mask}^{m}$ the matrix containing $0$ if the value is a NA and a 1
% otherwise.
\subsection{M step of the algorithm}
\label{ssec:m-step-of-the-algorithm}
At iteration $(t)$ the M-step maximizes the variational bound with respect to
@ -353,40 +331,41 @@ BIC-like criterion in the following manner:
We provide below the expression for the penalties for the 4 models that we
propose.
\paragraph*{\textit{iid-colBiSBM}}
For the \textit{iid-colBiSBM} the penalties were modified in the following way:
\begin{itemize}
\item For the $\pi$s and $\rho$s:
\[\text{pen}_{\pi}(Q_1) = (Q_1 - 1)\log(\sum_{m=1}^{M}n_{1}^{m})\]
\[\text{pen}_{\rho}(Q_2) = (Q_2 - 1)\log(\sum_{m=1}^{M}n_{2}^{m})\]
\item For the $\alpha$s :
\begin{description}
\item[\textit{iid-colBiSBM}] For the $\bm\pi$ and $\bm\rho$:
\begin{align*}
\text{pen}_{\pi}(Q_1) = (Q_1 - 1)\log(\sum_{m=1}^{M}n_{1}^{m}) & , &
\text{pen}_{\rho}(Q_2) = (Q_2 - 1)\log(\sum_{m=1}^{M}n_{2}^{m})
\end{align*}
For the $\bm\alpha$:
\[\text{pen}_{\alpha}(Q_1, Q_2) = Q_1 \times Q_2 \log(N_M)\]
with
\[ N_M = \sum_{m = 1}^{M} n_{1}^{m} \times n_{2}^{m} \]
\end{itemize}
And thus the $\text{BIC-L}$ formula is now:
\[ \text{BIC-L}(\bm{X},Q_1, Q_2) = \max_{\theta} \mathcal{J} (\mathcal{\hat{R}}, \bm{\theta})
- \frac{1}{2} [\text{pen}_{\pi}(Q_1) + \text{pen}_{\rho}(Q_2) + \text{pen}_{\alpha}(Q_1, Q_2)]\]
\paragraph*{\textit{$\rho\pi$-colBiSBM}}
For the \textit{$\rho\pi$-colBiSBM} the penalties are the following:
\begin{itemize}
\item The support penalties are:
\[ \text{pen}_{S_1}(Q_1) = -2 \log p_{Q_1} (S_1) \]
\[ \text{pen}_{S_2}(Q_2) = -2 \log p_{Q_2} (S_2) \]
with
\[ \log p_{Q_1}(S_1) = - M \log(Q_1) - \sum_{m=1}^{M} \log {Q_1 \choose Q_1^{(m)}} \]
\[ \log p_{Q_2}(S_2) = - M \log(Q_2) - \sum_{m=1}^{M} \log {Q_2 \choose Q_2^{(m)}} \]
\item Penalties for the $\rho$s and $\pi$s:
\[ \text{pen}_{\pi}(Q_1, S_1) = \sum_{m=1}^{M} (Q_{1}^{(m)} - 1) \log n_{1}^{m} \]
\[ \text{pen}_{\rho}(Q_2, S_2) = \sum_{m=1}^{M} (Q_{2}^{(m)} - 1) \log n_{2}^{m} \]
\item Penalties for the $\alpha$s:
\[ \text{pen}_{\alpha}(Q_1, Q_2, S_1, S_2) = (\sum_{q=1}^{Q_1} \sum_{r=1}^{Q_2} \mathbb{1}_{(S_1)'S_2 > 0}) \log (N_M) \]
\end{itemize}
And the corresponding BIC-L formula:
And thus the $\text{BIC-L}$ formula is the following:
\[ \text{BIC-L}(\bm{X},Q_1, Q_2) = \max_{\theta}
\mathcal{J} (\mathcal{\hat{R}}, \bm{\theta})
- \frac{1}{2} [\text{pen}_{\pi}(Q_1) + \text{pen}_{\rho}(Q_2) +
\text{pen}_{\alpha}(Q_1, Q_2)]\]
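% Illustrative worked example: all numerical values are hypothetical and the
% logarithm is assumed to be the natural logarithm.
As a purely illustrative check of these penalties, assume the hypothetical
values $M = 2$, $(n_1^1, n_2^1) = (10, 15)$, $(n_1^2, n_2^2) = (20, 25)$ and
$(Q_1, Q_2) = (2, 3)$, with $\log$ taken as the natural logarithm:
\begin{align*}
\text{pen}_{\pi}(2) &= (2 - 1)\log(10 + 20) = \log 30 \approx 3.40, \\
\text{pen}_{\rho}(3) &= (3 - 1)\log(15 + 25) = 2\log 40 \approx 7.38, \\
N_M &= 10 \times 15 + 20 \times 25 = 650, \\
\text{pen}_{\alpha}(2, 3) &= 2 \times 3 \log 650 \approx 38.86,
\end{align*}
so the total penalty subtracted from $\max_{\theta} \mathcal{J}$ would be about
$\frac{1}{2}(3.40 + 7.38 + 38.86) \approx 24.82$. In this sketch the
$\bm\alpha$ term dominates, since it scales with the number of possible
interactions rather than with the number of nodes.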
\item[\textit{$\bm{\pi\rho}$-colBiSBM}] The support penalties are
\begin{align*}
\text{pen}_{S_1}(Q_1) = -2 \log p_{Q_1} (S_1) & , &
\text{pen}_{S_2}(Q_2) = -2 \log p_{Q_2} (S_2)
\end{align*}
with \begin{align*}
\log p_{Q_1}(S_1) = - M \log(Q_1) - \sum_{m=1}^{M} \log {Q_1
\choose Q_1^{(m)}}, &
\log p_{Q_2}(S_2) = - M \log(Q_2) - \sum_{m=1}^{M} \log {Q_2
\choose Q_2^{(m)}}.
\end{align*}
The penalties for $\bm\pi$ and $\bm\rho$ are
\[ \text{pen}_{\pi}(Q_1, S_1) = \sum_{m=1}^{M} (Q_{1}^{(m)} - 1)
\log n_{1}^{m},
~\text{pen}_{\rho}(Q_2, S_2) = \sum_{m=1}^{M} (Q_{2}^{(m)} - 1)
\log n_{2}^{m}. \]
The penalty for $\bm\alpha$ is
\[ \text{pen}_{\alpha}(Q_1, Q_2, S_1, S_2) = \left(\sum_{q=1}^{Q_1}
\sum_{r=1}^{Q_2} \mathbb{1}_{((S_1)'S_2)_{qr} > 0}\right) \log (N_M). \]
And the corresponding BIC-L formula,
\[
\begin{aligned}
\text{BIC-L}(\bm{X},Q_1, Q_2) =
@ -397,6 +376,7 @@ And the corresponding BIC-L formula:
& + \text{pen}_{S_1}(Q_1) + \text{pen}_{S_2}(Q_2))] \\
\end{aligned}
\]
\end{description}
\subsection{Initialization and pairing of the models}
\label{ssec:initialization-and-pairing-of-the-models}
@ -407,18 +387,18 @@ previously described VEM algorithm we obtain for each network its parameters
We then compute, for each network, the marginal distributions along each
dimension, and order the blocks of each network by decreasing marginal probability.
\begin{itemize}
\item For the memberships on the columns: $col~order_m = order\left(\pi_m \times
\alpha_m\right)$
\item For the memberships on the rows: $row~order_m = order\left(\rho_m \times
~^{t}(\alpha_m)\right)$
\end{itemize}
For the memberships on the columns: $\text{col order}_m = \text{order}\left(\pi_m \times
\alpha_m\right)$.
For the memberships on the rows: $\text{row order}_m = \text{order}\left(\rho_m \times
~^{t}(\alpha_m)\right)$.
Using this order, we relabel the memberships of the $M$ single-network fits.
We then use these $M$ memberships to fit a collection containing
the $M$ networks.
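% Illustrative example: the parameter values are hypothetical, and $\times$ is
% assumed to denote the usual vector-matrix product.
As an illustration of the column ordering, take the hypothetical values
$\pi_m = (0.6, 0.4)$ and
\[ \alpha_m = \begin{pmatrix} 0.1 & 0.8 \\ 0.3 & 0.2 \end{pmatrix}, \]
and read $\times$ as the usual vector-matrix product. The column marginals are then
\[ \pi_m \times \alpha_m = (0.6 \times 0.1 + 0.4 \times 0.3,\;
0.6 \times 0.8 + 0.4 \times 0.2) = (0.18, 0.56), \]
so, sorting in decreasing order, the second column block would be relabeled as
block $1$ and the first as block $2$. The row ordering would be obtained in the
same way from $\rho_m \times ~^{t}(\alpha_m)$.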
\subsection{Greedy exploration to find an estimation of the mode}
\label{ssec:greedy-exploration-to-find-an-estimation-of-the-mode}
\subsection{Greedy exploration to find an estimation of the mode}\label{ssec:greedy-exploration-to-find-an-estimation-of-the-mode}
Starting from the previously fitted models for $Q = (1,2)$ and $Q = (2,1)$, we
perform a greedy exploration to find a first mode.
@ -428,7 +408,7 @@ memberships for the points $Q \in \{(Q_1 + 1, Q_2),(Q_1, Q_2 + 1),(Q_1 - 1,
maximizes the BIC-L as the next point from which to repeat the procedure. We
stop once the BIC-L has failed to increase twice in a row.
\begin{algorithm}[H]
\begin{algorithm}[t]
\caption{Greedy Exploration for Mode Estimation}
\SetAlgoLined
\SetKwInOut{Input}{Input}
@ -486,7 +466,7 @@ consists of two alternating steps:
model.
\end{itemize}
\begin{algorithm}[H]
\begin{algorithm}[t]
\caption{Moving Window Procedure}
\SetAlgoLined
\SetKwInOut{Input}{Input}
@ -530,7 +510,7 @@ consists of two alternating steps:
\textbf{Output:} Best model with maximum BIC-L in the window
\end{algorithm}
\begin{figure}[H]
\begin{figure}[t]
\definecolor{mypurple}{RGB}{128,0,128}
\begin{subfigure}[b]{0.48\textwidth}
\begin{tikzpicture}[scale=1.5]
@ -698,7 +678,7 @@ And the dissimilarity between any pair of networks $(m,m')\in\mathcal{M}^2$ is t
D_{\mathcal{M}}(m,m') = \sum_{q = 1}^{Q_1} \sum_{r = 1}^{Q_2} \max(\widetilde{\pi}_{q}^{m}, \widetilde{\pi}_{q}^{m'}) \left( \widetilde{\alpha}_{qr}^{m} - \widetilde{\alpha}_{qr}^{m'}\right)^{2} \max(\widetilde{\rho}_{r}^{m}, \widetilde{\rho}_{r}^{m'})
\]
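% Illustrative example: all values are hypothetical and only serve to show how
% the terms of the sum combine.
As a small illustration with hypothetical values, take $Q_1 = 2$, $Q_2 = 1$,
$\widetilde{\pi}^{m} = (0.5, 0.5)$, $\widetilde{\pi}^{m'} = (0.7, 0.3)$,
$\widetilde{\rho}_{1}^{m} = \widetilde{\rho}_{1}^{m'} = 1$,
$(\widetilde{\alpha}_{11}^{m}, \widetilde{\alpha}_{21}^{m}) = (0.2, 0.6)$ and
$(\widetilde{\alpha}_{11}^{m'}, \widetilde{\alpha}_{21}^{m'}) = (0.5, 0.6)$. Then
\[
D_{\mathcal{M}}(m,m') = 0.7 \times (0.2 - 0.5)^{2} \times 1
+ 0.5 \times (0.6 - 0.6)^{2} \times 1 = 0.063,
\]
so only the block pair whose connectivity parameters differ contributes to the
dissimilarity, weighted by the larger of the two block proportions.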
\begin{figure}[H]
\begin{figure}[t]
\centering
\begin{tikzpicture}
\tikzstyle{instruct}=[font=\small, text justified, rectangle,draw,fill=yellow!50]


@ -4,7 +4,15 @@
\newgeometry{left=7.5cm,bottom=2cm, top=1cm, right=1cm}
% \tikz[remember picture,overlay] \node[opacity=1,inner sep=0pt] at (-28mm,-135mm){\includegraphics{Bandeau_UPaS.pdf}};
\begin{tikzpicture}[remember picture,overlay]
\fill [pruneps] (-4,-28.3) rectangle (-8.15, 1.4);
\foreach \x in {-8.1, -7.9, -7.6, -7.2}
\draw[white, line width=0.5mm] (\x, -28.3) -- (\x, 1.4);
\node[inner sep=0pt, rotate=90, font=\fontfamily{fvs}\fontseries{b}\fontsize{26}{26}\selectfont, text=white] (rapport) at (-6.3, -22.4) {Rapport de stage};
\node[inner sep=0pt, opacity=1] (logo-UPS) at (-0.85,0) {\includegraphics{logo/Logotype_UPSaclay_CMJN.eps}};
\end{tikzpicture}
% sans-serif font for the title page
\fontfamily{fvs}\fontseries{m}\selectfont
@ -16,7 +24,7 @@
%** CHANGE THE DEFAULT IMAGE **
%*****************************************************
\vspace{-10mm} % adjust according to the height of the logo
\flushright\includesvg[scale=0.3]{logo/APT_Logo_RVB_Positif.svg}
\flushright\includegraphics[scale=0.3]{logo/APT_Logo_RVB_Positif}
\flushright\includegraphics[scale=0.3]{logo/X-IPparis-RVB.eps}


@ -39,38 +39,24 @@
\usepackage{fancyhdr}
\pagestyle{fancy}
\fancyhf{}
\renewcommand{\chaptermark}[1]{\markboth{#1}{#1}}
\fancyhead[lo]{\slshape\nouppercase{\rightmark}}
\fancyhead[re]{\slshape\nouppercase{\leftmark}}
\fancyhead[ro,le]{\thepage}
% \pagestyle{fancy}
% % Clear all headers and footers
% \fancyhf{}
\fancypagestyle{intro}{%
\fancyhf{}
\fancyfoot[C]{\thepage}
\renewcommand{\headrulewidth}{0pt}
\renewcommand{\footrulewidth}{0pt}
}
% % Header for even pages (left side)
% \fancyhead[LE]{\thechapter\quad\leftmark}
% % Header for odd pages (right side)
% \fancyhead[RO]{\rightmark\quad\thesection}
% % Ensure that chapter and section marks are used correctly
% \renewcommand{\chaptermark}[1]{\markboth{#1}{}}
% \renewcommand{\sectionmark}[1]{\markright{#1}}
% % Optional: define the appearance of chapter and section titles in the header
% \usepackage{titlesec}
% \titleformat{\chapter}[display]
% {\normalfont\Large\bfseries\color{pruneps}}
% {\chaptertitlename\ \thechapter}{20pt}{\LARGE}
% \titleformat{\section}
% {\normalfont\Large\bfseries\color{vertps}}
% {\thesection}{1em}{}
% Images
\graphicspath{{../img/}{../figure/}}
% Figure placement
\floatplacement{figure}{H}
\floatplacement{figure}{t}
%% Tikz Related
\usetikzlibrary{calc,shapes,backgrounds,arrows,automata,shadows,positioning,
@ -95,6 +81,19 @@ automata,positioning}
% Bibliography
\input{../shared/biblio}
% Title spacing adjustments
\usepackage{titlesec}
\titlespacing*% the star= don't indent first paragraph after
{\subsection}% which command you want to set the spacing for
{0pt}% spacing to the left of heading
{1ex}% spacing before the heading
{1ex}% spacing after the heading
\titlespacing*%
{\section}%
{0pt}%
{1ex}%
{1ex}%
\newcounter{customchapter}
\newcounter{maincontentend}
% Important: set the number of chapters you have here.
@ -115,6 +114,7 @@ automata,positioning}
opacity=0.5,
contents={
\ifnum\value{maincontentend}=0
\ifnum\value{customchapter}>0
\checkoddpage
\ifoddpage
\begin{tikzpicture}[remember picture,overlay]
@ -128,6 +128,7 @@ automata,positioning}
\end{tikzpicture}
\fi
\fi
\fi
}
}
}
@ -199,13 +200,10 @@ automata,positioning}
\ActivateBG
\begin{selectlanguage}{french}
% \maketitle
\tableofcontents
\pagenumbering{roman}
\tableofcontents
\include{remerciements}
\include{chapter1-presentation_UMR}
\end{selectlanguage}
\begin{selectlanguage}{english}


@ -13,7 +13,7 @@ Thanks to Farida, Christelle and Sébastien for explaining and handling the
administrative procedures.
Special thanks to all the PhD students: Mary,
Marina, Emré, Tam, Caroline, Jérémy, Florian, Annaïg, Jules, Tanguy, Barbara,
Marina, Emré, Tam, Caroline, Jérémy, Florian, Annaïg, Jules, Hayato, Tanguy, Barbara,
Bastien and Armand. Thanks to all the other interns, in particular:
Alizée, Taliesin, Antoine, Alexandre, Francois, Pierre, Camille and Maxime.