report: modify the report

2024-07-05 16:59:31 +02:00
commit 43485078a7
parent 5921c7fa60
10 changed files with 138 additions and 5110 deletions
--- a/img/logo/APT_Logo_RVB_Positif.png
+++ b/img/logo/APT_Logo_RVB_Positif.png
--- a/img/logo/APT_Logo_RVB_Positif.svg
+++ b/img/logo/APT_Logo_RVB_Positif.svg
--- a/img/logo/Logotype_UPSaclay_CMJN.eps
+++ b/img/logo/Logotype_UPSaclay_CMJN.eps
--- a/rapport/chapter1-presentation_UMR.tex
+++ b/rapport/chapter1-presentation_UMR.tex
@@ -1,5 +1,6 @@
-\addtocounter{customchapter}{1}
-\chapter{L'UMR MIA Paris-Saclay}
+% \addtocounter{customchapter}{1}
+\chapter*{L'UMR MIA Paris-Saclay}
+\pagestyle{intro}

 L'UMR MIA Paris-Saclay est une entité de recherche qui regroupe des
 statisticiens et des informaticiens spécialisés dans la modélisation et
@@ -37,25 +38,25 @@ La figure \ref{fig:organigramme-umr} présente l'organigramme complet de l'unit
 \newline
 \emph{Source:~\cite{AccueilMIAParisSaclay}}\\

-\begin{sidewaysfigure}[h!]
+\begin{sidewaysfigure}
    \begin{center}
        % \includegraphics[scale=0.4]{img/Organigramme_MIA-Paris-Saclay}
-        \includegraphics[scale=0.45]{Organigramme_MIA-Paris-Saclay_GS 06-2024.jpg}
+        \includegraphics[scale=0.37]{Organigramme_MIA-Paris-Saclay_GS 06-2024.jpg}
        \caption{Organigramme de l'UMR}
        \label{fig:organigramme-umr}
    \end{center}
 \end{sidewaysfigure}

-\section[Encadrement]{Encadrement et vie en stage}
+\section*{Encadrement et vie en stage}

 Au cours de mon stage, j'étais encadré par Pierre Barbillon et Sophie Donnet
 et fréquemment en discussion avec eux et Saint-Clair Chabert-Liddell dont
 j'ai poursuivi les travaux.

 Le contexte de travail, au sein des ingénieurs d'études, des doctorants, des
-chercheurs et des maîtres de conférences, a été pour moi très enrichissant. Ce
-stage s'inscrit dans la construction de mon parcours professionnel en validant
-le désir que je présentais de faire de la recherche.
+chercheurs et des maîtres de conférences, a été pour moi très enrichissant.
+% Ce stage s'inscrit dans la construction de mon parcours professionnel en validant
+% le désir que je présentais de faire de la recherche.

 Par ailleurs, divers projets entrepris au sein du laboratoire ont permis de
 nouer des relations amicales en dehors des heures de travail. Par exemple, le
--- a/rapport/chapter2-context.tex
+++ b/rapport/chapter2-context.tex
@@ -1,5 +1,5 @@
 \addtocounter{customchapter}{1}
-\chapter{Context of the study}
+\chapter{Introduction}

 \section{Usage and importance of bipartite graphs}\label{sec:usage-and-importance-of-bipartite-graphs}
 Bipartite graphs, denoted as $G = (U,V,E)$ with $U$ and $V$ two disjoint and
@@ -38,17 +38,19 @@ $V$ vertices.
 \end{minipage}
 \begin{minipage}{0.5\linewidth}
    \begin{center}
-        Incidence matrix
-        $X=\left(
-            \begin{array}{rrrrr}
+        $X=
+            \begin{pmatrix}
                1 & 1 & 1 & 1 & 0 \\
                0 & 0 & 1 & 1 & 1 \\
                0 & 0 & 0 & 0 & 1 \\
-            \end{array}\right)
+            \end{pmatrix}
        $\\
+        \vspace*{\baselineskip}
+        Incidence matrix
    \end{center}
 \end{minipage}

+\vspace*{\baselineskip}
 $X$ is the \emph{incidence matrix} and is the mathematical object on which
 computations are performed. It is filled with the following rule:
 \begin{equation*}
@@ -57,7 +59,7 @@ computations are performed. It is filled with the following rule:
        X_{ij} \neq 0 & \text{otherwise}
    \end{cases}
 \end{equation*}
-If the network represents binary observation (like presence-absence observation) then
+If the network represents binary observations (like presence-absence) then
 $X_{ij}\in\mathcal{K}=\{0,1\},\forall(i,j)$; if the interactions are weighted
 (like an abundance count), $X_{ij}\in\mathcal{K}=\mathbb{N},\forall(i,j)$.

@@ -74,10 +76,10 @@ value is the review of the user $j$ for the movie $i$.\\
 Another use is the representation of ecological interactions like
 plant-pollinator \parencite{ramos-jilibertoTopologicalChangeAndean2010},
 birds-seed dispersion, prey-predator or host-parasite
-\parencite{kaszewska-gilasGlobalStudiesHostParasite2021}. In those cases, the
-rows are pollinator species and the columns are plant species, and the
-intersection is a value, binary if it is a presence/absence or a value if it is
-an abundance count.
+\parencite{kaszewska-gilasGlobalStudiesHostParasite2021}. For plant-pollinator
+interactions, the rows are pollinator species and the columns are plant species,
+and each entry is binary for presence/absence data or a non-negative integer
+for an abundance count.

 Bipartite graphs are widely used in biology, in various fields, among which the
 previously cited ecological networks, but also in medicine with biomedical
@@ -134,29 +136,30 @@ Parameters
 In Figure~\ref{fig:LBMvisu}, $\bm{\pi}$ are the probabilities for a row node to belong
 to the row block of corresponding color, $\bm{\rho}$ are the probabilities for
 a column node to belong to the column block of corresponding color and
-$\bm{\alpha}$ are the connectivity parameters between the row and column
-blocks.
+$\bm{\alpha}$ is the $Q_1 \times Q_2$ matrix of connectivity parameters
+between the row and column blocks.

 This model can be used to easily generate bipartite graphs with complex and
 very varied structures. But to determine the structure of a given network we
 need to estimate those parameters, and the row and column block memberships
 are \emph{latent}, i.e.,\ they are not known and must be inferred.

-For this a common approach is to use a VEM algorithm (proposed for SBM in
-~\cite{daudinMixtureModelRandom2008} and for LBM in
+For this, a common approach is to use a \emph{variational} EM algorithm (proposed
+for SBM in~\cite{daudinMixtureModelRandom2008} and for LBM in
 ~\cite{govaertEMAlgorithmBlock2005}): the groups and the required parameters
-can be inferred by maximizing a lower bound of the likelihood minus a penalty.
+can be inferred by maximizing a lower bound of the likelihood.

 \section{colSBM model, a joint model for a collection of networks}
 \label{sec:colsbm-model-a-joint-model-for-a-collection-of-networks}
 The \emph{colSBM} model introduced by ~\cite{chabert-liddellLearningCommonStructures2024a}
-propose an extension of the SBM model to collections of SBMs. A collection is a
-set of networks which nodes are not common or linked between different networks,
-the interactions have the same valuations and are of the same type.
+proposes an extension of the SBM model to collections of simple (or unipartite)
+networks. A collection is a set of networks whose nodes are neither shared nor
+linked across networks, and whose interactions have the same valuation and
+are of the same type.

 The model can retrieve the shared structure in a collection, indicate whether
 networks should be grouped in a collection and, in a large pool of networks,
 identify collections with common structures.

-The next step after designing this collection model for unipartite was to adapt
-it to the bipartite case.
+The next step after designing this collection model for unipartite networks was
+to extend it to the bipartite case.
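
Note on the variational EM paragraph changed in this file: the hunk mentions "a lower bound of the likelihood" without writing it out. A minimal sketch of the standard form of that bound is given below. It assumes that $\bm{Z}$ denotes the latent block memberships and $\mathcal{R}$ the variational distribution over them (both symbols are assumptions of this note; the report's own factorisation of $\mathcal{R}$ is not visible in the diff), and it needs `amsmath` and `bm`.

```latex
% Assumed notation: \bm{Z} = latent memberships, \mathcal{R} = variational law.
\begin{equation*}
    \log \ell(\bm{X}; \bm{\theta})
    \;\geq\;
    \mathcal{J}(\mathcal{R}, \bm{\theta})
    = \mathbb{E}_{\mathcal{R}}\left[\log \ell(\bm{X}, \bm{Z}; \bm{\theta})\right]
    + \mathcal{H}(\mathcal{R})
\end{equation*}
% \mathcal{H}(\mathcal{R}) is the entropy of \mathcal{R}; the gap between the
% two sides is the Kullback-Leibler divergence between \mathcal{R} and the
% conditional law of \bm{Z} given \bm{X}.
```

Under this reading, the VE step maximizes $\mathcal{J}$ over $\mathcal{R}$ and the M step over $\bm{\theta}$; the penalty only enters at model selection through the BIC-L, which is consistent with dropping "minus a penalty" from the sentence.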
--- a/rapport/chapter3-structure-detection.tex
+++ b/rapport/chapter3-structure-detection.tex
@@ -247,28 +247,6 @@ And we obtain the following formulae for the $\bm{\tau^m}$:
 which are used to update iteratively the values by a fixed point algorithm with
 only one step.

-% TODO move to technical.tex
-% From the above formulae we obtain for the Bernoulli distribution:
-% \begin{itemize}
-%     \item[-] \textit{iid} :
-%         \[ \bm{\tau}^{m,1} = ~^{t}\pi + \exp((\text{Mask}^{m} \odot A^{m})
-%             \bm{\tau}^{m,2} ~^{t}(\text{logit}(\alpha)) + \text{Mask}^{m}
-%             \bm{\tau}^{m,2} ~^{t}\log(\bm{1} - \alpha)) \]
-%         \[ \bm{\tau}^{m,2} = ~^{t}\rho + \exp(~^{t}(\text{Mask}^{m} \odot A^{m})
-%             \bm{\tau}^{m,1} \text{logit}(\alpha) + ~^{t}\text{Mask}^{m}
-%             \bm{\tau}^{m,1} \log(\bm{1} - \alpha)) \]
-%     \item[-] $\rho\pi$ :
-%         \[ \bm{\tau}^{m,1} = ~^{t}\pi^{m} + \exp((\text{Mask}^{m} \odot A^{m})
-%             \bm{\tau}^{m,2} ~^{t}(\text{logit}(\alpha)) + \text{Mask}^{m}
-%             \bm{\tau}^{m,2} ~^{t}\log(\bm{1} - \alpha)) \]
-%         \[ \bm{\tau}^{m,2} = ~^{t}\rho^{m} + \exp(~^{t}(\text{Mask}^{m} \odot A^{m})
-%             \bm{\tau}^{m,1} \text{logit}(\alpha) + ~^{t}\text{Mask}^{m}
-%             \bm{\tau}^{m,1} \log(\bm{1} - \alpha)) \]
-% \end{itemize}
-
-% with $\text{Mask}^{m}$ the matrix containing $0$ if the value is a NA and a 1
-% otherwise.
-
 \subsection{M step of the algorithm}
 \label{ssec:m-step-of-the-algorithm}
 At iteration $(t)$ the M-step maximizes the variational bound with respect to
@@ -353,40 +331,41 @@ BIC-like criterion in the following manner:

 We provide below the expression for the penalties for the 4 models that we
 propose.
-
-\paragraph*{\textit{iid-colBiSBM}}
-For the \textit{iid-colBiSBM} the penalties were modified in the following way:
-
-\begin{itemize}
-    \item For the $\pi$s and $\rho$s:
-          \[\text{pen}_{\pi}(Q_1) = (Q_1 - 1)\log(\sum_{m=1}^{M}n_{1}^{m})\]
-          \[\text{pen}_{\rho}(Q_2) = (Q_2 - 1)\log(\sum_{m=1}^{M}n_{2}^{m})\]
-    \item For the $\alpha$s :
+\begin{description}
+    \item[\textit{iid-colBiSBM}] For the $\bm\pi$ and $\bm\rho$:
+          \begin{align*}
+              \text{pen}_{\pi}(Q_1) = (Q_1 - 1)\log(\sum_{m=1}^{M}n_{1}^{m}) & , &
+              \text{pen}_{\rho}(Q_2) = (Q_2 - 1)\log(\sum_{m=1}^{M}n_{2}^{m})
+          \end{align*}
+          For the $\bm\alpha$:
          \[\text{pen}_{\alpha}(Q_1, Q_2) = Q_1 \times Q_2 \log(N_M)\]
          with
          \[ N_M = \sum_{m = 1}^{M} n_{1}^{m} \times n_{2}^{m} \]
-\end{itemize}
-And thus the $\text{BIC-L}$ formula is now:
-\[ \text{BIC-L}(\bm{X},Q_1, Q_2) = \max_{\theta} \mathcal{J} (\mathcal{\hat{R}}, \bm{\theta})
-    - \frac{1}{2} [\text{pen}_{\pi}(Q_1) + \text{pen}_{\rho}(Q_2) + \text{pen}_{\alpha}(Q_1, Q_2)]\]
-
-\paragraph*{\textit{$\rho\pi$-colBiSBM}}
-For the \textit{$\rho\pi$-colBiSBM} the penalties are the following:
-
-\begin{itemize}
-    \item The support penalties are:
-          \[ \text{pen}_{S_1}(Q_1) = -2 \log p_{Q_1} (S_1) \]
-          \[ \text{pen}_{S_2}(Q_2) = -2 \log p_{Q_2} (S_2) \]
-          with
-          \[ \log p_{Q_1}(S_1) = - M \log(Q_1) - \sum_{m=1}^{M} \log {Q_1 \choose Q_1^{(m)}} \]
-          \[ \log p_{Q_2}(S_2) = - M \log(Q_2) - \sum_{m=1}^{M} \log {Q_2 \choose Q_2^{(m)}} \]
-    \item Penalties for the $\rho$s and $\pi$s:
-          \[ \text{pen}_{\pi}(Q_1, S_1) = \sum_{m=1}^{M} (Q_{1}^{(m)} - 1) \log n_{1}^{m} \]
-          \[ \text{pen}_{\rho}(Q_2, S_2) = \sum_{m=1}^{M} (Q_{2}^{(m)} - 1) \log n_{2}^{m} \]
-    \item Penalties for the $\alpha$s:
-          \[ \text{pen}_{\alpha}(Q_1, Q_2, S_1, S_2) = (\sum_{q=1}^{Q_1} \sum_{r=1}^{Q_2} \mathbb{1}_{(S_1)'S_2 > 0}) \log (N_M) \]
-\end{itemize}
-And the corresponding BIC-L formula:
+          And thus the $\text{BIC-L}$ formula is the following:
+          \[ \text{BIC-L}(\bm{X},Q_1, Q_2) = \max_{\theta}
+              \mathcal{J} (\mathcal{\hat{R}}, \bm{\theta})
+              - \frac{1}{2} [\text{pen}_{\pi}(Q_1) + \text{pen}_{\rho}(Q_2) +
+                  \text{pen}_{\alpha}(Q_1, Q_2)]\]
+    \item[\textit{$\bm{\pi\rho}$-colBiSBM}] The support penalties are
+          \begin{align*}
+              \text{pen}_{S_1}(Q_1) = -2 \log p_{Q_1} (S_1) & , &
+              \text{pen}_{S_2}(Q_2) = -2 \log p_{Q_2} (S_2)
+          \end{align*}
+          with \begin{align*}
+              \log p_{Q_1}(S_1) = - M \log(Q_1) - \sum_{m=1}^{M} \log {Q_1
+              \choose Q_1^{(m)}}, &
+              \log p_{Q_2}(S_2) = - M \log(Q_2) - \sum_{m=1}^{M} \log {Q_2
+              \choose Q_2^{(m)}}.
+          \end{align*}
+          And penalties for the $\bm\rho$ and $\bm\pi$ are
+          \[ \text{pen}_{\pi}(Q_1, S_1) = \sum_{m=1}^{M} (Q_{1}^{(m)} - 1)
+              \log n_{1}^{m},
+              ~\text{pen}_{\rho}(Q_2, S_2) = \sum_{m=1}^{M} (Q_{2}^{(m)} - 1)
+              \log n_{2}^{m}. \]
+          Penalties for the $\bm\alpha$ are
+          \[ \text{pen}_{\alpha}(Q_1, Q_2, S_1, S_2) = (\sum_{q=1}^{Q_1}
+              \sum_{r=1}^{Q_2} \mathbb{1}_{((S_1)'S_2)_{qr} > 0}) \log (N_M). \]
+          The corresponding BIC-L formula is
          \[
              \begin{aligned}
                  \text{BIC-L}(\bm{X},Q_1, Q_2) =
@@ -397,6 +376,7 @@ And the corresponding BIC-L formula:
                                & + \text{pen}_{S_1}(Q_1) + \text{pen}_{S_2}(Q_2))]                                            \\
              \end{aligned}
          \]
+\end{description}

 \subsection{Initialization and pairing of the models}
 \label{ssec:initialization-and-pairing-of-the-models}
@@ -407,18 +387,18 @@ previously described VEM algorithm we obtain for each network its parameters

 We then compute the marginal laws for each dimension, for each network. Then we
 order the network blocks by the probabilities obtained in decreasing order.
-\begin{itemize}
-    \item For the memberships on the columns: $col~order_m = order\left(\pi_m \times
-              \alpha_m\right)$
-    \item For the memberships on the rows: $row~order_m = order\left(\rho_m \times
-              ~^{t}(\alpha_m)\right)$
-\end{itemize}
+
+For the memberships on the columns: $\text{col order}_m = \text{order}\left(\pi_m \times
+    \alpha_m\right)$.
+
+For the memberships on the rows: $\text{row order}_m = \text{order}\left(\rho_m \times
+    ~^{t}(\alpha_m)\right)$.

 Using this order we relabel the memberships for the $M$ fitted collections of a
 single network. Then we use the $M$ memberships to fit a collection containing
 the $M$ networks.
-\subsection{Greedy exploration to find an estimation of the mode}
-\label{ssec:greedy-exploration-to-find-an-estimation-of-the-mode}
+\subsection{Greedy exploration to find an estimation of the mode}\label{ssec:greedy-exploration-to-find-an-estimation-of-the-mode}
+
 Using the previously fitted models for $Q = (1,2)$ and $Q = (2,1)$ we choose to
 perform a greedy exploration to find a first mode.

@@ -428,7 +408,7 @@ memberships for the points $Q \in \{(Q_1 + 1, Q_2),(Q_1, Q_2 + 1),(Q_1 - 1,
 maximizes the BIC-L as the next point from which to repeat the procedure. We
 repeat the procedure until the BIC-L stops increasing $2$ times in a row.

-\begin{algorithm}[H]
+\begin{algorithm}[t]
    \caption{Greedy Exploration for Mode Estimation}
    \SetAlgoLined
    \SetKwInOut{Input}{Input}
@@ -486,7 +466,7 @@ consists of two alternating steps:
          model.
 \end{itemize}

-\begin{algorithm}[H]
+\begin{algorithm}[t]
    \caption{Moving Window Procedure}
    \SetAlgoLined
    \SetKwInOut{Input}{Input}
@@ -530,7 +510,7 @@ consists of two alternating steps:
    \textbf{Output:} Best model with maximum BIC-L in the window
 \end{algorithm}

-\begin{figure}[H]
+\begin{figure}[t]
    \definecolor{mypurple}{RGB}{128,0,128}
    \begin{subfigure}[b]{0.48\textwidth}
        \begin{tikzpicture}[scale=1.5]
@@ -698,7 +678,7 @@ And the dissimilarity between any pair of networks $(m,m')\in\mathcal{M}^2$ is t
    D_{\mathcal{M}}(m,m') = \sum_{q = 1}^{Q_1} \sum_{r = 1}^{Q_2} \max(\widetilde{\pi}_{q}^{m}, \widetilde{\pi}_{q}^{m'}) \left( \widetilde{\alpha}_{qr}^{m} - \widetilde{\alpha}_{qr}^{m'}\right)^{2} \max(\widetilde{\rho}_{r}^{m}, \widetilde{\rho}_{r}^{m'})
 \]

-\begin{figure}[H]
+\begin{figure}[t]
    \centering
    \begin{tikzpicture}
        \tikzstyle{instruct}=[font=\small, text justified, rectangle,draw,fill=yellow!50]
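
Note on the \textit{iid-colBiSBM} penalties rewritten in this file: a small worked example may help check the formulas. The sketch below uses a hypothetical collection, not one from the report: $M = 2$ networks of sizes $(n_1^1, n_2^1) = (10, 5)$ and $(n_1^2, n_2^2) = (20, 8)$, with $Q_1 = 2$ row blocks and $Q_2 = 3$ column blocks. Logarithms are natural and the decimal values are rounded.

```latex
% Hypothetical sizes: M = 2, (n_1^1, n_2^1) = (10, 5), (n_1^2, n_2^2) = (20, 8),
% Q_1 = 2, Q_2 = 3. These numbers are illustrative only.
\begin{align*}
    \text{pen}_{\pi}(2)       & = (2 - 1)\log(10 + 20) = \log 30 \approx 3.40,     \\
    \text{pen}_{\rho}(3)      & = (3 - 1)\log(5 + 8) = 2\log 13 \approx 5.13,      \\
    N_M                       & = 10 \times 5 + 20 \times 8 = 210,                 \\
    \text{pen}_{\alpha}(2, 3) & = 2 \times 3 \times \log 210 \approx 32.08.
\end{align*}
```

With these values the BIC-L subtracts half of the sum, about $20.31$, from the maximized variational bound. The $\bm\alpha$ term dominates because it scales with the number of block pairs and with the total number of dyads $N_M$.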
--- a/rapport/page-garde.tex
+++ b/rapport/page-garde.tex
@@ -4,7 +4,15 @@

    \newgeometry{left=7.5cm,bottom=2cm, top=1cm, right=1cm}

-    % \tikz[remember picture,overlay] \node[opacity=1,inner sep=0pt] at (-28mm,-135mm){\includegraphics{Bandeau_UPaS.pdf}};
+    \begin{tikzpicture}[remember picture,overlay]
+        \fill [pruneps] (-4,-28.3) rectangle (-8.15, 1.4);
+
+        \foreach \x  in {-8.1, -7.9, -7.6, -7.2}
+        \draw[white, line width=0.5mm] (\x, -28.3) -- (\x, 1.4);
+
+        \node[inner sep=0pt, rotate=90, font=\fontfamily{fvs}\fontseries{b}\fontsize{26}{26}\selectfont, text=white] (rapport) at (-6.3, -22.4) {Rapport de stage};
+        \node[inner sep=0pt, opacity=1] (logo-UPS) at (-0.85,0) {\includegraphics{logo/Logotype_UPSaclay_CMJN.eps}};
+    \end{tikzpicture}

    % fonte sans empattement pour la page de titre
    \fontfamily{fvs}\fontseries{m}\selectfont
@@ -16,7 +24,7 @@
    %**  CHANGER L'IMAGE PAR DÉFAUT **
    %*****************************************************
    \vspace{-10mm} % à ajuster en fonction de la hauteur du logo
-    \flushright\includesvg[scale=0.3]{logo/APT_Logo_RVB_Positif.svg}
+    \flushright\includegraphics[scale=0.3]{logo/APT_Logo_RVB_Positif}
    \flushright\includegraphics[scale=0.3]{logo/X-IPparis-RVB.eps}


--- a/rapport/rapport.pdf
+++ b/rapport/rapport.pdf
--- a/rapport/rapport.tex
+++ b/rapport/rapport.tex
@@ -39,38 +39,24 @@
 \usepackage{fancyhdr}
 \pagestyle{fancy}
 \fancyhf{}
+\renewcommand{\chaptermark}[1]{\markboth{#1}{#1}}
 \fancyhead[lo]{\slshape\nouppercase{\rightmark}}
 \fancyhead[re]{\slshape\nouppercase{\leftmark}}
 \fancyhead[ro,le]{\thepage}
-% \pagestyle{fancy}

-% % Clear all headers and footers
-% \fancyhf{}
+\fancypagestyle{intro}{%
+    \fancyhf{}
+    \fancyfoot[C]{\thepage}
+    \renewcommand{\headrulewidth}{0pt}
+    \renewcommand{\footrulewidth}{0pt}
+}

-% % Header for even pages (left side)
-% \fancyhead[LE]{\thechapter\quad\leftmark}
-
-% % Header for odd pages (right side)
-% \fancyhead[RO]{\rightmark\quad\thesection}
-
-% % Ensure that chapter and section marks are used correctly
-% \renewcommand{\chaptermark}[1]{\markboth{#1}{}}
-% \renewcommand{\sectionmark}[1]{\markright{#1}}
-
-% % Optional: define the appearance of chapter and section titles in the header
-% \usepackage{titlesec}
-% \titleformat{\chapter}[display]
-%   {\normalfont\Large\bfseries\color{pruneps}}
-%   {\chaptertitlename\ \thechapter}{20pt}{\LARGE}
-% \titleformat{\section}
-%   {\normalfont\Large\bfseries\color{vertps}}
-%   {\thesection}{1em}{}

 % Images
 \graphicspath{{../img/}{../figure/}}

 % Figure placement
-\floatplacement{figure}{H}
+\floatplacement{figure}{t}

 %% Tikz Related
 \usetikzlibrary{calc,shapes,backgrounds,arrows,automata,shadows,positioning,
@@ -95,6 +81,19 @@ automata,positioning}
 % Bibliographie
 \input{../shared/biblio}

+% Modification titre
+\usepackage{titlesec}
+\titlespacing*% the star= don't indent first paragraph after
+    {\subsection}% which command you want to set the spacing for
+    {0pt}% spacing to the left of heading
+    {1ex}% spacing before the heading
+    {1ex}% spacing after the heading
+\titlespacing*%
+    {\section}% 
+    {0pt}% 
+    {1ex}% 
+    {1ex}%
+
 \newcounter{customchapter}
 \newcounter{maincontentend}
 % Important : modifie ici le nombre de chapitres que tu as.
@@ -115,6 +114,7 @@ automata,positioning}
 		opacity=0.5,
 		contents={
            \ifnum\value{maincontentend}=0
+			\ifnum\value{customchapter}>0
 			\checkoddpage
 			\ifoddpage
 			\begin{tikzpicture}[remember picture,overlay]
@@ -128,6 +128,7 @@ automata,positioning}
 			\end{tikzpicture}
 			\fi
            \fi
+			\fi
 }
 }
 }
@@ -199,13 +200,10 @@ automata,positioning}
 \ActivateBG
 \begin{selectlanguage}{french}
 	% \maketitle
-
-    \tableofcontents
 	\pagenumbering{roman}
+	\tableofcontents
 	\include{remerciements}
-
 	\include{chapter1-presentation_UMR}
-
 \end{selectlanguage}

 \begin{selectlanguage}{english}
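
Note on the `intro` page style added in this file: since `\chapter*` does not call `\chaptermark`, the running headers would otherwise show stale or empty marks on an unnumbered chapter, which is presumably why `chapter1-presentation_UMR.tex` now switches to a header-free style. The sketch below is a standalone file assumed for illustration (the `report` class and the dummy text are not taken from the repository); it only reuses the `fancyhdr` commands that appear in the diff.

```latex
% Minimal standalone sketch; the document class and body text are assumptions.
\documentclass{report}
\usepackage{fancyhdr}

% Default style: chapter title in the header, page number on the outer side
\pagestyle{fancy}
\fancyhf{}
\renewcommand{\chaptermark}[1]{\markboth{#1}{#1}}
\fancyhead[lo]{\slshape\nouppercase{\rightmark}}
\fancyhead[re]{\slshape\nouppercase{\leftmark}}
\fancyhead[ro,le]{\thepage}

% Header-free style for unnumbered front matter: centred page number only
\fancypagestyle{intro}{%
    \fancyhf{}
    \fancyfoot[C]{\thepage}
    \renewcommand{\headrulewidth}{0pt}
    \renewcommand{\footrulewidth}{0pt}
}

\begin{document}
\chapter*{Unnumbered chapter}
% The opening page of a chapter uses the plain style; the pages after it
% use the intro style selected here.
\pagestyle{intro}
Some text.
\end{document}
```

How the report switches back to the `fancy` style for the numbered chapters is not visible in this diff, so the sketch deliberately stops before that point.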
--- a/rapport/remerciements.tex
+++ b/rapport/remerciements.tex
@@ -13,7 +13,7 @@ Merci à Farida, Christelle et Sébastien pour avoir expliqué et mené les
 démarches administratives.

 Un merci tout particulier à tous les doctorants : Mary,
-Marina, Emré, Tam, Caroline, Jérémy, Florian, Annaïg, Jules, Tanguy, Barbara,
+Marina, Emré, Tam, Caroline, Jérémy, Florian, Annaïg, Jules, Hayato, Tanguy, Barbara,
 Bastien et Armand. Merci à tous les autres stagiaires, particulièrement:
 Alizée, Taliesin, Antoine, Alexandre, Francois, Pierre, Camille et Maxime.