Add week 28

2025-07-07 13:30:34 +02:00 · 2025-07-07 13:30:34 +02:00 · aa9c40c310
commit aa9c40c310
parent 2f5525a507
2 changed files with 190 additions and 0 deletions
--- a/suivi/2025-28/2025-28.qmd
+++ b/suivi/2025-28/2025-28.qmd
@@ -0,0 +1,97 @@
+---
+title: "Week 28 2025 summary: July 07 - July 11"
+categories: [colBiSBM, inférence, GNN]
+date: 2025-07-07
+date-modified: last-modified
+bibliography: references.bib
+---
+
+## TODO List
+
+- For clustering collections on real data:  
+    &rarr; Pierre's intuition seems to be confirmed: the dissimilarities appear to stop varying noticeably for large values of $(Q_1,Q_2)$.
+    - ❓ I can no longer reproduce the inference bug...
+    - Make sure it works and rerun
+
+- Dig deeper and explore with easy16s!
+
+- ⌛ **F1 score computation**: double-check that I am training the VGAE correctly, because the generalization results on the other Doré networks are too good, which is surprising.
+
+- Look at the course lists of MathSV and Université Paris-Saclay.
+
+- Debug the simulations:
+
+    - ⌛ Inference: rerun the inference simulations with n = 240 to see whether the quality improves (for reassurance). In fact we are already at 240, so I relaunched with M = 4 instead of M = 2.
+    Waiting for MIGALE results -> BUG, I need to dig into it, but these are only technical problems -> Apparently there are other problems besides the parallelization plan.
+        - ⌛ Well, the bug no longer reproduces... the jobs are just too long (> 120 h); I relaunched them and only 182/972 conditions remain.
+
+- ⌛ **Rather, look into introducing a $\delta$-colBiSBM model**. K-means on the densities of the sub-Doré networks to pre-partition and *cluster* them,
+because the densities are unbalanced.
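This is a minimal sketch of how the F1 score from the list above could be computed for edge predictions. It assumes, hypothetically, that the VGAE returns a probability for each held-out (row, column) pair and that a 0.5 threshold is used. `edge_f1` is an illustrative helper, not the code actually used. When generalization results look too good, one of the first things worth verifying is that only pairs unseen during training are scored.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[int, int]  # hypothetical (row node, column node) of a bipartite network


def edge_f1(probs: Dict[Pair, float], true_edges: Set[Pair],
            threshold: float = 0.5) -> float:
    """F1 score of thresholded edge predictions, on the scored pairs only.

    probs: predicted edge probability for each held-out candidate pair.
    true_edges: pairs that are real edges; those absent from probs are ignored.
    """
    predicted = {pair for pair, p in probs.items() if p >= threshold}
    positives = true_edges & set(probs)  # real edges among the scored pairs
    tp = len(predicted & positives)
    fp = len(predicted - positives)
    fn = len(positives - predicted)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```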
+
+:::{#ref-kmeans-vae}
+
+- Run the GNN-VAE on Doré and sub-Doré, with k-means and clustering in the latent space.
+I have started looking into it a bit.
+
+:::
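Two items above reduce to k-means on a handful of vectors: the density-based pre-partitioning of the sub-Doré networks and the clustering in the VGAE latent space. The points are either one density per network or one latent vector per node or per network. Below is a minimal sketch with plain Lloyd iterations, assuming each network is given as a 0/1 incidence matrix (a list of rows). `network_density` and `kmeans` are hypothetical helpers. In practice, a library k-means with several restarts would be the safer choice.

```python
from typing import List, Sequence


def network_density(incidence: List[List[int]]) -> float:
    """Fraction of realised interactions in a bipartite 0/1 incidence matrix."""
    n_rows, n_cols = len(incidence), len(incidence[0])
    return sum(map(sum, incidence)) / (n_rows * n_cols)


def squared_distance(p: Sequence[float], q: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(points: List[Sequence[float]], k: int, n_iter: int = 100) -> List[int]:
    """Lloyd's k-means; returns one cluster label per point.

    Deterministic start: k points spread evenly along the first coordinate
    (a simplification; random restarts or k-means++ are more robust).
    """
    order = sorted(range(len(points)), key=lambda i: points[i][0])
    step = (len(points) - 1) / max(k - 1, 1)
    centres = [list(points[order[round(j * step)]]) for j in range(k)]
    labels: List[int] = []
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centre.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centres[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centre
                centres[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels
```

For instance, `kmeans([[d] for d in densities], k=2)` separates sparse from dense networks. For the latent space, each point would instead be an embedding vector, pooled per network if networks rather than nodes are clustered.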
+
+### Inference and microbes
+
+- Run *colBiSBM* on $OTU\times Sample$ &rarr; look into the problem of loading the data into memory
+- Run *colSBM* on $OTU\times OTU$
+- TabNet: practice the [exercises](https://github.com/cregouby/Tutoriel_torch)
+- Look at **SPARTA** Rennes
+- Read the compositional data papers (Aitchison et al. intro)
+- Read the Saint-Clair multi-level paper
+- Write down and study the models for different taxonomic levels.
+\begin{align*}
+i \rightarrow &~N^1_i \subseteq N^2_i \subseteq N^3_i & \text{Taxonomy}\\
+Z^0_i \overset{?}{=} & Z^1_i \overset{?}{=} Z^2_i \overset{?}{=} Z^3_i & \text{Functional groups}
+\end{align*}
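Here is one possible way to make the question $Z^0_i \overset{?}{=} Z^1_i \overset{?}{=} \dots$ operational. First, check that the taxonomic levels are indeed nested: every level-1 group should lie inside a single level-2 group, as $N^1_i \subseteq N^2_i$ states. Then compare the inferred functional groups across levels with a partition-agreement score. Using the adjusted Rand index (ARI) for this is an assumption of this sketch, not something the models prescribe. Labels are given per taxon $i$ as plain lists, and `is_nested` and `adjusted_rand_index` are hypothetical helpers.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def is_nested(fine: Sequence[Hashable], coarse: Sequence[Hashable]) -> bool:
    """True if every group of the finer level lies inside one coarser group."""
    parent = {}
    for f, c in zip(fine, coarse):
        if parent.setdefault(f, c) != c:  # same fine group, two coarse groups
            return False
    return True


def adjusted_rand_index(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Adjusted Rand index between two partitions of the same n >= 2 taxa."""
    n = len(a)
    if n < 2:
        raise ValueError("need at least two taxa")
    pairs = sum(comb(m, 2) for m in Counter(zip(a, b)).values())
    sum_a = sum(comb(m, 2) for m in Counter(a).values())
    sum_b = sum(comb(m, 2) for m in Counter(b).values())
    expected = sum_a * sum_b / comb(n, 2)  # agreement expected by chance
    maximum = (sum_a + sum_b) / 2
    if maximum == expected:  # degenerate case, e.g. both partitions trivial
        return 1.0
    return (pairs - expected) / (maximum - expected)
```

An ARI of 1 means the two levels induce the same functional groups up to relabelling. Values near 0 mean agreement no better than chance.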
+
+#### Causality
+
+More of a long-term item, to look into
+
+- Causality working group (GT)
+- Daria Bystrova: read the presentation @bystrovaCausalDiscovery (Meek rules, V-structures)
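This note relates to the V-structures in the presentation. In constraint-based causal discovery (PC-style algorithms), consider an unshielded triple $X - Z - Y$, where $X$ and $Y$ are not adjacent. It is oriented as $X \rightarrow Z \leftarrow Y$ when $Z$ is not in the set that separated $X$ and $Y$. Below is a minimal sketch, assuming the skeleton and the separating sets are already available. The independence tests, the Meek rules and conflict handling are left out, and `orient_v_structures` is a hypothetical helper.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Hashable, Set, Tuple

Node = Hashable


def orient_v_structures(
    adjacency: Dict[Node, Set[Node]],
    sepsets: Dict[FrozenSet[Node], Set[Node]],
) -> Set[Tuple[Node, Node]]:
    """Directed edges (x, z) implied by v-structures x -> z <- y.

    adjacency: undirected skeleton as neighbour sets.
    sepsets[frozenset({x, y})]: conditioning set that separated the
    non-adjacent pair x, y (must exist for every such pair).
    """
    arrows = set()
    for z, neighbours in adjacency.items():
        for x, y in combinations(neighbours, 2):
            if y in adjacency[x]:
                continue  # shielded triple: x and y are adjacent
            if z not in sepsets[frozenset({x, y})]:
                arrows.add((x, z))
                arrows.add((y, z))
    return arrows
```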
+
+## Current reading 📚
+
+### OT
+- ⌛ @mazeletUnsupervisedLearningOptimal Interesting for optimal transport between graphs of different sizes.
+- ⌛ @nennaLecture2Entropic To understand the entropy-regularized OT problem.
+- ⌛ @nennaLecture1Monge
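To fix ideas on the entropy-regularized problem, take a cost $C$, marginals $a, b$ and a regularization $\varepsilon > 0$. The optimal plan then has the form $P = \operatorname{diag}(u)\,K\,\operatorname{diag}(v)$ with $K_{ij} = e^{-C_{ij}/\varepsilon}$. Sinkhorn's algorithm finds $u$ and $v$ by alternately matching the row and column marginals. Below is a minimal pure-Python sketch, not taken from the lecture notes. It does no log-domain stabilization, so a very small $\varepsilon$ will underflow.

```python
import math
from typing import List

Matrix = List[List[float]]


def sinkhorn(a: List[float], b: List[float], cost: Matrix, eps: float,
             n_iter: int = 1000, tol: float = 1e-9) -> Matrix:
    """Entropic OT plan between histograms a and b via Sinkhorn iterations."""
    n, m = len(a), len(b)
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        # Rescale rows so that the row marginals match a.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        # Rescale columns so that the column marginals match b.
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
        # Columns are now exact; stop once the row marginals are close enough.
        row_err = max(abs(u[i] * sum(kernel[i][j] * v[j] for j in range(m)) - a[i])
                      for i in range(n))
        if row_err < tol:
            break
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]
```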
+
+### Graph inference
+
+- ⌛ @aitchisonStatisticalAnalysisCompositional1982a, in progress
+
+- ❗📖 @payneFiniteMixturesMultivariate2023 on MixMPLN
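Compositional data are analysed through log-ratios rather than raw proportions. Below is a minimal sketch of two classical transforms: the additive log-ratio (alr, relative to the last part) and the centred log-ratio (clr, relative to the geometric mean). OTU counts contain zeros, so a pseudo-count is added first. The value 0.5 is an arbitrary assumption of this sketch, not a recommendation from the reading. `closure`, `alr` and `clr` are illustrative helper names.

```python
import math
from typing import List


def closure(counts: List[float], pseudo_count: float = 0.5) -> List[float]:
    """Turn counts into proportions, adding a pseudo-count to avoid log(0)."""
    shifted = [c + pseudo_count for c in counts]
    total = sum(shifted)
    return [s / total for s in shifted]


def alr(x: List[float]) -> List[float]:
    """Additive log-ratio: log(x_j / x_D) for the first D - 1 parts."""
    return [math.log(xj / x[-1]) for xj in x[:-1]]


def clr(x: List[float]) -> List[float]:
    """Centred log-ratio: log(x_j / g(x)), with g the geometric mean."""
    log_g = sum(math.log(xj) for xj in x) / len(x)
    return [math.log(xj) - log_g for xj in x]
```

Log-ratios are scale invariant, so `clr` gives the same result on positive raw counts and on their proportions. `alr`, in contrast, depends on which part is chosen as reference.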
+
+### Causality
+
+- ❗📖 @bystrovaCausalDiscovery
+
+## To discuss
+
+### Holidays (P&S)
+
+### Thesis
+
+- Prepare the CSI presentation
+- Write the CSI report
+
+- 👨‍🏫 **Ask Pierre**: how can teaching be validated as Adum training courses?
+
+### Ecological interpretation of the Baldock results
+
+- ⌛ Meeting with Elisa: **yes, we rerun**
+
+### Inference
+
+- Problems: variance, many zeros, covariates, offset and taxonomy (networks and edges at different levels: genus, OTU, ...)
+
+> Combine networks at different taxonomic levels
+
+- Inference + GREMLINS
--- a/suivi/2025-28/references.bib
+++ b/suivi/2025-28/references.bib
@@ -0,0 +1,93 @@
+@article{mazeletUnsupervisedLearningOptimal,
+  title      = {Unsupervised {{Learning}} for {{Optimal Transport}} Plan Prediction between Unbalanced Graphs},
+  author     = {Mazelet, Sonia and Flamary, Rémi and Thirion, Bertrand},
+  abstract   = {Optimal transport between graphs, based on Gromov-Wasserstein and other extensions, is a powerful tool for comparing and aligning graph structures. However, solving the associated non-convex optimization problems is computationally expensive, which limits the scalability of these methods to large graphs. In this work, we present Unbalanced Learning of Optimal Transport (ULOT), a deep learning method that predicts optimal transport plans between two graphs. Our method is trained by minimizing the fused unbalanced Gromov-Wasserstein (FUGW) loss. We propose a novel neural architecture with cross-attention that is conditioned on the FUGW tradeoff hyperparameters. We evaluate ULOT on synthetic stochastic block model (SBM) graphs and on real cortical surface data obtained from fMRI. ULOT predicts transport plans with competitive loss up to two orders of magnitude faster than classical solvers. Furthermore, the predicted plan can be used as a warm start for classical solvers to accelerate their convergence. Finally, the predicted transport plan is fully differentiable with respect to the graph inputs and FUGW hyperparameters, enabling the optimization of functionals of the ULOT plan.},
+  langid     = {english},
+  keywords   = {/unread},
+  annotation = {Read\_Status: New\\
+                Read\_Status\_Date: 2025-06-11T09:08:09.864Z},
+  file       = {/home/louis/snap/zotero-snap/common/Zotero/storage/HPZEYMM9/Mazelet et al. - Unsupervised Learning for Optimal Transport plan prediction between unbalanced graphs.pdf}
+}
+
+@article{nennaLecture2Entropic,
+  title      = {Lecture 2: {{Entropic Optimal Transport}}},
+  author     = {Nenna, Luca},
+  langid     = {english},
+  keywords   = {/unread},
+  annotation = {Read\_Status: New\\
+                Read\_Status\_Date: 2025-06-11T16:06:28.547Z},
+  file       = {/home/louis/snap/zotero-snap/common/Zotero/storage/WGFIISDB/Nenna - Lecture 2 Entropic Optimal Transport.pdf}
+}
+
+@article{nennaLecture1Monge,
+  title      = {Lecture 1 {{Monge}} and {{Kantorovich}} Problems: From Primal to Dual},
+  author     = {Nenna, Luca},
+  langid     = {english},
+  keywords   = {/unread},
+  annotation = {Read\_Status: New\\
+                Read\_Status\_Date: 2025-06-13T09:24:13.832Z},
+  file       = {/home/louis/snap/zotero-snap/common/Zotero/storage/7LVQPD6D/Nenna - Lecture 1 Monge and Kantorovich problems from primal to dual.pdf}
+}
+
+@article{Morton2021.11.09.467939,
+  title        = {Scalable Estimation of Microbial Co-Occurrence Networks with {{Variational Autoencoders}}},
+  author       = {Morton, James T. and Silverman, Justin and Tikhonov, Gleb and Lähdesmäki, Harri and Bonneau, Rich},
+  date         = {2021},
+  journaltitle = {bioRxiv : the preprint server for biology},
+  shortjournal = {bioRxiv},
+  eprint       = {https://www.biorxiv.org/content/early/2021/11/11/2021.11.09.467939.full.pdf},
+  publisher    = {Cold Spring Harbor Laboratory},
+  doi          = {10.1101/2021.11.09.467939},
+  url          = {https://www.biorxiv.org/content/early/2021/11/11/2021.11.09.467939},
+  abstract     = {Estimating microbe-microbe interactions is critical for understanding the ecological laws governing microbial communities. Rapidly decreasing sequencing costs have promised new opportunities to estimate microbe-microbe interactions across thousands of uncultured, unknown microbes. However, typical microbiome datasets are very high dimensional and accurate estimation of microbial correlations requires tens of thousands of samples, exceeding the computational capabilities of existing methodologies. Furthermore, the vast majority of microbiome studies collect compositional metagenomics data which enforces a negative bias when computing microbe-microbe correlations. The Multinomial Logistic Normal (MLN) distribution has been shown to be effective at inferring microbe-microbe correlations, however scalable Bayesian inference of these distributions has remained elusive. Here, we show that carefully constructed Variational Autoencoders (VAEs) augmented with the Isometric Log-ratio (ILR) transform can estimate low-rank MLN distributions thousands of times faster than existing methods. These VAEs can be trained on tens of thousands of samples, enabling co-occurrence inference across tens of thousands of microbes without regularization. The latent embedding distances computed from these VAEs are competitive with existing beta-diversity methods across a variety of mouse and human microbiome classification and regression tasks, with notable improvements on longitudinal studies.},
+  elocation-id = {2021.11.09.467939},
+  keywords     = {/unread},
+  annotation   = {Read\_Status: New\\
+                  Read\_Status\_Date: 2025-06-30T14:17:29.518Z}
+}
+@article{aitchisonStatisticalAnalysisCompositional1982a,
+  title        = {The {{Statistical Analysis}} of {{Compositional Data}}},
+  author       = {Aitchison, J.},
+  date         = {1982},
+  journaltitle = {Journal of the Royal Statistical Society. Series B (Methodological)},
+  volume       = {44},
+  number       = {2},
+  eprint       = {2345821},
+  eprinttype   = {jstor},
+  pages        = {139--177},
+  publisher    = {[Royal Statistical Society, Oxford University Press]},
+  issn         = {0035-9246},
+  url          = {https://www.jstor.org/stable/2345821},
+  urldate      = {2025-05-07},
+  abstract     = {The simplex plays an important role as sample space in many practical situations where compositional data, in the form of proportions of some whole, require interpretation. It is argued that the statistical analysis of such data has proved difficult because of a lack both of concepts of independence and of rich enough parametric classes of distributions in the simplex. A variety of independence hypotheses are introduced and interrelated, and new classes of transformed-normal distributions in the simplex are provided as models within which the independence hypotheses can be tested through standard theory of parametric hypothesis testing. The new concepts and statistical methodology are illustrated by a number of applications.},
+  keywords     = {/unread},
+  annotation   = {Read\_Status: New\\
+                  Read\_Status\_Date: 2025-05-07T07:43:38.485Z},
+  file         = {/home/louis/snap/zotero-snap/common/Zotero/storage/S97URH4Y/Aitchison - 1982 - The Statistical Analysis of Compositional Data.pdf}
+}
+@online{payneFiniteMixturesMultivariate2023,
+  title       = {Finite {{Mixtures}} of {{Multivariate Poisson-Log Normal Factor Analyzers}} for {{Clustering Count Data}}},
+  author      = {Payne, Andrea and Silva, Anjali and Rothstein, Steven J. and McNicholas, Paul D. and Subedi, Sanjeena},
+  date        = {2023-11-13},
+  eprint      = {2311.07762},
+  eprinttype  = {arXiv},
+  eprintclass = {stat},
+  doi         = {10.48550/arXiv.2311.07762},
+  url         = {http://arxiv.org/abs/2311.07762},
+  urldate     = {2025-07-02},
+  abstract    = {A mixture of multivariate Poisson-log normal factor analyzers is introduced by imposing constraints on the covariance matrix, which resulted in flexible models for clustering purposes. In particular, a class of eight parsimonious mixture models based on the mixtures of factor analyzers model are introduced. Variational Gaussian approximation is used for parameter estimation, and information criteria are used for model selection. The proposed models are explored in the context of clustering discrete data arising from RNA sequencing studies. Using real and simulated data, the models are shown to give favourable clustering performance. The GitHub R package for this work is available at https://github.com/anjalisilva/mixMPLNFA and is released under the open-source MIT license.},
+  pubstate    = {prepublished},
+  keywords    = {/unread,Statistics - Computation,Statistics - Machine Learning,Statistics - Methodology},
+  annotation  = {Read\_Status: New\\
+                 Read\_Status\_Date: 2025-07-02T09:31:47.579Z},
+  file        = {/home/louis/snap/zotero-snap/common/Zotero/storage/BXVPEIDD/Payne et al. - 2023 - Finite Mixtures of Multivariate Poisson-Log Normal Factor Analyzers for Clustering Count Data.pdf;/home/louis/snap/zotero-snap/common/Zotero/storage/L5DAS5C2/2311.html}
+}
+@unpublished{bystrovaCausalDiscovery,
+  title      = {Causal Discovery},
+  author     = {Bystrova, Daria},
+  langid     = {english},
+  keywords   = {/unread},
+  annotation = {Read\_Status: New\\
+                Read\_Status\_Date: 2025-07-02T09:34:39.476Z},
+  file       = {/home/louis/snap/zotero-snap/common/Zotero/storage/NQE5DY92/Bystrova - Causal discovery.pdf}
+}