## Reviewer 1
This manuscript introduces colBiSBM, a family of probabilistic models designed to identify shared mesoscale structure across collections of bipartite networks. The approach generalises the Latent Block Model (LBM) by assuming that networks are independent realisations of a common bipartite stochastic block model, with varying degrees of flexibility in block proportions across networks. The authors derive likelihood expressions, identifiability results, a variational EM estimation procedure, and a BIC-like model selection criterion. They also propose a strategy to partition a large collection of networks into subgroups sharing similar structures. The theoretical developments are complemented by a substantial simulation study and an application to five ecological plant–pollinator networks (four British cities and a Kenyan savanna). The real data application illustrates the method’s usefulness in uncovering shared ecological roles and detecting sub-collections of networks with distinct connectivity patterns.

The manuscript is technically strong and addresses a timely methodological gap. It is also clearly written and well motivated. I think this is indeed a significant addition to the literature on network models. The proposed methodology is also well supported by technical justifications and synthetic experiments. I think the style of the writing is good and the key idea of the proposed block modelling has been well described. However, I have a few concerns/queries that I would like to point out below.
1. The paper presents a meaningful extension to existing SBM/LBM models by explicitly handling multiple bipartite networks with independent node sets. This is a genuine gap: many multilayer or multigraph SBMs assume shared nodes or only handle unipartite graphs. However, the assumption that networks share an identical connectivity matrix (with optional empty blocks) is quite strong. While the authors acknowledge this limitation, a deeper discussion of its implications and of possible alternatives (e.g., smoothly varying connectivity, or two different connectivity parameters indicating different connectivity patterns) may strengthen the paper.
2. Although the sufficient conditions used in the identifiability proofs need not be realistic, the authors could clarify whether these conditions hold (or approximately hold) in typical applications, especially ecological ones where interaction intensities may exhibit similarity across blocks.
3. The derivation of the BIC-L criterion is original and well justified. One strength is the explicit accounting for support matrices in πρ-colBiSBM, which many related works ignore. That said, the computational cost is high: evaluating many combinations of $(Q_1, Q_2)$ and support matrices, along with multiple initialisations, can rapidly become prohibitive for large $M$ or large graphs. Some empirical results on the computational complexity in the synthetic data examples would be useful.
4. The authors mention that the original real dataset focused on the daily temporal structure. Is concatenation valid here, or are you ignoring any trend structure in the original data? Have you already detrended it?
5. I can see the proposed method performing better than its competitors on the real-world dataset, but I was not sure about the “take-home message” from an ecological perspective: what is it?
6. Apart from VGAE, state-of-the-art heterogeneous Graph Neural Networks (GNNs) or bipartite graph neural architectures might be worth selecting as possible competitors. Some discussion of representational differences with the competitors, rather than focusing solely on AUC, would be welcome.
7. (Minor comment) Improve figure readability (e.g. axis labels in Fig. 4).

The problem is indeed an important aspect of bipartite network models, and the proposed methodology tries to resolve it by proposing a modelling approach that is both computationally and theoretically tractable. The limitations identified above mainly concern additional discussion, clarification, and minor refinements, rather than fundamental flaws. Addressing these points will further enhance the clarity and applicability of the paper. Hence, I would recommend minor revision.
## Reviewer 2
Comments to the Author

Common Structure Discovery in Collections of Bipartite Networks: Application to Pollination Systems

This paper introduces colBiSBM, a stochastic block model for bipartite networks. The authors present:

- a variational EM algorithm for parameter estimation, coupled with
- an adaptation of the Integrated Classification Likelihood (ICL) criterion for model selection.

They conduct simulation studies to:

- recover common structures,
- improve clustering performance, and
- enhance link prediction by borrowing strength across networks.

The empirical application concerns plant–pollinator networks; the authors claim it highlights that the proposed model uncovers shared ecological roles and partitions the networks into sub-collections with similar connectivity patterns.

They further set out these aims:

- the nodes representing the same individual/species across multiple networks may be allocated to different blocks;
- allow for fluctuations in block proportions and the possibility of a block being “unpopulated”;
- it is unclear how both of these are evidenced by the experiments (see below).

### Comments
#### Introduction

- The first paragraph should be stronger; for instance, “references therein” is not really appropriate for a first page.
- The literature motivating ecological studies should appear earlier in the introduction, e.g. “Ecological studies..”. It is not clear from your writing whether SBMs are a commonly used or accepted tool for ecological studies: explain this, why or why not, and why your model is used.
- Rewrite to clearly state what is done in Section 3, as you did for Section 4.
- Needs to be rewritten for grammar, as some sentences are hard to follow, for instance:
    - “Bipartite graphs are widely applied in biology, not only for ecological networks but also in fields such as medicine, where they are used to model biomedical, biomolecular, and epidemiological networks (Pavlopoulos et al. (2018)).”
- The authors state that the model enables the identification of a shared connectivity structure that accounts for the observed collection of networks.
    - Claim: it facilitates the identification of nodes that exhibit similar ecological roles across different networks.
    - This appears to be contradicted by your experiments (see below).
    - The degree-corrected SBM would make a good null model to compare against for a single network, to justify whether there is a real improvement.
- The model can be used to partition the collection of networks, thus leading to sub-collections of networks that are similar in terms of their connectivity structures.
    - This is somewhat evidenced.
- Its ability to transfer information between networks that share a common structure.
    - It is not immediately clear that your experiments/simulation studies show this particular objective was achieved.
#### Data motivation and existing models

- The presentation of the British plant–pollinator networks should be moved to the experiments section; this section should only refer to mathematical models. There are mistakes in the definition of the SBM. This section needs to be rewritten.
#### 3. Joint modelisation of multiple bipartite networks

The presentation of Section 3.2 is a bit clunky at times and needs rewriting:

- Does “The possible nullity of some block proportion” mean “null block proportions”?

I do not follow the definition or role of the support matrix.

The purpose of Table 2 is unclear, and it does not objectively describe the difference between models; a stronger motivation is needed in Section 3.
#### 4. Statistical inference

4.3. Variational estimation of the parameters

- This approach is logical: in colBiSBM models, the VE-step is computed independently for each network, and during the M-step the parameters are updated with formulas that link the networks together.
- It is not clear whether this is VEM with mini-batches or Stochastic Variational Inference (SVI).
- What happens when a block is absent from network $m$?

Section 4.6 is not grammatically correct and contains half-finished sentences.

The region or city in Kenya that the network came from should be named, not just “Kenya”, since the British networks list individual cities.

Wildcard “??” on page 15.

Is the penalty function correct? What is the potential for model selection to get stuck in a local optimum if it starts too far away from the initial structure? How are you addressing this?
#### 5. Analyzing real-world ecological networks

Please be direct about your contribution: do not write “In a preliminary study (results not shown here), we..”! If this is required, add it to the supplementary material.

You mention that the preferred configuration is to split the networks into two partitions (Kenya vs. UK) because it provides the best BIC-L of -9466.866, but the authors claim that $\pi\rho$-colBiSBM leads to keeping the five networks together. This is a contradiction.

A ground-truth model needs to be fitted to ensure that the specified biological traits are not simply a result of overfitting.

It does not seem consistent to use a Variational Graph AutoEncoder (VGAE) with integer node features for this type of data; another feature should be considered.
### General comments

Please consider and compare to this approach: https://proceedings.neurips.cc/paper_files/paper/2018/file/ab7314887865c4265e896c6e209d1cd6-Paper.pdf

This is a potentially relevant and interesting contribution.

Thank you for your submission. I hope these comments enable you to revise the work successfully.
## TODO
- [x] Write the simulation with $n=300, M=3$ and varying $(Q_{1},Q_{2})$ 🆔 rwls7u ⏫ ✅ 2026-05-20
- [ ] Finish the review
- [ ] Write the response document for the reviewers ⛔ rwls7u ⏫ 📅 2026-05-22
- [-] Create the reviewer1 and reviewer2 branches ❌ 2026-05-19
- [x] Start integrating the presentation-related review comments ✅ 2026-05-29
- [x] Find the typos and unfinished sentences ✅ 2026-05-29
- [x] Write a function checking the identifiability of the provided simulations 🛫 2026-05-19 ✅ 2026-05-19
- [ ] Run the simulations (currently being debugged) ⛔ rwls7u
- [ ] Need to debug Migale 🔺 ⛔ ckx0ew
- [x] Read the paper by [[@neumannBipartiteStochasticBlock2018]] 🆔 d2pqyo ✅ 2026-06-01
- [x] Read Pierre's arXiv paper on size to add Neumann in the conclusion. ✅ 2026-06-09
- [ ] Add the VGAEs to the code-colBiSBM repository 🆔 hszuud ⏫ ⛔ ej7w4j
- [ ] Rerun the VGAEs on Baldock with: 🆔 p0n5me ⛔ hszuud ⏫
    + a constant
    + the degree corrected for NAs
- [x] We need to discuss how we address reviewer 1's point 1, i.e. how to relax the assumption on the $\alpha$ ✅ 2026-06-09
- [x] Add the paragraph comparing biSBM and VGAE ✅ 2026-05-29
- [x] Add references on the use of latent block models in ecology ✅ 2026-06-10
- [ ] Increase the size of the paper's figures