Louis/Thèse/Axes/Phylogénie/SBM avec covariance latente.md
2026-06-09 14:45:18 +02:00


# Model idea
![[local_macros.tex]]
$$
\newcommand{\ilr}{\operatorname{ilr}}
\newcommand{\clr}{\operatorname{clr}}
\newcommand{\Cat}{\operatorname{Cat}}
$$
Pierre suggested placing a latent structure on the $\pmb{Z}$, namely
$$
\begin{aligned}
& P \sim \Normal_{n_1, K-1} (O_{n_1, K-1}, \Sigma, \sigma^2 Id_{K-1}), \\
\forall i \in \{1,\dots,n_1\}, & Z_i \mid P_i \overset{ind}{\sim} \Cat_{K} ({\ilr}^{-1}(P_i) = \pi_{1:K}^{(i)}), \\
\forall j \in \{1,\dots,n_2\}, & W_j \overset{iid}{\sim} \Cat_R (\rho_{1:R}),\\
\forall (i,j) \in \{1,\dots,n_1\}\times\{1,\dots,n_2\}, & Y_{ij} \mid Z_i = k, W_j = r \overset{ind}{\sim} \mathcal{F}(\alpha_{kr}),
\end{aligned}
$$
where $\Sigma$ is the variance-covariance matrix determined by the (phylogenetic) relatedness of the nodes.
```tikz
\usepackage{tikz}
\usepackage{amsmath,amssymb}
\usetikzlibrary{arrows.meta,positioning,shapes.geometric,calc}
\begin{document}
\begin{tikzpicture}[
font=\sffamily,
>=Latex,
node distance=1.5cm and 2cm,
directed/.style={-{Latex}, line width=0.8pt, draw=gray!75},
bidirected/.style={-, line width=0.8pt, draw=red!75},
base/.style={
draw=gray!70,
line width=0.9pt,
align=center,
inner sep=4pt
},
prior/.style={base,rectangle,rounded corners=1pt,fill=blue!7},
latent/.style={base,rectangle,rounded corners=6pt,fill=teal!8},
known/.style={base,diamond,aspect=1.35,fill=orange!12},
observed/.style={base,circle,fill=purple!8}
]
%--------------------------------------------------
% Hyperparameters rows
%--------------------------------------------------
\node[prior] (sigma) {$\sigma^2$};
\node[known,right=of sigma] (Sigma) {$\Sigma$};
\node[latent,below=of sigma] (Pi) {$P_i$};
\node[latent,below=of Sigma] (Pip) {$P_{i'}$};
\node[latent,below=of Pi] (Zi) {$Z_i$};
\node[latent,below=of Pip] (Zip) {$Z_{i'}$};
%--------------------------------------------------
% Hyperparameters columns
%--------------------------------------------------
\node[prior,right=4cm of Sigma] (rho) {$\rho$};
\node[latent,below left=of rho] (Wj) {$W_j$};
\node[latent,below right=of rho] (Wjp) {$W_{j'}$};
%--------------------------------------------------
% Intercept
%--------------------------------------------------
\node[prior] (alpha)
at ($(Zi)!0.5!(Wj)+(0,-3)$)
{$\alpha$};
%--------------------------------------------------
% Observations
%--------------------------------------------------
\node[observed]
at ($(Zi)!0.5!(Wj)+(0,-1.4)$)
(Yij) {$Y_{ij}$};
\node[observed]
at ($(Zi)!0.5!(Wjp)+(2.0,-1.4)$)
(Yijp) {$Y_{ij'}$};
\node[observed]
at ($(Zip)!0.5!(Wj)+(-2.0,-1.4)$)
(Yipj) {$Y_{i'j}$};
\node[observed]
at ($(Zip)!0.5!(Wjp)+(0,-1.4)$)
(Yipjp) {$Y_{i'j'}$};
%--------------------------------------------------
% Row side
%--------------------------------------------------
\draw[directed] (sigma) -- (Pi);
\draw[directed] (Sigma) -- (Pi);
\draw[directed] (sigma) -- (Pip);
\draw[directed] (Sigma) -- (Pip);
\draw[directed] (Pi) -- (Zi);
\draw[directed] (Pip) -- (Zip);
%--------------------------------------------------
% Column side
%--------------------------------------------------
\draw[directed] (rho) -- (Wj);
\draw[directed] (rho) -- (Wjp);
%--------------------------------------------------
% Likelihood
%--------------------------------------------------
\foreach \y in {Yij,Yijp}
\draw[directed] (Zi) -- (\y);
\foreach \y in {Yipj,Yipjp}
\draw[directed] (Zip) -- (\y);
\foreach \y in {Yij,Yipj}
\draw[directed] (Wj) -- (\y);
\foreach \y in {Yijp,Yipjp}
\draw[directed] (Wjp) -- (\y);
\foreach \y in {Yij,Yijp,Yipj,Yipjp}
\draw[directed] (alpha) -- (\y);
%--------------------------------------------------
% Correlation structure
%--------------------------------------------------
\draw[bidirected] (alpha) -- (Zi);
\draw[bidirected] (alpha) -- (Zip);
\draw[bidirected] (alpha) -- (Wj);
\draw[bidirected] (alpha) -- (Wjp);
\end{tikzpicture}
\end{document}
```
```tikz
\usepackage{tikz}
\usepackage{amsmath}
\usetikzlibrary{positioning,shapes.arrows, arrows.meta,shapes.geometric}
\begin{document}
\begin{tikzpicture}
\tikzset{
every path/.append style = {
arrows = ->,
> = stealth,},
every node/.append style = {
shape = circle,
draw = black,
minimum size=3em
},
latent/.style = {
fill = lightgray
},
prior/.style = {
fill = red},
moral/.style = {
dashed,
> = {}, % remove arrow tip
arrows = -, % ensure no arrows
}}
\node (y) {$Y$};
\node[latent] (z) [above left = of y] {$Z$};
\node[latent] (w) [above right = of y] {$W$};
\node[latent] (P) [above = of z] {$P$};
\node[prior] (sigma2) [above = of P] {$\sigma^2$};
\node[prior] (rho) [above = of w] {$\rho_{1:R}$};
\node[prior] (alpha) [below = of y] {$\boldsymbol{\alpha}$};
\path (z) edge (y);
\path (w) edge (y);
\path (rho) edge (w);
\path (alpha) edge (y);
\path (P) edge (z);
\path (sigma2) edge (P);
% moral
\path[moral] (z) edge (alpha);
\path[moral] (w) edge (alpha);
\path[moral] (z) edge (w);
\end{tikzpicture}
\end{document}
```
# Details on $\ilr$
The *Isometric Log Ratio* is a transformation that maps data bijectively from the simplex $\Delta_{K}$ to $\R^{K-1}$. It is built from the *Centered Log Ratio* transformation ($\clr$), defined as:
$$
\clr(\pmb{x}) = \Bigl(\log\Bigl(\frac{x_j}{\sqrt[K]{\prod^{K}_{i=1} x_i}}\Bigr)\Bigr)_{j = 1,\dots,K} = \Bigl(\log(x_j)- \frac{1}{K}\sum^{K}_{i=1} \log(x_i)\Bigr)_{j = 1,\dots,K}.
$$
The ilr is then defined by taking $\Psi$, a matrix whose rows form an orthonormal basis of $\Delta_{K}$ (expressed in $\clr$ coordinates):
$$
\ilr(\pmb{x}) = \clr(\pmb{x}) \Psi^{\top}
$$
In [[@hronImputationMissingValues2010]], the authors define the following basis:
$$
\ilr(\pmb{x}) = (z_1,\dots,z_{K-1})^{\top}, z_j = \sqrt{\frac{K-j}{K-j+1}} \log\biggl(\frac{\sqrt[K-j]{\prod_{l=j+1}^{K} x_l}}{x_j}\biggr) = \sqrt{\frac{K-j}{K-j+1}} \bigl[\frac{1}{K-j}\sum_{l=j+1}^{K}[\log(x_l)] - \log(x_j)\bigr]
$$
From which we obtain the following formula for the basis vectors:
$$
\begin{aligned}
\psi_j & = \sqrt{\frac{K-j}{K-j+1}} \Biggl( \underbrace{0,\dots,0}_{j-1},-1,\underbrace{\frac{1}{K-j},\dots,\frac{1}{K-j}}_{K-j}\Biggr) \\
\end{aligned}
$$
with $j\in\{1,\dots,K-1\}$, so $\Psi$ has size $(K-1)\times K$.
This gives the matrix:
$$
\begin{aligned}
\Psi & = \begin{pmatrix}
- \sqrt{\frac{K-1}{K}} & \frac{1}{K-1}\sqrt{\frac{K-1}{K}} & \dots & \frac{1}{K-1}\sqrt{\frac{K-1}{K}} \\
0 & - \sqrt{\frac{K-2}{K-1}} & \dots & \frac{1}{K-2}\sqrt{\frac{K-2}{K-1}} \\
\vdots & & \ddots & \vdots \\
0 & \dots & - \sqrt{\frac{1}{2}} & \sqrt{\frac{1}{2}}
\end{pmatrix}
\end{aligned}
$$
And according to [[@williamsSimplextoEuclideanBijectionConjugate2026]], for $z\in\mathbb{R}^{K-1}$ we have
$$
\ilr^{-1} (z)= \operatorname{softmax}(\Psi^{\top} z) = x \in \Delta_{K}
$$
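As a sanity check of the formulas above, here is a minimal Python sketch (standard library only) that builds $\Psi$ row by row, applies $\ilr$ and inverts it. It assumes $\Psi$ is stored as a $(K-1)\times K$ list of rows, so that the inverse is computed as $\operatorname{softmax}(\Psi^{\top} z)$. The function names (`ilr_basis`, `ilr_inv`, ...) are illustrative and do not come from any existing code base.

```python
import math

def ilr_basis(K):
    """Rows psi_1..psi_{K-1} of the (K-1) x K matrix Psi given above."""
    Psi = []
    for j in range(1, K):
        scale = math.sqrt((K - j) / (K - j + 1))
        # j-1 zeros, then -1, then K-j entries equal to 1/(K-j), all times scale
        Psi.append([0.0] * (j - 1) + [-scale] + [scale / (K - j)] * (K - j))
    return Psi

def clr(x):
    logs = [math.log(v) for v in x]
    mean = sum(logs) / len(logs)
    return [v - mean for v in logs]

def ilr(x, Psi):
    c = clr(x)
    return [sum(p * v for p, v in zip(row, c)) for row in Psi]

def softmax(u):
    m = max(u)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in u]
    total = sum(e)
    return [v / total for v in e]

def ilr_inv(z, Psi):
    K = len(Psi[0])
    # u = Psi^T z, a vector of length K
    u = [sum(Psi[j][k] * z[j] for j in range(len(Psi))) for k in range(K)]
    return softmax(u)

Psi = ilr_basis(3)
x = [0.2, 0.5, 0.3]
z = ilr(x, Psi)           # a point of R^2
x_back = ilr_inv(z, Psi)  # should recover x up to rounding
```

The round trip works because $\Psi^{\top}\Psi$ projects onto the zero-sum vectors, which leaves $\clr(\pmb{x})$ unchanged, and $\operatorname{softmax}(\clr(\pmb{x})) = \pmb{x}$ on the simplex.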
# Details on computing the preferred directions
With an orthonormal basis $\Psi$ we can compute the preferred directions of the $\ilr^{-1}$ transformation.
We look for the vector $v_{k}$ that $\Psi^{\top}$ sends to $s_{k}$:
$$
\begin{align*}
\Psi^{\top} v_{k} & = s_{k}\\
\implies \Psi\Psi^{\top} v_{k} &= \Psi s_{k} \\
\end{align*}
$$
But since the rows of $\Psi$ are orthonormal, $\Psi\Psi^{\top} = Id_{K-1}$ and therefore:
$$
v_{k} =\Psi s_{k}
$$
To find the particular directions sent to the vertices of the simplex, we take $s_{k}^{\top} = (-M,\dots,0,\dots, -M)$ with the $0$ in the $k$-th position and $M > 0$ large, so that $\operatorname{softmax}(s_{k})$ is close to the $k$-th vertex.
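The computation of $v_k$ can be checked numerically. The sketch below is only an illustration under assumptions: $\Psi$ is the $(K-1)\times K$ basis of the previous section, $M = 30$ is an arbitrary "large" value, and the helper names are hypothetical. It computes $v_k = \Psi s_k$ and verifies that $\operatorname{softmax}(\Psi^{\top} v_k)$ puts almost all its mass on class $k$.

```python
import math

def ilr_basis(K):
    """(K-1) x K basis: row j is sqrt((K-j)/(K-j+1)) * (0,..,0,-1,1/(K-j),..)."""
    Psi = []
    for j in range(1, K):
        scale = math.sqrt((K - j) / (K - j + 1))
        Psi.append([0.0] * (j - 1) + [-scale] + [scale / (K - j)] * (K - j))
    return Psi

def softmax(u):
    m = max(u)
    e = [math.exp(v - m) for v in u]
    total = sum(e)
    return [v / total for v in e]

def vertex_direction(k, Psi, M=30.0):
    """v_k = Psi s_k with s_k = (-M, ..., 0, ..., -M), the 0 at 0-based index k."""
    K = len(Psi[0])
    s = [-M] * K
    s[k] = 0.0
    return [sum(row[l] * s[l] for l in range(K)) for row in Psi]

K = 4
Psi = ilr_basis(K)
v = vertex_direction(2, Psi)
# Map v back to the simplex: softmax(Psi^T v)
u = [sum(Psi[j][l] * v[j] for j in range(K - 1)) for l in range(K)]
pi = softmax(u)  # almost all the mass sits on index 2
```

Note that $s_k$ does not sum to zero, so $\Psi^{\top} v_k$ equals $s_k$ only up to an additive constant; this does not matter because softmax is invariant to such shifts.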
# Conditional distributions for Gibbs sampling
- [ ] Write them out for Poisson and negative binomial emission distributions (one conjugate, the other via MH)
## Distribution of $P_i\mid Z_i, P_{-i}, \sigma^2$
We want to simulate from $p(P_i\mid Z_i, P_{-i}, \sigma^2) \propto p(Z_i\mid P_i) p(P_i\mid P_{-i}, \sigma^2)$. Here $p(Z_i\mid P_i)$ is a categorical distribution, namely $Z_i \mid P_i \sim \Cat_{K} ({\ilr}^{-1}(P_i))$. For $P_i \mid P_{-i}, \sigma^2$, the formulas need to be written out:
$$
\operatorname{vec}(P) \sim \Normal_{n_1(K-1)}(0, \sigma^2 I_{K-1} \otimes \Sigma), \qquad \Omega = (\sigma^2 I_{K-1} \otimes \Sigma)^{-1} = \frac{1}{\sigma^2}I_{K-1} \otimes \Sigma^{-1}
$$
Writing:
$$
\Theta = \Sigma^{-1}, \Omega = \frac{1}{\sigma^2}I_{K-1} \otimes \Theta
$$
Using the following formula for conditioned Gaussian vectors:
$$
\pmb{x} \sim \Normal(\pmb{\mu}, \Sigma) \qquad x_i \mid \pmb{x_{-i}} \sim \Normal(\mu_{i} + \Sigma_{i,-i}\Sigma_{-i,-i}^{-1}(\pmb{x_{-i}} - \pmb{\mu_{-i}}), \Sigma_{i,i} - \Sigma_{i,-i}\Sigma_{-i,-i}^{-1}\Sigma_{-i,i}),
$$
which gives for our application (with $M$ the mean of $P$, here $M = O_{n_1,K-1}$):
$$
P_i\mid P_{-i} \sim \Normal_{K-1}(M_i + \Sigma_{i,-i} \Sigma^{-1}_{-i,-i}(P_{-i} - M_{-i}), \sigma^2(\Sigma_{i,i} - \Sigma_{i,-i}\Sigma^{-1}_{-i,-i}\Sigma_{-i,i})I_{K-1})
$$
We can then use *Metropolis within Gibbs* to simulate from $P_i\mid P_{-i}, Z_i, \sigma^2$.
For each $i$:
```pseudo
\begin{algorithm}
\begin{algorithmic}
\State Draw a candidate $P_i^c \sim p(P_i\mid P_{-i}^{(t)}, \sigma^2)$
\State $a \gets \min\left(1, \frac{p(Z_i\mid P_i^c)}{p(Z_i\mid P_i^{(t)})}\right)$, the acceptance probability. The prior terms cancel thanks to the choice of transition kernel.
\State Draw $u \sim \mathcal{U}(0,1)$
\If{$u\leq a$}
\State $P_i^{(t+1)} \gets P_i^c$
\Else
\State $P_i^{(t+1)} \gets P_i^{(t)}$
\EndIf
\end{algorithmic}
\caption{Metropolis within Gibbs}
\end{algorithm}
```
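The step above could be sketched in Python as follows. This is a hedged illustration, not a reference implementation: it assumes a zero prior mean ($M = 0$), takes the precision matrix $\Theta = \Sigma^{-1}$ as given, and uses the precision form of the Gaussian conditional (mean $-\Theta_{ii}^{-1}\sum_{l\neq i}\Theta_{il}P_l$, variance $\sigma^2/\Theta_{ii}$ per coordinate), which is equivalent to the covariance form. Classes are 0-based and all function names are hypothetical.

```python
import math
import random

def ilr_basis(K):
    """(K-1) x K orthonormal basis Psi used by ilr."""
    Psi = []
    for j in range(1, K):
        scale = math.sqrt((K - j) / (K - j + 1))
        Psi.append([0.0] * (j - 1) + [-scale] + [scale / (K - j)] * (K - j))
    return Psi

def log_pi(p_row, Psi):
    """log ilr^{-1}(p_row) = log softmax(Psi^T p_row), computed stably."""
    K = len(Psi[0])
    u = [sum(Psi[j][k] * p_row[j] for j in range(len(Psi))) for k in range(K)]
    m = max(u)
    lse = m + math.log(sum(math.exp(v - m) for v in u))
    return [v - lse for v in u]

def mwg_step(i, P, z_i, Theta, sigma2, Psi, rng):
    """One Metropolis-within-Gibbs update of row P[i]; returns (row, accepted)."""
    n, d = len(P), len(P[i])
    # Prior conditional P_i | P_{-i} in precision form (zero prior mean assumed)
    cond_sd = math.sqrt(sigma2 / Theta[i][i])
    cond_mean = [
        -sum(Theta[i][l] * P[l][c] for l in range(n) if l != i) / Theta[i][i]
        for c in range(d)
    ]
    candidate = [rng.gauss(m, cond_sd) for m in cond_mean]
    # The proposal is the prior conditional, so only the categorical
    # likelihood p(Z_i | P_i) remains in the acceptance ratio.
    log_ratio = log_pi(candidate, Psi)[z_i] - log_pi(P[i], Psi)[z_i]
    accept = rng.random() < math.exp(min(0.0, log_ratio))
    return (candidate if accept else list(P[i])), accept

rng = random.Random(0)
P = [[0.0, 0.0], [1.0, -1.0]]       # n_1 = 2 rows, K - 1 = 2 columns
Theta = [[2.0, -1.0], [-1.0, 2.0]]  # hypothetical precision matrix
new_row, accepted = mwg_step(0, P, 1, Theta, 1.0, ilr_basis(3), rng)
```

Working with log-probabilities avoids taking the logarithm of a softmax output that has underflowed to zero.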
## Distribution of $\sigma^2\mid P$
$\DeclareMathOperator{\Inv}{Inv}$
We place an $\Inv-\Gamma$ prior on $\sigma^{2}$. It is conjugate with the normal distribution, and also with the matrix normal distribution (as can be seen from the vectorized definition of the matrix normal).
The likelihood of $P$ and the prior on $\sigma^2$ are given below; the column covariance $\sigma^2 I_{K-1}$ contributes the factor $|\sigma^2 I_{K-1}|^{n_1/2} = (\sigma^2)^{n_1(K-1)/2}$:
$$
\begin{align*}
p(P\mid \sigma^2) &= \frac{\exp\left( -\frac{1}{2\sigma^2} \operatorname{Tr}\left((P-M)^{\top}\Sigma^{-1}(P-M)\right)\right)}{(2\pi)^{n_1(K-1)/2}(\sigma^{2})^{n_1(K-1)/2}|\Sigma|^\frac{(K-1)}{2}}\\
p(\sigma^2|\alpha_{0},\beta_{0}) & = \frac{\beta_{0}^{\alpha_{0}}}{\Gamma(\alpha_{0})} \left( \frac{1}{\sigma^2}\right)^{\alpha_{0}+1} \exp\left( -\frac{\beta_{0}}{\sigma^2} \right)
\end{align*}
$$
Then, by Bayes' theorem:
$$
\begin{align*}
p(\sigma^2 \mid P)
\propto
\left(\frac{1}{\sigma^2}\right)^{\boxed{\alpha_0+\frac{n_1(K-1)}{2}}+1}
\exp\!\left(
-\frac{1}{\sigma^2}
\left[
\boxed{\beta_0
+
\frac{1}{2}
\operatorname{Tr}\!\left(
(P-M)^{\top}\Sigma^{-1}(P-M)
\right)}
\right]
\right).
\end{align*}
$$
Hence
$$
\sigma^{2}\mid P \sim \Inv-\Gamma \left(\alpha_{0}+\frac{n_1(K-1)}{2},\ \beta_0+\frac{1}{2}
\operatorname{Tr}\!\left((P-M)^{\top}\Sigma^{-1}(P-M)\right)\right)
$$
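A possible way to draw from this conditional with the Python standard library is sketched below. The assumptions are stated loudly: zero prior mean ($M = 0$), a precomputed precision matrix `Theta` standing for $\Sigma^{-1}$, and the shape $\alpha_0 + n_1(K-1)/2$, which is the exponent of $\sigma^2$ produced by the matrix normal density with column covariance $\sigma^2 I_{K-1}$. The function name `sample_sigma2` is illustrative.

```python
import random

def sample_sigma2(P, Theta, alpha0, beta0, rng):
    """Draw sigma^2 | P ~ Inv-Gamma(shape, rate), assuming M = 0."""
    n, d = len(P), len(P[0])  # n = n_1 rows, d = K - 1 columns
    # Tr(P^T Theta P) = sum over columns c of p_c^T Theta p_c
    trace = sum(
        P[a][c] * Theta[a][b] * P[b][c]
        for c in range(d) for a in range(n) for b in range(n)
    )
    shape = alpha0 + n * d / 2.0
    rate = beta0 + trace / 2.0
    # If G ~ Gamma(shape, scale = 1/rate), then 1/G ~ Inv-Gamma(shape, rate)
    draw = 1.0 / rng.gammavariate(shape, 1.0 / rate)
    return draw, shape, rate

rng = random.Random(1)
P = [[1.0, 0.0], [0.0, 2.0]]
Theta = [[1.0, 0.0], [0.0, 1.0]]  # identity precision, for illustration only
sigma2, shape, rate = sample_sigma2(P, Theta, 2.0, 1.0, rng)
```

Here `random.gammavariate` takes a shape and a scale, hence the `1.0 / rate` argument.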
## Distribution of $\rho\mid W$
We place a Dirichlet prior $\boldsymbol{\rho}\sim Dir(\gamma_{1},\dots,\gamma_{r},\dots,\gamma_{R})$, which is conjugate with the categorical distribution:
$$
\begin{align*}
p(\boldsymbol{\rho}|\boldsymbol{\gamma}) &= \frac{1}{B(\boldsymbol{\gamma})} \prod_{r = 1}^{R}\rho_{r}^{\gamma_{r}-1}\\
p(W\mid \boldsymbol{\rho}) &= \prod_{j=1}^{n_{2}} p(W_{j}\mid \boldsymbol{\rho})\\
& = \prod_{j=1}^{n_{2}} \prod_{r=1}^{R} \rho_{r}^{\mathbb{1}_{W_{j}=r}}\\
p(\boldsymbol{\rho}\mid W) &\propto p(W\mid \boldsymbol{\rho}) p(\boldsymbol{\rho}\mid\boldsymbol{\gamma})\\
& \propto \Biggl(\prod_{j=1}^{n_{2}} \prod_{r=1}^{R} \rho_{r}^{\mathbb{1}_{W_{j}=r}}\Biggr) \prod_{r=1}^{R}\rho_{r}^{\gamma_{r}-1}\\
& = \prod_{r=1}^{R} \rho_{r}^{\sum_{j=1}^{n_{2}}\mathbb{1}_{W_{j}=r}+\gamma_{r}-1}\\
& = \prod_{r=1}^{R} \rho_{r}^{\boxed{N_{r}+\gamma_{r}}-1}
\end{align*}
$$
where $N_r = \sum_{j=1}^{n_{2}} \mathbb{1}_{W_{j}=r}$ is the number of $W_{j}$ equal to $r$.
We thus find that $\boldsymbol{\rho}\mid W\sim Dir(N_{1}+\gamma_{1},\dots,N_{r}+\gamma_{r},\dots,N_{R}+\gamma_{R})$.
*Usually*, one takes $\gamma_r = \frac{1}{R}$ (the Jeffreys prior would instead be $\gamma_r = \frac{1}{2}$). According to Wikipedia, values close to 0 favor sparsity, whereas $\gamma_r = 1$ spreads the mass uniformly over the simplex.
- [ ] Test the effects of the Dirichlet prior
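The conjugate update for $\boldsymbol{\rho}$ can be sketched in a few lines of Python. The sketch assumes 0-based labels in `W` and draws the Dirichlet by normalizing independent Gamma variables, a standard construction; `sample_rho` is a hypothetical name.

```python
import random

def sample_rho(W, gamma, rng):
    """Draw rho | W ~ Dir(N_1 + gamma_1, ..., N_R + gamma_R)."""
    R = len(gamma)
    counts = [0] * R
    for w in W:  # N_r = number of columns assigned to group r
        counts[w] += 1
    post = [counts[r] + gamma[r] for r in range(R)]
    # A Dirichlet draw is a vector of independent Gamma(post_r, 1) draws, normalized
    g = [rng.gammavariate(a, 1.0) for a in post]
    total = sum(g)
    return [v / total for v in g], post

rng = random.Random(2)
W = [0, 2, 2, 1, 2]  # column labels, R = 3 groups
rho, post = sample_rho(W, [1.0 / 3] * 3, rng)
```

Changing the `gamma` argument (for example `[1.0] * 3` versus `[1.0 / 3] * 3`) is an easy way to explore the effect of the prior mentioned in the task above.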
## Distribution of $Z\mid\alpha ,P,Y,W$
```tikz
\usepackage{tikz}
\usepackage{amsmath,amssymb}
\usetikzlibrary{arrows.meta,positioning,shapes.geometric,calc}
\begin{document}
\begin{tikzpicture}[
font=\sffamily,
node distance=1.35cm and 2.0cm,
>=Latex,
directed/.style={-{Latex}, line width=0.8pt, draw=gray!75},
bidirected/.style={-, line width=0.8pt, draw=red!75},
base/.style={draw=gray!70, line width=0.9pt, align=center, inner sep=5pt, text=black},
prior/.style={base, rectangle, rounded corners=1pt, fill=blue!7},
latent/.style={base, rectangle, rounded corners=6pt, fill=teal!8},
known/.style={base, diamond, aspect=1.35, fill=orange!12, inner sep=2pt},
observed/.style={base, circle, fill=purple!8,inner sep=1pt}
]
% Nodes: variance and covariance hyperparameters
\node[prior] (sigma) at (0, 2.8) {$\sigma^2$};
\node[known] (Sigma) at (3.2,2.8) {$\Sigma$};
% Nodes: latent probabilities
\node[latent] (Pi) [below of = sigma] {$P_i$};
\node[latent] (Pip) [below of = Sigma] {$P_{i^{\prime}}$};
% Nodes: latent assignments
\node[latent] (Zi) [below of = Pi] {$Z_i$};
\node[latent] (Zip) [below of = Pip] {$Z_{i^{\prime}}$};
% Nodes: observed response vectors
\node[observed] (Yidot) [below of = Zi] {$Y_{i,\bullet}$};
\node[observed] (Yipdot) [below of = Zip] {$Y_{i^{\prime},\bullet}$};
% Nodes: shared weights and correlation parameters
\node[prior] (rho) at ($(Zi)!0.5!(Zip)$) {$\boldsymbol{\rho}_{1:R}$};
\node[latent] (W) [below of = rho] {$W$};
% Node: intercept/offset parameter
\node[prior] (alpha) [below of = W] {$\alpha$};
% Directed edges from hyperparameters to latent probabilities
\draw[directed] (sigma) -- (Pi);
\draw[directed] (Sigma) -- (Pi);
\draw[directed] (sigma) -- (Pip);
\draw[directed] (Sigma) -- (Pip);
% Directed edges through latent assignments to observations
\draw[directed] (Pi) -- (Zi);
\draw[directed] (Pip) -- (Zip);
\draw[directed] (Zi) -- (Yidot);
\draw[directed] (Zip) -- (Yipdot);
% Directed edges from correlation parameters through W to observations
\draw[directed] (rho) -- (W);
\draw[directed] (W) -- (Yidot);
\draw[directed] (W) -- (Yipdot);
% Directed effects of alpha on observations
\draw[directed] (alpha) -- (Yidot);
\draw[directed] (alpha) -- (Yipdot);
% Bidirectional associations involving alpha
\draw[bidirected] (alpha) -- (W);
\draw[bidirected] (alpha) -- (Zi);
\draw[bidirected] (alpha) -- (Zip);
\end{tikzpicture}
\end{document}
```
From the DAG detailed above we can deduce that, for each $Z_{i}$, we need to look at the distribution of $Z_{i}\mid Y_{i,\bullet},\alpha,W,P_{i}$
$$
\begin{align*}
p(Z_{i}\mid Y_{i,\bullet},\alpha,W,P_{i}) &\propto p(Y_{i,\bullet}\mid Z_{i},\alpha,W,P_{i})p(Z_{i}\mid W, \alpha, P_{i}) \\
& \propto p(Y_{i,\bullet}\mid Z_{i}, \alpha, W)p(Z_{i}\mid P_{i})
\end{align*}
$$
because $Z_{i}\perp(W,\alpha)\mid P_{i}$ and $Y_{i,\bullet}\perp P_{i}\mid Z_{i}$.
We have (for Bernoulli emissions):
$$
\begin{align*}
p(Z_{i}\mid P_{i}) & = \ilr^{-1}(P_{i}) = (\pi_{i,1},\dots,\pi_{i,k},\dots,\pi_{i,K}) \\
p(Y_{i,\bullet}\mid Z_{i}, \alpha, W) & = \prod_{j=1}^{n_{2}} \alpha_{Z_{i},W_{j}}^{Y_{ij}}(1- \alpha_{Z_{i},W_{j}})^{1-Y_{ij}} \\
p(Z_{i} = k \mid P_{i}) & = \pi_{i,k} \\
p(Y_{i,\bullet}\mid Z_{i} = k, \alpha, W) & = \prod_{j=1}^{n_{2}} \prod_{r=1}^{R}\alpha_{k,r}^{\mathbb{1}_{W_{j} = r} Y_{ij}}(1- \alpha_{k,r})^{\mathbb{1}_{W_{j} = r}(1-Y_{ij})} \\
& = \prod_{r=1}^{R} \alpha_{k,r}^{\sum_{j=1}^{n_{2}}W_{jr}Y_{ij}} (1-\alpha_{k,r})^{\sum_{j=1}^{n_{2}}W_{jr}(1-Y_{ij})}
\end{align*}
$$
Setting $R_{ir}=\sum_{j=1}^{n_{2}}W_{jr}Y_{ij}$ and $F_{ir}=\sum_{j=1}^{n_{2}}W_{jr}(1-Y_{ij})$, with $W_{jr} = \mathbb{1}_{W_{j}=r}$, defines the matrices $\mathbf{R}$ and $\mathbf{F}$, which count the successes and failures per row $i$ and group $r$.
This gives, for the posterior $\tilde{\pi}_{i,k}$:
$$
\begin{align*}
\tilde{\pi}_{i,k} = p(Z_{i} = k\mid Y_{i,\bullet},\alpha,W,P_{i}) & \propto p(Y_{i,\bullet}\mid Z_{i} = k, \alpha, W, P_{i})p(Z_{i}=k\mid P_{i})\\
& \propto \pi_{i,k} \prod_{r=1}^{R} \alpha_{k,r}^{R_{ir}}(1-\alpha_{k,r})^{F_{ir}}
\end{align*}
$$
And so, in the end:
$$
Z_{i}\mid P_{i}, Y, W, \alpha \sim \Cat_{K}(\tilde{\pi}_{i,1},\dots, \tilde{\pi}_{i,K})
$$
### Implementation
$$
\begin{align*}
\log \tilde{p}_{ik} & = \log \pi_{i,k} + \sum_{r=1}^{R} [R_{ir} \log\alpha_{k,r} + F_{ir}\log(1-\alpha_{k,r})] \\
\tilde{\pi}_{i,k} &= \frac{\exp(\log \tilde{p}_{ik} - m_{i})}{\sum_{l=1}^{K}\exp(\log \tilde{p}_{il}-m_{i})},\quad m_{i} = \max_{l} \log \tilde{p}_{il}
\end{align*}
$$
This simplifies further by noting that $\mathbf{F} = (\mathbf{1}_{n_{1}\times n_{2}}-Y)W = \mathbf{1}_{n_{1}\times n_{2}}W - \mathbf{R}$, where $\mathbf{1}_{n_{1}\times n_{2}}$ is the $n_{1}\times n_{2}$ matrix of ones. Setting $\mathbf{N} = (N_{r})^{\top}_{r=1,\dots,R}$ (recall that $N_{r} = \sum_{j=1}^{n_{2}}W_{jr}$), we have
$$
\begin{align*}
\mathbf{N} &= W^{\top}\mathbf{1}_{n_{2}},\quad \mathbf{N}^{\top} = \mathbf{1}_{n_{2}}^{\top} W\\
\mathbf{1}_{n_{1}} \mathbf{N}^{\top} &= \mathbf{1}_{n_{1}} \mathbf{1}_{n_{2}}^{\top}W = \mathbf{1}_{n_{1}\times n_{2}} W
\end{align*}
$$
And therefore:
$$
\mathbf{F}=\mathbf{1}_{n_{1}}\mathbf{N}^{\top} - \mathbf{R}
$$
Thus, before row-wise normalization:
$$
\log \tilde{\Pi} = \log(\ilr^{-1}(P)) + \mathbf{R}\log\alpha^{\top} + \mathbf{F} \log(1-\alpha)^{\top}
$$
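The log-domain computation above can be illustrated for a single row $i$ with the sketch below. It is only a sketch under assumptions: Bernoulli emissions, 0-based labels, every $\alpha_{k,r}$ strictly inside $(0,1)$ so that both logarithms are finite, and $\log\pi_{i,\cdot}$ supplied by the caller. The name `z_posterior_row` is hypothetical.

```python
import math
import random

def z_posterior_row(log_pi_i, y_i, W, alpha):
    """Posterior probabilities of Z_i = k given row y_i, labels W and alpha[k][r]."""
    K, R = len(alpha), len(alpha[0])
    succ = [0] * R  # R_ir: successes of row i in column group r
    size = [0] * R  # N_r: number of columns in group r
    for y, w in zip(y_i, W):
        succ[w] += y
        size[w] += 1
    fail = [size[r] - succ[r] for r in range(R)]  # F_ir = N_r - R_ir
    logp = [
        log_pi_i[k] + sum(
            succ[r] * math.log(alpha[k][r]) + fail[r] * math.log(1.0 - alpha[k][r])
            for r in range(R)
        )
        for k in range(K)
    ]
    m = max(logp)  # log-sum-exp shift m_i
    weights = [math.exp(v - m) for v in logp]
    total = sum(weights)
    return [v / total for v in weights]

rng = random.Random(3)
alpha = [[0.9, 0.1], [0.1, 0.9]]  # hypothetical K = 2, R = 2 connection matrix
probs = z_posterior_row([math.log(0.5)] * 2, [1, 1, 0, 0], [0, 0, 1, 1], alpha)
z_i = rng.choices(range(2), weights=probs)[0]  # draw Z_i from Cat_K(probs)
```

Subtracting $m_i$ before exponentiating keeps the largest term equal to 1, so the normalization never divides by zero even when the raw likelihoods underflow.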
## Distribution of $W\mid\alpha ,\rho,Y,Z$
```tikz
\usepackage{tikz}
\usepackage{amsmath,amssymb}
\usetikzlibrary{arrows.meta,positioning,shapes.geometric,calc}
\begin{document}
\begin{tikzpicture}[font=\sffamily,
node distance=1.35cm and 2.0cm,
>=Latex,
directed/.style={-{Latex}, line width=0.8pt, draw=gray!75},
bidirected/.style={-, line width=0.8pt, draw=red!75},
base/.style={draw=gray!70, line width=0.9pt, align=center, inner sep=5pt,
text=black},
prior/.style={base, rectangle, rounded corners=1pt, fill=blue!7},
latent/.style={base, rectangle, rounded corners=6pt, fill=teal!8},
known/.style={base, diamond, aspect=1.35, fill=orange!12, inner sep=2pt},
observed/.style={base, circle, fill=purple!8,inner sep=1pt}]
% Nodes: latent assignments
\node[latent] (Wj) at (0, 2.8) {$W_j$};
\node[prior] (rho) [right of = Wj] {$\rho$};
\node[latent] (Wjp) [right of = rho] {$W_{j^{\prime}}$};
\node[latent] (Z) [below of = rho] {$Z$};
% Nodes: observed response vectors
\node[observed] (Ydotj) [below of = Wj] {$Y_{\bullet,j}$};
\node[observed] (Ydotjp) [below of = Wjp] {$Y_{\bullet,j^{\prime}}$};
% Nodes: shared weights and correlation parameters
%\node[prior] (rho) at ($(Zi)!0.5!(Zip)$) {$\boldsymbol{\rho}_{1:R}$};
%\node[latent] (W) [below of = rho] {$W$};
% Node: intercept/offset parameter
\node[prior] (alpha) [below of = Z] {$\alpha$};
% Directed edges through latent assignments to observations
\draw[directed] (Z) -- (Ydotj);
\draw[directed] (Z) -- (Ydotjp);
% Directed edges from correlation parameters through W to observations
\draw[directed] (rho) -- (Wj);
\draw[directed] (rho) -- (Wjp);
\draw[directed] (Wj) -- (Ydotj);
\draw[directed] (Wjp) -- (Ydotjp);
% Directed effects of alpha on observations
\draw[directed] (alpha) -- (Ydotj);
\draw[directed] (alpha) -- (Ydotjp);
% Bidirectional associations involving alpha
\draw[bidirected] (alpha) -- (Z);
\draw[bidirected] (alpha) -- (Wj);
\draw[bidirected] (alpha) -- (Wjp);
\end{tikzpicture}
\end{document}
```
From the DAG detailed above we can deduce that, for each $W_{j}$, we need to look at the distribution of $W_{j}\mid Y_{\bullet,j},\alpha,Z,\rho$
$$
\begin{align*}
p(W_{j}\mid Y_{\bullet,j},\alpha,Z,\rho) &\propto p(Y_{\bullet,j}\mid W_{j},\alpha,Z,\rho)p(W_{j}\mid Z, \alpha, \rho) \\
& \propto p(Y_{\bullet,j}\mid W_{j}, \alpha, Z)p(W_{j}\mid \rho)
\end{align*}
$$
because $W_{j}\perp(Z,\alpha)\mid \rho$ and $Y_{\bullet,j}\perp \rho\mid W_{j}$.
By symmetry with the update of $Z$, we have (for Bernoulli emissions):
$$
\begin{align*}
p(W_{j} = r \mid \rho) & = \rho_{r} \\
p(Y_{\bullet,j}\mid W_{j} = r, \alpha, Z) & = \prod_{i=1}^{n_{1}} \prod_{k=1}^{K}\alpha_{k,r}^{\mathbb{1}_{Z_{i} = k} Y_{ij}}(1- \alpha_{k,r})^{\mathbb{1}_{Z_{i} = k}(1-Y_{ij})} \\
& = \prod_{k=1}^{K} \alpha_{k,r}^{\sum_{i=1}^{n_{1}}Z_{ik}Y_{ij}} (1-\alpha_{k,r})^{\sum_{i=1}^{n_{1}}Z_{ik}(1-Y_{ij})}
\end{align*}
$$
with $Z_{ik} = \mathbb{1}_{Z_{i}=k}$. Setting $R'_{jk}=\sum_{i=1}^{n_{1}}Z_{ik}Y_{ij}$ and $F'_{jk}=\sum_{i=1}^{n_{1}}Z_{ik}(1-Y_{ij})$ defines the matrices $\mathbf{R}'$ and $\mathbf{F}'$, which count the successes and failures per column $j$ and group $k$.
This gives, for the posterior $\tilde{\rho}_{j,r}$:
$$
\begin{align*}
\tilde{\rho}_{j,r} = p(W_{j} = r\mid Y_{\bullet,j},\alpha,Z,\rho) & \propto p(Y_{\bullet,j}\mid W_{j} = r, \alpha, Z)p(W_{j}=r\mid \rho)\\
& \propto \rho_{r} \prod_{k=1}^{K} \alpha_{k,r}^{R'_{jk}}(1-\alpha_{k,r})^{F'_{jk}}
\end{align*}
$$
And so, in the end:
$$
W_{j}\mid \rho, Y, Z, \alpha \sim \Cat_{R}(\tilde{\rho}_{j,1},\dots, \tilde{\rho}_{j,R})
$$
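By symmetry with the update of $Z$, the draw of a single $W_j$ could look like the sketch below. It rests on assumptions not spelled out in the note: Bernoulli emissions, $p(W_j = r \mid \rho) = \rho_r$ playing the role that $\pi_{i,k}$ plays for $Z_i$, 0-based labels, and every $\alpha_{k,r}$ and $\rho_r$ strictly inside $(0,1)$. The name `sample_w_j` is hypothetical.

```python
import math
import random

def sample_w_j(y_col, Z, alpha, rho, rng):
    """Draw W_j given column y_col, row labels Z, alpha[k][r] and rho."""
    K, R = len(alpha), len(rho)
    succ = [0] * K  # successes of column j within row group k
    size = [0] * K  # number of rows in group k
    for y, z in zip(y_col, Z):
        succ[z] += y
        size[z] += 1
    logp = [
        math.log(rho[r]) + sum(
            succ[k] * math.log(alpha[k][r])
            + (size[k] - succ[k]) * math.log(1.0 - alpha[k][r])
            for k in range(K)
        )
        for r in range(R)
    ]
    m = max(logp)  # log-sum-exp shift
    weights = [math.exp(v - m) for v in logp]
    total = sum(weights)
    probs = [v / total for v in weights]
    return rng.choices(range(R), weights=probs)[0], probs

rng = random.Random(4)
alpha = [[0.9, 0.1], [0.1, 0.9]]  # hypothetical K = 2, R = 2 connection matrix
w_j, probs = sample_w_j([1, 1, 0, 0], [0, 0, 1, 1], alpha, [0.5, 0.5], rng)
```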
## Distribution of $\alpha \mid Y,Z,W$
We place a Beta prior $\alpha_{qr}\sim Beta(a_{0},b_{0})$ on each $\alpha_{qr}$, which is conjugate with the (Bernoulli) likelihood:
$$
\begin{align*}
p(\alpha_{qr}) &\propto \alpha_{qr}^{a_{0}-1} (1-\alpha_{qr})^{b_{0}-1} \\
p(Y\mid\alpha_{qr}, Z, W) &= \prod_{i,j: Z_i=q, W_{j}=r} p(Y_{ij}\mid\alpha_{qr},Z_{i},W_{j}) \\
&= \prod_{i,j: Z_{i}=q,W_{j}=r} \alpha_{qr}^{Y_{ij}}(1-\alpha_{qr})^{1-Y_{ij}} \\
& = \prod_{i,j} \alpha_{qr}^{\mathbb{1}_{Z_{i}=q}\mathbb{1}_{W_{j}=r}Y_{ij}}(1-\alpha_{qr})^{\mathbb{1}_{Z_{i}=q}\mathbb{1}_{W_{j}=r}(1-Y_{ij})}\\
& = \alpha_{qr}^{S_{qr}}(1-\alpha_{qr})^{E_{qr}}\\
p(\alpha_{qr}\mid Y, Z, W) &\propto p(Y\mid\alpha_{qr}, Z, W) p(\alpha_{qr})\\
& \propto \alpha_{qr}^{\boxed{S_{qr} + a_{0}}-1} (1-\alpha_{qr})^{\boxed{E_{qr}+b_{0}} -1}\\
\end{align*}
$$
where $S_{qr}=\sum_{i,j}\mathbb{1}_{Z_{i}=q}\mathbb{1}_{W_{j}=r}Y_{ij}$ and $E_{qr}=\sum_{i,j}\mathbb{1}_{Z_{i}=q}\mathbb{1}_{W_{j}=r}(1-Y_{ij})$ define the count matrices of successes and failures of the realization. They are computed as $\mathbf{S}=\mathbf{Z}^{\top}\mathbf{Y}\mathbf{W}$ and $\mathbf{E}=\mathbf{Z}^{\top}(\mathbf{1}_{n_{1}\times n_{2}}-\mathbf{Y})\mathbf{W}$, with $\mathbf{Z}$ the $n_{1}\times K$ matrix (the index $q$ ranges over the $K$ row groups) and $\mathbf{W}$ the $n_{2}\times R$ matrix indicating which class individual $i$ (or $j$) belongs to.
As shorthand we write $Z_{iq}=\mathbb{1}_{Z_{i}=q}$ and $W_{jr}=\mathbb{1}_{W_{j}=r}$, which are also the entries of the matrices defined above.
$\alpha_{qr}\mid Y,Z,W\sim Beta(S_{qr}+a_{0},E_{qr}+b_{0}), \quad\boldsymbol{\alpha}\mid Y,Z,W \sim MatBeta(\mathbf{S}+a_{0},\mathbf{E} +b_{0})$
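The counts $\mathbf{S}$, $\mathbf{E}$ and the elementwise Beta draws can be sketched as follows, assuming Bernoulli emissions and 0-based labels. Plain loops stand in for the matrix products $\mathbf{Z}^{\top}\mathbf{Y}\mathbf{W}$; the function name `sample_alpha` is illustrative.

```python
import random

def sample_alpha(Y, Z, W, K, R, a0, b0, rng):
    """Draw every alpha_qr | Y, Z, W ~ Beta(S_qr + a0, E_qr + b0)."""
    S = [[0] * R for _ in range(K)]  # successes per block (q, r)
    E = [[0] * R for _ in range(K)]  # failures per block (q, r)
    for i, row in enumerate(Y):
        for j, y in enumerate(row):
            S[Z[i]][W[j]] += y
            E[Z[i]][W[j]] += 1 - y
    alpha = [
        [rng.betavariate(S[q][r] + a0, E[q][r] + b0) for r in range(R)]
        for q in range(K)
    ]
    return alpha, S, E

rng = random.Random(5)
Y = [[1, 0], [1, 1]]  # toy binary matrix with n_1 = n_2 = 2
alpha, S, E = sample_alpha(Y, Z=[0, 1], W=[0, 1], K=2, R=2, a0=1.0, b0=1.0, rng=rng)
```

Empty blocks are handled by the prior alone: when $S_{qr} = E_{qr} = 0$ the draw comes from $Beta(a_0, b_0)$.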