```{r libraries, echo = FALSE, include = FALSE} require("ggplot2") require("ggokabeito") require("tidyr") require("dplyr") require("patchwork") require("latex2exp") ``` \section{Network clustering of simulated networks}\label{sec:network-clustering-of-simulated-networks} ```{r impoting-data, echo = FALSE} filenames <- list.files( path = "./data", pattern = "simulated_collection_data_clustering_*", full.names = TRUE ) # data_list <- lapply(filenames, function(file) lapply(readRDS(file), function(model) model$list_clustering)) df_netclust <- do.call("rbind", lapply(filenames, readRDS)) df_netclust$model <- factor(df_netclust$model, levels = c( "iid", "pi", "rho", "pirho" )) ``` \paragraph{Simulation settings} For all models we simulate $M = 9$ networks with $\forall m \in \{ 1 \dots M \} , n^m_1 = n^m_2 = 75$ with $Q_1 = Q_2 = 3$. For the simulations the proportions are the following: \begin{align*} \bm{\pi}^1 = \left( 0.2, 0.3, 0.5 \right) & & \bm{\rho}^1 = \left( 0.2, 0.3, 0.5 \right) \end{align*} and for all $m = 2,\dots,9$ \begin{align*} \bm{\pi}^m = \begin{cases} \bm{\pi}^1 & \text{for } iid\text{-}colBiSBM \\ \sigma^1_m(\bm{\pi}^1) & \text{for } \pi\text{-}colBiSBM \text{ and } \pi\rho\text{-}colBiSBM \end{cases}\\ \bm{\rho}^m = \begin{cases} \bm{\rho}^1 & \text{for } iid\text{-}colBiSBM \\ \sigma^2_m(\bm{\rho}^1) & \text{for } \rho\text{-}colBiSBM \text{ and } \pi\rho\text{-}colBiSBM \end{cases} \end{align*} where $\sigma^1_m$ and $\sigma^2_m$ are permutations of {1, 2, 3} proper to network $m$ and $\sigma^1 (\pi)= {(\pi_{\sigma^1 (i)})}_{i=\{1,\dots,3\}}$ and $\sigma^2 (\rho)= {(\rho_{\sigma^2 (i)})}_{i=\{1,\dots,3\}}$. The networks are divided into 3 sub-collections of 3 networks with connectivity parameters as follows: \begin{align*} \bm{\alpha}^{as} = .3 + \begin{pmatrix} \epsilon & - \frac{\epsilon}{2} & - \frac{\epsilon}{2}\\ - \frac{\epsilon}{2} & \epsilon & - \frac{\epsilon}{2}\\ - \frac{\epsilon}{2} & - \frac{\epsilon}{2} & \epsilon \end{pmatrix}, && \bm{\alpha}^{cp} = .3 + \begin{pmatrix} \frac{3 \epsilon}{2} & \epsilon & \frac{\epsilon}{2}\\ \epsilon & \frac{\epsilon}{2} & 0\\ \frac{\epsilon}{2} & 0 & - \frac{\epsilon}{2} \end{pmatrix}, && \bm{\alpha}^{dis} = .3 + \begin{pmatrix} - \frac{\epsilon}{2} & \epsilon & \epsilon\\ \epsilon & - \frac{\epsilon}{2} & \epsilon\\ \epsilon & \epsilon & - \frac{\epsilon}{2} \end{pmatrix}, \end{align*} with $\epsilon \in [.1, .4]$. $\bm{\alpha}^{as}$ represents a classical assortative community structure, while $\bm{\alpha}^{cp}$ is a layered core-periphery structure with block 2 acting as a semi-core. Finally, $\bm{\alpha}^{dis}$ is a disassortative community structure with stronger connections between blocks than within blocks. If $\epsilon = 0$, the three matrices are equal and the 9 networks have the same connection structure. Increasing $\epsilon$ differentiates the 3 sub-collections of networks. ```{r netclustering-ARI-boxplot, echo = FALSE} #| dpi = 300, #| fig.asp = 0.5, #| fig.cap = "\\label{}ARI of the partition obtained by clustering in function of $\\eps$" df_netclust %>% ggplot() + aes(x = as.factor(epsilon), y = ARI) + scale_color_okabe_ito() + scale_fill_okabe_ito() + xlab(TeX("$\\epsilon$")) + guides(fill = guide_legend(title = "Model")) + ylab("ARI of obtained netclustering") + geom_boxplot(aes(fill = model)) ``` \paragraph{Results} The evaluation of our method involves a comparison between the resulting partition of the network collection and the simulated partition using the ARI index. As the value of $\epsilon$ increases, our ability to distinguish between the networks improves, and this distinction becomes nearly perfect in all setups of the $colBiSBM$.