158 lines
No EOL
6.2 KiB
Text
158 lines
No EOL
6.2 KiB
Text
# Application to \cite{doreRelativeEffectsAnthropogenic2021} data
|
|
\label{sec:application-to-dorerelativeeffectsanthropogenic2021-data}
|
|
|
|
```{r, setup, include=FALSE, warning=FALSE}
|
|
knitr::opts_chunk$set(echo = FALSE, dpi = 300)
|
|
```
|
|
|
|
```{r}
|
|
# import fix
|
|
if (getwd() == "/home/polarolouis/Nextcloud/Documents/APT/CEI/Stage Recherche Mathématiques/Depuis PC Portable/Stage MIA 2023/rapport-MIA-2023") {
|
|
path_to_add <- "Rcodes/real_data/"
|
|
} else {
|
|
path_to_add <- ""
|
|
}
|
|
```
|
|
|
|
```{r require_lib, echo = FALSE, include=FALSE, warning=FALSE}
|
|
require("tidyverse")
|
|
require("knitr")
|
|
require("colSBM")
|
|
require("ggplot2")
|
|
require("patchwork")
|
|
source(paste0(path_to_add, "temporary_plot.R"))
|
|
```
|
|
|
|
```{r better_collection_extraction, echo = FALSE, warning=FALSE}
|
|
extract_unlist_reorder <- function(clustering_data_path) {
|
|
clustering <- readRDS(clustering_data_path)
|
|
if (!is.list(clustering)) {
|
|
clustering <- list(clustering)
|
|
}
|
|
best_partition <- extract_best_bipartite_partition(clustering)
|
|
best_partition_unlist <- unlist(best_partition)
|
|
if (!is.list(best_partition_unlist)) {
|
|
best_partition_unlist <- list(best_partition_unlist)
|
|
}
|
|
size <- length(best_partition_unlist)
|
|
return(setNames(lapply(best_partition_unlist, function(collection) {
|
|
reorder_parameters(collection)
|
|
}), paste(rep("Collection", size), seq_len(size))))
|
|
}
|
|
```
|
|
|
|
```{r load data, echo = FALSE, include = FALSE, warning=FALSE}
|
|
# All results
|
|
iid_unlist <- extract_unlist_reorder(paste0(path_to_add, "data/dore_collection_clustering_nb_run1_iid_123networks_24-05-23-21:40:42.Rds"))
|
|
```
|
|
|
|
Here we apply the network clustering procedure (we refer to it as *netclustering*)
|
|
to the data from \cite{doreRelativeEffectsAnthropogenic2021}. These data are
|
|
plant-pollinator bipartite networks from differents areas and times.
|
|
|
|
In a second part we will use additional information for the networks to
|
|
try to identify the impact and correlations with the observed structures.
|
|
|
|
## Netclustering with the $iid\text{-}colBiSBM$ model
|
|
We obtained the more interpretable results with $iid\text{-}colBiSBM$ model.
|
|
This resulted in `r length(iid_unlist)` collections to partition the $M = 123$
|
|
networks.
|
|
|
|
```{r meso-plots, echo = FALSE, results='asis'}
|
|
#| fig.cap=sapply(seq_along(iid_unlist), function(idx) paste0("Collection N°", idx)),
|
|
##| fig.cap = "Structure of the collections in the partition and respective proportions of blocks",
|
|
##| fig.subcap = sapply(seq_along(iid_unlist), function(idx) paste0("Collection N°", idx)),
|
|
#| fig.asp = 0.5
|
|
|
|
|
|
meso_print <- function(unlisted_partition) {
|
|
for (idx in seq_along(unlisted_partition)) {
|
|
print(plot(unlisted_partition[[idx]], type = "meso", mixture = TRUE) + ggtitle(paste("Collection ", idx)))
|
|
cat("\\newline")
|
|
}
|
|
}
|
|
meso_print(iid_unlist)
|
|
|
|
```
|
|
|
|
In all the obtained collections the structure is the classical nested structure.
|
|
As this is a well-known structure for plant-pollinator data this tends to
|
|
indicate that we are not going in a wrong direction.
|
|
|
|
The \nth{3} collection consists of only one network, indicating that for this
|
|
model, the small76 network was really different of all the others.
|
|
One reason might be that it's the oldest network and maybe the data collection
|
|
protocol is different.
|
|
|
|
## Comparison with additional information
|
|
|
|
```{r supinfo, echo = FALSE}
|
|
supinfo <- readxl::read_xlsx(paste0(path_to_add, "data/supinfo.xlsx"), sheet = 2)
|
|
interaction_data <- read.table(file = paste0(path_to_add, "data/interaction-data.txt"), sep = "\t", header = TRUE)
|
|
|
|
seq_ids_network_aggreg <- unique(interaction_data$id_network_aggreg)
|
|
incidence_matrices <- readRDS(file = paste0(path_to_add, "data/dore-matrices.Rds"))
|
|
names_aggreg_networks <- names(incidence_matrices)
|
|
vectorClusteringNet <- numeric(nrow(supinfo))
|
|
for (k in 1:length(iid_unlist)) {
|
|
idclust <- match(iid_unlist[[k]]$net_id, names_aggreg_networks)
|
|
supinfoclust <- match(seq_ids_network_aggreg[idclust], supinfo$Idweb)
|
|
vectorClusteringNet[supinfoclust] <- k
|
|
}
|
|
```
|
|
|
|
Using supplementary information we obtain the following boxplots.
|
|
|
|
|
|
```{r boxplot-function, echo = FALSE}
|
|
supinfo_boxplot <- function(parameter, pretty_name) {
|
|
return(ggplot(supinfo) +
|
|
aes(
|
|
x = vectorClusteringNet, y = parameter,
|
|
fill = as.factor(vectorClusteringNet), group = as.factor(vectorClusteringNet)
|
|
) +
|
|
geom_boxplot() +
|
|
labs(
|
|
x = "Collection number", y = pretty_name,
|
|
fill = "Collection number"
|
|
))
|
|
}
|
|
```
|
|
|
|
```{r boxplots_annual_timespan, echo = FALSE}
|
|
#| fig.cap = "\\label{fig:boxplot-annual-time-span}Boxplot of annual time span in function of the collection number"
|
|
ggplot(supinfo) +
|
|
aes(x = vectorClusteringNet, y = Annual_time_span,
|
|
fill = as.factor(vectorClusteringNet), group = as.factor(vectorClusteringNet)) +
|
|
geom_boxplot() +
|
|
labs(
|
|
x = "Collection number", y = "Annual time span",
|
|
fill = "Collection number"
|
|
)
|
|
```
|
|
|
|
The annual time span is the number of days the sampling period lasted. So we can
|
|
thus see in figure \ref{fig:boxplot-annual-time-span} that collections 1 and 4 were
|
|
sampled for a larger period of time than collections 2 and 5.
|
|
This could explain observed differences in the structure detected : the
|
|
"checkerboard" appearance of the alpha matrices representations may represent
|
|
interactions that only occurs at a given period of time.
|
|
Thus the shorter time period doesn't capture such interactions.
|
|
|
|
```{r boxplot_rainfall, echo = FALSE}
|
|
#| fig.cap = "\\label{fig:boxplot-total-rainfall}Boxplot of total rainfall in function of the collection number"
|
|
supinfo_boxplot(supinfo$Tot_Rainfall_IPCC, "Total rainfall")
|
|
```
|
|
|
|
There seems to be the same trend for the total rainfall.
|
|
|
|
```{r boxplot_sampling_effort, echo = FALSE}
|
|
#| fig.cap = "\\label{fig:boxplot-sampling-effort}Boxplot of the sampling effort in function of the collection number"
|
|
supinfo_boxplot(supinfo$Sampling_effort, "Sampling effort")
|
|
```
|
|
|
|
The sampling effort seems to be quite higher for collection 5 and a little
|
|
higher for collection 2. And collection 1 and 4 have the inverse trend. The
|
|
separation between collections 1,4 and 2,5 seems to still hold. And the sampling
|
|
effort is related to the sampling time that is why it's higher for the
|
|
collections that were sampled for a shorter time period. |