2 Phenotypes of interest
2.1 Somite development period
Figure 2.1 shows the period data generated by pyBOAT for this study, for 100 illustrative F2 samples over 300 minutes. The same data are represented as boxplots in Figure 2.2. We experimented with using the F2 individuals’ mean period and period intercept as the phenotype of interest. The two measures are highly correlated (Pearson's \(r\) = 0.84, \(p < 2.2 \times 10^{-16}\)), so after displaying the distributions for both measures in Figure 2.4, we only show the analysis of period intercept, as it appears to be more robust to the changes in slope that can be observed in Figure 2.1.
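The difference between the two summary measures can be illustrated with a toy example. This is only a sketch: it assumes that the period intercept is the intercept of a per-sample linear fit of period against time, which this section does not spell out, and the data frame `toy`, its columns and the helper `summarise_period()` are hypothetical.

```r
# Two hypothetical samples with the same starting period (60 minutes)
# but different slopes over a 300-minute window.
time_min <- seq(0, 300, by = 10)
toy <- rbind(
  data.frame(sample = "s1", time = time_min, period = 60 + 0.02 * time_min),
  data.frame(sample = "s2", time = time_min, period = 60 + 0.06 * time_min)
)

# Summarise one sample by its mean period and by the intercept of a
# linear fit of period against time (assumed definition of "intercept").
summarise_period <- function(d) {
  fit <- lm(period ~ time, data = d)
  c(mean = mean(d$period), intercept = unname(coef(fit)["(Intercept)"]))
}

res <- t(sapply(split(toy, toy$sample), summarise_period))
res
```

In this toy case the mean period differs between the two samples (63 versus 69 minutes) purely because of the slope, whereas the fitted intercept is approximately 60 minutes for both.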
2.2 Unsegmented presomitic mesoderm (PSM) area
In the following analyses, we also included a second phenotype of interest: the total area of the unsegmented tissue at the stage where 10-11 somites had been formed (PSM area). As the measure is simply based on the total number of pixels covered by the embryo object, we considered it to be potentially more robust than the period measurements, and therefore included it as a type of positive control for the genetic association analyses on the period phenotype. The PSM area measurements for the F0 Cab and Kaga strains are compared in Figure 2.3.
2.3 Comparisons between F0, F1 and F2 generations
Figure 2.4 shows the distributions of the period intercept and unsegmented PSM area phenotypes across the F0, F1 and F2 generations. For the period intercept phenotype, only the Cab strain is shown for F0 because only the Cab strain carries the reporter gene, which precluded the collection of data for Kaga using the pyBOAT method. However, from previous bright field image analyses (which did not require the reporter), we determined that the Kaga strain has a lower (i.e. faster) period than Cab by around 10 minutes (see Figure 1.1). Given these differences, the F1 generation shows the expected intermediate median between the Cab and Kaga F0 strains. We also expected the F2 generation to have a similar median to the F1 generation, but with a wider variance spanning the extremes of the two F0 parental strains.
Instead, we observed that the F2 generation has a median that is slightly slower than the median of the slower-period F0 Cab strain. These observations were unlikely to have been caused by technical issues. A possible biological explanation is that there are more genetic combinations that slow down the clock than combinations that speed it up (Sanchez et al. 2022; Schröter and Oates 2010). This phenomenon could be exacerbated by the Cab and Kaga strains originating from different Japanese medaka populations (southern and northern respectively), which are understood to be at the point of speciation (Katsumura et al. 2019). This slower period may therefore be driven by a biological incompatibility between their genomes in cases where they do not have a complete chromosome from each parent (as the F1 generation does). We nevertheless proceeded with the genetic analysis with a view to potentially discovering the reason for this unusual distribution.
Code
IN_F01 = here::here("data/F0_F1_period.xlsx")
IN_F2 = here::here("config/phenos_with_reporter_genoandpheno.csv")

########################
# Plotting parameters
########################
# Intercept
intercept_pal = c("#8D99AE", "#2b2d42")
# Mean
mean_pal = c("#177E89", "#084C61")
# PSM
unsegmented_psm_area_pal = c("#D9D0DE", "#401F3E")
# Get lighter/darker functions
devtools::source_gist("c5015ee666cdf8d9f7e25fa3c8063c99")

########################
# Read in file
########################
df_f2 = readr::read_delim(IN_F2, delim = ";") %>%
  # add `GEN` column
  dplyr::mutate(GEN = "F2")

# Read in F0 and F1 data
df_f01 = readxl::read_xlsx(IN_F01) %>%
  dplyr::mutate(sample = fish) %>%
  dplyr::mutate(GEN = dplyr::case_when(str_detect(fish, "^C") ~ "F0",
                                       str_detect(fish, "^K") ~ "F1"))
# Bind two data frames
df_all = dplyr::bind_rows(df_f01, df_f2) %>%
  # factorise Microscope
  dplyr::mutate(Microscope = factor(Microscope, levels = c("AU", "DB")))

########################
# Kruskal-Wallis test
########################
## Difference between microscopes in period intercept for F2s
kw_df = df_all %>%
  # take only F2 individuals
  dplyr::filter(GEN == "F2") %>%
  # pivot longer to put phenotypes values in one column
  tidyr::pivot_longer(cols = c(mean, intercept, unsegmented_psm_area),
                      names_to = "phenotype",
                      values_to = "value") %>%
  dplyr::group_by(phenotype) %>%
  tidyr::nest() %>%
  dplyr::mutate(model = purrr::map(data,
                                   ~kruskal.test(x = .$value, g = .$Microscope))) %>%
  dplyr::select(-data) %>%
  dplyr::mutate(model_tidy = purrr::map(model, broom::tidy)) %>%
  tidyr::unnest(model_tidy) %>%
  rstatix::add_significance(p.col = "p.value") %>%
  # remove model
  dplyr::select(-model) %>%
  # reduce to 3 digits
  dplyr::mutate(p.value = signif(p.value, digits = 3)) %>%
  # paste p-value with significance
  dplyr::mutate(p_final = dplyr::case_when(p.value.signif == "ns" ~ paste("p =", p.value),
                                           TRUE ~ paste("p =", p.value, p.value.signif))) %>%
  # add `Microscope` column with 'DB' so that the text maps there on the plots
  dplyr::mutate(Microscope = factor("DB", levels = c("AU", "DB")))

########################
# Plot
########################
########### Intercept
intercept_fig = df_all %>%
  # remove NAs
  dplyr::filter(!is.na(Microscope)) %>%
  ggplot(aes(GEN, intercept, fill = Microscope)) +
  geom_violin() +
  geom_boxplot(width = 0.3) +
  ggbeeswarm::geom_beeswarm(aes(GEN, intercept, colour = Microscope), size = 0.4, alpha = 0.5) +
  facet_grid(cols = vars(Microscope)) +
  scale_colour_manual(values = lighter(intercept_pal, amount = 50)) +
  scale_fill_manual(values = darker(intercept_pal, amount = 50)) +
  cowplot::theme_cowplot() +
  theme(strip.background.x = element_blank(),
        strip.text.x = element_text(face = "bold")) +
  xlab("generation") +
  ylab("period intercept (minutes)") +
  guides(fill = "none",
         colour = "none") +
  # add p-value
  geom_text(data = kw_df %>%
              dplyr::filter(phenotype == "intercept"),
            aes(x = "F2", y = -Inf, label = p_final,
                vjust = -1
                ))
########### PSM
psm_fig = df_all %>%
  # remove NAs
  dplyr::filter(!is.na(Microscope)) %>%
  ggplot(aes(GEN, unsegmented_psm_area, fill = Microscope)) +
  geom_violin() +
  geom_boxplot(width = 0.3) +
  ggbeeswarm::geom_beeswarm(aes(GEN, unsegmented_psm_area, colour = Microscope), size = 0.4, alpha = 0.5) +
  facet_grid(cols = vars(Microscope)) +
  scale_colour_manual(values = lighter(unsegmented_psm_area_pal, amount = 50)) +
  scale_fill_manual(values = darker(unsegmented_psm_area_pal, amount = 50)) +
  cowplot::theme_cowplot() +
  theme(strip.background.x = element_blank(),
        strip.text.x = element_text(face = "bold")) +
  xlab("generation") +
  ylab("unsegmented PSM area (pixels)") +
  guides(fill = "none",
         colour = "none") +
  # add p-value
  geom_text(data = kw_df %>%
              dplyr::filter(phenotype == "unsegmented_psm_area"),
            aes(x = "F2", y = -Inf, label = p_final,
                vjust = -1
                ))
period_final = cowplot::plot_grid(intercept_fig,
                                  psm_fig,
                                  align = "hv",
                                  nrow = 2,
                                  labels = c("A", "B"),
                                  label_size = 16)

period_final
Another important issue to note is that the F2 individuals were imaged using different microscopes of the same model (Zeiss LSM 780) but with different temperature control units and incubator boxes, denoted as ‘AU’ and ‘DB’.[1] We noticed a temperature difference of 0.7-0.8°C between the microscopes, which translated to a 4-minute difference in the F2 means for the period intercept measure (Kruskal-Wallis \(\chi^2\) = 177.97, \(p = 1.34 \times 10^{-40}\)), and a 3.5-minute difference in the F2 means for the period mean measure (Kruskal-Wallis \(\chi^2\) = 141.79, \(p = 1.08 \times 10^{-32}\)). This difference would need to be accounted for in the downstream analysis, either by adjusting the phenotype prior to running the genetic association model, or by including microscope as a covariate in the model. For unsegmented PSM area, we did not find a significant difference between microscopes, so we determined that it was not necessary to control for microscope in the downstream analysis for this phenotype.
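The two options for handling the microscope effect can be sketched on simulated data. Everything here is an illustrative assumption rather than the analysis used in this study: the data frame `toy`, the `genotype` column, the effect sizes, the use of `lm()` as a stand-in for the genetic association model, and the use of residualisation as a stand-in for the phenotype adjustment.

```r
set.seed(42)

# Simulated F2-like data: two microscopes, one biallelic marker (0/1/2),
# and a hypothetical 4-minute offset on the 'DB' microscope.
toy <- data.frame(
  Microscope = factor(rep(c("AU", "DB"), each = 100), levels = c("AU", "DB")),
  genotype   = rbinom(200, size = 2, prob = 0.5)
)
toy$intercept <- 60 + 4 * (toy$Microscope == "DB") +
  0.5 * toy$genotype + rnorm(200, sd = 3)

# Kruskal-Wallis test for a difference between microscopes
kw <- kruskal.test(intercept ~ Microscope, data = toy)
kw$statistic
kw$p.value

# Option 1: adjust the phenotype first (here by removing the
# per-microscope mean), then test the marker on the adjusted values
toy$intercept_adj <- resid(lm(intercept ~ Microscope, data = toy))
fit_adj <- lm(intercept_adj ~ genotype, data = toy)

# Option 2: keep the raw phenotype and include microscope as a covariate
fit_cov <- lm(intercept ~ genotype + Microscope, data = toy)

coef(fit_adj)
coef(fit_cov)
```

After the adjustment in option 1, the per-microscope means of `intercept_adj` are zero by construction, so the marker is tested against values that no longer carry the microscope offset. In option 2 the offset is absorbed by the `MicroscopeDB` coefficient instead.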
2.3.0.1 Inverse-normalisation
To resolve this difference between microscopes for the period intercept data, we elected to transform it for the F2 generation by “inverse-normalising” the period intercept within each microscope (Figure 2.5), and used this transformed phenotype for the downstream analysis. Inverse-normalisation is a rank-based normalisation approach which involves replacing the values in the phenotype vector with their rank (where ties are averaged), then converting the ranks into a normal distribution with the quantile function (Wichura 1988). The inverse-normalisation function we used for this analysis is set out in the following R code:
invnorm = function(x) {
  res = rank(x)
  # The arbitrary 0.5 value is added to the denominator below
  # to avoid `qnorm()` returning 'Inf' for the last-ranked value
  res = qnorm(res/(length(res)+0.5))
  return(res)
}
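As a minimal usage sketch, the function above can be applied within each microscope using base R's `ave()`. The simulated data frame `toy` and the 4-minute offset are assumptions for illustration only; the function definition is repeated so that the example is self-contained.

```r
# Same inverse-normalisation function as above
invnorm <- function(x) {
  res <- rank(x)
  qnorm(res / (length(res) + 0.5))
}

set.seed(1)

# Hypothetical period intercepts with an offset between microscopes
toy <- data.frame(
  Microscope = rep(c("AU", "DB"), each = 50),
  intercept  = c(rnorm(50, mean = 60, sd = 3), rnorm(50, mean = 64, sd = 3))
)

# Inverse-normalise within each microscope
toy$intercept_invnorm <- ave(toy$intercept, toy$Microscope, FUN = invnorm)

# Raw means differ between microscopes; transformed means coincide
raw_means   <- tapply(toy$intercept, toy$Microscope, mean)
trans_means <- tapply(toy$intercept_invnorm, toy$Microscope, mean)
raw_means
trans_means
```

Because both groups here have the same size and no ties, they are mapped onto the same set of normal quantiles, so the offset between microscopes disappears. Note that dividing the ranks by `length(res) + 0.5` places the smallest and largest values at slightly different tail probabilities, so the transformed values are not exactly symmetric about zero.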
Code
# Set variables
## Debug
IN_F2 = here::here("config/phenos_with_reporter_genoandpheno.csv")
OUT_PNG = here::here("book/plots/phenotypes/invnorm_intercept.png")
OUT_PDF = here::here("book/plots/phenotypes/invnorm_intercept.pdf")

########################
# Plotting parameters
########################
# Intercept
intercept_pal = c("#8D99AE", "#2b2d42")
# Mean
mean_pal = c("#177E89", "#084C61")
# PSM
unsegmented_psm_area_pal = c("#D9D0DE", "#401F3E")
# Get lighter/darker functions
devtools::source_gist("c5015ee666cdf8d9f7e25fa3c8063c99")

########################
# Read in file
########################
df_f2 = readr::read_delim(IN_F2, delim = ";") %>%
  # add `GEN` column
  dplyr::mutate(GEN = "F2") %>%
  # factorise Microscope
  dplyr::mutate(Microscope = factor(Microscope, levels = c("AU", "DB")))

# Get means per microscope
f2_means_notrans = df_f2 %>%
  dplyr::filter(!is.na(Microscope)) %>%
  dplyr::group_by(Microscope) %>%
  dplyr::summarise(MEAN = mean(intercept, na.rm = T))

########################
# Plot
########################
########### Histogram raw
raw_fig = df_f2 %>%
  # remove NAs
  dplyr::filter(!is.na(Microscope)) %>%
  ggplot(aes(intercept)) +
  geom_histogram(aes(y = ..density.., fill = Microscope), bins = 40) +
  geom_density(aes(colour = Microscope)) +
  geom_vline(data = f2_means_notrans, aes(xintercept = MEAN)) +
  scale_fill_manual(values = intercept_pal) +
  scale_colour_manual(values = darker(intercept_pal, amount = 75)) +
  cowplot::theme_cowplot() +
  facet_grid(rows = vars(GEN, Microscope)) +
  xlab('period intercept') +
  guides(fill = "none", colour = "none") +
  labs(subtitle = "original data")
########### Histogram inverse-normalised
trans_df = df_f2 %>%
  # inverse-normalise within microscope
  dplyr::group_by(Microscope) %>%
  dplyr::mutate(intercept = invnorm(intercept)) %>%
  dplyr::ungroup() %>%
  # remove NAs
  dplyr::filter(!is.na(Microscope))

# Get means per microscope
f2_means_trans = trans_df %>%
  dplyr::filter(!is.na(Microscope)) %>%
  dplyr::group_by(Microscope) %>%
  dplyr::summarise(MEAN = mean(intercept, na.rm = T))

trans_fig = trans_df %>%
  # plot
  ggplot(aes(intercept)) +
  geom_histogram(aes(y = ..density.., fill = Microscope), bins = 40) +
  geom_density(aes(colour = Microscope)) +
  geom_vline(data = f2_means_trans, aes(xintercept = MEAN)) +
  scale_fill_manual(values = intercept_pal) +
  scale_colour_manual(values = darker(intercept_pal, amount = 75)) +
  cowplot::theme_cowplot() +
  facet_grid(rows = vars(GEN, Microscope)) +
  xlab('period intercept (inverse-normalised)') +
  guides(fill = "none", colour = "none") +
  labs(subtitle = "inverse-normalised within microscope")
########### Together
final = cowplot::plot_grid(raw_fig,
                           trans_fig,
                           align = "hv",
                           nrow = 2,
                           labels = c("A", "B"),
                           label_size = 16)

final
[1] ‘AU’ for the Aulehla Lab microscope, and ‘DB’ for EMBL-Heidelberg’s Developmental Biology Unit microscope.