library(cellmig)
library(ggplot2)
library(ggforce)
ggplot2::theme_set(new = theme_bw(base_size = 10))
High-throughput time-lapse microscopy allows us to track thousands of cells across many wells and plates. However, these experiments generate complex data with multiple sources of variability:
Biological variability: Cells on different experimental plates (biological replicates) behave differently even under the same conditions.
Technical variability: Differences between wells on the same plate.
Batch effects: Systematic differences between plates (e.g., imaging days, reagent batches).
Standard statistical tests often ignore this hierarchical structure (cells \(\rightarrow\) wells \(\rightarrow\) plates), which can lead to false positives or missed discoveries.
The Bioconductor package cellmig addresses this by using
Bayesian hierarchical models. It separates biological signals from
technical noise and provides uncertainty-aware estimates (credible
intervals) for treatment effects. This vignette guides you through
installing cellmig, formatting your data, fitting a model, and
interpreting the results.
To install this package, start R (version “4.5”) and enter:
if (!require("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("cellmig")
| Column | Description | Example |
|---|---|---|
well |
Unique well ID | “w1”, “w2”, … |
plate |
Unique plate ID (Biological Replicate) | “p1”, “p2”, … |
compound |
Name of the compound/drug | “DMSO”, “DrugA”, … |
dose |
Concentration or level | “0”, “10”, “high”, … |
v |
Observed migration velocity (numeric) | 0.54 (µm/min) |
offset |
Batch correction flag (0 or 1) | 0 or 1 |
offset ColumnTo correct for plate-to-plate batch effects, you must identify a control treatment that appears on every plate (e.g., DMSO).
Set offset = 1 for cells in this control group.
Set offset = 0 for all other treatments.
Note: The model uses this group to normalize velocities across plates
We will use the simulated dataset included in the package. It contains:
3 Plates (Biological replicates)
378 Wells (Technical replicates nested in plates)
6 Compounds at 7 Doses (42 Treatment Groups)
data("d", package = "cellmig")
str(d)
FALSE 'data.frame': 7560 obs. of 6 variables:
FALSE $ well : chr "1" "1" "1" "1" ...
FALSE $ plate : chr "1" "1" "1" "1" ...
FALSE $ compound: chr "C1" "C1" "C1" "C1" ...
FALSE $ dose : chr "D1" "D1" "D1" "D1" ...
FALSE $ v : num 21.905 0.535 3.348 5.351 1.194 ...
FALSE $ offset : num 1 1 1 1 1 1 1 1 1 1 ...
head(d)
FALSE well plate compound dose v offset
FALSE 1 1 1 C1 D1 21.9054754 1
FALSE 2 1 1 C1 D1 0.5351645 1
FALSE 3 1 1 C1 D1 3.3484316 1
FALSE 4 1 1 C1 D1 5.3511913 1
FALSE 5 1 1 C1 D1 1.1942105 1
FALSE 6 1 1 C1 D1 26.7272783 1
Before modeling, it is good practice to visualize the raw data to check for obvious batch effects or outliers.
Each dot represents a single cell. Facets show different compounds. Colors indicate plates. If plate colors cluster separately within facets, you likely have batch effects.
ggplot(data = d)+
facet_wrap(facets = ~paste0("compound=", compound),
scales = "free_y", ncol = 2)+
geom_sina(aes(x = as.factor(dose), col = plate, y = v, group = well),
size = 0.5)+
theme_bw()+
theme(legend.position = "top",
strip.text.x = element_text(margin = margin(0.03,0,0.03,0, "cm")))+
ylab(label = "migration velocity")+
xlab(label = '')+
scale_color_grey()+
guides(color = guide_legend(override.aes = list(size = 3)))+
guides(shape = guide_legend(override.aes = list(size = 3)))+
scale_y_log10()+
annotation_logticks(base = 10, sides = "l")
Aggregating to the well level often makes batch effects clearer. Here, we see the mean velocity per well.
dm <- aggregate(v~well+plate+compound+dose, data = d, FUN = mean)
ggplot(data = dm)+
facet_wrap(facets = ~paste0("compound=", compound),
scales = "free_y", ncol = 2)+
geom_sina(aes(x = as.factor(dose), col = plate, y = v, group = well),
size = 1.5, alpha = 0.7)+
theme_bw()+
theme(legend.position = "top",
strip.text.x = element_text(margin = margin(0.03,0,0.03,0, "cm")))+
ylab(label = "migration velocity")+
xlab(label = '')+
scale_color_grey()+
guides(color = guide_legend(override.aes = list(size = 3)))+
guides(shape = guide_legend(override.aes = list(size = 3)))+
scale_y_log10()+
annotation_logticks(base = 10, sides = "l")
Now we fit the hierarchical Bayesian model. The cellmig() function
handles the complex Stan modeling behind the scenes.
You can adjust the MCMC (Markov Chain Monte Carlo) settings via the
control list.
mcmc_warmup: Steps to tune the sampler.
mcmc_steps: Actual sampling steps used for inference.
mcmc_chains: Number of independent chains (recommend \(\ge\) 2 for
convergence checks).
Note: For this vignette, we use fewer steps for speed. For real
analysis, increase mcmc_steps to 2000+.
o <- cellmig(x = d,
control = list(mcmc_warmup = 300, # Warmup iterations
mcmc_steps = 1000, # Sampling iterations
mcmc_chains = 2, # Number of chains
mcmc_cores = 2)) # Parallel cores
The model output o contains posterior distributions for all
parameters. We focus on the overall treatment effects (\(\delta_t\)),
which represent the change in velocity relative to the control (offset).
The delta_t data frame contains the mean, median, and 95% Highest
Density Intervals (HDI) for each treatment.
knitr::kable(o$posteriors$delta_t, digits = 2)
| group_id | mean | se_mean | sd | X2.5. | X25. | X50. | X75. | X97.5. | n_eff | Rhat | pmax | group | compound | dose | plate_id | plate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | -0.84 | 0 | 0.07 | -0.97 | -0.89 | -0.84 | -0.80 | -0.71 | 1246.87 | 1 | 1.00 | C2|D1 | C2 | D1 | 1 | 1 |
| 2 | -0.44 | 0 | 0.07 | -0.57 | -0.49 | -0.44 | -0.39 | -0.30 | 1781.80 | 1 | 1.00 | C2|D2 | C2 | D2 | 1 | 1 |
| 3 | -0.20 | 0 | 0.08 | -0.34 | -0.25 | -0.20 | -0.15 | -0.05 | 2213.86 | 1 | 1.00 | C2|D3 | C2 | D3 | 1 | 1 |
| 4 | 0.06 | 0 | 0.07 | -0.07 | 0.02 | 0.06 | 0.11 | 0.19 | 1969.57 | 1 | 0.64 | C2|D4 | C2 | D4 | 1 | 1 |
| 5 | 0.11 | 0 | 0.07 | -0.02 | 0.07 | 0.11 | 0.16 | 0.25 | 1725.58 | 1 | 0.91 | C2|D5 | C2 | D5 | 1 | 1 |
| 6 | 0.44 | 0 | 0.07 | 0.31 | 0.40 | 0.44 | 0.49 | 0.58 | 1397.83 | 1 | 1.00 | C2|D6 | C2 | D6 | 1 | 1 |
| 7 | 0.82 | 0 | 0.07 | 0.68 | 0.77 | 0.82 | 0.86 | 0.95 | 1249.49 | 1 | 1.00 | C2|D7 | C2 | D7 | 1 | 1 |
| 8 | -0.48 | 0 | 0.07 | -0.61 | -0.53 | -0.48 | -0.43 | -0.34 | 1663.34 | 1 | 1.00 | C3|D1 | C3 | D1 | 1 | 1 |
| 9 | -0.22 | 0 | 0.07 | -0.36 | -0.27 | -0.22 | -0.17 | -0.09 | 1823.58 | 1 | 1.00 | C3|D2 | C3 | D2 | 1 | 1 |
| 10 | -0.02 | 0 | 0.07 | -0.16 | -0.06 | -0.02 | 0.03 | 0.12 | 1709.57 | 1 | 0.17 | C3|D3 | C3 | D3 | 1 | 1 |
| 11 | 0.01 | 0 | 0.07 | -0.13 | -0.04 | 0.01 | 0.06 | 0.16 | 1676.56 | 1 | 0.09 | C3|D4 | C3 | D4 | 1 | 1 |
| 12 | 0.21 | 0 | 0.07 | 0.08 | 0.17 | 0.21 | 0.26 | 0.34 | 1259.91 | 1 | 1.00 | C3|D5 | C3 | D5 | 1 | 1 |
| 13 | 0.14 | 0 | 0.07 | 0.00 | 0.09 | 0.14 | 0.18 | 0.29 | 1834.44 | 1 | 0.95 | C3|D6 | C3 | D6 | 1 | 1 |
| 14 | 0.23 | 0 | 0.07 | 0.10 | 0.19 | 0.23 | 0.28 | 0.37 | 1424.11 | 1 | 1.00 | C3|D7 | C3 | D7 | 1 | 1 |
| 15 | -0.14 | 0 | 0.07 | -0.27 | -0.18 | -0.14 | -0.09 | 0.01 | 2159.42 | 1 | 0.94 | C4|D1 | C4 | D1 | 1 | 1 |
| 16 | 0.01 | 0 | 0.07 | -0.12 | -0.03 | 0.01 | 0.05 | 0.15 | 1580.67 | 1 | 0.14 | C4|D2 | C4 | D2 | 1 | 1 |
| 17 | -0.18 | 0 | 0.07 | -0.32 | -0.23 | -0.18 | -0.14 | -0.04 | 1810.97 | 1 | 0.99 | C4|D3 | C4 | D3 | 1 | 1 |
| 18 | 0.01 | 0 | 0.07 | -0.12 | -0.04 | 0.01 | 0.06 | 0.14 | 1970.70 | 1 | 0.13 | C4|D4 | C4 | D4 | 1 | 1 |
| 19 | 0.11 | 0 | 0.07 | -0.03 | 0.06 | 0.11 | 0.15 | 0.24 | 1502.41 | 1 | 0.88 | C4|D5 | C4 | D5 | 1 | 1 |
| 20 | 0.50 | 0 | 0.07 | 0.36 | 0.45 | 0.50 | 0.55 | 0.63 | 1550.91 | 1 | 1.00 | C4|D6 | C4 | D6 | 1 | 1 |
| 21 | 0.52 | 0 | 0.07 | 0.39 | 0.47 | 0.52 | 0.56 | 0.65 | 1366.59 | 1 | 1.00 | C4|D7 | C4 | D7 | 1 | 1 |
| 22 | 0.39 | 0 | 0.07 | 0.25 | 0.34 | 0.39 | 0.43 | 0.52 | 1583.15 | 1 | 1.00 | C5|D1 | C5 | D1 | 1 | 1 |
| 23 | 0.19 | 0 | 0.07 | 0.05 | 0.14 | 0.19 | 0.24 | 0.33 | 2323.01 | 1 | 1.00 | C5|D2 | C5 | D2 | 1 | 1 |
| 24 | 0.13 | 0 | 0.07 | -0.01 | 0.08 | 0.13 | 0.17 | 0.26 | 2181.93 | 1 | 0.93 | C5|D3 | C5 | D3 | 1 | 1 |
| 25 | -0.01 | 0 | 0.07 | -0.15 | -0.05 | 0.00 | 0.04 | 0.14 | 1892.00 | 1 | 0.03 | C5|D4 | C5 | D4 | 1 | 1 |
| 26 | -0.04 | 0 | 0.07 | -0.18 | -0.09 | -0.04 | 0.01 | 0.10 | 2362.06 | 1 | 0.42 | C5|D5 | C5 | D5 | 1 | 1 |
| 27 | -0.32 | 0 | 0.07 | -0.46 | -0.37 | -0.32 | -0.28 | -0.18 | 1645.73 | 1 | 1.00 | C5|D6 | C5 | D6 | 1 | 1 |
| 28 | -0.37 | 0 | 0.07 | -0.52 | -0.42 | -0.37 | -0.32 | -0.23 | 1247.07 | 1 | 1.00 | C5|D7 | C5 | D7 | 1 | 1 |
| 29 | 0.56 | 0 | 0.07 | 0.42 | 0.51 | 0.56 | 0.61 | 0.69 | 1680.48 | 1 | 1.00 | C6|D1 | C6 | D1 | 1 | 1 |
| 30 | 0.28 | 0 | 0.07 | 0.15 | 0.23 | 0.28 | 0.33 | 0.41 | 1984.44 | 1 | 1.00 | C6|D2 | C6 | D2 | 1 | 1 |
| 31 | 0.00 | 0 | 0.07 | -0.13 | -0.05 | 0.00 | 0.05 | 0.14 | 2826.42 | 1 | 0.02 | C6|D3 | C6 | D3 | 1 | 1 |
| 32 | 0.00 | 0 | 0.07 | -0.14 | -0.05 | 0.00 | 0.05 | 0.13 | 1633.77 | 1 | 0.01 | C6|D4 | C6 | D4 | 1 | 1 |
| 33 | -0.15 | 0 | 0.07 | -0.29 | -0.20 | -0.15 | -0.11 | -0.02 | 2222.58 | 1 | 0.97 | C6|D5 | C6 | D5 | 1 | 1 |
| 34 | -0.30 | 0 | 0.07 | -0.43 | -0.35 | -0.30 | -0.25 | -0.15 | 1655.60 | 1 | 1.00 | C6|D6 | C6 | D6 | 1 | 1 |
| 35 | -0.68 | 0 | 0.07 | -0.81 | -0.73 | -0.69 | -0.64 | -0.55 | 1370.78 | 1 | 1.00 | C6|D7 | C6 | D7 | 1 | 1 |
ggplot(data = o$posteriors$delta_t) +
geom_line(aes(x = dose, y = mean, col = compound, group = compound)) +
geom_point(aes(x = dose, y = mean, col = compound)) +
geom_errorbar(aes(x = dose, y = mean, ymin = X2.5., ymax = X97.5.,
col = compound), width = 0.1) +
ylab(label = expression("Log-Fold Change ("*delta*")")) +
xlab("Dose") +
theme(legend.position = "top")
Log-scale values can be hard to interpret biologically. We often prefer Fold-Change, where 1 = no effect, >1 = increase, <1 = decrease. We calculate this by exponentiating the values (\(\exp(\delta)\)).
ggplot(data = o$posteriors$delta_t) +
geom_line(aes(x = dose, y = exp(mean), col = compound, group = compound)) +
geom_point(aes(x = dose, y = exp(mean), col = compound)) +
geom_errorbar(aes(x = dose, y = exp(mean), ymin = exp(X2.5.),
ymax = exp(X97.5.), col = compound), width = 0.1) +
ylab(label = expression("Fold Change ("*delta*"')")) +
xlab("Dose") +
theme(legend.position = "top")
Often, you want to compare specific treatments against each other (e.g., Drug A vs. Drug B), not just against the control.
The get_pairs() function computes the difference between all treatment
groups.
\(\rho\): Log-fold change difference between Treatment X and Treatment Y.
\(\pi\): Probability that the effect is truly different (0 to 1). Closer to 1 means strong evidence.
# Get pairwise comparisons (log-scale)
u <- get_pairs(x = o, exponentiate = FALSE)
# vislualize matrix of rhos
u$plot_rho
# visualize matrix of pis
u$plot_pi
# visualize volcano plot
u$plot_volcano
Tip: Set exponentiate = TRUE to get fold-change ratios instead of
log-differences.
To inspect the distribution of differences between a specific target
group and all others, use get_violins().
# View available group labels
groups <- get_groups(x = o)
head(groups)
FALSE group_id group compound dose
FALSE 1 1 C2|D1 C2 D1
FALSE 2 2 C2|D2 C2 D2
FALSE 3 3 C2|D3 C2 D3
FALSE 4 4 C2|D4 C2 D4
FALSE 5 5 C2|D5 C2 D5
FALSE 6 6 C2|D6 C2 D6
# Compare all groups against Compound 2, Dose 1
u_violin <- get_violins(x = o,
from_groups = groups$group,
to_group = "C2|D1",
exponentiate = FALSE)
u_violin$plot
It is crucial to verify that the model fits your data well. cellmig
provides Posterior Predictive Checks (PPC).
Do the simulated velocities (pink) match the observed velocities (black)? If they overlap well, the model captures the data distribution.
g <- get_ppc_violins(x = o, wrap = TRUE, ncol = 3)
g + scale_y_log10()
Does the model predict the mean velocity per well accurately? Points should lie close to the diagonal line.
g <- get_ppc_means(x = o)
g
You can inspect how much variability comes from different sources (plates, wells, treatments). This helps in planning future experiments.
# Plate-specific baseline effects
g_alpha_p <- ggplot(data = o$posteriors$alpha_p) +
geom_errorbarh(aes(y = plate, x = mean, xmin = X2.5., xmax = X97.5.),
height = 0.2) +
geom_point(aes(y = plate, x = mean)) +
xlab("Plate Effect (log-scale)")
# Variance parameters (Biological vs Technical)
g_sigma <- ggplot() +
geom_errorbarh(data = o$posteriors$sigma_bio,
aes(y = "Biological (Plate)",
x = mean, xmin = X2.5., xmax = X97.5.), height = 0.2) +
geom_errorbarh(data = o$posteriors$sigma_tech,
aes(y = "Technical (Well)",
x = mean, xmin = X2.5., xmax = X97.5.), height = 0.2) +
geom_errorbarh(data = o$posteriors$sigma_delta,
aes(y = "Treatment Variation",
x = mean, xmin = X2.5., xmax = X97.5.), height = 0.2) +
geom_point(data = o$posteriors$sigma_bio,
aes(y = "Biological (Plate)", x = mean)) +
geom_point(data = o$posteriors$sigma_tech,
aes(y = "Technical (Well)", x = mean)) +
geom_point(data = o$posteriors$sigma_delta,
aes(y = "Treatment Variation", x = mean)) +
xlab("Standard Deviation")
g_alpha_p | g_sigma
For screens with multiple compounds and overlapping doses, cellmig can
cluster compounds based on their dose-response similarity, accounting
for uncertainty.
get_dose_response_profile(x = o, exponentiate = TRUE) +
patchwork::plot_layout(widths = c(.7, 1, 2))
sessionInfo()
FALSE R version 4.6.0 RC (2026-04-17 r89917)
FALSE Platform: x86_64-pc-linux-gnu
FALSE Running under: Ubuntu 24.04.4 LTS
FALSE
FALSE Matrix products: default
FALSE BLAS: /home/biocbuild/bbs-3.24-bioc/R/lib/libRblas.so
FALSE LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
FALSE
FALSE locale:
FALSE [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
FALSE [3] LC_TIME=en_GB LC_COLLATE=C
FALSE [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
FALSE [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
FALSE [9] LC_ADDRESS=C LC_TELEPHONE=C
FALSE [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
FALSE
FALSE time zone: America/New_York
FALSE tzcode source: system (glibc)
FALSE
FALSE attached base packages:
FALSE [1] stats graphics grDevices utils datasets methods base
FALSE
FALSE other attached packages:
FALSE [1] ggforce_0.5.0 ggplot2_4.0.3 cellmig_1.3.3 BiocStyle_2.41.0
FALSE
FALSE loaded via a namespace (and not attached):
FALSE [1] ggiraph_0.9.6 tidyselect_1.2.1 dplyr_1.2.1
FALSE [4] farver_2.1.2 loo_2.9.0 S7_0.2.2
FALSE [7] fastmap_1.2.0 lazyeval_0.2.3 tweenr_2.0.3
FALSE [10] fontquiver_0.2.1 digest_0.6.39 lifecycle_1.0.5
FALSE [13] StanHeaders_2.32.10 tidytree_0.4.7 magrittr_2.0.5
FALSE [16] compiler_4.6.0 rlang_1.2.0 sass_0.4.10
FALSE [19] tools_4.6.0 yaml_2.3.12 knitr_1.51
FALSE [22] labeling_0.4.3 htmlwidgets_1.6.4 pkgbuild_1.4.8
FALSE [25] curl_7.1.0 plyr_1.8.9 RColorBrewer_1.1-3
FALSE [28] aplot_0.2.9 withr_3.0.2 purrr_1.2.2
FALSE [31] grid_4.6.0 polyclip_1.10-7 stats4_4.6.0
FALSE [34] gdtools_0.5.0 inline_0.3.21 scales_1.4.0
FALSE [37] MASS_7.3-65 tinytex_0.59 dichromat_2.0-0.1
FALSE [40] cli_3.6.6 rmarkdown_2.31 treeio_1.37.0
FALSE [43] generics_0.1.4 otel_0.2.0 RcppParallel_5.1.11-2
FALSE [46] ggtree_4.3.0 reshape2_1.4.5 ape_5.8-1
FALSE [49] cachem_1.1.0 rstan_2.32.7 stringr_1.6.0
FALSE [52] parallel_4.6.0 ggplotify_0.1.3 BiocManager_1.30.27
FALSE [55] matrixStats_1.5.0 vctrs_0.7.3 yulab.utils_0.2.4
FALSE [58] V8_8.2.0 jsonlite_2.0.0 fontBitstreamVera_0.1.1
FALSE [61] bookdown_0.46 gridGraphics_0.5-1 patchwork_1.3.2
FALSE [64] magick_2.9.1 systemfonts_1.3.2 tidyr_1.3.2
FALSE [67] jquerylib_0.1.4 glue_1.8.1 codetools_0.2-20
FALSE [70] stringi_1.8.7 gtable_0.3.6 QuickJSR_1.9.2
FALSE [73] tibble_3.3.1 pillar_1.11.1 rappdirs_0.3.4
FALSE [76] htmltools_0.5.9 R6_2.6.1 evaluate_1.0.5
FALSE [79] lattice_0.22-9 ggfun_0.2.0 fontLiberation_0.1.0
FALSE [82] bslib_0.10.0 rstantools_2.6.0 Rcpp_1.1.1-1.1
FALSE [85] gridExtra_2.3 nlme_3.1-169 xfun_0.57
FALSE [88] fs_2.1.0 pkgconfig_2.0.3