Contents

library(cellmig)
library(ggplot2)
library(ggforce)
ggplot2::theme_set(new = theme_bw(base_size = 10))

1 Introduction

High-throughput time-lapse microscopy allows us to track thousands of cells across many wells and plates. However, these experiments generate complex data with multiple sources of variability:

  1. Biological variability: Cells on different experimental plates (biological replicates) behave differently even under the same conditions.

  2. Technical variability: Differences between wells on the same plate.

  3. Batch effects: Systematic differences between plates (e.g., imaging days, reagent batches).

Standard statistical tests often ignore this hierarchical structure (cells \(\rightarrow\) wells \(\rightarrow\) plates), which can lead to false positives or missed discoveries.

The Bioconductor package cellmig addresses this by using Bayesian hierarchical models. It separates biological signals from technical noise and provides uncertainty-aware estimates (credible intervals) for treatment effects. This vignette guides you through installing cellmig, formatting your data, fitting a model, and interpreting the results.

2 Installation

To install this package, start R (version “4.5”) and enter:

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("cellmig")

3 Data Structure

3.1 Required Columns

Column Description Example
well Unique well ID “w1”, “w2”, …
plate Unique plate ID (Biological Replicate) “p1”, “p2”, …
compound Name of the compound/drug “DMSO”, “DrugA”, …
dose Concentration or level “0”, “10”, “high”, …
v Observed migration velocity (numeric) 0.54 (µm/min)
offset Batch correction flag (0 or 1) 0 or 1

3.1.1 Important: The offset Column

To correct for plate-to-plate batch effects, you must identify a control treatment that appears on every plate (e.g., DMSO).

  • Set offset = 1 for cells in this control group.

  • Set offset = 0 for all other treatments.

  • Note: The model uses this group to normalize velocities across plates

3.2 Example Data

We will use the simulated dataset included in the package. It contains:

  • 3 Plates (Biological replicates)

  • 378 Wells (Technical replicates nested in plates)

  • 6 Compounds at 7 Doses (42 Treatment Groups)

data("d", package = "cellmig")
str(d)
FALSE 'data.frame': 7560 obs. of  6 variables:
FALSE  $ well    : chr  "1" "1" "1" "1" ...
FALSE  $ plate   : chr  "1" "1" "1" "1" ...
FALSE  $ compound: chr  "C1" "C1" "C1" "C1" ...
FALSE  $ dose    : chr  "D1" "D1" "D1" "D1" ...
FALSE  $ v       : num  21.905 0.535 3.348 5.351 1.194 ...
FALSE  $ offset  : num  1 1 1 1 1 1 1 1 1 1 ...
head(d)
FALSE   well plate compound dose          v offset
FALSE 1    1     1       C1   D1 21.9054754      1
FALSE 2    1     1       C1   D1  0.5351645      1
FALSE 3    1     1       C1   D1  3.3484316      1
FALSE 4    1     1       C1   D1  5.3511913      1
FALSE 5    1     1       C1   D1  1.1942105      1
FALSE 6    1     1       C1   D1 26.7272783      1

4 Exploratory Data Analysis

Before modeling, it is good practice to visualize the raw data to check for obvious batch effects or outliers.

4.1 Raw Cell Velocities

Each dot represents a single cell. Facets show different compounds. Colors indicate plates. If plate colors cluster separately within facets, you likely have batch effects.

ggplot(data = d)+
  facet_wrap(facets = ~paste0("compound=", compound), 
             scales = "free_y", ncol = 2)+
  geom_sina(aes(x = as.factor(dose), col = plate, y = v, group = well), 
            size = 0.5)+
  theme_bw()+
  theme(legend.position = "top",
        strip.text.x = element_text(margin = margin(0.03,0,0.03,0, "cm")))+
  ylab(label = "migration velocity")+
  xlab(label = '')+
  scale_color_grey()+
  guides(color = guide_legend(override.aes = list(size = 3)))+
  guides(shape = guide_legend(override.aes = list(size = 3)))+
  scale_y_log10()+
  annotation_logticks(base = 10, sides = "l")

4.2 Mean Velocity per Well

Aggregating to the well level often makes batch effects clearer. Here, we see the mean velocity per well.

dm <- aggregate(v~well+plate+compound+dose, data = d, FUN = mean)
ggplot(data = dm)+
  facet_wrap(facets = ~paste0("compound=", compound), 
             scales = "free_y", ncol = 2)+
  geom_sina(aes(x = as.factor(dose), col = plate, y = v, group = well), 
            size = 1.5, alpha = 0.7)+
  theme_bw()+
  theme(legend.position = "top",
        strip.text.x = element_text(margin = margin(0.03,0,0.03,0, "cm")))+
  ylab(label = "migration velocity")+
  xlab(label = '')+
  scale_color_grey()+
  guides(color = guide_legend(override.aes = list(size = 3)))+
  guides(shape = guide_legend(override.aes = list(size = 3)))+
  scale_y_log10()+
  annotation_logticks(base = 10, sides = "l")

5 Model Fitting

Now we fit the hierarchical Bayesian model. The cellmig() function handles the complex Stan modeling behind the scenes.

5.1 Control Parameters

You can adjust the MCMC (Markov Chain Monte Carlo) settings via the control list.

  • mcmc_warmup: Steps to tune the sampler.

  • mcmc_steps: Actual sampling steps used for inference.

  • mcmc_chains: Number of independent chains (recommend \(\ge\) 2 for convergence checks).

Note: For this vignette, we use fewer steps for speed. For real analysis, increase mcmc_steps to 2000+.

o <- cellmig(x = d,
             control = list(mcmc_warmup = 300,  # Warmup iterations
                            mcmc_steps = 1000,  # Sampling iterations
                            mcmc_chains = 2,    # Number of chains
                            mcmc_cores = 2))    # Parallel cores

6 Interpreting Results

The model output o contains posterior distributions for all parameters. We focus on the overall treatment effects (\(\delta_t\)), which represent the change in velocity relative to the control (offset).

6.1 Overall Treatment Effects (\(\delta_t\))

The delta_t data frame contains the mean, median, and 95% Highest Density Intervals (HDI) for each treatment.

knitr::kable(o$posteriors$delta_t, digits = 2)
group_id mean se_mean sd X2.5. X25. X50. X75. X97.5. n_eff Rhat pmax group compound dose plate_id plate
1 -0.84 0 0.07 -0.97 -0.89 -0.84 -0.80 -0.71 1246.87 1 1.00 C2|D1 C2 D1 1 1
2 -0.44 0 0.07 -0.57 -0.49 -0.44 -0.39 -0.30 1781.80 1 1.00 C2|D2 C2 D2 1 1
3 -0.20 0 0.08 -0.34 -0.25 -0.20 -0.15 -0.05 2213.86 1 1.00 C2|D3 C2 D3 1 1
4 0.06 0 0.07 -0.07 0.02 0.06 0.11 0.19 1969.57 1 0.64 C2|D4 C2 D4 1 1
5 0.11 0 0.07 -0.02 0.07 0.11 0.16 0.25 1725.58 1 0.91 C2|D5 C2 D5 1 1
6 0.44 0 0.07 0.31 0.40 0.44 0.49 0.58 1397.83 1 1.00 C2|D6 C2 D6 1 1
7 0.82 0 0.07 0.68 0.77 0.82 0.86 0.95 1249.49 1 1.00 C2|D7 C2 D7 1 1
8 -0.48 0 0.07 -0.61 -0.53 -0.48 -0.43 -0.34 1663.34 1 1.00 C3|D1 C3 D1 1 1
9 -0.22 0 0.07 -0.36 -0.27 -0.22 -0.17 -0.09 1823.58 1 1.00 C3|D2 C3 D2 1 1
10 -0.02 0 0.07 -0.16 -0.06 -0.02 0.03 0.12 1709.57 1 0.17 C3|D3 C3 D3 1 1
11 0.01 0 0.07 -0.13 -0.04 0.01 0.06 0.16 1676.56 1 0.09 C3|D4 C3 D4 1 1
12 0.21 0 0.07 0.08 0.17 0.21 0.26 0.34 1259.91 1 1.00 C3|D5 C3 D5 1 1
13 0.14 0 0.07 0.00 0.09 0.14 0.18 0.29 1834.44 1 0.95 C3|D6 C3 D6 1 1
14 0.23 0 0.07 0.10 0.19 0.23 0.28 0.37 1424.11 1 1.00 C3|D7 C3 D7 1 1
15 -0.14 0 0.07 -0.27 -0.18 -0.14 -0.09 0.01 2159.42 1 0.94 C4|D1 C4 D1 1 1
16 0.01 0 0.07 -0.12 -0.03 0.01 0.05 0.15 1580.67 1 0.14 C4|D2 C4 D2 1 1
17 -0.18 0 0.07 -0.32 -0.23 -0.18 -0.14 -0.04 1810.97 1 0.99 C4|D3 C4 D3 1 1
18 0.01 0 0.07 -0.12 -0.04 0.01 0.06 0.14 1970.70 1 0.13 C4|D4 C4 D4 1 1
19 0.11 0 0.07 -0.03 0.06 0.11 0.15 0.24 1502.41 1 0.88 C4|D5 C4 D5 1 1
20 0.50 0 0.07 0.36 0.45 0.50 0.55 0.63 1550.91 1 1.00 C4|D6 C4 D6 1 1
21 0.52 0 0.07 0.39 0.47 0.52 0.56 0.65 1366.59 1 1.00 C4|D7 C4 D7 1 1
22 0.39 0 0.07 0.25 0.34 0.39 0.43 0.52 1583.15 1 1.00 C5|D1 C5 D1 1 1
23 0.19 0 0.07 0.05 0.14 0.19 0.24 0.33 2323.01 1 1.00 C5|D2 C5 D2 1 1
24 0.13 0 0.07 -0.01 0.08 0.13 0.17 0.26 2181.93 1 0.93 C5|D3 C5 D3 1 1
25 -0.01 0 0.07 -0.15 -0.05 0.00 0.04 0.14 1892.00 1 0.03 C5|D4 C5 D4 1 1
26 -0.04 0 0.07 -0.18 -0.09 -0.04 0.01 0.10 2362.06 1 0.42 C5|D5 C5 D5 1 1
27 -0.32 0 0.07 -0.46 -0.37 -0.32 -0.28 -0.18 1645.73 1 1.00 C5|D6 C5 D6 1 1
28 -0.37 0 0.07 -0.52 -0.42 -0.37 -0.32 -0.23 1247.07 1 1.00 C5|D7 C5 D7 1 1
29 0.56 0 0.07 0.42 0.51 0.56 0.61 0.69 1680.48 1 1.00 C6|D1 C6 D1 1 1
30 0.28 0 0.07 0.15 0.23 0.28 0.33 0.41 1984.44 1 1.00 C6|D2 C6 D2 1 1
31 0.00 0 0.07 -0.13 -0.05 0.00 0.05 0.14 2826.42 1 0.02 C6|D3 C6 D3 1 1
32 0.00 0 0.07 -0.14 -0.05 0.00 0.05 0.13 1633.77 1 0.01 C6|D4 C6 D4 1 1
33 -0.15 0 0.07 -0.29 -0.20 -0.15 -0.11 -0.02 2222.58 1 0.97 C6|D5 C6 D5 1 1
34 -0.30 0 0.07 -0.43 -0.35 -0.30 -0.25 -0.15 1655.60 1 1.00 C6|D6 C6 D6 1 1
35 -0.68 0 0.07 -0.81 -0.73 -0.69 -0.64 -0.55 1370.78 1 1.00 C6|D7 C6 D7 1 1

6.1.1 Visualizing Effects

  • Dot: Posterior mean effect.
  • Error Bar: 95% HDI (Uncertainty range).
  • Y-axis: Log-fold change (\(\delta\)).
    • If the error bar crosses 0, there is no clear evidence of an effect.
    • Positive values: Increased migration.
    • Negative values: Decreased migration.
ggplot(data = o$posteriors$delta_t) +
  geom_line(aes(x = dose, y = mean, col = compound, group = compound)) +
  geom_point(aes(x = dose, y = mean, col = compound)) +
  geom_errorbar(aes(x = dose, y = mean, ymin = X2.5., ymax = X97.5., 
                    col = compound), width = 0.1) +
  ylab(label = expression("Log-Fold Change ("*delta*")")) +
  xlab("Dose") +
  theme(legend.position = "top")

6.2 From Log-Fold-Change to Fold-Change

Log-scale values can be hard to interpret biologically. We often prefer Fold-Change, where 1 = no effect, >1 = increase, <1 = decrease. We calculate this by exponentiating the values (\(\exp(\delta)\)).

ggplot(data = o$posteriors$delta_t) +
  geom_line(aes(x = dose, y = exp(mean), col = compound, group = compound)) +
  geom_point(aes(x = dose, y = exp(mean), col = compound)) +
  geom_errorbar(aes(x = dose, y = exp(mean), ymin = exp(X2.5.), 
                    ymax = exp(X97.5.), col = compound), width = 0.1) +
  ylab(label = expression("Fold Change ("*delta*"')")) +
  xlab("Dose") +
  theme(legend.position = "top")

7 Pairwise Comparisons

Often, you want to compare specific treatments against each other (e.g., Drug A vs. Drug B), not just against the control.

7.1 Comparison Matrix

The get_pairs() function computes the difference between all treatment groups.

  • \(\rho\): Log-fold change difference between Treatment X and Treatment Y.

  • \(\pi\): Probability that the effect is truly different (0 to 1). Closer to 1 means strong evidence.

# Get pairwise comparisons (log-scale)
u <- get_pairs(x = o, exponentiate = FALSE)

7.1.1 Visualize \(\rho\)

# vislualize matrix of rhos
u$plot_rho

7.1.2 Visualize \(\pi\)

# visualize matrix of pis
u$plot_pi

7.1.3 Visualize \(\rho\) vs. \(\pi\) with a volcano plot

# visualize volcano plot
u$plot_volcano

Tip: Set exponentiate = TRUE to get fold-change ratios instead of log-differences.

7.2 Violin Plots for Specific Comparisons

To inspect the distribution of differences between a specific target group and all others, use get_violins().

# View available group labels
groups <- get_groups(x = o)
head(groups)
FALSE   group_id group compound dose
FALSE 1        1 C2|D1       C2   D1
FALSE 2        2 C2|D2       C2   D2
FALSE 3        3 C2|D3       C2   D3
FALSE 4        4 C2|D4       C2   D4
FALSE 5        5 C2|D5       C2   D5
FALSE 6        6 C2|D6       C2   D6
# Compare all groups against Compound 2, Dose 1
u_violin <- get_violins(x = o, 
                        from_groups = groups$group,
                        to_group = "C2|D1",
                        exponentiate = FALSE)
u_violin$plot

  • Violins: Show the posterior distribution of the difference.
  • Label: Probability (\(\pi\)) that the difference is non-zero.

8 Model Diagnostics

It is crucial to verify that the model fits your data well. cellmig provides Posterior Predictive Checks (PPC).

8.1 Cell-Level Check

Do the simulated velocities (pink) match the observed velocities (black)? If they overlap well, the model captures the data distribution.

g <- get_ppc_violins(x = o, wrap = TRUE, ncol = 3)
g + scale_y_log10()

8.2 Well-Level Check

Does the model predict the mean velocity per well accurately? Points should lie close to the diagonal line.

g <- get_ppc_means(x = o)
g

9 Variance Components

You can inspect how much variability comes from different sources (plates, wells, treatments). This helps in planning future experiments.

# Plate-specific baseline effects
g_alpha_p <- ggplot(data = o$posteriors$alpha_p) +
  geom_errorbarh(aes(y = plate, x = mean, xmin = X2.5., xmax = X97.5.),
                 height = 0.2) +
  geom_point(aes(y = plate, x = mean)) +
  xlab("Plate Effect (log-scale)")

# Variance parameters (Biological vs Technical)
g_sigma <- ggplot() +
  geom_errorbarh(data = o$posteriors$sigma_bio,
                 aes(y = "Biological (Plate)",
                     x = mean, xmin = X2.5., xmax = X97.5.), height = 0.2) +
  geom_errorbarh(data = o$posteriors$sigma_tech,
                 aes(y = "Technical (Well)",
                     x = mean, xmin = X2.5., xmax = X97.5.), height = 0.2) +
  geom_errorbarh(data = o$posteriors$sigma_delta,
                 aes(y = "Treatment Variation",
                     x = mean, xmin = X2.5., xmax = X97.5.), height = 0.2) +
  geom_point(data = o$posteriors$sigma_bio,
             aes(y = "Biological (Plate)", x = mean)) +
  geom_point(data = o$posteriors$sigma_tech,
             aes(y = "Technical (Well)", x = mean)) +
  geom_point(data = o$posteriors$sigma_delta,
             aes(y = "Treatment Variation", x = mean)) +
  xlab("Standard Deviation")

g_alpha_p | g_sigma

10 Advanced: Dose-Response Profiles

For screens with multiple compounds and overlapping doses, cellmig can cluster compounds based on their dose-response similarity, accounting for uncertainty.

get_dose_response_profile(x = o, exponentiate = TRUE) +
  patchwork::plot_layout(widths = c(.7, 1, 2))

11 Session Info

sessionInfo()
FALSE R version 4.6.0 RC (2026-04-17 r89917)
FALSE Platform: x86_64-pc-linux-gnu
FALSE Running under: Ubuntu 24.04.4 LTS
FALSE 
FALSE Matrix products: default
FALSE BLAS:   /home/biocbuild/bbs-3.24-bioc/R/lib/libRblas.so 
FALSE LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
FALSE 
FALSE locale:
FALSE  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
FALSE  [3] LC_TIME=en_GB              LC_COLLATE=C              
FALSE  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
FALSE  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
FALSE  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
FALSE [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
FALSE 
FALSE time zone: America/New_York
FALSE tzcode source: system (glibc)
FALSE 
FALSE attached base packages:
FALSE [1] stats     graphics  grDevices utils     datasets  methods   base     
FALSE 
FALSE other attached packages:
FALSE [1] ggforce_0.5.0    ggplot2_4.0.3    cellmig_1.3.3    BiocStyle_2.41.0
FALSE 
FALSE loaded via a namespace (and not attached):
FALSE  [1] ggiraph_0.9.6           tidyselect_1.2.1        dplyr_1.2.1            
FALSE  [4] farver_2.1.2            loo_2.9.0               S7_0.2.2               
FALSE  [7] fastmap_1.2.0           lazyeval_0.2.3          tweenr_2.0.3           
FALSE [10] fontquiver_0.2.1        digest_0.6.39           lifecycle_1.0.5        
FALSE [13] StanHeaders_2.32.10     tidytree_0.4.7          magrittr_2.0.5         
FALSE [16] compiler_4.6.0          rlang_1.2.0             sass_0.4.10            
FALSE [19] tools_4.6.0             yaml_2.3.12             knitr_1.51             
FALSE [22] labeling_0.4.3          htmlwidgets_1.6.4       pkgbuild_1.4.8         
FALSE [25] curl_7.1.0              plyr_1.8.9              RColorBrewer_1.1-3     
FALSE [28] aplot_0.2.9             withr_3.0.2             purrr_1.2.2            
FALSE [31] grid_4.6.0              polyclip_1.10-7         stats4_4.6.0           
FALSE [34] gdtools_0.5.0           inline_0.3.21           scales_1.4.0           
FALSE [37] MASS_7.3-65             tinytex_0.59            dichromat_2.0-0.1      
FALSE [40] cli_3.6.6               rmarkdown_2.31          treeio_1.37.0          
FALSE [43] generics_0.1.4          otel_0.2.0              RcppParallel_5.1.11-2  
FALSE [46] ggtree_4.3.0            reshape2_1.4.5          ape_5.8-1              
FALSE [49] cachem_1.1.0            rstan_2.32.7            stringr_1.6.0          
FALSE [52] parallel_4.6.0          ggplotify_0.1.3         BiocManager_1.30.27    
FALSE [55] matrixStats_1.5.0       vctrs_0.7.3             yulab.utils_0.2.4      
FALSE [58] V8_8.2.0                jsonlite_2.0.0          fontBitstreamVera_0.1.1
FALSE [61] bookdown_0.46           gridGraphics_0.5-1      patchwork_1.3.2        
FALSE [64] magick_2.9.1            systemfonts_1.3.2       tidyr_1.3.2            
FALSE [67] jquerylib_0.1.4         glue_1.8.1              codetools_0.2-20       
FALSE [70] stringi_1.8.7           gtable_0.3.6            QuickJSR_1.9.2         
FALSE [73] tibble_3.3.1            pillar_1.11.1           rappdirs_0.3.4         
FALSE [76] htmltools_0.5.9         R6_2.6.1                evaluate_1.0.5         
FALSE [79] lattice_0.22-9          ggfun_0.2.0             fontLiberation_0.1.0   
FALSE [82] bslib_0.10.0            rstantools_2.6.0        Rcpp_1.1.1-1.1         
FALSE [85] gridExtra_2.3           nlme_3.1-169            xfun_0.57              
FALSE [88] fs_2.1.0                pkgconfig_2.0.3