To install the current version of scLANE from
BioConductor, run the following:
if (!requireNamespace("BiocManager", quietly = TRUE)) {
install.packages("BiocManager")
}
BiocManager::install("scLANE")Alternatively, you can install the development version of
scLANE using:
Single cell RNA-seq technologies allow scientists to profile developmental processes at the cellular level. Trajectory inference comprises a broad set of methods including pseudotime estimation, RNA velocity, and graph abstraction. These methods usually seek to identify some sort of pseudotemporal ordering of cells based on their transcriptomic similarity, with the assumption that it’s possible to reconstruct the biological process being studied based on gene expression. This ordering is then used to perform trajectory differential expression (TDE) testing, wherein pseudotime is treated as a covariate and gene expression is the response. Genes with statistically significant associations with pseudotime are then investigated to determine their biological effect on the process being studied.
In order to properly characterize transcriptional dynamics across
trajectories models must be able to handle nonlinearity. This is
traditionally done using generalized additive models (GAMs), though
interpretation of that type of model is tricky and often subjective. The
scLANE method proposes a negative-binomial nonlinear model
that is piecewise linear across empirically chosen pseudotime intervals;
within each interval the model is interpretable as a classical
generalized linear model (GLM). This package is comprised of an
efficient implementation of that model for both single- and
multi-subject datasets, along with a suite of downstream analysis
utilities.
First we’ll need to load some packages & resolve a few function conflicts.
We’ll start by reading in some simulated scRNA-seq data with a simple trajectory structure that is homogeneous across several subjects.
The ground-truth pseudotime ordering is well-represented in PCA space:
PCA embedding showing ground-truth pseudotime
We don’t observe any clustering by subject ID, indicating that the trajectory is consistent across subjects.
PCA embedding showing subject ID
scLANE testingThe necessary inputs for scLANE are as follows: a
dataframe containing \(\ge 1\)
pseudotime lineages, a sequencing depth-based offset, and a matrix of
gene expression counts (this can be a SingleCellExperiment
or Seurat object, or an actual dense or sparse counts
matrix). In addition, we can optionally provide a subset of genes to
test - here we identify the top 1,000 most highly-variable genes (HVGs),
and test only those.
pt_df <- data.frame(PT = sim_counts$cell_time_normed)
cell_offset <- createCellOffset(sim_counts)
top1k_hvgs <- getTopHVGs(modelGeneVar(sim_counts), n = 1e3)The default modeling framework is based on GLMs, which we use here.
Parallel processing is turned on by default in order to speed things up.
Setting the verbose = TRUE parameter would print a progress
bar to the console. For the sake of showing an example, we are fitting
only five genes on a single core.
scLANE_models <- testDynamic(sim_counts,
pt = pt_df,
genes = top1k_hvgs[1:5],
size.factor.offset = cell_offset,
n.cores = 1,
verbose = FALSE
)## Registered S3 method overwritten by 'bit':
## method from
## print.ri gamlss
##
## Attaching package: 'MASS'
## The following object is masked _by_ '.GlobalEnv':
##
## select
## The following object is masked from 'package:glm2':
##
## crabs
## The following object is masked from 'package:dplyr':
##
## select
## scLANE testing in GLM mode completed for 5 genes across 1 lineage in 2.702 secs
After model fitting has completed, we generate a tidy table of DE
test results and statistics using the getResultsDE()
function. Each gene is given a binary dynamic vs. static classification
(denoted as 1 or 0, respectively) over a given lineage / the trajectory
as a whole if it’s adjusted p-value is less than a specified
threshold; the default is \(\alpha =
0.01\).
scLANE_de_res <- getResultsDE(scLANE_models)
select(scLANE_de_res, Gene, Test_Stat, P_Val, P_Val_Adj, Gene_Dynamic_Overall) %>%
slice_sample(n = 5) %>%
knitr::kable(
digits = 3,
caption = "Sample of scLANE TDE statistics",
col.names = c("Gene", "LRT Stat.", "P-value", "Adj. P-value", "Pred. Status")
)| Gene | LRT Stat. | P-value | Adj. P-value | Pred. Status |
|---|---|---|---|---|
| ACLY | 489.493 | 0 | 0 | 1 |
| ABHD14B | 666.422 | 0 | 0 | 1 |
| ADAMTSL2 | 633.573 | 0 | 0 | 1 |
| ADRM1 | 261.477 | 0 | 0 | 1 |
| ACAA2 | 715.468 | 0 | 0 | 1 |
The package offers a variety of different downstream analysis functionalities, several of which we’ll explore here.
The plotModels() function allows us to compare different
modeling frameworks. Here we visualize the most significant gene and
compare our model with a GLM and a GAM. The monotonic GLM fails to
capture the inherent nonlinearity of the gene’s dynamics over
pseudotime, and while the GAM does capture that trend it lacks the
interpretability of the scLANE model.
plotModels(scLANE_models,
gene = scLANE_de_res$Gene[1],
pt = pt_df,
expr.mat = sim_counts,
size.factor.offset = cell_offset,
plot.glm = TRUE,
plot.gam = TRUE
)## Ignoring unknown labels:
## • fill : "Lineage"
Modeling framework comparison
Each gene’s output contains a slot named Gene_Dynamics
that summarizes the coefficients across each pseudotime interval.
## Gene Lineage Breakpoint1 Breakpoint2 Slope.Segment1 Slope.Segment2
## 1 ACAA2 A 0.2409 0.7244 0 -1.60855
## Slope.Segment3 Trend.Segment1 Trend.Segment2 Trend.Segment3
## 1 -10.1353 0 -1 -1
Using the getFittedValues() function we can generate a
table of per-gene, per-cell expression estimations on a variety of
scales (raw, depth-normalized, and log1p-normalized). Here we visualize
the fitted dynamics for the top four most significantly TDE genes.
getFittedValues(scLANE_models,
genes = scLANE_de_res$Gene[1:4],
pt = pt_df,
expr.mat = sim_counts,
size.factor.offset = cell_offset,
cell.meta.data = data.frame(cluster = sim_counts$label)
) %>%
ggplot(aes(x = pt, y = rna_log1p)) +
facet_wrap(~gene, ncol = 2) +
geom_point(aes(color = cluster),
size = 2,
alpha = 0.75,
stroke = 0
) +
geom_ribbon(aes(ymin = scLANE_ci_ll_log1p, ymax = scLANE_ci_ul_log1p),
linewidth = 0,
fill = "grey70",
alpha = 0.9
) +
geom_line(aes(y = scLANE_pred_log1p),
color = "black",
linewidth = 0.75
) +
scale_x_continuous(labels = scales::label_number(accuracy = 0.01)) +
labs(
x = "Pseudotime",
y = "Normalized Expression",
color = "Leiden"
) +
theme_scLANE() +
theme(strip.text.x = element_text(face = "italic")) +
guides(color = guide_legend(override.aes = list(alpha = 1, size = 2, stroke = 1)))Dynamics of the top 4 most TDE genes
To generate a heatmap of expression cascades, we first pull a matrix of gene dynamics for all genes that are significantly TDE.
dyn_genes <- filter(scLANE_de_res, Gene_Dynamic_Overall == 1) %>%
pull(Gene)
smoothed_counts <- smoothedCountsMatrix(scLANE_models,
size.factor.offset = cell_offset,
pt = pt_df,
genes = dyn_genes,
log1p.norm = TRUE
)Next, we set up column annotations for the cells, and order the genes
by where their peak expression occurs during pseudotime with the
sortGenesHeatmap() function.
col_anno_df <- data.frame(
cell_name = colnames(sim_counts),
leiden = as.factor(sim_counts$label),
subject = as.factor(sim_counts$subject),
pseudotime = sim_counts$cell_time_normed
) %>%
arrange(pseudotime)
gene_order <- sortGenesHeatmap(smoothed_counts$Lineage_A, pt.vec = sim_counts$cell_time_normed)
heatmap_mat <- t(scale(smoothed_counts$Lineage_A))
colnames(heatmap_mat) <- colnames(sim_counts)
heatmap_mat <- heatmap_mat[, col_anno_df$cell_name]
heatmap_mat <- heatmap_mat[gene_order, ]
col_anno <- HeatmapAnnotation(
Leiden = col_anno_df$leiden,
Subject = col_anno_df$subject,
Pseudotime = col_anno_df$pseudotime,
show_legend = TRUE,
show_annotation_name = FALSE,
gap = unit(1, "mm"),
border = TRUE
)Now we can finally plot the heatmap:
Heatmap(
matrix = heatmap_mat,
name = "Scaled\nmRNA",
col = circlize::colorRamp2(
colors = viridis::inferno(50),
breaks = seq(min(heatmap_mat), max(heatmap_mat), length.out = 50)
),
cluster_columns = FALSE,
width = 9,
height = 6,
column_title = "",
cluster_rows = FALSE,
top_annotation = col_anno,
border = TRUE,
show_column_names = FALSE,
show_row_names = FALSE,
use_raster = TRUE,
raster_by_magick = TRUE,
raster_quality = 5
)Expression cascade of dynamic genes across pseudotime
With embedGenes() we can compute a gene-level clustering
along with PCA & UMAP embeddings. This allows us to examine how
groups of genes behave, and annotate those groups based on the genes’
biological functions.
gene_embedding <- embedGenes(expm1(smoothed_counts$Lineage_A))
ggplot(gene_embedding, aes(x = umap1, y = umap2, color = leiden)) +
geom_point(
alpha = 0.75,
size = 2,
stroke = 0
) +
labs(
x = "UMAP 1",
y = "UMAP 2",
color = "Gene Cluster"
) +
theme_scLANE(umap = TRUE) +
guides(color = guide_legend(override.aes = list(size = 2, alpha = 1, stroke = 1)))Embedding & clustering of gene dynamics
We assume that each cluster of similarly-behaving genes represents a
gene program i.e., a set of genes that work together to perform a shared
task. Each program is defined by the set of genes unique to it, and
module scoring allows us to assign a per-cell numeric score for each set
of genes. The geneProgramScoring() function performs this
task using the UCell
package under the hood.
sim_counts <- geneProgramScoring(sim_counts,
genes = gene_embedding$gene,
gene.clusters = gene_embedding$leiden
)Plotting the program scores for gene cluster 1 shows that cells at the end of the trajectory have overall high expression of genes in that cluster.
Module scores for gene cluster 1
Lastly, we can perform pathway analysis on the set of dynamic genes
using the enrichDynamicGenes() function. This is built off
of the gprofiler2
package, which supports a wide variety of species and pathway
databases.
dyn_gene_enrichment <- enrichDynamicGenes(scLANE_de_res, species = "hsapiens")
filter(dyn_gene_enrichment$result, source == "GO:BP") %>%
select(term_id, term_name, p_value, source) %>%
slice_head(n = 5) %>%
knitr::kable(
digits = 3,
caption = "Trajectory pathway enrichment statistics",
col.names = c("Term ID", "Term name", "Adj. p-value", "Source")
)| Term ID | Term name | Adj. p-value | Source |
|---|---|---|---|
| GO:0006695 | cholesterol biosynthetic process | 0.039 | GO:BP |
| GO:1902653 | secondary alcohol biosynthetic process | 0.039 | GO:BP |
| GO:0016126 | sterol biosynthetic process | 0.050 | GO:BP |
| GO:1902109 | negative regulation of mitochondrial membrane permeability involved in apoptotic process | 0.113 | GO:BP |
| GO:0046165 | alcohol biosynthetic process | 0.217 | GO:BP |
## R version 4.5.2 (2025-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.3 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] grid stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] broom.mixed_0.2.9.6 bigstatsr_1.6.2
## [3] MASS_7.3-65 ComplexHeatmap_2.26.1
## [5] scLANE_1.0.3 magrittr_2.0.4
## [7] glm2_1.2.1 scater_1.38.0
## [9] ggplot2_4.0.2 dplyr_1.2.0
## [11] scran_1.38.0 scuttle_1.20.0
## [13] SingleCellExperiment_1.32.0 SummarizedExperiment_1.40.0
## [15] Biobase_2.70.0 GenomicRanges_1.62.1
## [17] Seqinfo_1.0.0 IRanges_2.44.0
## [19] S4Vectors_0.48.0 BiocGenerics_0.56.0
## [21] generics_0.1.4 MatrixGenerics_1.22.0
## [23] matrixStats_1.5.0 BiocStyle_2.38.0
##
## loaded via a namespace (and not attached):
## [1] RcppAnnoy_0.0.23 splines_4.5.2 bitops_1.0-9
## [4] tibble_3.3.1 rpart_4.1.24 gamlss.data_6.0-7
## [7] lifecycle_1.0.5 Rdpack_2.6.6 edgeR_4.8.2
## [10] doParallel_1.0.17 globals_0.19.0 lattice_0.22-9
## [13] backports_1.5.0 plotly_4.12.0 limma_3.66.0
## [16] sass_0.4.10 rmarkdown_2.30 jquerylib_0.1.4
## [19] yaml_2.3.12 bigparallelr_0.3.2 metapod_1.18.0
## [22] otel_0.2.0 cowplot_1.2.0 buildtools_1.0.0
## [25] minqa_1.2.8 RColorBrewer_1.1-3 abind_1.4-8
## [28] glmmTMB_1.1.14 purrr_1.2.1 RCurl_1.98-1.17
## [31] bigassertr_0.1.7 sandwich_3.1-1 circlize_0.4.17
## [34] gbm_2.2.3 ggrepel_0.9.6 irlba_2.3.7
## [37] listenv_0.10.0 maketools_1.3.2 RSpectra_0.16-2
## [40] dqrng_0.4.1 parallelly_1.46.1 codetools_0.2-20
## [43] DelayedArray_0.36.0 tidyselect_1.2.1 shape_1.4.6.1
## [46] UCell_2.14.0 farver_2.1.2 lme4_1.1-38
## [49] ScaledMatrix_1.18.0 viridis_0.6.5 flock_0.7
## [52] jsonlite_2.0.0 GetoptLong_1.1.0 BiocNeighbors_2.4.0
## [55] survival_3.8-6 iterators_1.0.14 foreach_1.5.2
## [58] tools_4.5.2 snow_0.4-4 Rcpp_1.1.1
## [61] glue_1.8.0 gridExtra_2.3 SparseArray_1.10.8
## [64] xfun_0.56 mgcv_1.9-4 withr_3.0.2
## [67] numDeriv_2016.8-1.1 BiocManager_1.30.27 fastmap_1.2.0
## [70] ggh4x_0.3.1 boot_1.3-32 bluster_1.20.0
## [73] digest_0.6.39 rsvd_1.0.5 gamlss_5.5-0
## [76] R6_2.6.1 colorspace_2.1-2 Cairo_1.7-0
## [79] tidyr_1.3.2 data.table_1.18.2.1 geeM_0.10.1
## [82] httr_1.4.7 htmlwidgets_1.6.4 WeightSVM_1.7-16
## [85] S4Arrays_1.10.1 uwot_0.2.4 pkgconfig_2.0.3
## [88] gtable_0.3.6 S7_0.2.1 XVector_0.50.0
## [91] sys_3.4.3 furrr_0.3.1 htmltools_0.5.9
## [94] TMB_1.9.19 clue_0.3-66 scales_1.4.0
## [97] png_0.1-8 doSNOW_1.0.20 reformulas_0.4.4
## [100] knitr_1.51 rjson_0.2.23 curl_7.0.0
## [103] nlme_3.1-168 nloptr_2.2.1 cachem_1.1.0
## [106] zoo_1.8-15 GlobalOptions_0.1.3 rmio_0.4.0
## [109] parallel_4.5.2 vipor_0.4.7 pillar_1.11.1
## [112] vctrs_0.7.1 pscl_1.5.9 BiocSingular_1.26.1
## [115] ff_4.5.2 beachmat_2.26.0 cluster_2.1.8.2
## [118] beeswarm_0.4.0 gamlss.dist_6.1-1 evaluate_1.0.5
## [121] magick_2.9.0 cli_3.6.5 locfit_1.5-9.12
## [124] compiler_4.5.2 rlang_1.1.7 crayon_1.5.3
## [127] gprofiler2_0.2.4 labeling_0.4.3 ps_1.9.1
## [130] forcats_1.0.1 ggbeeswarm_0.7.3 viridisLite_0.4.3
## [133] BiocParallel_1.44.0 lazyeval_0.2.2 coop_0.6-3
## [136] glmnet_4.1-10 mpath_0.4-2.26 Matrix_1.7-4
## [139] RcppEigen_0.3.4.0.2 future_1.69.0 statmod_1.5.1
## [142] rbibutils_2.4.1 igraph_2.2.2 broom_1.0.12
## [145] bslib_0.10.0 bst_0.3-24 bit_4.6.0