Differential expression (DE) analysis aims to identify genes whose expression changes significantly across conditions, and provides insights into molecular mechanisms underlying aging, disease, and other biological processes. Traditional DE methods primarily detect changes in centrality (e.g., mean or median), but often lack power against alternative hypotheses characterized by changes in spread (e.g., variance or dispersion). Variance shifts, however, are critical for understanding regulatory dynamics and stochasticity in gene expression, particularly in contexts such as aging and cellular differentiation. Moreover, DE analysis often involves a trade-off between statistical power and control of the false discovery rate (FDR): parametric approaches may inflate the FDR, while nonparametric methods frequently lack sufficient power.
The QRscore package addresses these two limitations by providing a robust framework for two-sample and K-sample tests that detect shifts in both centrality and spread. Built upon rigorous theoretical foundations, QRscore extends the Mann-Whitney test to an adaptive, rank-based approach that combines non-parametric tests with weights informed by (zero-inflated) negative binomial models, ensuring both high power and strictly controlled FDR.
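To make the idea of a weighted rank test concrete, the sketch below builds a generic linear rank statistic in base R: a score function is applied to the pooled ranks and summed over one sample. With identity scores it reproduces the Mann-Whitney/Wilcoxon statistic; with squared, centered scores (Mood-type) it becomes sensitive to spread. The score functions and helper names here are illustrative assumptions, not the model-informed weights that QRscore derives from (zero-inflated) negative binomial models.

```r
# Linear rank statistic: sum of scores applied to the pooled ranks of sample x
rank_stat <- function(x, y, score) {
  r <- rank(c(x, y))                       # pooled ranks (midranks under ties)
  sum(score(r[seq_along(x)], length(r)))
}

score_location <- function(r, N) r                     # Wilcoxon rank-sum scores
score_spread   <- function(r, N) (r - (N + 1) / 2)^2   # Mood-type scores for spread

# Two-sided permutation p-value: relabel samples and recompute the statistic
perm_pvalue <- function(x, y, score, B = 2000) {
  pooled <- c(x, y)
  n_x <- length(x)
  obs <- rank_stat(x, y, score)
  null <- replicate(B, {
    idx <- sample(length(pooled), n_x)
    rank_stat(pooled[idx], pooled[-idx], score)
  })
  center <- mean(null)
  (1 + sum(abs(null - center) >= abs(obs - center))) / (B + 1)
}

set.seed(1)
x <- rnorm(60, mean = 0, sd = 1)
y <- rnorm(60, mean = 0, sd = 3)   # same center, larger spread

# With identity scores the statistic matches wilcox.test() up to a constant
w_by_hand <- rank_stat(x, y, score_location) - length(x) * (length(x) + 1) / 2
w_builtin <- unname(wilcox.test(x, y)$statistic)

p_location <- perm_pvalue(x, y, score_location)  # no location shift was simulated
p_spread   <- perm_pvalue(x, y, score_spread)    # spread shift was simulated
```

Changing only the score function changes what kind of distributional shift the test is sensitive to, which is the general mechanism that a model-informed weighting can exploit.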
This package is designed to complement existing tools in Bioconductor by offering enhanced capabilities for detecting distributional shifts. By working with widely used Bioconductor packages such as DESeq2 and leveraging parallelization (BiocParallel), QRscore fits into genomics workflows for differential expression and differential dispersion analysis. This vignette demonstrates the utility of QRscore through a detailed example.
This vignette illustrates how to:
Preprocess and normalize bulk RNA-seq data.
Perform two-sample and three-sample differential tests using QRscore.
Obtain results for differentially expressed genes (DEGs) and differentially dispersed genes (DDGs).
knitr::opts_chunk$set(warning = FALSE, message = FALSE)
# if (!requireNamespace("DESeq2", quietly = TRUE)) {
# BiocManager::install("DESeq2")
# }
# if (!requireNamespace("BiocParallel", quietly = TRUE)) {
# BiocManager::install("BiocParallel")
# }
library(QRscore)
#> Loading required package: MASS
#> Loading required package: pscl
#> Classes and Methods for R originally developed in the
#> Political Science Computational Laboratory
#> Department of Political Science
#> Stanford University (2002-2015),
#> by and under the direction of Simon Jackman.
#> hurdle and zeroinfl functions by Achim Zeileis.
#> Loading required package: arrangements
#> Loading required package: hitandrun
#> Loading required package: assertthat
#> Loading required package: dplyr
#>
#> Attaching package: 'dplyr'
#> The following object is masked from 'package:MASS':
#>
#> select
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
#> Loading required package: BiocParallel
library(DESeq2)
#> Loading required package: S4Vectors
#> Loading required package: stats4
#> Loading required package: BiocGenerics
#> Loading required package: generics
#>
#> Attaching package: 'generics'
#> The following object is masked from 'package:dplyr':
#>
#> explain
#> The following objects are masked from 'package:base':
#>
#> as.difftime, as.factor, as.ordered, intersect, is.element, setdiff,
#> setequal, union
#>
#> Attaching package: 'BiocGenerics'
#> The following object is masked from 'package:dplyr':
#>
#> combine
#> The following objects are masked from 'package:stats':
#>
#> IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#>
#> Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
#> as.data.frame, basename, cbind, colnames, dirname, do.call,
#> duplicated, eval, evalq, get, grep, grepl, is.unsorted, lapply,
#> mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
#> rank, rbind, rownames, sapply, saveRDS, table, tapply, unique,
#> unsplit, which.max, which.min
#>
#> Attaching package: 'S4Vectors'
#> The following objects are masked from 'package:dplyr':
#>
#> first, rename
#> The following object is masked from 'package:utils':
#>
#> findMatches
#> The following objects are masked from 'package:base':
#>
#> I, expand.grid, unname
#> Loading required package: IRanges
#>
#> Attaching package: 'IRanges'
#> The following objects are masked from 'package:dplyr':
#>
#> collapse, desc, slice
#> Loading required package: GenomicRanges
#> Loading required package: GenomeInfoDb
#> Loading required package: SummarizedExperiment
#> Loading required package: MatrixGenerics
#> Loading required package: matrixStats
#>
#> Attaching package: 'matrixStats'
#> The following object is masked from 'package:dplyr':
#>
#> count
#>
#> Attaching package: 'MatrixGenerics'
#> The following objects are masked from 'package:matrixStats':
#>
#> colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
#> colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
#> colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
#> colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
#> colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
#> colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
#> colWeightedMeans, colWeightedMedians, colWeightedSds,
#> colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
#> rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
#> rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
#> rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
#> rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
#> rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
#> rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
#> rowWeightedSds, rowWeightedVars
#> Loading required package: Biobase
#> Welcome to Bioconductor
#>
#> Vignettes contain introductory material; view with
#> 'browseVignettes()'. To cite Bioconductor, see
#> 'citation("Biobase")', and for packages 'citation("pkgname")'.
#>
#> Attaching package: 'Biobase'
#> The following object is masked from 'package:MatrixGenerics':
#>
#> rowMedians
#> The following objects are masked from 'package:matrixStats':
#>
#> anyMissing, rowMedians
library(BiocParallel)
The example dataset contains RNA-seq count data for 3000 randomly selected genes from whole blood samples in the GTEx project, along with associated age-group metadata.
Age groups are aggregated for larger sample size and simplified analysis.
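The aggregation step can be sketched as a simple lookup. This is a minimal illustration that assumes the raw labels are 10-year brackets (as in public GTEx metadata) and that they are merged into the three groups used later in this vignette; the names raw_age, age_map and ages_toy are hypothetical and the vector is toy data.

```r
# Toy vector of 10-year age brackets (assumed input format)
raw_age <- c("20-29", "30-39", "40-49", "50-59", "60-69", "70-79", "30-39")

# Lookup table merging adjacent brackets into three wider groups
age_map <- c("20-29" = "20-39", "30-39" = "20-39",
             "40-49" = "40-59", "50-59" = "40-59",
             "60-69" = "60-79", "70-79" = "60-79")

ages_toy <- unname(age_map[raw_age])
table(ages_toy)   # samples per aggregated group
```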
Genes with low expression or high dropout rates are excluded.
The DESeq2 package is used to normalize the filtered gene expression matrix.
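# Per-gene summaries; the column-wise operations imply genes are stored in columns
col_means <- colMeans(bulk_sparse_mat, na.rm = TRUE)        # mean count per gene
col_zeros <- colMeans(bulk_sparse_mat == 0, na.rm = TRUE)   # fraction of zero counts per gene
# Keep genes with mean count > 5 and fewer than 20% zeros (the thresholds can be modified)
col_ids <- which(col_means > 5 & col_zeros < 0.2)
bulk_df <- bulk_sparse_mat[, col_ids]
# DESeq2 expects genes in rows and samples in columns
bulk_df_inv <- t(bulk_df)
coldata <- data.frame(age = ages)
dds <- DESeqDataSetFromMatrix(countData = bulk_df_inv,
                              colData = coldata,
                              design = ~ age)
# Estimate per-sample size factors, then divide them out of the counts
dds <- estimateSizeFactors(dds)
normalized_mat <- counts(dds, normalized = TRUE)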
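For readers curious what the size-factor step does, the following base R sketch reproduces the median-of-ratios idea on toy counts: each sample's size factor is the median, over genes, of its counts divided by the per-gene geometric mean. It is a simplified illustration on simulated data (toy_counts and the other names are hypothetical), not a replacement for the DESeq2 call above.

```r
set.seed(7)
# Toy count matrix: 200 genes (rows) x 4 samples (columns)
toy_counts <- matrix(rnbinom(800, mu = 100, size = 20), nrow = 200,
                     dimnames = list(paste0("gene", 1:200), paste0("s", 1:4)))
toy_counts[, 2] <- toy_counts[, 2] * 2L   # pretend sample s2 was sequenced twice as deep

log_geo_mean <- rowMeans(log(toy_counts))  # per-gene log geometric mean
usable <- is.finite(log_geo_mean)          # genes with any zero count are skipped

# Size factor = median ratio of a sample's counts to the gene-wise geometric mean
size_factors <- apply(toy_counts, 2, function(cnt) {
  exp(median((log(cnt) - log_geo_mean)[usable & cnt > 0]))
})

# Normalized counts: divide each column by its size factor
toy_normalized <- sweep(toy_counts, 2, size_factors, "/")
round(size_factors, 2)
```

The deeper-sequenced sample receives a size factor roughly twice that of the others, so dividing by it puts all samples on a comparable scale.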
This example compares the 20-39 and 60-79 age groups.
Both mean and dispersion shifts are tested in parallel. The output includes ranked p-values from the differential variance and differential mean tests for all genes, together with the log ratios of the group means and variances (60-79 vs 20-39).
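The call that produces results_2_sample is not shown in this section. As a rough sketch of the preparation it needs, the code below restricts a toy matrix and its age labels to the two groups being compared. All objects (toy_mat, toy_age, and so on) are hypothetical stand-ins, and the commented call only mirrors the argument names used in the three-sample calls later in this vignette; treat it as an assumption rather than the exact command.

```r
set.seed(3)
# Toy stand-ins for the age labels and the normalized genes x samples matrix
toy_age <- rep(c("20-39", "40-59", "60-79"), times = c(4, 3, 5))
toy_mat <- matrix(rpois(6 * length(toy_age), lambda = 50), nrow = 6,
                  dimnames = list(paste0("gene", 1:6),
                                  paste0("s", seq_along(toy_age))))

# Keep only samples from the two groups under comparison
keep <- toy_age %in% c("20-39", "60-79")
toy_mat_2 <- toy_mat[, keep, drop = FALSE]
toy_age_2 <- toy_age[keep]

# Labels and columns must stay aligned after subsetting
stopifnot(ncol(toy_mat_2) == length(toy_age_2))

# With QRscore installed, an analogous call on real data might look like
# (argument values are assumptions modeled on the three-sample example):
# results_2_sample <- QRscore.genetest(toy_mat_2, toy_age_2,
#     pairwise_test = TRUE, pairwise_logFC = TRUE,
#     test_mean = TRUE, test_dispersion = TRUE,
#     num_cores = 2, approx = "asymptotic")
```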
Genes with significant mean shifts are identified.
results_2_sample_mean = results_2_sample$mean_test
results_2_sample_DEG = results_2_sample_mean[results_2_sample_mean$QRscore_Mean_adj_p_value<0.05,]
head(results_2_sample_DEG)
#> QRscore_Mean_p_value QRscore_Mean_adj_p_value
#> ENSG00000272455.1 1.447681e-19 1.756037e-16
#> ENSG00000261326.2 8.129687e-19 4.930655e-16
#> ENSG00000249087.6 1.270984e-18 5.139011e-16
#> ENSG00000271895.2 1.812373e-18 5.496021e-16
#> ENSG00000116205.12 4.755658e-18 1.153723e-15
#> ENSG00000163155.11 7.157742e-18 1.422275e-15
#> Log_FC_Mean_60-79_vs_20-39
#> ENSG00000272455.1 -1.0379163
#> ENSG00000261326.2 -0.9286823
#> ENSG00000249087.6 -0.8174845
#> ENSG00000271895.2 -0.8803063
#> ENSG00000116205.12 -0.7140029
#> ENSG00000163155.11 -1.1847951
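The filtering pattern used here (sort by p-value, then keep rows whose adjusted p-value is below a cutoff) can be tried on simulated p-values. The sketch assumes a Benjamini-Hochberg adjustment via p.adjust(); the vignette does not state which adjustment the package applies, and the column names in toy_res are hypothetical.

```r
set.seed(11)
# 100 genes with signal (p-values concentrated near 0) and 900 null genes
p_raw <- c(rbeta(100, 0.05, 5), runif(900))

# Benjamini-Hochberg adjusted p-values (assumed adjustment method)
p_adj <- p.adjust(p_raw, method = "BH")

toy_res <- data.frame(p_value = p_raw, adj_p_value = p_adj,
                      row.names = paste0("gene", seq_along(p_raw)))
toy_res <- toy_res[order(toy_res$p_value), ]   # rank genes by raw p-value

# which() drops any NA comparisons instead of returning NA rows
toy_hits <- toy_res[which(toy_res$adj_p_value < 0.05), ]
nrow(toy_hits)
```

Wrapping the comparison in which() is a defensive choice: a bare logical index would insert all-NA rows if any adjusted p-value were missing.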
Genes with significant variance shifts are identified.
results_2_sample_var = results_2_sample$var_test
results_2_sample_DDG = results_2_sample_var[results_2_sample_var$QRscore_Var_adj_p_value<0.05,]
head(results_2_sample_DDG)
#> QRscore_Var_p_value QRscore_Var_adj_p_value
#> ENSG00000162692.10 6.311935e-13 7.656377e-10
#> ENSG00000189409.13 6.227351e-11 3.776888e-08
#> ENSG00000177606.6 1.475935e-10 5.967698e-08
#> ENSG00000188290.10 4.113753e-10 1.247496e-07
#> ENSG00000162383.11 2.381975e-09 5.778672e-07
#> ENSG00000269967.1 4.901528e-09 9.909256e-07
#> Log_FC_Var_60-79_vs_20-39
#> ENSG00000162692.10 2.4332081
#> ENSG00000189409.13 0.8072178
#> ENSG00000177606.6 0.2094172
#> ENSG00000188290.10 0.3225796
#> ENSG00000162383.11 2.1536069
#> ENSG00000269967.1 1.4658650
This example focuses on detecting DDGs across all age groups.
results_3_sample <- QRscore.genetest(normalized_mat, coldata$age, pairwise_test = FALSE, pairwise_logFC = FALSE, test_mean = FALSE, test_dispersion = TRUE, num_cores = 2, approx = "asymptotic")
head(results_3_sample$var_test)
#> QRscore_Var_p_value QRscore_Var_adj_p_value
#> ENSG00000189409.13 5.080165e-10 2.334420e-07
#> ENSG00000162692.10 5.207999e-10 2.334420e-07
#> ENSG00000177606.6 5.773504e-10 2.334420e-07
#> ENSG00000188290.10 1.737933e-08 5.270282e-06
#> ENSG00000162383.11 2.226853e-08 5.402345e-06
#> ENSG00000233184.6 5.256418e-08 1.062672e-05
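For intuition about K-sample testing of spread, base R already offers classical rank-based analogues: kruskal.test() for location and fligner.test() for scale. The sketch below applies both to one simulated gene whose three groups share a mean but differ in spread. These are standard tests from the stats package used for illustration only; they are not the QRscore procedure, and the data are simulated.

```r
set.seed(5)
grp <- factor(rep(c("20-39", "40-59", "60-79"), each = 80))

# Same mean in every group, standard deviation increasing with age group
sds <- c("20-39" = 1, "40-59" = 1.5, "60-79" = 2.5)
expr <- rnorm(length(grp), mean = 5, sd = sds[as.character(grp)])

p_location <- kruskal.test(expr ~ grp)$p.value   # K-sample location test
p_spread   <- fligner.test(expr ~ grp)$p.value   # K-sample scale test

c(location = p_location, spread = p_spread)
```

Only the scale test has a systematic effect to detect in this simulation, which mirrors the case the dispersion-only run above targets.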
The comprehensive output includes 3-sample test p-values as well as pairwise test p-values and fold changes.
results_3_sample <- QRscore.genetest(normalized_mat, coldata$age, pairwise_test = TRUE, pairwise_logFC = TRUE, test_mean = TRUE, test_dispersion = TRUE, num_cores = 2, approx = "asymptotic")
For detecting DDGs and DEGs, it’s recommended to use the adjusted 3-sample test p-values (namely QRscore_Mean_adj_p_value and QRscore_Var_adj_p_value) with a chosen cutoff (e.g., 0.05).
results_3_sample_mean = results_3_sample$mean_test
results_3_sample_DEG = results_3_sample_mean[results_3_sample_mean$QRscore_Mean_adj_p_value<0.05,]
head(results_3_sample_DEG)
#> QRscore_Mean_p_value QRscore_Mean_adj_p_value
#> ENSG00000272455.1 2.729276e-19 2.544765e-16
#> ENSG00000116205.12 4.822300e-19 2.544765e-16
#> ENSG00000271895.2 6.293732e-19 2.544765e-16
#> ENSG00000261326.2 1.099706e-18 3.334857e-16
#> ENSG00000272078.1 2.464001e-18 5.977667e-16
#> ENSG00000162526.6 3.368610e-18 6.810207e-16
#> Pairwise_Test_Mean_40-59_vs_20-39 Log_FC_Mean_40-59_vs_20-39
#> ENSG00000272455.1 9.544097e-06 -0.4483026
#> ENSG00000116205.12 1.252520e-07 -0.3581835
#> ENSG00000271895.2 1.069600e-06 -0.4010168
#> ENSG00000261326.2 1.697696e-05 -0.3939637
#> ENSG00000272078.1 3.719349e-06 0.6447943
#> ENSG00000162526.6 4.881433e-07 -0.3306444
#> Pairwise_Test_Mean_60-79_vs_20-39 Log_FC_Mean_60-79_vs_20-39
#> ENSG00000272455.1 1.646943e-19 -1.0379163
#> ENSG00000116205.12 4.871294e-18 -0.7140029
#> ENSG00000271895.2 1.622322e-18 -0.8803063
#> ENSG00000261326.2 7.520692e-19 -0.9286823
#> ENSG00000272078.1 1.260975e-16 1.1566804
#> ENSG00000162526.6 7.980935e-18 -0.6571025
#> Pairwise_Test_Mean_60-79_vs_40-59 Log_FC_Mean_60-79_vs_40-59
#> ENSG00000272455.1 6.910217e-10 -0.5896137
#> ENSG00000116205.12 3.901535e-08 -0.3558193
#> ENSG00000271895.2 5.315290e-09 -0.4792895
#> ENSG00000261326.2 7.872455e-10 -0.5347186
#> ENSG00000272078.1 4.266551e-09 0.5118861
#> ENSG00000162526.6 5.277627e-08 -0.3264581
results_3_sample_var = results_3_sample$var_test
results_3_sample_DDG = results_3_sample_var[results_3_sample_var$QRscore_Var_adj_p_value<0.05,]
head(results_3_sample_DDG)
#> QRscore_Var_p_value QRscore_Var_adj_p_value
#> ENSG00000189409.13 5.080165e-10 2.334420e-07
#> ENSG00000162692.10 5.207999e-10 2.334420e-07
#> ENSG00000177606.6 5.773504e-10 2.334420e-07
#> ENSG00000188290.10 1.737933e-08 5.270282e-06
#> ENSG00000162383.11 2.226853e-08 5.402345e-06
#> ENSG00000233184.6 5.256418e-08 1.062672e-05
#> Pairwise_Test_Var_40-59_vs_20-39 Log_FC_Var_40-59_vs_20-39
#> ENSG00000189409.13 7.912260e-06 0.4795484
#> ENSG00000162692.10 4.748034e-02 0.3628677
#> ENSG00000177606.6 5.048536e-05 0.3483889
#> ENSG00000188290.10 1.812500e-05 -1.8237979
#> ENSG00000162383.11 6.623701e-03 1.8681958
#> ENSG00000233184.6 1.139868e-02 0.1407786
#> Pairwise_Test_Var_60-79_vs_20-39 Log_FC_Var_60-79_vs_20-39
#> ENSG00000189409.13 2.793867e-10 0.8072178
#> ENSG00000162692.10 4.334648e-13 2.4332081
#> ENSG00000177606.6 1.621052e-10 0.2094172
#> ENSG00000188290.10 3.796025e-10 0.3225796
#> ENSG00000162383.11 8.938337e-10 2.1536069
#> ENSG00000233184.6 1.171175e-05 0.4660539
#> Pairwise_Test_Var_60-79_vs_40-59 Log_FC_Var_60-79_vs_40-59
#> ENSG00000189409.13 5.480815e-03 0.3276693
#> ENSG00000162692.10 1.313411e-07 2.0703404
#> ENSG00000177606.6 1.049933e-03 -0.1389717
#> ENSG00000188290.10 3.643272e-02 2.1463775
#> ENSG00000162383.11 3.797216e-05 0.2854110
#> ENSG00000233184.6 1.921294e-05 0.3252753
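A quick consistency check on the pairwise columns: because each log fold change is a log ratio between two groups, the 40-59 vs 20-39 and 60-79 vs 40-59 values should add up to the 60-79 vs 20-39 value, whatever the log base (the base is not stated in this section). The sketch below verifies this on values copied from the printed mean-test output for the first three genes, and then on simulated data with a hypothetical log_fc() helper.

```r
# Values copied from the printed mean-test output (first three genes)
lfc_mid_vs_young <- c(-0.4483026, -0.3581835, -0.4010168)
lfc_old_vs_mid   <- c(-0.5896137, -0.3558193, -0.4792895)
lfc_old_vs_young <- c(-1.0379163, -0.7140029, -0.8803063)

# Largest deviation from additivity; only print rounding should remain
gap_printed <- max(abs(lfc_mid_vs_young + lfc_old_vs_mid - lfc_old_vs_young))

# The same identity on simulated expression values for one gene
set.seed(9)
young <- rgamma(50, shape = 5, rate = 1.0)
mid   <- rgamma(50, shape = 5, rate = 1.5)
old   <- rgamma(50, shape = 5, rate = 2.0)

log_fc <- function(a, b) log(mean(a) / mean(b))   # hypothetical helper
gap_toy <- abs(log_fc(mid, young) + log_fc(old, mid) - log_fc(old, young))

c(printed = gap_printed, toy = gap_toy)
```

The same additivity holds for the variance columns, so a pairwise value that breaks it would point to mismatched group labels or a copy error.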