Workflow_WTA

Installation

if(!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("GeoDiff")

Overview

This vignette demonstrates the use of the GeoDiff package on NanoString GeoMx Digital Spatial Profiler (DSP) data. This package can be used for background modeling, target and sample QC, normalization and differential expression analysis.

We’ll analyze a NanoString GeoMx DSP dataset of diseased vs healthy kidney tissue using the Human Whole Transcriptome (WTA) panel. Seven slides were analyzed, 4 diseased and 3 healthy. Regions of Interest (ROI) were focused two different parts of a kidney’s structure: tubules or glomeruli.

Data preparation

First we will load the necessary packages

library(GeoDiff)
library(dplyr)
library(ggplot2)
library(NanoStringNCTools)
library(GeomxTools)
library(Biobase)
library(reshape2)

Now let’s load our data and examine it.

data("kidney")

#Update to current NanoStringGeoMxSet version
kidney <- updateGeoMxSet(kidney)

kidney
#> NanoStringGeoMxSet (storageMode: lockedEnvironment)
#> assayData: 18642 features, 276 samples 
#>   element names: exprs 
#> protocolData
#>   sampleNames: DSP-1001250007851-H-A02.dcc DSP-1001250007851-H-A03.dcc
#>     ... DSP-1002510007866-C-H12.dcc (276 total)
#>   varLabels: FileVersion SoftwareVersion ... NTC (18 total)
#>   varMetadata: labelDescription
#> phenoData
#>   sampleNames: DSP-1001250007851-H-A02.dcc DSP-1001250007851-H-A03.dcc
#>     ... DSP-1002510007866-C-H12.dcc (276 total)
#>   varLabels: construct read_pattern ... nuclei (10 total)
#>   varMetadata: labelDescription
#> featureData
#>   featureNames: RTS0020877 RTS0020878 ... RTS0052009 (18642 total)
#>   fvarLabels: RTS_ID TargetName ... Negative (6 total)
#>   fvarMetadata: labelDescription
#> experimentData: use 'experimentData(object)'
#> Annotation: TAP_H_WTA_v1.0.pkc 
#> signature: none
#> feature: Probe
#> analyte: RNA

head(pData(kidney))
#>                             construct read_pattern expected_neg slide name
#> DSP-1001250007851-H-A02.dcc directPCR         2x27            0   disease3
#> DSP-1001250007851-H-A03.dcc directPCR         2x27            0   disease3
#> DSP-1001250007851-H-A04.dcc directPCR         2x27            0   disease3
#> DSP-1001250007851-H-A05.dcc directPCR         2x27            0   disease3
#> DSP-1001250007851-H-A06.dcc directPCR         2x27            0   disease3
#> DSP-1001250007851-H-A07.dcc directPCR         2x27            0   disease3
#>                             class           segment     area     region
#> DSP-1001250007851-H-A02.dcc   DKD Geometric Segment 31797.59 glomerulus
#> DSP-1001250007851-H-A03.dcc   DKD Geometric Segment 16920.10 glomerulus
#> DSP-1001250007851-H-A04.dcc   DKD Geometric Segment 14312.33 glomerulus
#> DSP-1001250007851-H-A05.dcc   DKD Geometric Segment 20032.84 glomerulus
#> DSP-1001250007851-H-A06.dcc   DKD Geometric Segment 27583.26 glomerulus
#> DSP-1001250007851-H-A07.dcc   DKD Geometric Segment 18164.84 glomerulus
#>                             pathology nuclei
#> DSP-1001250007851-H-A02.dcc  abnormal  16871
#> DSP-1001250007851-H-A03.dcc  abnormal  17684
#> DSP-1001250007851-H-A04.dcc  abnormal  15108
#> DSP-1001250007851-H-A05.dcc  abnormal  15271
#> DSP-1001250007851-H-A06.dcc  abnormal  18256
#> DSP-1001250007851-H-A07.dcc  abnormal  16899

table(pData(kidney)$`slide name`)
#> 
#> disease1B disease2B  disease3  disease4  normal2B   normal3   normal4 
#>        39        37        59        24        35        59        23
table(pData(kidney)$region)
#> 
#> glomerulus     tubule 
#>        190         86

This data is stored in a NanoStringGeoMxSet object. For more examples on how to work with this data please look at vignette("Developer_Introduction_to_the_NanoStringGeoMxSet", package = "GeomxTools") or vignette("GeomxTools_RNA-NGS_Analysis", package = "GeoMxWorkflows")

In order to make the vignette run in a reasonable amount of time, we subset the data. We subset 16 ROIs with a similar distribution to the entire dataset: 8 ROIs from the disease3 and normal3 slides, 4 glomerulus and 4 tubule ROIs from each.

kidney <- kidney[, which(kidney$`slide name` %in% c("disease3", "normal3"))][, c(1:4, 48:51,
                                                                                 60:63, 115:118)]
table(kidney$region, kidney$`slide name`)
#>             
#>              disease3 normal3
#>   glomerulus        4       4
#>   tubule            4       4
table(kidney$`slide name`, kidney$class)
#>           
#>            DKD normal
#>   disease3   8      0
#>   normal3    0      8

Background Modeling

Poisson background model using negative probes.

The background model works on the probe level data with all of the negative probes. Please do not use aggregateCounts from GeomxTools before modeling.

featureType(kidney)
#> [1] "Probe"

paste("## of Negative Probes:", sum(fData(kidney)$Negative))
#> [1] "## of Negative Probes: 139"

This model estimates a feature factor for each negative probe and a background size factor for each ROI.

kidney <- fitPoisBG(kidney)
#> Iteration = 1, squared error = 3.911046e+06
#> Iteration = 2, squared error = 0.000000e+00
#> Model converged.

summary(pData(kidney)$sizefact)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#> 0.01779 0.03267 0.04620 0.06250 0.07954 0.15591
summary(fData(kidney)$featfact[fData(kidney)$Negative])
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>     0.0   138.5   160.0   163.4   185.0   346.0

After running the model, we can diagnose it and see if there are any issues in the dataset. One key metric for Poisson model is the dispersion. When dispersion is big, it is called over-dispersion which often indicates batch effect or large outliers in the data.

set.seed(123)
kidney_diag <- diagPoisBG(kidney)


notes(kidney_diag)$disper
#> [1] 0.9674666

If the dispersion is >2, one of these factors might be present in the data. We can check for outlier ROIs. People can choose to set outliers to be missing values and rerun the Poisson Background model. Since the dispersion is within range here, the model will not get run.

which(assayDataElement(kidney_diag, "low_outlier") == 1, arr.ind = TRUE)
which(assayDataElement(kidney_diag, "up_outlier") == 1, arr.ind = TRUE)

Or if a batch effect is assumed, the poisson model can be adjusted to take different groups into account. Here we are grouping the ROIs by slide.

kidney <- fitPoisBG(kidney, groupvar = "slide name")
#> Iteration = 1, squared error = 7.881119e+06
#> Iteration = 2, squared error = 7.290337e-26
#> Model converged.

The diagnosis of this model shows that when splitting by slide we similar results as without splitting in this dataset.

set.seed(123)
kidney_diag <- diagPoisBG(kidney, split = TRUE)
#> The results are based on stored `groupvar`, slide name

notes(kidney_diag)$disper_sp
#> [1] 0.8865716

Aggregate function

After subsetting, we have a couple probes with 0 counts in all 16 ROIs so we will remove them here.

aggreprobe is a GeoDiff specific function for probe aggregation and filtering. Probes get filtered based on either correlation and/or the score test within targets and then aggregated. The negative probes do not get aggregated or filtered.

all0probeidx <- which(rowSums(exprs(kidney))==0)
if (length(all0probeidx) > 0) {
    kidney <- kidney[-all0probeidx, ]
}
kidney <- aggreprobe(kidney, use = "cor")

Target QC

Score test

Using the background score test, we can determine which targets are expressed above the background of the negative probes across this dataset. We can then filter the data to only targets above background, using a suggested pvalue threshold of 1e-3.

kidney <- BGScoreTest(kidney)

sum(fData(kidney)[["pvalues"]] < 1e-3, na.rm = TRUE)
#> [1] 16265

For advanced users, there are 3 variables that can be changed in the score test. The default for all three variables is FALSE. Any combination of these variables can be used.

  1. split - should the poisson background values split by group be used
  2. removeoutlier - should outlier negatives be removed
  3. useprior - use prior that the expression level of background follows a Beta distribution, this will lead to a more conservative test but is prone to influence by outliers
kidneySplit <- BGScoreTest(kidney, split = TRUE, removeoutlier = FALSE, useprior = FALSE)
#> The results are based on stored `groupvar`, slide name
sum(fData(kidneySplit)[["pvalues"]] < 1e-3, na.rm = TRUE)
#> [1] 16265

kidneyOutliers <- BGScoreTest(kidney, split = FALSE, removeoutlier = TRUE, useprior = FALSE)
#> 2 negative probes are removed prior to the score test.
sum(fData(kidneyOutliers)[["pvalues"]] < 1e-3, na.rm = TRUE)
#> [1] 16449

kidneyPrior <- BGScoreTest(kidney, split = FALSE, removeoutlier = FALSE, useprior = TRUE)
sum(fData(kidneyPrior)[["pvalues"]] < 1e-3, na.rm = TRUE)
#> [1] 15662

Estimate the size factor

To estimate the signal size factor, we use the fit negative binomial threshold function. This size factor represents technical variation between ROIs like sequencing depth

The feature_high_fitNBth labeled genes are ones well above background that will be used in later steps.

set.seed(123)

kidney <- fitNBth(kidney, split = TRUE)
#> `threshold_start` is missing. The default value is estimated based on fitPoisBG results with multiple slides.
#> Iteration = 1, squared error = 0.00302568486217242
#> Iteration = 2, squared error = 7.03053124879638e-05
#> Iteration = 3, squared error = 6.53079878320857e-06
#> Iteration = 4, squared error = 1.8323186587986e-06
#> Iteration = 5, squared error = 1.1343308086295e-06
#> Iteration = 6, squared error = 9.12467948134292e-07
#> Iteration = 7, squared error = 7.68956008901471e-07
#> Iteration = 8, squared error = 6.54062373230628e-07
#> Model converged.

features_high <- rownames(fData(kidney))[fData(kidney)$feature_high_fitNBth == 1]

length(features_high)
#> [1] 1500

We can compare this threshold to the mean of the background as a sanity check.

bgMean <- mean(fData(kidney)$featfact, na.rm = TRUE)

notes(kidney)[["threshold"]]
#> [1] 181.0033
bgMean
#> [1] 164.5362

This is a sanity check to see that the signal size factor and background size factor are correlated but not redundant.

cor(kidney$sizefact, kidney$sizefact_fitNBth)
#> [1] 0.9412545
plot(kidney$sizefact, kidney$sizefact_fitNBth, xlab = "Background Size Factor",
     ylab = "Signal Size Factor")
abline(a = 0, b = 1)

In this dataset, this size factor correlate well with different quantiles, including \(75\%\) quantile which is used in Q3 normalization.

# get only biological probes
posdat <- kidney[-which(fData(kidney)$CodeClass == "Negative"), ]
posdat <- exprs(posdat)

quan <- sapply(c(0.75, 0.8, 0.9, 0.95), function(y)
  apply(posdat, 2, function(x) quantile(x, probs = y)))

corrs <- apply(quan, 2, function(x) cor(x, kidney$sizefact_fitNBth))
names(corrs) <- c(0.75, 0.8, 0.9, 0.95)

corrs
#>      0.75       0.8       0.9      0.95 
#> 0.9807485 0.9843475 0.9840226 0.9762122

quan75 <- apply(posdat, 2, function(x) quantile(x, probs = 0.75))

Quantile range (quantile - background size factor scaled by the mean feature factor of negative probes) has better correlation with the signal size factor.

kidney <- QuanRange(kidney, split = FALSE, probs = c(0.75, 0.8, 0.9, 0.95))

corrs <- apply(pData(kidney)[, as.character(c(0.75, 0.8, 0.9, 0.95))], 2, function(x)
  cor(x, kidney$sizefact_fitNBth))

names(corrs) <- c(0.75, 0.8, 0.9, 0.95)

corrs
#>      0.75       0.8       0.9      0.95 
#> 0.9964596 0.9983635 0.9898350 0.9731675

Sample QC

To filter out poor quality ROIs, we only keep those which have a high enough signal in comparison to the background. In this dataset, all ROIs remain.

ROIs_high <- sampleNames(kidney)[which((quantile(fData(kidney)[["para"]][, 1],
                                                  probs = 0.90, na.rm = TRUE) -
                                          notes(kidney)[["threshold"]])*kidney$sizefact_fitNBth>2)]

features_all <- rownames(posdat)

DE modeling

Fixed Effect Model

Running the DE model with default values.


NBthDEmod <- fitNBthDE(form = ~region,
                       split = FALSE,
                       object = kidney)
#> Iteration = 1, squared error = 1.545817e-05
#> Iteration = 2, squared error = 1.218597e-05

str(NBthDEmod)
#> List of 12
#>  $ X            : num [1:16, 1:2] 1 1 1 1 1 1 1 1 1 1 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : chr [1:16] "DSP-1001250007851-H-A02.dcc" "DSP-1001250007851-H-A03.dcc" "DSP-1001250007851-H-A04.dcc" "DSP-1001250007851-H-A05.dcc" ...
#>   .. ..$ : chr [1:2] "(Intercept)" "regiontubule"
#>   ..- attr(*, "assign")= int [1:2] 0 1
#>   ..- attr(*, "contrasts")=List of 1
#>   .. ..$ region: chr "contr.treatment"
#>  $ para0        : num [1:4, 1:1500] 5.75 2.35 23.13 181 7.41 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : chr [1:4] "(Intercept)" "regiontubule" "r" "threshold"
#>   .. ..$ : chr [1:1500] "DDT" "FOSL1" "FKBP11" "PIM1" ...
#>  $ para         : num [1:4, 1:18439] 12.38 -1.58 3.12 181 6.61 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : chr [1:4] "(Intercept)" "regiontubule" "r" "threshold"
#>   .. ..$ : chr [1:18439] "A2M" "NAT2" "ACADM" "ACADS" ...
#>  $ sizefact     : Named num [1:16] 0.0683 0.0457 0.0426 0.0549 0.0729 ...
#>   ..- attr(*, "names")= chr [1:16] "DSP-1001250007851-H-A02.dcc" "DSP-1001250007851-H-A03.dcc" "DSP-1001250007851-H-A04.dcc" "DSP-1001250007851-H-A05.dcc" ...
#>  $ sizefact0    : Named num [1:16] 0.0671 0.0452 0.0427 0.0548 0.073 ...
#>   ..- attr(*, "names")= chr [1:16] "DSP-1001250007851-H-A02.dcc" "DSP-1001250007851-H-A03.dcc" "DSP-1001250007851-H-A04.dcc" "DSP-1001250007851-H-A05.dcc" ...
#>  $ preci1       : num [1:2, 1:2] 0.04 0.02 0.02 1.4
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : chr [1:2] "(Intercept)" "regiontubule"
#>   .. ..$ : chr [1:2] "(Intercept)" "regiontubule"
#>  $ Im0          :List of 1500
#>   ..$ DDT       : num[0 , 0 ] 
#>   ..$ FOSL1     : num[0 , 0 ] 
#>   ..$ FKBP11    : num[0 , 0 ] 
#>   ..$ PIM1      : num[0 , 0 ] 
#>   ..$ RABEP2    : num[0 , 0 ] 
#>   ..$ ASAP2     : num[0 , 0 ] 
#>   ..$ IRX2      : num[0 , 0 ] 
#>   ..$ CENPX     : num[0 , 0 ] 
#>   ..$ NDUFS2    : num[0 , 0 ] 
#>   ..$ MED11     : num[0 , 0 ] 
#>   ..$ PDP1      : num[0 , 0 ] 
#>   ..$ DCAF11    : num[0 , 0 ] 
#>   ..$ GLCE      : num[0 , 0 ] 
#>   ..$ ELAVL1    : num[0 , 0 ] 
#>   ..$ SSH2      : num[0 , 0 ] 
#>   ..$ CFHR1     : num[0 , 0 ] 
#>   ..$ SEC23B    : num[0 , 0 ] 
#>   ..$ DGCR8     : num[0 , 0 ] 
#>   ..$ PVALEF    : num[0 , 0 ] 
#>   ..$ BEGAIN    : num[0 , 0 ] 
#>   ..$ FBXO21    : num[0 , 0 ] 
#>   ..$ ERP29     : num[0 , 0 ] 
#>   ..$ VPS13A    : num[0 , 0 ] 
#>   ..$ CYBA      : num[0 , 0 ] 
#>   ..$ S1PR3     : num[0 , 0 ] 
#>   ..$ RDH14     : num[0 , 0 ] 
#>   ..$ DIS3L2    : num[0 , 0 ] 
#>   ..$ RNF185    : num[0 , 0 ] 
#>   ..$ SLC50A1   : num[0 , 0 ] 
#>   ..$ UPF3B     : num[0 , 0 ] 
#>   ..$ ART1      : num[0 , 0 ] 
#>   ..$ FAHD2B    : num[0 , 0 ] 
#>   ..$ ZNF684    : num[0 , 0 ] 
#>   ..$ PTGER4    : num[0 , 0 ] 
#>   ..$ DMPK      : num[0 , 0 ] 
#>   ..$ ID3       : num[0 , 0 ] 
#>   ..$ STIM1     : num[0 , 0 ] 
#>   ..$ PLRG1     : num[0 , 0 ] 
#>   ..$ MPC2      : num[0 , 0 ] 
#>   ..$ ENY2      : num[0 , 0 ] 
#>   ..$ FUCA1     : num[0 , 0 ] 
#>   ..$ TCAF1     : num[0 , 0 ] 
#>   ..$ MTSS1     : num[0 , 0 ] 
#>   ..$ STAU2     : num[0 , 0 ] 
#>   ..$ MIS12     : num[0 , 0 ] 
#>   ..$ ZBTB3     : num[0 , 0 ] 
#>   ..$ GLRX3     : num[0 , 0 ] 
#>   ..$ PLEKHB2   : num[0 , 0 ] 
#>   ..$ BAD       : num[0 , 0 ] 
#>   ..$ HIPK1     : num[0 , 0 ] 
#>   ..$ ATP6V1D   : num[0 , 0 ] 
#>   ..$ TMEM140   : num[0 , 0 ] 
#>   ..$ WRNIP1    : num[0 , 0 ] 
#>   ..$ OXCT2     : num[0 , 0 ] 
#>   ..$ PAQR5     : num[0 , 0 ] 
#>   ..$ DHDDS     : num[0 , 0 ] 
#>   ..$ ARSL      : num[0 , 0 ] 
#>   ..$ HSD3B2    : num[0 , 0 ] 
#>   ..$ GAS8      : num[0 , 0 ] 
#>   ..$ CLUAP1    : num[0 , 0 ] 
#>   ..$ INO80D    : num[0 , 0 ] 
#>   ..$ ROMO1     : num[0 , 0 ] 
#>   ..$ ZNF785    : num[0 , 0 ] 
#>   ..$ AFM       : num[0 , 0 ] 
#>   ..$ FAM180A   : num[0 , 0 ] 
#>   ..$ MAP3K20   : num[0 , 0 ] 
#>   ..$ ATF6B     : num[0 , 0 ] 
#>   ..$ EYA3      : num[0 , 0 ] 
#>   ..$ BAIAP2    : num[0 , 0 ] 
#>   ..$ USP8      : num[0 , 0 ] 
#>   ..$ C1RL      : num[0 , 0 ] 
#>   ..$ CERS5     : num[0 , 0 ] 
#>   ..$ ZIC3      : num[0 , 0 ] 
#>   ..$ WDR83OS   : num[0 , 0 ] 
#>   ..$ CHRNB1    : num[0 , 0 ] 
#>   ..$ TSEN15    : num[0 , 0 ] 
#>   ..$ ERP27     : num[0 , 0 ] 
#>   ..$ RIT1      : num[0 , 0 ] 
#>   ..$ PSMG3     : num[0 , 0 ] 
#>   ..$ GIPC2     : num[0 , 0 ] 
#>   ..$ LMBRD1    : num[0 , 0 ] 
#>   ..$ KCNJ15    : num[0 , 0 ] 
#>   ..$ CRTC1     : num[0 , 0 ] 
#>   ..$ ZNF300    : num[0 , 0 ] 
#>   ..$ EP300     : num[0 , 0 ] 
#>   ..$ PAOX      : num[0 , 0 ] 
#>   ..$ PRPF6     : num[0 , 0 ] 
#>   ..$ YES1      : num[0 , 0 ] 
#>   ..$ FAM107B   : num[0 , 0 ] 
#>   ..$ AFAP1L1   : num[0 , 0 ] 
#>   ..$ TNRC18    : num[0 , 0 ] 
#>   ..$ MYO18A    : num[0 , 0 ] 
#>   ..$ TOPORS    : num[0 , 0 ] 
#>   ..$ LATS2     : num[0 , 0 ] 
#>   ..$ METRN     : num[0 , 0 ] 
#>   ..$ ADRB1     : num[0 , 0 ] 
#>   ..$ PCGF2     : num[0 , 0 ] 
#>   ..$ JCAD      : num[0 , 0 ] 
#>   ..$ THNSL2    : num[0 , 0 ] 
#>   .. [list output truncated]
#>  $ Im           :List of 18439
#>   ..$ A2M       : num [1:4, 1:4] 21.1303 8.5032 -0.1702 0.0115 8.5032 ...
#>   ..$ NAT2      : num [1:4, 1:4] 13.976574 8.887219 -0.000738 0.204521 8.887219 ...
#>   ..$ ACADM     : num [1:4, 1:4] 29.2401 19.2994 -0.0804 0.0802 19.2994 ...
#>   ..$ ACADS     : num [1:4, 1:4] 80.4057 58.1105 -0.0033 0.3098 58.1105 ...
#>   ..$ ACAT1     : num [1:4, 1:4] 20.1191 12.8542 -0.0879 0.0348 12.8542 ...
#>   ..$ ACVRL1    : num [1:4, 1:4] 50.7264 17.4294 -0.0244 0.2293 17.4294 ...
#>   ..$ PSEN1     : num [1:4, 1:4] 7.97e+01 5.61e+01 5.33e-05 3.96e-01 5.61e+01 ...
#>   ..$ ADA       : num [1:4, 1:4] 13.372464 8.412482 -0.000587 0.179358 8.412482 ...
#>   ..$ SGCA      : num [1:4, 1:4] 23.2053 14.9894 -0.0154 0.1293 14.9894 ...
#>   ..$ ADRB2     : num [1:4, 1:4] 2.67e+01 1.73e+01 1.11e-05 2.80e-01 1.73e+01 ...
#>   ..$ ADRB3     : num [1:4, 1:4] 3.89 2.24 7.35e-06 1.23e-01 2.24 ...
#>   ..$ AGL       : num [1:4, 1:4] 3.21e+01 2.39e+01 4.28e-05 2.94e-01 2.39e+01 ...
#>   ..$ AGXT      : num [1:4, 1:4] 13.2306 8.44311 -0.00958 0.16776 8.44311 ...
#>   ..$ ABCD1     : num [1:4, 1:4] 2.31e+01 1.12e+01 2.63e-04 2.47e-01 1.12e+01 ...
#>   ..$ ALDOB     : num [1:4, 1:4] 4.21742 2.36829 -0.89891 0.00155 2.36829 ...
#>   ..$ AMPD1     : num [1:4, 1:4] 1.34e+01 8.33 8.22e-05 2.11e-01 8.33 ...
#>   ..$ APC       : num [1:4, 1:4] 3.60e+01 1.83e+01 -6.30e-06 3.10e-01 1.83e+01 ...
#>   ..$ APOA1     : num [1:4, 1:4] 29.737555 17.393144 -0.000214 0.294107 17.393144 ...
#>   ..$ APOC3     : num [1:4, 1:4] 44.97451 28.34827 -0.00416 0.30889 28.34827 ...
#>   ..$ APOE      : num [1:4, 1:4] 8.6294 4.8548 -0.2787 0.0135 4.8548 ...
#>   ..$ APOH      : num [1:4, 1:4] 3.48 2.33 9.79e-06 1.22e-01 2.33 ...
#>   ..$ ARG1      : num [1:4, 1:4] 2.22e+01 1.24e+01 3.96e-05 2.59e-01 1.24e+01 ...
#>   ..$ ARSB      : num [1:4, 1:4] 55.08493 32.15418 -0.00107 0.35459 32.15418 ...
#>   ..$ ASL       : num [1:4, 1:4] 110.25378 82.71718 -0.00753 0.39028 82.71718 ...
#>   ..$ ASPA      : num [1:4, 1:4] 39.45687 31.0848 -0.00809 0.27177 31.0848 ...
#>   ..$ ASS1      : num [1:4, 1:4] 20.8068 11.2059 -0.1244 0.0192 11.2059 ...
#>   ..$ ATM       : num [1:4, 1:4] 8.79e+01 5.15e+01 -7.71e-06 4.10e-01 5.15e+01 ...
#>   ..$ ATP7A     : num [1:4, 1:4] 3.03e+01 1.78e+01 -9.99e-05 3.01e-01 1.78e+01 ...
#>   ..$ ATP7B     : num [1:4, 1:4] 18.9938 12.1616 -0.0223 0.1923 12.1616 ...
#>   ..$ AVPR2     : num [1:4, 1:4] 5.3372 3.8251 0.0119 0.0785 3.8251 ...
#>   ..$ BLM       : num [1:4, 1:4] 11.0912 6.8233 -0.0199 0.153 6.8233 ...
#>   ..$ BRCA2     : num [1:4, 1:4] 12.86985 8.58233 -0.00286 0.16091 8.58233 ...
#>   ..$ BTD       : num [1:4, 1:4] 6.12e+01 3.71e+01 -4.94e-05 3.76e-01 3.71e+01 ...
#>   ..$ BTK       : num [1:4, 1:4] 9.01 5.79 2.38e-05 1.80e-01 5.79 ...
#>   ..$ SERPING1  : num [1:4, 1:4] 49.4303 25.9894 -0.0531 0.026 25.9894 ...
#>   ..$ C2        : num [1:4, 1:4] 5.1053 2.8807 -0.0282 0.0938 2.8807 ...
#>   ..$ C3        : num [1:4, 1:4] 5.5626 4.09169 -0.4753 0.00522 4.09169 ...
#>   ..$ C6        : num [1:4, 1:4] 1.71e+01 1.04e+01 -1.87e-05 2.39e-01 1.04e+01 ...
#>   ..$ C8B       : num [1:4, 1:4] 10.09948 6.10957 -0.00615 0.15522 6.10957 ...
#>   ..$ CA2       : num [1:4, 1:4] 13.8934 9.074 -0.1683 0.0134 9.074 ...
#>   ..$ CACNA1S   : num [1:4, 1:4] 1.36e+01 7.37 -2.42e-05 2.17e-01 7.37 ...
#>   ..$ CBS       : num [1:4, 1:4] 53.8122 32.9343 -0.0282 0.2443 32.9343 ...
#>   ..$ CD36      : num [1:4, 1:4] 2.01e+01 9.88 -7.53e-05 2.51e-01 9.88 ...
#>   ..$ CD3G      : num [1:4, 1:4] 8.57 5.47 8.15e-07 1.79e-01 5.47 ...
#>   ..$ CD40LG    : num [1:4, 1:4] 7.45 5.07 -8.81e-06 1.70e-01 5.07 ...
#>   ..$ CDK4      : num [1:4, 1:4] 1.13e+02 7.64e+01 -2.24e-04 4.43e-01 7.64e+01 ...
#>   ..$ CDKN1C    : num [1:4, 1:4] 20.0781 5.4559 -0.1557 0.0181 5.4559 ...
#>   ..$ CETP      : num [1:4, 1:4] 7.99 5.21 -5.77e-05 1.73e-01 5.21 ...
#>   ..$ CHRNE     : num [1:4, 1:4] 20.9623 14.4304 -0.0016 0.2291 14.4304 ...
#>   ..$ LYST      : num [1:4, 1:4] 7.79e+01 4.83e+01 -7.13e-06 3.96e-01 4.83e+01 ...
#>   ..$ ERCC8     : num [1:4, 1:4] 2.59e+01 2.00e+01 -3.06e-05 2.76e-01 2.00e+01 ...
#>   ..$ CLCN1     : num [1:4, 1:4] 6.86468 4.755111 0.000854 0.145611 4.755111 ...
#>   ..$ CLCN5     : num [1:4, 1:4] 12.3042 8.8935 -0.0995 0.1002 8.8935 ...
#>   ..$ CLN3      : num [1:4, 1:4] 7.92e+01 5.70e+01 9.41e-05 3.92e-01 5.70e+01 ...
#>   ..$ CNGA1     : num [1:4, 1:4] 2.21e+01 1.39e+01 6.66e-05 2.56e-01 1.39e+01 ...
#>   ..$ COL1A1    : num [1:4, 1:4] 14.2016 9.3211 -0.0846 0.0986 9.3211 ...
#>   ..$ COL1A2    : num [1:4, 1:4] 15.9234 8.5407 -0.1707 0.0259 8.5407 ...
#>   ..$ COL4A3    : num [1:4, 1:4] 44.8727 14.5751 -0.0866 0.0889 14.5751 ...
#>   ..$ COL5A1    : num [1:4, 1:4] 9.4344 5.697 -0.1002 0.0801 5.697 ...
#>   ..$ COMP      : num [1:4, 1:4] 2.56e+01 1.57e+01 -5.74e-05 2.82e-01 1.57e+01 ...
#>   ..$ CP        : num [1:4, 1:4] 15.15634 9.35931 0.00319 0.1581 9.35931 ...
#>   ..$ CPOX      : num [1:4, 1:4] 4.513219 3.030791 0.000223 0.127448 3.030791 ...
#>   ..$ CPT2      : num [1:4, 1:4] 46.5891 33.1228 -0.0215 0.24 33.1228 ...
#>   ..$ CST3      : num [1:4, 1:4] 126.713 71.5443 -0.0789 0.1423 71.5443 ...
#>   ..$ CSTB      : num [1:4, 1:4] 219.079 130.367 0.039 0.23 130.367 ...
#>   ..$ CYP17A1   : num [1:4, 1:4] 9.6723 5.8395 -0.1018 0.0993 5.8395 ...
#>   ..$ CYP19A1   : num [1:4, 1:4] 1.05e+01 6.49 2.92e-05 1.92e-01 6.49 ...
#>   ..$ CYP1B1    : num [1:4, 1:4] 17.768 8.54 -0.1127 0.0577 8.54 ...
#>   ..$ CYP2D6    : num [1:4, 1:4] 24.6712 14.2669 -0.0171 0.2062 14.2669 ...
#>   ..$ DDB2      : num [1:4, 1:4] 1.79e+01 1.12e+01 -9.89e-05 2.48e-01 1.12e+01 ...
#>   ..$ DLD       : num [1:4, 1:4] 31.1109 21.3943 -0.0193 0.1057 21.3943 ...
#>   ..$ SLC26A3   : num [1:4, 1:4] 6.27 4.29 -2.19e-05 1.58e-01 4.29 ...
#>   ..$ SLC26A2   : num [1:4, 1:4] 2.58e+01 1.44e+01 3.20e-05 2.76e-01 1.44e+01 ...
#>   ..$ TOR1A     : num [1:4, 1:4] 24.58101 14.65125 0.00944 0.22075 14.65125 ...
#>   ..$ EDN3      : num [1:4, 1:4] 6.2312 4.4511 -0.0122 0.1259 4.4511 ...
#>   ..$ EMD       : num [1:4, 1:4] 1.53e+02 1.02e+02 3.06e-04 4.58e-01 1.02e+02 ...
#>   ..$ EPB42     : num [1:4, 1:4] 4.98e+01 2.70e+01 -4.45e-05 3.49e-01 2.70e+01 ...
#>   ..$ EPHX1     : num [1:4, 1:4] 89.4385 50.2844 -0.0287 0.2761 50.2844 ...
#>   ..$ EPOR      : num [1:4, 1:4] 4.73e+01 3.83e+01 1.07e-04 3.27e-01 3.83e+01 ...
#>   ..$ ERCC5     : num [1:4, 1:4] 55.2361 31.7118 -0.0106 0.2411 31.7118 ...
#>   ..$ ESR1      : num [1:4, 1:4] 2.89e+01 1.75e+01 -9.72e-05 2.93e-01 1.75e+01 ...
#>   ..$ EXT1      : num [1:4, 1:4] 6.06e+01 3.97e+01 -2.75e-06 3.72e-01 3.97e+01 ...
#>   ..$ F11       : num [1:4, 1:4] 9.2722 6.9846 -0.0479 0.1282 6.9846 ...
#>   ..$ F13A1     : num [1:4, 1:4] 1.97e+01 1.52e+01 -5.97e-05 2.49e-01 1.52e+01 ...
#>   ..$ F5        : num [1:4, 1:4] 8.9738 2.5771 -0.0193 0.0899 2.5771 ...
#>   ..$ F9        : num [1:4, 1:4] 5.30e-02 -1.27e-02 -8.84e-07 1.97e-02 -1.27e-02 ...
#>   ..$ FABP2     : num [1:4, 1:4] 1.6058 1.040901 0.000356 0.081299 1.040901 ...
#>   ..$ FANCA     : num [1:4, 1:4] 1.24e+01 8.58 -1.56e-05 2.10e-01 8.58 ...
#>   ..$ FAH       : num [1:4, 1:4] 33.6237 20.5502 -0.0247 0.1941 20.5502 ...
#>   ..$ MS4A2     : num [1:4, 1:4] 2.36 1.51 -6.48e-06 1.02e-01 1.51 ...
#>   ..$ FECH      : num [1:4, 1:4] 19.8438 14.0706 -0.109 0.0737 14.0706 ...
#>   ..$ FGFR2     : num [1:4, 1:4] 2.73e+01 2.22e+01 9.89e-05 2.70e-01 2.22e+01 ...
#>   ..$ FH        : num [1:4, 1:4] 30.8169 20.7292 -0.0877 0.1206 20.7292 ...
#>   ..$ FSHR      : num [1:4, 1:4] 5.00 3.48 -1.62e-05 1.42e-01 3.48 ...
#>   ..$ FUCA1     : num [1:4, 1:4] 47.0674 35.6609 0.0474 0.1502 35.6609 ...
#>   ..$ FUT1      : num [1:4, 1:4] 1.05e+01 5.59 2.34e-05 1.90e-01 5.59 ...
#>   ..$ FUT6      : num [1:4, 1:4] 9.4824 6.308 -0.1181 0.0734 6.308 ...
#>   ..$ G6PC      : num [1:4, 1:4] 4.9649 3.043 -0.161 0.0424 3.043 ...
#>   ..$ GALC      : num [1:4, 1:4] 47.60447 23.17732 -0.00838 0.2488 23.17732 ...
#>   .. [list output truncated]
#>  $ conv0        : num [1:1500, 1] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..- attr(*, "names")= chr [1:1500] "DDT" "FOSL1" "FKBP11" "PIM1" ...
#>  $ conv         : num [1:18439, 1] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..- attr(*, "names")= chr [1:18439] "A2M" "NAT2" "ACADM" "ACADS" ...
#>  $ features_high: chr [1:1500] "DDT" "FOSL1" "FKBP11" "PIM1" ...
#>  $ features_all : chr [1:18439] "A2M" "NAT2" "ACADM" "ACADS" ...

Mixed effect model

First take a look at the study design. It shows the two levels of region both exist in the same patient ID. This indicates the random effect model with random slope would be appropriate, still we fit both random intercept model and random slope model to showcase the capability of the mixed model function.

Here we subset features_high to speed up DE in later steps as only these 30 genes are modeled.

pData(kidney)$region <- factor(pData(kidney)$region, levels=c("glomerulus", "tubule"))

table(pData(kidney)[, c("region", "slide name")])
#>             slide name
#> region       disease3 normal3
#>   glomerulus        4       4
#>   tubule            4       4

features_high_subset <- features_high[1:30]

Random intercept model only for high genes as an example, takes about 1 hour on the full dataset.

set.seed(123)
NBthmDEmod <- fitNBthmDE(object = kidney,
                         form = ~ region+(1|`slide name`),
                         ROIs_high = ROIs_high,
                         split = FALSE,
                         features_all = features_high_subset,
                         preci1=NBthDEmod$preci1,
                         threshold_mean = bgMean,
                         sizescale = TRUE,
                         controlRandom=list(nu=12, nmh_e=400, thin_e=60))

str(NBthmDEmod)
#> List of 17
#>  $ X            : num [1:16, 1:2] 1 1 1 1 1 1 1 1 1 1 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : chr [1:16] "DSP-1001250007851-H-A02.dcc" "DSP-1001250007851-H-A03.dcc" "DSP-1001250007851-H-A04.dcc" "DSP-1001250007851-H-A05.dcc" ...
#>   .. ..$ : chr [1:2] "(Intercept)" "regiontubule"
#>   ..- attr(*, "assign")= int [1:2] 0 1
#>   ..- attr(*, "contrasts")=List of 1
#>   .. ..$ region: chr "contr.treatment"
#>   ..- attr(*, "msgScaleX")= chr(0) 
#>  $ Z            : num [1:16, 1:2] 1 1 1 1 0 0 0 0 0 0 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : chr [1:16] "DSP-1001250007851-H-A02.dcc" "DSP-1001250007851-H-A03.dcc" "DSP-1001250007851-H-A04.dcc" "DSP-1001250007851-H-A05.dcc" ...
#>   .. ..$ : chr [1:2] "disease3" "normal3"
#>  $ rt           :List of 10
#>   ..$ Zt     :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
#>   .. .. ..@ i       : int [1:16] 0 0 0 0 1 1 1 1 1 1 ...
#>   .. .. ..@ p       : int [1:17] 0 1 2 3 4 5 6 7 8 9 ...
#>   .. .. ..@ Dim     : int [1:2] 2 16
#>   .. .. ..@ Dimnames:List of 2
#>   .. .. .. ..$ : chr [1:2] "disease3" "normal3"
#>   .. .. .. ..$ : chr [1:16] "DSP-1001250007851-H-A02.dcc" "DSP-1001250007851-H-A03.dcc" "DSP-1001250007851-H-A04.dcc" "DSP-1001250007851-H-A05.dcc" ...
#>   .. .. ..@ x       : num [1:16] 1 1 1 1 1 1 1 1 1 1 ...
#>   .. .. ..@ factors : list()
#>   ..$ theta  : num 1
#>   ..$ Lind   : int [1:2] 1 1
#>   ..$ Gp     : int [1:2] 0 2
#>   ..$ lower  : num 0
#>   ..$ Lambdat:Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
#>   .. .. ..@ i       : int [1:2] 0 1
#>   .. .. ..@ p       : int [1:3] 0 1 2
#>   .. .. ..@ Dim     : int [1:2] 2 2
#>   .. .. ..@ Dimnames:List of 2
#>   .. .. .. ..$ : NULL
#>   .. .. .. ..$ : NULL
#>   .. .. ..@ x       : num [1:2] 1 1
#>   .. .. ..@ factors : list()
#>   ..$ flist  :List of 1
#>   .. ..$ slide name: Factor w/ 2 levels "disease3","normal3": 1 1 1 1 2 2 2 2 2 2 ...
#>   .. ..- attr(*, "assign")= int 1
#>   ..$ cnms   :List of 1
#>   .. ..$ slide name: chr "(Intercept)"
#>   ..$ Ztlist :List of 1
#>   .. ..$ 1 | `slide name`:Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
#>   .. .. .. ..@ i       : int [1:16] 0 0 0 0 1 1 1 1 1 1 ...
#>   .. .. .. ..@ p       : int [1:17] 0 1 2 3 4 5 6 7 8 9 ...
#>   .. .. .. ..@ Dim     : int [1:2] 2 16
#>   .. .. .. ..@ Dimnames:List of 2
#>   .. .. .. .. ..$ : chr [1:2] "disease3" "normal3"
#>   .. .. .. .. ..$ : chr [1:16] "DSP-1001250007851-H-A02.dcc" "DSP-1001250007851-H-A03.dcc" "DSP-1001250007851-H-A04.dcc" "DSP-1001250007851-H-A05.dcc" ...
#>   .. .. .. ..@ x       : num [1:16] 1 1 1 1 1 1 1 1 1 1 ...
#>   .. .. .. ..@ factors : list()
#>   ..$ nl     : Named int 2
#>   .. ..- attr(*, "names")= chr "slide name"
#>  $ para0        : logi NA
#>  $ para         : num [1:4, 1:30] 0.316 0.984 25.053 1 0.315 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : chr [1:4] "(Intercept)" "regiontubule" "r" "threshold"
#>   .. ..$ : chr [1:30] "ACADM" "CPT2" "DLD" "EPOR" ...
#>  $ sizefact     : Named num [1:16] 0.0685 0.0458 0.043 0.0545 0.0742 ...
#>   ..- attr(*, "names")= chr [1:16] "DSP-1001250007851-H-A02.dcc" "DSP-1001250007851-H-A03.dcc" "DSP-1001250007851-H-A04.dcc" "DSP-1001250007851-H-A05.dcc" ...
#>  $ sizefact0    : logi NA
#>  $ preci1       : num [1:2, 1:2] 0.04 0.02 0.02 1.4
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : chr [1:2] "(Intercept)" "regiontubule"
#>   .. ..$ : chr [1:2] "(Intercept)" "regiontubule"
#>  $ conv0        : logi NA
#>  $ conv         : Named num [1:30] 1 1 1 1 1 1 1 1 1 1 ...
#>   ..- attr(*, "names")= chr [1:30] "ACADM" "CPT2" "DLD" "EPOR" ...
#>  $ Im0          : logi NA
#>  $ Im           :List of 30
#>   ..$ ACADM  : num [1:4, 1:4] -13.761 -9.6945 0.0168 -2.6905 -9.6945 ...
#>   ..$ CPT2   : num [1:4, 1:4] 10.5418 7.58289 -0.00985 9.07071 7.58289 ...
#>   ..$ DLD    : num [1:4, 1:4] -7.20 -5.94 4.78e-05 -3.01 -5.94 ...
#>   ..$ EPOR   : num [1:4, 1:4] 2.06e+01 1.67e+01 -8.81e-06 2.28e+01 1.67e+01 ...
#>   ..$ EXT1   : num [1:4, 1:4] 2.53e+01 1.66e+01 -3.26e-05 2.50e+01 1.66e+01 ...
#>   ..$ FAH    : num [1:4, 1:4] 18.67874 11.03686 0.00449 16.12429 11.03686 ...
#>   ..$ FUCA1  : num [1:4, 1:4] 37.9195 28.3687 0.0403 19.4401 28.3687 ...
#>   ..$ GALC   : num [1:4, 1:4] 2.55e+01 1.31e+01 7.43e-05 2.11e+01 1.31e+01 ...
#>   ..$ GCDH   : num [1:4, 1:4] 16.004145 11.856065 -0.000201 22.514191 11.856065 ...
#>   ..$ HMGCL  : num [1:4, 1:4] 8.094 5.68788 -0.00836 5.90006 5.68788 ...
#>   ..$ SGSH   : num [1:4, 1:4] 31.399076 20.797981 -0.000165 30.111257 20.797981 ...
#>   ..$ IDUA   : num [1:4, 1:4] 24.90885 16.5431 0.00559 18.80878 16.5431 ...
#>   ..$ ITGA6  : num [1:4, 1:4] 11.47 7.71 0.02 5.26 7.71 ...
#>   ..$ MTR    : num [1:4, 1:4] 4.1673 3.0973 0.0358 4.0155 3.0973 ...
#>   ..$ POU3F4 : num [1:4, 1:4] 2.30e+01 1.77e+01 -3.75e-06 3.42e+01 1.77e+01 ...
#>   ..$ TCN2   : num [1:4, 1:4] 11.6566 6.2444 -0.0671 8.5455 6.2444 ...
#>   ..$ UROS   : num [1:4, 1:4] 1.66e+01 1.33e+01 3.44e-05 1.48e+01 1.33e+01 ...
#>   ..$ WAS    : num [1:4, 1:4] 1.05e+01 7.38 5.32e-05 1.39e+01 7.38 ...
#>   ..$ COL5A2 : num [1:4, 1:4] 1.14e+01 6.93 5.60e-05 7.60 6.93 ...
#>   ..$ CRYAA  : num [1:4, 1:4] 4.3879 2.1911 -0.0533 4.8391 2.1911 ...
#>   ..$ CTSK   : num [1:4, 1:4] 1.93e+01 1.38e+01 1.09e-04 2.70e+01 1.38e+01 ...
#>   ..$ HLCS   : num [1:4, 1:4] 2.21e+01 1.68e+01 -2.55e-05 3.20e+01 1.68e+01 ...
#>   ..$ HSD17B4: num [1:4, 1:4] 2.65e-01 3.44e-01 5.12e-05 -9.82e-01 3.44e-01 ...
#>   ..$ IL4R   : num [1:4, 1:4] -5.34 -1.66 -5.74e-07 -5.10 -1.66 ...
#>   ..$ UBE3A  : num [1:4, 1:4] 2.4913 1.05177 -0.00443 6.69451 1.05177 ...
#>   ..$ AMT    : num [1:4, 1:4] 2.29e+01 1.60e+01 5.98e-05 3.27e+01 1.60e+01 ...
#>   ..$ FBP1   : num [1:4, 1:4] 9.07115 5.95658 -0.00958 5.84892 5.95658 ...
#>   ..$ GH1    : num [1:4, 1:4] 2.12e+01 1.35e+01 2.96e-05 2.21e+01 1.35e+01 ...
#>   ..$ KCNJ11 : num [1:4, 1:4] 4.45 2.91 -1.84e-05 7.17 2.91 ...
#>   ..$ SMPD1  : num [1:4, 1:4] 28.984421 16.026142 -0.000512 30.403389 16.026142 ...
#>  $ features_high: logi NA
#>  $ features_all : chr [1:30] "ACADM" "CPT2" "DLD" "EPOR" ...
#>  $ theta        : num [1, 1:30] 0.917 0.238 0.969 0.192 0.174 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : NULL
#>   .. ..$ : chr [1:30] "ACADM" "CPT2" "DLD" "EPOR" ...
#>   ..- attr(*, "names")= chr [1:30] "ACADM" "CPT2" "DLD" "EPOR" ...
#>  $ varcov       : num [1, 1:30] 0.8414 0.0567 0.9391 0.0369 0.0304 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : NULL
#>   .. ..$ : chr [1:30] "ACADM" "CPT2" "DLD" "EPOR" ...
#>  $ Uvec         : num [1:2, 1:30] 0.10845 1.28961 0.00944 0.24002 0.08512 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : NULL
#>   .. ..$ : chr [1:30] "ACADM" "CPT2" "DLD" "EPOR" ...

Random slope model (recommended for this study design), takes about 4 hours on the full dataset.

set.seed(123)
NBthmDEmodslope <- fitNBthmDE(object = kidney,
                              form = ~ region+(1+region|`slide name`),
                              ROIs_high = ROIs_high,
                              split = FALSE,
                              features_all = features_high_subset,
                              preci1=NBthDEmod$preci1,
                              threshold_mean = bgMean,
                              sizescale = TRUE,
                              controlRandom=list(nu=12, nmh_e=400, thin_e=60))

Relation between models.

plot(NBthDEmod$para[2,names(NBthmDEmod$para[2,])], NBthmDEmod$para[2,],
     xlab = "Fixed Effect Model Output Parameters", ylab = "Mixed Effect Model Output Parameters")
abline(a=0,b=1)


plot(NBthDEmod$para[2,names(NBthmDEmodslope$para[2,])], NBthmDEmodslope$para[2,],
     xlab = "Fixed Effect Model Output Parameters", ylab = "Random Slope Model Output Parameters")
abline(a=0,b=1)

Genes with larger difference in estimates between fixed effect model and random slope model have larger random effect variance for the random slope.

diff_high <- names(which(abs(NBthDEmod$para[2,names(NBthmDEmodslope$para[2,])]-
                               NBthmDEmodslope$para[2,])>0.6))
diff_high
#> [1] "ACADM" "DLD"
set.seed(123)

NBthmDEmodslope$theta[3, "ACADM"]
#>     ACADM 
#> 0.8215778


annot <- pData(kidney)
annot$ACADM <- posdat["ACADM",]

The figure below shows there are huge variation in the difference between two levels of region within each slide.


plot_dat <- annot[,c("region", "ACADM", "slide name")]

p <- ggplot(plot_dat, aes(x=`slide name`, y=ACADM, fill=region)) +
  geom_boxplot()

plot(p)

Generate DE result

A list of inference results can be generated using coefNBth. This produces a list of Wald test inference results on model coefficients.

coeff <- coefNBth(NBthDEmod)
coefr <- coefNBth(NBthmDEmod)
coefrslope <- coefNBth(NBthmDEmodslope)

str(coeff)
#> List of 4
#>  $ estimate : num [1:2, 1:18439] 12.3793 -1.5766 6.6138 -0.0493 8.4003 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : chr [1:2] "(Intercept)" "regiontubule"
#>   .. ..$ : chr [1:18439] "A2M" "NAT2" "ACADM" "ACADS" ...
#>  $ wald_stat: num [1:2, 1:18439] 45.036 -3.841 16.554 -0.106 28.01 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : chr [1:2] "(Intercept)" "regiontubule"
#>   .. ..$ : chr [1:18439] "A2M" "NAT2" "ACADM" "ACADS" ...
#>  $ p_value  : num [1:2, 1:18439] 0.00 1.22e-04 1.49e-61 9.16e-01 1.24e-172 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : chr [1:2] "(Intercept)" "regiontubule"
#>   .. ..$ : chr [1:18439] "A2M" "NAT2" "ACADM" "ACADS" ...
#>  $ se       : num [1:2, 1:18439] 0.275 0.41 0.4 0.466 0.3 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : chr [1:2] "(Intercept)" "regiontubule"
#>   .. ..$ : chr [1:18439] "A2M" "NAT2" "ACADM" "ACADS" ...

If you see an NA it is an extremely insignificant gene, these p-values can be changed to 1.

We can find the baselevel of this DE comparison by looking at the comparison name after coefNBth. The base level is not listed here as it is what everything else is compared to. So in this case the base level is regionglomerulus.

rownames(coeff$estimate)[-1]
#> [1] "regiontubule"

DE tables can be generated using DENBth. This will produce a table using the inference list generated by coefNBth. Negative fold changes indicate higher expression in the base condition.

DEtab <- DENBth(coeff, variable = "regiontubule")
DEtabr <- DENBth(coefr, variable = "regiontubule")
DEtabrslope <- DENBth(coefrslope, variable = "regiontubule")

head(DEtab)
#>             log2FC       pvalue         adjp
#> A2M    -1.57664936 1.224187e-04 3.456782e-03
#> NAT2   -0.04929348 9.156734e-01 9.999999e-01
#> ACADM   0.99717370 5.600245e-03 7.679366e-02
#> ACADS   0.58771310 1.457096e-02 1.566589e-01
#> ACAT1   1.19455895 4.476426e-03 6.489059e-02
#> ACVRL1 -1.36947667 9.748578e-07 5.063494e-05

For datasets with multiple comparisons, contrastNBth() can be used to create all pair-wise comparisons. That output can also be run through DENBth to create a DE table.

Normalization

Here we normalize the data using a Poisson threshold model based normalization-log2 transformation. In this first normalization, we will not split by slide.

set.seed(123)

names(assayData(kidney))
#> [1] "exprs"

kidney <- fitPoisthNorm(object = kidney,
                        ROIs_high = ROIs_high,
                        threshold_mean = bgMean,
                        sizescalebythreshold = TRUE)
#> probe finished
#> Iteration = 1, squared error = 1.48120928085973e-10
#> probe finished
#> Iteration = 2, squared error = 3.10950441870413e-06
#> Model converged.

names(assayData(kidney))
#> [1] "exprs"    "normmat"  "normmat0"

head(fData(kidney)[,(ncol(fData(kidney))-6):ncol(fData(kidney))])
#>        feature_high_fitNBth para0_norm.var1 para0_norm.var2 para0_norm.var3
#> A2M                       0              NA              NA              NA
#> NAT2                      0              NA              NA              NA
#> ACADM                     1      0.76435937     -0.05106024      0.31344867
#> ACADS                     0              NA              NA              NA
#> ACAT1                     0              NA              NA              NA
#> ACVRL1                    0              NA              NA              NA
#>        para0_norm.var4 para0_norm.var5 para0_norm.var6 para0_norm.var7
#> A2M                 NA              NA              NA              NA
#> NAT2                NA              NA              NA              NA
#> ACADM       0.88765479      2.15982694      2.84061727      2.23455301
#> ACADS               NA              NA              NA              NA
#> ACAT1               NA              NA              NA              NA
#> ACVRL1              NA              NA              NA              NA
#>        para0_norm.var8 para0_norm.var9 para0_norm.var10 para0_norm.var11
#> A2M                 NA              NA               NA               NA
#> NAT2                NA              NA               NA               NA
#> ACADM       3.18781094      1.50144580       1.91267822       0.40395158
#> ACADS               NA              NA               NA               NA
#> ACAT1               NA              NA               NA               NA
#> ACVRL1              NA              NA               NA               NA
#>        para0_norm.var12 para0_norm.var13 para0_norm.var14 para0_norm.var15
#> A2M                  NA               NA               NA               NA
#> NAT2                 NA               NA               NA               NA
#> ACADM        1.23854063       1.45022129       1.00443249       0.76781673
#> ACADS                NA               NA               NA               NA
#> ACAT1                NA               NA               NA               NA
#> ACVRL1               NA               NA               NA               NA
#>        para0_norm.var16 para0_norm.var17 para_norm.var1 para_norm.var2
#> A2M                  NA               NA     5.38686944     5.07771005
#> NAT2                 NA               NA    -0.88598296     0.22208438
#> ACADM        1.97702528       1.00000825     0.78352975     0.18615702
#> ACADS                NA               NA     0.50724270     0.41006263
#> ACAT1                NA               NA     0.69895544     1.07263405
#> ACVRL1               NA               NA     1.77750513     0.99105528
#>        para_norm.var3 para_norm.var4 para_norm.var5 para_norm.var6
#> A2M        4.95420040     5.08051921     0.81354796     3.16820398
#> NAT2      -1.40464756    -0.03802143    -1.48540000    -0.16751785
#> ACADM      0.39222248     0.88588416     2.15690907     2.82744451
#> ACADS      1.25546405     0.41994188     0.99141483     1.43198889
#> ACAT1      1.04469584     0.38841539     3.39845246     3.56333336
#> ACVRL1     1.33350755     1.04226803    -0.46846706    -0.01900852
#>        para_norm.var7 para_norm.var8 para_norm.var9 para_norm.var10
#> A2M        1.09640192     3.48726140     5.48121180      5.30362159
#> NAT2      -0.53281731    -0.95277482     0.36723630     -1.18444164
#> ACADM      2.24371626     3.17786066     1.47747796      1.87840335
#> ACADS      1.90722428     1.93915604     1.31333211     -0.01863125
#> ACAT1      2.83147689     4.29405223     1.34088786      1.65971743
#> ACVRL1    -0.26539900     0.39719256     2.33259246      1.82226621
#>        para_norm.var11 para_norm.var12 para_norm.var13 para_norm.var14
#> A2M         5.32617716      5.14133818      2.82570771      4.89983865
#> NAT2       -0.51450753     -0.73853617     -0.08103528     -0.39775330
#> ACADM       0.59355243      1.26931247      1.44467633      1.05125149
#> ACADS       1.15612980      1.21492218      1.34457122      1.15276850
#> ACAT1       2.59755248      2.12969084      2.03634045      2.03644425
#> ACVRL1      1.36910864      1.77712880      0.56484778      0.98226287
#>        para_norm.var15 para_norm.var16 para_norm.var17 conv0 conv features_high
#> A2M         1.96317519      3.75109364      0.99985730    NA    0            NA
#> NAT2       -0.62887390     -0.30679190      0.99946136    NA    0            NA
#> ACADM       0.89783003      1.93911761      0.99992239     0    0             1
#> ACADS       1.00003245      1.06181782      0.99990658    NA    0            NA
#> ACAT1       1.21834857      2.02849135      0.99984368    NA    0            NA
#> ACVRL1      0.06709719      0.04831181      0.99987426    NA    0            NA
#>        features_all
#> A2M               1
#> NAT2              1
#> ACADM             1
#> ACADS             1
#> ACAT1             1
#> ACVRL1            1

head(pData(kidney))
#>                             construct read_pattern expected_neg slide name
#> DSP-1001250007851-H-A02.dcc directPCR         2x27            0   disease3
#> DSP-1001250007851-H-A03.dcc directPCR         2x27            0   disease3
#> DSP-1001250007851-H-A04.dcc directPCR         2x27            0   disease3
#> DSP-1001250007851-H-A05.dcc directPCR         2x27            0   disease3
#> DSP-1001250007864-D-A02.dcc directPCR         2x27            0    normal3
#> DSP-1001250007864-D-A03.dcc directPCR         2x27            0    normal3
#>                              class           segment     area     region
#> DSP-1001250007851-H-A02.dcc    DKD Geometric Segment 31797.59 glomerulus
#> DSP-1001250007851-H-A03.dcc    DKD Geometric Segment 16920.10 glomerulus
#> DSP-1001250007851-H-A04.dcc    DKD Geometric Segment 14312.33 glomerulus
#> DSP-1001250007851-H-A05.dcc    DKD Geometric Segment 20032.84 glomerulus
#> DSP-1001250007864-D-A02.dcc normal             PanCK 20265.55     tubule
#> DSP-1001250007864-D-A03.dcc normal               neg 91483.62     tubule
#>                             pathology nuclei   sizefact sizefact_sp
#> DSP-1001250007851-H-A02.dcc  abnormal  16871 0.07178719  0.07178719
#> DSP-1001250007851-H-A03.dcc  abnormal  17684 0.04699198  0.04699198
#> DSP-1001250007851-H-A04.dcc  abnormal  15108 0.03800758  0.03800758
#> DSP-1001250007851-H-A05.dcc  abnormal  15271 0.05091165  0.05091165
#> DSP-1001250007864-D-A02.dcc        NA  15471 0.03893244  0.03893244
#> DSP-1001250007864-D-A03.dcc        NA  14470 0.10618339  0.10618339
#>                             sizefact_fitNBth     0.75      0.8      0.9
#> DSP-1001250007851-H-A02.dcc       0.06852735 17.18841 20.18841 29.18841
#> DSP-1001250007851-H-A03.dcc       0.04575617 12.26812 14.26812 20.26812
#> DSP-1001250007851-H-A04.dcc       0.04295366 10.74638 12.74638 18.74638
#> DSP-1001250007851-H-A05.dcc       0.05450094 13.62319 16.62319 23.62319
#> DSP-1001250007864-D-A02.dcc       0.07420959 16.59420 20.59420 32.59420
#> DSP-1001250007864-D-A03.dcc       0.10564590 25.52899 30.52899 48.52899
#>                                 0.95 sizefact_norm sizefact0_norm
#> DSP-1001250007851-H-A02.dcc 43.18841    0.06886568     0.06852655
#> DSP-1001250007851-H-A03.dcc 30.26812    0.04576581     0.04575345
#> DSP-1001250007851-H-A04.dcc 27.74638    0.04274425     0.04295132
#> DSP-1001250007851-H-A05.dcc 35.62319    0.05466523     0.05450215
#> DSP-1001250007864-D-A02.dcc 50.59420    0.07346445     0.07420846
#> DSP-1001250007864-D-A03.dcc 75.52899    0.10531013     0.10564791

After normalization, 2 matrices are added to the assayData:

normmat0 - normalization after iteration 1

normmat - normalization after iteration 2

Convergence and parameter values are added to pData and fData.


In this normalize, we split by slide.

set.seed(123)

kidney <- fitPoisthNorm(object = kidney,
                        split = TRUE,
                        ROIs_high = ROIs_high,
                        threshold_mean = bgMean,
                        sizescalebythreshold = TRUE)
#> The results are based on stored `groupvar`, slide name
#> probe finished
#> Iteration = 1, squared error = 0.0312928907369696
#> probe finished
#> Iteration = 2, squared error = 4.96902071809208e-06
#> Model converged.
#> probe finished
#> Iteration = 1, squared error = 0.048150130113639
#> probe finished
#> Iteration = 2, squared error = 1.53844998792287e-06
#> Model converged.

names(assayData(kidney))
#> [1] "exprs"       "normmat0_sp" "normmat"     "normmat_sp"  "normmat0"

After normalization, 2 matrices are added to the assayData labeled with -sp for split:

normmat0-sp - normalization after iteration 1

normmat-sp - normalization after iteration 2


Comparison of normalization methods

Compared to quantile 75 (Q3) normalization

norm_dat_backqu75 <- sweep(posdat[, ROIs_high], 2,
                           (kidney[, ROIs_high]$sizefact * bgMean),
                           FUN = "-") %>%
  sweep(., 2, quan75[ROIs_high], FUN = "/") %>%
  pmax(., 0) %>%
  `+`(., 0.01) %>%
  log2()
dat_plot <- cbind(pData(kidney)[ROIs_high, c("slide name", "region")],
                  t(norm_dat_backqu75[features_all, ]))

dat_plot <- cbind(dat_plot, ROI_ID = ROIs_high)

dat_plot <- melt(dat_plot, id.vars = c("ROI_ID", "slide name", "region"))

ggplot(dat_plot, aes(x = value)) +
  geom_density(aes(fill = region, group = ROI_ID, alpha = 0.01)) +
  facet_grid(~`slide name`) +
  ggtitle("Q3 Normalization")+
  labs(x = "Q3 Normalized Value (log2)")

Here you can see that Q3 normalization is prone to low values.

annot <- pData(kidney)

dat_plot <- cbind(annot[ROIs_high, c("slide name", "region")],
                  t(assayDataElement(kidney[features_high, ROIs_high], "normmat_sp")))

dat_plot <- cbind(dat_plot, ROI_ID = ROIs_high)

dat_plot <- melt(dat_plot, id.vars = c("ROI_ID", "slide name", "region"))

ggplot(dat_plot, aes(x = value)) +
  geom_density(aes(fill = region, group = ROI_ID, alpha = 0.01)) +
  facet_wrap(~`slide name`) +
  ggtitle("Poisson threshold normalization")+
  labs(x = "Poisson Threshold Normalized Value (log2)")

In contrast, you can see that the poisson threshold normalized values follow more of a normal curve, eliminating the spikes in low values.

Clustering

dat <- t(norm_dat_backqu75[features_high, ])
dat_pca <- prcomp(dat, center = TRUE, scale. = TRUE)
dat <- as.data.frame(dat)

dat$PC1 <- dat_pca$x[, 1]
dat$PC2 <- dat_pca$x[, 2]
dat$id <- annot$`slide name`[match(ROIs_high, colnames(posdat))]
dat$class <- annot$class[match(ROIs_high, colnames(posdat))]
dat$region <- annot$region[match(ROIs_high, colnames(posdat))]
dat$sizeratio <- kidney[, ROIs_high]$sizefact_fitNBth / kidney[, ROIs_high]$sizefact

p <- ggplot(data = dat, aes(x = PC1, y = PC2)) +
  geom_point(aes(colour = paste(class, region))) +
  theme_bw()+
  labs(title = "Q3 Normalized Data")

plot(p)


p <- ggplot(data = dat, aes(x = PC1, y = PC2)) +
  geom_point(aes(colour = log2(sizeratio))) +
  theme_bw()+
  scale_color_gradient2(high = "gold", mid = "grey50", low = "darkblue", midpoint = 0.2)+
  labs(title = "Q3 Normalized Data")

plot(p)

As you can see in the first PCA plot, the ROIs cluster by region and class. However, the first PC is mostly driven by the ratio of background to signal size ratio as shown in the second PCA plot.

With the Poisson Threshold normalization, the ROIs still cluster by region and class but the first PC is not strictly driven by the background to signal size ratio.

dat <- t(assayDataElement(kidney[features_high, ROIs_high],"normmat_sp"))
dat_pca <- prcomp(dat, center = TRUE, scale. = TRUE)
dat <- as.data.frame(dat)

dat$PC1 <- dat_pca$x[, 1]
dat$PC2 <- dat_pca$x[, 2]
dat$id <- annot$`slide name`[match(ROIs_high, colnames(posdat))]
dat$class <- annot$class[match(ROIs_high, colnames(posdat))]
dat$region <- annot$region[match(ROIs_high, colnames(posdat))]
dat$sizeratio <- kidney[, ROIs_high]$sizefact_fitNBt / kidney[, ROIs_high]$sizefact

p <- ggplot(data = dat, aes(x = PC1, y = PC2)) +
  geom_point(aes(colour = paste(class, region))) +
  theme_bw()+
  labs(title = "Poisson Threshold Normalized Data")

plot(p)


p <- ggplot(data = dat, aes(x = PC1, y = PC2)) +
  geom_point(aes(colour = log2(sizeratio))) +
  theme_bw()+
  scale_color_gradient2(high = "gold", mid = "grey50", low = "darkblue", midpoint = 0.2)+
  labs(title = "Poisson Threshold Normalized Data")

plot(p)

sessionInfo()
#> R version 4.4.1 (2024-06-14)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.20-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#> [1] reshape2_1.4.4           GeomxTools_3.10.0        NanoStringNCTools_1.14.0
#> [4] S4Vectors_0.44.0         ggplot2_3.5.1            dplyr_1.1.4             
#> [7] GeoDiff_1.12.0           Biobase_2.66.0           BiocGenerics_0.52.0     
#> 
#> loaded via a namespace (and not attached):
#>   [1] testthat_3.2.1.1        readxl_1.4.3            rlang_1.1.4            
#>   [4] magrittr_2.0.3          compiler_4.4.1          systemfonts_1.1.0      
#>   [7] vctrs_0.6.5             stringr_1.5.1           pkgconfig_2.0.3        
#>  [10] crayon_1.5.3            fastmap_1.2.0           XVector_0.46.0         
#>  [13] labeling_0.4.3          utf8_1.2.4              rmarkdown_2.28         
#>  [16] UCSC.utils_1.2.0        nloptr_2.1.1            ggbeeswarm_0.7.2       
#>  [19] purrr_1.0.2             xfun_0.48               zlibbioc_1.52.0        
#>  [22] cachem_1.1.0            GenomeInfoDb_1.42.0     jsonlite_1.8.9         
#>  [25] EnvStats_3.0.0          highr_0.11              uuid_1.2-1             
#>  [28] parallel_4.4.1          R6_2.5.1                bslib_0.8.0            
#>  [31] stringi_1.8.4           RColorBrewer_1.1-3      GGally_2.2.1           
#>  [34] parallelly_1.38.0       boot_1.3-31             rrcov_1.7-6            
#>  [37] brio_1.1.5              jquerylib_0.1.4         cellranger_1.1.0       
#>  [40] numDeriv_2016.8-1.1     Rcpp_1.0.13             knitr_1.48             
#>  [43] future.apply_1.11.3     IRanges_2.40.0          Matrix_1.7-1           
#>  [46] splines_4.4.1           tidyselect_1.2.1        yaml_2.3.10            
#>  [49] codetools_0.2-20        listenv_0.9.1           lattice_0.22-6         
#>  [52] tibble_3.2.1            lmerTest_3.1-3          plyr_1.8.9             
#>  [55] withr_3.0.2             evaluate_1.0.1          future_1.34.0          
#>  [58] ggstats_0.7.0           Biostrings_2.74.0       pillar_1.9.0           
#>  [61] pcaPP_2.0-5             generics_0.1.3          sp_2.1-4               
#>  [64] munsell_0.5.1           scales_1.3.0            minqa_1.2.8            
#>  [67] globals_0.16.3          robust_0.7-5            glue_1.8.0             
#>  [70] pheatmap_1.0.12         tools_4.4.1             robustbase_0.99-4-1    
#>  [73] data.table_1.16.2       lme4_1.1-35.5           ggiraph_0.8.10         
#>  [76] dotCall64_1.2           mvtnorm_1.3-1           grid_4.4.1             
#>  [79] tidyr_1.3.1             colorspace_2.1-1        nlme_3.1-166           
#>  [82] GenomeInfoDbData_1.2.13 fit.models_0.64         beeswarm_0.4.0         
#>  [85] vipor_0.4.7             cli_3.6.3               spam_2.11-0            
#>  [88] fansi_1.0.6             ggthemes_5.1.0          gtable_0.3.6           
#>  [91] DEoptimR_1.1-3          sass_0.4.9              digest_0.6.37          
#>  [94] progressr_0.15.0        farver_2.1.2            rjson_0.2.23           
#>  [97] htmlwidgets_1.6.4       SeuratObject_5.0.2      htmltools_0.5.8.1      
#> [100] lifecycle_1.0.4         httr_1.4.7              MASS_7.3-61