SEtools 1.0.0
The SEtools package is a set of convenience functions for the Bioconductor class SummarizedExperiment. It facilitates merging, melting, and plotting SummarizedExperiment objects.
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("SEtools")Or, until the new bioconductor release:
devtools::install_github("plger/SEtools")To showcase the main functions, we will use an example object which contains (a subset of) whole-hippocampus RNAseq of mice after different stressors:
suppressPackageStartupMessages({
  library(SummarizedExperiment)
  library(SEtools)
})
data("SE", package="SEtools")
SE## class: SummarizedExperiment 
## dim: 100 20 
## metadata(0):
## assays(2): counts logcpm
## rownames(100): Egr1 Nr4a1 ... CH36-200G6.4 Bhlhe22
## rowData names(2): meanCPM meanTPM
## colnames(20): HC.Homecage.1 HC.Homecage.2 ... HC.Swim.4 HC.Swim.5
## colData names(2): Region ConditionThis is taken from Floriou-Servou et al., Biol Psychiatry 2018.
The sehm function simplifies the generation of heatmaps from SummarizedExperiment. It uses pheatmap, so any argument supported by it can in principle be passed:
g <- c("Egr1", "Nr4a1", "Fos", "Egr2", "Sgk1", "Arc", "Dusp1", "Fosb", "Sik1")
sehm(SE, genes=g)## Using assay logcpmsehm(SE, genes=g, scale="row")## Using assay logcpmAnnotation from the object’s rowData and colData can be plotted simply by specifying the column name (some will be shown by default if found):
sehm(SE, genes=g, scale="row", anno_rows="meanTPM")## Using assay logcpmThese can also be used to create gaps:
sehm(SE, genes=g, scale="row", anno_rows="meanTPM", gaps_at="Condition")## Using assay logcpmThe specific assay to use for plotting can be specified with the assayName argument.
By default, rows are sorted not with hierarchical clustering, but from the angle on a MDS plot, which tends to give nicer results than bottom-up hierarchical clustering. This can be disabled using sortRowsOn=NULL or cluster_rows=TRUE (to avoid any row reordering and use the order given, use sortRowsOn=NULL, cluster_rows=FALSE). Column clustering is disabled by default, but this can be changed with cluster_cols=TRUE.
It is common to cluster features into groups, and such a clustering can be used simultaneously with row sorting using the toporder argument. For instance:
lfcs <- assays(SE)$logcpm-rowMeans(assays(SE)$logcpm[,which(SE$Condition=="Homecage")])
rowData(SE)$cluster <- as.character(kmeans(lfcs,4)$cluster)
sehm(SE, scale="row", anno_rows="cluster", toporder="cluster", gaps_at="Condition")## Using assay logcpmHeatmaps from multiple SE can be created either by merging the objects (see below), or using the crossHm function, which uses the ComplexHeatmap pacakge:
crossHm( list(se1=SE, se2=SE), g, 
         anno_colors = list( Condition=c( Homecage="green",
                                          Handling="orange",
                                          Restraint="red",
                                          Swim="blue")
                            )
        )## Using assay logcpm
## Using assay logcpmFor some arguments (for instance colors), if they are not specified in the function call, SEtools will try to see whether the corresponding global options have been set, before using default colors. This means that if, in the context of a given project, the same colors are repeatedly being used, they can be specified a single time, and all subsequent plots will be affected:
options("SEtools_def_hmcols"=c("white","grey","black"))
ancols <- list( Condition=c( Homecage="green",
                             Handling="orange",
                             Restraint="red",
                             Swim="blue" ) )
options("SEtools_def_anno_colors"=ancols)
sehm(SE, g, do.scale = TRUE)## Using assay logcpmAt the moment, the following arguments can be set as global options:
assayName, hmcols, anno_columns, anno_rows, anno_colors, gaps_at, breaks.
Options must be set with the prefix SEtools_def_, followed by the name of the argument.
You may use resetAllSEtoolsOptions() to remove all global options relative to the package.
se1 <- SE[,1:10]
se2 <- SE[,11:20]
se3 <- mergeSEs( list(se1=se1, se2=se2) )
se3## class: SummarizedExperiment 
## dim: 100 20 
## metadata(0):
## assays(2): counts logcpm
## rownames(100): AC139063.2 Actr6 ... Zfp667 Zfp930
## rowData names(3): meanCPM meanTPM cluster
## colnames(20): se1.HC.Homecage.1 se1.HC.Homecage.2 ... se2.HC.Swim.4
##   se2.HC.Swim.5
## colData names(3): Dataset Region ConditionAll assays were merged, along with rowData and colData slots.
By default, row z-scores are calculated for each object when merging. This can be prevented with:
se3 <- mergeSEs( list(se1=se1, se2=se2), do.scale=FALSE)If more than one assay is present, one can specify a different scaling behavior for each assay:
se3 <- mergeSEs( list(se1=se1, se2=se2), use.assays=c("counts", "logcpm"), do.scale=c(FALSE, TRUE))To facilitate plotting features with ggplot2, the meltSE function combines assay values along with row/column data:
d <- meltSE(SE, genes=g[1:4])
head(d)##   feature        sample Region Condition counts    logcpm
## 1    Egr1 HC.Homecage.1     HC  Homecage 1581.0 4.4284969
## 2   Nr4a1 HC.Homecage.1     HC  Homecage  750.0 3.6958917
## 3     Fos HC.Homecage.1     HC  Homecage   91.4 1.7556317
## 4    Egr2 HC.Homecage.1     HC  Homecage   15.1 0.5826999
## 5    Egr1 HC.Homecage.2     HC  Homecage 1423.0 4.4415828
## 6   Nr4a1 HC.Homecage.2     HC  Homecage  841.0 3.9237691suppressPackageStartupMessages(library(ggplot2))
ggplot(d, aes(Condition, counts)) + geom_violin() + facet_wrap(~feature, scale="free")
Figure 1: An example ggplot created from a melted SE
## R version 3.6.1 (2019-07-05)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.3 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.10-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.10-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] ggplot2_3.2.1               SEtools_1.0.0              
##  [3] SummarizedExperiment_1.16.0 DelayedArray_0.12.0        
##  [5] BiocParallel_1.20.0         matrixStats_0.55.0         
##  [7] Biobase_2.46.0              GenomicRanges_1.38.0       
##  [9] GenomeInfoDb_1.22.0         IRanges_2.20.0             
## [11] S4Vectors_0.24.0            BiocGenerics_0.32.0        
## [13] BiocStyle_2.14.0           
## 
## loaded via a namespace (and not attached):
##  [1] viridis_0.5.1          viridisLite_0.3.0      foreach_1.4.7         
##  [4] gtools_3.8.1           assertthat_0.2.1       highr_0.8             
##  [7] BiocManager_1.30.9     GenomeInfoDbData_1.2.2 yaml_2.2.0            
## [10] pillar_1.4.2           lattice_0.20-38        glue_1.3.1            
## [13] digest_0.6.22          RColorBrewer_1.1-2     XVector_0.26.0        
## [16] colorspace_1.4-1       htmltools_0.4.0        Matrix_1.2-17         
## [19] pkgconfig_2.0.3        pheatmap_1.0.12        GetoptLong_0.1.7      
## [22] bookdown_0.14          zlibbioc_1.32.0        purrr_0.3.3           
## [25] scales_1.0.0           gdata_2.18.0           tibble_2.1.3          
## [28] withr_2.1.2            lazyeval_0.2.2         magrittr_1.5          
## [31] crayon_1.3.4           evaluate_0.14          MASS_7.3-51.4         
## [34] gplots_3.0.1.1         tools_3.6.1            registry_0.5-1        
## [37] data.table_1.12.6      GlobalOptions_0.1.1    ComplexHeatmap_2.2.0  
## [40] stringr_1.4.0          munsell_0.5.0          cluster_2.1.0         
## [43] compiler_3.6.1         caTools_1.17.1.2       rlang_0.4.1           
## [46] grid_3.6.1             RCurl_1.95-4.12        iterators_1.0.12      
## [49] rjson_0.2.20           circlize_0.4.8         labeling_0.3          
## [52] bitops_1.0-6           rmarkdown_1.16         gtable_0.3.0          
## [55] codetools_0.2-16       TSP_1.1-7              R6_2.4.0              
## [58] gridExtra_2.3          seriation_1.2-8        knitr_1.25            
## [61] dplyr_0.8.3            clue_0.3-57            KernSmooth_2.23-16    
## [64] dendextend_1.12.0      shape_1.4.4            stringi_1.4.3         
## [67] Rcpp_1.0.2             png_0.1-7              gclus_1.3.2           
## [70] tidyselect_0.2.5       xfun_0.10