Deploying custom panels in the `iSEE` interface

Kevin Rue-Albrecht1*, Federico Marini2,3**, Charlotte Soneson4,5*** and Aaron Lun6****

1Kennedy Institute of Rheumatology, University of Oxford, Headington, Oxford OX3 7FY, UK.
2Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), Mainz
3Center for Thrombosis and Hemostasis (CTH), Mainz
4Institute of Molecular Life Sciences, University of Zurich
5SIB Swiss Institute of Bioinformatics
6Cancer Research UK Cambridge Institute, University of Cambridge

*kevin.rue-albrecht@kennedy.ox.ac.uk
**marinif@uni-mainz.de
***charlottesoneson@gmail.com
****infinite.monkeys.with.keyboards@gmail.com

2 May 2019

Package

iSEE 1.4.0

Compiled date: 2019-05-02

Last edited: 2018-03-08

License: MIT + file LICENSE

1 Background

Users can define their own plots or tables to include in the iSEE interface (Rue-Albrecht et al. 2018). These custom panels are intended to receive subsets of rows and/or columns from other transmitting panels in the interface. The values in the custom panels are recomputed “on the fly” by user-defined functions using the transmitted subset. This provides a flexible and convenient framework for lightweight interactive analysis during data exploration. For example, selection of a particular subset of samples can be transmitted to a custom plot panel that performs dimensionality reduction on that subset. Alternatively, the subset can be transmitted to a custom table that performs a differential expression analysis between that subset and all other samples.

2 Defining custom functions

2.1 Minimum requirements

Recalculations in custom panels are performed using user-defined functions that are supplied to the iSEE() call. The only requirements are that the function must accept:

A SummarizedExperiment object or its derivatives as the first argument.

And, if iSEE(..., customSendAll=FALSE) (the default):

A vector of row names as the second argument, specifying the rows that are currently selected in the transmitting row-based panel. This may be NULL if no transmitting panel was selected or if no active selection is available.
A vector of column names as the third argument, specifying the columns that are currently selected in the transmitting column-based panel. This may be NULL if no transmitting panel was selected or if no active selection is available.

Conversely, iSEE(..., customSendAll=TRUE) transmits all selections - active and saved - from the transmitting panel. This feature is described in further detail below.

The output of the function should be:

A ggplot object, for functions used in custom plot panels.
A data.frame, for functions used in custom table panels.

2.2 Example of custom plot panel

To demonstrate the use of custom plot panels, we define an example function CUSTOM_PCA below. This takes a subset of features and cells in a SingleCellExperiment object, performs a principal components analysis (PCA) using that subset with runPCA, and creates a plot of the first two principal components with plotPCA (McCarthy et al. 2017).

library(scater)
CUSTOM_PCA <- function(se, rows, columns, colour_by=NULL, scale_features=TRUE) {
    if (!is.null(columns)) {
        kept <- se[, columns]
    } else {
        return(
            ggplot() + theme_void() + geom_text(
                aes(x, y, label=label),
                data.frame(x=0, y=0, label="No column data selected."),
                size=5)
            )
    }

    scale_features <- as.logical(scale_features)
    kept <- runPCA(kept, feature_set=rows, scale_features=scale_features)
    plotPCA(kept, colour_by=colour_by)
}

As mentioned above, rows and columns may be NULL if no selection was made in the respective transmitting panels. How these should be treated is up to the user-defined function. In the CUSTOM_PCA example above, an empty ggplot is returned if there is no selection on the columns, while the default runPCA behaviour is used if feature_set=NULL.

2.3 Example of custom table panel

To demonstrate the use of custom table panels, we define an example function CUSTOM_SUMMARY below. This takes a subset of features and cells in a SingleCellExperiment object, and creates data-frame that details the mean, variance and count of samples with expression above a given cut-off within the selection.

CUSTOM_SUMMARY <- function(se, ri, ci, assay="logcounts", min_exprs=0) {
    if (is.null(ri)) {
        ri <- rownames(se)
    }
    if (is.null(ci)) {
        ci <- colnames(se)
    }
    
    assayMatrix <- assay(se, assay)[ri, ci, drop=FALSE]
    
    data.frame(
        Mean = rowMeans(assayMatrix),
        Var = rowVars(assayMatrix),
        Sum = rowSums(assayMatrix),
        n_detected = rowSums(assayMatrix > min_exprs),
        row.names = ri
    )
}

Note that in the CUSTOM_SUMMARY example above, if either rows or columns are NULL, all rows or columns are used, respectively.

2.4 Additional arguments

Users of custom panels can specify additional arguments via a text box that accepts multi-line inputs. This is automatically converted into named arguments before being passed to the custom function. Each line corresponds to one argument:value pair, separated by the first space (ignoring whitespace at the start of the line).

For example, consider the CUSTOM_PCA function. If a user were to type:

colour_by Nanog
scale_features FALSE

… in the text box, this would pass colour_by="Nanog" and scale_features="TRUE" to CUSTOM_PCA. Note that all additional arguments are passed as character strings for the sake of security. It is the responsibility of the custom function to perform any necessary type checks and conversions (hence the as.logical in CUSTOM_PCA).

Similarly, additional arguments can be passed to the CUSTOM_SUMMARY function above. For instance:

assay counts
min_exprs 5

2.5 Example use

Using the sce object that we generated earlier, we will add some fields for examining the mean-variance relationship across features:

rowData(sce)$mean_log <- rowMeans(logcounts(sce))
rowData(sce)$var_log <- apply(logcounts(sce), 1, var)

We will then set up an iSEE instance with four panels - a reduced dimension plot, a row data plot, a custom plot, and a custom table. Note how both custom panels initially receives a column selection from the reduced dimension plot and a row selection from the row data plot. We also set the initial function to the name of CUSTOM_PCA in customDataFun, and the initial arguments to the values described above.

library(iSEE)
reddim <- redDimPlotDefaults(sce, 1)
rowdat <- rowDataPlotDefaults(sce, 1)
rowdat$XAxis <- "Row data"
rowdat$XAxisRowData <- "mean_log"
rowdat$YAxis <- "var_log"

cdp <- customDataPlotDefaults(sce, 1)
cdp$Function <- "CUSTOM_PCA"
cdp$Arguments <- "colour_by Nanog\nscale_features FALSE"
cdp$ColumnSource <- "Reduced dimension plot 1"
cdp$RowSource <- "Row data plot 1"

cst <- customStatTableDefaults(sce, 1)
cst$Function <- "CUSTOM_SUMMARY"
cst$Arguments <- "assay logcounts\nmin_exprs 1"
cst$ColumnSource <- "Reduced dimension plot 1"
cst$RowSource <- "Row data plot 1"

app <- iSEE(
    sce, redDimArgs=reddim, rowDataArgs=rowdat, customDataArgs=cdp, customStatArgs=cst,
    initialPanels=DataFrame(
        Name=c(
            "Reduced dimension plot 1", "Row data plot 1",
            "Custom data plot 1", "Custom statistics table 1"),
        Width=c(4, 4, 4, 12)),
    customDataFun=list(CUSTOM_PCA=CUSTOM_PCA),
    customStatFun=list(CUSTOM_SUMMARY=CUSTOM_SUMMARY)
    )

3 Setting `customSendAll=TRUE`

By default, iSEE only passes the “active” point selection from the transmitting panel to your custom function. Users can set customSendAll=TRUE to obtain all selections - active and saved - from the transmitting panel. This takes the form of a list with two elements:

active, a vector of names specifying the points in the active selection.
saved, a list of vectors of names specifying the points in each saved selection.

It is possible for active to be NULL or saved to contain no elements. This occurs if there are no active and/or saved selections, or if no transmitter has been selected.

This is a powerful system that allows users to compute statistics of interests for multiple selections simultaneously. For example, we can write a function that computes log-fold changes between the samples in the active selection and samples in each saved selection. (It would be trivial to extend this to obtain actual differential expression statistics, e.g., using scran::findMarkers() or functions from packages like limma.)

CUSTOM_DIFFEXP <- function(se, ri, ci, assay="logcounts") {
    ri <- ri$active
    if (is.null(ri)) { # ignoring saved gene selections for now.
        ri <- rownames(se)
    }
    if (is.null(ci$active) || length(ci$saved)==0L) {
        return(data.frame(row.names=character(0), LogFC=integer(0))) # dummy value.
    }

    assayMatrix <- assay(se, assay)[ri, , drop=FALSE]
    active <- rowMeans(assayMatrix[,ci$active,drop=FALSE])

    lfcs <- vector("list", length(ci$saved))
    for (i in seq_along(lfcs)) {
        saved <- rowMeans(assayMatrix[,ci$saved[[i]],drop=FALSE])
        lfcs[[i]] <- active - saved
    }
    names(lfcs) <- sprintf("LogFC/%i", seq_along(lfcs))

    output <- do.call(data.frame, lfcs)
    rownames(output) <- ri
    output
}

We also re-use these statistics to visualize some of the genes with the largest log-fold changes:

CUSTOM_HEAT <- function(se, ri, ci, assay="logcounts") {
    everything <- CUSTOM_DIFFEXP(se, ri, ci, assay=assay)
    if (nrow(everything) == 0L) {
        return(ggplot()) # empty ggplot if no genes reported.
    }

    everything <- as.matrix(everything)
    top <- head(order(rowMeans(abs(everything)), decreasing=TRUE), 50)
    topFC <- everything[top, , drop=FALSE]
    dimnames(topFC) <- list(gene=rownames(topFC), contrast=colnames(topFC))
    dfFC <- reshape2::melt(topFC)
    ggplot(dfFC, aes(contrast, gene)) + geom_raster(aes(fill = value))
}

We test this out as shown below. In particular, we set customSendAll=TRUE to pass both the active and saved selections to each custom panel. Note that each saved selection is also the active selection when it is first generated, hence the log-fold changes of zero in the last column of the heat map until a new active selection is drawn.

cdp2 <- customDataPlotDefaults(sce, 1)
cdp2$Function <- "CUSTOM_HEAT"
cdp2$Arguments <- "assay logcounts"
cdp2$ColumnSource <- "Reduced dimension plot 1"
cdp2$RowSource <- "Row data plot 1"

cst2 <- customStatTableDefaults(sce, 1)
cst2$Function <- "CUSTOM_DIFFEXP"
cst2$Arguments <- "assay logcounts"
cst2$ColumnSource <- "Reduced dimension plot 1"
cst2$RowSource <- "Row data plot 1"

app <- iSEE(
    sce, redDimArgs=reddim, rowDataArgs=rowdat, customDataArgs=cdp2, customStatArgs=cst2,
    initialPanels=DataFrame(
        Name=c(
            "Reduced dimension plot 1", "Row data plot 1",
            "Custom data plot 1", "Custom statistics table 1"),
        Width=c(4, 4, 4, 12)),
    customDataFun=list(CUSTOM_HEAT=CUSTOM_HEAT),
    customStatFun=list(CUSTOM_DIFFEXP=CUSTOM_DIFFEXP),
    customSendAll=TRUE
    )

4 Caching for greater efficiency

True R wizards can take advantage of the pass-by-reference activity of environments to cache results throughout the lifetime of the app. This can be used to avoid repeating time-consuming steps that are not affected by changes in certain parameters. For example, we could cache the output of runPCA to avoid repeating the PCA if only colour_by changes.

We demonstrate this functionality below using a function that computes log-fold changes for the selected features in one subset of samples compared to the others. For any given column selection, we compute log-fold changes for all features and cache the results in the caching environment. This allows us to avoid recomputing these values if only the row selection changes.

caching <- new.env()
CUSTOM_LFC <- function(se, rows, columns) {
    if (is.null(columns)) {
        return(data.frame(logFC=numeric(0)))
    }

    if (!identical(caching$columns, columns)) {
        caching$columns <- columns
        in.subset <- rowMeans(logcounts(sce)[,columns])
        out.subset <- rowMeans(logcounts(sce)[,setdiff(colnames(sce), columns)])
        caching$logFC <- setNames(in.subset - out.subset, rownames(sce))
    } 
       
    lfc <- caching$logFC
    if (!is.null(rows)) {
        out <- data.frame(logFC=lfc[rows], row.names=rows)
    } else {
        out <- data.frame(logFC=lfc, row.names=rownames(se))
    }
    out
}

We then set up an iSEE instance with three panels - a reduced dimension plot, a row data plot and a custom statistics table. This is much the same as described for the custom plot panel.

cst <- customStatTableDefaults(sce, 1)
cst$Function <- "CUSTOM_LFC"
cst$ColumnSource <- "Reduced dimension plot 1"
cst$RowSource <- "Row data plot 1"

app2 <- iSEE(sce, redDimArgs=reddim, rowDataArgs=rowdat, customStatArgs=cst, 
    initialPanels=DataFrame(Name=c("Reduced dimension plot 1", 
            "Row data plot 1", "Custom statistics table 1")),
    customStatFun=list(CUSTOM_LFC=CUSTOM_LFC))

Session Info

sessionInfo()
#> R version 3.6.0 (2019-04-26)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 18.04.2 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.9-bioc/R/lib/libRblas.so
#> LAPACK: /home/biocbuild/bbs-3.9-bioc/R/lib/libRlapack.so
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
#> [8] methods   base     
#> 
#> other attached packages:
#>  [1] scater_1.12.0               ggplot2_3.1.1              
#>  [3] scRNAseq_1.9.0              iSEE_1.4.0                 
#>  [5] SingleCellExperiment_1.6.0  SummarizedExperiment_1.14.0
#>  [7] DelayedArray_0.10.0         BiocParallel_1.18.0        
#>  [9] matrixStats_0.54.0          Biobase_2.44.0             
#> [11] GenomicRanges_1.36.0        GenomeInfoDb_1.20.0        
#> [13] IRanges_2.18.0              S4Vectors_0.22.0           
#> [15] BiocGenerics_0.30.0         BiocStyle_2.12.0           
#> 
#> loaded via a namespace (and not attached):
#>  [1] nlme_3.1-139             bitops_1.0-6            
#>  [3] bit64_0.9-7              tools_3.6.0             
#>  [5] irlba_2.3.3              R6_2.4.0                
#>  [7] DT_0.5                   vipor_0.4.5             
#>  [9] DBI_1.0.0                lazyeval_0.2.2          
#> [11] mgcv_1.8-28              colorspace_1.4-1        
#> [13] withr_2.1.2              gridExtra_2.3           
#> [15] tidyselect_0.2.5         bit_1.1-14              
#> [17] compiler_3.6.0           BiocNeighbors_1.2.0     
#> [19] shinyjs_1.0              colourpicker_1.0        
#> [21] bookdown_0.9             scales_1.0.0            
#> [23] stringr_1.4.0            digest_0.6.18           
#> [25] rmarkdown_1.12           rentrez_1.2.2           
#> [27] XVector_0.24.0           pkgconfig_2.0.2         
#> [29] htmltools_0.3.6          htmlwidgets_1.3         
#> [31] rlang_0.3.4              RSQLite_2.1.1           
#> [33] DelayedMatrixStats_1.6.0 shiny_1.3.2             
#> [35] jsonlite_1.6             dplyr_0.8.0.1           
#> [37] RCurl_1.95-4.12          magrittr_1.5            
#> [39] BiocSingular_1.0.0       GenomeInfoDbData_1.2.1  
#> [41] Matrix_1.2-17            ggbeeswarm_0.6.0        
#> [43] Rcpp_1.0.1               munsell_0.5.0           
#> [45] viridis_0.5.1            stringi_1.4.3           
#> [47] yaml_2.2.0               rintrojs_0.2.0          
#> [49] zlibbioc_1.30.0          Rtsne_0.15              
#> [51] plyr_1.8.4               grid_3.6.0              
#> [53] blob_1.1.1               promises_1.0.1          
#> [55] shinydashboard_0.7.1     crayon_1.3.4            
#> [57] miniUI_0.1.1.1           lattice_0.20-38         
#> [59] cowplot_0.9.4            splines_3.6.0           
#> [61] knitr_1.22               pillar_1.3.1            
#> [63] igraph_1.2.4.1           reshape2_1.4.3          
#> [65] XML_3.98-1.19            glue_1.3.1              
#> [67] evaluate_0.13            BiocManager_1.30.4      
#> [69] httpuv_1.5.1             gtable_0.3.0            
#> [71] purrr_0.3.2              assertthat_0.2.1        
#> [73] xfun_0.6                 rsvd_1.0.0              
#> [75] mime_0.6                 xtable_1.8-4            
#> [77] later_0.8.0              viridisLite_0.3.0       
#> [79] tibble_2.1.1             beeswarm_0.2.3          
#> [81] AnnotationDbi_1.46.0     memoise_1.1.0           
#> [83] shinyAce_0.3.3
# devtools::session_info()

References

McCarthy, D. J., K. R. Campbell, A. T. Lun, and Q. F. Wills. 2017. “Scater: pre-processing, quality control, normalization and visualization of single-cell RNA-seq data in R.” Bioinformatics 33 (8):1179–86.

Rue-Albrecht, K., F. Marini, C. Soneson, and A. T. L. Lun. 2018. “ISEE: Interactive Summarizedexperiment Explorer.” F1000Research 7 (June):741.

Deploying custom panels in the iSEE interface