Interactive exploration of design matrices with ExploreModelMatrix

Charlotte Soneson, Federico Marini, Michael I Love, Florian Geier and Michael B Stadler

2021-05-19

Introduction

ExploreModelMatrix is an R package for visualizing design matrices generated by the model.matrix() R function. Provided with a sample information table and a design formula, the ExploreModelMatrix() function launches a shiny app where the user can explore the fitted values (in terms of the model coefficients) for each combination of predictor values. In addition, the app allows the user to interactively change the design formula and the reference levels of factor variables, as well as drop unwanted columns from the design matrix, in order to explore the effect on the composition of the fitted values. Note that ExploreModelMatrix is not intended to be used to determine which design formula should be used for analyzing a data set. Instead, its purpose is to assist in the interpretation of the coefficients in a given model.
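To make "fitted values in terms of the model coefficients" concrete, here is a minimal base R sketch that does not use the package itself. The sample table and the variable name group are hypothetical and chosen only for illustration.

```r
# Hypothetical sample table with a single two-level factor.
sampleData <- data.frame(group = factor(c("A", "A", "B", "B")))

# model.matrix() expands the formula into one column per coefficient.
X <- model.matrix(~ group, data = sampleData)

# Each distinct row of X corresponds to one combination of predictor values.
# The non-zero columns of that row are the coefficients that add up to the
# fitted value for observations with that combination.
combos <- unique(X)
fitted_terms <- apply(combos, 1, function(r) {
  paste(colnames(X)[r != 0], collapse = " + ")
})
names(fitted_terms) <- as.character(unique(sampleData$group))
fitted_terms
```

Under these assumptions, observations in group A are described by the intercept alone, while those in group B are described by the intercept plus the groupB coefficient.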

In addition to the interactive visualization, ExploreModelMatrix also provides a function, VisualizeDesign(), for generating static visualizations.

In this vignette, we illustrate how the package can be used by showing examples of applying the functions to various experimental design setups. Many examples are taken from questions raised at the Bioconductor support site.

library(ExploreModelMatrix)

Interface

The ExploreModelMatrix() function opens a graphical interface where the user can interactively explore the provided design. This section gives an overview of what is shown in the graphical interface. A step-by-step tour is also available by clicking on the icon in the top right of the application.

The sidebar contains the input controls. The design formula of interest is typed into the Design formula text box, and must start with the ~ symbol. It can be changed interactively while using the application. If the ExploreModelMatrix() function is called with sampleData=NULL, there will also be an input control allowing a tab-delimited text file with sample information to be uploaded to the application. Finally, the package contains a collection of example designs, suitable for teaching, exploration and illustration. The remaining input controls allow the user to change the reference levels of the factor variables, to drop specific columns from the design matrix, and to change the display settings of the plots.
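The effect of two of these controls, changing a reference level and dropping a column, can be imitated in base R. The following is only a sketch of the idea on a hypothetical sample table, not the code the application runs internally.

```r
# Hypothetical sample table: two genotypes, two treatments.
sampleData <- data.frame(
  genotype  = factor(rep(c("A", "B"), each = 4)),
  treatment = factor(rep(c("ctrl", "trt"), times = 4))
)

# With default (alphabetical) levels, "ctrl" is the reference for treatment,
# so the design matrix has a column named treatmenttrt.
cols_default <- colnames(model.matrix(~ genotype + treatment, data = sampleData))
cols_default

# Changing the reference level changes which coefficient appears.
sampleData$treatment <- relevel(sampleData$treatment, ref = "trt")
X <- model.matrix(~ genotype + treatment, data = sampleData)
colnames(X)

# Dropping a column removes the corresponding coefficient from the model.
X_reduced <- X[, colnames(X) != "genotypeB", drop = FALSE]
colnames(X_reduced)
```

After releveling, the treatment column is named treatmentctrl and measures the control group relative to the treated group, which is why the composition of the fitted values changes.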

The first row of the main body of the application displays the fitted values (expressed in terms of the model coefficients) for each combination of predictor values, in both figure and table form. The next row shows the full sample table together with a summary, and the third row displays the full design matrix as well as its rank. Panels below this display the pseudoinverse of the design matrix, a visualization of variance inflation factors, a co-occurrence matrix and the correlation among the model coefficients.
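Several of the quantities shown in these panels can be approximated with a few lines of base R. The sketch below uses a hypothetical sample table and covers the rank, the pseudoinverse, the co-occurrence matrix and the coefficient correlations. It is an illustration of the concepts, and the application may compute or scale them differently.

```r
# Hypothetical sample table: two genotypes, two treatments.
sampleData <- data.frame(
  genotype  = factor(rep(c("A", "B"), each = 4)),
  treatment = factor(rep(c("ctrl", "trt"), times = 4))
)
X <- model.matrix(~ genotype + treatment, data = sampleData)

# Rank of the design matrix (equal to ncol(X) for a full-rank design).
design_rank <- qr(X)$rank

# For a full column rank matrix, the Moore-Penrose pseudoinverse
# is (X'X)^{-1} X'.
pinv <- solve(crossprod(X)) %*% t(X)

# Co-occurrence matrix: number of samples per combination of factor levels.
coocc <- table(sampleData$genotype, sampleData$treatment)

# Correlation among the coefficient estimates. Their covariance is
# proportional to (X'X)^{-1}, and the proportionality constant cancels.
coef_cor <- cov2cor(solve(crossprod(X)))

design_rank
coocc
round(coef_cor, 3)
```

In this balanced design the genotypeB and treatmenttrt estimates are uncorrelated, whereas each of them is negatively correlated with the intercept.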

Examples

This section contains a number of examples of real designs, and shows how they can be explored with ExploreModelMatrix. For each example, the sample information table is printed out. Next, the VisualizeDesign() function is called to generate a static plot of the fitted values, in terms of the model coefficients. This is the same plot that is displayed in the top left panel of the interactive interface generated by ExploreModelMatrix(). We also provide the code for generating and (for interactive sessions) opening the interactive application with ExploreModelMatrix().

Example 1

This example illustrates a two-factor design (genotype and treatment), where the effects of genotype and treatment are assumed to be additive. For each genotype, two treated and two control individuals are studied. The design formula is ~ genotype + treatment, reflecting the assumption of additivity between the two predictors. The figure generated by the VisualizeDesign() function, displayed below, shows the value of the linear predictor (or, for a regular linear model, the fitted values) for observations with a given combination of predictor values, in terms of the model coefficients. This can be useful in order to set up suitable contrasts. For example, we can see that testing the null hypothesis that the genotypeB coefficient is zero would correspond to comparing observations with genotype B to those with genotype A, at the same treatment level.
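A sketch of this example follows. The sample table is reconstructed from the description above (two treated and two control individuals per genotype), so the exact row order is an assumption. The package calls are guarded so that the base R part also runs where ExploreModelMatrix is not installed. The argument names sampleData and designFormula and the plotlist element are used as we understand the package interface.

```r
# Sample table reconstructed from the text: 2 genotypes x 2 treatments,
# two individuals per combination.
sampleData <- data.frame(
  genotype  = rep(c("A", "B"), each = 4),
  treatment = rep(c("ctrl", "trt"), times = 4)
)
sampleData

# Base R view: one row per combination of predictor values, showing which
# coefficients contribute to the fitted value.
X <- model.matrix(~ genotype + treatment, data = sampleData)
combos <- unique(cbind(sampleData, X))
combos

if (requireNamespace("ExploreModelMatrix", quietly = TRUE)) {
  # Static visualization; vd$plotlist holds the plot(s) of the fitted values.
  vd <- ExploreModelMatrix::VisualizeDesign(
    sampleData = sampleData,
    designFormula = ~ genotype + treatment
  )
  # The interactive application is only opened in interactive sessions.
  if (interactive()) {
    app <- ExploreModelMatrix::ExploreModelMatrix(
      sampleData = sampleData,
      designFormula = ~ genotype + treatment
    )
    shiny::runApp(app)
  }
}
```

In the combos table, the rows for genotype B differ from the corresponding genotype A rows only in the genotypeB column, which is the basis for the contrast described above.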

Example 2

From https://support.bioconductor.org/p/121132/. In this example we are considering a set of patients, each being either Resistant or Sensitive to a treatment, and each studied before (pre) and after (post) treatment. Patients have been renumbered within each response group, and patients with only pre- or post-measurements are removed. We use the design ~ Response + Response:ind.n + Response:Treatment. As can be seen from the visualization below, this lets us easily compare e.g. post- vs pre-treatment observations within the Sensitive group (via the ResponseSensitive.Treatmentpre coefficient).
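The structure of this nested design can be inspected in base R. The sample table below is a deliberately small, hypothetical stand-in (two patients per response group) and not the data from the support site question. Only the column names Response, ind.n and Treatment are taken from the design formula above.

```r
# Hypothetical stand-in: two patients per response group, each measured
# pre and post treatment. ind.n numbers the patients within each group.
sampleData <- data.frame(
  Response  = factor(rep(c("Resistant", "Sensitive"), each = 4)),
  ind.n     = factor(rep(c(1, 2, 1, 2), each = 2)),
  Treatment = factor(rep(c("pre", "post"), times = 4))
)

X <- model.matrix(~ Response + Response:ind.n + Response:Treatment,
                  data = sampleData)
colnames(X)

# The design is full rank for this balanced table.
full_rank <- qr(X)$rank == ncol(X)

# For one Sensitive patient, the pre and post rows differ in a single column.
sens1 <- X[sampleData$Response == "Sensitive" & sampleData$ind.n == "1", ]
diff_cols <- colnames(X)[sens1[1, ] != sens1[2, ]]
diff_cols
```

Note that model.matrix() names the interaction column with a colon (ResponseSensitive:Treatmentpre). Since "post" sorts before "pre", post is the reference level and the coefficient represents pre minus post within the Sensitive group.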

The design above doesn’t allow comparison between Resistant and Sensitive patients while accounting for the patient effect, since the patient is nested within the response group. If we choose to ignore the patient effect, we can fit a factorial model with the design formula ~ Treatment + Response, as illustrated below.
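For comparison, the design that ignores the patient effect can be sketched in the same way, again on a small hypothetical stand-in table rather than the original data.

```r
# Hypothetical stand-in: two patients per response group, pre and post.
sampleData <- data.frame(
  Response  = factor(rep(c("Resistant", "Sensitive"), each = 4)),
  ind.n     = factor(rep(c(1, 2, 1, 2), each = 2)),
  Treatment = factor(rep(c("pre", "post"), times = 4))
)

# The patient identifier is left out of the formula.
X <- model.matrix(~ Treatment + Response, data = sampleData)

# One row per Response/Treatment combination with its coefficients.
combos <- unique(cbind(sampleData[, c("Response", "Treatment")], X))
combos
```

Here the ResponseSensitive column separates Sensitive from Resistant observations at a given treatment stage, which is the comparison the nested design could not provide.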

Example 3

From https://support.bioconductor.org/p/80408/. Here we are considering mice from two conditions (ctrl/ko), each measured with and without treatment with a drug (plus/minus). We use the design ~ 0 + batch + condition (where batch corresponds to the mouse ID), and drop the column corresponding to conditionko_minus to get a full-rank design matrix.
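The rank deficiency, and the effect of dropping the column, can be checked with base R. The table below is a hypothetical reduced version of the experiment (two mice per condition). The level names ctrl_minus, ctrl_plus, ko_minus and ko_plus are inferred from the column name conditionko_minus mentioned above.

```r
# Hypothetical reduced table: four mice (two ctrl, two ko), each measured
# without (minus) and with (plus) the drug. batch is the mouse ID.
sampleData <- data.frame(
  batch     = factor(rep(1:4, each = 2)),
  condition = factor(paste(rep(c("ctrl", "ko"), each = 4),
                           rep(c("minus", "plus"), times = 4),
                           sep = "_"))
)

X <- model.matrix(~ 0 + batch + condition, data = sampleData)

# More columns than the rank: the design is not full rank, because the two
# ko columns add up to the sum of the batch columns of the ko mice.
c(columns = ncol(X), rank = qr(X)$rank)

# Dropping conditionko_minus removes the linear dependency.
X_full <- X[, colnames(X) != "conditionko_minus", drop = FALSE]
c(columns = ncol(X_full), rank = qr(X_full)$rank)
```

To our understanding, the same column can be removed in the package by passing its name to the dropCols argument of VisualizeDesign(), or by selecting it in the corresponding input control of the interactive application.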

Session info

sessionInfo()
#> R version 4.1.0 (2021-05-18)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 20.04.2 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.13-bioc/R/lib/libRblas.so
#> LAPACK: /home/biocbuild/bbs-3.13-bioc/R/lib/libRlapack.so
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] ExploreModelMatrix_1.4.0
#> 
#> loaded via a namespace (and not attached):
#>  [1] tidyselect_1.1.1     xfun_0.23            bslib_0.2.5.1       
#>  [4] shinyjs_2.0.0        purrr_0.3.4          colorspace_2.0-1    
#>  [7] vctrs_0.3.8          generics_0.1.0       htmltools_0.5.1.1   
#> [10] stats4_4.1.0         yaml_2.2.1           rintrojs_0.2.2      
#> [13] utf8_1.2.1           rlang_0.4.11         jquerylib_0.1.4     
#> [16] pillar_1.6.1         later_1.2.0          glue_1.4.2          
#> [19] DBI_1.1.1            BiocGenerics_0.38.0  lifecycle_1.0.0     
#> [22] stringr_1.4.0        munsell_0.5.0        gtable_0.3.0        
#> [25] htmlwidgets_1.5.3    evaluate_0.14        knitr_1.33          
#> [28] fastmap_1.1.0        crosstalk_1.1.1      httpuv_1.6.1        
#> [31] parallel_4.1.0       fansi_0.4.2          highr_0.9           
#> [34] Rcpp_1.0.6           xtable_1.8-4         scales_1.1.1        
#> [37] promises_1.2.0.1     DT_0.18              limma_3.48.0        
#> [40] S4Vectors_0.30.0     jsonlite_1.7.2       farver_2.1.0        
#> [43] mime_0.10            ggplot2_3.3.3        digest_0.6.27       
#> [46] stringi_1.6.2        dplyr_1.0.6          shiny_1.6.0         
#> [49] cowplot_1.1.1        grid_4.1.0           tools_4.1.0         
#> [52] magrittr_2.0.1       sass_0.4.0           tibble_3.1.2        
#> [55] tidyr_1.1.3          crayon_1.4.1         pkgconfig_2.0.3     
#> [58] MASS_7.3-54          ellipsis_0.3.2       shinydashboard_0.7.1
#> [61] assertthat_0.2.1     rmarkdown_2.8        R6_2.5.0            
#> [64] compiler_4.1.0