Brings SummarizedExperiment to the tidyverse!
website: stemangiola.github.io/tidySummarizedExperiment/
tidySummarizedExperiment provides a bridge between Bioconductor SummarizedExperiment [@morgan2020summarized] and the tidyverse [@wickham2019welcome]. It creates an invisible layer that lets you view a Bioconductor SummarizedExperiment object as a tidyverse tibble, and provides SummarizedExperiment-compatible dplyr, tidyr, ggplot2 and plotly functions. This gives users the best of both the Bioconductor and tidyverse worlds.
| SummarizedExperiment-compatible Functions | Description |
|---|---|
| all | After all, tidySummarizedExperiment is a SummarizedExperiment object, just better |

| tidyverse Packages | Description |
|---|---|
| dplyr | Almost all dplyr APIs like for any tibble |
| tidyr | Almost all tidyr APIs like for any tibble |
| ggplot2 | ggplot like for any tibble |
| plotly | plot_ly like for any tibble |

| Utilities | Description |
|---|---|
| as_tibble | Convert cell-wise information to a tbl_df |
if (!requireNamespace("BiocManager", quietly=TRUE)) {
      install.packages("BiocManager")
  }
BiocManager::install("tidySummarizedExperiment")
From GitHub (development)
devtools::install_github("stemangiola/tidySummarizedExperiment")
Load libraries used in the examples.
library(ggplot2)
library(tidySummarizedExperiment)
tidySummarizedExperiment, the best of both worlds!
This is a SummarizedExperiment object, but it is evaluated as a tibble, so it is fully compatible with both the SummarizedExperiment and tidyverse APIs.
pasilla_tidy <- tidySummarizedExperiment::pasilla 
It looks like a tibble
pasilla_tidy
## # A SummarizedExperiment-tibble abstraction: 102,193 × 5
## # Features=14599 | Samples=7 | Assays=counts
##    .feature    .sample counts condition type      
##    <chr>       <chr>    <int> <chr>     <chr>     
##  1 FBgn0000003 untrt1       0 untreated single_end
##  2 FBgn0000008 untrt1      92 untreated single_end
##  3 FBgn0000014 untrt1       5 untreated single_end
##  4 FBgn0000015 untrt1       0 untreated single_end
##  5 FBgn0000017 untrt1    4664 untreated single_end
##  6 FBgn0000018 untrt1     583 untreated single_end
##  7 FBgn0000022 untrt1       0 untreated single_end
##  8 FBgn0000024 untrt1      10 untreated single_end
##  9 FBgn0000028 untrt1       0 untreated single_end
## 10 FBgn0000032 untrt1    1446 untreated single_end
## # ℹ 40 more rows
But it is a SummarizedExperiment object after all
assays(pasilla_tidy)
## List of length 1
## names(1): counts
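Because the object is still a SummarizedExperiment, the standard Bioconductor accessors should work on it unchanged. A minimal sketch, assuming the pasilla data shipped with the package and the usual SummarizedExperiment accessors (dim, colData, assay):

```r
library(tidySummarizedExperiment)  # also attaches SummarizedExperiment

pasilla_tidy <- tidySummarizedExperiment::pasilla

# Features x samples, as for any SummarizedExperiment
dim(pasilla_tidy)

# Sample annotation (the condition and type columns seen in the tibble view)
colData(pasilla_tidy)

# Top-left corner of the raw counts matrix
assay(pasilla_tidy, "counts")[1:3, 1:3]
```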
We can use tidyverse commands to explore the tidy SummarizedExperiment object.
We can use slice to choose rows by position, for example to choose the first row.
pasilla_tidy %>%
    slice(1)
## # A SummarizedExperiment-tibble abstraction: 1 × 5
## # Features=1 | Samples=1 | Assays=counts
##   .feature    .sample counts condition type      
##   <chr>       <chr>    <int> <chr>     <chr>     
## 1 FBgn0000003 untrt1       0 untreated single_end
We can use filter to choose rows by criteria.
pasilla_tidy %>%
    filter(condition == "untreated")
## # A SummarizedExperiment-tibble abstraction: 58,396 × 5
## # Features=14599 | Samples=4 | Assays=counts
##    .feature    .sample counts condition type      
##    <chr>       <chr>    <int> <chr>     <chr>     
##  1 FBgn0000003 untrt1       0 untreated single_end
##  2 FBgn0000008 untrt1      92 untreated single_end
##  3 FBgn0000014 untrt1       5 untreated single_end
##  4 FBgn0000015 untrt1       0 untreated single_end
##  5 FBgn0000017 untrt1    4664 untreated single_end
##  6 FBgn0000018 untrt1     583 untreated single_end
##  7 FBgn0000022 untrt1       0 untreated single_end
##  8 FBgn0000024 untrt1      10 untreated single_end
##  9 FBgn0000028 untrt1       0 untreated single_end
## 10 FBgn0000032 untrt1    1446 untreated single_end
## # ℹ 40 more rows
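Filtering on a sample-level column keeps whole samples, so the same subset can also be taken with ordinary SummarizedExperiment column indexing. The sketch below compares the two routes; it assumes only the pasilla data shipped with the package and standard subsetting with `[`.

```r
library(tidySummarizedExperiment)

pasilla_tidy <- tidySummarizedExperiment::pasilla

# tidyverse route: filter on the sample-level condition column
untreated_tidy <- pasilla_tidy %>%
    filter(condition == "untreated")

# Bioconductor route: index the columns (samples) directly
untreated_base <- pasilla_tidy[, pasilla_tidy$condition == "untreated"]

# Both routes should keep the same samples
colnames(untreated_tidy)
colnames(untreated_base)
```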
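We can use select to choose columns. Note that when the selection no longer describes a complete SummarizedExperiment, as below, the result is returned as a plain tibble.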
pasilla_tidy %>%
    select(.sample)
## # A tibble: 102,193 × 1
##    .sample
##    <chr>  
##  1 untrt1 
##  2 untrt1 
##  3 untrt1 
##  4 untrt1 
##  5 untrt1 
##  6 untrt1 
##  7 untrt1 
##  8 untrt1 
##  9 untrt1 
## 10 untrt1 
## # ℹ 102,183 more rows
We can use count to count how many rows we have for each sample.
pasilla_tidy %>%
    count(.sample)
## # A tibble: 7 × 2
##   .sample     n
##   <chr>   <int>
## 1 trt1    14599
## 2 trt2    14599
## 3 trt3    14599
## 4 untrt1  14599
## 5 untrt2  14599
## 6 untrt3  14599
## 7 untrt4  14599
We can use distinct to see what distinct sample information we have.
pasilla_tidy %>%
    distinct(.sample, condition, type)
## # A tibble: 7 × 3
##   .sample condition type      
##   <chr>   <chr>     <chr>     
## 1 untrt1  untreated single_end
## 2 untrt2  untreated single_end
## 3 untrt3  untreated paired_end
## 4 untrt4  untreated paired_end
## 5 trt1    treated   single_end
## 6 trt2    treated   paired_end
## 7 trt3    treated   paired_end
We can use rename to rename a column, for example to change the name of the type column.
pasilla_tidy %>%
    rename(sequencing=type)
## # A SummarizedExperiment-tibble abstraction: 102,193 × 5
## # Features=14599 | Samples=7 | Assays=counts
##    .feature    .sample counts condition sequencing
##    <chr>       <chr>    <int> <chr>     <chr>     
##  1 FBgn0000003 untrt1       0 untreated single_end
##  2 FBgn0000008 untrt1      92 untreated single_end
##  3 FBgn0000014 untrt1       5 untreated single_end
##  4 FBgn0000015 untrt1       0 untreated single_end
##  5 FBgn0000017 untrt1    4664 untreated single_end
##  6 FBgn0000018 untrt1     583 untreated single_end
##  7 FBgn0000022 untrt1       0 untreated single_end
##  8 FBgn0000024 untrt1      10 untreated single_end
##  9 FBgn0000028 untrt1       0 untreated single_end
## 10 FBgn0000032 untrt1    1446 untreated single_end
## # ℹ 40 more rows
We can use mutate to create or modify a column. For example, we can overwrite the type column so that it contains single and paired instead of single_end and paired_end.
pasilla_tidy %>%
    mutate(type=gsub("_end", "", type))
## # A SummarizedExperiment-tibble abstraction: 102,193 × 5
## # Features=14599 | Samples=7 | Assays=counts
##    .feature    .sample counts condition type  
##    <chr>       <chr>    <int> <chr>     <chr> 
##  1 FBgn0000003 untrt1       0 untreated single
##  2 FBgn0000008 untrt1      92 untreated single
##  3 FBgn0000014 untrt1       5 untreated single
##  4 FBgn0000015 untrt1       0 untreated single
##  5 FBgn0000017 untrt1    4664 untreated single
##  6 FBgn0000018 untrt1     583 untreated single
##  7 FBgn0000022 untrt1       0 untreated single
##  8 FBgn0000024 untrt1      10 untreated single
##  9 FBgn0000028 untrt1       0 untreated single
## 10 FBgn0000032 untrt1    1446 untreated single
## # ℹ 40 more rows
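mutate can also add a column rather than overwrite one. As a small sketch (the paired column name is made up for this example), we flag the paired-end samples and then use distinct, as shown earlier, to inspect the result per sample:

```r
library(tidySummarizedExperiment)

pasilla_tidy <- tidySummarizedExperiment::pasilla

# Add a logical column derived from the existing type column
# ("paired" is an illustrative name, not part of the dataset)
pasilla_flagged <- pasilla_tidy %>%
    mutate(paired = type == "paired_end")

# One row per sample, to inspect the new column
sample_flags <- pasilla_flagged %>%
    distinct(.sample, type, paired)

sample_flags
```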
We could use unite to combine multiple columns into a single column.
pasilla_tidy %>%
    unite("group", c(condition, type))
## # A SummarizedExperiment-tibble abstraction: 102,193 × 4
## # Features=14599 | Samples=7 | Assays=counts
##    .feature    .sample counts group               
##    <chr>       <chr>    <int> <chr>               
##  1 FBgn0000003 untrt1       0 untreated_single_end
##  2 FBgn0000008 untrt1      92 untreated_single_end
##  3 FBgn0000014 untrt1       5 untreated_single_end
##  4 FBgn0000015 untrt1       0 untreated_single_end
##  5 FBgn0000017 untrt1    4664 untreated_single_end
##  6 FBgn0000018 untrt1     583 untreated_single_end
##  7 FBgn0000022 untrt1       0 untreated_single_end
##  8 FBgn0000024 untrt1      10 untreated_single_end
##  9 FBgn0000028 untrt1       0 untreated_single_end
## 10 FBgn0000032 untrt1    1446 untreated_single_end
## # ℹ 40 more rows
We can also combine commands with the tidyverse pipe %>%.
For example, we could combine group_by and summarise to get the total counts for each sample.
pasilla_tidy %>%
    group_by(.sample) %>%
    summarise(total_counts=sum(counts))
## # A tibble: 7 × 2
##   .sample total_counts
##   <chr>          <int>
## 1 trt1        18670279
## 2 trt2         9571826
## 3 trt3        10343856
## 4 untrt1      13972512
## 5 untrt2      21911438
## 6 untrt3       8358426
## 7 untrt4       9841335
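The same totals can be cross-checked without the tidy layer, since the counts live in an ordinary assay matrix. A short sketch using only assay from SummarizedExperiment and colSums:

```r
library(tidySummarizedExperiment)

pasilla_tidy <- tidySummarizedExperiment::pasilla

# Sum each column (sample) of the counts matrix
total_counts <- colSums(assay(pasilla_tidy, "counts"))

# Sort by sample name to match the order of the summarise output
total_counts[order(names(total_counts))]
```

Agreement between the two results is a quick sanity check that the tibble view and the underlying assay describe the same data.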
We can combine group_by, mutate and filter to keep the features whose mean count across samples is greater than 0.
pasilla_tidy %>%
    group_by(.feature) %>%
    mutate(mean_count=mean(counts)) %>%
    filter(mean_count > 0)
## # A tibble: 86,513 × 6
## # Groups:   .feature [12,359]
##    .feature    .sample counts condition type       mean_count
##    <chr>       <chr>    <int> <chr>     <chr>           <dbl>
##  1 FBgn0000003 untrt1       0 untreated single_end      0.143
##  2 FBgn0000008 untrt1      92 untreated single_end     99.6  
##  3 FBgn0000014 untrt1       5 untreated single_end      1.43 
##  4 FBgn0000015 untrt1       0 untreated single_end      0.857
##  5 FBgn0000017 untrt1    4664 untreated single_end   4672.   
##  6 FBgn0000018 untrt1     583 untreated single_end    461.   
##  7 FBgn0000022 untrt1       0 untreated single_end      0.143
##  8 FBgn0000024 untrt1      10 untreated single_end      7    
##  9 FBgn0000028 untrt1       0 untreated single_end      0.429
## 10 FBgn0000032 untrt1    1446 untreated single_end   1085.   
## # ℹ 86,503 more rows
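The grouped result above is a long tibble. If the goal is instead a SummarizedExperiment restricted to those features, one option is to compute the per-feature mean on the assay matrix and subset the rows. A sketch, assuming rowMeans on the counts assay and standard row indexing with `[`:

```r
library(tidySummarizedExperiment)

pasilla_tidy <- tidySummarizedExperiment::pasilla

# Mean count per feature across all samples
mean_count <- rowMeans(assay(pasilla_tidy, "counts"))

# Keep features with a non-zero mean; the result is still a SummarizedExperiment
pasilla_expressed <- pasilla_tidy[mean_count > 0, ]

dim(pasilla_expressed)
```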
my_theme <-
    list(
        scale_fill_brewer(palette="Set1"),
        scale_color_brewer(palette="Set1"),
        theme_bw() +
            theme(
                panel.border=element_blank(),
                axis.line=element_line(),
                panel.grid.major=element_line(linewidth=0.2),
                panel.grid.minor=element_line(linewidth=0.1),
                text=element_text(size=12),
                legend.position="bottom",
                aspect.ratio=1,
                strip.background=element_blank(),
                axis.title.x=element_text(margin=margin(t=10, r=10, b=10, l=10)),
                axis.title.y=element_text(margin=margin(t=10, r=10, b=10, l=10))
            )
    )
We can treat pasilla_tidy as a normal tibble for plotting.
Here we plot the distribution of counts per sample.
pasilla_tidy %>%
    tidySummarizedExperiment::ggplot(aes(counts + 1, group=.sample, color=`type`)) +
    geom_density() +
    scale_x_log10() +
    my_theme
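Summaries can be plotted in the same way. As a sketch, the per-sample totals computed earlier can be drawn as bars. The data frame is built here with base accessors so that the example stands alone, and ggplot2::ggplot is called explicitly because the input is a plain data frame; the sample_totals and p names are illustrative.

```r
library(ggplot2)
library(tidySummarizedExperiment)

pasilla_tidy <- tidySummarizedExperiment::pasilla

# One row per sample: name, condition and total counts
sample_totals <- data.frame(
    sample = colnames(pasilla_tidy),
    condition = pasilla_tidy$condition,
    total_counts = colSums(assay(pasilla_tidy, "counts"))
)

# Bar chart of total counts per sample, coloured by condition
p <- ggplot2::ggplot(sample_totals, aes(sample, total_counts, fill = condition)) +
    geom_col() +
    scale_fill_brewer(palette = "Set1") +
    theme_bw()

p
```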
sessionInfo()
## R version 4.3.0 RC (2023-04-13 r84269)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.2 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.17-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] tidySummarizedExperiment_1.10.0 SummarizedExperiment_1.30.0    
##  [3] Biobase_2.60.0                  GenomicRanges_1.52.0           
##  [5] GenomeInfoDb_1.36.0             IRanges_2.34.0                 
##  [7] S4Vectors_0.38.0                BiocGenerics_0.46.0            
##  [9] MatrixGenerics_1.12.0           matrixStats_0.63.0             
## [11] ggplot2_3.4.2                   knitr_1.42                     
## 
## loaded via a namespace (and not attached):
##  [1] plotly_4.10.1           utf8_1.2.3              generics_0.1.3         
##  [4] tidyr_1.3.0             bitops_1.0-7            stringi_1.7.12         
##  [7] lattice_0.21-8          digest_0.6.31           magrittr_2.0.3         
## [10] RColorBrewer_1.1-3      evaluate_0.20           grid_4.3.0             
## [13] fastmap_1.1.1           jsonlite_1.8.4          Matrix_1.5-4           
## [16] httr_1.4.5              purrr_1.0.1             fansi_1.0.4            
## [19] viridisLite_0.4.1       scales_1.2.1            lazyeval_0.2.2         
## [22] cli_3.6.1               rlang_1.1.0             XVector_0.40.0         
## [25] ellipsis_0.3.2          munsell_0.5.0           withr_2.5.0            
## [28] DelayedArray_0.26.0     tools_4.3.0             dplyr_1.1.2            
## [31] colorspace_2.1-0        GenomeInfoDbData_1.2.10 vctrs_0.6.2            
## [34] R6_2.5.1                lifecycle_1.0.3         stringr_1.5.0          
## [37] zlibbioc_1.46.0         htmlwidgets_1.6.2       pkgconfig_2.0.3        
## [40] pillar_1.9.0            gtable_0.3.3            glue_1.6.2             
## [43] data.table_1.14.8       highr_0.10              xfun_0.39              
## [46] tibble_3.2.1            tidyselect_1.2.0        farver_2.1.1           
## [49] htmltools_0.5.5         labeling_0.4.2          compiler_4.3.0         
## [52] RCurl_1.98-1.12