1 Expression Atlas resources at EMBL-EBI

The EMBL-EBI Expression Atlas and Single Cell Expression Atlas resources consist of carefully selected high-quality datasets from ArrayExpress and other data sources that have been manually curated and re-analyzed via the Functional Genomics Team analysis pipelines (George et al. 2023). Since October 2022, ArrayExpress has been a collection of functional genomics data within BioStudies (https://www.ebi.ac.uk/biostudies).

The Expression Atlas website allows users to search these datasets for genes and/or experimental conditions, to discover which genes are expressed in which tissues, cell types, developmental stages, and hundreds of other experimental conditions.

The ExpressionAtlas R package allows you to search and download pre-packaged data from Expression Atlas inside an R session. Raw counts are provided for RNA-seq datasets, while normalized intensities are available for microarray experiments. Protocols describing how the data was generated are contained within the downloaded R objects, with more detailed information available on the Expression Atlas website. Sample annotations are also included in the R object.

Single-cell datasets are downloaded in AnnData format and loaded into a SingleCellExperiment object for access and customised visualisation.

2 Searching and downloading Expression Atlas data

The package communicates with the EBI RESTful Web Services API and Expression Atlas API to enable efficient data searching and retrieval.

2.1 Searching

You can search for experiments in Atlas using the searchAtlasExperiments() function. This function returns a DataFrame (see S4Vectors) containing the results of your search. The first argument to searchAtlasExperiments() should be a character vector of sample properties, e.g. biological sample attributes, experimental treatments and/or species. You may also optionally provide a secondary filter as a second argument to narrow your search.

invisible(lapply(
  c("ExpressionAtlas", "SummarizedExperiment", "SingleCellExperiment", "ggplot2"),
  function(pkg) suppressMessages(library(pkg, character.only = TRUE))
))
atlas_res <- searchAtlasExperiments( query = "salt", secondaryFilter = "oryza" )
# Searching for experiments matching your query ...
# Query successful.
# Found 4 experiments matching your query.

We will proceed with a subset of three of these accessions, stored in atlasRes:

atlasRes
## DataFrame with 3 rows and 4 columns
##      Accession                Species                   Type
##    <character>            <character>            <character>
## 1 E-GEOD-11175 Oryza sativa Japonic.. transcription profil..
## 2  E-MTAB-1625 Oryza sativa Japonic..  RNA-seq of coding RNA
## 3  E-MTAB-1624 Oryza sativa Japonic.. transcription profil..
##                    Title
##              <character>
## 1 Transcription profil..
## 2 RNA-seq of coding RN..
## 3 Transcription profil..

The Accession column contains the ArrayExpress accession of each dataset – the unique identifier assigned to it. The species, experiment type (e.g. microarray or RNA-seq), and title of each dataset are also listed.

2.2 Detailed search through experiment metadata

searchAtlasExperiments( query = "lung" , detailed = FALSE )
## DataFrame with 110 rows and 4 columns
##       Accession      Species         Type                  Title
##     <character>  <character>  <character>            <character>
## 1   E-MTAB-4729 Homo sapiens     Baseline RNA-seq analysis of ..
## 2   E-MTAB-9372 Homo sapiens     Baseline Bulk RNA-sequencing ..
## 3    E-MTAB-599 Mus musculus     Baseline RNA-seq of mouse DBA..
## 4   E-MTAB-4644 Mus musculus     Baseline Transcription profil..
## 5     E-PROT-77 Mus musculus     Baseline An atlas of the agei..
## ...         ...          ...          ...                    ...
## 106 E-MTAB-6040 Homo sapiens Differential Gene expression prof..
## 107 E-MTAB-6074 Mus musculus Differential Transcription profil..
## 108 E-MTAB-6706 Mus musculus Differential Expression profiling..
## 109 E-MTAB-9770 Mus musculus Differential Microarray profiling..
## 110   E-TABM-15 Homo sapiens Differential Transcription profil..

Set the detailed argument to TRUE to search more thoroughly through the Atlas experiment metadata:

searchAtlasExperiments( query = "lung" , detailed = TRUE )
## DataFrame with 243 rows and 4 columns
##        Accession      Species         Type                  Title
##      <character>  <character>  <character>            <character>
## 1    E-MTAB-2706 Homo sapiens     Baseline RNA-seq of 675 commo..
## 2    E-MTAB-2770 Homo sapiens     Baseline RNA-seq of 1019 huma..
## 3    E-MTAB-3983 Homo sapiens     Baseline Sanger Genomics of D..
## 4   E-GEOD-26284 Homo sapiens     Baseline RNA-seq of long poly..
## 5    E-MTAB-4729 Homo sapiens     Baseline RNA-seq analysis of ..
## ...          ...          ...          ...                    ...
## 239  E-MTAB-6706 Mus musculus Differential Expression profiling..
## 240   E-MTAB-835 Mus musculus Differential Transcription profil..
## 241  E-MTAB-9770 Mus musculus Differential Microarray profiling..
## 242    E-TABM-15 Homo sapiens Differential Transcription profil..
## 243 E-GEOD-13316 Homo sapiens Differential miRNA expression dat..

The detailed argument can also be combined with a secondary filter:

searchAtlasExperiments( query = "lung", secondaryFilter = "human", detailed = FALSE )
## DataFrame with 25 rows and 4 columns
##         Accession      Species         Type                  Title
##       <character>  <character>  <character>            <character>
## 1     E-MTAB-9372 Homo sapiens     Baseline Bulk RNA-sequencing ..
## 2   E-GEOD-147507 Homo sapiens Differential Transcriptional resp..
## 3    E-GEOD-54846 Homo sapiens Differential RNA-seq from control..
## 4     E-MTAB-8572 Homo sapiens Differential RNA-seq of human lun..
## 5    E-GEOD-13309 Homo sapiens Differential Transcription profil..
## ...           ...          ...          ...                    ...
## 21    E-MEXP-2115 Homo sapiens Differential Transcription profil..
## 22     E-MEXP-231 Homo sapiens Differential Transcription profil..
## 23    E-MTAB-1900 Homo sapiens Differential Transcription profil..
## 24    E-MTAB-5627 Homo sapiens Differential Microarray analysis ..
## 25      E-TABM-15 Homo sapiens Differential Transcription profil..
searchAtlasExperiments( query = "lung", secondaryFilter = "human", detailed = TRUE )
## DataFrame with 32 rows and 4 columns
##         Accession      Species         Type                  Title
##       <character>  <character>  <character>            <character>
## 1     E-MTAB-2706 Homo sapiens     Baseline RNA-seq of 675 commo..
## 2     E-MTAB-2770 Homo sapiens     Baseline RNA-seq of 1019 huma..
## 3    E-GEOD-26284 Homo sapiens     Baseline RNA-seq of long poly..
## 4     E-MTAB-9372 Homo sapiens     Baseline Bulk RNA-sequencing ..
## 5   E-GEOD-147507 Homo sapiens Differential Transcriptional resp..
## ...           ...          ...          ...                    ...
## 28    E-MEXP-2115 Homo sapiens Differential Transcription profil..
## 29     E-MEXP-231 Homo sapiens Differential Transcription profil..
## 30    E-MTAB-1900 Homo sapiens Differential Transcription profil..
## 31    E-MTAB-5627 Homo sapiens Differential Microarray analysis ..
## 32      E-TABM-15 Homo sapiens Differential Transcription profil..

As the outputs show, a detailed search returns more Atlas experiments: 243 rather than 110 for "lung", and 32 rather than 25 once the "human" filter is applied.
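
To see which experiments a detailed search adds, the Accession columns of the two results can be compared with setdiff(). The sketch below is an offline illustration only: the two data frames are plain data.frame stand-ins (not S4Vectors DataFrames) holding just the first five rows printed above for the "lung" plus "human" searches, and the names basicRes and detailedRes are hypothetical. Because only the head rows are used, the result describes these rows, not the full 25 and 32 row results.

```r
# Stand-ins for the head rows of the two search results shown above.
basicRes <- data.frame(
  Accession = c("E-MTAB-9372", "E-GEOD-147507", "E-GEOD-54846",
                "E-MTAB-8572", "E-GEOD-13309"),
  Type      = c("Baseline", rep("Differential", 4)),
  stringsAsFactors = FALSE
)
detailedRes <- data.frame(
  Accession = c("E-MTAB-2706", "E-MTAB-2770", "E-GEOD-26284",
                "E-MTAB-9372", "E-GEOD-147507"),
  Type      = c(rep("Baseline", 4), "Differential"),
  stringsAsFactors = FALSE
)

# Accessions present in the detailed rows but absent from the basic rows.
extraAccessions <- setdiff(detailedRes$Accession, basicRes$Accession)
print(extraAccessions)

# Number of experiments of each type in the detailed rows.
print(table(detailedRes$Type))
```

The same setdiff() call should work on the Accession columns of the real search results, since they are ordinary character vectors.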

2.3 Downloading the data

To download the data for any/all of the experiments in your results, you can use the function getAtlasData(). This function accepts a vector of ArrayExpress accessions. The data is downloaded into a SimpleList object (see package S4Vectors), with one entry per experiment, listed by accession.

For example, to download all the datasets in your results:

allExps <- getAtlasData( atlasRes$Accession )
# Downloading Expression Atlas experiment summary from:
#  ftp://ftp.ebi.ac.uk/pub/databases/microarray/data/atlas/experiments/E-GEOD-11175/E-GEOD-11175-atlasExperimentSummary.Rdata
# Successfully downloaded experiment summary object for E-GEOD-11175
# Downloading Expression Atlas experiment summary from:
#  ftp://ftp.ebi.ac.uk/pub/databases/microarray/data/atlas/experiments/E-MTAB-1625/E-MTAB-1625-atlasExperimentSummary.Rdata
# Successfully downloaded experiment summary object for E-MTAB-1625
# Downloading Expression Atlas experiment summary from:
#  ftp://ftp.ebi.ac.uk/pub/databases/microarray/data/atlas/experiments/E-MTAB-1624/E-MTAB-1624-atlasExperimentSummary.Rdata
# Successfully downloaded experiment summary object for E-MTAB-1624
allExps
## List of length 3
## names(3): E-GEOD-11175 E-MTAB-1625 E-MTAB-1624

To only download the RNA-seq experiment(s):

rnaseqExps <- getAtlasData( 
    atlasRes$Accession[ 
        grep( 
            "rna-seq", 
            atlasRes$Type, 
            ignore.case = TRUE 
        ) 
    ] 
)
# Downloading Expression Atlas experiment summary from:
#  ftp://ftp.ebi.ac.uk/pub/databases/microarray/data/atlas/experiments/E-MTAB-1625/E-MTAB-1625-atlasExperimentSummary.Rdata
# Successfully downloaded experiment summary object for E-MTAB-1625
rnaseqExps
## List of length 1
## names(1): E-MTAB-1625

To access an experiment summary, use the accession:

mtab1624 <- allExps[[ "E-MTAB-1624" ]]
mtab1625 <- allExps[[ "E-MTAB-1625" ]]

Each dataset is also represented by a SimpleList, with one entry per platform used in the experiment. For RNA-seq data there will only ever be one entry, named rnaseq. For microarray data, there is one entry per array design used, listed by ArrayExpress array design accession (see below).
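
The nesting described above (one entry per experiment, then one entry per platform) can be inspected by listing the platform names of each experiment. The following is a minimal offline sketch that uses an ordinary nested list as a stand-in for the downloaded SimpleList; the placeholder strings take the place of the real data objects, and the variable names are hypothetical.

```r
# Stand-in for the downloaded object: experiment -> platform -> data object.
exps <- list(
  "E-MTAB-1625" = list(rnaseq = "placeholder for the RNA-seq summary"),
  "E-MTAB-1624" = list("A-AFFY-126" = "placeholder for the microarray data")
)

# Platform names per experiment.
platforms <- lapply(exps, names)
print(platforms)

# Flag experiments that carry an entry named "rnaseq".
isRnaseq <- vapply(platforms, function(p) "rnaseq" %in% p, logical(1))
print(isRnaseq)
```

On the real object, lapply( allExps, names ) is expected to give the equivalent listing, assuming S4Vectors is loaded so that lapply() handles the SimpleList.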

2.3.1 RNA-seq experiment summaries

Following on from above, mtab1625 now contains a SimpleList object with a single entry named rnaseq. For RNA-seq experiments, this entry is a RangedSummarizedExperiment object (see package SummarizedExperiment).

sumexp <- mtab1625$rnaseq
sumexp
## class: RangedSummarizedExperiment 
## dim: 10 18 
## metadata(4): pipeline filtering mapping quantification
## assays(1): counts
## rownames(10): EPlOSAG00000000001 EPlOSAG00000000002 ...
##   EPlOSAG00000000009 EPlOSAG00000000010
## rowData names(0):
## colnames(18): ERR266221 ERR266222 ... ERR266237 ERR266238
## colData names(9): AtlasAssayGroup organism ... growth_condition
##   sampling_time

The matrix of raw counts for this experiment is stored in the assays slot:

head( assays( sumexp )$counts )
##                    ERR266221 ERR266222 ERR266223 ERR266224 ERR266225 ERR266226
## EPlOSAG00000000001         0         0         0         0         1         0
## EPlOSAG00000000002         0         0         0         0         0         0
## EPlOSAG00000000003         0         0         0         0         0         0
## EPlOSAG00000000004         0         0         0         0         0         0
## EPlOSAG00000000005         0         0         0         0         0         0
## EPlOSAG00000000006         0         0         0         0         0         0
##                    ERR266227 ERR266228 ERR266229 ERR266230 ERR266231 ERR266232
## EPlOSAG00000000001         0         0         0         1         0         1
## EPlOSAG00000000002         0         0         0         0         0         0
## EPlOSAG00000000003         0         0         0         0         0         0
## EPlOSAG00000000004         0         0         0         0         0         0
## EPlOSAG00000000005         0         0         0         0         0         0
## EPlOSAG00000000006         0         0         0         0         0         0
##                    ERR266233 ERR266234 ERR266235 ERR266236 ERR266237 ERR266238
## EPlOSAG00000000001         0         1         1         0         0         0
## EPlOSAG00000000002         0         0         0         0         0         0
## EPlOSAG00000000003         0         0         0         0         0         0
## EPlOSAG00000000004         0         0         0         0         0         0
## EPlOSAG00000000005         0         0         0         0         0         0
## EPlOSAG00000000006         0         0         0         0         0         0
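
Many of the rows printed above contain only zeros. A common first step is to drop such genes before further analysis. The sketch below rebuilds the first six genes and first six samples of the printed matrix as a plain integer matrix (a stand-in for assays( sumexp )$counts) and keeps the rows with at least one read; the variable names are hypothetical.

```r
# Rebuild the top-left 6 x 6 corner of the counts matrix printed above.
genes   <- sprintf("EPlOSAG%011d", 1:6)
samples <- paste0("ERR2662", 21:26)
counts  <- matrix(0L, nrow = 6, ncol = 6, dimnames = list(genes, samples))
counts["EPlOSAG00000000001", "ERR266225"] <- 1L

# Keep genes with at least one read in any sample.
keep     <- rowSums(counts) > 0
filtered <- counts[keep, , drop = FALSE]
print(filtered)
```

On the real object the same logical vector can be used to subset rows, e.g. sumexp[ rowSums( assays( sumexp )$counts ) > 0, ], which keeps counts and sample annotations in sync.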

The sample annotations can be found in the colData slot:

colData( sumexp )
## DataFrame with 18 rows and 9 columns
##           AtlasAssayGroup               organism    cultivar
##               <character>            <character> <character>
## ERR266221              g5 Oryza sativa Japonic..  Nipponbare
## ERR266222              g2 Oryza sativa Japonic..  Nipponbare
## ERR266223              g2 Oryza sativa Japonic..  Nipponbare
## ERR266224              g5 Oryza sativa Japonic..  Nipponbare
## ERR266225              g3 Oryza sativa Japonic..  Nipponbare
## ...                   ...                    ...         ...
## ERR266234              g3 Oryza sativa Japonic..  Nipponbare
## ERR266235              g4 Oryza sativa Japonic..  Nipponbare
## ERR266236              g4 Oryza sativa Japonic..  Nipponbare
## ERR266237              g4 Oryza sativa Japonic..  Nipponbare
## ERR266238              g6 Oryza sativa Japonic..  Nipponbare
##              developmental_stage         age   time_unit          organism_part
##                      <character> <character> <character>            <character>
## ERR266221 seedling, two leaves..           2        week shoot axis, vascular..
## ERR266222 seedling, two leaves..           2        week shoot axis, vascular..
## ERR266223 seedling, two leaves..           2        week shoot axis, vascular..
## ERR266224 seedling, two leaves..           2        week shoot axis, vascular..
## ERR266225 seedling, two leaves..           2        week shoot axis, vascular..
## ...                          ...         ...         ...                    ...
## ERR266234 seedling, two leaves..           2        week shoot axis, vascular..
## ERR266235 seedling, two leaves..           2        week shoot axis, vascular..
## ERR266236 seedling, two leaves..           2        week shoot axis, vascular..
## ERR266237 seedling, two leaves..           2        week shoot axis, vascular..
## ERR266238 seedling, two leaves..           2        week shoot axis, vascular..
##                 growth_condition sampling_time
##                      <character>   <character>
## ERR266221 300 millimolar sodiu..             5
## ERR266222        normal watering             5
## ERR266223        normal watering             5
## ERR266224 300 millimolar sodiu..             5
## ERR266225        normal watering            24
## ...                          ...           ...
## ERR266234        normal watering            24
## ERR266235 300 millimolar sodiu..             1
## ERR266236 300 millimolar sodiu..             1
## ERR266237 300 millimolar sodiu..             1
## ERR266238 300 millimolar sodiu..            24

Information describing how the raw data files were processed to obtain the raw counts matrix is found in the metadata slot:

metadata( sumexp )
## $pipeline
## [1] "iRAP version 0.7.0p1 (http://nunofonseca.github.io/irap/)"
## 
## $filtering
## [1] "Discard reads below minimum quality threshold"                                 
## [2] "Check of bacterial contamination; discard offending reads"                     
## [3] "Discard reads with common uncalled characters (e.g. N)"                        
## [4] "Remove reads from pair-end libraries that were orphaned by filtering steps 1-3"
## 
## $mapping
## [1] "Against genome reference (Ensembl Plants release: 26) tophat2 version: 2.0.12"
## 
## $quantification
## [1] "htseq2 version: 0.6.1p1"

2.3.2 Single-channel microarray experiments

Data from a single-channel microarray experiment, e.g. E-MTAB-1624, is represented as one or more ExpressionSet object(s) in the SimpleList that is downloaded. ExpressionSet objects are indexed by the ArrayExpress accession(s) of the microarray design(s) used in the original experiment.

names( mtab1624 )
## [1] "A-AFFY-126"
affy126data <- mtab1624[[ "A-AFFY-126" ]]
affy126data
## ExpressionSet (storageMode: lockedEnvironment)
## assayData: 10 features, 18 samples 
##   element names: exprs 
## protocolData: none
## phenoData
##   sampleNames: nippon_control_1hr_rep1 nippon_control_1hr_rep2 ...
##     nippon_salt_5hr_rep3 (18 total)
##   varLabels: AtlasAssayGroup organism ... sampling_time (9 total)
##   varMetadata: labelDescription
## featureData
##   featureNames: AFFX-BioB-3_at AFFX-BioB-5_at ... AFFX-DapX-3_at (10
##     total)
##   fvarLabels: probeSets
##   fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
## Annotation:

The matrix of normalized intensity values is in the assayData slot:

head( exprs( affy126data ) )
##                 nippon_control_1hr_rep1 nippon_control_1hr_rep2
## AFFX-BioB-3_at                 7.869421                8.365278
## AFFX-BioB-5_at                 7.702652                8.020915
## AFFX-BioB-M_at                 7.652985                8.156907
## AFFX-BioC-3_at                 9.219287                9.556873
## AFFX-BioC-5_at                 8.881807                9.186310
## AFFX-BioDn-3_at               11.617725               11.877930
##                 nippon_control_1hr_rep3 nippon_control_24hr_rep1
## AFFX-BioB-3_at                 8.637034                 8.403105
## AFFX-BioB-5_at                 8.407186                 8.150995
## AFFX-BioB-M_at                 8.348117                 7.998242
## AFFX-BioC-3_at                 9.895652                 9.598754
## AFFX-BioC-5_at                 9.553833                 9.354739
## AFFX-BioDn-3_at               12.193324                11.861900
##                 nippon_control_24hr_rep2 nippon_control_24hr_rep3
## AFFX-BioB-3_at                  8.678257                 8.456243
## AFFX-BioB-5_at                  8.413489                 8.227663
## AFFX-BioB-M_at                  8.358092                 8.111307
## AFFX-BioC-3_at                  9.872856                 9.699042
## AFFX-BioC-5_at                  9.594959                 9.383014
## AFFX-BioDn-3_at                12.077460                11.959052
##                 nippon_control_5hr_rep1 nippon_control_5hr_rep2
## AFFX-BioB-3_at                 8.348849                8.643520
## AFFX-BioB-5_at                 8.129436                8.374279
## AFFX-BioB-M_at                 7.978514                8.285401
## AFFX-BioC-3_at                 9.588422                9.828320
## AFFX-BioC-5_at                 9.210903                9.512925
## AFFX-BioDn-3_at               11.853478               12.043559
##                 nippon_control_5hr_rep3 nippon_salt_1hr_rep1
## AFFX-BioB-3_at                 8.401530             8.331911
## AFFX-BioB-5_at                 8.193307             8.015213
## AFFX-BioB-M_at                 8.046037             7.944433
## AFFX-BioC-3_at                 9.685030             9.509499
## AFFX-BioC-5_at                 9.379879             9.194149
## AFFX-BioDn-3_at               11.952693            11.800154
##                 nippon_salt_1hr_rep2 nippon_salt_1hr_rep3 nippon_salt_24hr_rep1
## AFFX-BioB-3_at              8.463545             8.901247              8.363449
## AFFX-BioB-5_at              8.290420             8.533720              8.112024
## AFFX-BioB-M_at              8.139875             8.462567              8.115197
## AFFX-BioC-3_at              9.676649             9.950136              9.549797
## AFFX-BioC-5_at              9.343052             9.714590              9.276916
## AFFX-BioDn-3_at            12.043509            12.263983             11.866490
##                 nippon_salt_24hr_rep2 nippon_salt_24hr_rep3
## AFFX-BioB-3_at               8.185702              8.586542
## AFFX-BioB-5_at               7.828596              8.207703
## AFFX-BioB-M_at               7.775437              8.231843
## AFFX-BioC-3_at               9.300713              9.624600
## AFFX-BioC-5_at               8.916605              9.371714
## AFFX-BioDn-3_at             11.647577             11.908544
##                 nippon_salt_5hr_rep1 nippon_salt_5hr_rep2 nippon_salt_5hr_rep3
## AFFX-BioB-3_at              8.562492             8.366250             8.499076
## AFFX-BioB-5_at              8.307241             8.275820             8.155382
## AFFX-BioB-M_at              8.078984             8.114156             8.115064
## AFFX-BioC-3_at              9.728441             9.632023             9.531450
## AFFX-BioC-5_at              9.430824             9.358933             9.260982
## AFFX-BioDn-3_at            12.029436            11.832181            11.987857

The sample annotations are in the phenoData slot:

pData( affy126data )
##                          AtlasAssayGroup                    organism   cultivar
## nippon_control_1hr_rep1               g1 Oryza sativa Japonica Group Nipponbare
## nippon_control_1hr_rep2               g1 Oryza sativa Japonica Group Nipponbare
## nippon_control_1hr_rep3               g1 Oryza sativa Japonica Group Nipponbare
## nippon_control_24hr_rep1              g3 Oryza sativa Japonica Group Nipponbare
## nippon_control_24hr_rep2              g3 Oryza sativa Japonica Group Nipponbare
## nippon_control_24hr_rep3              g3 Oryza sativa Japonica Group Nipponbare
## nippon_control_5hr_rep1               g2 Oryza sativa Japonica Group Nipponbare
## nippon_control_5hr_rep2               g2 Oryza sativa Japonica Group Nipponbare
## nippon_control_5hr_rep3               g2 Oryza sativa Japonica Group Nipponbare
## nippon_salt_1hr_rep1                  g4 Oryza sativa Japonica Group Nipponbare
## nippon_salt_1hr_rep2                  g4 Oryza sativa Japonica Group Nipponbare
## nippon_salt_1hr_rep3                  g4 Oryza sativa Japonica Group Nipponbare
## nippon_salt_24hr_rep1                 g6 Oryza sativa Japonica Group Nipponbare
## nippon_salt_24hr_rep2                 g6 Oryza sativa Japonica Group Nipponbare
## nippon_salt_24hr_rep3                 g6 Oryza sativa Japonica Group Nipponbare
## nippon_salt_5hr_rep1                  g5 Oryza sativa Japonica Group Nipponbare
## nippon_salt_5hr_rep2                  g5 Oryza sativa Japonica Group Nipponbare
## nippon_salt_5hr_rep3                  g5 Oryza sativa Japonica Group Nipponbare
##                                                         developmental_stage age
## nippon_control_1hr_rep1  seedling, two leaves visible, three leaves visible   2
## nippon_control_1hr_rep2  seedling, two leaves visible, three leaves visible   2
## nippon_control_1hr_rep3  seedling, two leaves visible, three leaves visible   2
## nippon_control_24hr_rep1 seedling, two leaves visible, three leaves visible   2
## nippon_control_24hr_rep2 seedling, two leaves visible, three leaves visible   2
## nippon_control_24hr_rep3 seedling, two leaves visible, three leaves visible   2
## nippon_control_5hr_rep1  seedling, two leaves visible, three leaves visible   2
## nippon_control_5hr_rep2  seedling, two leaves visible, three leaves visible   2
## nippon_control_5hr_rep3  seedling, two leaves visible, three leaves visible   2
## nippon_salt_1hr_rep1     seedling, two leaves visible, three leaves visible   2
## nippon_salt_1hr_rep2     seedling, two leaves visible, three leaves visible   2
## nippon_salt_1hr_rep3     seedling, two leaves visible, three leaves visible   2
## nippon_salt_24hr_rep1    seedling, two leaves visible, three leaves visible   2
## nippon_salt_24hr_rep2    seedling, two leaves visible, three leaves visible   2
## nippon_salt_24hr_rep3    seedling, two leaves visible, three leaves visible   2
## nippon_salt_5hr_rep1     seedling, two leaves visible, three leaves visible   2
## nippon_salt_5hr_rep2     seedling, two leaves visible, three leaves visible   2
## nippon_salt_5hr_rep3     seedling, two leaves visible, three leaves visible   2
##                          time_unit             organism_part
## nippon_control_1hr_rep1       week shoot axis, vascular leaf
## nippon_control_1hr_rep2       week shoot axis, vascular leaf
## nippon_control_1hr_rep3       week shoot axis, vascular leaf
## nippon_control_24hr_rep1      week shoot axis, vascular leaf
## nippon_control_24hr_rep2      week shoot axis, vascular leaf
## nippon_control_24hr_rep3      week shoot axis, vascular leaf
## nippon_control_5hr_rep1       week shoot axis, vascular leaf
## nippon_control_5hr_rep2       week shoot axis, vascular leaf
## nippon_control_5hr_rep3       week shoot axis, vascular leaf
## nippon_salt_1hr_rep1          week shoot axis, vascular leaf
## nippon_salt_1hr_rep2          week shoot axis, vascular leaf
## nippon_salt_1hr_rep3          week shoot axis, vascular leaf
## nippon_salt_24hr_rep1         week shoot axis, vascular leaf
## nippon_salt_24hr_rep2         week shoot axis, vascular leaf
## nippon_salt_24hr_rep3         week shoot axis, vascular leaf
## nippon_salt_5hr_rep1          week shoot axis, vascular leaf
## nippon_salt_5hr_rep2          week shoot axis, vascular leaf
## nippon_salt_5hr_rep3          week shoot axis, vascular leaf
##                                        growth_condition sampling_time
## nippon_control_1hr_rep1                 normal watering             1
## nippon_control_1hr_rep2                 normal watering             1
## nippon_control_1hr_rep3                 normal watering             1
## nippon_control_24hr_rep1                normal watering            24
## nippon_control_24hr_rep2                normal watering            24
## nippon_control_24hr_rep3                normal watering            24
## nippon_control_5hr_rep1                 normal watering             5
## nippon_control_5hr_rep2                 normal watering             5
## nippon_control_5hr_rep3                 normal watering             5
## nippon_salt_1hr_rep1     300 millimolar sodium chloride             1
## nippon_salt_1hr_rep2     300 millimolar sodium chloride             1
## nippon_salt_1hr_rep3     300 millimolar sodium chloride             1
## nippon_salt_24hr_rep1    300 millimolar sodium chloride            24
## nippon_salt_24hr_rep2    300 millimolar sodium chloride            24
## nippon_salt_24hr_rep3    300 millimolar sodium chloride            24
## nippon_salt_5hr_rep1     300 millimolar sodium chloride             5
## nippon_salt_5hr_rep2     300 millimolar sodium chloride             5
## nippon_salt_5hr_rep3     300 millimolar sodium chloride             5

A brief outline of how the raw data was normalized is in the experimentData slot:

preproc( experimentData( affy126data ) )
## $normalization
## [1] "RMA using oligo (http://www.bioconductor.org/packages/release/bioc/html/oligo.html) version 1.24.2"
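
The intensity matrix and the sample annotations can be combined to summarise expression per assay group. The sketch below is an offline illustration: it copies the values printed above for two probe sets in the 1 hour control (g1) and 1 hour salt (g4) samples into a plain matrix, then averages within each group. RMA values are on a log2 scale, so the difference between group means can be read as a log2 ratio. The variable names are hypothetical.

```r
# Values copied from the exprs() output above (two probe sets, six samples).
intensities <- matrix(
  c(7.869421, 8.365278, 8.637034, 8.331911, 8.463545, 8.901247,
    7.702652, 8.020915, 8.407186, 8.015213, 8.290420, 8.533720),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    c("AFFX-BioB-3_at", "AFFX-BioB-5_at"),
    c(paste0("nippon_control_1hr_rep", 1:3),
      paste0("nippon_salt_1hr_rep", 1:3))
  )
)

# Assay group of each column, as listed in the pData() output above.
assayGroup <- c(rep("g1", 3), rep("g4", 3))

# Mean intensity per probe set within each assay group.
groupMeans <- sapply(
  split(seq_len(ncol(intensities)), assayGroup),
  function(cols) rowMeans(intensities[, cols, drop = FALSE])
)
print(groupMeans)

# Salt minus control, on the log2 scale.
saltEffect <- groupMeans[, "g4"] - groupMeans[, "g1"]
print(saltEffect)
```

With the real object, exprs( affy126data ) and pData( affy126data )$AtlasAssayGroup would take the place of intensities and assayGroup. The two probe sets here are AFFX controls, so the difference is only a demonstration of the arithmetic.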

3 Downloading and visualising a bulk Expression Atlas experiment

3.1 Downloading a single Expression Atlas experiment summary

You can download data for a single Expression Atlas experiment using the getAtlasExperiment() function:

mtab3007 <- getAtlasExperiment( "E-MTAB-3007" )
# Downloading Expression Atlas experiment summary from:
#  ftp://ftp.ebi.ac.uk/pub/databases/microarray/data/atlas/experiments/E-MTAB-3007/E-MTAB-3007-atlasExperimentSummary.Rdata
# Successfully downloaded experiment summary object for E-MTAB-3007
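
When fetching several experiments one at a time, it can be convenient to continue past an accession that fails. The sketch below is one possible wrapper, not part of the package: fetchAll() and fakeFetch() are hypothetical names, and fakeFetch() is an offline stand-in so the example runs without network access. It assumes a failed download either raises an R error or returns NULL; both cases are dropped from the result.

```r
# Apply `fetch` to each accession, skipping any that error or return NULL.
fetchAll <- function(accessions, fetch) {
  results <- lapply(accessions, function(acc) {
    tryCatch(
      fetch(acc),
      error = function(e) {
        message("Skipping ", acc, ": ", conditionMessage(e))
        NULL
      }
    )
  })
  names(results) <- accessions
  Filter(Negate(is.null), results)
}

# Offline stand-in for a download function (hypothetical).
fakeFetch <- function(acc) {
  if (!grepl("^E-[A-Z]+-[0-9]+$", acc)) stop("not a valid accession")
  list(rnaseq = paste("summary for", acc))
}

got <- fetchAll(c("E-MTAB-3007", "not-an-accession"), fakeFetch)
print(names(got))
```

In an online session, getAtlasExperiment could be passed as the fetch argument in place of fakeFetch.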

3.2 Downloading normalised data for a single Expression Atlas experiment

You can download normalised data for a single Expression Atlas experiment using the getNormalisedAtlasExpression() function:

mtab4045_tpm <- getNormalisedAtlasExpression( "E-MTAB-4045", "tpm" )
# Downloading XML file from FTP...
# E-MTAB-4045  is  rnaseq_mrna_baseline , will continue downloading data
# Downloading expression file from:
#  ftp://ftp.ebi.ac.uk/pub/databases/microarray/data/atlas/experiments/E-MTAB-4045/E-MTAB-4045-tpms.tsv
# Downloading XML file from FTP...
mtab4045_cpm <- getNormalisedAtlasExpression( "E-MTAB-4045", "cpm" )
# Downloading XML file from FTP...
# E-MTAB-4045  is  rnaseq_mrna_baseline , will continue downloading data
# Downloading Expression Atlas experiment summary from:
#  ftp://ftp.ebi.ac.uk/pub/databases/microarray/data/atlas/experiments/E-MTAB-4045/E-MTAB-4045-atlasExperimentSummary.Rdata
# Successfully downloaded experiment summary object for E-MTAB-4045

3.3 Downloading analytics data for a single differential Expression Atlas experiment

You can also download analytics data for a single differential Expression Atlas experiment using the getAnalyticsDifferentialAtlasExpression() function:

mtab10104_dea <- getAnalyticsDifferentialAtlasExpression( "E-MTAB-10104" )
# Downloading expression file from:
# ftp://ftp.ebi.ac.uk/pub/databases/microarray/data/atlas/experiments/E-MTAB-10104/E-MTAB-10104-analytics.tsv

3.4 Creating a heatmap from normalised data of an Expression Atlas experiment

You can create a heatmap from the normalised data downloaded with getNormalisedAtlasExpression() for a single Expression Atlas experiment using the heatmapAtlasExperiment() function:

mtab4045_tpm <- getNormalisedAtlasExpression( "E-MTAB-4045", "tpm" ) 
mtab4045 <- heatmapAtlasExperiment( df = mtab4045_tpm, save_pdf = FALSE, show_plot = TRUE, palette = "viridis", top_n = 20, scaled = FALSE, show_heatmap_title = TRUE )
## [1] "Plotting on screen"

And now using CPM data, instead of TPM:

mtab4045_cpm <- getNormalisedAtlasExpression( "E-MTAB-4045", "cpm" ) 
mtab4045 <- heatmapAtlasExperiment( df = mtab4045_cpm, save_pdf = FALSE, show_plot = TRUE,  palette = "viridis", top_n = 20, scaled = TRUE, show_heatmap_title = TRUE )
## [1] "Plotting on screen"

3.5 Creating a volcano plot using analytics data from a differential Expression Atlas experiment

You can create a volcano plot from the analytics data downloaded for a single differential Expression Atlas experiment using the volcanoDifferentialAtlasExperiment() function:

mtab10104_dea <- getAnalyticsDifferentialAtlasExpression( "E-MTAB-10104" )

head(mtab10104_dea)
##              Gene.ID Gene.Name
## 1 ENSMUSG00000000001     Gnai3
## 2 ENSMUSG00000000003      Pbsn
## 3 ENSMUSG00000000028     Cdc45
## 4 ENSMUSG00000000031       H19
## 5 ENSMUSG00000000037     Scml2
## 6 ENSMUSG00000000049      Apoh
##   'FusΔNLS/+' vs 'wild type genotype' at '0 hour'.p.value
## 1                                       0.949433734508539
## 2                                                      NA
## 3                                       0.317495716647487
## 4                                       0.356839503839382
## 5                                       0.803589327923033
## 6                                        0.38783652780468
##   'FusΔNLS/+' vs 'wild type genotype' at '0 hour'.log2foldchange
## 1                                                            0.0
## 2                                                            0.0
## 3                                                           -0.3
## 4                                                           -1.4
## 5                                                            0.1
## 6                                                           -0.6
##   'FusΔNLS/+' vs 'wild type genotype' at '12 hour'.p.value
## 1                                      0.00325936399304042
## 2                                                       NA
## 3                                       0.0221456988784827
## 4                                       0.0217169268011455
## 5                                        0.902622666742574
## 6                                       0.0720270820023634
##   'FusΔNLS/+' vs 'wild type genotype' at '12 hour'.log2foldchange
## 1                                                            -0.8
## 2                                                             0.0
## 3                                                            -0.6
## 4                                                            -2.2
## 5                                                             0.0
## 6                                                            -0.7
##   'FusΔNLS/+' vs 'wild type genotype' at '24 hour'.p.value
## 1                                        0.152487018630022
## 2                                                       NA
## 3                                        0.416782204662169
## 4                                        0.218489618613372
## 5                                        0.882676533301032
## 6                                        0.125175137122423
##   'FusΔNLS/+' vs 'wild type genotype' at '24 hour'.log2foldchange
## 1                                                            -0.5
## 2                                                             0.0
## 3                                                            -0.3
## 4                                                            -1.5
## 5                                                             0.1
## 6                                                            -0.8
##   'FusΔNLS/+' vs 'wild type genotype' at '8 hour'.p.value
## 1                                       0.125097924748945
## 2                                                      NA
## 3                                       0.047531665786275
## 4                                       0.218562620356939
## 5                                       0.906020467754675
## 6                                      0.0200987685801291
##   'FusΔNLS/+' vs 'wild type genotype' at '8 hour'.log2foldchange
## 1                                                           -0.4
## 2                                                            0.0
## 3                                                           -0.5
## 4                                                           -1.3
## 5                                                            0.0
## 6                                                           -1.0
mtab10104 <- volcanoDifferentialAtlasExperiment(
  df = mtab10104_dea,
  save_pdf = FALSE,
  show_plot = TRUE,
  low_fc_colour = "Gray",
  high_fc_colour = "Blue",
  cutoff = 1,
  show_volcanoplot_title = TRUE
)
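
The volcano plot highlights genes by fold change. If you want the corresponding genes as a table, you can filter the differential analysis data frame directly. The sketch below is a minimal base R illustration built on a small mock table: the p-values and log2 fold changes are copied from rows 3 to 6 of the '12 hour' contrast printed above, while the column names gene_id and gene_name and the thresholds (p < 0.05, absolute log2 fold change of at least 1) are assumptions made for this example, not settings taken from the package.

```r
# Mock differential-analysis table. Values are copied from rows 3-6 of the
# '12 hour' contrast shown above; gene_id / gene_name are assumed column names.
contrast <- "'Fus\u0394NLS/+' vs 'wild type genotype' at '12 hour'"
p_col  <- paste0(contrast, ".p.value")
fc_col <- paste0(contrast, ".log2foldchange")

dea <- data.frame(
  gene_id   = c("ENSMUSG00000000028", "ENSMUSG00000000031",
                "ENSMUSG00000000037", "ENSMUSG00000000049"),
  gene_name = c("Cdc45", "H19", "Scml2", "Apoh"),
  stringsAsFactors = FALSE
)
dea[[p_col]]  <- c(0.0221456988784827, 0.0217169268011455,
                   0.902622666742574, 0.0720270820023634)
dea[[fc_col]] <- c(-0.6, -2.2, 0.0, -0.7)

# Keep genes that pass both thresholds (assumed: p < 0.05 and |log2FC| >= 1).
# which() silently drops rows whose p-value is NA.
keep <- which(dea[[p_col]] < 0.05 & abs(dea[[fc_col]]) >= 1)
significant <- dea[keep, c("gene_id", "gene_name")]
print(significant)
```

On a real result, replace the mock table with the data frame passed to the plotting function and pick the contrast of interest by its column-name prefix.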

4 Searching, downloading and visualising Single-Cell Expression Atlas experiments

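4.1 Search Single-Cell Expression Atlas experiments

You can search for experiments in Single-Cell Expression Atlas using the searchSCAtlasExperiments() function. As with the bulk search, an optional secondaryFilter argument narrows the results, for example by species.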

search_1 <- searchSCAtlasExperiments( query = "endoderm" )
print(search_1)
## DataFrame with 49 rows and 4 columns
##       Accession      Species                   Type                  Title
##     <character>  <character>            <character>            <character>
## 1      E-ANND-1 Homo sapiens 10xv1, 10xv2, 10xv3,.. Human Lung Cell Atla..
## 2      E-ANND-2 Homo sapiens                  10xv2   GTEx: snRNAseq atlas
## 3      E-ANND-3 Homo sapiens             smart-seq2 Tabula Sapiens - Sma..
## 4      E-ANND-5 Homo sapiens       10xv2, 10x5prime Mapping the developi..
## 5     E-CURD-11 Homo sapiens             smart-like Single-cell RNA sequ..
## ...         ...          ...                    ...                    ...
## 45  E-MTAB-7704 Mus musculus                  10xv2 Single cell RNA-seq ..
## 46  E-MTAB-8077 Mus musculus                  10xv2 Single-Cell Transcri..
## 47  E-MTAB-8360 Mus musculus             smart-seq2 Ontogenic changes in..
## 48  E-MTAB-8495 Homo sapiens                  10xv2 Single-cell RNA sequ..
## 49  E-MTAB-9067 Homo sapiens             smart-seq2 Integrative single-c..
search_2 <- searchSCAtlasExperiments( query = "endoderm", secondaryFilter = "human" )
print(search_2)
## DataFrame with 31 rows and 4 columns
##       Accession      Species                   Type                  Title
##     <character>  <character>            <character>            <character>
## 1      E-ANND-1 Homo sapiens 10xv1, 10xv2, 10xv3,.. Human Lung Cell Atla..
## 2      E-ANND-2 Homo sapiens                  10xv2   GTEx: snRNAseq atlas
## 3      E-ANND-3 Homo sapiens             smart-seq2 Tabula Sapiens - Sma..
## 4      E-ANND-5 Homo sapiens       10xv2, 10x5prime Mapping the developi..
## 5     E-CURD-11 Homo sapiens             smart-like Single-cell RNA sequ..
## ...         ...          ...                    ...                    ...
## 27  E-MTAB-5061 Homo sapiens             smart-seq2 Single-cell RNA-seq ..
## 28  E-MTAB-6308 Homo sapiens                  10xv2 An integrated approa..
## 29  E-MTAB-7008 Homo sapiens             smart-seq2 scRNA-seq to investi..
## 30  E-MTAB-8495 Homo sapiens                  10xv2 Single-cell RNA sequ..
## 31  E-MTAB-9067 Homo sapiens             smart-seq2 Integrative single-c..
search_3 <- searchSCAtlasExperiments( query = "endoderm", secondaryFilter = "mouse" )
print(search_3)
## DataFrame with 16 rows and 4 columns
##         Accession              Species        Type                  Title
##       <character>          <character> <character>            <character>
## 1       E-CURD-52         Mus musculus    drop-seq Single-cell RNA-seq ..
## 2       E-CURD-94         Mus musculus  smart-seq2 Single-cell transcri..
## 3       E-ENAD-15         Mus musculus  smart-seq2 Single-cell RNA-seq ..
## 4   E-GEOD-121619 Arabidopsis thaliana       10xv2 Dynamics of gene exp..
## 5   E-GEOD-123046         Mus musculus       10xv2 The emergent landsca..
## ...           ...                  ...         ...                    ...
## 12    E-MTAB-6945         Mus musculus    drop-seq Single-cell analysis..
## 13    E-MTAB-7678         Mus musculus       10xv2 scRNA-seq analysis o..
## 14    E-MTAB-7704         Mus musculus       10xv2 Single cell RNA-seq ..
## 15    E-MTAB-8077         Mus musculus       10xv2 Single-Cell Transcri..
## 16    E-MTAB-8360         Mus musculus  smart-seq2 Ontogenic changes in..
search_4 <- searchSCAtlasExperiments( query = "arabidopsis" )
print(search_4)
## DataFrame with 17 rows and 4 columns
##         Accession                Species         Type                  Title
##       <character>            <character>  <character>            <character>
## 1        E-CURD-4   Arabidopsis thaliana     drop-seq High-throughput sing..
## 2        E-CURD-5   Arabidopsis thaliana     drop-seq High-throughput sing..
## 3       E-CURD-81 Oryza sativa Japonic..        10xv2 Single-cell transcri..
## 4       E-CURD-82 Oryza sativa Indica ..        10xv2 Single-cell transcri..
## 5       E-CURD-83   Arabidopsis thaliana 10xv2, 10xv3 A single-cell analys..
## ...           ...                    ...          ...                    ...
## 13  E-GEOD-157757               Zea mays        10xv3 Single cell sequenci..
## 14  E-GEOD-158761   Arabidopsis thaliana        10xv2 A single cell view o..
## 15  E-GEOD-161332   Arabidopsis thaliana        10xv2 Distinct identities ..
## 16   E-MTAB-11006   Arabidopsis thaliana        10xv3 Single-Cell Transcri..
## 17   E-MTAB-12532   Arabidopsis thaliana        10xv3 Single-cell RNA-seq ..
search_5 <- searchSCAtlasExperiments( query = "pluripotency" )
print(search_5)
## DataFrame with 2 rows and 4 columns
##      Accession      Species        Type                  Title
##    <character>  <character> <character>            <character>
## 1 E-GEOD-36552 Homo sapiens  smart-like Tracing pluripotency..
## 2 E-MTAB-10018 Homo sapiens       10xv3 Single cell RNA sequ..
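
Note that the "mouse" search above also returns an Arabidopsis thaliana experiment, so the secondary filter does not appear to act as a strict species filter. If you need exactly one species, you can subset the result on its Species column. The following is a minimal sketch using a base R data.frame that mimics three rows of the search_3 output; the same bracket subsetting should also work on the DataFrame returned by the search function, although that is not tested here.

```r
# Mock of a search result; rows are copied from the search_3 output above.
hits <- data.frame(
  Accession = c("E-CURD-52", "E-GEOD-121619", "E-MTAB-8360"),
  Species   = c("Mus musculus", "Arabidopsis thaliana", "Mus musculus"),
  Type      = c("drop-seq", "10xv2", "smart-seq2"),
  stringsAsFactors = FALSE
)

# Keep only experiments annotated with the species of interest.
mouse_hits <- hits[hits$Species == "Mus musculus", ]

# Accessions can then be passed one at a time to the download function.
mouse_accessions <- mouse_hits$Accession
print(mouse_accessions)
```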

4.2 Downloading a study into a SingleCellExperiment object

You can download data for a Single-Cell Expression Atlas experiment using the getAtlasSCExperiment() function:

enad19 <- getAtlasSCExperiment( "E-ENAD-19" )
# returns a SingleCellExperiment object

4.3 Visualise a dimensionality reduction plot of a SingleCellExperiment object

As an example, let’s use a single-cell RNA-seq study of pluripotency in human early embryos and embryonic stem cells (E-GEOD-36552).

egeod36552 <- getAtlasSCExperiment( "E-GEOD-36552" )
# returns a SingleCellExperiment object

print("Reduced dimension names:")
## [1] "Reduced dimension names:"
print(reducedDimNames(egeod36552))
##  [1] "X_pca"                            "X_tsne_perplexity_1"             
##  [3] "X_tsne_perplexity_10"             "X_tsne_perplexity_15"            
##  [5] "X_tsne_perplexity_20"             "X_tsne_perplexity_25"            
##  [7] "X_tsne_perplexity_30"             "X_tsne_perplexity_35"            
##  [9] "X_tsne_perplexity_40"             "X_tsne_perplexity_45"            
## [11] "X_tsne_perplexity_5"              "X_tsne_perplexity_50"            
## [13] "X_umap_neighbors_n_neighbors_10"  "X_umap_neighbors_n_neighbors_100"
## [15] "X_umap_neighbors_n_neighbors_15"  "X_umap_neighbors_n_neighbors_20" 
## [17] "X_umap_neighbors_n_neighbors_25"  "X_umap_neighbors_n_neighbors_3"  
## [19] "X_umap_neighbors_n_neighbors_30"  "X_umap_neighbors_n_neighbors_5"  
## [21] "X_umap_neighbors_n_neighbors_50"
print("Column data names:")
## [1] "Column data names:"
print(colnames(colData(egeod36552)))
##  [1] "cell_type"                       "developmental_stage"            
##  [3] "individual"                      "organism_part"                  
##  [5] "organism"                        "single_cell_quality"            
##  [7] "time"                            "cell_type.1"                    
##  [9] "organism_part.1"                 "single_cell_identifier"         
## [11] "time.1"                          "cell_type_ontology"             
## [13] "developmental_stage_ontology"    "individual_ontology"            
## [15] "organism_part_ontology"          "organism_ontology"              
## [17] "single_cell_quality_ontology"    "time_ontology"                  
## [19] "cell_type_ontology.1"            "organism_part_ontology.1"       
## [21] "single_cell_identifier_ontology" "time_ontology.1"                
## [23] "n_genes_by_counts"               "log1p_n_genes_by_counts"        
## [25] "total_counts"                    "log1p_total_counts"             
## [27] "total_counts_mito"               "log1p_total_counts_mito"        
## [29] "pct_counts_mito"                 "n_counts"                       
## [31] "n_genes"                         "louvain_resolution_0.1"         
## [33] "louvain_resolution_0.3"          "louvain_resolution_0.5"         
## [35] "louvain_resolution_0.7"          "louvain_resolution_1.0"         
## [37] "louvain_resolution_2.0"          "louvain_resolution_3.0"         
## [39] "louvain_resolution_4.0"          "louvain_resolution_5.0"
plotDimRedSCAtlasExperiment(egeod36552, dimRed = "X_pca", colorby = "time" )

plotDimRedSCAtlasExperiment(egeod36552, dimRed = "X_umap_neighbors_n_neighbors_20", colorby = "cell_type")

plotDimRedSCAtlasExperiment(egeod36552, dimRed = "X_umap_neighbors_n_neighbors_20", colorby = "louvain_resolution_2.0") + theme_classic() + theme(legend.position = "left")
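
The reduced dimension names are printed in alphabetical order, so "n_neighbors_100" sorts before "n_neighbors_15". When choosing a value for dimRed it can help to order the UMAP variants by their numeric parameter. The sketch below does this with base R on a subset of the names listed above; on a real object you would start from reducedDimNames(egeod36552) instead of the hard-coded vector.

```r
# A subset of the reduced dimension names printed above.
dim_names <- c(
  "X_pca",
  "X_tsne_perplexity_1",
  "X_tsne_perplexity_10",
  "X_tsne_perplexity_5",
  "X_umap_neighbors_n_neighbors_10",
  "X_umap_neighbors_n_neighbors_100",
  "X_umap_neighbors_n_neighbors_20",
  "X_umap_neighbors_n_neighbors_3"
)

# Select the UMAP embeddings and extract the trailing n_neighbors value.
umap_prefix <- "^X_umap_neighbors_n_neighbors_"
umap_names  <- grep(umap_prefix, dim_names, value = TRUE)
n_neighbors <- as.integer(sub(umap_prefix, "", umap_names))

# Order the names numerically rather than alphabetically.
umap_by_n <- umap_names[order(n_neighbors)]
print(umap_by_n)
```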

4.4 Plot a gene expression heatmap from a SingleCellExperiment object

To plot a heatmap of Expression Atlas marker genes, either for the default clustering (sel.K = NULL) or for a selected number of clusters (for example sel.K = 6), use the heatmapSCAtlasExperiment() function:

egeod36552 <- getAtlasSCExperiment( "E-GEOD-36552" )

heatmapSCAtlasExperiment( egeod36552, genes=NULL, sel.K=NULL, scaleNormExp=TRUE, show_row_names=FALSE ) 

heatmapSCAtlasExperiment( egeod36552, genes=NULL, sel.K=6, scaleNormExp=TRUE, show_row_names=FALSE ) 

For user-selected genes:

egeod36552 <- getAtlasSCExperiment( "E-GEOD-36552" )

heatmapSCAtlasExperiment( egeod36552, genes=c('ENSG00000151611','ENSG00000020577', 'ENSG00000188869' ), sel.K=NULL, scaleNormExp=FALSE, show_row_names=TRUE ) 

# heatmapSCAtlasExperiment( egeod36552, genes=c('ENSG00000151611','ENSG00000020577', 'ENSG00000188869' ), sel.K=NULL, scaleNormExp=TRUE, show_row_names=TRUE ) 
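
The calls above differ in the scaleNormExp argument. A common way to scale normalised expression for a heatmap is per-gene z-scoring, which puts genes with very different expression levels on a comparable colour scale. Whether scaleNormExp = TRUE does exactly this inside the package is an assumption; the sketch below only illustrates the general technique on a toy matrix with hypothetical gene and cell names.

```r
# Toy expression matrix: genes in rows, cells in columns (hypothetical values).
expr <- matrix(
  c( 1,  2,  3,
    10, 20, 30),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("geneA", "geneB"), c("cell1", "cell2", "cell3"))
)

# scale() works column-wise, so transpose to scale each gene (row),
# then transpose back to the genes-by-cells layout.
z <- t(scale(t(expr)))
print(round(z, 2))
```

After scaling, both genes span the same range even though their raw values differ tenfold, which is why scaled and unscaled heatmaps of the same genes can look quite different.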

4.5 Dot-plot for a SingleCellExperiment object

For example, if we choose one marker gene from each of the clusters (k = 4) in the previous example, we can use the dotPlotSCAtlasExperiment() function to plot the average expression of these genes across the clusters:

egeod36552 <- getAtlasSCExperiment( "E-GEOD-36552" )

# dotPlotSCAtlasExperiment(egeod36552, genes=c('ENSG00000166681','ENSG00000178928', 'ENSG00000142182' , 'ENSG00000160282' ), sel.K=4)

dotPlotSCAtlasExperiment(egeod36552, genes=c('ENSG00000166681','ENSG00000178928', 'ENSG00000142182' , 'ENSG00000160282' ), sel.K=4, scaleNormExp=TRUE) + theme_classic()
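
To make the idea of "average expression across clusters" concrete, the following base R sketch computes per-cluster means for a toy matrix. The gene names, cell names, values and cluster labels are all hypothetical, and the code is not the package's internal implementation; it only shows the kind of summary a dot plot displays.

```r
# Toy expression matrix: genes in rows, cells in columns (hypothetical values).
expr <- matrix(
  c(1, 3, 5, 7,
    2, 2, 8, 8),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("geneA", "geneB"), paste0("cell", 1:4))
)

# One cluster label per cell, in column order (hypothetical labels).
clusters <- c("0", "0", "1", "1")

# Group the column indices by cluster, then average each gene within a group.
cluster_means <- sapply(
  split(seq_len(ncol(expr)), clusters),
  function(idx) rowMeans(expr[, idx, drop = FALSE])
)
print(cluster_means)
```

The result has one row per gene and one column per cluster. With a real object, the cluster labels would come from one of the clustering columns in colData, such as the louvain_resolution columns listed earlier.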

5 sessionInfo

sessionInfo()
## R Under development (unstable) (2025-03-13 r87965)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.2 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] grid      stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] ggplot2_3.5.1               SingleCellExperiment_1.29.2
##  [3] viridis_0.6.5               viridisLite_0.4.2          
##  [5] dplyr_1.1.4                 ComplexHeatmap_2.23.1      
##  [7] SummarizedExperiment_1.37.0 GenomicRanges_1.59.1       
##  [9] GenomeInfoDb_1.43.4         IRanges_2.41.3             
## [11] S4Vectors_0.45.4            MatrixGenerics_1.19.1      
## [13] matrixStats_1.5.0           ExpressionAtlas_1.99.4     
## [15] Biobase_2.67.0              BiocGenerics_0.53.6        
## [17] generics_0.1.3             
## 
## loaded via a namespace (and not attached):
##   [1] DBI_1.2.3               bitops_1.0-9            gridExtra_2.3          
##   [4] rlang_1.1.5             magrittr_2.0.3          clue_0.3-66            
##   [7] GetoptLong_1.0.5        compiler_4.6.0          RSQLite_2.3.9          
##  [10] dir.expiry_1.15.0       reshape2_1.4.4          png_0.1-8              
##  [13] vctrs_0.6.5             stringr_1.5.1           pkgconfig_2.0.3        
##  [16] shape_1.4.6.1           crayon_1.5.3            fastmap_1.2.0          
##  [19] magick_2.8.6            XVector_0.47.2          labeling_0.4.3         
##  [22] caTools_1.18.3          rmarkdown_2.29          UCSC.utils_1.3.1       
##  [25] purrr_1.0.4             bit_4.6.0               xfun_0.52              
##  [28] cachem_1.1.0            jsonlite_2.0.0          blob_1.2.4             
##  [31] rhdf5filters_1.19.2     DelayedArray_0.33.6     Rhdf5lib_1.29.2        
##  [34] cluster_2.1.8.1         parallel_4.6.0          R6_2.6.1               
##  [37] stringi_1.8.7           RColorBrewer_1.1-3      bslib_0.9.0            
##  [40] limma_3.63.12           reticulate_1.42.0       genefilter_1.89.0      
##  [43] jquerylib_0.1.4         Rcpp_1.0.14             iterators_1.0.14       
##  [46] knitr_1.50              Matrix_1.7-3            splines_4.6.0          
##  [49] tidyselect_1.2.1        abind_1.4-8             yaml_2.3.10            
##  [52] doParallel_1.0.17       gplots_3.2.0            codetools_0.2-20       
##  [55] curl_6.2.2              zellkonverter_1.17.3    plyr_1.8.9             
##  [58] lattice_0.22-7          tibble_3.2.1            withr_3.0.2            
##  [61] basilisk.utils_1.19.1   KEGGREST_1.47.1         evaluate_1.0.3         
##  [64] survival_3.8-3          xml2_1.3.8              circlize_0.4.16        
##  [67] Biostrings_2.75.4       pillar_1.10.2           BiocManager_1.30.25    
##  [70] filelock_1.0.3          KernSmooth_2.23-26      foreach_1.5.2          
##  [73] munsell_0.5.1           scales_1.3.0            BiocStyle_2.35.0       
##  [76] gtools_3.9.5            xtable_1.8-4            glue_1.8.0             
##  [79] tools_4.6.0             annotate_1.85.0         locfit_1.5-9.12        
##  [82] XML_3.99-0.18           Cairo_1.6-2             rhdf5_2.51.2           
##  [85] tidyr_1.3.1             AnnotationDbi_1.69.1    edgeR_4.5.10           
##  [88] colorspace_2.1-1        GenomeInfoDbData_1.2.14 basilisk_1.19.3        
##  [91] HDF5Array_1.35.16       cli_3.6.4               S4Arrays_1.7.3         
##  [94] gtable_0.3.6            sass_0.4.9              digest_0.6.37          
##  [97] SparseArray_1.7.7       farver_2.1.2            rjson_0.2.23           
## [100] memoise_2.0.1           htmltools_0.5.8.1       lifecycle_1.0.4        
## [103] h5mread_0.99.4          httr_1.4.7              GlobalOptions_0.1.2    
## [106] statmod_1.5.0           bit64_4.6.0-1

References

George, Nancy, Silvie Fexova, Alfonso Munoz Fuentes, Pedro Madrigal, Yalan Bi, Haider Iqbal, Upendra Kumbham, et al. 2023. “Expression Atlas Update: Insights from Sequencing Data at Both Bulk and Single Cell Level.” Nucleic Acids Research 52 (D1): D107–D114. https://doi.org/10.1093/nar/gkad1021.