Scope and opportunity
Case study 1: TCGA Copy Number / Expression
Accessing 'Consortium' data
Representing integrated data sets
Key components
Samples
- Including 'phenotypic' attributes measured on each sample
Assays
- Often, a rectangular features x samples matrix() of values
- A different representation: data.frame() of feature, sample, value tuples.
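
The two assay representations above hold the same information and can be converted into one another. A minimal sketch in base R, assuming a small made-up assay (the feature names, sample names, and values are hypothetical and used only for illustration):

```r
# Hypothetical toy assay: 3 features measured on 2 samples.
m <- matrix(
    c(1.2, 0.4, 3.1, 2.2, 0.9, 2.7),
    nrow = 3,
    dimnames = list(
        feature = c("geneA", "geneB", "geneC"),
        sample = c("s1", "s2")
    )
)

# Rectangular to 'long': one row per (feature, sample, value) tuple.
# as.vector() walks the matrix column by column, matching row() / col().
long <- data.frame(
    feature = rownames(m)[as.vector(row(m))],
    sample = colnames(m)[as.vector(col(m))],
    value = as.vector(m),
    stringsAsFactors = FALSE
)

# 'Long' back to rectangular: fill an empty matrix by (feature, sample) name.
wide <- matrix(
    NA_real_,
    nrow = nrow(m), ncol = ncol(m),
    dimnames = dimnames(m)
)
wide[cbind(long$feature, long$sample)] <- long$value
```

The rectangular form is compact and suits matrix arithmetic. The tuple form repeats the feature and sample labels on every row, but it needs no NA padding when features or samples are missing.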
Experiments
- Sample inputs to each experiment
- Assay outputs from each experiment
- Additional experiment-specific information, e.g., date on which
samples were processed
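
One way to picture how these components fit together is a plain list of experiments sitting alongside a table of samples. This is only an illustrative sketch in base R, not the design proposed later in this outline; every name, value, and date below is hypothetical:

```r
# Hypothetical sample-level ('phenotypic') attributes, one row per sample.
samples <- data.frame(
    sex = c("F", "M", "F"),
    age = c(54, 61, 47),
    row.names = c("s1", "s2", "s3")
)

# Two hypothetical assays; different experiments may cover different samples.
expression <- matrix(
    c(5.1, 2.3, 7.8, 4.9, 2.0, 8.1),
    nrow = 3,
    dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2"))
)
copy_number <- matrix(
    c(2, 3, 1, 2),
    nrow = 2,
    dimnames = list(c("region1", "region2"), c("s2", "s3"))
)

# Each experiment records its sample inputs, its assay output, and
# experiment-specific information (here, a made-up processing date).
experiments <- list(
    expression = list(
        samples = colnames(expression),
        assay = expression,
        processed = as.Date("2014-01-15")
    ),
    copy_number = list(
        samples = colnames(copy_number),
        assay = copy_number,
        processed = as.Date("2014-02-03")
    )
)

# Samples measured in every experiment.
common <- Reduce(intersect, lapply(experiments, `[[`, "samples"))

# Restrict each assay, and the sample table, to the shared samples.
integrated <- lapply(experiments, function(e) e$assay[, common, drop = FALSE])
shared_samples <- samples[common, , drop = FALSE]
```

Keeping sample identifiers as the column names of each assay and as the row names of the sample table lets the pieces be subset in step.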
Alternative Representations
R-based representations
Other approaches
- Database
- 'Scientific' representations, e.g., HDF5
- 'Computational' representations, e.g., HDFS
A preliminary design
Statistical and other challenges
Practical