%
% NOTE -- ONLY EDIT THE .Rnw FILE!!!  The .tex file is
% likely to be overwritten.
%
% \VignetteIndexEntry{EMBO03 Lab 4}
%\VignetteDepends{affy}
%\VignetteKeywords{Microarray, Pre-processing}
\documentclass[12pt]{article}
\usepackage{amsmath,pstricks} \usepackage[authoryear,round]{natbib} \usepackage{hyperref}
\textwidth=6.2in
\textheight=8.5in
%\parskip=.3cm
\oddsidemargin=.1in
\evensidemargin=.1in
\headheight=-.3in
\newcommand{\scscst}{\scriptscriptstyle} \newcommand{\scst}{\scriptstyle}
\bibliographystyle{plainnat}
\title{EMBO03 Lab 4: Introduction to the Bioconductor {\tt affy} Package}
\author{Sandrine Dudoit, Robert Gentleman, and Rafael Irizarry}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \begin{document}
\maketitle
In this lab, we demonstrate the main functions in the \verb+affy+
package for pre-processing Affymetrix microarray data.
For a more detailed introduction, consult the package vignettes, which can be listed
with the command {\tt openVignette("affy")}. A demo can also be run
with {\tt demo(affy)}. A number of sample datasets are available in the
package; to list these, type {\tt data(package="affy")}. To load the
package:
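
As a minimal sketch, the call below attaches the package. It assumes that \verb+affy+ has already been installed from Bioconductor.

<<>>=
## Assumption: affy is already installed, e.g. via BiocManager::install("affy").
library(affy)
@
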
The function \verb+ReadAffy+ is available for reading CEL
files. However, in this lab we will work mainly with the
\verb+Dilution+ dataset, which is included in the package. For a description of \verb+Dilution+, type {\tt ? Dilution}.
To load this dataset:
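
The following sketch loads the data. It rests on one assumption that is not part of this lab: in current Bioconductor releases the \verb+Dilution+ data are distributed in the separate \verb+affydata+ package, so that package is attached first. With an older \verb+affy+ release that still bundles the data, the \verb+library(affydata)+ line may be unnecessary.

<<>>=
library(affy)
## Assumption: Dilution is supplied by the affydata data package.
library(affydata)
data(Dilution)
@
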
%%%%%%%%%%%%%%%%%%%%%%%%% %%% affy classes
One of the main classes in \verb+affy+ is the \verb+AffyBatch+
class. For details on this class consult the help file, {\tt ?
AffyBatch}; methods for manipulating instances of this class are also
described in the help file. Other classes include \verb+ProbeSet+ (PM and MM
intensities for individual probe sets), \verb+Cdf+ (information
contained in a CDF file), and
\verb+Cel+ (single-array CEL intensity data).
The object \verb+Dilution+ is an instance of the class
\verb+AffyBatch+. Try the following commands to obtain information on
this object:
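
One possible set of commands is sketched below. It assumes that \verb+Dilution+ comes from the \verb+affydata+ package and that the matching CDF package (\verb+hgu95av2cdf+) is installed, since printing the object looks up the probe set annotation.

<<>>=
library(affy)
library(affydata)     # assumption: source of the Dilution data
data(Dilution)
class(Dilution)       # class of the object
slotNames(Dilution)   # slots defined for the class
Dilution              # short summary: array size, chip type, number of samples
annotation(Dilution)  # name of the chip type / CDF
@
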
For a description of the target samples hybridized to the arrays:
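
A minimal sketch, again assuming that \verb+Dilution+ is loaded from \verb+affydata+: \verb+phenoData+ returns the sample annotation object, and \verb+pData+ extracts its data frame with one row per array.

<<>>=
library(affy)
library(affydata)     # assumption: source of the Dilution data
data(Dilution)
phenoData(Dilution)   # sample annotation object
pData(Dilution)       # data frame of sample covariates, one row per array
@
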
The \verb+exprs+ slot contains a matrix with columns corresponding to
arrays and rows to individual probes on the array.
To obtain the matrix of intensities for all four arrays:
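
For illustration, the matrix can be extracted as follows. The variable name \verb+e+ is arbitrary, and the data are assumed to come from \verb+affydata+.

<<>>=
library(affy)
library(affydata)     # assumption: source of the Dilution data
data(Dilution)
e <- exprs(Dilution)  # one row per probe (cel), one column per array
dim(e)
e[1:5, ]              # first few probes on each array
@
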
You can access probe-level PM and MM intensities using:
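
A sketch using the \verb+pm+ and \verb+mm+ accessors. It assumes that the \verb+hgu95av2cdf+ package is installed, because the accessors need the CDF information to locate the probes. The probe set \verb+1000_at+ is used purely as an example.

<<>>=
library(affy)
library(affydata)           # assumption: source of the Dilution data
data(Dilution)
PM <- pm(Dilution)          # PM intensities, all probe sets
MM <- mm(Dilution)          # MM intensities, all probe sets
dim(PM)
dim(MM)
pm(Dilution, "1000_at")     # PM intensities for a single probe set
@
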
To get the probe set names (Affy IDs):
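
A minimal sketch with the \verb+geneNames+ accessor. The variable name \verb+gnames+ is arbitrary, and the CDF package for the chip is assumed to be installed.

<<>>=
library(affy)
library(affydata)               # assumption: source of the Dilution data
data(Dilution)
gnames <- geneNames(Dilution)   # character vector of probe set IDs
length(gnames)
gnames[1:5]
@
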
As with other microarray objects in Bioconductor packages, you can
use subsetting commands on {\tt AffyBatch} objects:
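
The sketch below selects arrays by column index. This is the form used by current \verb+affy+ releases; older releases also accepted a single index such as \verb+Dilution[1]+, so treat the exact syntax as dependent on the installed version.

<<>>=
library(affy)
library(affydata)           # assumption: source of the Dilution data
data(Dilution)
dil12 <- Dilution[, 1:2]    # keep the first two arrays
sampleNames(dil12)
dil12
@
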
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Reading in data
One of the main functions for reading in Affymetrix data is \verb+ReadAffy+. It reads in data from \verb+CEL+ and \verb+CDF+ files and creates objects of class \verb+AffyBatch+. Calling \verb+ReadAffy(widget=TRUE)+ opens widgets for interactive data input.
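
As a hedged illustration, the chunk below reads every CEL file found in one directory. The directory name \verb+celfiles+ and the variable names are hypothetical, so the chunk is not evaluated; substitute the location of your own files.

<<eval=FALSE>>=
library(affy)
## Assumption: "celfiles" is a hypothetical directory containing CEL files.
cel.dir <- "celfiles"
list.celfiles(cel.dir)                        # CEL files found in that directory
myBatch <- ReadAffy(celfile.path = cel.dir)   # read them into an AffyBatch
myBatch
@
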
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Diagnostic plots
To produce spatial images of the probe log intensities and of the probe raw intensities:
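
A sketch for the first array, assuming the \verb+image+ method for \verb+AffyBatch+ objects applies the function passed as \verb+transfo+ (the logarithm by default) before plotting. Passing \verb+identity+ is one way to display the raw values.

<<>>=
library(affy)
library(affydata)                          # assumption: source of the Dilution data
data(Dilution)
image(Dilution[, 1])                       # log intensities (default transformation)
image(Dilution[, 1], transfo = identity)   # raw intensities
@
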
To produce boxplots of probe log intensities:
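
In the sketch below the colors are an illustrative choice: arrays are colored by mRNA concentration, assuming the arrays are ordered as in \verb+pData(Dilution)+ with the first two arrays sharing one concentration and the last two sharing the other.

<<fig=TRUE>>=
library(affy)
library(affydata)                        # assumption: source of the Dilution data
data(Dilution)
## One color per concentration group (assumed array order: 20, 20, 10, 10).
boxplot(Dilution, col = c(2, 2, 3, 3))
@
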
To produce density plots of probe log intensities:
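
A sketch using the \verb+hist+ method for \verb+AffyBatch+ objects, which draws one smoothed density curve per array. The color and line type choices are illustrative: color marks the concentration group and line type marks the scanner, under the same assumed array order as in \verb+pData(Dilution)+.

<<fig=TRUE>>=
library(affy)
library(affydata)     # assumption: source of the Dilution data
data(Dilution)
hist(Dilution, col = c(2, 2, 3, 3), lty = rep(1:2, 2), lwd = 3)
@
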
The boxplots and density plots show that the Dilution data needs
normalization. As described in the dataset help file and in the
\texttt{phenoData} slot (\texttt{pData(Dilution)}), two
concentrations of mRNA were used and, for each concentration, two
scanners were used. From the plots, we note that scanner effects seem
stronger than concentration effects (different colors). Arrays that
should be the same are different; arrays that
should be different are similar. Because different mRNA concentrations were
used, we perform probe-level normalization within concentration groups.
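
One way to do this is sketched below. It assumes that arrays 1 and 2 form one concentration group and arrays 3 and 4 the other (check \verb+pData(Dilution)+), and it relies on the default method of \verb+normalize+, which is quantile normalization in current \verb+affy+ releases. The names \verb+Dil20+ and \verb+Dil10+ are hypothetical. The boxplots are redrawn for each group after normalization.

<<fig=TRUE>>=
library(affy)
library(affydata)                      # assumption: source of the Dilution data
data(Dilution)
Dil20 <- normalize(Dilution[, 1:2])    # first concentration group
Dil10 <- normalize(Dilution[, 3:4])    # second concentration group
par(mfrow = c(1, 2))
boxplot(Dil20, col = 2)
boxplot(Dil10, col = 3)
@
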
Notice how the boxplots look better after normalization.
The \verb+affy+ package provides implementations of a number of methods for background correction, probe-level normalization (e.g., quantile, curve-fitting (Bolstad et al., 2002)), and computation of expression measures (e.g., MAS 4.0, MAS 5.0, MBEI (Li \& Wong, 2001), RMA (Irizarry et al., 2003)). To list available methods for \verb+AffyBatch+ objects:
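
The sketch below lists the methods available at each pre-processing step. It assumes a current \verb+affy+ release, in which \verb+bgcorrect.methods+, \verb+pmcorrect.methods+, and \verb+express.summary.stat.methods+ are functions; in older releases they are plain character vectors and are printed without the parentheses.

<<>>=
library(affy)
library(affydata)                 # assumption: source of the Dilution data
data(Dilution)
normalize.methods(Dilution)       # probe-level normalization methods
bgcorrect.methods()               # background correction methods
pmcorrect.methods()               # PM correction methods
express.summary.stat.methods()    # summarization methods
@
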
The main pre-processing function is \verb+expresso+. You can select
pre-processing methods interactively using widgets by typing {\tt
expresso(Dilution, widget=TRUE)}. The function operates on objects of class
\verb+AffyBatch+ and returns objects of class \verb+exprSet+.
The function \verb+rma+ provides a more efficient implementation of the Robust Multi-array
Average (RMA). Here we skip the normalization step of RMA because the arrays were already normalized above.
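
A hedged sketch of both routes follows. The first part normalizes one concentration group (assumed to be arrays 1 and 2) and then calls \verb+rma+ with \verb+normalize=FALSE+. The second part spells out an RMA-like pipeline through \verb+expresso+ on all four arrays. The object names are hypothetical, the particular method choices are illustrative, and the \verb+expresso+ call can take noticeably longer to run.

<<>>=
library(affy)
library(affydata)                       # assumption: source of the Dilution data
data(Dilution)
## RMA on an already normalized group of arrays.
Dil20 <- normalize(Dilution[, 1:2])
eset.rma <- rma(Dil20, normalize = FALSE)
dim(exprs(eset.rma))
## The same kind of pipeline with every step named explicitly.
eset.exp <- expresso(Dilution,
                     bgcorrect.method = "rma",
                     normalize.method = "quantiles",
                     pmcorrect.method = "pmonly",
                     summary.method = "medianpolish")
dim(exprs(eset.exp))
@
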
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % CDF data packages
Data packages for CDF information can be downloaded from
\url{www.bioconductor.org}. These packages contain environment objects
which provide mappings between AffyIDs and matrices of probe
locations, with rows corresponding to probe-pairs and columns to PM and
MM cels. CDF environments for HGU95Av2 and HGU133A chips are
already in the package. For information on the environment object, type {\tt ? hgu95av2cdf}.
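
The sketch below inspects the environment directly. It assumes that the CDF environment is provided by the separate \verb+hgu95av2cdf+ package, as is the case in current Bioconductor releases. The probe set \verb+1000_at+ is only an example.

<<>>=
## Assumption: the CDF environment comes from the hgu95av2cdf package.
library(hgu95av2cdf)
ids <- ls(hgu95av2cdf)                  # probe set IDs stored in the environment
length(ids)
ids[1:5]
get("1000_at", envir = hgu95av2cdf)     # rows: probe pairs; columns: PM and MM locations
@
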
You can also use the \verb+indexProbes+, \verb+pmindex+, and
\verb+mmindex+ functions to get information on probe location:
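
A minimal sketch for a single probe set, chosen only as an example. Each function returns a list with one vector of row indices per probe set, and those indices can be used to pull the matching rows out of the intensity matrix. The variable names are arbitrary.

<<>>=
library(affy)
library(affydata)                       # assumption: source of the Dilution data
data(Dilution)
ip  <- indexProbes(Dilution, which = "pm", genenames = "1000_at")
pmi <- pmindex(Dilution, "1000_at")     # locations of the PM probes
mmi <- mmindex(Dilution, "1000_at")     # locations of the MM probes
pmi
mmi
intensity(Dilution)[pmi[[1]], ]         # PM intensities recovered via the indices
@
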
Having access to PM and MM data can be useful. Let's look at a plot of PM vs. MM:
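
One possible version of this plot is sketched below for the first array, on the log2 scale, with the identity line added for reference. The choice of array, scale, and plotting symbol is illustrative.

<<fig=TRUE>>=
library(affy)
library(affydata)                       # assumption: source of the Dilution data
data(Dilution)
PM1 <- log2(pm(Dilution[, 1]))          # log2 PM intensities, first array
MM1 <- log2(mm(Dilution[, 1]))          # log2 MM intensities, first array
plot(MM1, PM1, pch = ".", xlab = "log2(MM)", ylab = "log2(PM)")
abline(0, 1, col = 2)                   # points above the line have PM > MM
@
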
\end{document}