%\VignetteIndexEntry{1. Primer}
%
% NOTE -- ONLY EDIT THE .Rnw FILE!!!  The .tex file is
% likely to be overwritten.
%
\documentclass[11pt]{article}
\usepackage{amsmath,pstricks}
% With MikTeX 2.9, using pstricks also requires using auto-pst-pdf or running
% pdflatex will fail. Note that using auto-pst-pdf requires to set environment
% variable MIKTEX_ENABLEWRITE18=t on Windows, and to set shell_escape = t in
% texmf.cnf for Tex Live on Unix/Linux/Mac.
\usepackage{auto-pst-pdf}
\usepackage{amssymb}
\usepackage{epsfig}
\usepackage{lscape}
\usepackage{graphics}
\usepackage{Sweave}
\textwidth=5.4in 
\textheight=8.8in
%\parskip=.3cm
\oddsidemargin=.1in 
\evensidemargin=.1in 
\headheight=-.5in
\setcounter{tocdepth}{2} 
\setcounter{secnumdepth}{3}
\begin{document}
\title{ArrayTools: Array Quality Assessment and Analysis Tool}
\author{Xiwei Wu and Xuejun Arthur Li}
\maketitle



%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Introduction}
The Affymetrix GeneChip is a commonly used tool to study gene expression profiles. 
The newly introduced Gene 1.0-ST arrays measure transcript expressions more accurately 
than the regular 3' -arrays. However, it lacks a tool to provide quality assessment 
and analysis for this type of array. This package is designed to provide solutions 
for quality assessment and to detect differentially expressed genes for the Affymetrix 
GeneChips, including both 3' -arrays and gene 1.0-ST arrays. The package provides 
functions that are easy to follow by biologists who have limited statistical backgrounds. 
The package generates comprehensive analysis reports in HTML format. Hyperlinks on 
the report page will lead to a series of QC plots, processed data, and differentially 
expressed gene lists. Differentially expressed genes are reported in tabular format with 
annotations hyperlinked to online biological databases. This guide will use an example 
dataset to demonstrate how to perform analysis of experiments with commonly used designs 
by using this package.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Data}
We will use \verb"exprsExample", a simulated gene expression data, and 
its corresponding phenotype data file, \verb"pDataExample" to illustrate
some examples.  Usually the expression data is generated from the Affymetrix
Expression Console and pheno data file is created by the user.
<<>>=
library(ArrayTools)
data(exprsExample)
head(exprsExample)
dim(exprsExample)
data(pDataExample)
pDataExample
@

The expression data that created from Affymetrix Expression Console has
an \verb"probeset_id" column.  To create an ExpressionSet, we need to use
\verb"probeset_id" as the rownames of expression data.  Then we can create 
an \verb"ExpressionSet" from the expression data and
the phenotype data.

<<>>=
rownames(exprsExample) <- exprsExample$probeset_id
eSet <- createExpressionSet (pData=pDataExample,  exprs = exprsExample, annotation = "hugene10sttranscriptcluster")
dim(eSet)
@

The \verb"annotation" argument is important for the analysis for the Gene 1.0-ST arrays.  
Since we only support two types of Gene 1.0-ST arrays, please use 
either \verb"hugene10sttranscriptcluster" or \verb"mogene10sttranscriptcluster" as the value for the 
\verb"annotation" argument.

\section{Data Preprocessing}

For Gene 1.0-ST arrays, data preprocessing include removing the \verb"control" genes 
(default value is \verb"rmControl =TRUE") and takes the log2 of the expression value.  
Before taking log2, we added 1 to the expression value (the default value is 
\verb"offset = 1") because the expression value may have 0 value.  If you want to 
output the preprocessed data to your local directory, you can use the \verb"output = TRUE"
option.  Also, notice that the first argument is an \verb"ExpressionSet".  

<<>>=
normal <- preProcessGeneST (eSet, output=TRUE)
normal
dim(normal)
@
For 3' -arrays, data processing is done by using the \verb"preProcess3prime" function,
which is a wrapper function to perform normalization for the array.  
Instead of using the \verb"ExpressionSet" as its argument, the \verb"preProcess3prime"
function requires an \verb"AffyBatch" object.  We can either choose \verb"rma"
or \verb"gcrma" as the \verb"method\verb" argument.

\section{Quality Assessment}
To generate the Quality Assessment Report, we need to use the Affymetrix Expression 
Console to generate a quality metric file. The sample quality metric file is 
similar to the \verb"QC" file that can be obtained by using the \verb"data" function.  
For Gene 1.0-ST arrays, we can use the \verb"qaGeneST" function to create an HTML report.  
This report contains a series of plots, including Intensity Distribution , 
Mean Signal, BAC Spike, polya Spike, Pos Vs Neg Auc, Mad Residual Signal, RLE MEAN, and 
Hierarchical Clustering of Samples plots.  

<<>>=
data(QC)
qaGeneST(normal, c("Treatment", "Group"), QC)
@

For 3' -arrays, the \verb"qa3prime" function is used to create a 
QC report.  Instead of using \verb"ExpressionSet"
as its argument, an \verb"AffyBatch" object is required.  Furthermore, 
the QC file is not required for the 3' -Array.

\section{Filtering}

Before running analysis on the arrays, filtering out the uninformative genes may be 
an important step for your analysis.  Three types of filtering methods are used in 
the \verb"geneFilter"  function. Suppose that if we want to remove genes with 
their inter-quartile range across the arrays with less than 10% and we 
would also like to keep genes with at least 2 arrays with backgrounds greater than 
4 at the same time, we can do the following:

<<>>=
filtered <- geneFilter(normal, numChip = 2, bg = 4, iqrPct=0.1,  output=TRUE)
@

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Analysis}
The analysis takes a series of steps that includes creating a design and a contrast matrix,
running regression, selecting significant genes, and creating an HTML report.

\subsection{Design Matrix}
A design matrix determines what type of model you are running. The design matrix is 
defined as a \verb"designMatrix" class which can be created by the \verb"new" function.  
To create a model with only one factor, which is equivalent to a one-way ANOVA model, 
we can do the following:

<<>>=
design1<- new("designMatrix", target=pData(filtered), covariates = "Treatment")
design1
@

To create a model with two factors, we can do the following:

<<>>=
design2<- new("designMatrix", target=pData(filtered), covariates = c("Treatment", "Group"))
design2
@

\subsection{Contrast Matrix}

For the one-way ANOVA model based on the \verb"design1", if we want to compare 
\verb"Treated" vs. \verb"Control", we can do the following:

<<>>=
contrast1 <- new("contrastMatrix", design.matrix = design1,  compare1 = "Treated", compare2 = "Control")
contrast1
@

To perform the same comparison and to controll the \verb"Group" effect (Randomized 
Block Design), we can write:

<<>>=
contrast2 <- new("contrastMatrix", design.matrix = design2, compare1 = "Treated", compare2 = "Control")
contrast2
@

\subsection{Regression}

To run the gene-wise regression, we can use the \verb"regress" function. This function 
will create a \verb"regressResult" object.  

<<>>=
result1 <- regress(filtered, contrast1)
result2 <- regress(filtered, contrast2)
@

\subsection{Select Significant Genes}

We can select differentially expressed genes by using the \verb"selectSigGene" function.  
To select differentially expressed genes based on p-values less than 0.05 and 
fold change greater than log2 of 1.5, we can write the following codes:

<<>>=
sigResult1 <- selectSigGene(result1, p.value=0.05, fc.value=log2(1.5))
sigResult2 <- selectSigGene(result2, p.value=0.05, fc.value=log2(1.5))
@

We can use the \verb"Sort" function to sort the \verb"regressResult" object by 
p-value in ascending order (\verb"sorted.by = 'pValue'") or log2 Ratio in descending 
order (\verb"sorted.by = 'log2Ratio'") or F statistics in descending order 
(\verb"sorted.by = 'F'").  

<<>>=
Sort(sigResult1, sorted.by = 'pValue')
Sort(sigResult2, sorted.by = 'F')
@

\subsection{Creating Reports}

To output the differentially expressed genes along with annotations to an HTML
file in your current working directory, we can use the \verb"Output2HTML" function.

<<>>=
Output2HTML(sigResult1)
Output2HTML(sigResult2)
@

\section{Detecting Interaction}

Interaction is a statistical term referring to a situation when the 
relationship between the outcome and the variable of the main interest 
differs at different levels of the extraneous variable.

Just like before, we need to create the design and contrast matrices to 
detect the interaction effect. 

<<>>=
designInt <- new("designMatrix", target=pData(filtered), covariates = c("Treatment", "Group"), intIndex=c(1,2))
designInt
contrastInt <- new("contrastMatrix", design.matrix = designInt, interaction = TRUE)
contrastInt
@

To identify genes with an interaction effect, we can use the same \verb"regress" 
and \verb"selectSigGene" functions:

<<>>=
resultInt <- regress(filtered, contrastInt)
sigResultInt <-selectSigGene(resultInt, p.value=0.05, fc.value=log2(1.5))
@

For genes with the interaction effect, they should be analyzed separately 
within each group.  For genes without any interaction gene, 
they should be analyzed together.  This step can be achieved by using the 
\verb"postInteraction" function.  The \verb"postInteraction" function returns an 
object of \verb"interactionResult" class.  The components of the \verb"interactionResult"
object consist of a list of \verb"regressResult" objects.  The first component is 
a \verb"regressResult" object for all the genes.  The second component contains 
the result for genes without interaction.  The third and the fourth components 
(since \verb"Group" only contains two factors, \verb"A" and \verb"B") contain 
results for genes with interaction only among groups \verb"A" and \verb"B", respectively.  
Then we can use the \verb"selectSigGeneInt" function again to select differently 
expressed genes within each component of the \verb"interactionResult" object.

<<>>=
intResult <- postInteraction(filtered, sigResultInt, mainVar ="Treatment",  compare1 = "Treated",  compare2 = "Control")
sigResultInt <- selectSigGeneInt(intResult, pGroup = 0.05, pMain = 0.05)
@

We can use the \verb"Output2HTML" function again to output the 
differentially expressed genes along with annotations to an HTML file in 
your current working directory.

<<>>=
Output2HTML(sigResultInt)
@

\section{Creating Index File}
We have created multiple outputs, including normalized data, filtered data, and 
differently expressed genes for multiple models.  We can create an 
index file that can link all of these results.

<<>>=
createIndex (sigResult1, sigResult2, intResult)
@

\end{document}