%\VignetteIndexEntry{samroc - example}
%\VignetteDepends{SAGx,multtest, hu6800.db}
%\VignetteKeywords{Expression Analysis}
%\VignettePackage{SAGx}

\documentclass[11pt]{article}
\usepackage{geometry}\usepackage{color}

\definecolor{darkblue}{rgb}{0.0,0.0,0.75}
\usepackage[%
baseurl={http://www.bioconductor.org},%
pdftitle={Samroc example},%
pdfauthor={Per Broberg},%
pdfsubject={SAGx},%
pdfkeywords={Bioconductor},%
pagebackref,bookmarks,colorlinks,linkcolor=darkblue,citecolor=darkblue,%
pagecolor=darkblue,raiselinks,plainpages,pdftex]{hyperref}


%------------------------------------------------------------
% newcommand
%------------------------------------------------------------
\newcommand{\arsinh}{\mathop{\mathgroup\symoperators arsinh}\nolimits}
\newcommand{\Robject}[1]{\texttt{#1}}
\newcommand{\Rpackage}[1]{\textit{#1}}
\newcommand{\Rclass}[1]{\textit{#1}}
\newcommand{\Rfunction}[1]{{\small\texttt{#1}}}

\newcommand{\myincfig}[3]{%
  \begin{figure}[htbp]
    \begin{center}
      \includegraphics[width=#2]{#1}
      \caption{\label{#1}#3}
    \end{center}
  \end{figure}
}

\usepackage[authoryear,round]{natbib}
\usepackage{graphicx}
\usepackage{amsfonts}


\begin{document}

%------------------------------------------------------------
\title{Samroc example}
%------------------------------------------------------------
\author{Per Broberg}
\maketitle


\section*{Analysis of the data from Golub \textit{et al.}}

Consider the microarray experiment in \cite{golubetal} where ALL and AML subtypes of leukemia are compared.
The data are available within package \Rpackage{multtest}. 

We can analyse those data in \Rpackage{SAGx}  with the function \textit{samrocNboot}. The ideas behind
it are presented in \cite{broberg:2003}. Briefly, the method relies on a penalised \textit{t}-test statistica $ d = (\bar{x}_1 - \bar{x}_2)/(S + a)$ 
with fudge factor $a$ \cite{efron:2001}. In this case the effect estimated consists of a difference in group means. In general 
the method can estimate and test one such effect in the presence of explanatory variables such as AGE or GENDER using a linear model. In such a case
the function \Rfunction{samrocN} provides a solution. Example code now follows. 

<<>>=
library("SAGx")
library("multtest")
data(golub)
set.seed(849867)
samroc.res <- samrocN(data = golub, formula = ~as.factor(golub.cl))
show(samroc.res)
@
The function \Rfunction{samrocN} is used to perform a penalised \textit{t}-test. Its value is an object of class
\Rclass{samroc.result}. The functions \Rfunction{show} and \Rfunction{plot} are defined for such objects. In Figure \ref{density} the densities of the 
test statistic and its permutation null distribution are displayed. The graph was produced by invoking the \Rfunction{plot} function

<<dummyex3,eval=FALSE>>=
plot(samroc.res)
@


\begin{figure}[htbp]
\centering

<<fig=true, width = 12, height = 6>>=
par(bg = "cornsilk")
plot(samroc.res)
@
\caption{Densities of the test statistic and of its permutation null distribution}
\label{density}
\end{figure}

\begin{figure}[htbp]
\centering

The estimated proportion unchanged genes equals \Sexpr{format.pval(samroc.res@p0, digits = 2)}. The distribution of \textit{p}-values is shown in Figure~\ref{hist1},
which confirms that many genes are changed. Furthermore, using the function \textit{pava.fdr} we obtain estimates of the FDR and 
of the local FDR, see Figure~\ref{hist2}. This function is presented in \cite{broberg:2005} and combines the local FDR estimator 
of \cite{aubert:2004} with Poisson regression (see \cite{efron:2004}) and isotonic regression.

<<fig=true, width = 12, height = 6>>=
par(bg = "cornsilk")
hist(samroc.res@pvalues, xlab = "p-value", main ="", col = 'orange', freq  = F)
print(abline(samroc.res@p0,0, col = 'red'))
@
\caption{Histogram of the \textit{p}-values generated by function \textit{samrocNboot}}
\label{hist1}
\end{figure}

\begin{figure}[htbp]
\centering
<<fig = true, width = 12, height = 6>>=
par(bg = "cornsilk")
fdrs <- pava.fdr(ps = samroc.res@pvalues)
plot(samroc.res@pvalues, fdrs$pava.local.fdr, type = 'n', xlab = "p-value", ylab = "False Discovery Rate (FDR)")
lines(lowess(samroc.res@pvalues, fdrs$pava.local.fdr), col = 'red')
lines(lowess(samroc.res@pvalues, fdrs$pava.fdr), col = 'blue')
legend(0.1,0.9,pch=NULL,col=c("red","blue"),c("pava local FDR","pava FDR"),lty = 1)
@
\caption{Scatter plot of the local false discovery rate and the false discovery rate as estimated by function \textit{pava.fdr}}
\label{hist2}
\end{figure}

One can also perform a simple Gene Set Enrichment Analysis based on the output from \Rfunction{samrocNboot} by invoking \Rfunction{GSEA.mean.t},
cf. \cite{lutian:2005} which describes a similar idea. The package \Rpackage{hu6800.db} maps KEGG pathways \cite{kegg:2000} onto probeset identifiers. The following code
analyses one KEGG pathway ( 00970 Aminoacyl-tRNA biosynthesis) and outputs a p-value based on the average over the pathway of 
the absolute value of the test statistic $d$. The algorithm includes restandardization following \cite{efron:tibshirani:2006}.

<<>>=
library("hu6800.db")
kegg <- as.list(hu6800PATH2PROBE)
probeset <- golub.gnames[,3]
GSEA.mean.t(samroc = samroc.res, probeset = probeset, pway = kegg[1],
type = "original", two.side = FALSE)

@

\newpage
\section{On the calculation of p-values}

Following \cite{tusher:2001}, \cite{broberg:2003} defines a
permutation p-value for gene $i$ out of a total $N$ as

\begin{equation}\label{pval1}
    p_{i} = \frac{\#\{d^{*k}(j)|: |d^{*k}(j)| > |d(i)|\} }{N \times B}
\end{equation}

, denoting by $d(i)$ the test statistic corresponding to gene $i$,
and by $d^{*k}(i)$ the permutation null statistic in the $k^{th}$
iteration out of a total $B$.

This has the unfortunate side effect of occasionally returning
$p$-values equal to zero. To solve this the definition from
\cite{davison:1997} is employed. Denote by $F_{n}$ the empirical
distribution function of all $-|d^{*k}|$. The estimate then
becomes:

\begin{equation}\label{pval2}
    p_i = \frac{B\times N\times F_{n}(-|d(i)|)+ 1}{B\times N + 1}
\end{equation}

This follows from $\{t^{*} \geq t\} \Leftrightarrow \{ -t^{*} \leq -t\}$.

Various functions from SAGx were used in \cite{StefanPierrou03152007}.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\newpage
\bibliographystyle{plainnat}
\bibliography{samroc-ex}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\end{document}