\name{gsea}
\alias{gsea}
\docType{data}
\title{
GSEA (Gene Set Enrichment Analysis)}
\description{
Computes the enrichment scores and the simulated enrichment scores for
each variable and signature.
An important parameter of the function is \code{logScale}. Its default
value is \code{TRUE}, which means that the provided scores (i.e. fold
changes, hazard ratios) are log scaled by default. Remember to set this
parameter to \code{FALSE} if your scores are already log scaled.
The \code{getEs}, \code{getEsSim}, \code{getFc}, \code{getHr} and
\code{getFcHr} methods can be used to access each subobject. For more
information see the man pages of each method.

It also computes the NES (normalized enrichment score), p-values and FDR
(false discovery rate) for all variables and signatures.
For an overview of the output use the \code{summary} method.

If the provided gene sets have more than 10 distinct lengths, the
enrichment score simulations (ESM) are approximated instead of being
computed for every gene set.
The value of the ESM depends only on the length of the gene
set. Therefore we compute the ESM over a grid of possible gene set
lengths that are representative of the lengths of the provided gene
sets. Then we fit a generalized additive model with cubic splines
to predict the NES value based on the length of every gene set. This
approach is much faster, which is useful when the software has to be
run over a very large number of gene sets.
}
\usage{
gsea(x, gsets, logScale=TRUE, absVals=FALSE, averageRepeats=FALSE, B=1000,
     mc.cores=1, test="perm", p.adjust.method="none",
     pval.comp.method="original", pval.smooth.tail=TRUE, minGenes=10,
     maxGenes=500, center=FALSE)
}
\arguments{
\item{x}{ \code{ePhenoTest}, \code{numeric} or \code{matrix} object
  containing scores (hazard ratios or fold changes).}
\item{gsets}{ character or list object containing the names of the
  genes that belong to each signature.}
\item{logScale}{ if \code{TRUE} (the default) the scores are log
  scaled. Set it to \code{FALSE} if the scores are already log scaled.}
\item{absVals}{ if \code{TRUE}, fold changes and hazard ratios that are
  negative are turned into positive values before starting the process.
  This is useful when genes can go in both directions.}
\item{averageRepeats}{ if \code{x} is of class \code{numeric} and has
  repeated names (several measures for some individual names), the
  measures that share a name can be averaged.}
\item{B}{ number of simulations to perform.}
\item{mc.cores}{ number of processors to use.}
\item{test}{ the test that will be used. 'perm' stands for the
  permutation based method, 'wilcox' stands for the Wilcoxon test (this
  is the fastest one) and 'ttperm' stands for the permutation t-test.}
\item{p.adjust.method}{ p-value adjustment method to be used. Common
  options are 'BH', 'BY', 'bonferroni' or 'none'. All available options
  and their explanations can be found in the manual page of the
  \code{p.adjust} function.}
\item{pval.comp.method}{ the p-value computation method. Has to be one
  of 'signed' or 'original'. The default is 'original'. See details for
  more information.}
\item{pval.smooth.tail}{ if we want to estimate the tail of the
  distribution where the p-values will be generated.}
\item{minGenes}{ gene sets with fewer than \code{minGenes} genes will be
  removed from the analysis.}
\item{maxGenes}{ gene sets with more than \code{maxGenes} genes will be
  removed from the analysis.}
\item{center}{ if we want to center the scores (fold changes or hazard
  ratios). The following will be done: \code{x = x-mean(x)}.}
}
\details{
The following preprocessing is done on the provided scores (i.e. fold
changes, hazard ratios) to avoid errors during the enrichment score
computation:
\itemize{
\item When two scores have the same name, their average is used.
\item Zeros are removed.
\item Scores without names (which cannot belong to any signature) are removed.
\item Incomplete cases (i.e. NAs, NaNs) are removed.
}
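
As a rough illustration, the preprocessing steps listed above can be
sketched in base R as follows. The \code{scores} vector is hypothetical
and the order of the steps is an assumption; this is not the code used
internally by \code{gsea}.
\preformatted{
# hypothetical named scores: a repeated name, a zero, an NA and an unnamed value
scores <- c(geneA = 2, geneB = 0, geneA = 4, geneC = NA, 0.5, geneD = 1.5)

# drop incomplete cases (NA, NaN) and scores without a name
scores <- scores[!is.na(scores) & names(scores) != ""]

# average the scores that share a name
scores <- vapply(split(scores, names(scores)), mean, numeric(1))

# drop zeros
scores <- scores[scores != 0]
scores
}
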
The enrichment score (ES) is calculated for each signature and variable
(see references). If \code{test} is 'perm', the signature is permuted
and the ES is recalculated (this happens \code{B} times for each
variable, 1000 by default).
If \code{test} is 'wilcox', a Wilcoxon test is performed to test whether
the average value of the genes that belong to the signature differs from
the average value of the genes that do not belong to it.
If \code{test} is 'ttperm', a permutation t-test is used.
Note that the final plot will be different when 'wilcox' is used.
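
The permutation based approach can be pictured with the following
minimal sketch. It assumes the running-sum statistic of the first
reference with weight exponent 1 and permutes the gene set membership
labels. The function \code{enrichmentScore} and all data are
hypothetical and are not part of the package, whose internal
implementation may differ.
\preformatted{
# running-sum ES with weight exponent 1 (an assumption, following the reference)
enrichmentScore <- function(scores, inSet) {
  ord <- order(scores, decreasing = TRUE)
  weight <- abs(scores[ord])
  hit <- inSet[ord]
  pHit <- cumsum(weight * hit) / sum(weight[hit])   # weighted hits seen so far
  pMiss <- cumsum(!hit) / sum(!hit)                 # misses seen so far
  running <- pHit - pMiss
  unname(running[which.max(abs(running))])          # largest deviation from zero
}

set.seed(1)
scores <- rnorm(200)                                # hypothetical log fold changes
names(scores) <- paste0("g", seq_along(scores))
geneSet <- names(sort(scores, decreasing = TRUE))[1:15]  # hypothetical signature
inSet <- is.element(names(scores), geneSet)

es <- enrichmentScore(scores, inSet)                # observed ES
B <- 100
esSim <- replicate(B, enrichmentScore(scores, sample(inSet)))  # simulated ES
}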

The simulated enrichment scores and the observed one
are used to compute the p-value.

The p-value calculation depends on the parameter
\code{pval.comp.method}. The default value is 'original'. With 'original'
we simply compute the proportion of absolute simulated ES that
are larger than the observed absolute ES. With 'signed' we compute
the proportion of simulated ES that are larger than the observed ES (if
the observed enrichment score is positive) or the proportion of
simulated ES that are smaller than the observed ES (if the observed
enrichment score is negative).
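
The two p-value definitions described above can be written down in a
few lines of base R. This is only a sketch under stated assumptions:
\code{esSim} and \code{es} are hypothetical values, and any tail
smoothing (\code{pval.smooth.tail}) is ignored.
\preformatted{
set.seed(1)
esSim <- rnorm(1000, sd = 0.3)   # hypothetical simulated enrichment scores
es <- 0.6                        # hypothetical observed enrichment score

# 'original': proportion of absolute simulated ES above the absolute observed ES
pOriginal <- mean(abs(esSim) > abs(es))

# 'signed': only look in the direction of the observed ES
pSigned <- if (es > 0) mean(esSim > es) else mean(esSim < es)
}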
}
\references{
Aravind Subramanian et al. (October 25, 2005) \emph{Gene set enrichment
  analysis: a knowledge-based approach for interpreting genome-wide
  expression profiles}. PNAS, 102(43), 15545-15550.
\url{www.pnas.org/cgi/doi/10.1073/pnas.0506580102}

C.A. Tsai and J.J. Chen (2007) \emph{Kernel estimation for adjusted
  p-values in multiple testing}. Computational Statistics & Data
  Analysis, 51(8), 3885-3897.
\url{http://econpapers.repec.org/article/eeecsdana/v_3a51_3ay_3a2007_3ai_3a8_3ap_3a3885-3897.htm}
}
\keyword{datasets}
\seealso{
\code{\link{gsea.go}}, \code{\link{gsea.kegg}}
}
\examples{
# load the epheno object
data(epheno)
epheno

# construct two signatures by sampling feature names
sign1 <- sample(featureNames(epheno))[1:20]
sign2 <- sample(featureNames(epheno))[50:75]
mySignature <- list(sign1,sign2)
names(mySignature) <- c('My first signature','My preferred signature')

# run gsea and summarize the output
gseaData <- gsea(x=epheno,gsets=mySignature,B=100,mc.cores=1)
my.summary <- summary(gseaData)
my.summary
#plot(gseaData)
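
# Sketch of the score transformations controlled by the logScale, center
# and absVals arguments, using base R only. The values are hypothetical,
# and the natural logarithm and the order of the steps are assumptions
# made for illustration, not a description of the package internals.
hr <- c(g1 = 2, g2 = 0.5, g3 = 4, g4 = 1)   # hypothetical hazard ratios
logHr <- log(hr)                            # logScale=TRUE: log transform
centered <- logHr - mean(logHr)             # center=TRUE: x = x-mean(x)
absHr <- abs(logHr)                         # absVals=TRUE: negatives become positive
rbind(logHr, centered, absHr)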
}
\author{
Evarist Planet
}