\name{sam}
\alias{sam}
\title{Significance Analysis of Microarrays}
\description{
  Performs a Significance Analysis of Microarrays (SAM). It is possible to perform
  one and two class analyses using either a modified t-statistic or a (standardized) 
  Wilcoxon rank statistic, and a multiclass analysis using a modified F-statistic. 
  Moreover, this function provides a SAM procedure for categorical data such as SNP data
  and the possibility to employ a user-written score function.
}
\usage{
  sam(data, cl, method = d.stat, delta = NULL, n.delta = 10, p0 = NA,
      lambda = seq(0, 0.95, 0.05), ncs.value = "max", ncs.weights = NULL,
      gene.names = dimnames(data)[[1]], q.version = 1, ...)
}
\arguments{
  \item{data}{a matrix, a data frame, or an ExpressionSet object. Each row of \code{data}
    (or \code{exprs(data)}, respectively) must correspond to a gene, and
    each column to a sample}
  \item{cl}{a vector of length \code{ncol(data)} containing the class
     labels of the samples. In the two class paired case, \code{cl} can also 
     be a matrix with \code{ncol(data)} rows and 2 columns. If \code{data} is
     an ExpressionSet object, \code{cl} can also be a character string naming the column
     of \code{pData(data)} that contains the class labels of the samples.
     
     In the one-class case, \code{cl} should be a vector of 1's. 
     
     In the two class unpaired case, \code{cl} should be a vector containing 0's
     (specifying the samples of, e.g., the control group) and 1's (specifying,
     e.g., the case group). 
     
     In the two class paired case, \code{cl} can be either a numeric vector or a numeric matrix. 
     If it is a vector, then \code{cl} has to consist of the integers between -1 and 
     \eqn{-n/2} (e.g., before treatment group) and between 1 and \eqn{n/2} (e.g.,
     after treatment group), where \eqn{n} is the length of \code{cl} and \eqn{k}
     is paired with \eqn{-k}, \eqn{k=1,\dots,n/2}. If \code{cl} is a matrix, one
     column should contain -1's and 1's specifying, e.g., the before and the after
     treatment samples, respectively, and the other column should contain integers
     between 1 and \eqn{n/2} specifying the \eqn{n/2} pairs of observations.
     
     In the multiclass case and if \code{method = cat.stat}, \code{cl} should be a vector containing integers
     between 1 and \eqn{g}, where \eqn{g} is the number of groups.
        
     For examples of how \code{cl} can be specified, see the manual of \pkg{siggenes}.}
  \item{method}{a character string or a name specifying the method/function that should be used
     in the computation of the expression scores \eqn{d}. If \code{method = d.stat},
     a modified t-statistic or F-statistic, respectively, will be computed
     as proposed by Tusher et al. (2001). If \code{method = wilc.stat}, a
     Wilcoxon rank sum statistic or Wilcoxon signed rank statistic will be used
     as expression score. For an analysis of categorical data such as SNP data, 
     \code{method} can be set to \code{cat.stat}. In this case Pearson's
     Chi-squared statistic is computed for each row. It is also possible to use
     a user-written function to compute the expression scores.
     For details, see the Details section}
  \item{delta}{a numeric vector specifying a set of values for the threshold 
     \eqn{\Delta}{Delta} that should be used. If \code{NULL}, \code{n.delta}
     \eqn{\Delta}{Delta} values will be computed automatically}
  \item{n.delta}{a numeric value specifying the number of \eqn{\Delta}{Delta} values
     that will be computed over the range of all possible values for \eqn{\Delta}{Delta}
     if \code{delta} is not specified}
  \item{p0}{a numeric value specifying the prior probability \eqn{\pi_0}{pi0} 
     that a gene is not differentially expressed. If \code{NA}, \code{p0} will
     be computed by the function \code{pi0.est}}
  \item{lambda}{a numeric vector or value specifying the \eqn{\lambda}{lambda}
     values used in the estimation of the prior probability. For details, see
     \code{?pi0.est}}
  \item{ncs.value}{a character string. Only used if \code{lambda} is a
     vector. Either \code{"max"} or \code{"paper"}. For details, see \code{?pi0.est}}
  \item{ncs.weights}{a numerical vector of the same length as \code{lambda}
     containing the weights used in the estimation of \eqn{\pi_0}{pi0}. By default
     no weights are used. For details, see \code{?pi0.est}}
  \item{gene.names}{a character vector of length \code{nrow(data)} containing the
     names of the genes. By default the row names of \code{data} are used}
  \item{q.version}{a numeric value indicating which version of the q-value should
     be computed. If \code{q.version=2}, the original version of the q-value, i.e.
     min\{pFDR\}, will be computed. If \code{q.version=1}, min\{FDR\} will be used
     in the calculation of the q-value. Otherwise, the q-value is not computed.
     For details, see \code{?qvalue.cal}}
  \item{\dots}{further arguments of the specific SAM methods. If \code{method = d.stat},
     see the help of \code{\link{d.stat}}. If \code{method = wilc.stat}, see the help
     of \code{\link{wilc.stat}}. If \code{method = cat.stat}, see the help of
     \code{\link{cat.stat}}}
}
\details{
  \code{sam} provides SAM procedures for several types of analysis (one and two class analyses
  with either a modified t-statistic or a Wilcoxon rank statistic, a multiclass analysis
  with a modified F statistic, and an analysis of categorical data). It is, however, also 
  possible to write your own function for another type of analysis. The required arguments
  of this function must be \code{data} and \code{cl}. This function can also have other
  arguments. The output of this function must be a list containing 
  \describe{
     \item{\code{d}:}{a numeric vector consisting of the expression scores of the genes}
     \item{\code{d.bar}:}{a numeric vector of the same length as \code{na.exclude(d)} specifying
        the expected expression scores under the null hypothesis}
     \item{\code{p.value}:}{a numeric vector of the same length as \code{d} containing
        the raw, unadjusted p-values of the genes}
     \item{\code{vec.false}:}{a numeric vector of the same length as \code{d} consisting of
        the one-sided numbers of falsely called genes, i.e. if \eqn{d>0}, the number
        of genes expected to be larger than \eqn{d} under the null hypothesis, and if
        \eqn{d<0}, the number of genes expected to be smaller than \eqn{d} under the
        null hypothesis}
     \item{\code{s}:}{a numeric vector of the same length as \code{d} containing the standard deviations 
        of the genes. If no standard deviation can be calculated, set \code{s=numeric(0)}}
     \item{\code{s0}:}{a numeric value specifying the fudge factor. If no fudge factor is calculated,
        set \code{s0=numeric(0)}}
     \item{\code{mat.samp}:}{a matrix with B rows and \code{ncol(data)} columns, where B is the number
        of permutations, containing the permutations used in the computation of the permuted
        d-values. If such a matrix is not computed, set \code{mat.samp=matrix(numeric(0))}}
     \item{\code{msg}:}{a character string or vector containing information about, e.g., which type of analysis
        has been performed. \code{msg} is printed when the function \code{print} or 
        \code{summary}, respectively, is called. If no such message should be printed, set \code{msg=""}}
     \item{\code{fold}:}{a numeric vector of the same length as \code{d} consisting of the fold 
        changes of the genes. If no fold change has been computed, set \code{fold=numeric(0)}}
  }
  If this function is called, e.g., \code{foo}, it can be used by setting \code{method = foo}
  in \code{sam}. More detailed information and an example will be given in the \pkg{siggenes}
  manual.
}
\value{
  an object of class SAM
}

\note{ 
   SAM was developed by Tusher et al. (2001).
    
   !!! There is a patent pending for the SAM technology at Stanford University. !!!
}

\references{
   Schwender, H., Krause, A. and Ickstadt, K. (2003). Comparison of
   the Empirical Bayes and the Significance Analysis of Microarrays.
   \emph{Technical Report}, SFB 475, University of Dortmund, Germany.

   Schwender, H. (2004). Modifying Microarray Analysis Methods for 
   Categorical Data -- SAM and PAM for SNPs. To appear in: \emph{Proceedings
   of the 28th Annual Conference of the GfKl}.

   Tusher, V.G., Tibshirani, R., and Chu, G. (2001). Significance analysis of microarrays
   applied to the ionizing radiation response. \emph{PNAS}, 98, 5116-5121.

}
\author{Holger Schwender, \email{holger.schw@gmx.de}}

\seealso{
  \code{\link{SAM-class}},\code{\link{d.stat}},\code{\link{wilc.stat}},
  \code{\link{cat.stat}}
}
\examples{\dontrun{
  # Load the package multtest and the data of Golub et al. (1999)
  # contained in multtest.
  library(multtest)
  data(golub)
  
  # golub.cl contains the class labels.
  golub.cl

  # Perform a SAM analysis for the two class unpaired case assuming
  # unequal variances.
  sam.out<-sam(golub,golub.cl,B=100,rand=123)
  sam.out
  
  # Obtain the Delta plots for the default set of Deltas
  plot(sam.out)
  
  # Generate the Delta plots for Delta = 0.2, 0.4, 0.6, ..., 2
  plot(sam.out,seq(0.2,2,0.2))
  
  # Obtain the SAM plot for Delta = 2
  plot(sam.out,2)
  
  # Get information about the genes called significant using 
  # Delta = 3 (since neither the gene names nor the chip type
  # has been specified, ll is set to FALSE to avoid a warning)
  sam.sum3<-summary(sam.out,3,ll=FALSE)
  
  # Obtain the rows of golub containing the genes called
  # differentially expressed
  sam.sum3@row.sig.genes
  
  # and their names
  golub.gnames[sam.sum3@row.sig.genes,3] 

  # The matrix containing the d-values, q-values etc. of the
  # differentially expressed genes can be obtained by
  sam.sum3@mat.sig
  
  # Perform a SAM analysis using Wilcoxon rank sums
  sam(golub,golub.cl,method="wilc.stat",rand=123)
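  
  # Sketch of how the class labels for the one-class and the multiclass
  # case could be constructed (the sample and group sizes used here are
  # hypothetical): a vector of 1's for ten samples,
  one.cl<-rep(1,10)
  one.cl
  
  # and integers between 1 and g = 3 for groups of 4, 3 and 3 samples
  multi.cl<-rep(1:3,c(4,3,3))
  multi.cl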
    

  # Now consider only the first ten columns of the Golub et al. (1999)
  # data set. For now, let's assume the first five columns were
  # before treatment measurements and the next five columns were
  # after treatment measurements, where columns 1 and 6, columns 2
  # and 7, ..., each form a pair. In this case, the class labels
  # would be
  new.cl<-c(-(1:5),1:5)
  new.cl
  
  # and the corresponding SAM analysis for the two-class paired
  # case would be
  sam(golub[,1:10],new.cl,B=100,rand=123)
  
  # Another way of specifying the class labels for the above paired
  # analysis is
  mat.cl<-matrix(c(rep(c(-1,1),each=5),rep(1:5,2)),10)
  mat.cl
  
  # and the above SAM analysis can also be done by
  sam(golub[,1:10],mat.cl,B=100,rand=123)
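  
  # A minimal sketch (not part of siggenes) of a user-written score
  # function returning the list described in the Details section. The name
  # welch.score, its arguments B and rand, and the simulated data are
  # hypothetical. Assumptions: two class unpaired design, cl coded as 0/1,
  # no missing values, ordinary Welch t-statistic without a fudge factor.
  welch.score <- function(data, cl, B = 100, rand = NA) {
    data <- as.matrix(data)
    if (!is.na(rand)) set.seed(rand)
    # Row-wise Welch t-statistics and their denominators for given labels
    t.stat <- function(labels) {
      x <- data[, labels == 1, drop = FALSE]
      y <- data[, labels == 0, drop = FALSE]
      se <- sqrt(apply(x, 1, var) / ncol(x) + apply(y, 1, var) / ncol(y))
      list(d = (rowMeans(x) - rowMeans(y)) / se, s = se)
    }
    obs <- t.stat(cl)
    d <- obs$d
    # Each row of mat.samp is one random permutation of the class labels
    mat.samp <- t(replicate(B, sample(cl)))
    # Permuted scores: one column per permutation
    d.perm <- apply(mat.samp, 1, function(labels) t.stat(labels)$d)
    # Expected scores under the null: sort each column, then average
    # the order statistics over the permutations
    d.bar <- rowMeans(apply(d.perm, 2, sort))
    # Two-sided permutation p-values pooled over all genes
    p.value <- vapply(abs(d), function(z) {
      mean(abs(d.perm) >= z)
    }, numeric(1))
    # One-sided numbers of falsely called genes, averaged over permutations
    vec.false <- vapply(d, function(z) {
      if (z >= 0) sum(d.perm > z) / B else sum(d.perm < z) / B
    }, numeric(1))
    # s holds the denominators of the statistics; no fudge factor and
    # no fold changes are computed in this sketch
    list(d = d, d.bar = d.bar, p.value = p.value, vec.false = vec.false,
         s = obs$s, s0 = numeric(0), mat.samp = mat.samp,
         msg = "Welch t-statistic (user-written sketch)", fold = numeric(0))
  }
  
  # Try the sketch on simulated data (50 genes, 5 + 5 samples)
  set.seed(1)
  sim.data <- matrix(rnorm(500), nrow = 50)
  sim.cl <- rep(0:1, each = 5)
  score.out <- welch.score(sim.data, sim.cl, B = 50, rand = 123)
  str(score.out)
  
  # Provided that this list meets all requirements of sam, the function
  # could then be employed by setting method = welch.score in sam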
}}
\keyword{htest}