\name{quantileDiscretize}
\alias{quantileDiscretize}
\alias{quantileDiscretize-methods}
\alias{quantileDiscretize,matrix-method}
\alias{quantileDiscretize,eSet-method}
\title{Discretize expression matrix for qualitative biclustering}
\description{
  Performs recursive quantile discretization of gene expression data
  across samples in order to discretize the gene expression matrix. The
  quantile parameter \code{q} determines the estimated proportion of
  conditions in which a gene is up- or down-regulated (\eqn{2q} for both
  directions together). The rank parameter \code{rank} determines how
  many discrete levels the outliers (differentially expressed values)
  receive in each direction. See Details below.
}
\usage{
quantileDiscretize(x, \dots)
}
\arguments{
  \item{x}{An object of the \code{\linkS4class{eSet}} class or of a
    class inheriting from it, most commonly an
    \code{\linkS4class{ExpressionSet}}. Alternatively, a numeric matrix.}
  \item{\dots}{Currently, \dots accepts two parameters, \code{q} and
    \code{rank}:
    \describe{
      \item{\code{q}}{Estimated proportion of conditions in which a gene
        is up- or down-regulated; a value in the open interval
        \eqn{(0, 0.5)}, default 0.06. Specifying \code{q} amounts to
        estimating that, for each gene, the expression values in a
        fraction \eqn{2q} of all conditions are outliers.}
      \item{\code{rank}}{Number of ranks (levels) of outliers; a positive
        integer, default \code{1L}. By default, each condition receives,
        for each gene, one label from \eqn{\{-1, 0, 1\}}, representing
        lower expression, no change and higher expression, respectively.
        If \eqn{rank > 1}, the outliers are further divided into
        \code{rank} levels by recursive quantile discretization with
        equal intervals.}
    }
  }
}
\details{
  Parameter \code{q} corresponds to the command line option \code{-q} of
  the QUBIC command line tool, and \code{rank} corresponds to \code{-r}.

  For each gene, the algorithm first applies quantile discretization to
  divide the conditions into negative (lower), unchanged and positive
  (higher) expression.
  Conditions with negative or positive expression are considered
  \emph{outliers}. If \eqn{rank > 1}, the algorithm further discretizes
  the expression values of the outliers in each direction. This second
  discretization step divides the sorted outliers into \eqn{rank} tandem
  groups containing equal numbers of conditions. A label is assigned to
  each of these tandem groups in the following order:
  \deqn{-1, -2, \ldots, -rank}
  for outliers with negative expression, from the \emph{most negative
  group} to the \emph{least negative group} (not the other way around!).
  Similarly, for positive outliers, labels in the order
  \deqn{rank, rank-1, \ldots, 1}
  are assigned to the tandem groups from the \emph{least positive group}
  to the \emph{most positive group}. That is, the sign of a label
  indicates the direction of the expression change, and its absolute
  value is the discretized \emph{rank} among the outliers.
}
\value{
  An object of the same class as the input \code{x}, with the expression
  values (the \code{exprs} matrix for \code{eSet} objects) replaced by
  the discretized matrix, which is an integer matrix.
}
\references{
  Li et al. (2009) \emph{QUBIC: a qualitative biclustering algorithm for
  analyses of gene expression data}. Nucleic Acids Research 37:e101.
}
\author{
  Jitao David Zhang
}
\note{
  The discrete matrix produced by this implementation can differ
  slightly from the one produced by the QUBIC command line tool. The
  main reason is the internal data type: QUBIC uses \code{float} to
  represent the expression matrix, whereas this implementation uses
  \code{double}. Using \code{double} eases interfacing with R, gives
  higher precision and avoids errors caused by floating-point
  representation. It potentially costs more memory; however, for test
  data sets (for example the ALL dataset, with more than 120 samples and
  12000 genes) the peak memory use (<100M) and the execution time (CPU
  time 0.028s) are well under control.

  The difference is observed most often when there are many tied values.
  Such cases are, however, very rare, and we assume that they do not
  affect the results to a large extent.
}
\seealso{
  \code{\link{parseQubicChars}} parses the matrix discretized by the
  QUBIC command line tool into a data frame.
}
\examples{
data(sample.ExpressionSet, package="Biobase")
sample.disc <- quantileDiscretize(sample.ExpressionSet)
exprs(sample.disc)[1:6, 1:6]

## Equivalent: pass a numeric matrix
sample.mat.disc <- quantileDiscretize(exprs(sample.ExpressionSet))
sample.mat.disc[1:6, 1:6]
\dontrun{identical(exprs(sample.disc),sample.mat.disc)}

## with multiple ranks
sample.rank3 <- quantileDiscretize(sample.ExpressionSet, rank=3)
exprs(sample.rank3)[1:6, 1:6]
}
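\section{Illustrative sketch of the labelling scheme}{
  The following base R sketch is not the package implementation; it only
  illustrates, for a single gene, the labelling order described in
  Details. The helper name \code{sketchDiscretizeGene} is hypothetical,
  and taking \code{floor(q * n)} outliers in each direction is a
  simplifying assumption. Tied values and the exact quantile cutoffs used
  by \code{quantileDiscretize} are not reproduced, so results may differ
  from those of the package.
  \preformatted{
## Hypothetical helper (an assumption, not part of the package):
## discretize one gene, given as a numeric vector across conditions.
sketchDiscretizeGene <- function(x, q = 0.06, rank = 1L) {
  stopifnot(is.numeric(x), q > 0, q < 0.5, rank >= 1)
  n <- length(x)
  k <- floor(q * n)                # assumed number of outliers per direction
  labels <- integer(n)             # 0 = unchanged
  if (k < 1) return(labels)
  ord <- order(x)                  # condition indices, lowest value first
  low <- ord[seq_len(k)]           # most negative outlier first
  high <- ord[seq(n - k + 1, n)]   # least positive outlier first
  ## split the k sorted outliers into tandem groups of (nearly) equal size
  grp <- ceiling(seq_len(k) * rank / k)
  labels[low] <- -grp              # -1 = most negative group
  labels[high] <- rank + 1L - grp  # rank = least positive, 1 = most positive
  as.integer(labels)
}

## Toy data: 12 conditions, so q = 0.25 gives 3 outliers per direction
x <- c(5.1, 0.2, 4.8, 9.9, 5.0, 1.1, 5.3, 8.7, 4.9, 0.6, 5.2, 9.1)
sketchDiscretizeGene(x, q = 0.25, rank = 1L)
sketchDiscretizeGene(x, q = 0.25, rank = 3L)
  }
  Under the same assumptions, a whole matrix with genes in rows could be
  processed with \code{t(apply(mat, 1, sketchDiscretizeGene))}, where
  \code{mat} is a placeholder for a numeric matrix.
}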