\name{generateSeeds-methods}
\docType{methods}
\alias{generateSeeds-methods}
\alias{generateSeeds}
\alias{generateSeeds,eSet-method}
\alias{generateSeeds,matrix-method}
\title{Generate seeds for biclustering}
\description{
  \code{generateSeeds} takes either a matrix or an
  \code{\linkS4class{ExpressionSet}} object and generates seeds. Seeds
  are pairs of genes (edges) that share coincident expression levels
  across samples. The higher the coincidence, the higher the score of
  the seed. Seeds are generated by comparing each pair of genes in
  turn. Once all seeds have been produced, they are sorted by their
  coincidence scores and returned as an \code{rqubicSeeds} object. See
  the Details section for notes on the implementation.
}
\section{Methods}{
  In the \code{rqubic} package, \code{generateSeeds} currently supports
  two data types: \code{\linkS4class{ExpressionSet}} (a subclass of
  \code{\linkS4class{eSet}}) and numeric matrices. Both methods
  additionally accept the parameter \code{minColWidth}, which specifies
  the minimum number of conditions shared by the two genes of each seed.
  Its default value is 2. When the default is used, the minimum
  coincidence score is set to \eqn{\max(2, n_{col}/20)}{max(2, ncol/20)},
  where \eqn{n_{col}}{ncol} is the number of conditions. When a
  non-default value is provided, that value is used directly as the
  minimum coincidence score for selecting seeds.
  \describe{
    \item{\code{signature(object = "eSet")}}{An object representing
      expression data. The matrix returned by \code{exprs} must contain
      integers; otherwise the method warns and coerces its storage mode
      to integer.}
    \item{\code{signature(object = "matrix")}}{A matrix of integers. If
      it contains non-integers, the method warns and coerces its storage
      mode to integer.}
  }
}
\section{Details}{
  The function compares all pairs of genes, that is, all edges of a
  complete graph whose vertices are the genes. The weight of each edge
  is the number of samples in which the two genes have the same
  expression level.
  This weight, also known as the \emph{coincidence score}, reflects the
  co-regulation relationship between two genes. Seeds are chosen by
  picking edges whose scores reach the minimum score set by the
  \code{minColWidth} parameter (default: 2).

  To implement this selection, a \emph{Fibonacci heap} is constructed
  in the C code. Its size is a predefined constant, which may need to be
  reduced if the number of genes is too large for the algorithm to run.
  Each new seed whose coincidence score reaches the minimum is inserted
  into the heap. If the heap is not yet full, the seed is added
  directly. Otherwise it is inserted by squeezing out the seed with the
  minimum score. After all pairs of genes have been examined, the heap
  is dumped into an array of edge pointers ordered by decreasing score.
  This array is captured as an external pointer and attached as an
  attribute of an \code{rqubicSeeds} object.

  An \code{rqubicSeeds} object holds an integer that records the height
  of the heap. Besides the class identifier, it has two attributes: the
  external pointer and the threshold of the coincidence score.
}
\note{
  In the \code{rqubic} implementation, the variable \code{arr_c[i][j]}
  holds the level symbols (\eqn{-1, 0, 1} in the default case), whereas
  in the \code{QUBIC} implementation this variable holds the index of
  the level symbols, and the level symbols themselves are saved in the
  global variable \code{symbols}.
}
\author{Jitao David Zhang}
\examples{
data(sample.ExpressionSet, package="Biobase")
sample.disc <- quantileDiscretize(sample.ExpressionSet)
sample.seeds <- generateSeeds(sample.disc)
sample.seeds

## with a higher threshold of the coincidence score
sample.seeds.higher <- generateSeeds(sample.disc, minColWidth=5)
sample.seeds.higher
}
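\section{Illustrative sketch}{
  The R function below is a minimal, hypothetical sketch of the seed
  selection described in the Details section. It is not part of
  \code{rqubic} and does not call its C code. The function name
  \code{selectSeeds} and the argument \code{maxSeeds} are assumptions
  made for illustration, with \code{maxSeeds} standing in for the
  predefined heap size. Instead of a Fibonacci heap, it keeps the
  retained seeds in plain vectors and finds the lowest-scoring one by
  linear search. Each replacement therefore costs \eqn{O(k)}{O(k)} for
  \eqn{k}{k} retained seeds. It also assumes that a seed is kept when
  its score is at least the minimum coincidence score.
  \preformatted{
## Hypothetical sketch, not rqubic code: mimics the bounded-heap seed
## selection of the Details section with plain vectors.
selectSeeds <- function(mat, minColWidth = 2, maxSeeds = 100L) {
  stopifnot(is.matrix(mat), nrow(mat) >= 2L)
  storage.mode(mat) <- "integer"          # coerce to integer levels (silently here)
  ## default threshold as documented in the Methods section
  minScore <- if (minColWidth == 2) max(2, ncol(mat) / 20) else minColWidth
  g1 <- integer(0); g2 <- integer(0); sc <- integer(0)
  n <- nrow(mat)
  for (i in seq_len(n - 1L)) {
    for (j in (i + 1L):n) {
      s <- sum(mat[i, ] == mat[j, ])      # coincidence score of edge (i, j)
      if (s < minScore) next              # below the threshold: no seed
      if (length(sc) < maxSeeds) {        # store not full: add directly
        g1 <- c(g1, i); g2 <- c(g2, j); sc <- c(sc, s)
      } else {
        k <- which.min(sc)                # current lowest-scoring seed
        if (s > sc[k]) {                  # squeeze the minimum seed out
          g1[k] <- i; g2[k] <- j; sc[k] <- s
        }
      }
    }
  }
  o <- order(sc, decreasing = TRUE)       # decreasing order of score
  data.frame(gene1 = g1[o], gene2 = g2[o], score = sc[o])
}
  }
  Consider a three-gene matrix with four conditions whose first two
  rows are identical. Under the default threshold of
  \eqn{\max(2, 4/20) = 2}{max(2, 4/20) = 2}, only the pair formed by
  genes 1 and 2 (score 4) is returned.
}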