\documentclass[a4paper]{article} \title{The \emph{mgsa} package} \author{Sebastian Bauer, Julien Gagneur} \date{13 December 2010} % The is for R CMD check, which finds it in spite of the "%", and also for % automatic creation of links in the HTML documentation for the package: % \VignetteIndexEntry{Overview of the mgsa package.} \begin{document} %%%%%%%% Setup % Don't reform code \SweaveOpts{keep.source=TRUE} % Size for figures \setkeys{Gin}{width=\textwidth} % Reduce characters per line in R output <>= options( width = 80 ) @ % Make title \maketitle %%%%%%%% Main text \section{Introduction} Model-based Gene Set Analysis (MGSA, Bauer et al.~\cite{Bauer2010}) is a Bayesian modeling approach for gene set enrichment. The package \emph{mgsa} implements MGSA and tools to use MGSA together with the Gene Ontology~\cite{GO2000}. \section{Quick start} We start with a small simulated dataset which contains \texttt{example\_go}, a random subset of yeast gene ontology annotations with 20 terms and \texttt{example\_o}, a simulated set of observed positive genes. These genes could for example be the "hits" of some screen or a set of differentially expressed genes. In the simulation, the terms GO:0006109 and GO:0030663 were active, implying that genes annotated to these terms were more likely to be observed positives than other genes. <>= library(mgsa) data("example") example_go example_o @ The method \texttt{mgsa} fits the MGSA model. It returns a \texttt{MgsaMcmcResults} object whose \texttt{print} method displays the most likely active terms. On this example, \texttt{mgsa} correctly reports largest posterior probabilities for the terms GO:0006109 and GO:0030663. The call to \texttt{set.seed()}, which sets the seed of the random number generator, simply ensures the example of this vignette to be reproducible. It is not required for \texttt{mgsa()} to work. <>= set.seed(0) fit = mgsa(example_o, example_go) fit @ The method \texttt{plot} provides a graphical visualization of the fit. <>= plot(fit) @ \section{Using the Gene Ontology} The Gene Ontology~\cite{GO2000} (GO) provides structured annotations to genes. Genes with the same annotation consitute a gene set. MGSA can be run on these gene sets. GO annotation files for your organism of study can be downloaded at the GO web page: http://www.geneontology.org. The function \texttt{readGAF} creates an \texttt{MgsaGoSets} object, a particular \texttt{MgsaSets}, from such a gene annotation file. Note that \texttt{readGAF} requires the package \texttt{GO.db} and \texttt{RSQLite} to be installed. For illustration purposes, a simplified GO annotation file with only three yeast genes is provided: <>= readGAF( system.file( "example_files/gene_association_head.sgd", package="mgsa" ) ) @ \section{Using custom gene sets} MGSA is not restricted to Gene Ontology and can be applied to any gene sets. The method \texttt{mgsa} can directly be called on such gene sets provided as \texttt{list} as in the example below. <>= mgsa( c("A", "B"), list(set1=LETTERS[1:3], set2=LETTERS[2:5]) ) @ Internally, the method \texttt{mgsa} indexes all elements of the sets before fitting the model. In case \texttt{mgsa} must be run on several observations with the same gene sets, computations can be speeded up by performing this indexing once for all. This can be achieved by building a \texttt{MgsaSets}. <>= myset = new( "MgsaSets", sets=list(set1=LETTERS[1:3], set2=LETTERS[2:5]) ) mgsa(c("A", "B"), myset) mgsa(c("B", "C"), myset) @ \begin{thebibliography}{10} \bibitem{Bauer2010} S. Bauer, J. Gagneur and P. N. Robinson. \newblock GOing Bayesian: model-based gene set analysis of genome-scale data. \newblock \textit{Nucleic acids research}, 2010. \bibitem{GO2000} The Gene Ontology Consortium. \newblock Gene Ontology: tool for the unification of biology. \newblock \textit{Nature Genetics}, 25:25--29,2000. \end{thebibliography} \end{document}