%\VignetteIndexEntry{RamiGO: An Introduction (HowTo)} %\VignetteDepends{gsubfn} %\VignetteSuggests{} %\VignetteImports{igraph} %\VignettePackage{RamiGO} %documentclass[12pt, a4paper]{article} %\documentclass[12pt]{article} \documentclass[a4paper,11pt]{article} \usepackage{amsmath} \usepackage{times} \usepackage{hyperref} \usepackage[numbers]{natbib} \usepackage[american]{babel} \usepackage{authblk} %\renewcommand\Authfont{\scshape} \renewcommand\Affilfont{\itshape\small} \usepackage{Sweave} \renewcommand{\topfraction}{0.85} \renewcommand{\textfraction}{0.1} %\usepackage{tikz} \usepackage{graphicx} \textwidth=6.2in \textheight=8.5in %\parskip=.3cm \oddsidemargin=.1in \evensidemargin=.1in \headheight=-.3in %------------------------------------------------------------ % newcommand %------------------------------------------------------------ \newcommand{\scscst}{\scriptscriptstyle} \newcommand{\scst}{\scriptstyle} \newcommand{\Robject}[1]{{\texttt{#1}}} \newcommand{\Rfunction}[1]{{\texttt{#1}}} \newcommand{\Rclass}[1]{\textit{#1}} \newcommand{\Rpackage}[1]{\textit{#1}} \newcommand{\Rexpression}[1]{\texttt{#1}} \newcommand{\Rmethod}[1]{{\texttt{#1}}} \newcommand{\Rfunarg}[1]{{\texttt{#1}}} \begin{document} %------------------------------------------------------------ \title{\Rpackage{RamiGO}: an R interface for AmiGO} %------------------------------------------------------------ \author[1,2]{Markus S. Schr\"{o}der} \author[1]{Daniel Gusenleitner} \author[1]{Aed\'{i}n C. Culhane} \author[1]{Benjamin Haibe-Kains} \author[1]{John Quackenbush} \affil[1]{Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Harvard School of Public Health, Boston, USA} \affil[2]{Computational Genomics, Center for Biotechnology, Bielefeld University, Germany} \SweaveOpts{highlight=TRUE, tidy=TRUE, keep.space=TRUE, keep.blank.space=FALSE, keep.comment=TRUE} <>= ##library(pgfSweave) ##setCacheDir("cache") options(keep.source=TRUE) options(width=60) @ \maketitle \tableofcontents %------------------------------------------------------------ \section{Introduction} %------------------------------------------------------------ A common task in recent gene set or gene signature analyses is testing for up- and down-regulation of these gene sets or gene signatures in Gene Ontology (GO) terms. Or having a gene or set of genes of interest and looking at the GO terms that include that gene or gene set. For a closer look at the distribution of the GO terms in the different tree structures of the three GO categories one has to either rebuild the GO tree himself with the help of published R packages, or copy and paste the GO terms of interest into existing web services to display the GO tree. One of these web services is AmiGO visualize: \begin{description} \item[AmiGO visualize:] \url{http://amigo.geneontology.org/cgi-bin/amigo/amigo?mode=visualize} \end{description} The \Rpackage{RamiGO} package is providing functions to interact with the AmiGO visualize web server and retrieves GO (Gene Ontology) trees in various formats. The most common requests would be as png or svg, but a file representation of the tree in the GraphViz DOT format is also possible. \Rpackage{RamiGO} also provides a parser for the GraphViz DOT format that returns a graph object and meta data in R. %------------------------------------------------------------ \section{Getting started} %------------------------------------------------------------ At first we load the \Rpackage{RamiGO} package into the current workspace: <>= library(RamiGO) @ The \Rpackage{RamiGO} package currently provides two functions that enable the user to retrieve directed acyclic trees from AmiGO and parse the GraphViz DOT format. An example on how to use the functions is given below. To retrieve a tree from AmiGO, the user has to provide a vector of GO ID's. For example GO:0051130, GO:0019912, GO:0005783, GO:0043229 and GO:0050789. These GO ID's represent entries from the three GO categories: Biological Process, Molecular Function and Cellular Component. The given GO ID's can be highlighted with different colors within the tree, therefor the user has to provide a vector of colors for each GO ID. A request could look like this: <>= goIDs <- c("GO:0051130","GO:0019912","GO:0005783","GO:0043229","GO:0050789") color <- c("lightblue","red","yellow","green","pink") pngRes <- getAmigoTree(goIDs=goIDs, color=color, filename="example", picType="png", saveResult=TRUE) @ \begin{figure}[h!] \centering \includegraphics[width=1\textwidth]{example.png} \caption{Example PNG returned from AmiGO.} \label{fig:amiGOpng} \end{figure} The GO tree representing the given GO ID's is dowloaded to the file "example.png" (see Figure \ref{fig:amiGOpng}); the file extension is created automatically according to picType. The request for a svg file is similar: <>= svgRes <- getAmigoTree(goIDs=goIDs, color=color, filename="example", picType="svg", saveResult=TRUE) @ \Robject{svgRes} is a vector with the svg picture in xml format. In order to further analyze the tree, \Rpackage{RamiGO} provides the possibility to retrieve the tree in the GraphViz DOT format. The function \Rfunction{readAmigoDot} parses these DOT format files and returns a \Robject{AmigoDot} S4 object. This S4 object includes an igraph object (\Rfunction{agraph()}), an adjacency matrix representing the graph (\Rfunction{adjMAtrix()}), a data.frame with the annotation for each node (\Rfunction{annot()}), the relations (edges) between the nodes (\Rfunction{relations()}) and a data.frame with the leaves of the tree and their annotation (\Rfunction{leaves()}). An example could look like this: <>= dotRes <- getAmigoTree(goIDs=goIDs, color=color, filename="example", picType="dot", saveResult=TRUE) tt <- readAmigoDot(object=dotRes) ## reading the file would also work! ## tt <- readAmigoDot(filename="example.dot") show(tt) @ The leaves of the tree are returned in \Robject{leaves(tt)}: <>= leavesTT <- leaves(tt) leavesTT[,c("node","GO_ID","description")] @ In order to export the tree to an GML file that is readable by Cytoscape, you have to call the \Rfunction{adjM2gml} with some of the results from the \Rfunction{readAmigoDot} function. The following example creates a GML file by internally calling the \Rfunction{exportCytoGML}: <>= gg <- adjM2gml(adjMatrix(tt),relations(tt)$color,annot(tt)$fillcolor,annot(tt)$GO_ID,annot(tt)$description,"example") @ The result is a GML file named example.gml that can be imported into Cytoscape as a network file. %------------------------------------------------------------ \section{A usefull extension to GSEA} %------------------------------------------------------------ The \Rpackage{RamiGO} package provides an extremely helpful extension to the GSEA software, in java as well as in R, if run with genesets from GO (C5 in MSigDB). \Rpackage{RamiGO} provides a mapping from GO terms returned from GSEA to official GO ID's. The mapping is stored in the data object \Robject{c5.go.mapping}. <>= data(c5.go.mapping) head(c5.go.mapping) @ One of the ways to avoid running GSEA in R is to call the java application of GSEA from R with the \Rfunction{system()} function. An example for a preranked GSEA would be: <>= ## paths to gsea jar and gmt file exe.path <- exe.path.string gmt.path <- gmt.path.string gsea.collapse <- "false" ## number of permutations nperm <- 10000 gsea.seed <- 54321 gsea.out <- "out-folder" ## build GSEA command gsea.report <- "report-file" rnk.path <- "rank-file" gsea.cmd <- sprintf("java -Xmx4g -cp %s xtools.gsea.GseaPreranked -gmx %s -collapse %s -nperm %i -rnk %s -scoring_scheme weighted -rpt_label %s -include_only_symbols true -make_sets true -plot_top_x 75 -rnd_seed %i -set_max 500 -set_min 15 -zip_report true -out %s -gui false", exe.path, gmt.path, gsea.collapse, nperm, rnk.path, gsea.report, gsea.seed, gsea.out) ## execute command on the system system(gsea.cmd) @ The results are stored in a folder with the name specified in \Robject{gsea.out}. The subfolder \Robject{gsea.report} has the detailed results in comma separated files and html pages. In the \Robject{gsea.cmd} string above we specified a few parameters which can be changed according to the type of analysis. \begin{itemize} \item plot\_top\_x: the number of results that should have an individual result page linked to the main index.html. \item set\_max and set\_min: limits the analysis to genesets that have more than 15 and less than 500 genes. \end{itemize} Once the GSEA analysis is finished, the important result files are xls files in the \Robject{gsea.report} folder. Named gsea\_report\_for\_na\_pos\_.xls and gsea\_report\_for\_na\_neg\_.xls. We can read them into R with the following command: <>= resn <- "xxx" ## number generated by GSEA that you can get with grep(), strsplit() and dir() tt <- rbind(read.table(sprintf("%s/%s/gsea_report_for_na_pos_%s.xls", gsea.out, gsea.report,resn), stringsAsFactors=FALSE, sep="\t", header=TRUE), read.table(sprintf("%s/%s/gsea_report_for_na_neg_%s.xls", gsea.out, gsea.report, restn), stringsAsFactors=FALSE, sep="\t", header=TRUE)) @ With all results from the GSEA analysis stored in \Robject{tt}, you can extract information from the results and call the \Rfunction{getAmigoTree{}} mentioned in the example section. %------------------------------------------------------------ \section{View and edit GO trees in Cytoscape} %------------------------------------------------------------ The \Rfunction{adjM2gml} function in \Rpackage{RamiGO} creates a Cytoscape specific GML file (see example section above) that can be imported into Cytoscape and further edited (for example for publication purposes). The GO tree from the example above, parsed with the \Rfunction{readAmigoDot} function, exported with the \Rfunction{adjM2gml} and imported into Cytoscape as a network, looks like Figure \ref{fig:cyto}. \begin{figure}[h!] \centering \includegraphics[width=1\textwidth]{cyto.png} \caption{Example GML imported in Cytoscape.} \label{fig:cyto} \end{figure} %------------------------------------------------------------ \section{Misc} %------------------------------------------------------------ \Rfunction{strapply} enables perl-like regular expression in R, as do \Rfunction{grep, sub or gsub}. In particular, it enables the use of the perl variables \$1, \$2, ... for extracting information from within a regular expression. The code below shows an example of the use of \Rfunction{strapply}. The string within brackets (...) is returned in a list by \Rfunction{strapply}. <>= strapply(c("node25 -> node30"), "node([\\d]+) -> node([\\d]+)", c, backref = -2) @ The \Rpackage{RCurl} package is useful for communicating with a web server and sending GET or POST requests. \Rpackage{RamiGO} uses the \Rfunction{postForm()} function to communicate with the AmiGO web server. The \Rpackage{png} package is used to convert the web server response for a png request into an actual png file. The \Rpackage{igraph} package is used to build a graph object representing the tree that was parsed from an DOT format file. \newpage %------------------------------------------------------------ \section{Session Info} %------------------------------------------------------------ <>== toLatex(sessionInfo()) @ \end{document}