%\VignetteIndexEntry{miRNAtap}
%\VignetteKeywords{miRNAtap, miRNA, targets}
%\VignettePackage{miRNAtap}
\documentclass{article}
\usepackage{amsmath}
\usepackage{amscd}
\usepackage[tableposition=top]{caption}
\usepackage{ifthen}
\usepackage{natbib}
\usepackage[utf8]{inputenc}
\usepackage{Sweave}
\SweaveOpts{prefix.string = tP}

\begin{document}

\title{{\bf miRNAtap} example use}
\author{Maciej Pajak, Ian Simpson}
\date{\today}
\maketitle
\tableofcontents
\newpage

\section{Introduction}

The {\tt miRNAtap} package is designed to facilitate the implementation of workflows that require miRNA target prediction. It aggregates the outputs of commonly used prediction algorithms in a way that improves on the performance of each individual algorithm when compared against experimentally derived targets. A microRNA (miRNA) is an 18--22 nt long single-stranded RNA that binds to RISC (RNA-induced silencing complex) and targets mRNAs, effectively reducing their translation rates.

Targets are aggregated from the 4 most commonly cited prediction algorithms: DIANA \citep{Maragkakis2011}, Miranda \citep{Enright2003}, PicTar \citep{Lall2006} and TargetScan \citep{Friedman2009}.

Programmatic access to data sources is crucial for streamlining an analysis workflow: it lets us run similar analyses for multiple input miRNAs or with other parameter settings. Not only does it allow us to obtain predictions from multiple sources directly in R, but aggregating the sources also improves the quality of the predictions.

Finally, although direct predictions from all sources are only available for {\it Homo sapiens} and {\it Mus musculus}, this package includes an algorithm that translates target genes to other species (currently only {\it Rattus norvegicus}) using homology information where direct targets are not available.

\section{Installation}

This section briefly describes the necessary steps to get {\tt miRNAtap} running on your system.
We assume that the user has the R program (see the R project at http://www.r-project.org) already installed and is familiar with it. You will need R 3.2.0 or later to install and run {\tt miRNAtap}.

The {\tt miRNAtap} package is available from the Bioconductor repository at http://www.bioconductor.org. To install the package, one first needs to install the core Bioconductor packages. If you have already installed Bioconductor packages on your system, you can skip the two lines below.

<<eval=FALSE>>=
source("http://bioconductor.org/biocLite.R")
biocLite()
@

Once the core Bioconductor packages are installed, we can install {\tt miRNAtap} and its accompanying database package {\tt miRNAtap.db} with

<<eval=FALSE>>=
source("http://bioconductor.org/biocLite.R")
biocLite("miRNAtap")
biocLite("miRNAtap.db")
@

\section{Workflow}

This section explains how the {\tt miRNAtap} package can be integrated into a workflow aimed at predicting which processes can be regulated by a given microRNA.

In this example workflow we will use {\tt miRNAtap} together with another Bioconductor package, {\tt topGO}, and Gene Ontology (GO) annotations. If {\tt topGO} or the GO annotations are not yet installed on our machine, we need to install them first:

<<eval=FALSE>>=
source("http://bioconductor.org/biocLite.R")
biocLite("topGO")
biocLite("org.Hs.eg.db")
@

Then, let's load the required libraries:

<<>>=
library(miRNAtap)
library(topGO)
library(org.Hs.eg.db)
@

Now we can start the analysis. First, we will obtain predicted targets for the human miRNA {\it miR-10b}:

<<>>=
mir = 'miR-10b'
predictions = getPredictedTargets(mir, species = 'hsa',
                                  method = 'geom', min_src = 2)
@

Let's inspect the top of the prediction list.

<<>>=
head(predictions)
@

We use the {\it geometric mean} aggregation method because it performs best when tested against experimental data from miRBase \citep{Griffiths-Jones2008}.
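
To make the difference between the aggregation methods more concrete, the chunk below is a minimal base R sketch of the two ideas on made-up data. It is an illustration only and not the implementation used inside {\tt miRNAtap}, which may normalise ranks or treat missing values differently. The matrix {\tt toy\_ranks}, its gene and source names, and the helper variables are all hypothetical, and the filtering step reflects our reading of {\tt min\_src} as the minimum number of sources that must predict a target.

<<>>=
# Toy ranks (1 = best) of four hypothetical genes in three hypothetical
# sources; NA means the source does not predict that gene as a target.
toy_ranks = rbind(geneA = c(1, 2, 4),
                  geneB = c(2, 1, NA),
                  geneC = c(3, NA, NA),
                  geneD = c(4, 3, 1))
colnames(toy_ranks) = c('src1', 'src2', 'src3')

# keep genes supported by at least two sources (assumed meaning of min_src = 2)
n_src = rowSums(!is.na(toy_ranks))
kept = toy_ranks[n_src >= 2, , drop = FALSE]

# geometric mean of the available ranks: exp(mean(log(rank)))
geom_score = apply(kept, 1, function(r) exp(mean(log(r), na.rm = TRUE)))
# minimum of the available ranks
min_score = apply(kept, 1, min, na.rm = TRUE)

# a lower score means a stronger consensus prediction
sort(geom_score)
sort(min_score)
@

In this toy example the minimum method gives all three retained genes the same score, because each of them is ranked first by some source, whereas the geometric mean separates them by taking every available rank into account.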
For {\it miR-10b}, we can compare the geometric mean ranking with the top of the list produced by the {\it minimum} method:

<<>>=
predictions_min = getPredictedTargets(mir, species = 'hsa',
                                      method = 'min', min_src = 2)
head(predictions_min)
@

Where predictions for rat genes are not available, we can obtain predictions for mouse genes and translate them into rat genes through homology. This happens automatically if we specify the species as {\tt rno} (for {\it Rattus norvegicus}):

<<>>=
predictions_rat = getPredictedTargets(mir, species = 'rno',
                                      method = 'geom', min_src = 2)
@

Now we can use the ranked results as input to a GO enrichment analysis. For that we will use our initial predictions for human {\it miR-10b}:

<<>>=
rankedGenes = predictions[,'rank_product']
# we do not want to impose a cut-off; instead we use the rank information
selection = function(x) TRUE
allGO2genes = annFUN.org(whichOnto='BP', feasibleGenes = NULL,
                         mapping="org.Hs.eg.db", ID = "entrez")
GOdata = new('topGOdata', ontology = 'BP', allGenes = rankedGenes,
             annot = annFUN.GO2genes, GO2genes = allGO2genes,
             geneSel = selection, nodeSize=10)
@

To make use of the rank information, we will use the Kolmogorov--Smirnov (K-S) test instead of Fisher's exact test, which is based only on counts.

<<>>=
results.ks = runTest(GOdata, algorithm = "classic", statistic = "ks")
results.ks
@

We can view the most enriched GO terms (and potentially feed them into further steps of our workflow):

<<>>=
allRes = GenTable(GOdata, KS = results.ks, orderBy = "KS", topNodes = 20)
allRes[,c('GO.ID','Term','KS')]
@

For more details about GO analysis, refer to the {\tt topGO} package vignette \citep{topgo}.

Finally, we can use our predictions in a similar way for pathway enrichment analysis based on KEGG \citep{Kanehisa2000}, for example using Bioconductor's {\tt KEGGprofile} \citep{keggprofile}.
\section{Session Information}

<<results=tex>>=
toLatex(sessionInfo())
@

\addcontentsline{toc}{section}{References}
\label{references}
\bibliography{refs}
\bibliographystyle{apalike}

\end{document}