% \VignetteIndexEntry{Part 3: How to visualize genomic features}
% \VignetteDepends{}
% \VignetteKeywords{visualization utilities}
% \VignettePackage{ggbio}
\documentclass[11pt]{report}
% \usepackage{times}
\usepackage{hyperref}
\usepackage{verbatim}
\usepackage{graphicx}
\usepackage{fancybox}
\usepackage{color}

<<>>=
opts_chunk$set(eval=FALSE)
@
% \setkeys{Gin}{width=0.95\textwidth}

\textwidth=6.5in
\textheight=8.5in
\parskip=.3cm
\parindent = 0cm
\oddsidemargin=-.1in
\evensidemargin=-.1in
\headheight=-.3in

\newcommand{\Rfunction}[1]{{\texttt{#1}}}
\newcommand{\Robject}[1]{{\texttt{#1}}}
\newcommand{\Rpackage}[1]{{\textit{#1}}}
\newcommand{\Rmethod}[1]{{\texttt{#1}}}
\newcommand{\Rfunarg}[1]{{\texttt{#1}}}
\newcommand{\Rclass}[1]{{\textit{#1}}}
\newcommand{\Rcode}[1]{{\texttt{#1}}}
\newcommand{\software}[1]{\textsf{#1}}
\newcommand{\R}{\software{R}}
\newcommand{\Bioc}{\software{Bioconductor}}
\newcommand{\IRanges}{\Rpackage{IRanges}}
\newcommand{\biovizBase}{\Rpackage{biovizBase}}
\newcommand{\ggbio}{\Rpackage{ggbio}}
\newcommand{\visnab}{\Rpackage{visnab}}
\newcommand{\ggplot}{\Rpackage{ggplot2}}
\newcommand{\grid}{\Rpackage{grid}}
\newcommand{\gridExtra}{\Rpackage{gridExtra}}
\newcommand{\qplot}{\Rfunction{qplot}}
\newcommand{\autoplot}{\Rfunction{autoplot}}
\newcommand{\knitr}{\Rpackage{knitr}}
\newcommand{\tracks}{\Rfunction{tracks}}
\newcommand{\chipseq}{\Rpackage{chipseq}}

% my own framebox
\newcommand{\sfbox}[2][Tips]{
  \begin{center}
    \shadowbox{
      \parbox{0.8\linewidth}{
        \textcolor{blue}{#1:} #2
      }
    }
  \end{center}
}

\title{Visualization of genomic features}
\author{Tengfei Yin}
\date{\today}

\begin{document}
% \setkeys{Gin}{width=0.6\textwidth}
\maketitle
\newpage
\tableofcontents
\newpage

<<>>=
library(knitr)
opts_chunk$set(fig.path='./figures/ggbio-', fig.align='center', fig.show='asis', eval = TRUE, fig.width = 5, fig.height = 5)
options(replace.assign=TRUE,width=90)
@

<<>>=
options(width=72)
@

\section{Introduction}
Transcript-centric annotation is one of the most
useful tracks, and it is frequently aligned with other data in many genome browsers. In \Bioc{}, you can either request data on the fly from UCSC or BioMart, which requires an internet connection, or save frequently used annotation data for a particular organism, for example the human genome, as a local database. The package \Rpackage{GenomicFeatures} provides a convenient API for making and manipulating such databases. \Bioc{} also provides some frequently used genome annotations as pre-built packages for easy installation. For instance, for the human genome (hg19) there is a metadata package called \Rpackage{TxDb.Hsapiens.UCSC.hg19.knownGene}; after you load this package, a \Robject{TranscriptDb} object called \Rcode{TxDb.Hsapiens.UCSC.hg19.knownGene} is visible in your workspace. This object contains information such as coding regions, exons, introns, UTRs and transcripts for this genome. If you cannot find the organism you want among the \Bioc{} metadata packages, please refer to the vignette of the package \Rpackage{GenomicFeatures} to see how to build your own database manually.

\ggbio{} provides visualization utilities based on this specific object. In the following tutorial we cover:
\begin{itemize}
\item How to plot genomic features for a certain region, including coding regions, introns and UTRs.
\item How to change the geom of introns, and how to revise arrow size and density.
\item How to change aesthetics such as colors.
\item How to plot a single reduced gene model by applying the statistical transformation ``reduce''.
\item How to revise the y label using an expression and pattern.
\item How to change the x-scale unit, for example to \textit{kb} or \textit{bp}.
\item How to use the lower-level API.
\end{itemize}

\section{Usage}
\subsection{autoplot}
The \autoplot{} API is a higher-level API in \ggbio{} that tries to make smart decisions for object-oriented graphics. Another vignette gives a more detailed introduction to this function. In this tutorial, we focus solely on the visualization of \Robject{TranscriptDb} objects.
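Before plotting, it can help to look at what a \Robject{TranscriptDb} object holds. The chunk below is a minimal sketch, not part of the plotting workflow: it uses accessor functions from \Rpackage{GenomicFeatures} to list features that overlap the chr16 interval used in this tutorial. The object names \Rcode{txdb.demo} and \Rcode{region} are only illustrative, and the chunk is marked \Rcode{eval=FALSE}, so it is not run when the vignette is built.

<<eval=FALSE>>=
library(GenomicFeatures)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
## illustrative name; the rest of the tutorial uses 'txdb'
txdb.demo <- TxDb.Hsapiens.UCSC.hg19.knownGene
## region of interest (the same chr16 interval used for plotting)
region <- GRanges("chr16", IRanges(30064491, 30081734))
## transcripts, exons and coding regions that overlap the region
subsetByOverlaps(transcripts(txdb.demo), region)
subsetByOverlaps(exons(txdb.demo), region)
subsetByOverlaps(cds(txdb.demo), region)
## introns and UTRs are returned grouped by transcript (GRangesList)
introns <- intronsByTranscript(txdb.demo)
utr5 <- fiveUTRsByTranscript(txdb.demo)
utr3 <- threeUTRsByTranscript(txdb.demo)
@

These are the same kinds of features (coding regions, introns, UTRs) that \autoplot{} parses from the database when it draws the transcripts.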
<<>>=
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
## suppose you already know the region you want to visualize
## or, for the human genome, you can try the following commented code
## data(genesymbol, package = "biovizBase")
## genesymbol["ALDOA"]
aldoa.gr <- GRanges("chr16", IRanges(30064491, 30081734))
aldoa.gr
@ %def

<<>>=
library(ggbio)
p1 <- autoplot(txdb, which = aldoa.gr)
p1
@ %

You can change some aesthetics, such as colors, in \autoplot{}. A rectangle is drawn with \Rfunarg{color} for the border color and \Rfunarg{fill} for the fill color.

<<>>=
library(ggbio)
p1 <- autoplot(txdb, which = aldoa.gr, fill = "brown", color = "brown")
p1
@ %

The \autoplot{} method for \Robject{TranscriptDb} objects supports two statistical transformations.
\begin{itemize}
\item \textbf{identity}: full model, showing each transcript and parsing coding regions, introns and UTRs automatically from the database. Introns are shown with small arrows to indicate the direction, exons are represented as wider rectangles, and UTRs are represented as narrower rectangles. This transformation is shown in Figure \ref{fig:default}.
\item \textbf{reduce}: reduced model, showing a single reduced model that takes the union of CDS and UTRs and re-computes the introns, as shown in Figure \ref{fig:reduce}.
\end{itemize}

<<>>=
p2 <- autoplot(txdb, which = aldoa.gr, stat = "reduce")
print(p2)
@ %def

To better understand the behavior of the ``reduce'' transformation, we lay out these two graphics as tracks, as shown in Figure \ref{fig:track}. The function \Rfunction{tracks} is introduced in detail in another vignette.
<<>>=
tracks(full = p1, reduced = p2, heights = c(4,1)) + theme_alignment(grid=FALSE, border = FALSE)
@ %def

We allow users to change how introns are visualized. This is controlled by the parameter \Rfunarg{gap.geom}, which supports three geoms:
\begin{itemize}
\item \textbf{arrow}: small arrows indicate the strand direction. Extra parameters control the appearance of the arrows, as shown in Figure \ref{fig:gap.geom-up}; \textbf{arrow.rate} controls how densely the arrows are drawn along the intron.
\item \textbf{chevron}: chevrons are drawn for introns, with no strand indication. Please see \Rfunction{geom\_chevron}.
\item \textbf{segment}: segments are drawn for introns, with no strand indication.
\end{itemize}

The geometric objects for ranges, introns and UTRs are controlled by the parameters \Rfunarg{range.geom}, \Rfunarg{gap.geom} and \Rfunarg{utr.geom}. For example, if you want to change the geom for gaps, just change \Rfunarg{gap.geom}.

<<>>=
autoplot(txdb, which = aldoa.gr, gap.geom = "chevron")
@ %def

<<>>=
library(grid)
autoplot(txdb, which = aldoa.gr, arrow.rate = 0.001, length = unit(0.35, "cm"))
@ %def

We also allow users to parse y labels from existing columns in the \Robject{TranscriptDb} object.

<<>>=
p <- autoplot(txdb, which = aldoa.gr, names.expr = "gene_id:::tx_name")
p
@ %def

\clearpage
\Rfunction{scale\_x\_sequnit} is an add-on utility to revise the x-scale. It provides three units:
\begin{itemize}
\item \textbf{mb}: 1e6 bp unit; the default for \Rcode{autoplot,TranscriptDb}.
\item \textbf{kb}: 1e3 bp unit.
\item \textbf{bp}: 1 bp unit.
\end{itemize}
This is just a post-graphic modification and does not re-run the parsing process. Figure \ref{fig:change-unit} shows the x-scale changed to kb.

\begin{figure}[!htpb]
\centering
<<>>=
p + scale_x_sequnit("kb")
@ %def
\caption{Change the unit to kb.}
\label{fig:change-unit}
\end{figure}

\subsection{geom\_alignment}
\Rfunction{stat\_gene} is deprecated, and \Rfunction{geom\_alignment} is the lower-level API, which facilitates constructing a plot layer by layer.

<<>>=
p1 <- ggplot() + geom_alignment(txdb, which = aldoa.gr)
@ %def

\end{document}