% \VignetteIndexEntry{Part 3: How to visualize genomic features} 
% \VignetteDepends{} 
% \VignetteKeywords{visualization utilities} 
% \VignettePackage{ggbio}
\documentclass[11pt]{report}
% \usepackage{times}
\usepackage{hyperref}
\usepackage{verbatim}
\usepackage{graphicx}
\usepackage{fancybox}
\usepackage{color}

<<include=FALSE>>=
opts_chunk$set(eval=FALSE)
@

% \setkeys{Gin}{width=0.95\textwidth}

\textwidth=6.5in
\textheight=8.5in
\parskip=.3cm
\parindent = 0cm
\oddsidemargin=-.1in
\evensidemargin=-.1in
\headheight=-.3in

\newcommand{\Rfunction}[1]{{\texttt{#1}}}
\newcommand{\Robject}[1]{{\texttt{#1}}}
\newcommand{\Rpackage}[1]{{\textit{#1}}}
\newcommand{\Rmethod}[1]{{\texttt{#1}}}
\newcommand{\Rfunarg}[1]{{\texttt{#1}}}
\newcommand{\Rclass}[1]{{\textit{#1}}}
\newcommand{\Rcode}[1]{{\texttt{#1}}}

\newcommand{\software}[1]{\textsf{#1}}
\newcommand{\R}{\software{R}}
\newcommand{\Bioc}{\software{Bioconductor}}
\newcommand{\IRanges}{\Rpackage{IRanges}}
\newcommand{\biovizBase}{\Rpackage{biovizBase}}
\newcommand{\ggbio}{\Rpackage{ggbio}}
\newcommand{\visnab}{\Rpackage{visnab}}
\newcommand{\ggplot}{\Rpackage{ggplot2}}
\newcommand{\grid}{\Rpackage{grid}}
\newcommand{\gridExtra}{\Rpackage{gridExtra}}
\newcommand{\qplot}{\Rfunction{qplot}}
\newcommand{\autoplot}{\Rfunction{autoplot}}
\newcommand{\knitr}{\Rpackage{knitr}}
\newcommand{\tracks}{\Rfunction{tracks}}
\newcommand{\chipseq}{\Rpackage{chipseq}}


% my own frambox
\newcommand{\sfbox}[2][Tips]{
\begin{center}
\shadowbox{
  \parbox{0.8\linewidth}{
    \textcolor{blue}{#1:}
    #2
  }
  }
\end{center}
}

\title{Visualization of genomic features}
\author{Tengfei Yin}
\date{\today}

\begin{document}
% \setkeys{Gin}{width=0.6\textwidth}
\maketitle
\newpage
\tableofcontents
\newpage

<<setup, include=FALSE, cache=FALSE, eval = TRUE>>=
library(knitr)
opts_chunk$set(fig.path='./figures/ggbio-', 
               fig.align='center', fig.show='asis', 
               eval = TRUE, fig.width = 5,
               fig.height = 5)
options(replace.assign=TRUE,width=90)
@


<<options,echo=FALSE>>=
options(width=72)
@

\section{Introduction}
Transcript-centric annotation is one of the most useful tracks and is
frequently aligned with other data in genome browsers. In \Bioc{}, you can
either request data on the fly from UCSC or BioMart, which requires an internet
connection, or save frequently used annotation data for a particular organism,
for example the human genome, as a local database. The
\Rpackage{GenomicFeatures} package provides a convenient API for building and
manipulating such databases. \Bioc{} also provides pre-built annotation for
frequently used genomes as packages for easy installation. For instance, for
the human genome (hg19) there is a metadata package called
\Rpackage{TxDb.Hsapiens.UCSC.hg19.knownGene}; after you load this package, a
\Robject{TranscriptDb} object called \Rcode{TxDb.Hsapiens.UCSC.hg19.knownGene}
becomes visible in your workspace. This object contains information such as
coding regions, exons, introns, UTRs, and transcripts for this genome. If you
cannot find the organism you want among the \Bioc{} metadata packages, please
refer to the vignette of the \Rpackage{GenomicFeatures} package to learn how to
build your own database manually.
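To get a feel for what the object holds, the chunk below sketches a few
standard \Rpackage{GenomicFeatures} accessors (these belong to
\Rpackage{GenomicFeatures}, not to \ggbio{}). It is not evaluated here, because
extracting features for the whole genome can take a while.

<<txdb-accessors, eval = FALSE>>=
library(GenomicFeatures)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
txdb                                  ## summary of the annotation database
tx <- transcripts(txdb)               ## one range per transcript
ex.by.tx <- exonsBy(txdb, by = "tx")  ## exons grouped by transcript
cds.by.tx <- cdsBy(txdb, by = "tx")   ## coding regions grouped by transcript
introns <- intronsByTranscript(txdb)  ## introns grouped by transcript
utr5 <- fiveUTRsByTranscript(txdb)    ## 5' UTRs grouped by transcript
utr3 <- threeUTRsByTranscript(txdb)   ## 3' UTRs grouped by transcript
@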


\ggbio{} provides visualization utilities for this object. This tutorial
covers the following topics:
\begin{itemize}
\item How to plot genomic features for a certain region, including coding
  regions, introns, and UTRs.
\item How to change the geom used for introns, and how to adjust arrow size and density.
\item How to change aesthetics such as colors.
\item How to plot a single collapsed gene model with the ``reduce'' statistical transformation.
\item How to customize y-axis labels using an expression pattern.
\item How to change the x-scale unit, for example to \textit{kb} or \textit{bp}.
\item How to use the lower-level API.
\end{itemize}


\section{Usage}
\subsection{autoplot}
The \autoplot{} function is the higher-level API in \ggbio{}; it dispatches on
the class of its input and tries to choose sensible defaults for that object.
Another vignette gives a more detailed introduction to this function.

In this tutorial, we focus solely on visualizing \Robject{TranscriptDb}
objects.

<<>>=
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
## suppose you already know the region you want to visualize;
## for the human genome, you can also look it up by gene symbol
## with the following commented code
## data(genesymbol, package = "biovizBase")
## genesymbol["ALDOA"]
aldoa.gr <- GRanges("chr16", IRanges(30064491, 30081734))
aldoa.gr
@ %def 
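\autoplot{} draws the features that overlap the \Rfunarg{which} region, so
widening the region shows more context around the gene and can bring
neighboring features into view. The sketch below pads the region by 2 kb on
each side; the flank size and the names \Robject{flank.size} and
\Robject{aldoa.flank} are illustrative choices, not part of the original
example.

<<region-flank, eval = FALSE>>=
library(ggbio)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
aldoa.gr <- GRanges("chr16", IRanges(30064491, 30081734))
## hypothetical flank size: 2 kb on each side of the gene
flank.size <- 2000
aldoa.flank <- GRanges(seqnames(aldoa.gr),
                       IRanges(start(aldoa.gr) - flank.size,
                               end(aldoa.gr) + flank.size))
autoplot(txdb, which = aldoa.flank)
@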


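\begin{figure}[!htpb]
  \centering
<<txdb-full, fig.height = 4.5>>=
library(ggbio)
p1 <- autoplot(txdb, which = aldoa.gr)
p1
@ %  
  \caption{Full gene model (``identity'' transformation) for the ALDOA region.}
  \label{fig:default}
\end{figure}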

You can change aesthetics such as colors in \autoplot{}. For rectangles,
\Rfunarg{color} sets the border color and \Rfunarg{fill} sets the fill color.

<<txdb-full-aes, fig.height = 4.5>>=
library(ggbio)
p1 <- autoplot(txdb, which = aldoa.gr, fill = "brown", color = "brown")
p1
@ %  

The \autoplot{} method for \Robject{TranscriptDb} objects supports two
statistical transformations, selected with the \Rfunarg{stat} argument:

\begin{itemize}
\item \textbf{identity}: the full model, showing each transcript. Coding
  regions, introns, and UTRs are parsed automatically from the database.
  Introns are shown with small arrows indicating the strand direction, exons
  are drawn as wide rectangles, and UTRs as narrow rectangles. This
  transformation is shown in Figure \ref{fig:default}.
\item \textbf{reduce}: the reduced model, showing a single collapsed model
  that takes the union of CDS and UTRs and re-computes the introns, as shown
  in Figure \ref{fig:reduce}.
\end{itemize}
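The idea behind ``reduce'' can be sketched with plain \IRanges{} operations on
toy data. This only illustrates the concept and is not \ggbio{}'s internal
code: overlapping exon ranges from different transcripts are merged with
\Rfunction{reduce}, and the introns of the collapsed model are the
\Rfunction{gaps} between the merged ranges. The two transcripts below are
hypothetical.

<<reduce-toy>>=
library(IRanges)
## exons of two hypothetical, partially overlapping transcripts
tx1 <- IRanges(start = c(100, 300, 600), end = c(200, 400, 700))
tx2 <- IRanges(start = c(150, 350, 800), end = c(250, 450, 900))
## union of all exons: overlapping ranges are merged into one
merged <- reduce(c(tx1, tx2))
merged
## re-computed introns: the gaps between the merged exons
gaps(merged)
@

Here \Robject{merged} has four ranges (100--250, 300--450, 600--700, and
800--900), and the re-computed introns are 251--299, 451--599, and 701--799.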


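\begin{figure}[!htpb]
  \centering
<<txdb-reduce, fig.height = 1.5>>=
p2 <- autoplot(txdb, which = aldoa.gr, stat = "reduce")
print(p2)
@ %def   
  \caption{Reduced gene model (``reduce'' transformation) for the ALDOA region.}
  \label{fig:reduce}
\end{figure}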

To better understand the behavior of the ``reduce'' transformation, we lay out
the two graphics with \tracks{}, as shown in Figure \ref{fig:track}. The
\tracks{} function is introduced in detail in another vignette.

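\begin{figure}[!htpb]
  \centering
<<tracks, fig.height = 4.5>>=
tracks(full = p1, reduced = p2, heights = c(4,1)) + 
  theme_alignment(grid=FALSE, border = FALSE) 
@ %def   
  \caption{Full (top) and reduced (bottom) models aligned with \tracks{}.}
  \label{fig:track}
\end{figure}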


Users can change how introns are drawn with the \Rfunarg{gap.geom} argument,
which supports three geoms:
\begin{itemize}
\item \textbf{arrow} (the default): small arrows indicate the strand
  direction. Extra arguments control the appearance of the arrows, as shown in
  Figure \ref{fig:gap.geom-up}: \Rfunarg{arrow.rate} controls how densely the
  arrows are drawn along the intron, and \Rfunarg{length} (a \grid{} unit)
  sets the arrow size.
\item \textbf{chevron}: introns are drawn as chevrons, with no strand
  indication; see \Rfunction{geom\_chevron}.
\item \textbf{segment}: introns are drawn as plain segments, with no strand
  indication.
\end{itemize}


The geoms for ranges, introns, and UTRs are controlled by the arguments
\Rfunarg{range.geom}, \Rfunarg{gap.geom}, and \Rfunarg{utr.geom}, respectively.
For example, to change the geom for introns (gaps), change \Rfunarg{gap.geom}.

<<change-intron-geom, fig.height = 4.5>>=
autoplot(txdb, which = aldoa.gr, gap.geom = "chevron")
@ %def   

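\begin{figure}[!htpb]
  \centering
<<change-intron-geom-arrow, fig.height = 4.5>>=
library(grid)
autoplot(txdb, which = aldoa.gr, arrow.rate = 0.001, length = unit(0.35, "cm"))
@ %def   
  \caption{Introns drawn as arrows, with arrow density set by
    \Rfunarg{arrow.rate} and arrow size set by \Rfunarg{length}.}
  \label{fig:gap.geom-up}
\end{figure}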
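The third intron geom listed above, \Rcode{"segment"}, is selected the same way.
The chunk below is a sketch that mirrors the chevron example; the object name
\Robject{p.segment} is illustrative, and the chunk is not evaluated here.

<<change-intron-geom-segment, eval = FALSE>>=
library(ggbio)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
aldoa.gr <- GRanges("chr16", IRanges(30064491, 30081734))
## draw introns as plain segments (no strand indication)
p.segment <- autoplot(txdb, which = aldoa.gr, gap.geom = "segment")
p.segment
@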

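Y-axis labels can also be built from existing columns of the
\Robject{TranscriptDb} object: \Rfunarg{names.expr} takes a pattern that refers
to these columns (here \Rcode{gene\_id} and \Rcode{tx\_name}).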

<<parsing-expression, fig.height = 4.5>>=
p <- autoplot(txdb, which = aldoa.gr, names.expr = "gene_id:::tx_name")
p
@ %def 
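A simpler pattern may be enough in many cases. The sketch below assumes that a
pattern containing a single column name, \Rcode{tx\_name}, is accepted and
labels each transcript by its transcript name only; this assumption has not
been checked here, so the chunk is not evaluated.

<<parsing-expression-txname, eval = FALSE>>=
library(ggbio)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
aldoa.gr <- GRanges("chr16", IRanges(30064491, 30081734))
## assumption: a pattern with a single column name labels each
## transcript by that column alone
p.txname <- autoplot(txdb, which = aldoa.gr, names.expr = "tx_name")
p.txname
@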
  

\clearpage
\Rfunction{scale\_x\_sequnit} is an add-on utility for changing the unit of the
x-axis. It provides three units:
\begin{itemize}
\item \textbf{mb}: $10^6$ bp; the default for \autoplot{} on a \Robject{TranscriptDb} object.
\item \textbf{kb}: $10^3$ bp.
\item \textbf{bp}: 1 bp.
\end{itemize}
This is only a post-plotting modification; it does not re-run the parsing of
the annotation database. Figure \ref{fig:change-unit} shows the x-axis in kb.


\begin{figure}[!htpb]
  \centering
<<change-unit, fig.height=4.5>>=
p + scale_x_sequnit("kb")
@ %de  
  \caption{Changing the x-axis unit to kb.}
  \label{fig:change-unit}
\end{figure}
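To make the units concrete, the relabeling amounts to dividing base-pair
positions by a constant. The sketch below only illustrates this arithmetic on
the boundaries of the ALDOA region used earlier and does not call \ggbio{};
the tick positions on the plot itself are chosen by \ggplot{}.

<<sequnit-arithmetic>>=
## ALDOA region boundaries in bp (same values as aldoa.gr)
pos <- c(30064491, 30081734)
pos / 1e6   ## the same positions expressed in Mb
pos / 1e3   ## the same positions expressed in kb
@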

\subsection{geom\_alignment}
\Rfunction{stat\_gene} is deprecated; \Rfunction{geom\_alignment} is the
lower-level API that supports building a plot layer by layer.
<<stat_gene, eval = FALSE>>=
## add the gene model as a layer to an empty ggplot object
p1 <- ggplot() + geom_alignment(txdb, which = aldoa.gr)
p1
@ %def 
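Because the plot is assembled from layers, further components can be added
with \Rcode{+}. The chunk below is a sketch that combines the
\Rfunction{geom\_alignment} layer with \Rfunction{scale\_x\_sequnit} and a
\ggplot{} title set through \Rfunction{labs}; the title text and the name
\Robject{p.layer} are arbitrary choices, and the chunk is not evaluated here.

<<geom_alignment-layers, eval = FALSE>>=
library(ggbio)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
aldoa.gr <- GRanges("chr16", IRanges(30064491, 30081734))
## start from an empty plot and add the gene-model layer
p.layer <- ggplot() + geom_alignment(txdb, which = aldoa.gr)
## further scales and labels are added with "+", as in ggplot2
p.layer <- p.layer + scale_x_sequnit("kb") + labs(title = "ALDOA")
p.layer
@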


\end{document}