% \VignetteIndexEntry{Part 4: overview layout(circular, karyogram, Manhattan)} 
% \VignetteDepends{} 
% \VignetteKeywords{visualization utilities} 
% \VignettePackage{ggbio}
\documentclass[11pt]{report}
% \usepackage{times}
\usepackage{hyperref}
\usepackage{verbatim}
\usepackage{graphicx}
\usepackage{fancybox}
\usepackage{color}

<<include=FALSE>>=
opts_chunk$set(eval=FALSE)
@

% \setkeys{Gin}{width=0.95\textwidth}
\textwidth=6.5in
\textheight=8.5in
\parskip=.3cm
\parindent = 0cm
\oddsidemargin=-.1in
\evensidemargin=-.1in
\headheight=-.3in

\newcommand{\Rfunction}[1]{{\texttt{#1}}}
\newcommand{\Robject}[1]{{\texttt{#1}}}
\newcommand{\Rpackage}[1]{{\textit{#1}}}
\newcommand{\Rmethod}[1]{{\texttt{#1}}}
\newcommand{\Rfunarg}[1]{{\texttt{#1}}}
\newcommand{\Rclass}[1]{{\textit{#1}}}
\newcommand{\Rcode}[1]{{\texttt{#1}}}

\newcommand{\software}[1]{\textsf{#1}}
\newcommand{\R}{\software{R}}
\newcommand{\gr}{\Rclass{GRanges}}
\newcommand{\Bioc}{\software{Bioconductor}}
\newcommand{\IRanges}{\Rpackage{IRanges}}
\newcommand{\biovizBase}{\Rpackage{biovizBase}}
\newcommand{\ggbio}{\Rpackage{ggbio}}
\newcommand{\visnab}{\Rpackage{visnab}}
\newcommand{\ggplot}{\Rpackage{ggplot2}}
\newcommand{\grid}{\Rpackage{grid}}
\newcommand{\gridExtra}{\Rpackage{gridExtra}}
\newcommand{\qplot}{\Rfunction{qplot}}
\newcommand{\autoplot}{\Rfunction{autoplot}}
\newcommand{\knitr}{\Rpackage{knitr}}
\newcommand{\tracks}{\Rfunction{tracks}}
\newcommand{\chipseq}{\Rpackage{chipseq}}


% my own frambox
\newcommand{\sfbox}[2][Tips]{
\begin{center}
\shadowbox{
  \parbox{0.8\linewidth}{
    \textcolor{blue}{#1:}
    #2
  }
  }
\end{center}
}

\title{Overview layout}
\author{Tengfei Yin}
\date{\today}

\begin{document}
% \setkeys{Gin}{width=0.6\textwidth}
\maketitle
\newpage
\tableofcontents
\newpage

<<setup, include=FALSE, cache=FALSE, eval = TRUE>>=
library(knitr)
opts_chunk$set(fig.path='./figures/ggbio-', 
               fig.align='center', fig.show='asis', 
               eval = TRUE, fig.width = 4,
               dev = 'png',
               fig.height = 4, dpi = 400)
options(replace.assign=TRUE,width=90)
@


<<options,echo=FALSE>>=
options(width=72)
@

\chapter{Circular view}\label{chapter:circle}
\section{Introduction}
Layout "circle" is inspired by \textit{Circos}

graphics and make it a general layout. Layout is generally more complex than a
coordinate transformation, it's a combination of different components like
coordinate transformation(genome and polar), and tracks-based layout,
etc. Especially, circular view is very useful to show links between different
locations. Since we are following the grammar of graphics, aesthetics mapping
are fairly easy in \ggbio{}.

In this tutorial, we will start from the raw data, if you are already familiar
with how to process your data into the right format, which here I mean
\Robject{GRanges},you can jump to \ref{sec:step3} directly.
\section{Tutorial}
\subsection{Step 1: understand the layout circle}
We have discussed about the new coordinate "genome" in vignette about Manhattan
plot before, now this time, it's one step further compared to genome coordinate
transformation. We specify ring radius \Rfunarg{radius} and track width
\Rfunarg{trackWidth} to help transform a linear genome coordinate system to a
circular coordinate system. By using \Rfunction{layout\_circle} function which we
will introduce later.

Before we visualize our data, we need to have something in mind
\begin{itemize}
\item How many tracks we want?
\item Can they be combined into the same data?
\item Do I have chromosomes lengths information?
\item Do I have interesting variables attached as one column?
\end{itemize}


\subsection{Step 2: get your data ready to plot}
Ok, let's start to process some raw data to the format we want. The data used in
this study is from this a
paper\footnote{http://www.nature.com/ng/journal/v43/n10/full/ng.936.html}. In
this example, We are going to
\begin{enumerate}
\item Visualize somatic mutation as segment.
\item Visualize inter,intro-chromosome rearrangement as links.
\item Visualize mutation score as point tracks with grid-background.
\item Add scale and ticks and labels.  
\item To arrange multiple plots and legend. create multiple sample comparison.
\end{enumerate}

Notes: don't put too much tracks on it.  


I simply put script here to get mutation data as `GRanges` object.


<<processing>>=
crc1 <- system.file("extdata", "crc1-missense.csv", package = "biovizBase")
crc1 <- read.csv(crc1)
library(GenomicRanges)
mut.gr <- with(crc1,GRanges(Chromosome, IRanges(Start_position, End_position),
                            strand = Strand))
values(mut.gr) <- subset(crc1, select = -c(Start_position, End_position, Chromosome))
data("hg19Ideogram", package = "biovizBase")
seqs <- seqlengths(hg19Ideogram)
## subset_chr
chr.sub <- paste("chr", 1:22, sep = "")
## levels tweak
seqlevels(mut.gr) <- c(chr.sub, "chrX")
mut.gr <- keepSeqlevels(mut.gr, chr.sub)
seqs.sub <- seqs[chr.sub]
## remove wrong position
bidx <- end(mut.gr) <= seqs.sub[match(as.character(seqnames(mut.gr)),
              names(seqs.sub))]
mut.gr <- mut.gr[which(bidx)]
## assign_seqlengths
seqlengths(mut.gr) <- seqs.sub
## reanme to shorter names
new.names <- as.character(1:22)
names(new.names) <- paste("chr", new.names, sep = "")
new.names
mut.gr.new <- renameSeqlevels(mut.gr, new.names)
head(mut.gr.new)
@ %def 

To get ideogram track, we need to load human hg19 ideogram data, for details
please check another vignette about getting ideogram.


<<ideo>>=
hg19Ideo <- hg19Ideogram
hg19Ideo <- keepSeqlevels(hg19Ideogram, chr.sub)
hg19Ideo <- renameSeqlevels(hg19Ideo, new.names)
head(hg19Ideo)
@ %def 

\subsection{Step 3: low level API: \Rfunction{layout\_circle}}\label{sec:step3}
\Rfunction{layout\_circle} is a lower level API for creating circular plot, it
accepts \Robject{Granges} object, and users need to specify radius, track width,
and other aesthetics, it's very flexible. But keep in mind, you \textbf{have to}
pay attention rules when you make circular plots.

\begin{itemize}
\item For now, \Rfunction{seqlengths}, \Rfunction{seqlevels} and chromosomes
  names should be exactly the same, so you have to make sure data on all tracks
  have this uniform information to make a comparison.
\item Set arguments \Rfunarg{space.skip} to the same value for all tracks, that
  matters for transformation, default is the same, so you don't have to change
  it, unless you want to add/remove space in between.
\item \Rfunarg{direction} argument should be exactly the same, either
  "clockwise" or "counterclockwise".
\item Tweak with your radius and tracks width to get best results.
\end{itemize}


Since low level API leave you as much flexibility as possible, this may looks
hard to adjust, but it can produce various types of graphics which higher levels
API like \autoplot{} hardly can, for instance, if you want to overlap multiple
tracks or fine-tune your layout.

Ok, let's start to add tracks one by one.

First to add a "ideo" track
\begin{figure}[!htpb]
  \centering
<<lower-ideo-track>>=
library(ggbio)
p <- ggplot() + layout_circle(hg19Ideo, geom = "ideo", fill = "gray70",
                              radius = 30, trackWidth = 4)
@ %def   
  \caption{Adding 'ideogram' track.}
  \label{fig:ideo}
\end{figure}
% \clearpage

Then a "scale" track with ticks
\begin{figure}[!htpb]
  \centering
<<lower-scale-track>>=
p <- p + layout_circle(hg19Ideo, geom = "scale", size = 2, radius = 35, trackWidth = 2)
p
@ %def   
  \caption{Adding a 'scale' track.}
  \label{fig:scale}
\end{figure}
% \clearpage

Then a "text" track to label chromosomes. *NOTICE*, after genome coordinate
transformation, original data will be stored in column ".ori", and for mapping,
just use ".ori" prefix to it. Here we use `.ori.seqnames`, if you use
`seqnames`, that is going to be just "genome" character.
\begin{figure}[!htpb]
  \centering
<<lower-text-track>>=
p <- p + layout_circle(hg19Ideo, geom = "text", aes(label = seqnames), vjust = 0,
                       radius = 38, trackWidth = 7)
p
@ %def 
  \caption{Adding a 'text' track.}
  \label{fig:text}
\end{figure}
% \clearpage

Then a "rectangle" track to show somatic mutation, this will looks like vertical
segments. 
\begin{figure}[!htpb]
  \centering
<<lower-mut-track>>=
p <- p + layout_circle(mut.gr, geom = "rect", color = "steelblue",
                       radius = 23 ,trackWidth = 6)
p
@ %def   
  \caption{Adding a segment track to show mutation.}
  \label{fig:mut}
\end{figure}
% \clearpage

Next, we need to add some "links" to show the rearrangement, of course, links
can be used to map any kind of association between two or more different
locations to indicate relationships like copies or fusions.


<<links>>=
rearr  <- read.csv(system.file("extdata", "crc-rearrangment.csv", package = "biovizBase"))
## start position
gr1 <- with(rearr, GRanges(chr1, IRanges(pos1, width = 1)))
## end position
gr2 <- with(rearr, GRanges(chr2, IRanges(pos2, width = 1)))
## add extra column
nms <- colnames(rearr)
.extra.nms <- setdiff(nms, c("chr1", "chr2", "pos1", "pos2"))
values(gr1) <- rearr[,.extra.nms]
## remove out-of-limits data
seqs <- as.character(seqnames(gr1))
.mx <- seqlengths(hg19Ideo)[seqs]
idx1 <- start(gr1) > .mx
seqs <- as.character(seqnames(gr2))
.mx <- seqlengths(hg19Ideo)[seqs]
idx2 <- start(gr2) > .mx
idx <- !idx1 & !idx2
gr1 <- gr1[idx]
seqlengths(gr1) <- seqlengths(hg19Ideo)
gr2 <- gr2[idx]
seqlengths(gr2) <- seqlengths(hg19Ideo)
@ %def 

To create a suitable structure to plot, please use another `GRanges` to
represent the end of the links, and stored as elementMetadata for the "start
point" `GRanges`. Here we named it as "to.gr" and will be used later.
 
<<link-data>>=
values(gr1)$to.gr <- gr2
## rename to gr
gr <- gr1
@ %def 

Here we show the flexibility of *ggbio*, for example, if you want to use color
to indicate your links, make sure you add extra information in the data, used
for mapping later. Here in this example, we use "intrachromosomal" to label
rearrangement within the same chromosomes and use "interchromosomal" to label
rearrangement in different chromosomes.
 
<<rearr>>=
values(gr)$rearrangements <- ifelse(as.character(seqnames(gr))
                                    == as.character(seqnames((values(gr)$to.gr))),
                                    "intrachromosomal", "interchromosomal")
@ %def 

Get subset of links data for only one sample "CRC1"
 
<<subset-crc-1>>=
gr.crc1 <- gr[values(gr)$individual == "CRC-1"]
@ %def 

Ok, add a "point" track with grid background for rearrangement data and map `y`
to variable "score", map `size` to variable "tumreads", rescale the size to a
proper size range.
 
<<lower-point-track>>=
p <- p + layout_circle(gr.crc1, geom = "point", aes(y = score, size = tumreads), color = "red",
                       radius = 12 ,trackWidth = 10, grid = TRUE) +
  scale_size(range = c(1, 2.5))
p
@ %def 

% \clearpage
Finally, let's add links and map color to rearrangement types. Remember you need
to specify `linked.to` to the column that contain end point of the data.
 \begin{figure}[!htpb]
  \centering
<<lower-link-track>>=
p <- p + layout_circle(gr.crc1, geom = "link", linked.to = "to.gr", aes(color = rearrangements),
                       radius = 10 ,trackWidth = 1)
p
@ %def   
  \caption{A link track is added to the circular plot.}
  \label{fig:links}
\end{figure}

\subsection{Step 4: Complex arragnment of plots}
In this step, we are going to make multiple sample comparison, this may require
some knowledge about package \Rpackage{grid} and \Rpackage{gridExtra}. We will
introduce a more easy way to combine your graphics later after this.

We just want 9 single circular plots put together in one page, since we cannot
keep too many tracks, we only keep ideogram and links. Here is one sample.
\begin{figure}[!htpb]
  \centering
 
<<single-arr>>=
cols <- RColorBrewer::brewer.pal(3, "Set2")[2:1]
names(cols) <- c("interchromosomal", "intrachromosomal")

p0 <- ggplot() + layout_circle(gr.crc1, geom = "link", linked.to = "to.gr",
                              aes(color = rearrangements), radius = 7.1) +
  layout_circle(hg19Ideo, geom = "ideo", trackWidth = 1.5,
                color = "gray70", fill = "gray70") +
  scale_color_manual(values = cols)  
p0
@ %def   
  \caption{Just to show single individuals crc1.}
  \label{fig:single-arr}
\end{figure}


<<arrangement>>=
grl <- split(gr, values(gr)$individual)
## need "unit", load grid
library(grid)
lst <- lapply(grl, function(gr.cur){
  print(unique(as.character(values(gr.cur)$individual)))
  cols <- RColorBrewer::brewer.pal(3, "Set2")[2:1]
  names(cols) <- c("interchromosomal", "intrachromosomal")
  p <- ggplot() + layout_circle(gr.cur, geom = "link", linked.to = "to.gr",
                         aes(color = rearrangements), radius = 7.1) +
                           layout_circle(hg19Ideo, geom = "ideo", trackWidth = 1.5,
                         color = "gray70", fill = "gray70") +
                           scale_color_manual(values = cols)  + 
     labs(title = (unique(values(gr.cur)$individual))) +
       theme(plot.margin = unit(rep(0, 4), "lines"))
})
@ %def

We wrap the function in grid level to a more user-friendly high level function,
called \Rfunction{arrangeGrobByParsingLegend}. You can pass your ggplot2
graphics to this function , specify the legend you want to keep on the right,
you can also specify the column/row numbers. Here we assume all plots we have
passed follows the same color scale and have the same legend, so we only have to
keep one legend on the right.

 
<<simple-wrapper>>=
arrangeGrobByParsingLegend(lst, widths = c(4, 1), legend.idx = 1, ncol = 2)
@ %def   

\chapter{Manhattan plot}\label{chapter:man}
\section{Introduction}
In this tutorial, we introduce a new coordinate system called "genome" for
genomic data. This transformation is to put all chromosomes on the same genome
coordinates following specified orders and adding buffers in between.  One may
think about facet ability based on \textit{seqnames}, it can produce something
similar to \textit{Manhattan
  plot}\footnote{http://en.wikipedia.org/wiki/Manhattan}, but the view will not
be compact. What's more, genome transformation is previous step to form a
circular view. In this tutorial, we will simulate some SNP data and use this
special coordinate and a specialized function \Rfunction{plotGrandLinear} to
make a Manhattan plot.

\textit{Manhattan plot} is just a special use design with this coordinate system.

\section{Understand the new coordinate}
Let's load some packages and data first
 
<<>>=
library(ggbio)
data(hg19IdeogramCyto, package = "biovizBase")
data(hg19Ideogram, package = "biovizBase")
library(GenomicRanges)
@ %def 


Make a minimal example `GRanges`, and see what the default coordiante looks
like, pay attention that, by default, the graphics are faceted by `seqnames` as
shown in Figure \ref{fig:simul_gr}

\begin{figure}[!htpb]
  \centering
<<simul_gr, fig.height = 4>>=
library(biovizBase)
gr <- GRanges(rep(c("chr1", "chr2"), each = 5),
              IRanges(start = rep(seq(1, 100, length = 5), times = 2),
                      width = 50))
autoplot(gr, aes(fill = seqnames))
@ %def 
  \caption{Default grahpics is faceted by seqnames}
  \label{fig:simul_gr}
\end{figure}

What if we specify the coordinate system to be "genome" in \autoplot{} function,
there is no faceting anymore, the two plots are merged into one single genome
space, and properly labeled as shown in Figure \ref{fig:coord-genome}
% There is a limitation on integer in \R{}, so the
% genome space cannot be too long, to overcome this limitation, a default argument
% called `maxSize` is defined with this function, if the genome space is over
% limits, it will rescale everything automatically, function `tranformToGenome`
% with return a transformed `GRanges` object, with only one single `seqnames`
% called "genome" and the `seqlengths` of it, is just genome space(with buffering
% region). arguments called `space.ratio` control the skipped region between
% chromosomes.

\begin{figure}[!htpb]
   \centering
<<coord-genome, fig.height = 4>>=
autoplot(gr, coord = "genome", aes(fill = seqnames))
@ %def   
  \caption{Coordinate genome}
  \label{fig:coord-genome}
\end{figure}


The internal transformation are implemented into the function
\Rfunction{transformToGenome}. And there is some simple way to test if a
\Robject{GRanges} object is transformed to coordinate "genome" or not

 
<<is>>=
gr.t <- transformToGenome(gr)
head(gr.t)
is_coord_genome(gr.t)
metadata(gr.t)$coord
@ %def 


\section{Step 2: Simulate a SNP data set}
Let's use the real human genome space to simulate a SNP data set.
 
<<simul_snp>>=
chrs <- as.character(levels(seqnames(hg19IdeogramCyto)))
seqlths <- seqlengths(hg19Ideogram)[chrs]
set.seed(1)
nchr <- length(chrs)
nsnps <- 100
gr.snp <- GRanges(rep(chrs,each=nsnps),
                  IRanges(start =
                          do.call(c, lapply(chrs, function(chr){
                            N <- seqlths[chr]
                            runif(nsnps,1,N)
                          })), width = 1),
                  SNP=sapply(1:(nchr*nsnps), function(x) paste("rs",x,sep='')),
                  pvalue =  -log10(runif(nchr*nsnps)),
                  group = sample(c("Normal", "Tumor"), size = nchr*nsnps,
                    replace = TRUE)
                  )
genome(gr.snp) <- "hg19"
gr.snp
@ %def 


We use the some trick to make a shorter names.
 
<<shorter>>=
seqlengths(gr.snp)
nms <- seqnames(seqinfo(gr.snp))
nms.new <- gsub("chr", "", nms)
names(nms.new) <- nms
gr.snp <- renameSeqlevels(gr.snp, nms.new)
seqlengths(gr.snp)
@ %def 


\section{Step 3: Start to make Manhattan plot by using \autoplot{}}
wrapped basic functions into \autoplot{}, you can specify the coordinate. Figure
\ref{fig:unorder} shows what the unordered object looks like.

\begin{figure}[!htpb]
  \centering
<<unorder, fig.height = '0.5\\textwidth', fig.height = 4>>=
autoplot(gr.snp, coord = "genome", geom = "point", aes(y = pvalue), space.skip = 0.01)
@ %def   
  \caption{Unordred Manhattan plot}
  \label{fig:unorder}
\end{figure}


That's probably not what you want, if you want to change to specific order, just
sort them by hand and use `keepSeqlevels`. Figure \ref{fig:sort} shows a sorted
plot.

\begin{figure}[!htpb]
  \centering
<<sort,fig.height = 4>>=
gr.snp <- keepSeqlevels(gr.snp, c(1:22, "X", "Y"))
values(gr.snp)$highlight <- FALSE
idx <- sample(1:length(gr.snp), size = 15)
values(gr.snp)$highlight[idx] <- TRUE
values(gr.snp)$id <- 1:length(gr.snp)
p <- autoplot(gr.snp, coord = "genome", geom = "point", aes(y = pvalue), space.skip = 0.01)


@ %def 
  \caption{Sorted data for Manhattan plot}
  \label{fig:sort}
\end{figure}


\textbf{NOTICE}: the data now doesn't have information about lengths of each
chromosomes, this is allowed to be plotted, but it's misleading sometimes,
without chromosomes lengths information, \ggbio{} use data space to make
estimated lengths for you, this is not accurate! So let's just assign
\Rfunction{seqlengths} to the object. Then you will find the data space now is
distributed proportional to real space as shown in Figure \ref{fig:with-seql}.

\begin{figure}[!htpb]
  \centering
<<with_seql, fig.height = 4>>=
names(seqlths) <- gsub("chr", "", names(seqlths))
seqlengths(gr.snp) <- seqlths[names(seqlengths(gr.snp))]
autoplot(gr.snp, coord = "genome", geom = "point", aes(y = pvalue), space.skip = 0.01)
@ %def 
  \caption{Manhattan plot after setting seqlengths to the data, the data space now is
    distributed proportional to real chromosome space.}
\label{fig:with-seql}
\end{figure}

In \autoplot{}, argument \Rfunarg{coord} is just used to transform the data,
after that, you can use it as common \Robject{GRanges}, all other geom/stat
works for it. Here just show a simple example for another geom "line" as shown
in Figure \ref{fig:line}

\begin{figure}[!htpb]
   \centering
<<line, fig.height = 4>>=
autoplot(gr.snp, coord = "genome", geom = "line", aes(y = pvalue, group = seqnames,
                                     color = seqnames))
@  %def   
  \caption{Use line to represent the data in typical Manhattan plot.}
  \label{fig:line}
\end{figure}


\section{Convenient \Rfunction{plotGrandLinear} function}
In \ggbio{}, sometimes we develop specialized function for certain types of
plots, it's basically a wrapper over lower level API and \autoplot{}, but more
convenient to use. Here for \textit{Manhattan plot}, we have a function called
\Rfunction{plotGrandLinear} used for it. aes(y = ) is required to indicate the y
value, e.g.  p-value. Figure \ref{fig:plotGl} shows a defalut graphic.
\begin{figure}[!htpb]
  \centering
<<plotGrandLinear, fig.height = 4>>=
gro <- GRanges(c("1", "2"), IRanges(c(100, 200), width = 5e7))
plotGrandLinear(gr.snp, aes(y = pvalue), highlight.gr = gro)
names(gro) <- c("id1", "id2")
plotGrandLinear(gr.snp, aes(y = pvalue), highlight.gr = gro) + theme_clear() + 
  theme(legend.position  = "none")
plotGrandLinear(gr.snp, aes(y = pvalue), highlight.gr = gro) + theme_clear() + 
  theme(legend.position  = "none")
.trans <- 0.5
names(.trans) <- "1"
plotGrandLinear(gr.snp, aes(y = pvalue), highlight.gr = gro, chr.weight = .trans) + theme_clear() + 
  theme(legend.position  = "none")

@ %def   
  \caption{Default Manhattan plot by calling plotGrandLinear function}
  \label{fig:plotGl}
\end{figure}


Color mapping is automatically figured out by *ggbio* following the rules
\begin{itemize}
\item if \Rfunarg{color} present in \Rcode{aes()}, like \Rcode{aes(color =
    seqnames)}, it will assume it's mapping to data column called 'seqnames'.
\item if \Rfunarg{color} is not wrapped in \Rcode{aes()}, then this function
  will \textbf{recylcle} them to all chromosomes.
\item if \Rfunarg{color} is single character representing color, then just use
  one arbitrary color.
\end{itemize}
 
Let's test some examples for controling colors.
\begin{figure}[!htpb]
   \centering
<<morecolor1, fig.height = 4>>=
plotGrandLinear(gr.snp, aes(y = pvalue, color = seqnames))
p = plotGrandLinear(gr.snp, aes(y = pvalue))
p + geom_point(data = as.data.frame(p$.data[values(p$.data)$highlight]), 
               aes(x = start, y = pvalue), color = "red", size = 3)

@ %def   
  \caption{Color mapped to chromosome names.}
  \label{fig:more1}
\end{figure}

\begin{figure}[!htpb]
  \centering
<<morecolor2, fig.height = 4>>=
plotGrandLinear(gr.snp, aes(y = pvalue), color = c("green", "deepskyblue"))
@ %def   
  \caption{Color follow 'green' and 'deepskyblue' order for all chromosome space.}
  \label{fig:more2}
\end{figure}

\begin{figure}[!htpb]
  \centering
<<morecolor3, fig.height = 4>>=
plotGrandLinear(gr.snp, aes(y = pvalue), color = c("green", "deepskyblue", "red"))
@ %def   
\caption{Color follow three colors pattern: 'green','deepskyblue', 'red'}
  \label{fig:more3}
\end{figure}

\begin{figure}[!htpb]
  \centering
 
<<morecolor4, fig.height = 4>>=
plotGrandLinear(gr.snp, aes(y = pvalue), color = "red")
@ %def   
  \caption{Unique color for all.}
  \label{fig:more4}
\end{figure}


You can also add cutoff line as shown in Figure \ref{fig:cutoff}.
 \begin{figure}[!htpb]
   \centering
<<cutoff, fig.height = 4>>=
plotGrandLinear(gr.snp, aes(y = pvalue), cutoff = 3, cutoff.color = "blue", cutoff.size = 4)
@ %def   
  \caption{Set cutoff on the Manhattan plot. The 'blue' line shows cutoff at value 3.}
  \label{fig:cutoff}
\end{figure}


This is equivalent to \ggplot{} 's API.
 
<<cutoff-low, eval = FALSE>>=
plotGrandLinear(gr.snp, aes(y = pvalue)) + geom_hline(yintercept = 3, color = "blue", size = 4)
@ %def 

Sometimes the names of chromosomes maybe very long, you may want to rotate them, 
let's make a longer name first
 
<<longer>>=
## let's make a long name
nms <- seqnames(seqinfo(gr.snp))
nms.new <- paste("chr00000", nms, sep = "")
names(nms.new) <- nms
gr.snp <- renameSeqlevels(gr.snp, nms.new)
seqlengths(gr.snp)
autoplot(gr) + coord_genome()
@ %def 

Then rotate it!
 \begin{figure}[rotate]
   \centering
<<rotate, fig.height = 4>>=
plotGrandLinear(gr.snp, aes(y = pvalue)) + theme(axis.text.x=theme_text(angle=-90, hjust=0))
@ %def   
  \caption{Rotate the x lable to save space.}
  \label{fig:rotate}
\end{figure}

% \clearpage
As you can tell from above examples, all utilities works for \ggplot{} will work
for \ggbio{} too.

\chapter{Karyogram overview}
\section{Introduction}
A karyotype is the number and appearance of chromosomes in the nucleus of a
eukaryotic cell\footnote{http://en.wikipedia.org/wiki/Karyotype}. It's one kind
of overview when we want to show distribution of certain events on the genome,
for example, binding sites for certain protein, even compare them acroos samples
as example shows in this section.

\Robject{GRanges} and \Robject{Seqinfo} object are also an ideal container for
storing data needed for karyogram plot. Here is the strategy we used for
generating ideogram templates.
\begin{itemize}
\item Althouth \Robject{seqlengths} is not required, it's highly recommended for
  plotting karyogram. If a \Robject{GRanges} object contains
  \Robject{seqlengths}, we know exactly how long each chromosome is, and will
  use this information to plot genome space, particularly we plot all levels
  included in it, \textbf{NOT JUST} data space.
\item If a \Robject{GRanges} has no \Robject{seqlengths}, we will issue a
  warning and try to estimate the chromosome lengths from data included. This is
  \textbf{NOT} accurate most time, so please pay attention to what you are going
  to visualize and make sure set \Robject{seqlengths} before hand.
\end{itemize}

\section{\Rfunction{autoplot}}
Let's first introduce how to use \autoplot{} to generate karyogram graphic.

The most easy one is to just plot Seqinfo by using \autoplot{}, if your \gr{}
object has seqinfo with seqlengths information.

<<>>=
data(hg19Ideogram, package = "biovizBase")
chrs <- paste0("chr", 1:20)
p <- autoplot(seqinfo(hg19Ideogram)[chrs])
p
@

% Then you could add additional more data(\Rclass{GRanges} object) on this
% overview, for examle, if you have a set of cytoband information, and
% \Rfunction{scale\_fill\_giemsa} did the trick to correct the color.

%<<fig = TRUE>>=
%hg19c <- hg19IdeogramCyto
%hg19c <- ggbio:::subsetByChrs(hg19c, chrs)
%seqlevels(hg19c) <- seqlevels(hg19Ideogram)
%seqlengths(hg19c) <- seqlengths(hg19Ideogram)
%p + layout_karyogram(hg19c, aes(fill = gieStain))
%p + layout_karyogram(hg19c, aes(fill = gieStain)) +  scale_fill_giemsa()
%@ %def 


Even more typical karyogram overview with cytoband, this will even show the
arms, two required columns are required 'name' and 'gieStain'.

<<default, fig.width = 5, fig.height = 4>>=
data(hg19IdeogramCyto, package = "biovizBase")
head(hg19IdeogramCyto)
p <- autoplot(hg19IdeogramCyto, layout = "karyogram", cytoband = TRUE) 
@ %def   

\sfbox{Your turn: change the order of chromosomes.}

% To understand why we call it kayogram, let's first visualize some cytoband. We
% use \Rfunarg{layout} argument to specify this special layout "karyogram". And
% under this layout, \Rfunarg{cytoband} argument is acceptable, default is
% \Rcode{FALSE}, if set to \Rcode{TRUE}, we assume your have additional
% information associated with the data, stored in column \Rcode{gieStain}, it will
% try to fill colors based on this variable according to a pre-set staining
% colors. You may notice, this data set doesn't contain seqlengths information,
% but the data space actually cover the real space, so it's not going to be a
% problem.

% <<default, fig = TRUE>>=
% data(hg19IdeogramCyto, package = "biovizBase")
% autoplot(hg19IdeogramCyto, layout = "karyogram", cytoband = TRUE) 
% @ %def   

% You may want to change the order of chromosomes, \Rfunction{keepSeqlevels} are
% convenient for this purpose, it's defined in package \Rpackage{GenomicRanges}.

% <<change-order, fig = TRUE>>=
% hg19 <- keepSeqlevels(hg19IdeogramCyto, paste0("chr", c(1:22, "X", "Y")))
% head(hg19)
% autoplot(hg19, layout = "karyogram", cytoband = TRUE)
% @ %def   

% This \Robject{GRanges} object is special, contains two required colomns, 'name'
% and 'giestain', in this case, \Rfunarg{cytoband} argument could set to
% \Rcode{TRUE}, and we draw special ideogram not just rectangles but show
% \textbf{centromere} as possible.

% If we set it to \Rcode{FALSE}, we treat it as a normal \Robject{GRanges},
% nothing special with cytoband and arm information. So to show the cytoband, we
% need to specify which color column variable to fill as cytoband, function
% \Rfunction{aes} use an unevaluated expression like \Rcode{fill = gieStain},
% \textit{gieStain} is column name which store cytoband color, notice that we
% don't use quotes around it, this means it's not something defined globally, but
% some column name defined in the data. The system will usually automatically
% assign categorical colors to represent this variable. But instead, cytoband
% already have some pre-defined colors which mimic the color you observed under
% microscope. Function \Rfunction{scale\_fill\_giemsa} did this trick to correct
% the color. If it's first time you observe usage by \Rcode{+}, it's a very
% popular API in package \ggplot{}\footnote{http://had.co.nz/ggplot2/}, which
% could add graphics layer by layer or revise a existing graphic.
% \begin{figure}[!htpb]
%   \centering
% <<cyto-normal, fig = TRUE>>=
% library(GenomicRanges)
% ## it's a 'ideogram'
% biovizBase::isIdeogram(hg19)
% ## set to FALSE
% autoplot(hg19, layout = "karyogram", cytoband = FALSE, aes(fill = gieStain)) +
%   scale_fill_giemsa()
% @ %def   
% \caption{Cytoband on karyogram layout. We treat it as normal \Robject{GRanges}
%   data set, so we fill with gieStain color, and use
%   \Rfunction{scale\_fill\_giemsa} to use customized color. Notice the difference
%   if it's not a 'ideogram' object. we don't draw centromere particularly.}
% \label{fig:cytoband-custom}
% \end{figure}
% % \clearpage

% Let's try a different data set which is not an 'ideogram', but a normal
% \Robject{GRanges} object that most people will have, extra data such as
% statistical values or categorical levels are stored in element data columns used
% for aesthetics mapping.

We use a default data in package \Rpackage{biovizBase}, which is a subset of RNA
editing set in human. The data involved in this \Robject{GRanges} is sparse, so
we cannot simply use it to make karyogram, otherwise, the estimated chromosome
lengths will be very rough and inaccurate. So what we need to do is:

\begin{enumerate}
\item Adding seqlegnths to this \Robject{GRanges} object. If you adding
  seqlengths to object, we have two ways to show chromosome space as karyogram. 
  \\\Rcode{autoplot(object, layout = 'karyogram')} or 
  \\\Rcode{autoplot(seqinfo(object))}.
\item Changing the order of chromosomes.  
\item Visualize it and map variable to different aesthetics.  
\end{enumerate}

<<load-RNAediting>>=
data(darned_hg19_subset500, package = "biovizBase")
dn <- darned_hg19_subset500
head(dn)
## add seqlengths
## we have seqlegnths information in another data set
data(hg19Ideogram, package = "biovizBase")
seqlengths(dn) <- seqlengths(hg19Ideogram)[names(seqlengths(dn))]
## now we have seqlengths
head(dn)
## then we change order
dn <- keepSeqlevels(dn, paste0("chr", c(1:22, "X")))
autoplot(dn, layout = "karyogram")
## this equivalent to 
## autoplot(seqinfo(dn))
@ %def   


Then we take one step further, the power of \ggplot{} or \ggbio{} is the
flexible multivariate data mapping ability in graphics, make data exploration
much more convenient. In the following example, we are trying to map a
categorical variable 'exReg' to color, this variable is included in the data,
and have three levels, '3' indicate 3' utr, '5' means 5' utr and 'C' means
coding region. We have some missing values indicated as \Rcode{NA}, in default,
it's going to be shown in gray color, and keep in mind, since the basic
geom(geometric object) is rectangle, and genome space is very large, so change
both color/fill color of the rectangle to specify both border and filled color
is necessary to get the data shown as different color, otherwise if the region
is too small, border color is going to override the fill color.

<<load-RNAediting-color>>=
## since default is geom rectangle, even though it's looks like segment
## we still use both fill/color to map colors
autoplot(dn, layout = "karyogram", aes(color = exReg, fill = exReg))
@ %def   

Or you can set the missing value to particular color you want.

Note: NA values is not shown on the legend.

<<load-RNAediting-color-NA>>=
## since default is geom rectangle, even though it's looks like segment
## we still use both fill/color to map colors
autoplot(dn, layout = "karyogram", aes(color = exReg, fill = exReg)) +
  scale_color_discrete(na.value = "brown")
@ %def   

% A test could be performed to demonstrate why 'seqlengths' of object
% \Robject{GRanges} is important. Let's assume we set wrong chromosome lengths by
% accident, lengths are all equal to chromosome 1. We arbitrarily set it to the
% same number so that every chromosome are of equal length. From Figure
% \ref{fig:exReg-NA-fake}, it's clear that this will affect what we see. So please
% make sure 
% \begin{itemize}
% \item You get data space cover exactly the same chromosome space for each
%   chromosome. or
% \item You set the seqlengths to the right number.
% \end{itemize}
% Otherwise you will see weird pattern from your results, so actually it's a good
% way to test your raw data too, if you raw data have something beyond chromosome
% space, you need to dig into it to see what happened.


%<<load-RNAediting-color-fake, fig = TRUE>>=
%dn2 <- dn
%seqlengths(dn2) <- rep(max(seqlengths(dn2)), length(seqlengths(dn2)))
%autoplot(dn2, layout = "karyogram", aes(color = exReg, fill = exReg)) 
%@ %def   

\section{\Rfunction{plotKaryogram}}
\Rfunction{plotKaryogram} (or \Rfunction{plotStackedOverview}) are specialized
function to draw karyogram graphics. It's actually what function \autoplot{}
calls inside. API is a littler simpler because layout 'karyogram' is default in
these two functions. So equivalent usage is like
<<plotKaryogram, eval = FALSE>>=
plotKaryogram(dn)
plotKaryogram(dn, aes(color = exReg, fill = exReg))
@ %def 


\section{\Rfunction{layout\_karyogram}}
In this section, a lower level function \Rfunction{layout\_karyogram} is going
to be introduced. This is convenient API for constructing karyogram plot and
adding more data layer by layer. Function \Rfunction{ggplot} is just to create
blank object to add layer on.

You need to pay attention to 
\begin{itemize}
\item when you add plots layer by layer, seqnames of different data must be the
  same to make sure the data are mapped to the same chromosome. For example, if
  you name chromosome following schema like \textit{chr1} and use just number
  \textit{1} to name other data, they will be treated as different chromosomes.
\item cannot use the same aesthetics mapping multiple time for different
  data. For example, if you have used aes(color = ), for one data, you cannot
  use aes(color = ) anymore for mapping variables from other add-on data, this
  is currently not allowed in \ggplot{}, even though you expect multiple color
  legend shows up, this is going to confuse people which is which. HOWEVER,
  \Rfunarg{color} or \Rfunarg{fill} without \Rcode{aes()} wrap around, is
  allowed for any track, it's set single arbitrary color. This is shown in
  Figure \ref{fig:low-default-addon}.
\item Default rectangle y range is [0, 10], so when you add on more data layer
  by layer on existing graphics, you can use \Rfunarg{ylim} to control how to
  normalize your data and plot it relative to chromosome space. For example,
  with default, chromosome space is plotted between y [0, 10], if you use
  \Rcode{ylim = c(10 , 20)}, you will stack data right above each chromosomes
  and with equal width. For geom like 'point', which you need to specify 'y'
  value in \Rcode{aes()}, we will add 5\% margin on top and at bottom of that
  track.
\end{itemize}

<<low-default>>=
## plot ideogram
p <- ggplot(hg19) + layout_karyogram(cytoband = TRUE)
p
## eqevelant autoplot(hg19, layout = "karyogram", cytoband = TRUE)
@ %def   

<<low-default-addon>>=
p <- p + layout_karyogram(dn, geom = "rect", ylim = c(11, 21), color = "red")
## commented line below won't work
## the cytoband fill color has been used already.
## p <- p + layout_karyogram(dn, aes(fill = exReg, color = exReg), geom = "rect")
p
@ %def   

Then we construct another multiple layer graphics for multiple data using
different geom, suppose we want to show RNA-editing sites on chromosome space as
rectangle(looks like segment in graphic) and stack a line for another track
above.

<<edit-space>>=
## plot chromosome space
p <- autoplot(seqinfo(dn))
## make sure you pass rect as geom
## otherwise you just get background
p <- p + layout_karyogram(dn, aes(fill = exReg, color = exReg), geom = "rect")
values(dn)$pvalue <- rnorm(length(dn))
p + layout_karyogram(dn, aes(x = start, y = pvalue), 
                     ylim = c(10, 30), geom = "line", color = "red")
p
@ %def 

\end{document}