% \VignetteIndexEntry{Part 2: Single chromosome ideogram.}
% \VignetteDepends{}
% \VignetteKeywords{visualization utilities}
% \VignettePackage{ggbio}
\documentclass[11pt,a4paper]{article}
% \usepackage{times}
\usepackage{hyperref}
\usepackage{verbatim}
\usepackage{graphicx}
\usepackage{fancybox}
\usepackage{color}

<<include=FALSE>>=
opts_chunk$set(eval=FALSE)
@
% \setkeys{Gin}{width=0.95\textwidth}

\textwidth=6.5in
\textheight=8.5in
\parskip=.3cm
\parindent = 0cm
\oddsidemargin=-.1in
\evensidemargin=-.1in
\headheight=-.3in

\newcommand{\Rfunction}[1]{{\texttt{#1}}}
\newcommand{\Robject}[1]{{\texttt{#1}}}
\newcommand{\Rpackage}[1]{{\textit{#1}}}
\newcommand{\Rmethod}[1]{{\texttt{#1}}}
\newcommand{\Rfunarg}[1]{{\texttt{#1}}}
\newcommand{\Rclass}[1]{{\textit{#1}}}
\newcommand{\Rcode}[1]{{\texttt{#1}}}
\newcommand{\software}[1]{\textsf{#1}}
\newcommand{\R}{\software{R}}
\newcommand{\Bioc}{\software{Bioconductor}}
\newcommand{\IRanges}{\Rpackage{IRanges}}
\newcommand{\biovizBase}{\Rpackage{biovizBase}}
\newcommand{\ggbio}{\Rpackage{ggbio}}
\newcommand{\visnab}{\Rpackage{visnab}}
\newcommand{\ggplot}{\Rpackage{ggplot2}}
\newcommand{\grid}{\Rpackage{grid}}
\newcommand{\gridExtra}{\Rpackage{gridExtra}}
\newcommand{\qplot}{\Rfunction{qplot}}
\newcommand{\autoplot}{\Rfunction{autoplot}}
\newcommand{\gr}{\Rfunction{GRanges}}
\newcommand{\knitr}{\Rpackage{knitr}}
\newcommand{\tracks}{\Rfunction{tracks}}
\newcommand{\chipseq}{\Rpackage{chipseq}}

% my own framebox
\newcommand{\sfbox}[2][Tips]{
  \begin{center}
    \shadowbox{
      \parbox{0.8\linewidth}{
        \textcolor{blue}{#1:} #2
      }
    }
  \end{center}
}

\title{Single chromosome ideogram}
\author{Tengfei Yin}
\date{\today}

\begin{document}
% \setkeys{Gin}{width=0.6\textwidth}
\maketitle
\newpage
\tableofcontents
\newpage

<<include=FALSE>>=
library(knitr)
opts_chunk$set(fig.path='./figures/ggbio-',
               fig.align='center', fig.show='asis',
               eval = TRUE,
               fig.width = 5, fig.height = 5)
options(replace.assign=TRUE,width=90)
@

<<include=FALSE>>=
options(width=72)
@

\section{Introduction}
\textbf{Single
Chromosome Ideogram:} Most track-based genome browsers show a single
chromosome ideogram on top of all tracks and use an indicator, such as a
highlighted window, to mark the region currently being viewed in the data
tracks below. This way, users keep their genomic context even when zoomed into
a small region.

This vignette introduces two types of single chromosome visualization.
\begin{itemize}
\item The first type is meant to be embedded into tracks as an overview. It is
  not a plain \Rclass{ggplot} object, and only one highlighted rectangle can be
  drawn on top of it. This vignette focuses mostly on this type. In \ggbio{},
  the object belongs to a special class called 'ideogram', whose special
  behavior is described later.
\item If you want to render more data on a single chromosome, use the special
  case of the karyogram overview that contains only one chromosome. More
  information about the karyogram overview can be found in the separate
  vignette on overview visualization.
\end{itemize}

% \item \textbf{Karyogram Overview:} could be used to plot one or more than one
% chromosomes as an overview for your data, support multiple chromosome or
% single chromosome cross multiple samples comparison, please check the examples
% in this vignette. This should be used as an individual plot, not to be
% embedded into tracks. More important, the returned object is a normal 'ggplot'
% object, could be revised easily in anyway.
% \end{itemize}

\section{Single chromosome visualization}
\subsection{Single chromosome to be embedded in tracks}
\Rfunction{plotIdeogram} is a wrapper around functionality in the
\Rpackage{rtracklayer} package. It downloads the cytoband table from UCSC
automatically and returns a graphic object of class 'ideogram'.
\begin{itemize}
\item If you don't pass a genome name, it asks you to choose one from the
  available genomes.
  NOTE: not all genomes have cytoband information. If no cytoband information
  is available, only seqlengths information is returned and a message is
  printed. When \textit{cytoband} information is available, the chromosome arms
  can be inferred and plotted as expected. You can always use the
  \Rfunarg{cytoband} argument to control this.
\item If argument \Rfunarg{subchr} is not specified, the first chromosome is
  used.
\end{itemize}

<<eval=FALSE>>=
p <- plotIdeogram()
@ %def

\begin{verbatim}
Please specify genome

  1: hg19       2: hg18       3: hg17       4: hg16       5: felCat4    6: felCat3
  7: galGal4    8: galGal3    9: galGal2   10: panTro3   11: panTro2   12: panTro1
 13: bosTau7   14: bosTau6   15: bosTau4   16: bosTau3   17: bosTau2   18: canFam3
 19: canFam2   20: canFam1   21: loxAfr3   22: fr3       23: fr2       24: fr1
 25: nomLeu1   26: gorGor3   27: cavPor3   28: equCab2   29: equCab1   30: petMar1
 31: anoCar2   32: anoCar1   33: calJac3   34: calJac1   35: oryLat2   36: myoLuc2
 37: mm10      38: mm9       39: mm8       40: mm7       41: hetGla1   42: monDom5
 43: monDom4   44: monDom1   45: ponAbe2   46: chrPic1   47: ailMel1   48: susScr2
 49: ornAna1   50: oryCun2   51: rn5       52: rn4       53: rn3       54: rheMac2
 55: oviAri1   56: gasAcu1   57: echTel1   58: tetNig2   59: tetNig1   60: melGal1
 61: macEug2   62: xenTro3   63: xenTro2   64: xenTro1   65: taeGut1   66: danRer7
 67: danRer6   68: danRer5   69: danRer4   70: danRer3   71: ci2       72: ci1
 73: braFlo1   74: strPur2   75: strPur1   76: apiMel2   77: apiMel1   78: anoGam1
 79: droAna2   80: droAna1   81: droEre1   82: droGri1   83: dm3       84: dm2
 85: dm1       86: droMoj2   87: droMoj1   88: droPer1   89: dp3       90: dp2
 91: droSec1   92: droSim1   93: droVir2   94: droVir1   95: droYak2   96: droYak1
 97: caePb2    98: caePb1    99: cb3      100: cb1      101: ce10     102: ce6
103: ce4      104: ce2      105: caeJap1  106: caeRem3  107: caeRem2  108: priPac1
109: aplCal1  110: sacCer3  111: sacCer2  112: sacCer1

Selection:
\end{verbatim}

After the first plot, the data is attached to the graphic object. When you edit
or zoom, it is NOT downloaded again, and you can even change the view to
another chromosome.
That is what makes objects of class 'ideogram' special.

<<>>=
library(ggbio)
## requires an internet connection
p <- plotIdeogram(genome = "hg19")
p
p <- plotIdeogram(genome = "hg19", cytoband = FALSE)
p
## the data is stored with p, so zooming won't download it again
head(attr(p, "ideogram.data"))
@ %def

\sfbox{
  \Rfunarg{aspect.ratio} is 1/20 by default. If you set it to \Rcode{NULL}, you
  have to resize the graphic device manually. You can always set the aspect
  ratio with the \Rfunction{theme} function of \ggplot{}:
  \Rcode{+ theme(aspect.ratio = )}.}

You can also download the data manually, save it, and use it later. The
function for this is \Rfunction{getIdeogram} in package \Rpackage{biovizBase};
the related functions in package \Rpackage{rtracklayer} are more flexible. The
data set \textit{hg19IdeogramCyto} is the default human cytoband data, shipped
in \biovizBase{}.

What if you cannot get cytoband information from UCSC but have the data at
hand? You can construct the \Robject{GRanges} object manually, but it has to
satisfy the following restriction: the object must have these element metadata
columns.
\begin{itemize}
\item name: starts with p or q to distinguish the chromosome arms, such as
  \textbf{p36.22} and \textbf{q12}.
\item gieStain: stain color of the cytoband, such as \textbf{gneg}.
\end{itemize}

% We say it's a ideogram \Rclass{GRanges} object. It's not a special class, just
% it could be recongnized by function
% \Rfunction{plotIdeogram}. \Rfunction{isIdeogram} in package
% \Rpackage{biovizBase} could be used to check on your data, to see if it contain
% sufficient information about cytoband and arms or not.

<<>>=
data(hg19IdeogramCyto, package = "biovizBase")
## data structure
head(hg19IdeogramCyto)
plotIdeogram(hg19IdeogramCyto)
@ %def

The single chromosome 'ideogram' object has more special features, all designed
to make it convenient when embedded in tracks. For a normal \ggbio{} or
\ggplot{} plot object, setting limits zooms into the given range.
But for an 'ideogram' object, setting limits will \textbf{only} add a
highlighted rectangle! You can specify the \Rfunarg{zoom.region} argument of
\Rfunction{plotIdeogram}, or add a call to \Rfunction{xlim}, which accepts
\begin{itemize}
\item a numeric range
\item an \Robject{IRanges} object
\item a \Robject{GRanges} object; in this case the displayed chromosome changes
  if its seqnames differ from the current one.
\end{itemize}
The highlight style is remembered when you zoom with \Rfunction{xlim}.

<<>>=
plotIdeogram(hg19IdeogramCyto, "chr1", zoom.region = c(1e7, 5e7))
## change style of highlighted rectangle
## p <- plotIdeogram(hg19IdeogramCyto, "chr1")
p <- plotIdeogram(hg19IdeogramCyto, "chr1", zoom.region = c(1e7, 5e7),
                  fill = NA, color = "blue", size = 2, zoom.offset = 4)
p
class(p)
p + xlim(1e7, 5e7)
library(GenomicRanges)
p + xlim(IRanges(5e7, 7e7))
## change visualized chromosomes
p + xlim(GRanges("chr2", IRanges(1e7, 5e7)))
@ %def

By default the ideogram has no x-axis labels. To add axis text, set argument
\Rfunarg{xlabel} to \Rcode{TRUE}.

<<>>=
plotIdeogram(hg19IdeogramCyto, "chr1", xlabel = TRUE)
@ %def

Sometimes you don't want to visualize a chromosome with cytobands, or no
cytoband information is available. In this case, you can simply draw a blank
chromosome as an overview template. \ggbio{} offers several ways to do this.
\begin{itemize}
\item Set argument \Rfunarg{cytoband} to \Rcode{FALSE}.
\item Pass a \Robject{GRanges} without the extra columns \textbf{name} and
  \textbf{gieStain}. The chromosome lengths are then parsed and estimated
  automatically. It is \textbf{IMPORTANT} that, to get accurate chromosome
  lengths, you either make sure the ranges you pass cover the whole of each
  chromosome or specify the \Rcode{seqlengths} of your \Robject{GRanges}
  object.
\item Use \Rmethod{autoplot} on a \Rclass{Seqinfo} object. When you pass only
  one chromosome, it can be converted to an 'ideogram' (see the
  \Rfunarg{ideogram} argument below).
\end{itemize}
When there are no seqlengths, the length is estimated from the data (cytoband).
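
The second option above depends on the \Rcode{seqlengths} of the object. As a
minimal sketch, assuming only the \Rpackage{GenomicRanges} package, the chunk
below builds a hypothetical two-range object named \Robject{toy} (not part of
any package data) and shows how to set the chromosome length by hand. The
coordinates are made up for illustration; the length is the hg19 chr1 length
from the UCSC chromosome sizes.

<<>>=
library(GenomicRanges)
## hypothetical toy ranges that cover only part of chr1
toy <- GRanges("chr1", IRanges(start = c(1, 5e6), end = c(4e6, 9e6)))
## no seqlengths were given, so this is NA
seqlengths(toy)
## the largest end coordinate is all the data itself can say about the length
max(end(toy))
## set the length explicitly (hg19 chr1 is 249250621 bp)
seqlengths(toy) <- c(chr1 = 249250621)
seqlengths(toy)
@ %def

Setting the length explicitly avoids relying on the largest end coordinate,
which underestimates the chromosome whenever the ranges stop short of its end.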
<<>>=
## there are no seqlengths
data(hg19IdeogramCyto, package = "biovizBase")
seqlengths(hg19IdeogramCyto)
## so plotting directly will try to aggregate and estimate the lengths of
## chromosomes, which is not accurate
p1 <- plotIdeogram(hg19IdeogramCyto, "chr1", cytoband = FALSE, xlabel = TRUE)
p1
@ %def

Another default data set, 'hg19Ideogram', contains seqlengths and is more
suitable for plotting a blank overview. Using a 'Seqinfo' object is a
convenient way to construct a single chromosome overview or a karyogram
overview.

<<>>=
data(hg19Ideogram, package = "biovizBase")
autoplot(seqinfo(hg19Ideogram)[paste0("chr", 1:13)])
@ %def

For single chromosome visualization from a \Rclass{Seqinfo}, if argument
\Rfunarg{ideogram} is set to \Rcode{TRUE}, the returned object is an 'ideogram'
object. By default it is a normal ggplot object, and the two look different.

<<>>=
head(hg19Ideogram)
library(GenomicRanges)
## single ideogram
p <- autoplot(seqinfo(hg19Ideogram)["chr1"])
p + theme(aspect.ratio = 1/20)
class(p)
p <- autoplot(seqinfo(hg19Ideogram)["chr1"], ideogram = TRUE)
p
class(p)
@ %def

Users who are familiar with \ggbio{} and \ggplot{} may want to add more data to
the single chromosome overview, for example to
\begin{itemize}
\item Tweak the graphic further before embedding it in tracks.
\item Just visualize data on a single chromosome.
\end{itemize}
You can
\begin{itemize}
\item Set \Rfunarg{ideogram} to \Rcode{TRUE}, change the class back to the
  ggplot default, and then tweak the plot with low-level functions.
\item Use the default and then add \Rfunction{layout\_karyogram}.
\end{itemize}
With \Rfunarg{ideogram} set to \Rcode{FALSE} (the default), the result is just
a normal ggplot object, and you can manipulate it as usual.
<<>>=
## create an ideogram, then reset its class to a plain ggplot object
p <- autoplot(seqinfo(hg19Ideogram)["chr1"], ideogram = TRUE)
class(p)
class(p) <- c("gg", "ggplot")
gr <- GRanges("chr1", IRanges(start = sample(1:1e8, size = 20), width = 5),
              seqlengths = seqlengths(hg19Ideogram)["chr1"])
library(biovizBase)
## mold() converts the GRanges to a data.frame with start and end columns
p + geom_rect(data = mold(gr),
              aes(xmin = start, xmax = end, ymin = 0, ymax = 10),
              fill = "black", color = "black")
## or default + layout_karyogram
p <- autoplot(seqinfo(hg19Ideogram)["chr1"]) + layout_karyogram(gr) +
  theme(aspect.ratio = 1/20)
p
@ %def

\subsection{Get ideogram or customize the colors}
\ggbio{} only provides default cytoband colors and tries to cover the cases
encountered in practice. But what if you want to define your own ideogram
colors? To update the cytoband colors, simply replace the pre-defined color set
with a complete definition. This affects the whole \R{} session.

<<>>=
optlist <- getOption("biovizBase")
cyto.new <- rep(c("red", "blue"), length.out = length(optlist$cytobandColor))
names(cyto.new) <- names(optlist$cytobandColor)
head(cyto.new)
## suppose cyto.new is your newly defined color set
optlist$cytobandColor <- cyto.new
options(biovizBase = optlist)
## see what happened...
plotIdeogram(hg19IdeogramCyto)
@ %def

\section{sessionInfo}
<<>>=
sessionInfo()
@ %def

\end{document}