% -*- mode: noweb; noweb-default-code-mode: R-mode; -*- %\VignetteIndexEntry{Using 450k annotations for 27k chips} \documentclass{article} \usepackage{amsmath,pstricks} \usepackage{hyperref} \usepackage[authoryear,round]{natbib} \newcommand{\Rfunction}[1]{{\texttt{#1}}} \newcommand{\Robject}[1]{{\texttt{#1}}} \newcommand{\Rpackage}[1]{{\textit{#1}}} \newcommand{\Rclass}[1]{{\textit{#1}}} \newcommand{\Rmethod}[1]{{\textit{#1}}} \SweaveOpts{keep.source=TRUE} \begin{document} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \setkeys{Gin}{width=0.8\textwidth} \author{Tim Triche, Jr.} \title{HumanMethylation27k probes in gene bodies} \maketitle \clearpage %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \part*{27k probes by design and color} First we have to grab the probes that made it onto the 450k chip. <<load library and identify 27k probes, eval=T>>= library('IlluminaHumanMethylation450k.db') probes.27k <- IlluminaHumanMethylation450k_get27k() lapply(probes.27k, function(x) { if( class(x) == 'list') lapply(x, head) else head(x) }) @ \clearpage \part*{Annotation} How many probes align to UCSC gene bodies? That's a bit complicated, because each Illumina probe can be mapped to several transcripts, and each transcript to several probes. Normalizing the schema reduced the database size by 100MB. <<list the 27k probe IDs, eval=T>>= probes.27k <- unlist(probes.27k, recursive=T) head(mget(probes.27k, IlluminaHumanMethylation450kPROBELOCATION, ifnotfound=NA)) @ \clearpage Location mapping is many-to-one and emerges as a list of concatenations, so we cannot simply set up a simpleBimap object... UNLESS the concatenation is part of a VIEW (across four or so different tables). So a VIEW is what we use. <<annotate the 27k probes, eval=T>>= probeloc <- mget(probes.27k, IlluminaHumanMethylation450kPROBELOCATION, ifnotfound=NA) body.or.exon <- function(x) length( grep('(Body|Exon)', x) ) > 0 length(which(unlist(lapply(probeloc, body.or.exon)))) in.body <- function(x) length( grep('Body', x) ) > 0 gene.body.probes <- names(which(unlist(lapply(probeloc, in.body)))) length(which(unlist(lapply(probeloc, in.body)))) head(gene.body.probes) @ They are not nearly as scarce as I initially thought. \part*{Versioning} It's important to keep track of where information came from, and who is in charge of keeping it organized, when documenting phenomena like this. <<identify database and package versions, eval=T>>= IlluminaHumanMethylation450k_dbInfo()[c(8:10,22:24,31,33),] IlluminaHumanMethylation450kSVNID IlluminaHumanMethylation450kBLAME @ Writing to Bioconductor standards simply enforces this. \clearpage \part*{R session} <<results=tex>>= toLatex(sessionInfo()) @ \end{document}