\documentclass[a4paper]{article}

%\VignetteIndexEntry{Introduction to the TIN package}
%\VignettePackage{TIN}
%\VignetteEngine{knitr::knitr_notangle}
%\usepackage{natbib}
\usepackage{amsmath}
\RequirePackage{graphicx,ae,fancyvrb}
%\usepackage[utf8]{inputenc}

\setlength{\textheight}{8.5in}
\setlength{\textwidth}{6in}
\setlength{\topmargin}{-0.25in}
\setlength{\oddsidemargin}{0.25in}
\setlength{\evensidemargin}{0.25in}

\title{An overview of the TIN package}
\author{Bjarne Johannessen}

\begin{document}
\setkeys{Gin}{width=0.99\textwidth}

\maketitle

\tableofcontents

\section{Introduction}
This document gives an overview and demonstration of the \texttt{TIN}
package. The package provides a set of tools for transcriptome
instability analysis based on exon-level microarray expression profiles.
Alternative splicing is an important mechanism in the regulation of gene
expression, and disruption of normal splicing patterns can be harmful to
eukaryotic cells. High-throughput technologies make it possible to identify
genes and exons that are subject to splicing discrepancies. The \texttt{TIN}
package includes tools for calculating aberrant exon usage and for analyzing
the correlation between transcriptome instability and splicing factor
expression.


\subsection{Data input}
Input data to the \texttt{TIN} package are raw expression data (CEL files) and
preprocessed gene-level expression values.

\subsection{Data sets}
The package includes three data sets:
\begin{itemize}
    \item \texttt{splicingFactors}: A list of 280 splicing factor
        genes \cite{sveen11}.
    \item \texttt{geneSets}: 1,454 Gene Ontology gene sets \cite{subramanian05}.
    \item \texttt{geneAnnotation}: Matching gene symbols and Affymetrix
        transcript cluster identifiers.
\end{itemize}
In addition, two toy data sets are included in the package. See the worked
example below for a demonstration.


\section{Example}
The following example illustrates the analysis pipeline of the \texttt{TIN}
package. First, load the package in R:
<<>>=
library(TIN)
@

\subsection{Data access}
We will need access to all three data sets included in the package:
<<eval=TRUE>>=
data(splicingFactors)
data(geneSets)
data(geneAnnotation)
@
The \texttt{splicingFactors} data set contains gene symbols and Affymetrix
transcript cluster identifiers for 280 genes known to be involved in splicing.
The \texttt{geneSets} data set is a collection of 1,454 Gene Ontology gene
sets, included in the package so that the association between aberrant exon
usage and expression levels can be compared with that for other, general gene
sets. The list comprises one major collection of gene sets in the Molecular
Signatures Database, MSigDB \cite{subramanian05}. The \texttt{geneAnnotation}
data set is a list of matching gene symbols and Affymetrix transcript cluster
identifiers.

\subsection{Sample data}
Two sample data sets are included for educational purposes. 
<<eval=TRUE>>=
data(sampleSetFirmaScores)
data(sampleSetGeneSummaries)
@
The two sample data sets contain small parts of a comprehensive prostate cancer
data set (GEO accession number GSE21034) published in \cite{taylor10}.


\subsection{FIRMA analysis}
By issuing the first command,
<<eval=TRUE>>=
fs <- firmaAnalysis(useToyData=TRUE)
@
raw CEL files are read, and background correction, normalization
(customized RMA approach), and alternative splicing analysis are performed
according to the FIRMA method
(http://www.aroma-project.org/vignettes/FIRMA-HumanExonArrayAnalysis). The
local path to the aroma.affymetrix root directory and the name of the sample
set are passed as parameters to the function. The function returns a
\texttt{data.frame} with log2 FIRMA (alternative splicing) scores for each
probe set/sample combination.

Next we read preprocessed gene-level expression values by
<<eval=TRUE>>=
gs <- readGeneSummaries(useToyData=TRUE)
@
The input parameter should be a tab-separated text file with one row for each
gene and one column for each sample. Affymetrix transcript cluster identifiers
should be used as row names, and sample names should be used as column names.
These values can be generated using, for instance, Affymetrix Power Tools or
Expression Console.
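
As a minimal sketch of the expected layout, the chunk below writes a tiny table
in this format to a temporary file and reads it back with base R's
\texttt{read.table}. The transcript cluster identifiers, sample names, and
values are invented for illustration, and the chunk does not show how
\texttt{readGeneSummaries} itself parses its input.
<<eval=FALSE>>=
# Hypothetical example values; identifiers and sample names are invented.
toy <- data.frame(sampleA = c(7.1, 5.4),
    sampleB = c(6.8, 5.9),
    row.names = c("2315100", "2315106"))
path <- tempfile(fileext = ".txt")
# Tab-separated, with transcript cluster identifiers as row names
# and sample names as column names.
write.table(toy, file = path, sep = "\t", quote = FALSE, col.names = NA)
geneSummaries <- read.table(path, header = TRUE, sep = "\t", row.names = 1,
    check.names = FALSE)
dim(geneSummaries)
@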

After reading input data, aberrant exon usage can be calculated by
<<eval=TRUE>>=
tra <- aberrantExonUsage(1.0, sampleSetFirmaScores)
@
This function takes the \texttt{data.frame} returned by \texttt{firmaAnalysis}
(containing log2 FIRMA scores for all probe sets/exons (rows) in all samples
(columns)) and a number (default 1.0) indicating which top percentile of the
global FIRMA scores is used as the threshold for denoting aberrant exon usage.
The \texttt{tra} object is a list containing one number for each sample,
indicating to what degree that sample possesses aberrant exon usage. Calling
\texttt{aberrantExonUsage} also creates an object called
\texttt{aberrantExons}, consisting of two lists that give, for each sample, the
number of probe sets nominated as having high or low aberrant exon usage. In
addition, the two values used as thresholds for detecting aberrant exon usage
are stored in the \texttt{quantiles} object.
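
The thresholding idea can be sketched in base R as follows. This is an
illustration under stated assumptions rather than the implementation used by
\texttt{aberrantExonUsage}: the matrix \texttt{scores} is simulated, all
object names are hypothetical, and the two thresholds are assumed to be the
lower and upper percentiles of all scores pooled across samples.
<<eval=FALSE>>=
set.seed(1)
# Simulated log2 scores: 1000 probe sets (rows) x 5 samples (columns).
scores <- matrix(rnorm(5000), nrow = 1000,
    dimnames = list(NULL, paste0("sample", 1:5)))
percentile <- 1.0
# Two global thresholds from all scores pooled across samples
# (assumption: lower and upper tails are treated symmetrically).
thresholds <- quantile(scores, probs = c(percentile, 100 - percentile) / 100)
# Number of probe sets per sample below the lower / above the upper threshold.
lowCounts <- colSums(scores < thresholds[1])
highCounts <- colSums(scores > thresholds[2])
rbind(low = lowCounts, high = highCounts)
@
Samples with many probe sets beyond either threshold would be the ones with
the largest amounts of aberrant exon usage in this simplified picture.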

Next, we create permutations of the FIRMA scores for each probe set/exon
across all samples,
<<eval=TRUE>>=
aberrantExonsPerms <- probesetPermutations(sampleSetFirmaScores, quantiles)
@
The \texttt{aberrantExonsPerms} object contains lists indicating high and low
aberrant exon usage for each sample after the FIRMA scores have been
reshuffled within each probe set. To calculate the correlation between
sample-wise amounts of aberrant exon usage and splicing factor expression
levels, the \texttt{correlation} function is applied in the following way:
<<eval=TRUE>>=
corr <- correlation(splicingFactors, sampleSetGeneSummaries, tra)
@
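
Both steps can be illustrated with base R. The sketch below uses simulated
data and hypothetical object names; it is not the code behind
\texttt{probesetPermutations} or \texttt{correlation}, and the use of the
Pearson correlation (the default of \texttt{cor}) is an assumption made here.
<<eval=FALSE>>=
set.seed(1)
nSamples <- 6
# Simulated scores: 4 probe sets (rows) x 6 samples (columns).
scores <- matrix(rnorm(4 * nSamples), nrow = 4)
# Reshuffle each probe set's scores across samples (one permutation per row).
permuted <- t(apply(scores, 1, sample))
# Simulated expression of 3 genes and one instability value per sample.
expr <- matrix(rnorm(3 * nSamples), nrow = 3,
    dimnames = list(c("geneA", "geneB", "geneC"), NULL))
instability <- rnorm(nSamples)
# One correlation coefficient per gene.
geneCor <- apply(expr, 1, function(x) cor(x, instability))
geneCor
@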

Correlation between aberrant exon usage and expression levels for a number
of gene sets is calculated by
<<eval=TRUE>>=
gsc <- geneSetCorrelation(geneSets, geneAnnotation, sampleSetGeneSummaries,
    tra, 100)
@
The function calculates the Pearson correlation between sample-wise aberrant
exon usage amounts and the expression levels of all genes in each gene set
defined by the input list \texttt{geneSets}.
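
For reference, the Pearson correlation between the expression levels $x$ of
one gene and the sample-wise aberrant exon usage amounts $y$ over $n$ samples
is the standard quantity below. The symbols $x_i$, $y_i$, and $n$ are notation
introduced here for illustration and are not names used by the package.
\begin{equation*}
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
    {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,
     \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
\end{equation*}
Here $\bar{x}$ and $\bar{y}$ denote the means of $x$ and $y$ across the $n$
samples, and $r$ lies between $-1$ and $1$.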

\subsection{Data plotting}
Four different plotting methods are included in the \texttt{TIN} package. First,
the cluster plot visualizes the hierarchical clustering of the samples based
on splicing factor expression levels:
<<eval=TRUE>>=
clusterPlot(sampleSetGeneSummaries, tra, "euclidean", "complete",
    "TIN-cluster.pdf")
@
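
The arguments \texttt{"euclidean"} and \texttt{"complete"} are valid method
names for base R's \texttt{dist} and \texttt{hclust}. Assuming that
\texttt{clusterPlot} uses them as distance and linkage methods (an assumption,
not something stated by the package), the underlying clustering can be
sketched on simulated data as follows; all object names are hypothetical.
<<eval=FALSE>>=
set.seed(1)
# Simulated expression: 20 genes (rows) x 6 samples (columns).
expr <- matrix(rnorm(120), nrow = 20,
    dimnames = list(NULL, paste0("sample", 1:6)))
# Distances between samples (columns), then agglomerative clustering.
d <- dist(t(expr), method = "euclidean")
hc <- hclust(d, method = "complete")
# Draw the resulting dendrogram of samples.
plot(hc)
@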

Second, a scatter plot visualizes the relative amounts of aberrant exon usage
for each sample:
<<eval=TRUE>>=
scatterPlot("TIN-scatter.pdf", TRUE, aberrantExons, aberrantExonsPerms)
@

Third, \texttt{correlationPlot} visualizes the number of splicing factor genes
with expression levels significantly correlated with the sample-wise total
relative amounts of aberrant exon usage.
<<eval=TRUE>>=
correlationPlot("TIN-correlation.pdf", tra, sampleSetGeneSummaries,
    splicingFactors, 1000, 1000)
@

Fourth, \texttt{posNegCorrPlot} is a scatter plot that compares the number of
splicing factor genes for which expression levels are significantly positively
(vertical axis) and negatively (horizontal axis) correlated with the total
relative amounts of aberrant exon usage per sample.
<<eval=TRUE>>=
posNegCorrPlot("TIN-posNegCorrPlot.pdf", tra, sampleSetGeneSummaries, 
    splicingFactors, 1000, 1000)
@


\begin{thebibliography}{}

\bibitem{sveen11}
Sveen, A., {\AA}gesen, TH., Nesbakken, A., Rognum, TO., Lothe, RA., Skotheim, RI.,
\emph{Transcriptome instability in colorectal cancer identified by exon
microarray analyses: Associations with splicing factor expression levels and
patient survival}, Genome Medicine \emph{3}, 32 (2011).

\bibitem{subramanian05}
Subramanian, A., Tamayo, P., Mootha, VK., Mukherjee, S., Ebert, BL.,
Gillette, MA., Paulovich, A., Pomeroy, SL., Golub, TR., Lander, ES.,
Mesirov, JP., \emph{Gene set enrichment analysis: A knowledge-based approach
for interpreting genome-wide expression profiles}, Proceedings of the National
Academy of Sciences of the United States of America \emph{102}, 15545--15550
(2005).

\bibitem{taylor10}
Taylor, BS., Schultz, N., Hieronymus, H., Gopalan, A., Xiao, Y., Carver, BS.,
Arora, VK., Kaushik, P., Cerami, E., Reva, B., Antipin, Y., Mitsiades, N.,
Landers, T., Dolgalev, I., Major, JE., Wilson, M., Socci, ND., Lash, AE.,
Heguy, A., Eastham, JA., Scher, HI., Reuter, VE., Scardino, PT., Sander, C.,
Sawyers, CL., Gerald, WL., \emph{Integrative genomic profiling of human
prostate cancer}, Cancer Cell \emph{18}, 11--22 (2010).

\end{thebibliography}


\end{document}