%\VignetteEngine{knitr::knitr}

\documentclass[a4paper, 9pt]{article}

<<style-knitr, eval=TRUE, echo=FALSE, results="asis">>=
BiocStyle::latex()
@

%\VignetteIndexEntry{OncoScore}

\usepackage{hyperref}
\usepackage{amsmath, amsthm, amssymb}
\usepackage{expl3}

\usepackage[utf8]{inputenc}

\usepackage{graphicx}
\usepackage{tcolorbox}
\usepackage{authblk}

\newcommand{\OncoScore}{\textsc{OncoScore}}

\begin{document}

\title{Using the \OncoScore{} package}

\author[1]{Luca De Sano}
\author[2]{Carlo Gambacorti Passerini}
\author[2]{Rocco Piazza}
\author[1,3]{Daniele Ramazzotti}
\author[2]{Roberta Spinelli}

\affil[1]{Dipartimento di Informatica Sistemistica e Comunicazione, Università degli Studi Milano Bicocca Milano, Italy.}
\affil[2]{Dipartimento di Medicina e Chirurgia, Università degli Studi Milano Bicocca Milano, Italy.}
\affil[3]{Stanford University, Stanford, California, USA}

\date{\today}
\maketitle

\begin{tcolorbox}{\bf Overview.} \OncoScore{} is a tool to measure the association of genes to cancer based
    on citation frequency in biomedical literature. The score is evaluated 
    from PubMed literature by dynamically updatable web queries.

\vspace{1.0cm}


{\em In this vignette, we give an overview of the package by presenting some of its main functions.}

\vspace{1.0cm}

\renewcommand{\arraystretch}{1.5}

\end{tcolorbox}

\vspace{1.0cm}

\renewcommand{\arraystretch}{2}

\begin{center}
\begin{tabular}{l | p{6.0cm} | l  | p{6.0cm}}
{\bf Acronym} & {\bf Extended name} & {\bf Reference}\\ \hline

OncoScore &  OncoScore: a novel, Internet-based tool to assess the oncogenic potential of genes &  \href{https://www.nature.com/articles/srep46290}{Publication}.
\\ \hline

 \end{tabular}
\end{center}

\vspace{1.0cm}

The \OncoScore{} analysis consists of two parts. One can estimate a score to asses 
the oncogenic potential of a set of genes, given the lecterature knowledge, 
at the time of the analysis, or one can study the trend of such score over time.

We next present the two analysis and we conclude with showing the capabilities
of the tool to visualize the results.

\paragraph{\large Requirements. } First we load the library.

<<>>=
library(OncoScore)
@

\paragraph{\large OncoScore analysis.} The query that we show next retrieves
from PubMed the citations, at the time of the query, for a list of genes in
cancer related and in all the documents.

<<eval=FALSE>>=
query = perform.query(c("ASXL1","IDH1","IDH2","SETBP1","TET2"))
@
\begin{verbatim}
### Starting the queries for the selected genes.

### Performing queries for cancer literature 
    Number of papers found in PubMed for ASXL1 was: 309 
    Number of papers found in PubMed for IDH1 was: 1450 
    Number of papers found in PubMed for IDH2 was: 557 
    Number of papers found in PubMed for SETBP1 was: 69 
    Number of papers found in PubMed for TET2 was: 560 

### Performing queries for all the literature 
    Number of papers found in PubMed for ASXL1 was: 350 
    Number of papers found in PubMed for IDH1 was: 1581 
    Number of papers found in PubMed for IDH2 was: 648 
    Number of papers found in PubMed for SETBP1 was: 89 
    Number of papers found in PubMed for TET2 was: 717 
\end{verbatim}
<<include=FALSE,echo=FALSE>>=
data(query)
@

OncoScore provides a function to merge gene names if requested by the user.
This function is useful when there are aliases in the gene list. 

<<>>=
combine.query.results(query, c('IDH1', 'IDH2'), 'new_gene')
@

OncoScore also provides a function to retireve the names of the genes in a given 
portion of a chromosome that can be exploited if we are dealing, e.g., with copy 
number alterations hitting regions rather than specific genes. 

<<eval=FALSE>>=
chr13 = get.genes.from.biomart(chromosome=13,start=54700000,end=72800000)
head(chr13)
@
\begin{verbatim}
[1] "LINC00374" "RNA5SP30"  "RNU7-87P"  "HNF4GP1"   "RN7SL375P" "BORA"
\end{verbatim}


Furthermore, one can also automatically perform the OncoScore analysis on
chromosomic regions as follows:

<<eval=FALSE>>=
result = compute.oncoscore.from.region(10, 100000, 500000)
@
\begin{verbatim}
### Performing query on BioMart 
### Performing web query on: RNA5SP297 RNA5SP298 RN7SL754P ZMYND11 DIP2C 
### Starting the queries for the selected genes.

### Performing queries for cancer literature 
    Number of papers found in PubMed for RNA5SP297 was: -1 
    Number of papers found in PubMed for RNA5SP298 was: -1 
    Number of papers found in PubMed for RN7SL754P was: -1 
    Number of papers found in PubMed for ZMYND11 was: 27 
    Number of papers found in PubMed for DIP2C was: 1 

### Performing queries for all the literature 
    Number of papers found in PubMed for RNA5SP297 was: -1 
    Number of papers found in PubMed for RNA5SP298 was: -1 
    Number of papers found in PubMed for RN7SL754P was: -1 
    Number of papers found in PubMed for ZMYND11 was: 45 
    Number of papers found in PubMed for DIP2C was: 3 
### Processing data
### Computing frequencies scores 
### Estimating oncogenes
### Results:
     RNA5SP297 -> 0 
     RNA5SP298 -> 0 
     RN7SL754P -> 0 
     ZMYND11 -> 49.07473 
     DIP2C -> 12.30234 
\end{verbatim}


We now compute a score for each of the genes, to estimate their oncogenic
potential.

<<>>=
result = compute.oncoscore(query)
@

\paragraph{\large OncoScore timeline analysis.} The query that we show next retrieves
from PubMed the citations, at specified time points, for a list of genes in
cancer related and in all the documents.

<<eval=FALSE>>=
query.timepoints = perform.query.timeseries(c("ASXL1","IDH1","IDH2","SETBP1","TET2"),
    c("2012/03/01", "2013/03/01", "2014/03/01", "2015/03/01", "2016/03/01"))
@
\begin{verbatim}
### Starting the queries for the selected genes.
### Quering PubMed for timepoint 2012/03/01 
    ### Performing queries for cancer literature 
    Number of papers found in PubMed for ASXL1 was: 83 
    Number of papers found in PubMed for IDH1 was: 408 
    Number of papers found in PubMed for IDH2 was: 172 
    Number of papers found in PubMed for SETBP1 was: 5 
    Number of papers found in PubMed for TET2 was: 169 
    ### Performing queries for all the literature 
    Number of papers found in PubMed for ASXL1 was: 91 
    Number of papers found in PubMed for IDH1 was: 488 
    Number of papers found in PubMed for IDH2 was: 234 
    Number of papers found in PubMed for SETBP1 was: 10 
    Number of papers found in PubMed for TET2 was: 196 
### Quering PubMed for timepoint 2013/03/01 
    ### Performing queries for cancer literature 
    Number of papers found in PubMed for ASXL1 was: 132 
    Number of papers found in PubMed for IDH1 was: 662 
    Number of papers found in PubMed for IDH2 was: 267 
    Number of papers found in PubMed for SETBP1 was: 11 
    Number of papers found in PubMed for TET2 was: 254 
    ### Performing queries for all the literature 
    Number of papers found in PubMed for ASXL1 was: 149 
    Number of papers found in PubMed for IDH1 was: 753 
    Number of papers found in PubMed for IDH2 was: 336 
    Number of papers found in PubMed for SETBP1 was: 18 
    Number of papers found in PubMed for TET2 was: 302 
### Quering PubMed for timepoint 2014/03/01 
    ### Performing queries for cancer literature 
    Number of papers found in PubMed for ASXL1 was: 185 
    Number of papers found in PubMed for IDH1 was: 903 
    Number of papers found in PubMed for IDH2 was: 364 
    Number of papers found in PubMed for SETBP1 was: 29 
    Number of papers found in PubMed for TET2 was: 342 
    ### Performing queries for all the literature 
    Number of papers found in PubMed for ASXL1 was: 208 
    Number of papers found in PubMed for IDH1 was: 1002 
    Number of papers found in PubMed for IDH2 was: 439 
    Number of papers found in PubMed for SETBP1 was: 36 
    Number of papers found in PubMed for TET2 was: 430 
### Quering PubMed for timepoint 2015/03/01 
    ### Performing queries for cancer literature 
    Number of papers found in PubMed for ASXL1 was: 250 
    Number of papers found in PubMed for IDH1 was: 1188 
    Number of papers found in PubMed for IDH2 was: 467 
    Number of papers found in PubMed for SETBP1 was: 49 
    Number of papers found in PubMed for TET2 was: 452 
    ### Performing queries for all the literature 
    Number of papers found in PubMed for ASXL1 was: 283 
    Number of papers found in PubMed for IDH1 was: 1300 
    Number of papers found in PubMed for IDH2 was: 550 
    Number of papers found in PubMed for SETBP1 was: 64 
    Number of papers found in PubMed for TET2 was: 576 
### Quering PubMed for timepoint 2016/03/01 
    ### Performing queries for cancer literature 
    Number of papers found in PubMed for ASXL1 was: 309 
    Number of papers found in PubMed for IDH1 was: 1446 
    Number of papers found in PubMed for IDH2 was: 557 
    Number of papers found in PubMed for SETBP1 was: 69 
    Number of papers found in PubMed for TET2 was: 558 
    ### Performing queries for all the literature 
    Number of papers found in PubMed for ASXL1 was: 350 
    Number of papers found in PubMed for IDH1 was: 1576 
    Number of papers found in PubMed for IDH2 was: 648 
    Number of papers found in PubMed for SETBP1 was: 89 
    Number of papers found in PubMed for TET2 was: 715 
\end{verbatim}
<<include=FALSE,echo=FALSE>>=
data(query.timepoints)
@

We now compute a score for each of the genes, to estimate their oncogenic
potential at specified time points.

<<>>=
result.timeseries = compute.oncoscore.timeseries(query.timepoints)
@

\paragraph{\large Visualization of the results.} We next plot the scores
measuring the oncogenetic potential of the considered genes as a barplot.

<<image-oncoscore, fig.show='hide', fig.width=5, fig.height=5, results='hide'>>=
plot.oncoscore(result, col = 'darkblue')
@
\incfig[ht]{figure/image-oncoscore-1}{0.5\textwidth}{Oncogenetic potential of the considered genes.}{}

We finally plot the trend of the scores over the considered times as 
absolute and values and as variations.


<<image-timeseries, fig.show='hide', fig.width=5, fig.height=5, results='hide'>>=
plot.oncoscore.timeseries(result.timeseries)
@
\incfig[ht]{figure/image-timeseries-1}{0.6\textwidth}{Absolute values of the oncogenetic potential 
of the considered genes over times.}{}

<<image-oncoscore-timeseries, fig.show='hide', fig.width=7, fig.height=5.5, results='hide'>>=
plot.oncoscore.timeseries(result.timeseries, 
    incremental = TRUE, 
    ylab='absolute variation')
@
\incfig[ht]{figure/image-oncoscore-timeseries-1}{0.6\textwidth}{Variations of the oncogenetic potential 
of the considered genes over times.}{}

<<image-oncoscore-timeseries-incremental, fig.show='hide', fig.width=7, fig.height=6.5, results='hide'>>=
plot.oncoscore.timeseries(result.timeseries,
    incremental = TRUE,
    relative = TRUE,
    ylab='relative variation')
@
\incfig[ht]{figure/image-oncoscore-timeseries-incremental-1}{0.6\textwidth}{Variations as relative values
of the oncogenetic potential of the considered genes over times.}{}


\paragraph{\Rcode{sessionInfo()}}

<<sessioninfo, results='asis', echo=FALSE>>=
toLatex(sessionInfo())
@

\end{document}