%\VignetteIndexEntry{How to Use pkgDepTools}
%\VignetteDepends{Biobase, Rgraphviz}
%\VignetteSuggests{RCurl}
%\VignetteKeywords{graphs, dependency, DAG, package}
%\VignettePackage{pkgDepTools}
\documentclass[12pt]{article}

\newcommand{\file}[1]{{\texttt{#1}}}
\newcommand{\code}[1]{{\texttt{#1}}}
\newcommand{\Rfunction}[1]{{\texttt{#1}}}
\newcommand{\Robject}[1]{{\texttt{#1}}}
\newcommand{\Rpackage}[1]{{\textsf{#1}}}
\newcommand{\Rclass}[1]{{\textit{#1}}}
\newcommand{\acronym}[1]{{\textsf{#1}}}

\title{How to Use pkgDepTools}
\author{Seth Falcon}

\begin{document}

\maketitle
\SweaveOpts{keep.source=TRUE}

\section{Introduction}

The \Rpackage{pkgDepTools} package provides tools for computing and
analyzing dependency relationships among R packages.  With it, you can
build a graph-based representation of the dependencies among all
packages in a list of CRAN-style package repositories.  There are
utilities for computing the installation order for a given package and,
if the \Rpackage{RCurl} package is available, for estimating the
download size required to install a given package and its dependencies.

This vignette demonstrates the basic features of the package.


\section{Graph Basics}

A graph consists of a set of nodes and a set of edges representing
relationships between pairs of nodes.  The relationships among the
nodes of a graph are binary; either there is an edge between a pair of
nodes or there is not.  To model package dependencies using a graph,
let the set of packages be the nodes of the graph with directed edges
originating from a given package to each of its dependencies.
Figure~\ref{fig:Category} shows a part of the Bioconductor dependency
graph for the \Rpackage{Category} package.  Since circular
dependencies are not allowed, the resulting dependency graph will be a
directed acyclic graph (\acronym{DAG}).
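
As a minimal sketch of this representation, the chunk below builds a
toy dependency graph by hand using the \Rpackage{graph} package.  The
package names \code{pkgA}, \code{pkgB}, and \code{pkgC} are
hypothetical and serve only as an illustration; they are not taken
from any repository.

<<toyGraphSketch, echo=TRUE>>=
library("graph")
## Hypothetical packages: pkgA depends on pkgB and pkgC,
## and pkgB depends on pkgC.
toyDeps <- new("graphNEL", nodes=c("pkgA", "pkgB", "pkgC"),
               edgemode="directed")
toyDeps <- addEdge(from=c("pkgA", "pkgA", "pkgB"),
                   to=c("pkgB", "pkgC", "pkgC"), graph=toyDeps)
## The out-edges of a node are its direct dependencies
edges(toyDeps)["pkgA"]
## Edges are directed: pkgA -> pkgB exists, pkgB -> pkgA does not
isAdjacent(toyDeps, "pkgA", "pkgB")
isAdjacent(toyDeps, "pkgB", "pkgA")
@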

\section{Building a Dependency Graph}

<<setup0, echo=FALSE, results=hide>>=
options(width=72)
@ 

<<setup, echo=TRUE, results=hide>>=
library("pkgDepTools")
library("Biobase")
library("Rgraphviz")
@ 

The \Rfunction{makeDepGraph} function retrieves the metadata for all
packages of a specified type (source, win.binary, or mac.binary) from
each repository in a list of repository URLs and builds a
\Rclass{graphNEL}\footnote{See \Robject{help("graphNEL-class")}}
instance representing the packages and their dependency relationships.

The function's main arguments are: 1) \Robject{repList}, a character
vector of \acronym{CRAN}-style package repository URLs; 2)
\Robject{suggests.only}, a logical value indicating whether the
resulting graph should represent relations from the \code{Depends}
field (\code{FALSE}, the default) or the \code{Suggests} field
(\code{TRUE}); 3) \Robject{type}, a string indicating the type of
packages to search for (the default is \code{getOption("pkgType")});
and 4) \Robject{keep.builtin}, a logical value indicating whether
packages that come with a standard R install are kept in the
dependency graph (the default is \Robject{FALSE}).

Here we use \Rfunction{makeDepGraph} to build dependency graphs of the
Bioconductor and \acronym{CRAN} packages.  Each dependency graph is a
\Rclass{graphNEL} instance.  The out-edges of a given node list its
direct dependencies (as shown for package \Rpackage{annotate}).  The
node attribute ``size'' gives the size of the package in megabytes
when the \Robject{dosize} argument is \Robject{TRUE} (this is the
default).  Obtaining the size of packages requires the
\Rpackage{RCurl} package and can be time-consuming for large
repositories since a separate HTTP request must be made for each
package.  In the examples below, we set \Robject{dosize=FALSE} to
speed up the computations.

<<testMakeDepGraph0, cache=TRUE, results=hide, echo=TRUE>>=
biocUrl <- biocReposList()["bioc"]
biocDeps <- makeDepGraph(biocUrl, type="source", dosize=FALSE)
@ 
%
<<testMakeDepGraph>>=
biocDeps
edges(biocDeps)["annotate"]
## if dosize=TRUE, size in MB is stored
## as a node attribute:
## nodeData(biocDeps, n="annotate", attr="size")
@ 
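
The other arguments described above are passed in the same way.  As a
sketch, the following unevaluated chunk would build the graph of
\code{Suggests} relationships for the same repository.  It requires
network access, and the object name \Robject{biocSuggests} is chosen
here only for illustration.

<<suggestsSketch, eval=FALSE, echo=TRUE>>=
## Not evaluated: build the Suggests graph rather than the
## Depends graph for the Bioconductor repository
biocUrl <- biocReposList()["bioc"]
biocSuggests <- makeDepGraph(biocUrl, suggests.only=TRUE,
                             type="source", dosize=FALSE)
## Out-edges now list the packages that annotate suggests
edges(biocSuggests)["annotate"]
@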

\section{Using the Dependency Graph}

The dependencies of a given package can be visualized using the graph
generated by \Rfunction{makeDepGraph} and the \Rpackage{Rgraphviz}
package.  The graph shown in Figure~\ref{fig:Category} was produced
using the code shown below.  The \Rfunction{acc} method from the
\Rpackage{graph} package returns, for each starting node, a named
vector whose names are the nodes accessible from that node.  Here, it
has been used to obtain the complete list of \Rpackage{Category}'s
dependencies.

<<CategoryPlot, fig=TRUE, prefix=FALSE, include=FALSE, echo=TRUE>>=
categoryNodes <- c("Category", 
                   names(acc(biocDeps, "Category")[[1]]))
categoryGraph <- subGraph(categoryNodes, biocDeps)
nn <- makeNodeAttrs(categoryGraph, shape="ellipse")
plot(categoryGraph, nodeAttrs=nn)
@ 

\begin{figure}[hbt]
\begin{center}
\setkeys{Gin}{width=0.95\textwidth}
\includegraphics{CategoryPlot}
\end{center}
\caption{The dependency graph for the \Rpackage{Category} package.}
\label{fig:Category}
\end{figure}

Although the \Rfunction{install.packages} function will search for and
install package dependencies if you specify \code{dependencies=TRUE},
R provides no easy way to preview a given package's dependencies or
to estimate the amount of data that needs to be downloaded.  The
\Rfunction{getInstallOrder} function provides such a ``preview''.

For computing installation order, it is useful to have a single graph
representing the relationships among all packages in all available
repositories.  Below, we create such a graph combining all CRAN and
Bioconductor packages.

<<demo-setup, cache=TRUE>>=
allDeps <- makeDepGraph(biocReposList(), type="source",
                        keep.builtin=TRUE, dosize=FALSE)

@ 

Calling \Rfunction{getInstallOrder} for package \Rpackage{GOstats}
lists only those packages that still need to be installed.  Your
results will differ depending on which packages you already have
installed.

<<demo1>>=
getInstallOrder("GOstats", allDeps)
@ 

When \code{needed.only=FALSE}, the complete dependency list is
returned regardless of what packages are currently installed.

<<demo2>>=
getInstallOrder("GOstats", allDeps, needed.only=FALSE)
@ 
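
To illustrate the idea of an installation order, the following sketch
builds a toy dependency graph with hypothetical package names and uses
\Rfunction{tsort} from the \Rpackage{RBGL} package to sort it
topologically.  Because edges point from a package to its
dependencies, reversing the topological order lists each dependency
before the packages that need it.  This only illustrates the concept
and is not necessarily how \Rfunction{getInstallOrder} is implemented.

<<installOrderSketch, echo=TRUE>>=
library("graph")
library("RBGL")
## Hypothetical packages: pkgA depends on pkgB and pkgC,
## and pkgB depends on pkgC.
toyDeps <- new("graphNEL", nodes=c("pkgA", "pkgB", "pkgC"),
               edgemode="directed")
toyDeps <- addEdge(from=c("pkgA", "pkgA", "pkgB"),
                   to=c("pkgB", "pkgC", "pkgC"), graph=toyDeps)
## tsort lists a node before the nodes it points to, so the
## reversed order puts each dependency ahead of its dependents
toyOrder <- rev(tsort(toyDeps))
toyOrder
@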

The edge directions of the dependency graph can be reversed and the
resulting graph used to determine the set of packages that make use of
(even indirectly) a given package.  For example, one might like to
know which packages make use of the \Rpackage{methods} package.  Here
is one way to do that:

<<whoDependsOnMe>>=
allDepsOnMe <- reverseEdgeDirections(allDeps)
usesMethods <- dijkstra.sp(allDepsOnMe, start="methods")$distance
usesMethods <- usesMethods[is.finite(usesMethods)]
length(usesMethods) - 1 ## don't count methods itself
table(usesMethods)

@ 
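
The same question can be sketched on a toy graph using
\Rfunction{acc} in place of \Rfunction{dijkstra.sp}.  The package
names are again hypothetical.  The point is only that, once the edges
are reversed, the nodes accessible from a package are the packages
that depend on it, directly or indirectly.

<<toyReverseSketch, echo=TRUE>>=
library("graph")
## Hypothetical packages: pkgA depends on pkgB and pkgC,
## and pkgB depends on pkgC.
toyDeps <- new("graphNEL", nodes=c("pkgA", "pkgB", "pkgC"),
               edgemode="directed")
toyDeps <- addEdge(from=c("pkgA", "pkgA", "pkgB"),
                   to=c("pkgB", "pkgC", "pkgC"), graph=toyDeps)
toyDepsOnMe <- reverseEdgeDirections(toyDeps)
## Names of the packages that make use of pkgC
toyUsers <- names(acc(toyDepsOnMe, "pkgC")[[1]])
sort(toyUsers)
@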


<<sessionInfo, results=tex>>=
toLatex(sessionInfo())
@ 

\end{document}