%\VignetteIndexEntry{From R to Java}
%\VignetteKeywords{Web services}
%\VignettePackage{RWebServices}

\documentclass[]{article}

\usepackage[colorlinks,linkcolor=blue,pagecolor=blue,urlcolor=blue]{hyperref}
\usepackage{graphicx}

\newcommand{\lang}[1]{{\texttt{#1}}}
\newcommand{\pkg}[1]{{\textsf{#1}}}
\newcommand{\code}[1]{\texttt{#1}}
\newcommand{\func}[1]{{\texttt{#1}}}
\newcommand{\method}[1]{{\texttt{#1}}}
\renewcommand{\arg}[1]{{\texttt{#1}}}
\newcommand{\ret}[1]{{\texttt{#1}}}
\newcommand{\obj}[1]{{\texttt{#1}}}
\newcommand{\class}[1]{{\textit{#1}}}

\newcommand{\R}{\lang{R}}
\newcommand{\Java}{\lang{Java}}

\newcommand{\RWebServices}{\pkg{RWebServices}}
\newcommand{\TypeInfo}{\pkg{TypeInfo}}
\newcommand{\SJava}{\pkg{SJava}}

\newcommand{\STS}{\code{SimultaneousTypeSpecification}}
\newcommand{\ITS}{\code{IndependentTypeSpecification}}
\newcommand{\TypedSignature}{\code{TypedSignature}}
\newcommand{\STT}{\code{StrictIsTypeTest}}
\newcommand{\DTT}{\code{DynamicTypeTest}}
\newcommand{\ITT}{\code{InheritsTypeTest}}
\newcommand{\oneWayAnova}{\code{oneWayAnova}}

%% \newcommand{\STS}{\func{SimultaneousTypeSpecification}}
%% \newcommand{\ITS}{\func{IndependentTypeSpecification}}
%% \newcommand{\TypedSignature}{\func{TypedSignature}}
%% \newcommand{\STT}{\func{StrictIsTypeTest}}
%% \newcommand{\DTT}{\func{DynamicTypeTest}}
%% \newcommand{\ITT}{\func{InheritsTypeTest}}

\begin{document}

\title{From \R{} to \Java{}: the \TypeInfo{} and \RWebServices{} paradigm}
\author{
  Nianhua Li\footnote{Fred Hutchinson Cancer Research Center, 
    1100 Fairview Ave.\ N., PO Box 19024, Seattle, WA 98109},
  Martin T. Morgan,
  Seth Falcon,\\
  Robert Gentleman,
  Duncan Temple Lang\footnote{Department of Statistics, 
    4210 Mathematical Sciences Building, One Shields Avenue, Davis, CA 95616}
}
\date{14 July, 2006}
\maketitle

\begin{abstract}
  Web services are most effective on statically typed objects exposed
  in a well-developed infrastructure. This document summarizes our
  approach to exposing \R{} objects and functionality in a \Java{}
  class hierarchy of statically typed methods. The approach is to use
  \R{}'s formal (S4) class system to strongly type \R{} functions
  using \TypeInfo{}. We then convert strongly typed functions to
  \Java{} objects and methods for exposure as \Java-based web
  services. 
  
  Exposing and implementing the web service in \Java{} involves the
  package \SJava{}. Documentation for these steps will be provided
  later.
\end{abstract}

<<echo=FALSE>>=
options(width=60)
@ 

\section{Introduction}

Exposing \R{} objects and functions as web services poses several
challenges. First, \R{} has both informal `classes' and a formal (S4)
class system, whereas web services are most effective with
well-defined objects.  Second, \R{} functions are not strongly typed,
whereas web services deploy statically typed functions. Finally,
well-developed infrastructure supports \Java{}-based web services,
whereas web services client and server functionality for \R{} requires
substantial \emph{de novo} development. \TypeInfo{} and
\RWebServices{} are packages that combine to provide a paradigm for
exposing \R{} functions as effective web services in a \Java-based web
services context.

Here we document the paradigm of using \TypeInfo{} and \RWebServices{}
for type mapping between \R{} and \Java.

%% TODO: why auto-generation between R and Java? Java has structure to
%% deploy web services in a common framework. Server model easy to set
%% up. Need to take data from the web to R via Java, relying on
%% established web services infrastructure to perform most of the
%% web-based communication. SJava provides a two-way street -- both R
%% to Java and reverse.

\section{Steps to describing \R{} objects in \Java}

\subsection{Adding \TypeInfo{} to \R{} functions}

The main purpose of \TypeInfo{} is to provide type specification for
function arguments and return values. By `type specification' we mean
definition of argument and return types in terms of defined \R{} objects.
The named objects are defined in \R{}, and objects and function
definitions are translated to equivalent \Java{} objects and methods
using \RWebServices{}.

To illustrate, the following defines and invokes a hypothetical \R{}
function \func{square} taking an un-typed argument \arg{x} and
returning an untyped return value.
<<simplestFunc>>=
square <- function(x) {
    return(x^2)
}
square(10)
@ 
%
The function evaluates correctly when provided a numeric argument;
non-numeric arguments result in a run-time error.  Importantly, there
is no way to query the function to determine its argument or return
type.
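
As a minimal sketch of both limitations, using only base \R{} and a
hypothetical copy of the function named \func{untypedSquare} (the name
is an assumption made for this illustration), one might try:
<<untypedLimits>>=
untypedSquare <- function(x) {
    return(x^2)
}
## formals() lists argument names, but carries no type information
names(formals(untypedSquare))
## a non-numeric argument fails only when the body is evaluated
tryCatch(untypedSquare("ten"),
         error = function(e) conditionMessage(e))
@ 
%
The argument name is recoverable, but nothing in the function
describes what \arg{x} should be or what is returned.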

Type specification is applied by loading the \TypeInfo{} package and
annotating the definition of \func{square}:
<<simplestTypeInfo>>=
library(TypeInfo)
STS <- SimultaneousTypeSpecification
TS <- TypedSignature

typeInfo(square) <-  
  STS(TS(x = "numeric"), returnType = "numeric")
@ 
(The symbols \code{STS} and \code{TS} are defined for convenience as
synonyms for the longer function names from the \TypeInfo{}
package.)

Applying \TypeInfo{} changes the behavior of \func{square} in two
important ways, without altering the body of the function. First, the
argument \arg{x} and the return value \emph{must} be objects of type
\code{numeric} (approximately, \code{double[]} in \Java{}). Attempts
to invoke \func{square} with non-numeric arguments result in an error.
Programming errors that return non-numeric values also cause an error.
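
A short sketch of this behavior, using a hypothetical function
\func{cube} (not part of \TypeInfo{} or of the example above) and the
same annotation calls:
<<typedCubeSketch>>=
library(TypeInfo)
cube <- function(x) {
    return(x^3)
}
typeInfo(cube) <-
    SimultaneousTypeSpecification(TypedSignature(x = "numeric"),
                                  returnType = "numeric")
## a numeric argument is accepted as before
cube(2)
## a logical argument is not 'numeric'; tryCatch() reports any rejection
tryCatch(cube(TRUE), error = function(e) conditionMessage(e))
@ 
%
Without the annotation, \code{cube(TRUE)} would silently coerce
\code{TRUE} to 1; with it, the call is expected to fail the type check.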

The second important consequence of applying \TypeInfo{} is to allow
functions annotated in this way to be queried for their argument and
return types:
<<simplestTypeInfoQuery>>=
typeInfo(square)
@ 
This information can be readily extracted and transformed
programmatically.

\R{} functionality is usually organized into \emph{packages}. The
intention is that package authors, or individuals responsible for
exposing \R{} functionality as web services, apply \TypeInfo{} to
functions in the package. Thus type-specified functions are defined
within packages.

Full documentation of \TypeInfo{} is available with the package.
Entering \code{library(help=TypeInfo)} at the \R{} prompt provides a
synopsis of available commands. Documentation of an individual command
is available through the help system, e.g., by typing \code{?typeInfo}
at the \R{} prompt. Additional illustration of \TypeInfo, written for a
general audience, is distributed with the package as a PDF file,
TypeInfoNews.

\subsection{Using \RWebServices{} to create \Java{} mappings}

The main purpose of \RWebServices{} is to translate \R{} object and
function definitions into equivalent \Java{} class definitions. Note
that there are two components to translation. The focus here is on
\emph{describing} \R{} objects in \Java{}. The process of moving data
from \R{} to \Java{} and vice-versa is implicit in this description,
but the software for performing this translation (\SJava) is not part
of the paradigm being described here.

\RWebServices{} operates on type-specified functions. \RWebServices{}
extracts information about argument and return types. It determines
the underlying structure of potentially complicated \R{} objects
specified in the type definition. Based on this information,
\RWebServices{} produces \Java{} class hierarchies reflecting data
objects, and composes \Java{} method signatures appropriate for the
functions.

From the \R{} perspective, the process of producing web services
templates for a function (e.g., \func{caAffy}, with \TypeInfo{}
applied in the package \pkg{CaAffy}) is straightforward:
<<simplestRWebServices,eval=FALSE>>=
library(CaAffy)
RJavaSignature(c(caAffy))
@ 
%
\func{RJavaSignature} queries \func{caAffy} for its argument types. It
then uses the standard S4 class definitions specified in \pkg{CaAffy}
(or other \R{} packages), and function definitions in \pkg{CaAffy}, to
construct \Java{} signatures. \func{RJavaSignature} then produces
documented \Java{} beans representing the \R{} data objects and
functions, organized in a hierarchy reflecting the package structure.
Suppose \func{caAffy} takes arguments \arg{magePlaceholder} and
\arg{caAffyTuningParam} of class \class{MagePlaceholder} and
\class{CaAffyTuningParam}, and returns an object of
\class{MagePlaceholder}. The \Java{} beans and methods are packaged as
described below.

Full documentation of \RWebServices{} is available with the package.
Entering \code{library(help=RWebServices)} at the \R{} prompt provides
a synopsis of available commands. Documentation of an individual
command is available through the help system, e.g., by typing
\code{?RJavaSignature} at the \R{} prompt.
Although the \RWebServices{} package depends on \SJava{} for
performing web services, the functionality described here does not
use the facilities of \SJava.

\section{Understanding \Java{} representations of \R{} objects and functions}
\RWebServices{} has two main functionalities. First, \RWebServices{}
generates \Java{} representations of \R{} functions and data
objects. Second, \RWebServices{} allows \R{} functions to be evaluated
from within \Java{}, including \Java{}-based web or analytic services.
This section describes in detail the functioning of \RWebServices{}
as it generates \Java{} representations.

The main interface for generating these representations is the \R{}
function \method{RJavaSignature}. Starting with a list (provided by the
user or programmatically extracted from the package) of
\TypeInfo{}-annotated functions, \RWebServices{} parses the functions
for data types, and creates \Java{} representations of each data type
and method.  The \Java{} representations of methods and parsed data
types are then collated into \Java{} packages with a layout consistent
with the \R{} package
structure.  \RWebServices{} also generates \Java{} service APIs and
adapters for the \R{} functions. Internally, the function
\method{RWebServices:::generateFunctionMap} is responsible for these
steps.

The \Java{} data and method representations are written to disk as a
file hierarchy reflecting the structure of the corresponding \R{}
objects, including the packages in which the \R{} data types and
methods were defined. Details are provided below, but a simple example
is:
\begin{verbatim}
package / CaAffy / data (Java data objects)
                 / functions (Java methods for R functions)
        / CaPROcess / data 
                    / functions
        / CaDNAcopy / data
                    / functions
service / bioconductor (Java service API) 
\end{verbatim}
The \R{} packages in this example include \pkg{CaAffy},
\pkg{CaPROcess}, and \pkg{CaDNAcopy}.

\subsection{\Java{} representations of \R{} data objects}
The responsibility for generating \Java{} representations of \R{} data
objects is in the internally defined function
\method{RWebServices:::generateDataMap}. This function operates by
creating a hash of \R{} data types used in the \R{} functions. The
function then creates \Java{} class definitions representing the \R{}
data types (limitations concerning multiple inheritance are described
below).  The representations reflect underlying \R{} data type
structure, for instance, capturing slots present in S4 classes. Part
of this process is to identify functions required for low-level data
conversion (e.g., \R{} \verb|numeric| to \Java{} \verb|RDouble|); details
of the low-level conversion process are presented below. \R{} class
names are mangled to reflect \Java{} conventions (e.g., \R{}
\verb|class.name| becomes \Java{} \verb|className|) and to avoid \Java{}
keyword conflicts.
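
The kind of mangling described here can be sketched in a few lines of
\R{}. The function \func{mangleName} and the abbreviated keyword list
\code{javaKeywords} are hypothetical, written for this illustration;
they are not the rules implemented in \RWebServices{}.
<<mangleSketch>>=
## an abbreviated, illustrative subset of Java reserved words
javaKeywords <- c("abstract", "boolean", "class", "double", "final",
                  "for", "if", "import", "int", "new", "package",
                  "return", "static", "void", "while")

mangleName <- function(rName) {
    ## split on '.', dropping empty pieces
    parts <- strsplit(rName, ".", fixed = TRUE)[[1]]
    parts <- parts[nzchar(parts)]
    ## capitalize the first letter of every piece after the first
    if (length(parts) > 1) {
        rest <- parts[-1]
        parts[-1] <- paste0(toupper(substring(rest, 1, 1)),
                            substring(rest, 2))
    }
    javaName <- paste(parts, collapse = "")
    ## append an underscore when the result collides with a keyword
    if (javaName %in% javaKeywords)
        javaName <- paste0(javaName, "_")
    javaName
}

mangleName("class.name")
mangleName("class")
@ 
%
Appending an underscore is only one possible way of avoiding keyword
conflicts; the choice is an assumption of this sketch.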

The \Java{} representations are written to disk in a folder
\verb|data| contained inside the corresponding package folder, e.g.,
\verb|biocJavaMap/CaAffy/data|.

\subsection{Generating \Java{} representations of \R{} function signatures}
\method{RWebServices:::generateFunctionMap} uses the \R{} function
signature to generate \Java{} class methods. Methods are constructed
by looking up input and output \R{} data types with their
corresponding \Java{} representation. Argument input names are
mangled to be consistent with \Java{} convention. \Java{} method names
correspond to \R{} function names, except when several \R{} functions have
the same name but different return types. In this case simple aliases
(e.g., \verb|foo_1|, \verb|foo_2|) are created in the \Java{}
representation. 
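
The aliasing scheme can be illustrated with a small base \R{} sketch.
The helper \func{aliasDuplicates} is hypothetical and simply numbers
repeated names in the style of \verb|foo_1|, \verb|foo_2|; how
\RWebServices{} detects differing return types is not modeled here.
<<aliasSketch>>=
aliasDuplicates <- function(fnNames) {
    ## how often each name occurs, aligned with the input vector
    counts <- as.vector(table(fnNames)[fnNames])
    ## running index of each name within its group of duplicates
    index <- ave(seq_along(fnNames), fnNames, FUN = seq_along)
    ## only names that occur more than once receive a suffix
    ifelse(counts > 1, paste(fnNames, index, sep = "_"), fnNames)
}

aliasDuplicates(c("foo", "bar", "foo"))
@ 
%
Names that occur once are left unchanged, so existing \Java{} method
names remain stable when no conflict arises.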

The \Java{} representations are written to disk in a folder
\verb|function| containing a single class with methods corresponding
to all \R{} functions defined in the \R{} package.

\subsection{Generating the \Java{} API and adapters}
\RWebServices{} creates an API that represents the main entry to
invoke \R{} functionality from \Java{}. In its simplest form, the API
consists of a single \Java{} class (e.g., service.bioconductor.java)
with a method for each \R{} function. Each method in the API invokes
the corresponding method in the individual \Java{} packages. For
example, the \verb|affy| method in the main service API might invoke
\verb|biocJavaMap.CaAffy.function.caAffy()|. Multiple web services can
also be defined, with each service API dispatching to one or several
\Java{} packages encapsulating \R{} methods.

\RWebServices{} also creates a naive client interface to be used
during testing, and an adapter to implement the web service interface
generated by Axis or other web service facilities.

The \Java{} API, client, and adapters are written to disk in the folder
\texttt{service / bioconductor} (or as defined by the user).

\section{Understanding how \Java{} invokes \R{} functions}

Invoking \R{} functions from \Java{} relies on the \SJava{} package.
There are two main tasks. The first is conversion of data types
between \Java{} and \R{}. The second is to evaluate the \R{} functions,
using an \R{} session embedded in the \Java{} virtual machine.

\subsection{Data types and conversions}
\SJava{} allows C code to interface between native \Java{} types
(accessible through JNI) and native \R{} types (\R{} native types are
C data structures that define S-expressions, or SEXPs). Each data type
conversion is performed by converter functions, written in C or \R{}.
Converters for basic data types are provided by \SJava.  Additional
converters can extend or override the basic converters, and can be
registered with \SJava{} for dynamic dispatch. 

\subsubsection{Data models}
\RWebServices{} uses the flexible infrastructure of \SJava{} to
convert basic \R{} types to \Java{} primitive types (\verb|int|,
\verb|double|) or classes (e.g., \verb|Integer[]|, \verb|Double[]|,
etc.), and to convert the structured S4 \R{} objects to corresponding
\Java{} classes. This basic mapping provides sufficient flexibility
for data transfer between languages, while promoting interoperability
through reuse of common data types.  \RWebServices{} also supports a
richer object model, capturing the use of \R{} \emph{attributes} to
convey object information, e.g., about dimensions or missing values.
This richer model is not exposed in caBIG.

The \Java{} representations of complex \R{} objects (e.g., S4 objects)
are programmatically generated using \R{} language reflection to
identify object structure (\R{} slots) in terms of basic \R{} types.
Limitations to this approach are indicated below. Additional \R{}
class structures can also be represented in \Java. For instance, class
unions are an \R{} concept where members of the class union form a
single class, even though they are otherwise unrelated. 
<<classUnionEg>>=
setClass("A", "logical")
setClass("B", "character")
setClassUnion("C", c("A","B"))
@ 
%
An instance of class \class{C} can be assigned either logical or
character values.  This pattern of inheritance cannot be represented
as a single \Java{} object, but \RWebServices{} implements \Java{}
representations of class unions using inspiration from the
\href{http://java.sun.com/blueprints/corej2eepatterns/Patterns/DataAccessObject.html}{Abstract
  Factory} pattern.
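
To make the \R{} side of this concrete, the following sketch defines a
separate, self-contained class union and uses it as a slot type. The
class names \class{Flag}, \class{Label}, \class{FlagOrLabel} and
\class{Holder} are hypothetical and chosen only for illustration.
<<classUnionSketch>>=
setClass("Flag", contains = "logical")
setClass("Label", contains = "character")
setClassUnion("FlagOrLabel", c("Flag", "Label"))
## a slot typed by the union accepts an instance of either member
setClass("Holder", representation(value = "FlagOrLabel"))

h1 <- new("Holder", value = new("Flag", TRUE))
h2 <- new("Holder", value = new("Label", "tumor"))
is(h1@value, "FlagOrLabel")
is(h2@value, "FlagOrLabel")
## the members remain unrelated to each other
is(h1@value, "Label")
@ 
%
A \Java{} field holding such a value must therefore accept two
otherwise unrelated classes, which is why a single \Java{} class does
not suffice.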

\subsubsection{Converters}
A converter handles conversion between a specific pair of \R{} and
\Java{} objects.  There are two components to \RWebServices{}
converters.  A `match' function (e.g.,
\method{RWebServices:::cvtIntegerToJava}) is used for dynamic
dispatch. A convert function (e.g.,
\method{RWebServices:::cvtIntegerToJava}) in \RWebServices{} is
written in \R{}; converters rely on calls to underlying C code (e.g.,
\verb|RIntegerVector_JavaIntArray|) or on \SJava{} functionality to
copy data types between \R{} SEXPs and \Java{} native representations.

Converters for each complex \R{} object are programmatically generated
by recursively visiting the object slots (corresponding to \Java{}
fields) until basic \R{} types are encountered. Converters are
included in the \verb|data| output directory (e.g.,
\verb|biocJavaMap/CaAffy/data/TypeConverter.R|) and loaded in the
embedded \R{}.
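
The recursive traversal can be sketched with \R{}'s own reflection
facilities. The function \func{flattenSlots}, the vector
\code{basicTypes}, and the classes \class{Probe} and \class{Chip} are
hypothetical; the sketch only lists the basic types reached through
each slot, and does not generate converters.
<<flattenSlotsSketch>>=
## types at which the traversal stops (an illustrative subset)
basicTypes <- c("numeric", "integer", "character", "logical")

flattenSlots <- function(className, path = className) {
    ## a basic type ends the recursion
    if (className %in% basicTypes)
        return(structure(className, names = path))
    ## otherwise visit each slot of the class in turn
    slots <- getSlots(className)
    result <- character(0)
    for (slotName in names(slots)) {
        result <- c(result,
                    flattenSlots(slots[[slotName]],
                                 paste(path, slotName, sep = "@")))
    }
    result
}

setClass("Probe", representation(id = "character",
                                 intensity = "numeric"))
setClass("Chip", representation(name = "character",
                                probe = "Probe"))
flattenSlots("Chip")
@ 
%
Each element of the result names a path of slots ending in a basic
type, which is the information a generated converter would need.
Self-referential class definitions are not handled by this sketch.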

\subsubsection{Limitations}
There are several limitations to the object model and conversion
process outlined here.  \R{} objects can have arbitrary attributes,
but the \RWebServices{} implementation only recognizes attributes
essential for representing data structures to web or analytic services
(e.g., \verb|dim| to describe \verb|RArray| dimensions). The main
reason for restricting \RWebServices{} in this way is that a \Java{}
representation of arbitrary attributes is unlikely to be used often. The
implementation is flexible enough that future extensions are possible.

Classes from the informal `S3' object system of \R{} do not
contain sufficient information about class structure for programmatic
transformation between \R{} and \Java; these objects can be defined
more formally as S4 objects, and the S4 objects used with \TypeInfo{}
to specify argument and return types.

S4 classes consist of slots specific to the class, and relationships
to other classes; the class system is similar to but richer than that
in Java, allowing multiple inheritance, class unions, etc.
\RWebServices{} captures the entire data representation of S4 objects,
but does not retain information about class relations. For instance,
consider the following class definitions:
<<S4-limits>>=
setClass("D", representation=representation(x="numeric"))
setClass("E", contains="D", representation=representation(y="numeric"))
@ 
%
An \R{} instance of class \class{E} has two slots, x and y;
information about the inheritance of x from \class{D} is contained in
the class definition of \class{E}, but the structure of instances of
\class{E} does not include this information.  The \Java{}
representation of class \class{E} created by \RWebServices{} has two
fields x and y, but no knowledge of the class hierarchy that these
slots represent in \R{}; the hierarchy is flattened because \Java{}
allows only single inheritance, whereas \R{} allows multiple
inheritance.  This is a satisfactory solution for present purposes,
since the data contained in the \Java{} instance is sufficient for data
transformation. A future development might more fully leverage single
inheritance in \Java{} to represent classes with only single
inheritance in \R.
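
The distinction between class definition and instance structure can be
checked directly in \R{}. The sketch below uses separate, hypothetical
class names (\class{Base} and \class{Derived}) so that it stands on its
own.
<<flatSlotsSketch>>=
setClass("Base", representation = representation(x = "numeric"))
setClass("Derived", contains = "Base",
         representation = representation(y = "numeric"))
## the class definition records the relationship ...
extends("Derived", "Base")
## ... but the slots themselves form a flat collection
getSlots("Derived")
slotNames(new("Derived", x = 1, y = 2))
@ 
%
Nothing in the slot listing indicates which slot was inherited, which
mirrors the flat field layout of the generated \Java{} class.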

\RWebServices{} allows \R{} objects to be represented in \Java, but
does not provide facilities for automatically representing \Java{}
objects as \R{} classes. This is satisfactory for the goal of exposing
\R{} functions and data objects as web or analytic services.

\subsection{Function invocation}
Invocation of \R{} functions is initiated in the \Java{} API created
by \RWebServices{} (e.g., \texttt{service / bioconductor}). This API
initializes and uses \SJava{} facilities. \SJava{} embeds \R{} in the
\Java{} virtual machine as a shared library. \SJava{} mediates
interactions with the embedded \R{} through instances of the \Java{}
classes \class{ROmegahatInterpreter} and \class{REvaluator}. The
\Java{} API uses \class{REvaluator} to establish the environment for
\R{} function evaluation, including loading \R{} packages required for
function evaluation and installing converter functions. The \Java{}
virtual machine is now able to invoke \R{} functions.

The interface to \R{} functions starts at the main API. The main API
invokes the package-level (e.g., \pkg{CaAffy}) \Java{} representations
of the \R{} function. The package-level representation invokes
\method{REvaluator.call()}.  This method takes as arguments a
character string representing the \R{} function name and a \Java{}
\class{Object[]} containing \Java{} representations of the input
parameters, and returns a \Java{} \class{Object}.
\method{REvaluator.call} invokes necessary data translators for data
transfer to and from \R{}, and arranges for \R{} function evaluation
of appropriate arguments. Input parameters and return types of
\method{REvaluator.call()} are generic; type coercion takes place in
the package-level \Java{} representations.

Error handling facilities are available. Errors triggering the
exception handling system in \R{} during function evaluation or type
conversion are propagated as \Java{} exceptions, and returned to the
\Java{} virtual machine.  Serious \R{} faults (e.g., segmentation
faults) trigger \Java{} exceptions that are also propagated.

The implementation has several limitations. Callbacks to \Java{} from
\R{} are not yet tested. \SJava{} implements the concept of foreign
language references, where functions in one language operate on
references to complex data types in the other language, rather than on
the data itself. The \RWebServices{} implementation has not yet taken
advantage of this feature.

Finally, \R{} is not thread safe, so each \Java{} virtual machine
can have at most one instance of \R{}. Consequently, evaluation
of several functions must occur sequentially.  One solution is to use
multiple \Java{} processes in a coordinated fashion, e.g., using the
\Java{} Message Service.
  

\section{Next steps: Exposing \R{} as web and analytic services}
The foregoing sections have described how \R{} data types and functions
are exposed to \Java{} applications. There are well-established
mechanisms to facilitate the transformation of stand-alone \Java{}
applications to web or analytic services. For example, Apache Axis
tools generate WSDL from stand-alone applications, and web services
layers from WSDL. Likewise, the caGrid tool Introduce coupled with
caDSR tools for semantic annotation allow generation of analytic
services from stand-alone \Java{} applications.

%

%% TODO: (selected) items that cannot be translated.

%% TODO: indicate ability to pass (nearly) arbitrary objects (e.g.,
%% binary objects representing images).

%% TODO: cleaner ending

%% \section{Deploying web services}

%% This portion of the documentation is in preparation.

%% TODO: Patrick McConnell

%% TODO: suck text from TypeInfoNews, 

%% modify intro to stress importance of type info for easy exposure as web service, drop hints & tips

\end{document}