\MakeShortVerb{\"}
\newcommand{\Java}{Java}
\newcommand{\javaTeX}{java\TeX}

\title{The \javaTeX\ project \& \texttt{web2java}}%
%\thanks{This document --- possibly extended and updated ---
%can be retrieved from
%\texttt{ftp://ftp.maths.tcd.ie/pub/TeX/javaTeX/javaTeX.tex}}}

\author{Timothy Murphy\\
\texttt{<tim@maths.tcd.ie>}\\
School of Mathematics, Trinity College Dublin}
\begin{Article}
%\date{29 July 1996}

\begin{abstract}
The general aim of the \javaTeX\ project
is to examine the relation --- if any ---
between \Java\ and \TeX,
and in particular (as a first step) to translate
the standard \TeX\ programs ("tangle", "tex", "mf", etc)
into \Java, using "web2java" ---
a simple modification of the standard Unix\TeX{} utility "web2c".
\end{abstract}

\section{Why Java?}

Java is a dialect of C;
and 95\% of the code produced by "web2java"
is indistinguishable from C.
At the same time, Java is much safer than C;
for example, run-time checking of array-bounds
catches many errors that would be missed by C.

Java is object-oriented, which is of course a Good Thing.
Although in fact we have made very little use of this ---
apart from hiving off a few functions
(I suppose we should say ``methods''!)
like "reset" and "rewrite" into "TeXlib.class".
But it would be nice to have a generic DVI driver
which sent ``messages'' to an abstract "printer",
which could be ``instantiated'' as a PostScript printer,
a screen viewer, etc.

Java has a simple graphics interface,
although again we have made no use of this to date.
It might prove \emph{too} simple for the purposes of \TeX.

Java also has a simple network interface,
which could prove useful
in integrating \TeX\ into WWW (the World-Wide Web).

A theoretical advantage of Java is that
it compiles into device-independent ``bytecode'' format,
so that, for example, "tangle.class" compiled on a Sun
should run under Windows 95.

There remains the intriguing possibility of DVI ``Applets'',
as a way of putting \TeX\ on the Web.

Against all these advantages there is at present
one overwhelming disadvantage ---
Java is extremely slow.
For example, it takes about 40 times as long
to "tangle" "tex.web" in Java as it does in C.
Doubtless this ratio can be somewhat reduced
by better coding;
but it seems improbable that it could be improved
more than four-fold.

Our hope must be that a true Java compiler
will soon be available, converting Java into machine code.
Since Java is a standard programming language
there seems no reason why such a compiler should not emerge;
we would exect to see one before the end of the year.

\section{How \texttt{web2java} works}

"Web2java" --- like "web2c" ---
is a post-processor to "tangle".
To create "foo.java" from "foo.web" and "foo.ch"
one first runs "tangle":
\begin{verbatim}
    tangle foo.web foo.ch
\end{verbatim}
This creates the Pascal (or pseudo-Pascal) file "tangle.p".

(If you like driving in the slow lane,
you could run the Java tangle instead:
\begin{verbatim}
    java javaTeX.tangle foo.web foo.ch
\end{verbatim}
Class files are supposedly machine independent,
so "tangle.class" from the \javaTeX{} distribution
should run on any system.
Note that this file, like all \javaTeX{} programs,
is defined to be in the "javaTeX" package,
and so must be placed in a subdirectory called "javaTeX"
relative to the "CLASSPATH".)

This file is then passed through "web2java" to create "foo.java":
\[
"foo.web" + "foo.ch" \xrightarrow{"tangle"} "foo.p" 
\xrightarrow{"web2java"} "foo.java".
\]
Actually, this is a slight oversimplification.
The file "common.defines" is prepended to "foo.p"
\emph{before} passing through "web2java".
And the resulting file is passed through "javafix.perl"
\emph{after} "web2java":
\[
\begin{array}{l}
"common.defines" + "foo.p" \xrightarrow{"web2java"}\quad \mid \\
\xrightarrow{"javafix.perl"} "foo.java".
\end{array}
\]
All this is completely analagous to "web2c",
except that we have replaced the C program "fixwrites.c"
by the Perl script "javafix.perl".

\section{The \texttt{web2java} program}

The filter "web2java" is created
by the programs "flex" and "bison" (or "lex" and "yacc")
from the files "web2java.l" and "web2java.y".
This is completely analagous to "web2c".

The file "web2java.l" is identical to "web2c.l",
with the addition of 5 new tokens:
"try", "catch", "finally", "new" and "cast".

The syntax description in "web2java.y"
has rather more changes, compared with "web2c.y".

On the plus side, since Java has no pointers
all the pointer-related material has been deleted.
There is no attempt to determine if a function argument
is ``formal var'' or not;
and no need therefore to re-name functions with such arguments.

Classes have been introduced in a rather crude way,
by allowing
\begin{verbatim}
    VARIABLE = VARIABLE '.' VARIABLE
\end{verbatim}
Thus in "Math.abs(...)" "Math" has been defined
as a variable ("@define var Math;").

On the other hand, in
\begin{verbatim}
    web_file: Data_Input_Stream;
\end{verbatim}
the class "DataInputStream" is defined as a type
("@define type DataInputStream;").

No class (so far) has appeared in both r\^oles,
so no conflict has arisen.
(An earlier version, with "class" and "object" tokens,
proved too complicated.)


\section{Implementation}

For the most part Web to Java is if anything 
simpler than Web to C.
One apparent difficulty is the lack of a pre-processor in Java,
since "web2c" leaves a good deal of work to "cpp".
This means that more must be done in the change file,
which is probably a Good Thing.

The 3 main issues which arise are:
\begin{itemize}
\itemsep=0.25\itemsep
\item The absence of "goto"s in Java;
\item The lack of "typedef"s in Java; and
\item Input/Output.
\end{itemize}
These are discussed in the following 3 subsections.

\subsection{Removing \texttt{goto}'s}

Java has no "goto";
Perhaps in compensation,
it allows "break" and "continue" statements
to carry a \emph{label}, as for example in
"break lab21" or "continue lab3".
The corresponding labels
must appear at the beginning of the loop in question.
(A "break" label can also be attached to a "switch" statement,
but we make no use of that.)

We implement this as follows:
"goto <num>" or "goto +<num>" are converted to "break lab",
as for example
\[
"goto 6" \mapsto "break lab6",\quad
"goto +17" \mapsto "break lab17".
\]
On the other hand "goto -<num>"
is converted to "continue lab", as for example
\[
"goto -7" \mapsto "continue lab7".
\]
Finally, "goto 0" is converted to "break":
\[
"goto 0" \mapsto "break".
\]

Labels of the form $\pm$"<num>:" are commented out:
\[
"-20:" \mapsto "/* lab20 */".
\]
As we shall see, this allows old labels to be left.

In practice, new labels are needed in virtually all cases.
Here is a simple example
\begin{verbatim}
    while n<>2 do begin
      ... goto done ...
    end;
    done: ...
\end{verbatim}
Here "done" is defined at the beginning of the web file:
\begin{verbatim}
    @d done=30 {go here to exit a loop}
\end{verbatim}
We alter this (in the change file, of course) to
\begin{verbatim}
    @d Done=30 {go here to exit a loop}
    @d done==+Done
\end{verbatim}
and we modify the loop above to
\begin{verbatim}
    Done: while n<>2 do begin
      ... goto done ...
    end;
    done: ...
\end{verbatim}
This translates in the Pascal file to
\begin{verbatim}
    30: while n<>2 do begin
      ... goto +30 ...
    end;
    -30: ...
\end{verbatim}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% move this so that the figure comes out in an appropriate place
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\input{java-goto}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
and this in turn translates into the Java code
\begin{verbatim}
    lab30: while n<>2 do begin
      ... break lab30 ...
    end;
    /* lab30: */ ...
\end{verbatim}

It will be seen that in this case only one line needed to be changed,
by the addition of the label "Done:".

More often, 2 lines need to be changed.
The following is a typical case.
\begin{verbatim}
    reswitch: case ch of
      'x': ... goto reswitch ...
    endcases;
\end{verbatim}
At the beginning of the web file we read
\begin{verbatim}
    @d reswitch=21 {go here to start a case
                            statement again}
\end{verbatim}
We change this to
\begin{verbatim}
    @d Reswitch=21 {go here to start a case
                            statement again}
    @d reswitch==-Reswitch
    @d break==goto 0
\end{verbatim}
Now we alter the switch statement to
\begin{verbatim}
    Reswitch: loop begin case ch of
      'x': ... goto reswitch ...
    endcases; break end;
\end{verbatim}
In the Pascal code this becomes
\begin{verbatim}
    31: loop begin case ch of
      'x': ... goto -31 ...
    endcases; goto 0 end;
\end{verbatim}
which "web2java" converts to
\begin{verbatim}
    lab31: while (true) { switch(ch) {
      case 'x': ... continue lab31 ...
    } break; }
\end{verbatim}

In practice all "goto" statements in the `classic' web files
can be dealt with in this way,
introducing an appropriate loop if necessary.
One can imagine cases where this would be very difficult,
if not impossible.
Fortunately these do not arise in practice.
In fact virtually all "goto" statements can be eliminated as above,
by following a very small number of practical rules.

Figure~\ref{fig:dvitype} is a fairly complicated case, from "dvitype.ch".
Recall that the material in "dvitype.web" between "@x" and "@y"
is replaced by the material between "@y" and "@z".

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% this figure relates to something a little later on; placed here to
%%% force it to come atop the proper case
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\input{java-io}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\subsection{Type definitions}

There are no "typedef"s in Java.
In theory one could replace "typedef"s
by class definitions,
but that would add considerable complication to the code.
Instead we simply change them to substitutions
(as though in C changing "typedef"s to "#define"'s).

So for example we make the change
\begin{verbatim}
    @x 
    @<Types...@>=
    @!ASCII_code=0..255; 
    @y
    @d ASCII_code==0..255
    @z
\end{verbatim}
Later "web2java" will replace this range "0..255"
by an appropriate type (currently "int").
This entails some changes in "web2java.y",
to allow ranges for procedure and function parameters,
as eg in
\begin{verbatim}
    procedure p(x:0..255);
\end{verbatim}

Presently all ranges are replaced by "int",
since Java is rather strict about type conversion,
and requires casting where C does not.

\subsection{Input/Output}

On the whole, Java I/O is closer to Pascal syntax than is C.
Thus
\begin{verbatim}
    write_ln(term_out, 'value is ', v);
\end{verbatim}
in Pascal becomes
\begin{verbatim}
    System.out.println("value is " + v);
\end{verbatim}
in Java.

However, we follow "web2c" in leaving this translation
to the post-processor ---
in our case the script "javafix.perl".

This Perl script looks for `words' starting with "jT",
and deals only with these.
Thus the translation above is implemented through
the definitions
\begin{verbatim}
    @d term_out == System.out
    @d write_ln == jT_print_ln
\end{verbatim}

The only unusual feature of Java I/O
is that most I/O statements must be contained
in a "try" statement,
which must be followed by a "catch" statement
to catch any I/O `errors'.
However, this is perfectly straightforward,
as may be seen in figure~\ref{fig:exceptions-io}, which shows an I/O
function from "dvitype.ch".

For simplicity, "reset" and "rewrite" are defined 
in the `\TeX{} library class' "TeXlib.java".
Thus
\begin{verbatim}
    reset(web_file);
\end{verbatim}
is replaced in "tangle.ch" by
\begin{verbatim}
    web_file:=TeXlib.reset(web_name);
\end{verbatim}

\section{Conclusions}

Writing a Java change file is a straightforward exercise,
following the lines suggested above.

Whether it is time well-spent is another matter!
As we have said, the resulting programs
are painfully slow;
and the viability of the exercise must depend
on the development of true Java compilers.

\subsection{Project details}

All the material in the project is in the public domain,
and can be retrieved by anonymous FTP from
"ftp://ftp.maths.tcd.ie/pub/TeX/javaTeX".
(It is hoped later to submit the programs to the CTANs.)

Any suggestions and contributions are very welcome.

A mailing list has been set up.
To join this, email "majordomo@maths.tcd.ie"
with the message (not heading)
\begin{verbatim}
    subscribe javatex
\end{verbatim}
The material forming the project is also available
through the mailing-list.
You can obtain this file, for example, 
by emailing "majordomo@maths.tcd.ie" with the message
\begin{verbatim}
    get javatex javaTeX.tex
\end{verbatim}
To obtain a list of available files, send the message
\begin{verbatim}
    index javatex
\end{verbatim}
For information on the working of "majordomo" send the message
\begin{verbatim}
    help
\end{verbatim}
to "majordomo@maths.tcd.ie".

\end{Article}