---
abstract: |
TeXaccents is a standalone utility designed to convert legacy (La)TeX ligatures and codes for "accented" characters to their Unicode equivalents (text mode, no math). For example, `\={a}` ('a' with macron) will be converted to `ā`.
author:
- "Guido Milanese[^1]"
date: 17^th^ September 2022
lang: en
title: |
TeXaccents\
version 1.0.1
fontfamily: libertine
fontsize: 12pt
---
# General information
Even though modern compilers handle Unicode encoding, (La)TeX and BibTeX files
featuring "legacy" encoding for non-ASCII characters are still very
common, and users may need to incorporate old code into new texts that
use modern text encoding.
Several utilities are available online that claim to be able to convert
legacy (La)TeX encoding to standard Unicode. See:
- *Simple LaTeX to Text Converter*. A complex programme, able to deal
with maths. Insofar as non-Ascii chars are concerned, it fails
sometimes, at least according to my tests. See
. Written
in Python.
- *LaTeX handler*. Converts non-ASCII (La)TeX encoding to Unicode.
However, it does not seem to be able to deal with the legacy
encoding, e.g. `{\'a}` instead of `\'{a}` or `\'a`. It does not convert
simple ligatures such as `\ae{}` or `\oe{}`. I used the tables provided by
this programme as a starting point. Written in Python. See
.
- *Pandoc* is the standard programme for any text format conversion
(). It converts almost all the accents (thorn
and eth missing?), but (if I have checked this correctly) it normalises
files, stripping non-standard fields. This can be a problem for
scholars who frequently use non-standard fields such as
"shorttitle", required by not a few bibliographic styles.
*TeXaccents* should be able to transform (La)TeX normal text or "accents"
(not "math" accents) to their Unicode equivalents. The programme deals
with the following codes (*not all fonts are able to output all the
required Unicode glyphs of this table!*):
| Name          | TeX         | Unicode |
|---------------|-------------|---------|
| umlaut        | `\"{a}`     | ä       |
| acute         | `\'{a}`     | á       |
| double acute  | `\H{a}`     | a̋       |
| grave         | `` \`{a} `` | à       |
| circumflex    | `\^{a}`     | â       |
| caron (háček) | `\v{a}`     | ǎ       |
| breve         | `\u{a}`     | ă       |
| cedilla       | `\c{c}`     | ç       |
| dot           | `\.{a}`     | ȧ       |
| dot under     | `\d{a}`     | ạ       |
| ogonek        | `\k{a}`     | ą       |
| tilde         | `\~{a}`     | ã       |
| macron        | `\={a}`     | ā       |
| bar under     | `\b{a}`     | a̱       |
| ring over     | `\r{a}`     | å       |
The programme should recognize the following variants:
::: {.center}
`\'a` -- `\'{a}` -- `{\'a}` -- `{{\'a}}`
:::
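
The table and the four spellings above can be illustrated with a minimal Python sketch. This is not the Snobol code of *TeXaccents*: the names `COMBINING`, `ACCENT` and `accents_to_unicode` are hypothetical, and the technique (appending a Unicode combining mark to the base letter, then normalising to NFC) is only an assumption about one reasonable way to perform the conversion. Dotless `\i` and `\j` are not handled.

```python
import re
import unicodedata

# TeX accent command -> Unicode combining mark (same rows as the table above).
COMBINING = {
    '"': "\u0308",  # umlaut / diaeresis
    "'": "\u0301",  # acute
    "H": "\u030B",  # double acute
    "`": "\u0300",  # grave
    "^": "\u0302",  # circumflex
    "v": "\u030C",  # caron
    "u": "\u0306",  # breve
    "c": "\u0327",  # cedilla
    ".": "\u0307",  # dot above
    "d": "\u0323",  # dot below
    "k": "\u0328",  # ogonek
    "~": "\u0303",  # tilde
    "=": "\u0304",  # macron
    "b": "\u0331",  # bar (macron) below
    "r": "\u030A",  # ring above
}

ACCENT = re.compile(
    r"(\{*)"                                        # optional wrapping braces
    r"\\(?:([\"'`^.~=])|([Hvucdkbr])(?![A-Za-z]))"  # symbol or one-letter command
    r"\s*(?:\{([A-Za-z])\}|([A-Za-z]))"             # base letter, braced or bare
    r"(\}*)"                                        # optional closing braces
)


def _replace(m):
    opening, sym, cmd, braced, bare, closing = m.groups()
    # Compose base letter + combining mark; NFC yields a single code point
    # whenever Unicode defines a precomposed character.
    char = unicodedata.normalize("NFC", (braced or bare) + COMBINING[sym or cmd])
    # Drop only balanced brace pairs that directly wrap the accent code.
    n = min(len(opening), len(closing))
    # Keep one pair if the outermost brace opens a command argument,
    # as in \emph{\'a}.
    if n and n == len(opening) and re.search(r"\\[A-Za-z]+\s*$", m.string[:m.start()]):
        n -= 1
    return opening[: len(opening) - n] + char + closing[: len(closing) - n]


def accents_to_unicode(text):
    """Convert text-mode accent codes in `text` to Unicode characters."""
    return ACCENT.sub(_replace, text)
```

In this sketch, `\'a`, `\'{a}`, `{\'a}` and `{{\'a}}` all map to the same character, while accents without a precomposed form (such as `\H{a}`) stay as a base letter followed by a combining mark.
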
It also transforms the encoding for: `æ œ Æ Œ ð Ð þ Þ ø Ø ł Ł`.
Checking the page
I could
not find a legacy text mode encoding for:
**ƀ Ƀ đ Đ ǥ Ǥ ħ Ħ ɨ Ɨ ŧ Ŧ ƶ Ƶ** (some chars are accessible in math
mode).
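
The handling of these special letters can be sketched in Python in the same spirit. The command names used here (`\ae`, `\oe`, `\dh`, `\th`, `\o`, `\l` and their uppercase forms) are the usual LaTeX text-mode commands and are an assumption, since only the target characters are listed above; `SPECIAL`, `LETTER` and `special_to_unicode` are hypothetical names, not part of *TeXaccents*.

```python
import re

# Usual LaTeX text-mode commands for the letters listed above (assumed).
SPECIAL = {
    "ae": "æ", "oe": "œ", "AE": "Æ", "OE": "Œ",
    "dh": "ð", "DH": "Ð", "th": "þ", "TH": "Þ",
    "o": "ø", "O": "Ø", "l": "ł", "L": "Ł",
}

# Longer names first, so that \oe is tried before \o.
_NAMES = "|".join(sorted(SPECIAL, key=len, reverse=True))

LETTER = re.compile(
    r"\{\\(" + _NAMES + r")\}"                          # {\ae}
    r"|\\(" + _NAMES + r")(?![A-Za-z])(?:\{\}|[ \t]+)?"  # \ae{}, "\ae " or \ae
)


def special_to_unicode(text):
    """Replace commands such as \\ae{} or {\\oe} with the Unicode letter."""
    return LETTER.sub(lambda m: SPECIAL[m.group(1) or m.group(2)], text)
```

The negative lookahead keeps longer commands such as `\omega` or `\label` untouched, and blanks after a command are consumed because TeX itself ignores them after a control word.
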
# Setup
## From source
The programme is written in Snobol
( or
) and should run on any platform.
Steps:
1. Install Snobol4 (version 2.3, March 2022) from
. Make sure to
install the compiler in a folder listed in your `PATH` or add the
folder to your path. On Linux the folder `snobol4` is installed
under `/usr/local/bin/`, which is normally listed in the PATH of a
standard Linux system.
2. Test the compiler by running `snobol4` from the command line. Exit the
compiler with `Ctrl-C` or by typing `end`.
3. Copy `texaccents.sno` and all the provided `*.inc` files
> `compiler.inc` `delete.inc` `grepl.inc` `newline.inc` `systype.inc`
to a folder of your choice (e.g. `/home//bin`).
4. In this folder, run
`snobol4 texaccents.sno testaccents-in testaccents-out` to test the
programme. The test file contains all the accents listed above. View
the result by typing `cat testaccents-out` (Unix / PowerShell) or
`type testaccents-out` (Windows/DOS prompt), or open the file with
your text editor. The output file name is, of course, just a
suggestion.
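
As a complement to inspecting the output by eye, a short Python sketch can report accent codes that are still present after conversion. It is not shipped with *TeXaccents*; `LEFTOVER` and `leftover_accents` are hypothetical names, and the pattern only covers the accent commands from the table in the first section.

```python
import re

# Accent commands from the table above, followed by a bare or braced letter.
LEFTOVER = re.compile(
    r"\\(?:[\"'`^.~=]|[Hvucdkbr](?![A-Za-z]))\s*\{?[A-Za-z]\}?"
)


def leftover_accents(text):
    """Return (line number, matched code) pairs for accent codes still present."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in LEFTOVER.finditer(line):
            hits.append((number, match.group(0)))
    return hits
```

For the test run above, `leftover_accents(open("testaccents-out", encoding="utf-8").read())` would be expected to return an empty list if every code in the test file was converted.
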
## Windows standalone version
If preferred, a Windows EXE standalone file is provided. It was compiled using
Spitbol (see ); the source code has been slightly adapted to Spitbol (basically only input/output syntax). From any directory, run `texaccents.exe INPUT OUTPUT`. To test the programme, run `texaccents.exe testaccents-in testaccents-out`. As above, the output file name is just a suggestion.
# History
- 25^th^ July 2022. First version (after trying unsuccessfully to
convert an old file with existing utilities)
- 17^th^ August 2022. First complete version (0.9).
- 27^th^ August 2022. This version (1.0) with documentation and
comments.
- 17^th^ September 2022. Windows standalone executable. Manual page written.
Version message added; help message improved. In the source, a regular
shebang according to the recommendation of CTAN
() was added. Documentation updated
accordingly.
# Contacts / todo
Bugs / suggestions / improvements: please write to
[guido.milanese\@unicatt.it](mailto:guido.milanese@unicatt.it) using
*TeXaccents* as subject of the mail.
Genoa, Italy, 17^th^ September 2022
[^1]: Università Cattolica d.S.C., Dipartimento di scienze storiche e
filologiche, via Trieste 17, I-25121 Brescia