\iffalse meta-comment File: l3styleguide.tex Copyright (C) 2011,2012,2014-2024 The LaTeX Project It may be distributed and/or modified under the conditions of the LaTeX Project Public License (LPPL), either version 1.3c of this license or (at your option) any later version. The latest version of this license is in the file https://www.latex-project.org/lppl.txt This file is part of the "l3kernel bundle" (The Work in LPPL) and all files in that bundle must be distributed together. The released version of this bundle is available from CTAN. \fi \documentclass{l3doc} \title{% The \LaTeX3 kernel: style guide for code authors% } \author{% The \LaTeX{} Project\thanks {% E-mail: \href{mailto:latex-team@latex-project.org}% {latex-team@latex-project.org}% }% } \date{Released 2024-12-25} \begin{document} \maketitle \tableofcontents \section{Introduction} This document is intended as a style guide for authors of code and documentation for the \LaTeX3 kernel. It covers both aspects of coding style and the formatting of the sources. The aim of providing these guidelines is help ensure consistency of the code and sources from different authors. Experience suggests that in the long-term this helps with maintenance. There will of course be places where there are exceptions to these guidelines: common sense should always be applied! \section{Documentation style} \LaTeX3 source and documentation should be written using the document class \cls{l3doc} in \file{dtx} format. This class provides a number of logical mark up elements, which should be used where possible. In the main, this is standard \LaTeX{} practice, but there are a few points to highlight: \begin{itemize} \item Where possible, use \cs{cs} to mark up control sequences rather than using a verbatim environment. \item Arguments which are given in braces should be marked using \cs{Arg} when code-level functions are discussed, but using \cs{marg} for document functions. \item The names \TeX{}, \LaTeX{}, \emph{etc}.\ use the normal logical mark up followed by an empty group (|{}|), with the exception of |\LaTeX3|, where the number should follow directly. \item Where in line verbatim text is used, it should be marked up using the \verb=|...|= construct (\emph{i.e.}~vertical bars delimit the verbatim text). \item In line quotes should be marked up using the \cs{enquote} function. \item Where numbers in the source have a mathematical meaning, they should be included in math mode. Such in-line math mode material should be marked up using |$...$| and \emph{not} |\(...\)|. \end{itemize} Line length in the source files should be under $80$ characters where possible, as this helps keep everything on the screen when editing files. In the \file{dtx} format, documentation lines start with a \texttt{\%}, which is usually followed by a space to leave a \enquote{comment margin} at the start of each line. As with code indenting (see later), nested environments and arguments should be indented by (at least) two spaces to make the nature of the nesting clear. Thus for example a typical arrangement for the \env{function} environment might be \begin{verbatim*} \begin{function}{\seq_gclear:N, \seq_gclear:c} \begin{syntax} \cs{seq_gclear:N} \meta{sequence} \end{syntax} Clears all entries from the \meta{sequence} globally. \end{function} \end{verbatim*} The \enquote{outer} \verb*|% \begin{function}| should have the customary space after the |%| character at the start of the line. In general, a single \env{function} or \env{macro} environment should be used for a group of closely-related functions, for example argument specification variants. In such cases, a comma-separated list should be used, as shown in the preceding example. \section{Format of the code itself} The requirement for fewer than $80$ characters per line applies to the code itself as well as the surrounding documentation. A number of the general style principles for \LaTeX3 code apply: these are described in the following paragraph and an example is then given. With the exception of simple runs of parameter (|{#1}|, |#1#2|, \emph{etc.}), everything should be divided up using spaces to make the code more readable. In general, these will be single spaces, but in some places it makes more sense to align parts of the code to emphasise similarity. (Tabs should not be used for introducing white space.) Each conceptually-separate step in a function should be on a separate line, to make the meaning clearer. Hence the \texttt{false} branch in the example uses two lines for the two auxiliary function uses. Within the definition, a two-space indent should be used to show each \enquote{level} of code. Thus in the example \cs{tl_if_empty:nTF} is indented by two spaces, but the two branches are indented by four spaces. Within the \texttt{false} branch, the need for multiple lines means that an additional two-space indent should be used to show that these lines are all part of the brace group. The result of these lay-out conventions is code which in general looks like the example: \begin{verbatim*} \cs_new:Npn \module_foo:nn #1#2 { \tl_if_empty:nTF {#1} { \module_foo_aux:n { X #2 } } { \module_foo_aux:nn {#1} {#2} \module_foo_aux:n { #1 #2 } } } \end{verbatim*} \section{Code conventions} All code-level functions should be \enquote{long} if they accept any arguments, even if it seems \enquote{very unlikely} that a \cs{par} token will be passed. Thus \cs{cs_new_nopar:Npn} and so forth should only be used to create interfaces at the document level (where trapping \cs{par} tokens may be appropriate) or where comparison to other code known not to be \enquote{long} is required (\emph{e.g.}~when working with mixed \LaTeXe{}/\pkg{expl3} situations). The expandability of each function should be well-defined. Functions which cannot be fully expanded must be \texttt{protected}. This means that expandable functions must themselves only contain expandable material. Functions which use any non-expandable material must be defined using \cs{cs_new_protected:Npn} or similar. When using \cs{cs_generate_variant:Nn}, group related variants together to make the pattern clearer. A common example is variants of a function which has an \texttt{N}-type first argument: \begin{verbatim} \cs_generate_variant:Nn \foo:Nn { NV , No } \cs_generate_variant:Nn \foo:Nn { c , cV , co } \end{verbatim} There may be cases where omitting braces from \texttt{o}-type arguments is desirable for performance reasons. This should only be done if the argument is a single token, thus for example \begin{verbatim} \tl_set:No \l_some_tl \l_some_other_tl \end{verbatim} remains clear and can be used where appropriate. \section{Private and internal functions} Private functions (those starting \cs{__}) should not be used between modules. The only exception is where a \enquote{family} of modules share some \enquote{internal} methods: this happens most obviously in the kernel itself. Any internal functions or variables \emph{must} be documented in the same way as public ones. The \pkg{DocStrip} method should be used for internal functions in a module. This requires a line \begin{quote} \ttfamily \%<@@=\meta{module}> \end{quote} at the start of the source (\texttt{.dtx}) file, with internal functions then written in the form \begin{verbatim} \cs_new_protected:Npn \@@_function:nn #1#2 ... \end{verbatim} \subsection{Access from other modules} There may be cases where it is useful to use an internal function from a third-party module (this includes cases where you are the author of both but they are not part of the same \enquote{family}). In these cases, you should \emph{copy} the definition of the internal function to your code: this avoids relying on non-documented interfaces. At the same time, it is strongly encouraged that you discuss your requirements with the author of the code you need to access. The best long-term solution to these cases is for new documented interfaces to be added to the parent module. \subsection{Access to primitives} As \pkg{expl3} is still a developing system, there are places where direct access to engine primitives is required. These are all marked as \enquote{do not use} in the code and so require special handling. Where a programmer is sure that they need to use a primitive (for example where the team have not yet covered access to an area) then a local copy of the primitive should be made, for example \begin{verbatim} \cs_new_eq:NN \__module_message:w \tex_message:D % ... \cs_new_protected:Npn \__module_fancy_msg:n #1 { \__module_message:w { *** #1 *** } } \end{verbatim} This approach makes it possible for the team and others to find such usage (by searching for the \texttt{:D} argument type) but avoids multiple uses in general code. At the same time, the team ask that these use cases are raised on the \texttt{LaTeX-L} mailing list. The team are keen to collect use cases for areas that have not yet been addressed and to provide new code where the required interfaces become clear. Programmers using primitives should be ready to make updates to their code as the team develop additional interfaces. \section{Auxiliary functions} In general, the team encourages the use of descriptive names in \LaTeX3 code. Thus many helper functions would have names which describe briefly what they do, rather than simply indicating that they are auxiliary to some higher-level function. However, there are places where one or more \texttt{aux} functions are required. Where possible, these should be differentiated by signature \begin{verbatim} \cs_new_protected:Npn \@@_function:nn #1#2 { ... } \cs_new_protected:Npn \@@_function_aux:nn #1#2 { ... } \cs_new_protected:Npn \@@_function_aux:w #1#2 \q_stop { ... } \end{verbatim} Where more than one auxiliary shares the same signature, the recommended naming scheme is \texttt{auxi}, \texttt{auxii} and so on. \begin{verbatim} \cs_new_protected:Npn \@@_function_auxi:nn #1#2 { ... } \cs_new_protected:Npn \@@_function_auxii:nn #1#2 { ... } \end{verbatim} The use of \texttt{aux_i}, \texttt{aux_ii}, \emph{etc.}\ is discouraged as this conflicts with the convention used by \cs{use_i:nn} and related functions. \section{Functions with `weird' arguments} When defining commands that do not follow the usual convention of accepting arguments as single-tokens or braced-text, the \verb|w| argument specifier is used to denote that the function signature cannot fully describe the syntax. Two examples from the \LaTeX3 kernel are: \begin{quote} \verb|\use_none_delimit_by_q_stop:w| $\langle$\,\emph{text}\,$\rangle$ \verb|\q_stop|\\ \verb|\use_i_delimit_by_q_stop:nw| \char`\{ $\langle$\,\emph{arg}\,$\rangle$\char`}\,$\langle$\,\emph{text}\,$\rangle$ \verb|\q_stop| \end{quote} More complex definitions are possible if commands are to parse tokens, such as the internal kernel command \begin{verbatim} \cs_new_protected:Npn \__clist_get:wN #1 , #2 \q_stop #3 { \tl_set:Nn #3 {#1} } \end{verbatim} When the \verb|w| specifier is being used, it is encouraged not to try and complicate the rest of the signature too much---for example, it would be considered poor style to have a function with a signature like \verb|\foo_bar:wnw| unless there were very clear reasons of code clarity. A signature such as \verb|:ww| would certainly be discouraged. Examining the examples above, it can be seen that there are scenarios in which it may make logical sense for having a signature such as \verb|:wN| or \verb|:nw|, but when in doubt the recommended approach is to simply use \verb|:w| as a catch-all. \end{document}