%\documentclass[nojss]{jss}
\documentclass[]{article}
\usepackage{verbatim}
\usepackage{url}
\usepackage{booktabs}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{fullpage}
\usepackage{hyperref}
\usepackage{enumitem}
\usepackage[nounderscore]{syntax}
\setlength{\grammarparsep}{2pt plus 1pt minus 1pt} % increase separation between rules
\setlength{\grammarindent}{6em} % increase separation between LHS/RHS

% Grammar Definitions
\newcommand{\ntelem}[1] {{\textless \textit{#1}\textgreater}}
\newcommand{\ntgram}[1] {{\texttt{<{#1}>}}}

% Rewriting JSS commands
\newcommand{\code}[1] {{\texttt{#1}}}
\newcommand{\pkg}[1] {{\textbf{#1}}}
\newcommand{\proglang}[1] {{\textsf{#1}}}

<<echo=FALSE>>=
#options(width=70)
#options(prompt="R> ")
options(warn = (-1))
@

<<echo=FALSE>>=
set.seed(0)
@

%opening
%\VignetteIndexEntry{Grammatical evolution: A tutorial using gramEvol}

\title{Grammatical Evolution: A Tutorial using \pkg{gramEvol}}
\author{Farzad Noorian, Anthony M. de Silva, Philip H.W. Leong}

\begin{document}
%\SweaveOpts{concordance=TRUE}
%\VignetteEngine{knitr::knitr}

\maketitle

%\begin{abstract}
%\end{abstract}

\section{Introduction}
Grammatical evolution (GE) is an evolutionary search algorithm, similar to genetic programming (GP).
It is typically used to generate programs whose syntax is defined through a grammar.
The original authors' website \cite{o2001grammatical} is a good resource for a formal introduction to this technique:
\begin{itemize}
\item \url{http://www.grammatical-evolution.org/}
\end{itemize}

This document serves as a quick and informal tutorial on GE, with examples implemented using the \pkg{gramEvol} package in \proglang{R}.

\section{Grammatical Evolution}
The goal of using GE is to automatically generate a program that minimises a cost function:
\begin{enumerate}
\item A \emph{grammar} is defined to describe the syntax of the programs.
\item A \emph{cost function} is defined to assess the quality (the \emph{cost} or \emph{fitness}) of a program.
\item An \emph{evolutionary algorithm}, such as the genetic algorithm (GA), is used to search within the space of all programs definable by the grammar, in order to find the program with the lowest cost.
\end{enumerate}

Notice that by a \emph{program}, we refer to any sequence of instructions that performs a specific task.
This ranges from a single expression (e.g., \code{sin(x)}) to several statements with function declarations, assignments, and control flow.

The rest of this section describes each component in more detail.

\subsection{Grammar}
A grammar is a set of rules that describes the syntax of sentences and expressions in a language.
While grammars were originally invented for studying natural languages, they are extensively used in computer science for describing programming languages.

\subsubsection{Informal introduction to context-free grammars}
\label{sec:informal_grammar}
GE uses a \emph{context-free grammar} to describe the syntax of programs.
A grammar in which the rules are not sensitive to the sentence's context is called a \emph{context-free grammar} (CFG),
and is defined using a collection of \emph{terminal} symbols, \emph{non-terminal} symbols, \emph{production rules}, and a \emph{start} symbol \cite{Sethi1986Compiler}:
\begin{itemize}
\item Terminal symbols are the lexicon of the language.
\item Non-terminal symbols are used to describe the classes of words in the language,
or \emph{variables} that can take different values; for example, a \ntelem{subject}, a \ntelem{verb}, or an \ntelem{object}.
\item A production rule defines what symbols replace a non-terminal. For example, each of the four following lines is a production rule:
\begin{grammar}
<sentence> ::= \ntelem{subject} \ntelem{verb} \ntelem{object}. | \ntelem{subject} \ntelem{verb}. \hspace*{\fill} (1.a), (1.b)

<subject> ::= I | You | They \hspace*{\fill} (2.a), (2.b), (2.c)

<verb> ::= read | write | check \hspace*{\fill} (3.a), (3.b), (3.c)

<object> ::= books | stories | academic papers \hspace*{\fill} (4.a), (4.b), (4.c)
\end{grammar}
In each rule, the ``|'' symbol separates different replacement possibilities; for example, \ntelem{subject} can be replaced with ``I'', ``You'' or ``They''.
One must note that a non-terminal symbol can be replaced with other non-terminals as well as terminal symbols, as in the example's \ntelem{sentence}.
This style of notation, including the use of angle brackets (\textless{} and \textgreater), is known as the \emph{Backus--Naur Form} (BNF).
\item A start symbol determines the non-terminal where the generation of the expression starts. For example:
\begin{itemize}
\item Start: \ntelem{sentence}
\end{itemize}
\end{itemize}
Informally, only the start symbol and the production rules are required to define a grammar.

\subsubsection{Formal definition of a context-free grammar}
In formal language theory, a context-free grammar is a \textit{formal grammar} where every production rule, formalised by the pair $(n, V)$, is of the form $n \rightarrow V$.
The CFG is defined by the 4-tuple $(\mathcal{T}, \mathcal{N}, \mathcal{R}, \mathcal{S})$,
where $\mathcal{T}$ is the finite set of terminal symbols,
$\mathcal{N}$ is the finite set of non-terminal symbols,
$\mathcal{R}$ is the set of production rules,
and $\mathcal{S} \in \mathcal{N}$ is the start symbol.
A production rule $n \rightarrow V$ is applied by replacing the non-terminal symbol $n \in \mathcal{N}$
%in string $\alpha$
with the sequence $V \in (\mathcal{T} \cup \mathcal{N})^*$ of terminal and/or non-terminal symbols.

For more details on CFGs, their relation to context-free languages, parsing, compilers and other related topics, refer to \cite{Sethi1986Compiler} or Wikipedia:
\begin{itemize}
\item \url{https://en.wikipedia.org/wiki/Context-free_grammar}
\end{itemize}

\subsubsection{From grammar to an expression}
\label{sec:grammarmapping}
Notice that each rule in the grammar of Section \ref{sec:informal_grammar} is numbered.
Using these numbers, one can precisely refer to a certain expression:
starting from the start symbol, the leftmost non-terminal is repeatedly replaced with the alternative selected by the next number in the sequence.
For example, the sequence [2, 3, 1] selects rules (1.b), (2.c) and (3.a) in the following four-step sequence:

%\begin{table}[h]
%\centering
\begin{tabular}{ c c c c l }
\toprule
Step & Sequence & Rule & & Current state \\ \hline
0 & & Start & ~ & \ntelem{sentence} \\
1 & 2 & (1.b) & ~ & \ntelem{subject} \ntelem{verb}. \\
2 & 3 & (2.c) & ~ & They \ntelem{verb}. \\
3 & 1 & (3.a) & ~ & They read. \\
\bottomrule
\end{tabular}
%\end{table}

\subsection{Evolutionary optimisation}
Evolutionary optimisation algorithms are a class of optimisation techniques inspired by natural evolution. They are used in cases where:
\begin{itemize}
\item The solution to the problem can be represented by a certain structure; for example, an array of binary variables or integer numbers.
\begin{itemize}
\item Typically the array size is fixed, and each unique value arrangement is considered a candidate solution.
\item Using biological terminology, this structure is referred to as the \emph{chromosome} or \emph{genotype}.
\end{itemize}
\item There exists a cost function that can quickly return the \emph{cost} or \emph{fitness} of any candidate solution.
\item Solving the problem using gradient descent techniques is hard or impossible,
because the cost function is non-smooth, has multiple local optima, or is simply discrete,
as in the travelling salesman problem (or, in our case, a program generated by a grammar).
\end{itemize}
It must be noted that, owing to their stochastic nature, evolutionary algorithms do not guarantee finding the optimal solution:
most practical problems involve very large search spaces, and it is often not computationally feasible to search the whole space.

The oldest and simplest of these algorithms is the genetic algorithm (GA), which optimises a vector of binary variables.
In this vignette, when referring to GA, we refer to an extended GA that handles integers.
For an in-depth introduction, readers are referred to Wikipedia:
\begin{itemize}
\item \url{https://en.wikipedia.org/wiki/Evolutionary_algorithm}
\end{itemize}

\subsubsection{Optimising a program by evolution}
GA only optimises numeric arrays. By \emph{mapping} an integer array to a program using a grammar, GA can be readily applied to evolve programs:
\begin{enumerate}
\item The solution is represented by an array of integers.
\item The array is mapped to a program through the grammar, using the technique explained in Section \ref{sec:grammarmapping}.
\begin{itemize}
\item Using biological terminology, the program is called a \emph{phenotype}, and the mapping is referred to as \emph{genotype to phenotype mapping}.
\end{itemize}
\item The cost function measures the fitness of the program.
\item Any evolutionary optimisation technique is applied on the integer array.
\end{enumerate}

\subsection{Applications of grammatical evolution}
Any application that requires a program definable by a grammar can be implemented using GE.
Using a grammar allows integration of domain knowledge and a custom program syntax,
which adds flexibility and precision to GE compared to other techniques such as GP.
Applications of GE include computational finance, music, and robotic control, among others.
%The authors have previously used GE for electricity load forecasting in \cite{MNDL_13}.
See \url{http://www.grammatical-evolution.org/pubs.html} for a collection of publications in this area.

%These following classic problems are usually used to demonstrate the abilities of GE (or other algorithms like GP)
%and are used in this vignette.
%\begin{itemize}
% \item \href{https://en.wikipedia.org/wiki/Symbolic_regression}{Symbolic regression}
% \item \href{https://en.wikipedia.org/wiki/Santa_Fe_Trail_problem}{Santa Fe Trail problem}
%\end{itemize}

\section{gramEvol Package}
The package \pkg{gramEvol} simplifies defining a grammar and offers a GA implementation.
\pkg{gramEvol} hides many details, including the grammar mapping and the GA parameters, and all the user has to do is to:
\begin{enumerate}
\item Define a grammar using \code{CreateGrammar}.
\item Define a cost function. It should accept one (or more) \proglang{R} \code{expression}(s) and return a numeric value.
\item Call \code{GrammaticalEvolution}.
\end{enumerate}
In this section, examples are used to demonstrate its usage.
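As a first taste, the sentence grammar of Section~\ref{sec:informal_grammar} and the mapping of Section~\ref{sec:grammarmapping}
could be reproduced as follows (a minimal sketch, not evaluated in this vignette;
note that \code{GrammarMap}, introduced later in this document, uses zero-indexed rules,
so \code{c(1, 2, 0)} corresponds to rules (1.b), (2.c) and (3.a)):
<<eval=FALSE>>=
library("gramEvol")

# a sketch of the sentence grammar of Section 2.1
ruleDef <- list(sentence = gsrule("<subject> <verb> <object>.",
                                  "<subject> <verb>."),
                subject  = gsrule("I", "You", "They"),
                verb     = gsrule("read", "write", "check"),
                object   = gsrule("books", "stories", "academic papers"))
grammarDef <- CreateGrammar(ruleDef)

GrammarMap(c(1, 2, 0), grammarDef)  # expected to yield: They read.
@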
\subsection{Rediscovery of Kepler's law by symbolic regression}
Symbolic regression is the process of discovering a function, in symbolic form, which fits a given set of data.
Evolutionary algorithms such as GP and GE are commonly used to solve symbolic regression problems.
For more information, visit \url{https://en.wikipedia.org/wiki/Symbolic_regression} or \url{http://www.symbolicregression.com/}.

Rediscovery of Kepler's law has been used as a benchmark for symbolic regression \cite{koza1992chapter10, Ferreira2006Chapter6, langley1987heuristics}.
Here, the goal is to find a relationship between the orbital periods and the distances of the solar system planets from the sun.
The distance and period data, normalised to Earth's values, are shown in Table~\ref{tab:planets}.

\begin{table}[!hb]\centering
\begin{tabular}{ l c r }
\toprule
Planet & Distance & Period \\ \midrule
Venus & 0.72 & 0.61 \\
Earth & 1.00 & 1.00 \\
Mars & 1.52 & 1.84 \\
Jupiter & 5.20 & 11.90 \\
Saturn & 9.53 & 29.40 \\
Uranus & 19.10 & 83.50 \\
\bottomrule
\end{tabular}
\caption{Orbit period and distance from the sun for planets in the solar system.}
\label{tab:planets}
\end{table}

Kepler's third law states:
\begin{equation}
period^2 = constant \times distance^3
\end{equation}

\subsubsection{Defining a grammar}
To use grammatical evolution to find this relationship from the data, we define a grammar as illustrated in Table~\ref{tab:gram_kepler}.
Here $\mathcal{S}$ denotes the start symbol and $\mathcal{R}$ is the collection of production rules.

\begin{table}[!ht]\centering
{
\raggedright
\hrule
\vspace{1mm}
%$\mathcal{N}$ = \{$expr$, $sub$-$expr$, $func$, $op$, $var$, $n$\}
%$\mathcal{T}$ = \{\texttt{+}, \texttt{-}, \texttt{$\times$}, \texttt{\^},
%log, sqrt, sin, cos,
%\code{distance}, \texttt{1}, \texttt{2}, \texttt{3}, \texttt{4}, \texttt{(}, \texttt{)}\}

$\mathcal{S}$ = \ntelem{expr}

\medskip
Production rules : $\mathcal{R}$
\begin{grammar}
<expr> ::= <expr><op><expr> | <sub-expr> \hspace*{\fill} (1.a), (1.b)

<sub-expr> ::= <func>(<expr>) | <var> | <var>$\hat{~}$<n> \hspace*{\fill} (2.a), (2.b), (2.c)

<func> ::= log | sqrt | sin | cos \hspace*{\fill} (3.a), (3.b), (3.c), (3.d)

<op> ::= + | - | $\times$ \hspace*{\fill} (4.a), (4.b), (4.c)

<var> ::= \code{distance} | \code{distance}$\hat{~}$<n> | <n> \hspace*{\fill} (5.a), (5.b), (5.c)

<n> ::= \texttt{1} | \texttt{2} | \texttt{3} | \texttt{4} \hspace*{\fill} (6.a), (6.b), (6.c), (6.d)
\end{grammar}
}
\hrule
\medskip
\caption{Grammar for discovering Kepler's equation.}
\label{tab:gram_kepler}
\end{table}

This is a general-purpose grammar: it can create different expressions, corresponding to different formulas, which can explain and model the data.

The first step for using \pkg{gramEvol} is loading the grammar:
%defined in Table~\ref{tab:gram_kepler}:
<<>>=
library("gramEvol")

ruleDef <- list(expr = grule(op(expr, expr), func(expr), var),
                func = grule(sin, cos, log, sqrt),
                op = grule(`+`, `-`, `*`),
                var = grule(distance, distance^n, n),
                n = grule(1, 2, 3, 4))

grammarDef <- CreateGrammar(ruleDef)
@
Here, the BNF notation is implemented in \proglang{R}:
\begin{itemize}
\item Rules are defined as a \code{list}.
\item Each rule is defined using the \code{non.terminal.name = grule(replacement1, replacement2, ...)} format.
\item \code{CreateGrammar} is used to load the list and create the grammar object.
\end{itemize}
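Although the evolutionary search below handles the genotype to phenotype mapping automatically,
an integer sequence can also be mapped by hand using \code{GrammarMap}, which is covered later in this document
(a small illustrative sketch, not evaluated here; rule choices are zero-indexed):
<<eval=FALSE>>=
# 1 -> func(expr), 3 -> sqrt, 2 -> var, 0 -> distance
GrammarMap(c(1, 3, 2, 0), grammarDef)  # expected to yield: sqrt(distance)
@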
The \code{print} function reproduces the grammar in a format similar to Table~\ref{tab:gram_kepler}:
<<>>=
print(grammarDef)
@

Note that \code{`+`} and \code{op(expr, expr)} are used in the code above because \code{grule} expects \proglang{R} expressions,
and \code{expr op expr} is not valid in \proglang{R}.
As it is tedious to convert between the functional form and the operator form,
the package also provides \code{gsrule} (or grammar string rule),
which accepts strings in which non-terminal symbols are denoted by angle brackets (e.g., \code{<expr>}):
<<>>=
ruleDef <- list(expr = gsrule("<expr><op><expr>", "<func>(<expr>)", "<var>"),
                func = gsrule("sin", "cos", "log", "sqrt"),
                op = gsrule("+", "-", "*"),
                var = grule(distance, distance^n, n),
                n = grule(1, 2, 3, 4))

CreateGrammar(ruleDef)
@
Note that \code{gsrule} and \code{grule} can be mixed, as in the example above.

\subsubsection{Defining a cost function}
We use the following equation to normalise the error, balancing its impact on small values (e.g., Venus) versus large values (e.g., Uranus):
\begin{equation}
\label{eq:kepler_err}
e = \frac{1}{N} \sum \log(1 + |p - \hat{p}|)
\end{equation}
where $e$ is the normalised error, $N$ is the number of samples, $p$ is the orbital period and $\hat{p}$ is the result of symbolic regression.

We implement this as the fitness function \code{SymRegFitFunc}:
<<>>=
planets <- c("Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus")
distance <- c(0.72, 1.00, 1.52, 5.20, 9.53, 19.10)
period <- c(0.61, 1.00, 1.84, 11.90, 29.40, 83.50)

SymRegFitFunc <- function(expr) {
  result <- eval(expr)
  if (any(is.nan(result)))
    return(Inf)
  return(mean(log(1 + abs(period - result))))
}
@

Here, \code{SymRegFitFunc} receives an \proglang{R} \code{expression} and evaluates it.
It is assumed that the \code{expression} uses \code{distance} to estimate the \code{period}.
Invalid expressions are handled by returning a very high cost (infinite error).
Valid results are compared with the actual period according to (\ref{eq:kepler_err}) to compute the expression's fitness.

\subsubsection{Evolving the grammar}
\code{GrammaticalEvolution} can now be run. All of the parameters are determined automatically.
To avoid wasting time, and as the best possible outcome and its error are known (because we know the answer),
a \code{terminationCost} is computed and set to terminate GE when Kepler's equation is found.
<<>>=
ge <- GrammaticalEvolution(grammarDef, SymRegFitFunc, terminationCost = 0.021)
ge
@

Now that the result is found, it can be used in production. Here we only use it in a simple comparison:
<<>>=
best.expression <- ge$best$expression

data.frame(distance, period, Kepler = sqrt(distance^3),
           GE = eval(best.expression))
@
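For example, the evolved expression could be wrapped in an ordinary \proglang{R} function for later use
(a hypothetical sketch, not evaluated here; \code{predictPeriod} is not part of the package):
<<eval=FALSE>>=
# hypothetical helper: evaluate the evolved expression for new distances
predictPeriod <- function(distance) {
  eval(best.expression, list(distance = distance))
}

predictPeriod(30.1)  # Neptune's distance; Kepler's law gives about 165
@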
%Other GE runs find expressions such as
%\texttt{sqrt(distance)*distance}
%or \texttt{sqrt(distance\textasciicircum3+cos(distance)*log(1\textasciicircum4))},
%$\sqrt{distance}\times distance$,
%$\sqrt{distance^3}$,
%or $\sqrt{distance^3}+\cos(distance)*log(1^4)$,
%which all simplify to Kepler's equation.

%The fitness function handles invalid values (e.g., $\log(-1)$)
%by assigning a low fitness (infinite error) to any individual
%with an invalid value. However, \proglang{R} may show warnings
%about \code{NaN}s being produced. To suppress these warnings,
%it is enough to wrap \code{eval} in the fitness function
%inside \code{suppressWarnings}:
%\begin{verbatim}
%R> result <- suppressWarnings(eval(expr))
%\end{verbatim}

%As an incomplete search is performed, sometimes the GE
%fails to find a perfect solution. In such cases, a symbolic result
%with error is presented
%(i.e., $\log(distance)$ in
%\code{sqrt(distance\textasciicircum3+log(distance))}).
%%$\sqrt{distance^3+\log(distance)}$.

%To characterise this behaviour,
%the code was run 100 times and its error from Kepler's equation was noted.
%The measurements were performed on a single thread on a 3.40 GHz Intel Core i7-2600 CPU.
%To ensure reproducibility, \code{set.seed(0)} was executed before running the code.
%The results are presented in Table~\ref{tab:symreg_performance}.
%Notice that the average performance can be improved at the expense of time,
%by increasing the GE's number of generations \code{iterations} or population size \code{popSize}.

%\begin{table} \centering
% \begin{tabular}{ l r r r}
% \toprule
% Value & Minimum & Average & Maximum \\ \midrule
% Error & 0.00 & 0.92 & 1.61 \\
% No. of generations & 2.00 & 77.46 & 100.00 \\
% Time (s) & 0.40 & 4.16 & 20.31 \\
% Error & 0.00 & 0.00 & 0.00 \\
% No. of generations & 1.00 & 45.24 & 186.00 \\
% Time (s) & 0.08 & 0.79 & 3.17 \\
% \bottomrule
% \end{tabular}
% \caption{Summary of grammatical evolution's performance for 100 runs of the symbolic regression example.}
% \label{tab:symreg_performance}
%\end{table}

\subsubsection{Monitoring evolution}
As a real-world optimisation may take a long time, feedback on the state of the optimisation is desirable.
\code{GrammaticalEvolution} allows monitoring this status using a callback function.
This function, if provided to the parameter \code{monitorFunc}, receives an object similar to the return value of \code{GrammaticalEvolution}.
For example, the following function prints the current generation, the best individual's expression and its error:
<<eval=FALSE>>=
customMonitorFunc <- function(results) {
  cat("-------------------\n")
  print(results)
}

ge <- GrammaticalEvolution(grammarDef, SymRegFitFunc,
                           terminationCost = 0.021,
                           monitorFunc = customMonitorFunc)
@
or even using the \code{print} function directly:
<<eval=FALSE>>=
ge <- GrammaticalEvolution(grammarDef, SymRegFitFunc,
                           terminationCost = 0.021,
                           monitorFunc = print)
@
which prints:
\begin{verbatim}
Grammatical Evolution Search Results:
  No. Generations:  1
  Best Expression:  distance
  Best Cost:        1.60700784338907

Grammatical Evolution Search Results:
  No. Generations:  2
  Best Expression:  distance
  Best Cost:        1.60700784338907
\end{verbatim}
\dots until:
\begin{verbatim}
Grammatical Evolution Search Results:
  No. Generations:  9
  Best Expression:  distance + distance
  Best Cost:        1.54428158317392

Grammatical Evolution Search Results:
  No. Generations:  10
  Best Expression:  1 - distance + (cos(distance) - 1) * sin(distance^2) +
                    distance + (log(distance) + distance + (cos(distance) -
                    1) * sin(distance^2) + distance)
  Best Cost:        1.4186428597461

Grammatical Evolution Search Results:
  No. Generations:  11
  Best Expression:  sqrt(distance^3)
  Best Cost:        0.0201895728693592
\end{verbatim}

\subsection{Discovering regular expressions}
A regular expression (RE) is a string that defines a character pattern.
REs are more expressive and precise in determining sub-string matches compared to wildcards,
and are widely used in many string pattern matching tasks, such as searching through log files or parsing a program's output.
See the Wikipedia entry at \url{https://en.wikipedia.org/wiki/Regular_expression} for an in-depth introduction to REs.

Creating a regular expression requires careful assembly of symbols and operators to match the desired pattern.
%which is often different between implementations.
While this is usually performed by an expert programmer, it is possible to use evolutionary optimisation techniques to infer a RE from examples \cite{bartoli2012automatic}.
In this example, we demonstrate how \pkg{gramEvol} can be used to learn REs.

<<echo=FALSE>>=
set.seed(0)
@

\subsubsection[Regular expressions in R]{Regular expressions in \proglang{R}}
\label{sec:decimal_RE}
In formal language theory, a regular expression is a sequence of symbols and operators that describes a character pattern.
REs are translated by RE processors into a non-deterministic finite automaton (NFA)
and subsequently into a deterministic finite automaton (DFA).
The DFA can then be executed on any character string to recognise sub-strings that match the regular expression.
For a theoretical introduction to REs, including their relationship with context-free grammars, readers are referred to \cite{Sethi1986Compiler}.

\proglang{R} supports standard regular expressions with both the POSIX and the \proglang{Perl} syntax.
In addition, the \href{https://github.com/kevinushey/rex}{\pkg{rex} package} \cite{rex_package} offers a functional interface for creating REs in \proglang{R}.

\subsubsection{Matching a decimal real number}
Consider matching a decimal real number in the form of $[\pm]nnn[.nnn]$,
where [~] means optional and $nnn$ denotes one or more digits.
The following table compares this notation with the syntax of \proglang{Perl}, POSIX, and \pkg{rex}: \\

\begin{tabular}{ l c c c c }
\toprule
 & Notation & \proglang{Perl} & POSIX & \pkg{rex} \\ \hline
One digit & $n$ & \textbackslash d & [[:digit:]] & \code{number} \\
One or more digits & $nnn$ & \textbackslash d+ & [[:digit:]]+ & \code{numbers} \\
Optional presence of X & [X] & X? & X? & \code{maybe(}X\code{)} \\
Alternative presence of X or Y & X|Y & X|Y & X|Y & \code{or(}X, Y\code{)} \\
Plus sign & + & \textbackslash + & \textbackslash + & \code{"+"} \\
Minus sign & - & - & - & \code{"-"} \\
Dot & . & \textbackslash . & \textbackslash . & \code{"."} \\
\bottomrule
\end{tabular}
~ \\

Using the above table, $[\pm]nnn[.nnn]$ is translated to:
\begin{itemize}
\item \proglang{Perl}: \verb;(\+|-)?\d+(\.\d+)?;
\item POSIX: \verb;(\+|-)?[[:digit:]]+(\.[[:digit:]]+)?;
\item \pkg{rex}: \code{maybe(or("+", "-")), numbers, maybe(".", numbers)}
\end{itemize}

% Note: $ and ^ are removed on purpose!
To use a RE, the expression has to be wrapped in a start and a stop symbol
(\code{\^{}...\$} in POSIX and \proglang{Perl}, and \code{rex(start, ..., end)} for \pkg{rex}):
<<>>=
re <- "^(\\+|-)?[[:digit:]]+(\\.[[:digit:]]+)?$"
@

\code{grepl} can be used to check if a string matches the RE pattern or not:
<<>>=
grepl(re, "+1.1")
grepl(re, "1+1")
@

Some matching and non-matching examples are listed below:
<<>>=
matching <- c("1", "11.1", "1.11", "+11", "-11", "-11.1")

non.matching <- c("a", "1.", "1..1", "-.1", "-", "1-",
                  "1.-1", ".-1", "1.-", "1.1.1", "",
                  ".", "1.1-", "11-11")
@
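As a quick sanity check (a small sketch, not evaluated in this vignette),
the hand-written RE should match every string in the first vector and none in the second:
<<eval=FALSE>>=
all(grepl(re, matching))       # expected TRUE
any(grepl(re, non.matching))   # expected FALSE
@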
\subsubsection{Inferring a regular expression}
In this section, we use \pkg{gramEvol} to learn a RE that matches a decimal real number, as explained in the previous section.

\paragraph{Defining a cost function:}
The objective is to infer a RE that matches the decimal numbers in the vector \code{matching}, but not those in \code{non.matching}.
Consequently, the cost of any candidate RE is determined by counting the correctly matched and correctly rejected strings,
and returning the number of remaining errors:
<<>>=
re.score <- function(re) {
  score <- sum(sapply(matching, function(x) grepl(re, x))) +
           sum(sapply(non.matching, function(x) !grepl(re, x)))
  return(length(matching) + length(non.matching) - score)
}
@

The fitness function in \pkg{gramEvol} receives an \proglang{R} \code{expression},
which has to be evaluated before being passed to \code{re.score}:
<<>>=
fitfunc <- function(expr) re.score(eval(expr))
@

\paragraph{Defining a grammar:}
We use \pkg{rex} RE functions to create a grammar.
The grammar only includes the functions explored in Section \ref{sec:decimal_RE}, and is designed such that the search space is reduced:
<<>>=
library("rex")
library("gramEvol")

grammarDef <- CreateGrammar(list(
  re    = grule(rex(start, rules, end)),
  rules = grule(rule, .(rule, rules)),
  rule  = grule(numbers, ".", or("+", "-"), maybe(rules))))

grammarDef
@
\begin{itemize}
\item The first rule, \ntelem{re}, creates a valid \pkg{rex} command that uses \ntelem{rules} for pattern matching.
\item The second element, \ntelem{rules}, is \emph{recursive} and can create a collection of rules by repeating itself,
e.g., \ntelem{rule}, \ntelem{rule}, \ntelem{rule}.
The \code{.()} allows using a comma inside a \code{grule} definition,
where otherwise it would have been interpreted as another replacement rule in the list.
\item The last element, \ntelem{rule}, expands to a RE function or character pattern.
These include \code{numbers} and \code{maybe} from \pkg{rex}, a decimal point, and \code{+} or \code{-}.
\end{itemize}

\paragraph{Evolving the grammar:}
The last step is to perform a search for a regular expression that minimises the score function.
Here the minimum \code{terminationCost} is known (i.e., zero error),
and \code{max.depth} is increased to allow for more expansion of the recursive \ntelem{rules}.
We use \code{GrammaticalExhaustiveSearch} to exhaustively search for the answer among all possible combinations of the grammar:
% this takes some time to run, use "monitorFunc = print" for more convenience
<<eval=FALSE>>=
GrammaticalExhaustiveSearch(grammarDef, fitfunc, max.depth = 7, terminationCost = 0)
@
\begin{verbatim}
GE Search Results:
  Expressions Tested: 6577
  Best Chromosome:    0 1 3 0 2 1 3 1 0 0 1 1 0 0 3 0 0
  Best Expression:    rex(start, maybe(or("+", "-")), maybe(numbers, "."),
                          numbers, maybe(numbers), end)
  Best Cost:          0
\end{verbatim}

The result, while correct, is different from what we expected:
$[\pm][nnn.]nnn[nnn]$, which is nevertheless true for any decimal real number.
Furthermore, the search takes a considerable amount of time:
<<eval=FALSE>>=
system.time(GrammaticalExhaustiveSearch(grammarDef, fitfunc,
                                        max.depth = 7, terminationCost = 0))
@
\begin{verbatim}
    user  system elapsed
 380.469  17.022 392.637
\end{verbatim}
which was measured on a 3.40 GHz Intel Core i7-2600 CPU.
In conclusion, one might find it easier to design REs by hand in real-world scenarios,
rather than using evolutionary optimisation techniques.
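That said, the stochastic search of \code{GrammaticalEvolution} can also be tried in place of the exhaustive search
(a minimal sketch, not run here; being stochastic, it may require several runs to reach zero cost):
<<eval=FALSE>>=
ge <- GrammaticalEvolution(grammarDef, fitfunc,
                           max.depth = 7, terminationCost = 0)
ge$best$expression
@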
Here, all of the examples are demonstrated using the following grammar: <>= grammarDef <- CreateGrammar(list( expr = gsrule("()()", "*"), op = gsrule("+", "-", "*", "/"), coef = gsrule("c1", "c2"), var = gsrule("v1", "v2"))) grammarDef @ \subsection{Manual mapping} To \emph{map} a numeric sequence to an expression manually, use \code{GrammarMap}: <>= GrammarMap(c(0, 1, 0, 0, 1, 1, 0, 0), grammarDef) @ The sequence is zero-indexed (the first rule is zero). To see the step by step mapping, use the \code{verbose} parameter option: <>= GrammarMap(c(0, 1, 0, 0, 1, 1, 0, 0), grammarDef, verbose = TRUE) @ If the length of a sequence is insufficient for the mapping process, such that a few non-terminal elements still remain in the resulting expression, a wrapping of up to \code{wrappings} is performed. For example: <>= GrammarMap(c(0, 1, 0, 0, 1, 1), grammarDef, verbose = TRUE) @ \subsection{Examining a grammar} \pkg{gramEvol} offers several functions to examine grammar definitions. \code{summary} reports a summary of what grammar presents: <>= summary(grammarDef) @ Many of these properties are available through individual functions: \code{GetGrammarDepth} computes the depth of grammar tree. The parameter \code{max.depth} is used to limit recursion in \emph{cyclic} grammars. For example, this grammar is cyclic because of rule \ntelem{expr} $\rightarrow$ \ntelem{expr}\ntelem{op}\ntelem{expr}, i.e., replacing a \ntelem{expr} with other \ntelem{expr}s. By default \code{GetGrammarDepth} limits recursion to the number of symbols defined in the grammar: <>= GetGrammarDepth(grammarDef) GetGrammarDepth(grammarDef, max.depth = 10) @ For grammars without recursion, the value returned by \code{GetGrammarDepth} is the actual depth of the tree: <>= grammarDef2 <- CreateGrammar(list( expr = gsrule("()()"), subexpr = gsrule("*"), op = gsrule("+", "-", "*", "/"), coef = gsrule("c1", "c2"), var = gsrule("v1", "v2"))) GetGrammarDepth(grammarDef2) @ \code{GetGrammarDepth} also supports computing the depth from any symbol: <>= GetGrammarDepth(grammarDef2, startSymb = "") GetGrammarDepth(grammarDef2, startSymb = "") @ \code{GetGrammarMaxRuleSize} returns the maximum number of production rules per symbol. Here, \ntelem{op} has the highest number of production rules: <>= GetGrammarMaxRuleSize(grammarDef) @ \code{GetGrammarNumOfExpressions} returns the number of possible expressions existing in the grammar space. This function also uses the optional argument \code{max.depth} to limit the number of recursions and \code{startSymb} to set the starting symbol: <>= GetGrammarNumOfExpressions(grammarDef) GetGrammarNumOfExpressions(grammarDef, max.depth = 2) GetGrammarNumOfExpressions(grammarDef, startSymb = "") @ Here, the only expressions with depth of 2 or less are constructed if rule (\ntelem{coef}$\times$\ntelem{var}) is applied first, creating 4 expressions (i.e., $c_1 \times v_1$, $c_1 \times v_2$, $c_2 \times v_1$ and $c_2 \times v_2$). Also if \ntelem{coef} is chosen as the starting symbol, the expressions are limited to $c_1$ and $c_2$. \code{GetGrammarMaxSequenceLen} computes the length of integer sequence required for iterating through the grammar space without wrapping. As with the previous functions, \code{max.depth} is set to the number of symbols defined in the grammar. 
<<>>=
GetGrammarMaxSequenceLen(grammarDef)
GetGrammarMaxSequenceLen(grammarDef, max.depth = 3)
GetGrammarMaxSequenceLen(grammarDef2, startSymb = "<subexpr>")
@

\subsection{Grammatical evolution options}
\code{GrammaticalEvolution} is defined as follows:
<<eval=FALSE>>=
GrammaticalEvolution(grammarDef, evalFunc,
                     numExpr = 1,
                     max.depth = GrammarGetDepth(grammarDef),
                     startSymb = GrammarStartSymbol(grammarDef),
                     seqLen = GrammarMaxSequenceLen(grammarDef, max.depth, startSymb),
                     wrappings = 3,
                     suggestions = NULL,
                     optimizer = c("auto", "es", "ga"),
                     popSize = 8, newPerGen = "auto", elitism = 2,
                     mutationChance = NA,
                     iterations = 1000,
                     terminationCost = NA,
                     monitorFunc = NULL,
                     plapply = lapply, ...)
@
\code{max.depth} and \code{startSymb} determine recursive grammar limitations, similar to what was explained in the previous section.
The rest of the parameters are the evolutionary optimisation options (see the sketch after this list for an explicit call):
\begin{itemize}
\item \code{GrammaticalEvolution} evolves a population of \code{popSize} chromosomes for a number of \code{iterations}.
\item If \code{optimizer} is set to ``auto'', \code{GrammaticalEvolution} uses the information obtained about the grammar
(e.g., the number of possible expressions and the maximum sequence length)
and a heuristic algorithm based on \cite{deb1999understanding}
to automatically determine suitable values for the \code{popSize} (population size)
and \code{iterations} (number of iterations) parameters.
\item The ordinary cross-over operator of GA is considered destructive when homologous production rules are not aligned,
such as for cyclic grammars \cite{oneill2003crossover}.
Consequently, \code{GrammaticalEvolution} automatically changes cross-over parameters depending on the grammar to improve optimisation results.
A user can turn this off by manually setting the \code{optimizer}.
\item The first generation is created from the \code{suggestions}, given in the form of integer chromosomes,
along with randomly generated individuals.
\item Each integer chromosome is mapped using the grammar, and its fitness is assessed by calling \code{evalFunc}.
\item For each generation, the top $n$ scoring chromosomes, where $n =$~\code{elitism},
are directly added to the next generation's population.
The rest of the population is created using cross-over of chromosomes selected with the roulette selection operator.
\item Each chromosome may mutate with a probability of \code{mutationChance}.
\item After reaching a termination criterion, e.g., the maximum number of \code{iterations} or the desired \code{terminationCost},
the algorithm stops and returns the best expression found so far.
\item \code{GrammaticalEvolution} supports multi-gene operations,
generating more than one expression per chromosome using the \code{numExpr} parameter.
\item The number of integer codons in the chromosome is determined by \code{seqLen} times \code{numExpr}
(i.e., the sequence length per expression, times the number of expressions).
\item \code{monitorFunc} is called after each generation with information and statistics about the current state of the population.
\item \code{plapply} is used for parallel processing.
\item \code{GrammaticalEvolution} automatically filters non-terminal expressions
(i.e., sequences that do not yield a terminal expression even after the allowed number of \code{wrappings}).
Therefore the end-user does not need to worry about them while using \pkg{gramEvol}.
\end{itemize}
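For example, the automatically chosen settings can be overridden explicitly
(a sketch with illustrative parameter values; \code{evalFunc} stands for a user-supplied cost function):
<<eval=FALSE>>=
ge <- GrammaticalEvolution(grammarDef, evalFunc,
                           optimizer = "ga",
                           popSize = 100, iterations = 200,
                           elitism = 4, mutationChance = 0.05,
                           terminationCost = 0.01,
                           monitorFunc = print)
@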
\subsection{Parallel processing option}
Processing expressions and computing their fitness is often computationally expensive.
The \pkg{gramEvol} package can utilise parallel processing facilities in \proglang{R} to improve its performance.
This is done through the \code{plapply} argument of the \code{GrammaticalEvolution} function.
By default, the \code{lapply} function is used to evaluate all individuals in the population.

Multi-core systems simply benefit from using \code{mclapply} from the package \pkg{parallel},
% \cite{Rlang},
which is a drop-in replacement for \code{lapply} on POSIX-compatible systems.
The following code optimises \code{evalFunc} on 4 cores:
<<eval=FALSE>>=
library("parallel")
options(mc.cores = 4)
ge <- GrammaticalEvolution(grammarDef, evalFunc, plapply = mclapply)
@

To run \pkg{gramEvol} on a cluster, the \code{clusterApply} family of functions can be used instead.
The \pkg{gramEvol} package must first be installed on all machines,
and the fitness function and its data dependencies exported, before GE is called.
The following example demonstrates a four-process cluster running on the local machine:
<<eval=FALSE>>=
library("parallel")
cl <- makeCluster(type = "PSOCK", c("127.0.0.1", "127.0.0.1",
                                    "127.0.0.1", "127.0.0.1"))
clusterEvalQ(cl, library("gramEvol"))
clusterExport(cl, c("evalFunc"))
ge <- GrammaticalEvolution(grammarDef, evalFunc,
                           plapply = function(...) parLapply(cl, ...))
stopCluster(cl)
@
It must be noted that, for any problem, the achievable speed-up depends on the communication overhead
compared with the computational complexity of the fitness function.

\subsection{Generating more than one expression}
\pkg{gramEvol} supports generation and evaluation of multiple expressions:
\begin{itemize}
\item \code{numExpr} in \code{GrammaticalEvolution} is used to pass a list of more than one \proglang{R} \code{expression} to the fitness function.
\item \code{EvalExpressions} offers a simpler interface for evaluating multiple expressions.
\end{itemize}
The following example showcases \code{EvalExpressions}:
it uses a dataset for the variables defined in the grammar,
and evaluates four expressions given as an \proglang{R} \code{expression} object:
<<>>=
df <- data.frame(c1 = c(1, 2), c2 = c(2, 3),
                 v1 = c(3, 4), v2 = c(4, 5))

quad.expr <- expression(c1 * v1, c1 * v2, c2 * v1, c2 * v2)
EvalExpressions(quad.expr, envir = df)
@

This is useful in applications where more than one expression is required,
or where the collective power of several simple expressions outperforms a single complex program.
For example, in \cite{MNDL_13} the authors used GE for electricity load forecasting;
instead of using a complex machine learning algorithm,
pools of string expressions were generated in a guided manner
and were used as features in a simpler machine learning algorithm to obtain better results.
The idea of generating \emph{features} using GE is further explored in \cite{noorian16gramEvol}.

\subsection{Alternative optimisation algorithms}
\pkg{gramEvol} also provides a random search and an exhaustive search.
Their syntax is similar to that of \code{GrammaticalEvolution}:
<<eval=FALSE>>=
result1 <- GrammaticalExhaustiveSearch(grammarDef, evalFunc)
result2 <- GrammaticalRandomSearch(grammarDef, evalFunc)
@

\subsection{Using vectors as rules}
\code{gvrule} allows members of a vector to be used as individual rules. For example,
<<>>=
gvrule(1:5)
@
which is equivalent to
<<>>=
grule(1, 2, 3, 4, 5)
@
Without \code{gvrule}, \code{1:5} would have been interpreted as a single rule:
<<>>=
grule(1:5)
@

\subsection{Using commas and assignments in rules}
There are two ways to use commas and assignments in \pkg{gramEvol} rules:
\begin{enumerate}
\item Rules are defined in character string form using \code{gsrule}.
\item Rules are wrapped in \code{.()} and defined using \code{grule}.
\end{enumerate}
For example, consider the following rules:

\ntelem{assignment} ::= A = B | A = C

\ntelem{comma} ::= A, B | B, C

Their definition using \pkg{gramEvol} is as follows:
<<>>=
CreateGrammar(list(assignment = gsrule("A = B", "A = C"),
                   comma = gsrule("A, B", "B, C")))
@
or
<<>>=
CreateGrammar(list(assignment = grule(.(A = B), .(A = C)),
                   comma = grule(.(A, B), .(B, C))))
@

\section{Conclusion}
GE offers a flexible yet powerful framework for automatic program generation.
The syntax and structure of the programs are described using a context-free grammar,
and their objective is determined by a cost function.
An evolutionary search is performed on the grammar to find the program that minimises the cost function.

\pkg{gramEvol} implements GE in \proglang{R}.
It allows a grammar to be defined using \proglang{R} expressions, as well as in custom string formats.
A GE program generator using \pkg{gramEvol} only requires a grammar definition and a cost function;
the other steps, including the evolutionary search and the selection of its optimal parameters, are handled automatically by the package.
It also supports parallel computing, and includes facilities for exhaustive and random search.

In this vignette, some of the functionalities of \pkg{gramEvol} were explored.
Furthermore, two examples were used to demonstrate the flexibility of GE and \pkg{gramEvol}.

\bibliographystyle{IEEEtran}
\bibliography{vig_refs}

\end{document}