---
title: "Overview: Creating FFTs with FFTrees"
author: "Nathaniel D. Phillips and Hansjörg Neth"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
bibliography: fft.bib
csl: apa.csl
vignette: >
  %\VignetteIndexEntry{Overview: Creating FFTs with FFTrees}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, echo = FALSE}
knitr::opts_chunk$set(collapse = FALSE, 
                      comment = "#>", 
                      prompt = FALSE,
                      tidy = FALSE,
                      echo = TRUE, 
                      message = FALSE,
                      warning = FALSE,
                      # Default figure options:
                      dpi = 100, 
                      fig.align = 'center',
                      fig.height = 6.0,
                      fig.width  = 6.5, 
                      out.width = "580px")
```

```{r pkgs, echo = FALSE, message = FALSE, results = 'hide'}
library(FFTrees)
library(dplyr)
library(testthat)
library(tidyselect)
library(magrittr)
library(knitr)
```

```{r urls, echo = FALSE, message = FALSE, results = 'hide'}
# URLs:
url_pkg_CRAN   <- "https://CRAN.R-project.org/package=FFTrees"
url_pkg_GitHub <- "https://github.com/ndphillips/FFTrees"
url_pkg_issues <- "https://github.com/ndphillips/FFTrees/issues"

url_JDM_issue <- "https://journal.sjdm.org/vol12.4.html"
url_JDM_html  <- "https://journal.sjdm.org/17/17217/jdm17217.html"
url_JDM_pdf   <- "https://journal.sjdm.org/17/17217/jdm17217.pdf"

url_JDM_doi <- "https://doi.org/10.1017/S1930297500006239"

email_contact <- "Nathaniel.D.Phillips.is@gmail.com"
url_contact   <- "https://www.linkedin.com/in/nathanieldphillips/"
```

<!-- Brief intro: -->

The R package **FFTrees** [@phillips2017FFTrees; @FFTrees-pkg] makes it easy to create, visualize, and evaluate fast-and-frugal decision trees\ (FFTs). 
FFTs are simple and transparent decision algorithms for solving binary classification problems in an\ effective and efficient fashion. 


## Fast-and-Frugal Trees (FFTs) 

<!-- Defining FFTs: -->

A _fast-and-frugal tree_ (FFT) [@martignon2003naive] is a set of hierarchical rules for solving binary classification tasks based on very little pieces of information (usually using\ 4 or fewer cues). 
In contrast to more complex decision trees, each node of an\ FFT has exactly two branches. 
A branch can either contain another cue (i.e., ask another question) or lead to an exit (i.e., yield a decision or prediction outcome).  
Each non-final node of an\ FFT has one exit branch and the final node has two exit branches. 

<!-- Characteristics and benefits of FFTs: -->

FFTs are simple and effective decision strategies that use minimal information for making decisions in binary classification problems [see @gigerenzer1999fast;@gigerenzer1999good]. 
FFTs are often preferable to more complex decision strategies (such as logistic regression, LR) because they rarely over-fit data [@gigerenzer2009homo] and are easy to interpret, implement, and communicate in real-world settings [@marewski2012heuristic]. 
FFTs have been designed to tackle many real world tasks from making fast decisions in emergency rooms [@green1997alters] to detecting depression [@jenny2013simple]. 

<!-- Emphasize transparency: -->

Whereas their performance and success are empirical questions, a key theoretical advantage of FFTs is their _transparency_ to decision makers and anyone aiming to understand and evaluate the details of an algorithm. 
In the words of @burton2020, "human users could interpret, justify, control, and interact with a fast-and-frugal decision aid" (p.\ 229). 


## Using the FFTrees package

The **FFTrees** package makes it easy to produce, display, and evaluate FFTs [@phillips2017FFTrees]. 
The package's main function is `FFTrees()` which takes formula\ `formula` and dataset\ `data` arguments and returns several FFTs that attempt to classify training cases into criterion classes. 
The FFTs created can then be used to predict new data to cross-validate their performance. 

Here is an example of using the main `FFTrees()` function to fit FFTs to `heart.train` data: 

```{r fft-example, message = FALSE, results = 'hide'}
# Create a fast-and-frugal tree (FFT) predicting heart disease:
heart.fft <- FFTrees(formula = diagnosis ~.,
                     data = heart.train,
                     data.test = heart.test,
                     main = "Heart Disease",
                     decision.labels = c("Healthy", "Diseased"))
```

The resulting `FFTrees` object `heart.fft` contains 7\ FFTs that were fitted to the `heart.test` data. 
To evaluate a tree's predictive performance, we compare its predictions for the un-trained `heart.test` data with their true criterion values. 
Here is how we can apply the best training FFT to the `heart.test` data: 

```{r fig-1, fig.width = 6.5, fig.height = 6.0, out.width = "600px", fig.align = 'center', fig.cap = "A fast-and-frugal tree (FFT) to predict heart disease status."}
# Visualize predictive performance:
plot(heart.fft, data = "test")
```

## Getting started 

To start using the **FFTrees** package, we recommend studying the [Tutorial: Creating FFTs for heart disease](FFTrees_heart.html). 
The tutorial illustrates the basics steps of creating, visualizing, and evaluating fast-and-frugal trees (FFTs). 
The scientific background of FFTs and the development of **FFTrees** are described in @phillips2017FFTrees (doi\ [10.1017/S1930297500006239](`r url_JDM_doi`) | [html](`r url_JDM_html`) | [PDF](`r url_JDM_pdf`)). 
The following vignettes provide details on related topics and corresponding examples. 

### Vignettes

<!-- Table of all vignettes: -->

Here is a complete list of the vignettes available in the **FFTrees** package: 

|   | Vignette | Description |
|--:|:------------------------------|:-------------------------------------------------|
|   | [Main guide: FFTrees overview](guide.html) | An overview of the **FFTrees** package |
| 1 | [Tutorial: FFTs for heart disease](FFTrees_heart.html)   | An example of using `FFTrees()` to model heart disease diagnosis |
| 2 | [Accuracy statistics](FFTrees_accuracy_statistics.html) | Definitions of accuracy statistics used throughout the package |
| 3 | [Creating FFTs with FFTrees()](FFTrees_function.html) | Details on the main `FFTrees()` function |
| 4 | [Manually specifying FFTs](FFTrees_mytree.html)   | How to directly create FFTs without using the built-in algorithms |
| 5 | [Visualizing FFTs](FFTrees_plot.html) | Plotting `FFTrees` objects, from full trees to icon arrays |
| 6 | [Examples of FFTs](FFTrees_examples.html) | Examples of FFTs from different datasets contained in the package |


### Datasets

The **FFTrees** package contains several datasets ---\ mostly from the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/)\ --- that allow you to address interesting questions when exploring FFTs: 

- `blood` -- Which people donate blood? [source](https://archive.ics.uci.edu/ml/datasets/Blood+Transfusion+Service+Center)
- `breastcancer` -- Which patients suffer from breast cancer? [source](https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic))
- `car` -- Which cars are acceptable? [source](http://archive.ics.uci.edu/ml/datasets/Car+Evaluation)
- `contraceptive` -- Which factors determine whether women use contraceptives? [source](https://archive.ics.uci.edu/ml/datasets/Contraceptive+Method+Choice)
- `creditapproval` -- Which factors determine a creditcard approval? [source](https://archive.ics.uci.edu/ml/datasets/Credit+Approval)
- `fertility` -- Which factors predict a fertile sperm concentration? [source](https://archive.ics.uci.edu/ml/datasets/Fertility)
- `forestfires` -- Which environmental conditions predict forest fires? [source](https://archive.ics.uci.edu/ml/datasets/Forest+Fires)
- `heartdisease` -- Which patients suffer from heart disease? [source](https://archive.ics.uci.edu/ml/datasets/Heart+Disease)
- `iris.v` -- Which iris belongs to the class "virginica"? [source](https://archive.ics.uci.edu/ml/datasets/Forest+Fires)
- `mushrooms` -- Which features predict poisonous mushrooms? [source](https://archive.ics.uci.edu/ml/datasets/Mushroom)
- `sonar` -- Did a sonar signal bounce off a metal cylinder (or a rock)? [source](https://archive.ics.uci.edu/ml/datasets/Connectionist+Bench+(Sonar,+Mines+vs.+Rocks)) 
- `titanic` -- Which passengers survived the Titanic? [source](https://www.encyclopedia-titanica.org) 
- `voting` -- How did U.S. congressmen vote in 1984? [source](https://archive.ics.uci.edu/ml/datasets/Congressional+Voting+Records)
- `wine` -- What determines ratings of wine quality? [source](https://archive.ics.uci.edu/ml/datasets/Wine)


#### Details about the datasets {-}

When preparing data to be predicted by FFTs, we usually distinguish between several (categorical or numeric) predictors and a (binary) criterion variable. 
**Table\ 1** provides basic information on the datasets included in the **FFTrees** package (see their documentation for additional details). 

**Table\ 1:** Key information on the datasets included in **FFTrees**. 

```{r dataframe for overview data, echo = FALSE}
## Preparations for applying the describe_data() function to all data sets
## When new data sets are included, add their info so that they will also be shown in the vignette-table!

# List all data sets: 
data_list <- list(blood, breastcancer, car, contraceptive, creditapproval, fertility, forestfires, 
                  heartdisease, iris.v, mushrooms, sonar, titanic, voting, wine) 

# Vector with all names of the data sets:
data_names <- c("blood", "breastcancer", "car", "contraceptive", "creditapproval", "fertility", "forestfires", 
                "heartdisease", "iris.v", "mushrooms", "sonar", "titanic", "voting", "wine")

# Vector with all criterion names:
criterion_names <- c("donation.crit", "diagnosis", "acceptability", "cont.crit", "crit", 
                     "diagnosis","fire.crit", "diagnosis", "virginica", "poisonous", "mine.crit", "survived", "party.crit", "type") 

# Vector with criterion values of interest: 
baseline_values <- c(1, TRUE, "acc", TRUE, TRUE, TRUE, TRUE, 
                     TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, "red")

# Use combined lists/vectors and apply describe_data() to each:
result_list <- mapply(describe_data, data = data_list, 
                      data_name = data_names, criterion_name = criterion_names, 
                      baseline_value = baseline_values, SIMPLIFY = FALSE)

# Combine results in df:
combined_result <- do.call(rbind, result_list)

# Round baseline and NA pct values for brevity:
combined_result$Baseline_pct<- round(combined_result$Baseline_pct, 1)
combined_result$NAs_pct<- round(combined_result$NAs_pct, 2)

# Rename columns:
colnames(combined_result) <- c("Dataset name", 
                               "Number of cases",
                               "Criterion name", 
                               "Baseline (`TRUE`,\\ in\\ %)",
                               "Number of predictors",
                               "Number of NAs", 
                               "NAs (in\\ %)")

# Render the table from the data frame
# use as many items per page as we have data sets
# redefine column names as we like them:

knitr::kable(combined_result, format = "html") 
```


## Citing **FFTrees** 

We had a lot of fun creating **FFTrees** and hope you like it too! 
For an accessible introduction to FFTs, we recommend reading our article in the journal _Judgment and Decision Making_ ([2017](`r url_JDM_doi`)), entitled _FFTrees: A toolbox to create, visualize, and evaluate fast-and-frugal decision trees_ (available in [html](`r url_JDM_html`) | [PDF](`r url_JDM_pdf`)\ ).  

**Citation** (in APA format): 

- Phillips, N. D., Neth, H., Woike, J. K. & Gaissmaier, W. (2017). 
FFTrees: A toolbox to create, visualize, and evaluate fast-and-frugal decision trees. 
_Judgment and Decision Making_, _12_ (4), 344–368. 
doi\ [10.1017/S1930297500006239](`r url_JDM_doi`) 

<!-- PDF available at `r url_JDM_pdf` -->

When using **FFTrees** in your own work, please cite our article and spread the word, so that we can continue developing the package. 

**BibTeX Citation**: 

```{r bibtex-citation, eval = FALSE, highlight = FALSE}
@article{FFTrees,
 title = {FFTrees: A toolbox to create, visualize, and evaluate fast-and-frugal decision trees},
 author = {Phillips, Nathaniel D and Neth, Hansjörg and Woike, Jan K and Gaissmaier, Wolfgang},
 year = 2017, 
 journal = {Judgment and Decision Making},
 volume = 12,
 number = 4,
 pages = {344--368},
 url = {https://journal.sjdm.org/17/17217/jdm17217.pdf},
 doi = {10.1017/S1930297500006239}
}
```


## Contact

- The latest release of **FFTrees** is available at [`r url_pkg_CRAN`](`r url_pkg_CRAN`). 

- The latest developer version is available at [`r url_pkg_GitHub`](`r url_pkg_GitHub`). 

- For comments, tips, and bug reports, please post at [`r url_pkg_issues`](`r url_pkg_issues`) or contact Nathaniel at `r email_contact` or [`r url_contact`](`r url_contact`). 

<!-- Logo: -->

```{r logo, echo = FALSE, fig.align = "center", out.width="40%"}
knitr::include_graphics("../inst/FFTrees_Logo.jpg")
```

<!-- Automatic references: -->

## Bibliography

<!-- eof. -->