---
title: "Modifying existing pipelines"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 4
description: >
  Shows how to insert, replace, and remove steps in a pipeline.
vignette: >
  %\VignetteIndexEntry{Modifying existing pipelines}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r knitr-setup, include = FALSE}
require(pipeflow)

knitr::opts_chunk$set(
  comment = "#",
  prompt = FALSE,
  tidy = FALSE,
  cache = FALSE,
  collapse = TRUE
)

old <- options(width = 100L)
library(ggplot2)
```

### Existing pipeline

```{r define-pipeline, include = FALSE, echo = FALSE}
pip <- Pipeline$new("my-pipeline", data = airquality)
pip$add(
    "data_prep",
    function(data = ~data) {
        replace(data, "Temp.Celsius", (data[, "Temp"] - 32) * 5/9)
    }
)
pip$add(
    "model_fit",
    function(
        data = ~`data_prep`,
        xVar = "Temp.Celsius"
    ) {
        lm(paste("Ozone ~", xVar), data = data)
    }
)
pip$add(
    "model_plot",
    function(
        model = ~`model_fit`,
        data = ~`data_prep`,
        xVar = "Temp.Celsius",
        title = "Linear model fit"
    ) {
        coeffs <- coefficients(model)
        ggplot(data) +
            geom_point(aes(.data[[xVar]], .data[["Ozone"]])) +
            geom_abline(intercept = coeffs[1], slope = coeffs[2]) +
            labs(title = title)
    }
)
pip$set_params(list(xVar = "Solar.R"))
pip$set_params(list(title = "Some new title"))
pip$set_data(airquality[1:10, ])
pip$run()
```

Let's start where we left off in the
[Get started with pipeflow](v01-get-started.html) vignette, that is, we have
the following pipeline

```{r show-pipeline}
pip
```

with the following set data

```{r show-data}
pip$get_data() |> head(3)
```


### Insert new step

Let's say we want to insert a new step after the `data_prep` step
that standardizes the y-variable.

```{r insert-step}
pip$insert_after(
    afterStep = "data_prep",
    step = "standardize",
    function(
        data = ~`data_prep`,
        yVar = "Ozone"
    ) {
        data[, yVar] <- scale(data[, yVar])
        data
    }
)
```


```{r}
pip
```

```{r, eval = getOption("pipeflow.visNetwork", default = FALSE)}
library(visNetwork)
do.call(visNetwork, args = pip$get_graph()) |>
    visHierarchicalLayout(direction = "LR", sortMethod = "directed")
```

```{r, echo = FALSE, eval = getOption("pipeflow.visNetwork", default = FALSE)}
library(visNetwork)
do.call(visNetwork, args = c(pip$get_graph(), list(height = 300))) |>
    visHierarchicalLayout(direction = "LR", sortMethod = "directed")
```

As we can see, the `standardize` step is now part of the pipeline, but
so far it is not used by any other step.

### Replace existing steps

Let's revisit the function definition of the `model_fit` step

```{r}
pip$get_step("model_fit")[["fun"]]
```

To use the standardized data, we need to change the data dependency
such that it refers to the `standardize` step. Also instead of
a fixed y-variable in the model, we want to pass it as a paramter.

```{r replace-model-fit-step}
pip$replace_step(
    "model_fit",
    function(
        data = ~standardize,        # <- changed data reference
        xVar = "Temp.Celsius",
        yVar = "Ozone"              # <- new y-variable
    ) {
        lm(paste(yVar, "~", xVar), data = data)
    }
)
```


The `model_plot` step needs to be updated in a similar way.

```{r replace-model-plot-step}
pip$replace_step(
    "model_plot",
    function(
        model = ~model_fit,
        data = ~standardize,         # <- changed data reference
        xVar = "Temp.Celsius",
        yVar = "Ozone",              # <- new y-variable
        title = "Linear model fit"
    ) {
        coeffs <- coefficients(model)
        ggplot(data) +
            geom_point(aes(.data[[xVar]], .data[[yVar]])) +
            geom_abline(intercept = coeffs[1], slope = coeffs[2]) +
            labs(title = title)
    }
)
```

The updated pipeline now looks as follows.

```{r}
pip
```

```{r, echo = FALSE, eval = getOption("pipeflow.visNetwork", default = FALSE)}
library(visNetwork)
do.call(visNetwork, args = c(pip$get_graph(), list(height = 100))) |>
    visHierarchicalLayout(direction = "LR")
```

We see that the `model_fit` and `model_plot` steps now use the standardized data.
Let's re-run the pipeline and inspect the output.

```{r}
pip$set_params(list(xVar = "Solar.R", yVar = "Wind"))
pip$run()
```

```{r}
pip$get_out("model_fit") |> coefficients()
```

```{r, fig.alt = "model-plot"}
pip$get_out("model_plot")
```


### Removing steps

Let's see the pipeline again.
```{r}
pip
```

When you are trying to remove a step, `pipeflow` by default checks if
the step is used by any other step, and raises an error if removing the
step would violate the integrity of the pipeline.

```{r try-remove-step}
try(pip$remove_step("standardize"))
```

To enforce removing a step together with all its downstream
dependencies, you can use the `recursive` argument.

```{r remove-steps-recursively}
pip$remove_step("standardize", recursive = TRUE)
```

```{r}
pip
```

Naturally, the last step never has any downstream dependencies, so it
can be removed without any issues. There is another way to just remove
the last step.

```{r}
pip$pop_step()
```

```{r}
pip
```

Replacing steps in a pipeline as shown in this vignette will allow to re-use existing
pipelines and adapt them programmatically to new requirements.
Another way of re-using pipelines is to combine them, which is shown in the
[Combining pipelines](v03-combine-pipelines.html) vignette.

```{r, include = FALSE}
options(old)
```