---
title: "Using quasiquotation to add variable and value labels"
author: "Daniel Lüdecke"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Using quasiquotation to add variable and value labels}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r echo = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")

if (!requireNamespace("sjmisc", quietly = TRUE) ||
    !requireNamespace("rlang", quietly = TRUE)) {
  knitr::opts_chunk$set(eval = FALSE)
}
```

Labelling data is typically a task for end-users, who do it in their own scripts or functions rather than in packages. However, sometimes it can be useful for both end-users and package developers to have a flexible way to add variable and value labels to their data. In such cases, [quasiquotation](https://adv-r.hadley.nz/quasiquotation.html) is helpful.

This vignette demonstrates how to use quasiquotation in _sjlabelled_ to label your data.

## Adding value labels to variables using quasiquotation

Usually, `set_labels()` can be used to add value labels to variables. The syntax of this function is easy to use, and `set_labels()` can add value labels to multiple variables at once, provided these variables share the same value labels.

In the following examples, we will use the `frq()` function, which shows an extra **label**-column containing _value labels_ if the data is labelled. If the data has _no_ value labels, this column is not shown in the output.

```{r message=FALSE, warning=FALSE}
library(sjlabelled)
library(sjmisc) # for frq()-function
library(rlang)

# unlabelled data
dummies <- data.frame(
  dummy1 = sample(1:3, 40, replace = TRUE),
  dummy2 = sample(1:3, 40, replace = TRUE),
  dummy3 = sample(1:3, 40, replace = TRUE)
)

# set labels for all variables in the data frame
test <- set_labels(dummies, labels = c("low", "mid", "hi"))

attr(test$dummy1, "labels")

frq(test, dummy1)

# set the same value labels for two of the three variables
test <- set_labels(
  dummies, dummy1, dummy2,
  labels = c("low", "mid", "hi")
)

frq(test)
```

`val_labels()` does the same job as `set_labels()`, but in a different way. While `set_labels()` requires variables to be specified in the `...`-argument and labels in the `labels`-argument, `val_labels()` requires both to be specified in the `...`.

`val_labels()` requires _named_ vectors as arguments, with the _left-hand side_ being the name of the variable that should be labelled, and the _right-hand side_ containing the labels for the values.

```{r message=FALSE, warning=FALSE}
test <- val_labels(dummies, dummy1 = c("low", "mid", "hi"))

attr(test$dummy1, "labels")

# remaining variables are not labelled
frq(test)
```

Unlike `set_labels()`, `val_labels()` allows the user to add _different_ value labels to different variables in one function call. Another difference, and advantage, of `val_labels()` is its flexibility in defining variable names and value labels by using quasiquotation.

### Add labels that are stored in a vector

To use quasiquotation, we need the **rlang** package to be installed and loaded. Now we can store labels in a character vector and use `!!` to unquote this vector.

```{r message=FALSE, warning=FALSE}
labels <- c("low_quote", "mid_quote", "hi_quote")

test <- val_labels(dummies, dummy1 = !! labels)

attr(test$dummy1, "labels")
```

### Define variable names that are stored in a vector

The same can be done with the names of _variables_ that should get new value labels.
We then need `!!` to unquote the variable name and `:=` as the assignment operator.

```{r message=FALSE, warning=FALSE}
variable <- "dummy2"

test <- val_labels(dummies, !! variable := c("lo_var", "mid_var", "high_var"))

# no value labels
attr(test$dummy1, "labels")

# value labels
attr(test$dummy2, "labels")
```

### Both variable names and value labels are stored in a vector

Finally, we can combine the above approaches to be flexible regarding both variable names and value labels.

```{r message=FALSE, warning=FALSE}
variable <- "dummy3"
labels <- c("low", "mid", "hi")

test <- val_labels(dummies, !! variable := !! labels)

attr(test$dummy3, "labels")
```

## Adding variable labels using quasiquotation

`set_label()` is the counterpart of `set_labels()` for adding variable labels to a variable. The equivalent of `val_labels()` is `var_labels()`, which works in the same way as `val_labels()`. In the case of _variable_ labels, a `label`-attribute is added to a vector or factor (instead of a `labels`-attribute, which is used for _value_ labels).

The following examples show how to use `var_labels()` to add variable labels to the data. We demonstrate this function without further explanation, because it is very similar to `val_labels()`.

```{r message=FALSE, warning=FALSE}
dummy <- data.frame(
  a = sample(1:4, 10, replace = TRUE),
  b = sample(1:4, 10, replace = TRUE),
  c = sample(1:4, 10, replace = TRUE)
)

# simple usage
test <- var_labels(dummy, a = "first variable", c = "third variable")

attr(test$a, "label")
attr(test$b, "label")
attr(test$c, "label")

# quasiquotation for labels
v1 <- "First variable"
v2 <- "Second variable"
test <- var_labels(dummy, a = !! v1, b = !! v2)

attr(test$a, "label")
attr(test$b, "label")
attr(test$c, "label")

# quasiquotation for variable names
x1 <- "a"
x2 <- "c"
test <- var_labels(dummy, !! x1 := "First", !!
  x2 := "Second")

attr(test$a, "label")
attr(test$b, "label")
attr(test$c, "label")

# quasiquotation for both variable names and labels
test <- var_labels(dummy, !! x1 := !! v1, !! x2 := !! v2)

attr(test$a, "label")
attr(test$b, "label")
attr(test$c, "label")
```

## Conclusion

As we have demonstrated, `var_labels()` and `val_labels()` are among the most flexible and easy-to-use ways to add value and variable labels to our data. Another advantage is the consistent design of all functions in **sjlabelled**, which allows seamless integration into pipe-workflows.
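
As a rough sketch of the flexibility described above, the loop below labels several columns from a lookup list, unquoting both the variable name and its labels on each iteration, and then adds a variable label in a pipe. The names `scores`, `label_map`, and `labelled` are made up for this illustration and are not part of **sjlabelled**; the native pipe `|>` assumes R 4.1 or later.

```r
library(sjlabelled)
library(rlang)

# Deterministic toy data (hypothetical, chosen so every value 1 to 3 occurs)
scores <- data.frame(
  q1 = rep(1:3, length.out = 12),
  q2 = rep(1:3, length.out = 12),
  q3 = rep(1:3, length.out = 12)
)

# Hypothetical lookup: variable name -> value labels
label_map <- list(
  q1 = c("low", "mid", "hi"),
  q2 = c("never", "sometimes", "often")
)

# Unquote the variable name and its labels on every iteration
labelled <- scores
for (variable in names(label_map)) {
  labels <- label_map[[variable]]
  labelled <- val_labels(labelled, !! variable := !! labels)
}

# Add a variable label as one step of a pipe
labelled <- labelled |> var_labels(q1 = "First question")

attr(labelled$q1, "labels") # value labels of q1
attr(labelled$q1, "label")  # variable label of q1
attr(labelled$q3, "labels") # q3 is not in label_map, so it stays unlabelled
```

Keeping the labels in a plain list like this means the same loop can label any number of variables without changing the calls to `val_labels()`.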