---
title: "Working with Labelled Data"
author: "Daniel Lüdecke"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Working with Labelled Data}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r echo = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
```

This vignette uses a small example to show how functions for working with labelled data can be used in a typical data visualization workflow.

# Labelled Data

In software like SPSS, it is common to have value and variable labels as variable attributes. Variable values, even if categorical, are mostly numeric. In R, however, you may use labels as values directly:

```{r}
factor(c("low", "high", "mid", "high", "low"))
```

Reading SPSS data with **haven** or **sjlabelled** keeps the numeric values of the variables and adds the value and variable labels as attributes. See the following example from the sample dataset `efc`, which is part of the **sjlabelled** package:

```{r}
library(sjlabelled)
data(efc)
str(efc$e42dep)
```

While all plotting and table functions of the [sjPlot-package](https://cran.r-project.org/package=sjPlot) make use of these attributes, many packages and functions ignore them, e.g. R base graphics:

```{r warning=FALSE, fig.height=6, fig.width=7}
library(sjlabelled)
data(efc)
barplot(
  table(efc$e42dep, efc$e16sex),
  beside = TRUE,
  legend.text = TRUE
)
```

As you can see in the above figure, the plot has neither axis labels nor legend labels.

# Adding value labels as factor values

`as_label()` is a function from the **sjlabelled** package that converts a numeric variable into a factor and uses the value labels stored in its attributes as factor levels. When factors with labelled levels are used, the bar plot will be labelled.

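To make the conversion concrete, here is a minimal sketch on a hypothetical toy vector `x`. The vector, its value labels and the variable label "Dependency level" are made up for illustration and are not part of the `efc` data. The sketch assumes that `set_labels()` assigns unnamed labels to the sorted unique values of `x`:

```{r}
library(sjlabelled)

# Hypothetical toy vector (illustration only, not taken from efc)
x <- c(1, 3, 2, 3, 1)

# Attach value labels and a variable label as vector attributes
x <- set_labels(x, labels = c("low", "mid", "high"))
x <- set_label(x, label = "Dependency level")

# The values stay numeric; the labels live in the attributes
get_labels(x)
get_label(x)

# as_label() turns the value labels into factor levels
f <- as_label(x)
levels(f)
as.character(f)
```

Applied to the `efc` variables, the same conversion yields a labelled bar plot:
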
```{r warning=FALSE, fig.height=6, fig.width=7}
barplot(
  table(sjlabelled::as_label(efc$e42dep),
        sjlabelled::as_label(efc$e16sex)),
  beside = TRUE,
  legend.text = TRUE
)
```

# Getting and setting value and variable labels

There are four functions that let you easily set or get value and variable labels of either a single vector or a complete data frame:

* `get_label()` to get variable labels
* `get_labels()` to get value labels
* `set_label()` to set variable labels (add them as vector attribute)
* `set_labels()` to set value labels (add them as vector attribute)

With these functions, you can easily add titles to plots dynamically, i.e. depending on the variable that is plotted.

```{r warning=FALSE, fig.height=6, fig.width=7}
barplot(
  table(sjlabelled::as_label(efc$e42dep),
        sjlabelled::as_label(efc$e16sex)),
  beside = TRUE,
  legend.text = TRUE,
  main = get_label(efc$e42dep)
)
```

# Restore labels from subsetted data

The base `subset()` function drops label attributes (or vector attributes in general) when subsetting data. The **sjlabelled** package provides two handy functions to deal with this problem: `copy_labels()` and `remove_labels()`. `copy_labels()` adds labels back to a subsetted data frame based on the original data frame, and `remove_labels()` removes all label attributes.

## Losing labels during subset

```{r}
efc.sub <- subset(efc, subset = e16sex == 1, select = c(4:8))
str(efc.sub)
```

## Add back labels

```{r, message=FALSE}
efc.sub <- copy_labels(efc.sub, efc)
str(efc.sub)
```

# Conclusion

When working with labelled data, especially with data sets imported from other software packages, it is very handy to make use of the label attributes. The **sjlabelled** package supports this and offers useful functions for these tasks.