The heartbeatr
package is designed to streamline the
processing of data generated with ElectricBlue’s PULSE systems (https://electricblue.eu/pulse).
This vignette explains the structure of the input data and
demonstrates the heartbeatr
workflow based on a simple and
compliant dataset. It also provides important recommendations on how to
effectively process large datasets.
# library(tidyverse)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(stringr)
library(tibble)
library(ggplot2)
library(heartbeatr)
heartbeatr is ONLY designed to handle data generated by PULSE systems and from a single experiment. heartbeatr is unlikely to be able to process files generated with any other device. The automated processing will also likely fail if the input data doesn't fully conform to certain specifications, which are detailed in the sections below.
The term “PULSE files” refers to the .CSV files generated by PULSE systems. Such files follow a strict structure composed of a header section and a data section.
Please note that simply opening a PULSE file in Microsoft Excel and closing it again can introduce modifications (the date_time format is automatically converted to the format used by one's machine), potentially making the file unreadable by heartbeatr.
If you want to check the data stored in a PULSE file, either use a basic text editor (such as BBEdit) or create a copy of the file before opening it.
The header follows a two-column field,value
structure
and includes all the PULSE system parameter options used when the file
was generated. The number of fields may change in future PULSE system
versions, but will always include:
# pulse_example() allows quick access to test files
fn <- pulse_example()[1]
# read the first 50 lines as field/value pairs
head <- read.csv(fn, nrows = 50, col.names = c("field", "value"))
# the header ends at the last "-----" separator line
skip <- max(grep("----------------", head$field))
head[1:skip, ]
#> field value
#> 1 www.electricblue.eu Portugal
#> 2 ------------------------------ --------------------
#> 3 Pulse version V2.3
#> 4 ------------------------------ --------------------
#> 5 device Pulse
#> 6 rate_Hz 25
#> 7 utc 0 (logged data is always stored in UTC0)
#> 8 time_zone_h 0.00
#> 9 daylight_saving_time FALSE
#> 10 local_time 2024-10-01 10:51
#> 11 ------------------------------ --------------------
A few notes about the header fields:

- utc will always be equal to 0 because PULSE systems always record data using UTC0, without exception; this is crucial, as it ensures full protection from potential mistakes originating when daylight saving time changes or when aggregating data created by different PULSE systems that could be running different values for the parameters time_zone_h and daylight_saving_time.

- the local_time field facilitates the quick conversion of the logged timestamps (always in UTC0) to local time; however, note that local_time is calculated based on the values provided by the user for time_zone_h and daylight_saving_time, and therefore will be incorrect if those parameters aren't adjusted properly.

The data section is composed of a line with all logging channel ids, followed by data lines with a timestamp and the values read by the PULSE system for each of the channels. Note that there's always a reading for each channel, even if not all channels were in fact used during data collection.
#> time c01 c02 c03 c04 c05 c06 c07 c08 c09 c10
#> 1 2024-10-01 10:51:39.017 655 1657 2531 2446 2284 0 1519 1999 1548 1353
#> 2 2024-10-01 10:51:39.184 283 1797 0 2294 2029 0 1147 917 366 917
#> 3 2024-10-01 10:51:39.402 626 1789 0 2476 1769 0 1332 663 803 1383
#> 4 2024-10-01 10:51:39.443 830 1776 0 2514 1787 0 1421 939 1299 1553
#> 5 2024-10-01 10:51:39.483 939 1765 0 2183 1701 0 1486 1109 1727 1643
Crucially, in order to process the dataset, heartbeatr must join data from all the files it receives as input. Therefore, all files must have been generated using the same key PULSE system parameter options, to ensure full consistency across files. This usually means that datasets should be processed one experiment at a time, even if data are to be merged later on. Thus, within the context of heartbeatr, one experiment concerns data collected with a single PULSE system running one consistent set of parameter options.

In summary, even if there are four distinct treatments in a scientific experiment and a different PULSE system is being used simultaneously to record data from each treatment, then data for each of the four treatments should initially be processed by heartbeatr independently, and only later be merged.
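If the processed outputs are later combined, a hypothetical merge could look like this (treatment1 and treatment2 stand for the outputs of two independent heartbeatr runs; the treatment column is added here purely for illustration):

# tag each independently-processed result, then stack them
merged <- dplyr::bind_rows(
  dplyr::mutate(treatment1, treatment = "T1"),
  dplyr::mutate(treatment2, treatment = "T2")
)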
The heartbeatr workflow

If the dataset follows all the rules and structuring mentioned above, it can be processed as follows:
pulse_read()
pulse_read() takes paths to PULSE files, reads all the data, and binds it into a single, continuous stream.
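The example dataset bundled with the package can be read like this (a minimal sketch; we assume pulse_read() accepts the same msg argument used by the other functions in the workflow to suppress progress messages):

# paths to the example PULSE files shipped with heartbeatr
paths <- pulse_example()
pulse_data <- pulse_read(paths, msg = FALSE)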
For the purpose of this vignette, we will restrict the number of channels being analysed (this makes the code run faster and the visualizations less crowded).
# all possible channel ids (c01 .. c10)
channels <- paste0("c", formatC(1:10, width = 2, flag = "0"))
# channels not used in this example
discard <- c("c01", "c02", "c05", "c07", "c08", "c09", "c10")
# keep only the timestamp and the channels of interest
pulse_data$data <- pulse_data$data[, c("time", setdiff(channels, discard))]
pulse_data$data
#> # A tibble: 20,094 × 4
#> time c03 c04 c06
#> <dttm> <dbl> <dbl> <dbl>
#> 1 2024-10-01 10:51:39 2531 2446 0
#> 2 2024-10-01 10:51:39 0 2294 0
#> 3 2024-10-01 10:51:39 0 2476 0
#> 4 2024-10-01 10:51:39 0 2514 0
#> 5 2024-10-01 10:51:39 0 2183 0
#> 6 2024-10-01 10:51:39 0 1591 0
#> 7 2024-10-01 10:51:39 0 894 0
#> 8 2024-10-01 10:51:39 0 332 46
#> 9 2024-10-01 10:51:39 0 0 519
#> 10 2024-10-01 10:51:39 0 0 1176
#> # ℹ 20,084 more rows
Note that over the short span of this example dataset (just 13.4 minutes), the resulting data frame already comprises 20,094 rows of raw data.
pulse_split()
pulse_split()
takes the output of
pulse_read()
and splits it into chunks.
pulse_data_split <- pulse_split(
pulse_data = pulse_data,
window_width_secs = 30,
window_shift_secs = 60,
msg = FALSE
)
pulse_data_split
#> # A tibble: 13 × 3
#> i smoothed data
#> <int> <lgl> <list>
#> 1 1 FALSE <tibble [758 × 4]>
#> 2 2 FALSE <tibble [754 × 4]>
#> 3 3 FALSE <tibble [752 × 4]>
#> 4 4 FALSE <tibble [749 × 4]>
#> 5 5 FALSE <tibble [749 × 4]>
#> 6 6 FALSE <tibble [740 × 4]>
#> 7 7 FALSE <tibble [749 × 4]>
#> 8 8 FALSE <tibble [756 × 4]>
#> 9 9 FALSE <tibble [749 × 4]>
#> 10 10 FALSE <tibble [754 × 4]>
#> 11 11 FALSE <tibble [755 × 4]>
#> 12 12 FALSE <tibble [755 × 4]>
#> 13 13 FALSE <tibble [755 × 4]>
In this case, the combination of window_width_secs and window_shift_secs chosen resulted in 13 time windows with raw data: a new 30-second window begins every 60 seconds, so the ~13.4 minutes of data yield 13 windows.
pulse_data_split$data[[1]]
#> # A tibble: 758 × 4
#> time c03 c04 c06
#> <dttm> <dbl> <dbl> <dbl>
#> 1 2024-10-01 10:52:00 4095 2651 2958
#> 2 2024-10-01 10:52:00 4095 2685 3532
#> 3 2024-10-01 10:52:00 4095 2842 4095
#> 4 2024-10-01 10:52:00 4095 2921 4095
#> 5 2024-10-01 10:52:00 4095 2879 4095
#> 6 2024-10-01 10:52:00 4095 2717 4095
#> 7 2024-10-01 10:52:00 4095 2669 4095
#> 8 2024-10-01 10:52:00 4095 2716 4095
#> 9 2024-10-01 10:52:00 4095 2795 4095
#> 10 2024-10-01 10:52:00 4095 2745 4095
#> # ℹ 748 more rows
pulse_optimize()
pulse_optimize()
takes the output from
pulse_split()
and refines the signals in each time window
in order to increase the likelihood that heart rates can be determined
accurately. This typically involves applying linear interpolation (to
ensure that each wave peak is defined by a sufficient number of points)
and smoothing (to remove low-frequency noise) to the data.
pulse_data_optimized <- pulse_optimize(
pulse_data_split = pulse_data_split,
interpolation_freq = 40,
bandwidth = 0.75,
raw_v_smoothed = FALSE,
multi = TRUE
)
Before optimization ($smoothed = FALSE and a variable number of rows in each window):

pulse_data_split
#> # A tibble: 13 × 3
#> i smoothed data
#> <int> <lgl> <list>
#> 1 1 FALSE <tibble [758 × 4]>
#> 2 2 FALSE <tibble [754 × 4]>
#> 3 3 FALSE <tibble [752 × 4]>
#> 4 4 FALSE <tibble [749 × 4]>
#> 5 5 FALSE <tibble [749 × 4]>
#> 6 6 FALSE <tibble [740 × 4]>
#> 7 7 FALSE <tibble [749 × 4]>
#> 8 8 FALSE <tibble [756 × 4]>
#> 9 9 FALSE <tibble [749 × 4]>
#> 10 10 FALSE <tibble [754 × 4]>
#> 11 11 FALSE <tibble [755 × 4]>
#> 12 12 FALSE <tibble [755 × 4]>
#> 13 13 FALSE <tibble [755 × 4]>
After optimization ($smoothed = TRUE and the same number of rows in each window; with interpolation_freq = 40 Hz over 30-second windows, each window now holds 40 × 30 = 1,200 rows):

pulse_data_optimized
#> # A tibble: 13 × 3
#> i smoothed data
#> <int> <lgl> <list>
#> 1 1 TRUE <tibble [1,200 × 4]>
#> 2 2 TRUE <tibble [1,200 × 4]>
#> 3 3 TRUE <tibble [1,200 × 4]>
#> 4 4 TRUE <tibble [1,200 × 4]>
#> 5 5 TRUE <tibble [1,200 × 4]>
#> 6 6 TRUE <tibble [1,200 × 4]>
#> 7 7 TRUE <tibble [1,200 × 4]>
#> 8 8 TRUE <tibble [1,200 × 4]>
#> 9 9 TRUE <tibble [1,200 × 4]>
#> 10 10 TRUE <tibble [1,200 × 4]>
#> 11 11 TRUE <tibble [1,200 × 4]>
#> 12 12 TRUE <tibble [1,200 × 4]>
#> 13 13 TRUE <tibble [1,200 × 4]>
pulse_heart()
pulse_heart()
takes the output of
pulse_optimize()
and estimates a single value of heart rate
frequency for each channel/time window combination.
heart_rates <- pulse_heart(
pulse_data_split = pulse_data_optimized,
msg = FALSE
)
heart_rates
#> # A tibble: 39 × 9
#> i smoothed id time data hz n sd ci
#> <int> <lgl> <chr> <dttm> <list> <dbl> <int> <dbl> <dbl>
#> 1 1 TRUE c03 2024-10-01 10:52:14 <tibble> 0.313 9 0.586 1.15
#> 2 1 TRUE c04 2024-10-01 10:52:14 <tibble> 0.39 11 0.303 0.595
#> 3 1 TRUE c06 2024-10-01 10:52:14 <tibble> 0.405 12 0.045 0.089
#> 4 2 TRUE c03 2024-10-01 10:53:15 <tibble> 0.283 8 1.04 2.04
#> 5 2 TRUE c04 2024-10-01 10:53:15 <tibble> 0.379 11 0.104 0.204
#> 6 2 TRUE c06 2024-10-01 10:53:15 <tibble> 0.423 12 0.089 0.174
#> 7 3 TRUE c03 2024-10-01 10:54:14 <tibble> 0.263 8 0.146 0.287
#> 8 3 TRUE c04 2024-10-01 10:54:14 <tibble> 0.385 11 0.171 0.335
#> 9 3 TRUE c06 2024-10-01 10:54:14 <tibble> 0.426 12 0.131 0.256
#> 10 4 TRUE c03 2024-10-01 10:55:15 <tibble> 0.258 7 0.902 1.77
#> # ℹ 29 more rows
In the output above:

- $hz holds the estimates of heart rate;
- $n is the number of wave peaks identified and used to compute hz;
- $sd is the standard deviation of the lengths of all intervals between consecutive wave peaks.

pulse_doublecheck()
Under certain conditions, the algorithm used in
pulse_heart()
may identify heartbeat wave signals that
feature two peaks per heartbeat as two individual heartbeats. This has
the adverse effect of leading to estimates of heart rates that are twice
as high as the real value. To minimize this, the output of
pulse_heart()
can be run through
pulse_doublecheck()
, which will identify and correct the
majority of those instances.
heart_rates <- pulse_doublecheck(
heart_rates = heart_rates
)
heart_rates
#> # A tibble: 39 × 11
#> i smoothed id time data hz n sd ci
#> <int> <lgl> <chr> <dttm> <list> <dbl> <int> <dbl> <dbl>
#> 1 1 TRUE c03 2024-10-01 10:52:14 <tibble> 0.313 9 0.586 1.15
#> 2 1 TRUE c04 2024-10-01 10:52:14 <tibble> 0.39 11 0.303 0.595
#> 3 1 TRUE c06 2024-10-01 10:52:14 <tibble> 0.405 12 0.045 0.089
#> 4 2 TRUE c03 2024-10-01 10:53:15 <tibble> 0.136 4 1.8 3.53
#> 5 2 TRUE c04 2024-10-01 10:53:15 <tibble> 0.379 11 0.104 0.204
#> 6 2 TRUE c06 2024-10-01 10:53:15 <tibble> 0.423 12 0.089 0.174
#> 7 3 TRUE c03 2024-10-01 10:54:14 <tibble> 0.13 4 0.202 0.396
#> 8 3 TRUE c04 2024-10-01 10:54:14 <tibble> 0.385 11 0.171 0.335
#> 9 3 TRUE c06 2024-10-01 10:54:14 <tibble> 0.426 12 0.131 0.256
#> 10 4 TRUE c03 2024-10-01 10:55:15 <tibble> 0.258 7 0.902 1.77
#> # ℹ 29 more rows
#> # ℹ 2 more variables: d_r <dbl>, d_f <lgl>
Two columns (d_r
and d_f
) were added to
heart_rates
, providing the necessary information to screen
and handle heart rate doublings.
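For instance, the windows where a doubling was detected can be isolated for closer inspection (a sketch, assuming d_f flags the estimates identified as doubled):

# inspect only the estimates flagged as likely doublings
dplyr::filter(heart_rates, d_f)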
Note that by default pulse_doublecheck()
runs with
correct = TRUE
. When that is the case, values for
hz
in the output of pulse_doublecheck()
have
already been halved whenever heart rate doublings have been
identified.
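If pulse_doublecheck() is instead run with correct = FALSE, the correction could be applied manually along these lines (a sketch under the same assumption that d_f flags doubled estimates):

# halve the rate wherever a doubling was flagged but not auto-corrected
heart_rates$hz <- ifelse(heart_rates$d_f, heart_rates$hz / 2, heart_rates$hz)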
pulse_choose_keep()
The final step involves matching key quality indicators against
user-provided thresholds to assess whether or not to keep each heart
rate estimate.
This is achieved by running the output of
pulse_doublecheck()
through
pulse_choose_keep()
, resulting in the addition of the
column $keep
, which can be used to filter out poor
estimates.
heart_rates <- pulse_choose_keep(
heart_rates = heart_rates
)
heart_rates
#> # A tibble: 39 × 12
#> i smoothed id time data hz n sd ci
#> <int> <lgl> <chr> <dttm> <list> <dbl> <int> <dbl> <dbl>
#> 1 1 TRUE c03 2024-10-01 10:52:14 <tibble> 0.313 9 0.586 1.15
#> 2 1 TRUE c04 2024-10-01 10:52:14 <tibble> 0.39 11 0.303 0.595
#> 3 1 TRUE c06 2024-10-01 10:52:14 <tibble> 0.405 12 0.045 0.089
#> 4 2 TRUE c03 2024-10-01 10:53:15 <tibble> 0.136 4 1.8 3.53
#> 5 2 TRUE c04 2024-10-01 10:53:15 <tibble> 0.379 11 0.104 0.204
#> 6 2 TRUE c06 2024-10-01 10:53:15 <tibble> 0.423 12 0.089 0.174
#> 7 3 TRUE c03 2024-10-01 10:54:14 <tibble> 0.13 4 0.202 0.396
#> 8 3 TRUE c04 2024-10-01 10:54:14 <tibble> 0.385 11 0.171 0.335
#> 9 3 TRUE c06 2024-10-01 10:54:14 <tibble> 0.426 12 0.131 0.256
#> 10 4 TRUE c03 2024-10-01 10:55:15 <tibble> 0.258 7 0.902 1.77
#> # ℹ 29 more rows
#> # ℹ 3 more variables: d_r <dbl>, d_f <lgl>, keep <lgl>
Note that 7 entries were classified as not meeting the selection
criteria.
If we filter those out, we’re left with 32 entries that can be used for
subsequent analyses.
dplyr::filter(heart_rates, keep)
#> # A tibble: 32 × 12
#> i smoothed id time data hz n sd ci
#> <int> <lgl> <chr> <dttm> <list> <dbl> <int> <dbl> <dbl>
#> 1 1 TRUE c03 2024-10-01 10:52:14 <tibble> 0.313 9 0.586 1.15
#> 2 1 TRUE c04 2024-10-01 10:52:14 <tibble> 0.39 11 0.303 0.595
#> 3 1 TRUE c06 2024-10-01 10:52:14 <tibble> 0.405 12 0.045 0.089
#> 4 2 TRUE c04 2024-10-01 10:53:15 <tibble> 0.379 11 0.104 0.204
#> 5 2 TRUE c06 2024-10-01 10:53:15 <tibble> 0.423 12 0.089 0.174
#> 6 3 TRUE c03 2024-10-01 10:54:14 <tibble> 0.13 4 0.202 0.396
#> 7 3 TRUE c04 2024-10-01 10:54:14 <tibble> 0.385 11 0.171 0.335
#> 8 3 TRUE c06 2024-10-01 10:54:14 <tibble> 0.426 12 0.131 0.256
#> 9 4 TRUE c04 2024-10-01 10:55:15 <tibble> 0.338 10 0.131 0.257
#> 10 4 TRUE c06 2024-10-01 10:55:15 <tibble> 0.422 13 0.082 0.16
#> # ℹ 22 more rows
#> # ℹ 3 more variables: d_r <dbl>, d_f <lgl>, keep <lgl>
Users can also execute the entire heartbeatr
workflow
with a single call to the wrapper function PULSE()
.
# the code below takes approximately 30 seconds
# to run on a machine with the following specs:
# - MacBook Air 2022
# - chip: Apple M2
# - memory: 16 GB RAM
# - macOS: Sequoia 15.3.1
heart_rates_PULSE <- PULSE(
paths = pulse_example(),
window_width_secs = 30,
window_shift_secs = 60,
bandwidth = 0.75,
raw_v_smoothed = FALSE,
discard_channels = discard
)
#> ℹ loading data for analysis
#> ✔ loading data for analysis [415ms]
#>
#> ℹ computing heart rates
#> ✔ computing heart rates [1.2s]
#>
#> ℹ finalizing
#> ✔ finalizing [48ms]
#>
#> → completed: 2025-09-14 10:30:30
#> → [elapsed: 0.03 mins]
heart_rates_PULSE
#> # A tibble: 39 × 12
#> i smoothed id time data hz n sd ci
#> <int> <lgl> <chr> <dttm> <list> <dbl> <int> <dbl> <dbl>
#> 1 1 TRUE c03 2024-10-01 10:52:14 <tibble> 0.313 9 0.586 1.15
#> 2 1 TRUE c04 2024-10-01 10:52:14 <tibble> 0.39 11 0.303 0.595
#> 3 1 TRUE c06 2024-10-01 10:52:14 <tibble> 0.405 12 0.045 0.089
#> 4 2 TRUE c03 2024-10-01 10:53:15 <tibble> 0.136 4 1.8 3.53
#> 5 2 TRUE c04 2024-10-01 10:53:15 <tibble> 0.379 11 0.104 0.204
#> 6 2 TRUE c06 2024-10-01 10:53:15 <tibble> 0.423 12 0.089 0.174
#> 7 3 TRUE c03 2024-10-01 10:54:14 <tibble> 0.13 4 0.202 0.396
#> 8 3 TRUE c04 2024-10-01 10:54:14 <tibble> 0.385 11 0.171 0.335
#> 9 3 TRUE c06 2024-10-01 10:54:14 <tibble> 0.426 12 0.131 0.256
#> 10 4 TRUE c03 2024-10-01 10:55:15 <tibble> 0.258 7 0.902 1.77
#> # ℹ 29 more rows
#> # ℹ 3 more variables: d_r <dbl>, d_f <lgl>, keep <lgl>
Note that the final output is exactly the same and that there are no performance advantages from using a single call to PULSE() compared to using the step-by-step workflow. Despite its convenience, there are still moments when the step-by-step workflow comes in handy, for example when intermediate outputs need to be inspected or when parameter values are being fine-tuned (see the section on bandwidth below).
The most basic visualization of the output of the heartbeatr workflow is a plot of heart rate over time. While not meant to provide the highest quality graphics, the heartbeatr package includes two main plotting functions to ensure that users can quickly inspect the results.
pulse_plot()
Inspecting the entire processed dataset can be done with a simple call.
It is usually important to filter out estimates that didn’t meet the quality thresholds.
pulse_plot(dplyr::filter(heart_rates, keep))
#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
#> : Chernobyl! trL>n 6
#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
#> : Chernobyl! trL>n 6
#> Warning in sqrt(sum.squares/one.delta): NaNs produced
#> Warning in stats::qt(level/2 + 0.5, pred$df): NaNs produced
#> Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
#> -Inf
Function parameters and general ggplot2
syntax can be
used to further customize the output of pulse_plot()
.
pulse_plot(
dplyr::filter(heart_rates, keep),
facets = FALSE,
bpm = TRUE,
smooth = FALSE,
points = FALSE
)
pulse_plot(dplyr::filter(heart_rates, keep)) +
ggplot2::geom_vline(ggplot2::aes(xintercept = mean(heart_rates$time))) +
ggplot2::theme_bw() +
ggplot2::ggtitle("example")
#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
#> : Chernobyl! trL>n 6
#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
#> : Chernobyl! trL>n 6
#> Warning in sqrt(sum.squares/one.delta): NaNs produced
#> Warning in stats::qt(level/2 + 0.5, pred$df): NaNs produced
#> Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
#> -Inf
It is also possible to focus on a single id.
pulse_plot(dplyr::filter(heart_rates, keep), ID = "c03")
#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
#> : Chernobyl! trL>n 6
#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
#> : Chernobyl! trL>n 6
#> Warning in sqrt(sum.squares/one.delta): NaNs produced
#> Warning in stats::qt(level/2 + 0.5, pred$df): NaNs produced
#> Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
#> -Inf
pulse_plot_raw()
Often, it is crucial to check more closely the data underlying a particular heart rate estimate. There are two ways to plot a specific channel + time window: using the window index stored in $i (via target_i), or using a timestamp (via target_time).
If we want to check channel “c03” at around 2024-10-01 10:58…
dplyr::filter(heart_rates, id == "c03")
#> # A tibble: 13 × 12
#> i smoothed id time data hz n sd ci
#> <int> <lgl> <chr> <dttm> <list> <dbl> <int> <dbl> <dbl>
#> 1 1 TRUE c03 2024-10-01 10:52:14 <tibble> 0.313 9 0.586 1.15
#> 2 2 TRUE c03 2024-10-01 10:53:15 <tibble> 0.136 4 1.8 3.53
#> 3 3 TRUE c03 2024-10-01 10:54:14 <tibble> 0.13 4 0.202 0.396
#> 4 4 TRUE c03 2024-10-01 10:55:15 <tibble> 0.258 7 0.902 1.77
#> 5 5 TRUE c03 2024-10-01 10:56:15 <tibble> 0.279 8 0.541 1.06
#> 6 6 TRUE c03 2024-10-01 10:57:15 <tibble> 0.232 6 1.18 2.32
#> 7 7 TRUE c03 2024-10-01 10:58:15 <tibble> 0.231 7 0.143 0.28
#> 8 8 TRUE c03 2024-10-01 10:59:15 <tibble> 0.202 5 2.05 4.02
#> 9 9 TRUE c03 2024-10-01 11:00:15 <tibble> 0.186 5 0.035 0.069
#> 10 10 TRUE c03 2024-10-01 11:01:14 <tibble> 0.21 6 1.83 3.59
#> 11 11 TRUE c03 2024-10-01 11:02:14 <tibble> 0.232 6 1.97 3.86
#> 12 12 TRUE c03 2024-10-01 11:03:15 <tibble> 0.199 6 0.889 1.74
#> 13 13 TRUE c03 2024-10-01 11:04:14 <tibble> 0.182 4 0.104 0.204
#> # ℹ 3 more variables: d_r <dbl>, d_f <lgl>, keep <lgl>
… we can use target_i … or we can use target_time.
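Both calls below should produce the same plot (a sketch based on the call shown later in this vignette; window i = 7 is the one timestamped 2024-10-01 10:58:15 in the table above):

# select the window by its index...
pulse_plot_raw(heart_rates, ID = "c03", target_i = 7)
# ...or by an approximate timestamp (the closest window is used)
pulse_plot_raw(heart_rates, ID = "c03", target_time = "2024-10-01 10:58")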
Note that both approaches yield the same result. Also, when using
target_time
, there’s no need to provide a timestamp
accurate to the second. Instead, pulse_plot_raw()
will
return the time window that is closest to the timestamp provided.
It is also possible to use the range
parameter to check
the raw data for a group of estimates.
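For example (a hypothetical call, assuming range extends the selection to neighbouring windows around the target):

# show the raw data for the target window and its neighbours
pulse_plot_raw(heart_rates, ID = "c03", target_i = 7, range = 2)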
Fine-tuning the smoothing parameter (bandwidth)

The heartbeatr workflow is generic and is not optimized for any organism in particular. This makes it suitable to process datasets derived from distinct species with different basal heart rates and other circulatory system characteristics, all of which could negatively impact the performance of a highly specific algorithm. It also ensures that a wide range of experimental conditions can be accommodated.
However, this also means that users should exercise caution when selecting key function parameters, chief among them the smoothing parameter bandwidth, as it can have important implications for the performance of the data processing. Other parameters such as window_width_secs, window_shift_secs, min_data_points, and interpolation_freq are also important; however, in their case, reading through each function's manual should be sufficient to make good guesses.
Regarding the trickier bandwidth parameter: it establishes the radius used by the smoothing function (ksmooth, from the stats package). Higher values remove more noise from the signal, improving the ability of the algorithm to select "true" peaks. However, values that are too high will end up masking out real wave crests, eventually leading to a degradation of performance.
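To build intuition for this trade-off, the sketch below applies stats::ksmooth to a synthetic noisy wave (purely illustrative, not heartbeatr code): the low bandwidth retains most of the noise, while the high one attenuates the very crests that need to be detected.

# synthetic signal: a 0.4 Hz "heartbeat" wave plus noise, sampled at 40 Hz
t <- seq(0, 30, by = 1 / 40)
y <- sin(2 * pi * 0.4 * t) + rnorm(length(t), sd = 0.3)

mild   <- ksmooth(t, y, kernel = "normal", bandwidth = 0.2)
strong <- ksmooth(t, y, kernel = "normal", bandwidth = 1.5)

plot(t, y, type = "l", col = "grey", xlab = "time (s)", ylab = "signal")
lines(mild, col = "blue")    # noise largely retained
lines(strong, col = "red")   # smoother, but wave crests are flattened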
Because of inter- and intra-specific variability in heart rates,
variability in the placement of sensors, and the co-variation of these
factors with the rate of the heart being recorded (more stressful
treatments lead to higher heart rates, where each heartbeat wave
encompasses a narrower span of time, a situation that often amplifies
limitations of the algorithm), it is not possible to provide users with definitive advice on how to set the bandwidth parameter. Furthermore, the bandwidth of a kernel smoother is a free parameter, meaning that it cannot, by definition, be predicted precisely or constrained a priori, and must always be estimated experimentally for each dataset.
Therefore, the recommended approach is the following:

- use the subset and subset_seed parameters of the PULSE() function to process a subset of your data (so that the processing runs quicker) using multiple combinations of the parameters you want to test (in the example below, bandwidth is set to 0.2 and 1.5);

- set subset_seed to a fixed numeric value (e.g., 1) so that the subsets generated always contain the same data points, which will facilitate the comparison of performances;

- set raw_v_smoothed to FALSE so that only the smoothed version of the raw data is used.

x_02 <- PULSE(
paths = pulse_example(), bandwidth = 0.2,
raw_v_smoothed = FALSE, discard_channels = paste0("c0", 1:9),
subset = 10, subset_seed = 1, subset_reindex = TRUE, show_progress = FALSE)
x_15 <- PULSE(
paths = pulse_example(), bandwidth = 1.5,
raw_v_smoothed = FALSE, discard_channels = paste0("c0", 1:9),
subset = 10, subset_seed = 1, subset_reindex = TRUE, show_progress = FALSE)
# when bandwidth = 1.5, only one peak is identified in each heartbeat, as it should be
pulse_plot_raw(x_15, ID = "c10", target_i = 1)
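For comparison, the lower-bandwidth run can be inspected the same way (with bandwidth = 0.2, more residual noise survives the smoothing, so extra peaks may be detected - an expectation based on the discussion above, not a verbatim result from this vignette):

pulse_plot_raw(x_02, ID = "c10", target_i = 1)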
Once a suitable bandwidth has been identified, re-run PULSE() on the entire dataset (with subset set to 0).

HRV (Heart Rate Variability) analyses focus on the variability of the intervals between consecutive heartbeats and are extremely important for assessing cardiac performance and heart conditions. There are numerous studies focusing on this metric, especially those addressing human health.
The authors of this package want to make clear that the SD provided
by the heartbeatr
workflow - in effect, a measure of heart
rate variability - SHOULD NOT be used for HRV analyses
on data collected from shelled invertebrates.
This recommendation is justified because, while photoplethysmographic data is often used for HRV analyses, that is typically the case when such data is collected in a controlled manner, such as from a person’s finger. In such cases, the quality of the raw signal tends to be sufficiently dependable, and therefore, analyses can go beyond the mere study of the heart rate frequency to also focus on variations of the timing of the intervals between consecutive heartbeats over short periods of time (a.k.a., HRV).
However, while we recognize the research value of such analyses, we cannot recommend a similar approach based on data collected with the PULSE system from shelled invertebrates. The reason behind this is that many factors often impact the quality of the signal captured, including the placement of the sensor on the shell, which may be “looking” at different components of the circulatory system, the impact of movements of the animal’s body unrelated to cardiac performance, and many other quality-degrading factors.
As a result, we believe that making assertions about an organism’s physiological state based on the variability of the timing between consecutive peaks is beyond the capabilities of the PULSE-heartbeatr workflow. That is why we only recommend the use of the SD of the intervals between peaks during each split window for assisting in determining the quality of the computed heart rate estimates and filtering out low-quality data points – a task that is far less consequential and that therefore can accept a greater deal of uncertainty.
On top of that, even quantifying the accuracy of the timing of individual heartbeats isn't straightforward, as it would require a reference dataset for which the timing of the heartbeats was definitively known - something that would very rarely be available to researchers, as it would imply acquiring a simultaneous ECG, which is invasive and beyond most researchers' technical capabilities (and likely not feasible even with today's technology, given that the implanted ECG electrodes would need to remain in place for the entire duration of the experiment).
Before moving on to the proper analysis of the results, it is often important to perform two final tweaks to the output of the heartbeatr workflow.
Even individuals from the same species have different basal heart rates. This means that it is often better to compare the rate of change of heart rate instead of the absolute values themselves. To do that, the heart rate estimates must first be normalized using a stable reference period, usually during acclimation.
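Conceptually, the normalization divides each channel's estimates by that channel's mean rate during the reference period (an assumption about pulse_normalize()'s internals, illustrated here with plain dplyr):

# hypothetical illustration: per-channel baseline over the 5 minutes after t0
t0 <- as.POSIXct("2024-10-01 10:52", tz = "UTC")
baseline <- heart_rates |>
  dplyr::filter(keep, time >= t0, time <= t0 + 5 * 60) |>
  dplyr::group_by(id) |>
  dplyr::summarise(hz0 = mean(hz))
# hz_norm would then be hz / hz0 for the matching channel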
heart_rates_norm <- pulse_normalize(
dplyr::filter(heart_rates, keep),
t0 = "2024-10-01 10:52",
span_mins = 5
)
heart_rates_norm
#> # A tibble: 32 × 13
#> i smoothed id time data hz n sd ci
#> <int> <lgl> <chr> <dttm> <list> <dbl> <int> <dbl> <dbl>
#> 1 1 TRUE c03 2024-10-01 10:52:14 <tibble> 0.313 9 0.586 1.15
#> 2 1 TRUE c04 2024-10-01 10:52:14 <tibble> 0.39 11 0.303 0.595
#> 3 1 TRUE c06 2024-10-01 10:52:14 <tibble> 0.405 12 0.045 0.089
#> 4 2 TRUE c04 2024-10-01 10:53:15 <tibble> 0.379 11 0.104 0.204
#> 5 2 TRUE c06 2024-10-01 10:53:15 <tibble> 0.423 12 0.089 0.174
#> 6 3 TRUE c03 2024-10-01 10:54:14 <tibble> 0.13 4 0.202 0.396
#> 7 3 TRUE c04 2024-10-01 10:54:14 <tibble> 0.385 11 0.171 0.335
#> 8 3 TRUE c06 2024-10-01 10:54:14 <tibble> 0.426 12 0.131 0.256
#> 9 4 TRUE c04 2024-10-01 10:55:15 <tibble> 0.338 10 0.131 0.257
#> 10 4 TRUE c06 2024-10-01 10:55:15 <tibble> 0.422 13 0.082 0.16
#> # ℹ 22 more rows
#> # ℹ 4 more variables: d_r <dbl>, d_f <lgl>, keep <lgl>, hz_norm <dbl>
pulse_plot(heart_rates_norm, normalized = FALSE, facets = FALSE)
#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
#> : Chernobyl! trL>n 6
#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
#> : Chernobyl! trL>n 6
#> Warning in sqrt(sum.squares/one.delta): NaNs produced
#> Warning in stats::qt(level/2 + 0.5, pred$df): NaNs produced
#> Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
#> -Inf
pulse_plot(heart_rates_norm, normalized = TRUE, facets = FALSE)
#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
#> : Chernobyl! trL>n 6
#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
#> : Chernobyl! trL>n 6
#> Warning in sqrt(sum.squares/one.delta): NaNs produced
#> Warning in stats::qt(level/2 + 0.5, pred$df): NaNs produced
#> Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
#> -Inf
Note how all channels begin with the same normalized heart rate (around 1), and how the different trajectories then become evident.
Lastly, after filtering out the entries that didn't meet the quality thresholds, we were left with several gaps in the final dataset. One way to handle this is to bin the data over wider time windows (here, pulse_summarise aggregates the roughly one-minute estimates into 5-minute bins via span_mins = 5).
heart_rates_sum <- pulse_summarise(
heart_rates_norm,
FUN = mean,
span_mins = 5,
min_data_points = 0)
heart_rates_sum
#> # A tibble: 9 × 8
#> i id time hz hz_norm n sd ci
#> <dbl> <chr> <dttm> <dbl> <dbl> <int> <dbl> <dbl>
#> 1 1 c03 2024-10-01 10:50:00 0.222 0.920 2 0.129 0.254
#> 2 1 c04 2024-10-01 10:50:00 0.385 1.04 3 0.00551 0.0108
#> 3 1 c06 2024-10-01 10:50:00 0.418 0.991 3 0.0114 0.0223
#> 4 2 c04 2024-10-01 10:55:00 0.331 0.898 5 0.0163 0.0320
#> 5 2 c06 2024-10-01 10:55:00 0.426 1.01 5 0.00846 0.0166
#> 6 3 c03 2024-10-01 10:55:00 0.255 1.06 2 0.0339 0.0665
#> 7 4 c03 2024-10-01 11:00:00 0.184 0.765 2 0.00283 0.00554
#> 8 4 c04 2024-10-01 11:00:00 0.313 0.847 5 0.0245 0.0481
#> 9 4 c06 2024-10-01 11:00:00 0.422 1.00 5 0.0164 0.0322
pulse_plot(heart_rates_sum, normalized = FALSE, facets = FALSE)
#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
#> : span too small. fewer data values than degrees of freedom.
#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
#> : pseudoinverse used at 1.7278e+09
#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
#> : neighborhood radius 303
#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
#> : reciprocal condition number 0
#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
#> : There are other near singularities as well. 91809
#> Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
#> else if (is.data.frame(newdata))
#> as.matrix(model.frame(delete.response(terms(object)), : span too small. fewer
#> data values than degrees of freedom.
#> Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
#> else if (is.data.frame(newdata))
#> as.matrix(model.frame(delete.response(terms(object)), : pseudoinverse used at
#> 1.7278e+09
#> Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
#> else if (is.data.frame(newdata))
#> as.matrix(model.frame(delete.response(terms(object)), : neighborhood radius 303
#> Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
#> else if (is.data.frame(newdata))
#> as.matrix(model.frame(delete.response(terms(object)), : reciprocal condition
#> number 0
#> Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
#> else if (is.data.frame(newdata))
#> as.matrix(model.frame(delete.response(terms(object)), : There are other near
#> singularities as well. 91809
#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
#> : span too small. fewer data values than degrees of freedom.
#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
#> : pseudoinverse used at 1.7278e+09
#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
#> : neighborhood radius 303
#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
#> : reciprocal condition number 0
#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
#> : There are other near singularities as well. 91809
#> Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
#> else if (is.data.frame(newdata))
#> as.matrix(model.frame(delete.response(terms(object)), : span too small. fewer
#> data values than degrees of freedom.
#> Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
#> else if (is.data.frame(newdata))
#> as.matrix(model.frame(delete.response(terms(object)), : pseudoinverse used at
#> 1.7278e+09
#> Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
#> else if (is.data.frame(newdata))
#> as.matrix(model.frame(delete.response(terms(object)), : neighborhood radius 303
#> Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
#> else if (is.data.frame(newdata))
#> as.matrix(model.frame(delete.response(terms(object)), : reciprocal condition
#> number 0
#> Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
#> else if (is.data.frame(newdata))
#> as.matrix(model.frame(delete.response(terms(object)), : There are other near
#> singularities as well. 91809
#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
#> : span too small. fewer data values than degrees of freedom.
#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
#> : pseudoinverse used at 1.7278e+09
#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
#> : neighborhood radius 303
#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
#> : reciprocal condition number 0
#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
#> : There are other near singularities as well. 91809
#> Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
#> else if (is.data.frame(newdata))
#> as.matrix(model.frame(delete.response(terms(object)), : span too small. fewer
#> data values than degrees of freedom.
#> Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
#> else if (is.data.frame(newdata))
#> as.matrix(model.frame(delete.response(terms(object)), : pseudoinverse used at
#> 1.7278e+09
#> Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
#> else if (is.data.frame(newdata))
#> as.matrix(model.frame(delete.response(terms(object)), : neighborhood radius 303
#> Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
#> else if (is.data.frame(newdata))
#> as.matrix(model.frame(delete.response(terms(object)), : reciprocal condition
#> number 0
#> Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
#> else if (is.data.frame(newdata))
#> as.matrix(model.frame(delete.response(terms(object)), : There are other near
#> singularities as well. 91809
#> Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
#> -Inf
#> Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
#> -Inf
#> Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
#> -Inf
Now, there are estimates at regular intervals for all channels.