The heartbeatr
package is designed to streamline the
processing of data generated with ElectricBlue’s PULSE systems (https://electricblue.eu/pulse).
This vignette explains the structure of the input data and
demonstrates the heartbeatr
workflow based on a simple and
compliant dataset. It also provides important recommendations on how to
effectively process large datasets.
# library(tidyverse)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(stringr)
library(tibble)
library(ggplot2)
library(heartbeatr)
heartbeatr is ONLY designed to handle data generated by PULSE systems and from a single experiment. heartbeatr is unlikely to be able to process files generated with any other device. The automated processing will also likely fail if the input data doesn't fully conform to certain specifications, which are detailed in the sections below.
The term “PULSE files” refers to the .CSV files generated by PULSE systems. Such files follow a strict structure composed of a header section and a data section.
Please note that simply opening a PULSE file in Microsoft Excel and closing it again can introduce modifications (the date_time format is automatically converted to the format used by one's machine), potentially making the file unreadable by heartbeatr.
If you want to check the data stored in a PULSE file, either use a basic text editor (such as BBEdit) or create a copy of the file before opening it.
The header follows a two-column field,value
structure
and includes all the PULSE system parameter options used when the file
was generated. The number of fields may change in future PULSE system
versions, but will always include:
# pulse_example() allows quick access to test files
fn <- pulse_example()[1]
# read the first 50 lines as field/value pairs
head <- read.csv(fn, nrows = 50, col.names = c("field", "value"))
# the header ends at the last "-----" separator line
skip <- max(grep("----------------", head$field))
head[1:skip, ]
#> field value
#> 1 www.electricblue.eu Portugal
#> 2 ------------------------------ --------------------
#> 3 Pulse version V2.3
#> 4 ------------------------------ --------------------
#> 5 device Pulse
#> 6 rate_Hz 25
#> 7 utc 0 (logged data is always stored in UTC0)
#> 8 time_zone_h 0.00
#> 9 daylight_saving_time FALSE
#> 10 local_time 2024-10-01 10:51
#> 11 ------------------------------ --------------------
A few notes about the header fields:

- utc will always be equal to 0 because PULSE systems always record data using UTC0, without exception; this is crucial, as it ensures full protection from potential mistakes originating when daylight saving time changes or when aggregating data created by different PULSE systems that could be running different values for the parameters time_zone_h and daylight_saving_time.

- the local_time field facilitates the quick conversion of the logged timestamps (always in UTC0) to local time; however, note that local_time is calculated based on the values provided by the user for time_zone_h and daylight_saving_time, and therefore will be incorrect if those parameters aren't adjusted properly.

The data section is composed of a line with all logging channel ids, followed by data lines with a timestamp and the values read by the PULSE system for each of the channels. Note that there's always a reading for each channel, even if not all channels were in fact used during data collection.
#> time c01 c02 c03 c04 c05 c06 c07 c08 c09 c10
#> 1 2024-10-01 10:51:39.017 655 1657 2531 2446 2284 0 1519 1999 1548 1353
#> 2 2024-10-01 10:51:39.184 283 1797 0 2294 2029 0 1147 917 366 917
#> 3 2024-10-01 10:51:39.402 626 1789 0 2476 1769 0 1332 663 803 1383
#> 4 2024-10-01 10:51:39.443 830 1776 0 2514 1787 0 1421 939 1299 1553
#> 5 2024-10-01 10:51:39.483 939 1765 0 2183 1701 0 1486 1109 1727 1643
Crucially, in order to process the dataset, heartbeatr must join data from all the files it receives as input. Therefore, all files must have been generated using the same key PULSE system parameter options, to ensure full consistency across files. This usually means that datasets should be processed one experiment at a time, even if data are to be merged later on. Thus, within the context of heartbeatr, one experiment concerns data collected with a single PULSE system running one consistent set of parameter options.

In summary, even if there are four distinct treatments in a scientific experiment and a different PULSE system is being used simultaneously to record data from each treatment, then data for each of the four treatments should initially be processed by heartbeatr independently, and only later be merged.
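If the processed outputs are later combined, a hypothetical merge could look like this (treatment1 and treatment2 stand for the outputs of two independent heartbeatr runs; the treatment column is added here purely for illustration):

# tag each independently-processed result, then stack them
merged <- dplyr::bind_rows(
  dplyr::mutate(treatment1, treatment = "T1"),
  dplyr::mutate(treatment2, treatment = "T2")
)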
The heartbeatr workflow

If the dataset follows all the rules and structuring mentioned above, it can be processed as follows:
pulse_read()
pulse_read() takes paths to PULSE files, reads all the data, and binds it into a single, continuous stream.
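The example dataset bundled with the package can be read like this (a minimal sketch; we assume pulse_read() accepts the same msg argument used by the other functions in the workflow to suppress progress messages):

# paths to the example PULSE files shipped with heartbeatr
paths <- pulse_example()
pulse_data <- pulse_read(paths, msg = FALSE)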
For the purpose of this vignette, we will restrict the number of channels being analysed (this makes the code run faster and the visualizations less crowded).
# all possible channel ids (c01 .. c10)
channels <- paste0("c", formatC(1:10, width = 2, flag = "0"))
# channels not used in this example
discard <- c("c01", "c02", "c05", "c07", "c08", "c09", "c10")
# keep only the timestamp and the channels of interest
pulse_data$data <- pulse_data$data[, c("time", setdiff(channels, discard))]
pulse_data$data
#> # A tibble: 20,094 × 4
#> time c03 c04 c06
#> <dttm> <dbl> <dbl> <dbl>
#> 1 2024-10-01 10:51:39 2531 2446 0
#> 2 2024-10-01 10:51:39 0 2294 0
#> 3 2024-10-01 10:51:39 0 2476 0
#> 4 2024-10-01 10:51:39 0 2514 0
#> 5 2024-10-01 10:51:39 0 2183 0
#> 6 2024-10-01 10:51:39 0 1591 0
#> 7 2024-10-01 10:51:39 0 894 0
#> 8 2024-10-01 10:51:39 0 332 46
#> 9 2024-10-01 10:51:39 0 0 519
#> 10 2024-10-01 10:51:39 0 0 1176
#> # ℹ 20,084 more rows
Note that over the short span of this example dataset (just 13.4 minutes), the resulting data frame already comprises 20,094 rows of raw data.
pulse_split()
pulse_split()
takes the output of
pulse_read()
and splits it into chunks.
pulse_data_split <- pulse_split(
pulse_data = pulse_data,
window_width_secs = 30,
window_shift_secs = 60,
msg = FALSE
)
pulse_data_split
#> # A tibble: 13 × 3
#> i smoothed data
#> <int> <lgl> <list>
#> 1 1 FALSE <tibble [758 × 4]>
#> 2 2 FALSE <tibble [754 × 4]>
#> 3 3 FALSE <tibble [752 × 4]>
#> 4 4 FALSE <tibble [749 × 4]>
#> 5 5 FALSE <tibble [749 × 4]>
#> 6 6 FALSE <tibble [740 × 4]>
#> 7 7 FALSE <tibble [749 × 4]>
#> 8 8 FALSE <tibble [756 × 4]>
#> 9 9 FALSE <tibble [749 × 4]>
#> 10 10 FALSE <tibble [754 × 4]>
#> 11 11 FALSE <tibble [755 × 4]>
#> 12 12 FALSE <tibble [755 × 4]>
#> 13 13 FALSE <tibble [755 × 4]>
In this case, the combination of window_width_secs and window_shift_secs chosen resulted in 13 time windows with raw data: a new 30-second window begins every 60 seconds, so the ~13.4 minutes of data yield 13 windows.
pulse_data_split$data[[1]]
#> # A tibble: 758 × 4
#> time c03 c04 c06
#> <dttm> <dbl> <dbl> <dbl>
#> 1 2024-10-01 10:52:00 4095 2651 2958
#> 2 2024-10-01 10:52:00 4095 2685 3532
#> 3 2024-10-01 10:52:00 4095 2842 4095
#> 4 2024-10-01 10:52:00 4095 2921 4095
#> 5 2024-10-01 10:52:00 4095 2879 4095
#> 6 2024-10-01 10:52:00 4095 2717 4095
#> 7 2024-10-01 10:52:00 4095 2669 4095
#> 8 2024-10-01 10:52:00 4095 2716 4095
#> 9 2024-10-01 10:52:00 4095 2795 4095
#> 10 2024-10-01 10:52:00 4095 2745 4095
#> # ℹ 748 more rows
pulse_optimize()
pulse_optimize()
takes the output from
pulse_split()
and refines the signals in each time window
in order to increase the likelihood that heart rates can be determined
accurately. This typically involves applying linear interpolation (to
ensure that each wave peak is defined by a sufficient number of points)
and smoothing (to remove low-frequency noise) to the data.
pulse_data_optimized <- pulse_optimize(
pulse_data_split = pulse_data_split,
interpolation_freq = 40,
bandwidth = 0.75,
raw_v_smoothed = FALSE,
multi = TRUE
)
Before optimization ($smoothed = FALSE and a variable number of rows in each window):

pulse_data_split
#> # A tibble: 13 × 3
#> i smoothed data
#> <int> <lgl> <list>
#> 1 1 FALSE <tibble [758 × 4]>
#> 2 2 FALSE <tibble [754 × 4]>
#> 3 3 FALSE <tibble [752 × 4]>
#> 4 4 FALSE <tibble [749 × 4]>
#> 5 5 FALSE <tibble [749 × 4]>
#> 6 6 FALSE <tibble [740 × 4]>
#> 7 7 FALSE <tibble [749 × 4]>
#> 8 8 FALSE <tibble [756 × 4]>
#> 9 9 FALSE <tibble [749 × 4]>
#> 10 10 FALSE <tibble [754 × 4]>
#> 11 11 FALSE <tibble [755 × 4]>
#> 12 12 FALSE <tibble [755 × 4]>
#> 13 13 FALSE <tibble [755 × 4]>
After optimization ($smoothed = TRUE and the same number of rows in each window; with interpolation_freq = 40 Hz over 30-second windows, each window now holds 40 × 30 = 1,200 rows):

pulse_data_optimized
#> # A tibble: 13 × 3
#> i smoothed data
#> <int> <lgl> <list>
#> 1 1 TRUE <tibble [1,200 × 4]>
#> 2 2 TRUE <tibble [1,200 × 4]>
#> 3 3 TRUE <tibble [1,200 × 4]>
#> 4 4 TRUE <tibble [1,200 × 4]>
#> 5 5 TRUE <tibble [1,200 × 4]>
#> 6 6 TRUE <tibble [1,200 × 4]>
#> 7 7 TRUE <tibble [1,200 × 4]>
#> 8 8 TRUE <tibble [1,200 × 4]>
#> 9 9 TRUE <tibble [1,200 × 4]>
#> 10 10 TRUE <tibble [1,200 × 4]>
#> 11 11 TRUE <tibble [1,200 × 4]>
#> 12 12 TRUE <tibble [1,200 × 4]>
#> 13 13 TRUE <tibble [1,200 × 4]>
pulse_heart()
pulse_heart()
takes the output of
pulse_optimize()
and estimates a single value of heart rate
frequency for each channel/time window combination.
heart_rates <- pulse_heart(
pulse_data_split = pulse_data_optimized,
msg = FALSE
)
heart_rates
#> # A tibble: 39 × 9
#> i smoothed id time data hz n sd ci
#> <int> <lgl> <chr> <dttm> <list> <dbl> <int> <dbl> <dbl>
#> 1 1 TRUE c03 2024-10-01 10:52:14 <tibble> 0.313 9 0.586 1.15
#> 2 1 TRUE c04 2024-10-01 10:52:14 <tibble> 0.39 11 0.303 0.595
#> 3 1 TRUE c06 2024-10-01 10:52:14 <tibble> 0.405 12 0.045 0.089
#> 4 2 TRUE c03 2024-10-01 10:53:15 <tibble> 0.283 8 1.04 2.04
#> 5 2 TRUE c04 2024-10-01 10:53:15 <tibble> 0.379 11 0.104 0.204
#> 6 2 TRUE c06 2024-10-01 10:53:15 <tibble> 0.423 12 0.089 0.174
#> 7 3 TRUE c03 2024-10-01 10:54:14 <tibble> 0.263 8 0.146 0.287
#> 8 3 TRUE c04 2024-10-01 10:54:14 <tibble> 0.385 11 0.171 0.335
#> 9 3 TRUE c06 2024-10-01 10:54:14 <tibble> 0.426 12 0.131 0.256
#> 10 4 TRUE c03 2024-10-01 10:55:15 <tibble> 0.258 7 0.902 1.77
#> # ℹ 29 more rows
In the output above:

- $hz holds the estimates of heart rate;
- $n is the number of wave peaks identified and used to compute hz;
- $sd is the standard deviation of the lengths of all intervals between consecutive wave peaks.

pulse_doublecheck()
Under certain conditions, the algorithm used in
pulse_heart()
may identify heartbeat wave signals that
feature two peaks per heartbeat as two individual heartbeats. This has
the adverse effect of leading to estimates of heart rates that are twice
as high as the real value. To minimize this, the output of
pulse_heart()
can be run through
pulse_doublecheck()
, which will identify and correct the
majority of those instances.
heart_rates <- pulse_doublecheck(
heart_rates = heart_rates
)
heart_rates
#> # A tibble: 39 × 11
#> i smoothed id time data hz n sd ci
#> <int> <lgl> <chr> <dttm> <list> <dbl> <int> <dbl> <dbl>
#> 1 1 TRUE c03 2024-10-01 10:52:14 <tibble> 0.313 9 0.586 1.15
#> 2 1 TRUE c04 2024-10-01 10:52:14 <tibble> 0.39 11 0.303 0.595
#> 3 1 TRUE c06 2024-10-01 10:52:14 <tibble> 0.405 12 0.045 0.089
#> 4 2 TRUE c03 2024-10-01 10:53:15 <tibble> 0.136 4 1.8 3.53
#> 5 2 TRUE c04 2024-10-01 10:53:15 <tibble> 0.379 11 0.104 0.204
#> 6 2 TRUE c06 2024-10-01 10:53:15 <tibble> 0.423 12 0.089 0.174
#> 7 3 TRUE c03 2024-10-01 10:54:14 <tibble> 0.13 4 0.202 0.396
#> 8 3 TRUE c04 2024-10-01 10:54:14 <tibble> 0.385 11 0.171 0.335
#> 9 3 TRUE c06 2024-10-01 10:54:14 <tibble> 0.426 12 0.131 0.256
#> 10 4 TRUE c03 2024-10-01 10:55:15 <tibble> 0.258 7 0.902 1.77
#> # ℹ 29 more rows
#> # ℹ 2 more variables: d_r <dbl>, d_f <lgl>
Two columns (d_r
and d_f
) were added to
heart_rates
, providing the necessary information to screen
and handle heart rate doublings.
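For instance, the windows where a doubling was detected can be isolated for closer inspection (a sketch, assuming d_f flags the estimates identified as doubled):

# inspect only the estimates flagged as likely doublings
dplyr::filter(heart_rates, d_f)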
Note that by default pulse_doublecheck()
runs with
correct = TRUE
. When that is the case, values for
hz
in the output of pulse_doublecheck()
have
already been halved whenever heart rate doublings have been
identified.
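If pulse_doublecheck() is instead run with correct = FALSE, the correction could be applied manually along these lines (a sketch under the same assumption that d_f flags doubled estimates):

# halve the rate wherever a doubling was flagged but not auto-corrected
heart_rates$hz <- ifelse(heart_rates$d_f, heart_rates$hz / 2, heart_rates$hz)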
pulse_choose_keep()
The final step involves matching key quality indicators against
user-provided thresholds to assess whether or not to keep each heart
rate estimate.
This is achieved by running the output of
pulse_doublecheck()
through
pulse_choose_keep()
, resulting in the addition of the
column $keep
, which can be used to filter out poor
estimates.
heart_rates <- pulse_choose_keep(
heart_rates = heart_rates
)
heart_rates
#> # A tibble: 39 × 12
#> i smoothed id time data hz n sd ci
#> <int> <lgl> <chr> <dttm> <list> <dbl> <int> <dbl> <dbl>
#> 1 1 TRUE c03 2024-10-01 10:52:14 <tibble> 0.313 9 0.586 1.15
#> 2 1 TRUE c04 2024-10-01 10:52:14 <tibble> 0.39 11 0.303 0.595
#> 3 1 TRUE c06 2024-10-01 10:52:14 <tibble> 0.405 12 0.045 0.089
#> 4 2 TRUE c03 2024-10-01 10:53:15 <tibble> 0.136 4 1.8 3.53
#> 5 2 TRUE c04 2024-10-01 10:53:15 <tibble> 0.379 11 0.104 0.204
#> 6 2 TRUE c06 2024-10-01 10:53:15 <tibble> 0.423 12 0.089 0.174
#> 7 3 TRUE c03 2024-10-01 10:54:14 <tibble> 0.13 4 0.202 0.396
#> 8 3 TRUE c04 2024-10-01 10:54:14 <tibble> 0.385 11 0.171 0.335
#> 9 3 TRUE c06 2024-10-01 10:54:14 <tibble> 0.426 12 0.131 0.256
#> 10 4 TRUE c03 2024-10-01 10:55:15 <tibble> 0.258 7 0.902 1.77
#> # ℹ 29 more rows
#> # ℹ 3 more variables: d_r <dbl>, d_f <lgl>, keep <lgl>
Note that 7 entries were classified as not meeting the selection
criteria.
If we filter those out, we’re left with 32 entries that can be used for
subsequent analyses.
dplyr::filter(heart_rates, keep)
#> # A tibble: 32 × 12
#> i smoothed id time data hz n sd ci
#> <int> <lgl> <chr> <dttm> <list> <dbl> <int> <dbl> <dbl>
#> 1 1 TRUE c03 2024-10-01 10:52:14 <tibble> 0.313 9 0.586 1.15
#> 2 1 TRUE c04 2024-10-01 10:52:14 <tibble> 0.39 11 0.303 0.595
#> 3 1 TRUE c06 2024-10-01 10:52:14 <tibble> 0.405 12 0.045 0.089
#> 4 2 TRUE c04 2024-10-01 10:53:15 <tibble> 0.379 11 0.104 0.204
#> 5 2 TRUE c06 2024-10-01 10:53:15 <tibble> 0.423 12 0.089 0.174
#> 6 3 TRUE c03 2024-10-01 10:54:14 <tibble> 0.13 4 0.202 0.396
#> 7 3 TRUE c04 2024-10-01 10:54:14 <tibble> 0.385 11 0.171 0.335
#> 8 3 TRUE c06 2024-10-01 10:54:14 <tibble> 0.426 12 0.131 0.256
#> 9 4 TRUE c04 2024-10-01 10:55:15 <tibble> 0.338 10 0.131 0.257
#> 10 4 TRUE c06 2024-10-01 10:55:15 <tibble> 0.422 13 0.082 0.16
#> # ℹ 22 more rows
#> # ℹ 3 more variables: d_r <dbl>, d_f <lgl>, keep <lgl>
Users can also execute the entire heartbeatr
workflow
with a single call to the wrapper function PULSE()
.
# the code below takes approximately 30 seconds
# to run on a machine with the following specs:
# - MacBook Air 2022
# - chip: Apple M2
# - memory: 16 GB RAM
# - macOS: Sequoia 15.3.1
heart_rates_PULSE <- PULSE(
paths = pulse_example(),
window_width_secs = 30,
window_shift_secs = 60,
bandwidth = 0.75,
raw_v_smoothed = FALSE,
discard_channels = discard
)
#> ℹ loading data for analysis
#> ✔ loading data for analysis [415ms]
#>
#> ℹ computing heart rates
#> ✔ computing heart rates [1.2s]
#>
#> ℹ finalizing
#> ✔ finalizing [48ms]
#>
#> → completed: 2025-09-14 10:30:30
#> → [elapsed: 0.03 mins]
heart_rates_PULSE
#> # A tibble: 39 × 12
#> i smoothed id time data hz n sd ci
#> <int> <lgl> <chr> <dttm> <list> <dbl> <int> <dbl> <dbl>
#> 1 1 TRUE c03 2024-10-01 10:52:14 <tibble> 0.313 9 0.586 1.15
#> 2 1 TRUE c04 2024-10-01 10:52:14 <tibble> 0.39 11 0.303 0.595
#> 3 1 TRUE c06 2024-10-01 10:52:14 <tibble> 0.405 12 0.045 0.089
#> 4 2 TRUE c03 2024-10-01 10:53:15 <tibble> 0.136 4 1.8 3.53
#> 5 2 TRUE c04 2024-10-01 10:53:15 <tibble> 0.379 11 0.104 0.204
#> 6 2 TRUE c06 2024-10-01 10:53:15 <tibble> 0.423 12 0.089 0.174
#> 7 3 TRUE c03 2024-10-01 10:54:14 <tibble> 0.13 4 0.202 0.396
#> 8 3 TRUE c04 2024-10-01 10:54:14 <tibble> 0.385 11 0.171 0.335
#> 9 3 TRUE c06 2024-10-01 10:54:14 <tibble> 0.426 12 0.131 0.256
#> 10 4 TRUE c03 2024-10-01 10:55:15 <tibble> 0.258 7 0.902 1.77
#> # ℹ 29 more rows
#> # ℹ 3 more variables: d_r <dbl>, d_f <lgl>, keep <lgl>
Note that the final output is exactly the same and that there are no performance advantages from using a single call to PULSE() compared to using the step-by-step workflow. Despite its convenience, there are still moments when the step-by-step workflow comes in handy, for example when intermediate outputs need to be inspected or when parameter values are being fine-tuned (see the section on bandwidth below).
The most basic visualization of the output of the heartbeatr workflow is a plot of heart rate over time. While not meant to provide the highest quality graphics, the heartbeatr package includes two main plotting functions to ensure that users can quickly inspect the results.
pulse_plot()
Inspecting the entire processed dataset can be done with a simple call.
It is usually important to filter out estimates that didn’t meet the quality thresholds.
pulse_plot(dplyr::filter(heart_rates, keep))
#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
#> : Chernobyl! trL>n 6
#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
#> : Chernobyl! trL>n 6
#> Warning in sqrt(sum.squares/one.delta): NaNs produced
#> Warning in stats::qt(level/2 + 0.5, pred$df): NaNs produced
#> Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
#> -Inf
Function parameters and general ggplot2
syntax can be
used to further customize the output of pulse_plot()
.
pulse_plot(
dplyr::filter(heart_rates, keep),
facets = FALSE,
bpm = TRUE,
smooth = FALSE,
points = FALSE
)
pulse_plot(dplyr::filter(heart_rates, keep)) +
ggplot2::geom_vline(ggplot2::aes(xintercept = mean(heart_rates$time))) +
ggplot2::theme_bw() +
ggplot2::ggtitle("example")
#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
#> : Chernobyl! trL>n 6
#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
#> : Chernobyl! trL>n 6
#> Warning in sqrt(sum.squares/one.delta): NaNs produced
#> Warning in stats::qt(level/2 + 0.5, pred$df): NaNs produced
#> Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
#> -Inf
It is also possible to focus on a single id.
pulse_plot(dplyr::filter(heart_rates, keep), ID = "c03")
#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
#> : Chernobyl! trL>n 6
#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
#> : Chernobyl! trL>n 6
#> Warning in sqrt(sum.squares/one.delta): NaNs produced
#> Warning in stats::qt(level/2 + 0.5, pred$df): NaNs produced
#> Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
#> -Inf
pulse_plot_raw()
Often, it is crucial to check more closely the data underlying a particular heart rate estimate. There are two ways to plot a specific channel + time window: using the window index stored in $i (via target_i), or using a timestamp (via target_time).
If we want to check channel “c03” at around 2024-10-01 10:58…
dplyr::filter(heart_rates, id == "c03")
#> # A tibble: 13 × 12
#> i smoothed id time data hz n sd ci
#> <int> <lgl> <chr> <dttm> <list> <dbl> <int> <dbl> <dbl>
#> 1 1 TRUE c03 2024-10-01 10:52:14 <tibble> 0.313 9 0.586 1.15
#> 2 2 TRUE c03 2024-10-01 10:53:15 <tibble> 0.136 4 1.8 3.53
#> 3 3 TRUE c03 2024-10-01 10:54:14 <tibble> 0.13 4 0.202 0.396
#> 4 4 TRUE c03 2024-10-01 10:55:15 <tibble> 0.258 7 0.902 1.77
#> 5 5 TRUE c03 2024-10-01 10:56:15 <tibble> 0.279 8 0.541 1.06
#> 6 6 TRUE c03 2024-10-01 10:57:15 <tibble> 0.232 6 1.18 2.32
#> 7 7 TRUE c03 2024-10-01 10:58:15 <tibble> 0.231 7 0.143 0.28
#> 8 8 TRUE c03 2024-10-01 10:59:15 <tibble> 0.202 5 2.05 4.02
#> 9 9 TRUE c03 2024-10-01 11:00:15 <tibble> 0.186 5 0.035 0.069
#> 10 10 TRUE c03 2024-10-01 11:01:14 <tibble> 0.21 6 1.83 3.59
#> 11 11 TRUE c03 2024-10-01 11:02:14 <tibble> 0.232 6 1.97 3.86
#> 12 12 TRUE c03 2024-10-01 11:03:15 <tibble> 0.199 6 0.889 1.74
#> 13 13 TRUE c03 2024-10-01 11:04:14 <tibble> 0.182 4 0.104 0.204
#> # ℹ 3 more variables: d_r <dbl>, d_f <lgl>, keep <lgl>
… we can use target_i … or we can use target_time.
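Both calls below should produce the same plot (a sketch based on the call shown later in this vignette; window i = 7 is the one timestamped 2024-10-01 10:58:15 in the table above):

# select the window by its index...
pulse_plot_raw(heart_rates, ID = "c03", target_i = 7)
# ...or by an approximate timestamp (the closest window is used)
pulse_plot_raw(heart_rates, ID = "c03", target_time = "2024-10-01 10:58")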
Note that both approaches yield the same result. Also, when using
target_time
, there’s no need to provide a timestamp
accurate to the second. Instead, pulse_plot_raw()
will
return the time window that is closest to the timestamp provided.
It is also possible to use the range
parameter to check
the raw data for a group of estimates.
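For example (a hypothetical call, assuming range extends the selection to neighbouring windows around the target):

# show the raw data for the target window and its neighbours
pulse_plot_raw(heart_rates, ID = "c03", target_i = 7, range = 2)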
Fine-tuning the smoothing parameter (bandwidth)

The heartbeatr workflow is generic and is not optimized for any organism in particular. This makes it suitable to process datasets derived from distinct species with different basal heart rates and other circulatory system characteristics, all of which could negatively impact the performance of a highly specific algorithm. It also ensures that a wide range of experimental conditions can be accommodated.
However, this also means that users should exercise caution when selecting key function parameters, chief among them the smoothing parameter bandwidth, as it can have important implications for the performance of the data processing. Other parameters such as window_width_secs, window_shift_secs, min_data_points, and interpolation_freq are also important; however, in their case, reading through each function's manual should be sufficient to make good guesses.
Regarding the trickier bandwidth parameter: it establishes the radius used by the smoothing function (ksmooth, from the stats package). Higher values remove more noise from the signal, improving the ability of the algorithm to select "true" peaks. However, values that are too high will end up masking out real wave crests, eventually leading to a degradation of performance.
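To build intuition for this trade-off, the sketch below applies stats::ksmooth to a synthetic noisy wave (purely illustrative, not heartbeatr code): the low bandwidth retains most of the noise, while the high one attenuates the very crests that need to be detected.

# synthetic signal: a 0.4 Hz "heartbeat" wave plus noise, sampled at 40 Hz
t <- seq(0, 30, by = 1 / 40)
y <- sin(2 * pi * 0.4 * t) + rnorm(length(t), sd = 0.3)

mild   <- ksmooth(t, y, kernel = "normal", bandwidth = 0.2)
strong <- ksmooth(t, y, kernel = "normal", bandwidth = 1.5)

plot(t, y, type = "l", col = "grey", xlab = "time (s)", ylab = "signal")
lines(mild, col = "blue")    # noise largely retained
lines(strong, col = "red")   # smoother, but wave crests are flattened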
Because of inter- and intra-specific variability in heart rates,
variability in the placement of sensors, and the co-variation of these
factors with the rate of the heart being recorded (more stressful
treatments lead to higher heart rates, where each heartbeat wave
encompasses a narrower span of time, a situation that often amplifies
limitations of the algorithm), it is not possible to provide users with definitive advice on how to set the bandwidth parameter. Furthermore, the bandwidth of a kernel smoother is a free parameter, meaning that it cannot, by definition, be predicted precisely or constrained a priori, and must always be estimated experimentally for each dataset.
Therefore, the recommended approach is the following:

- use the subset and subset_seed parameters of the PULSE() function to process a subset of your data (so that the processing runs quicker) using multiple combinations of the parameters you want to test (in the example below, bandwidth is set to 0.2 and 1.5);

- set subset_seed to a fixed numeric value (e.g., 1) so that the subsets generated always contain the same data points, which will facilitate the comparison of performances;

- set raw_v_smoothed to FALSE so that only the smoothed version of the raw data is used.

x_02 <- PULSE(
paths = pulse_example(), bandwidth = 0.2,
raw_v_smoothed = FALSE, discard_channels = paste0("c0", 1:9),
subset = 10, subset_seed = 1, subset_reindex = TRUE, show_progress = FALSE)
x_15 <- PULSE(
paths = pulse_example(), bandwidth = 1.5,
raw_v_smoothed = FALSE, discard_channels = paste0("c0", 1:9),
subset = 10, subset_seed = 1, subset_reindex = TRUE, show_progress = FALSE)
# when bandwidth = 1.5, only one peak is identified in each heartbeat, as it should be
pulse_plot_raw(x_15, ID = "c10", target_i = 1)
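For comparison, the lower-bandwidth run can be inspected the same way (with bandwidth = 0.2, more residual noise survives the smoothing, so extra peaks may be detected - an expectation based on the discussion above, not a verbatim result from this vignette):

pulse_plot_raw(x_02, ID = "c10", target_i = 1)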
Once a suitable bandwidth has been identified, re-run PULSE() on the entire dataset (with subset set to 0).

HRV (Heart Rate Variability) analyses focus on the variability of the intervals between consecutive heartbeats and are extremely important for assessing cardiac performance and heart conditions. There are numerous studies focusing on this metric, especially those addressing human health.
The authors of this package want to make clear that the SD provided
by the heartbeatr
workflow - in effect, a measure of heart
rate variability - SHOULD NOT be used for HRV analyses
on data collected from shelled invertebrates.
This recommendation is justified because, while photoplethysmographic data is often used for HRV analyses, that is typically the case when such data is collected in a controlled manner, such as from a person’s finger. In such cases, the quality of the raw signal tends to be sufficiently dependable, and therefore, analyses can go beyond the mere study of the heart rate frequency to also focus on variations of the timing of the intervals between consecutive heartbeats over short periods of time (a.k.a., HRV).
However, while we recognize the research value of such analyses, we cannot recommend a similar approach based on data collected with the PULSE system from shelled invertebrates. The reason behind this is that many factors often impact the quality of the signal captured, including the placement of the sensor on the shell, which may be “looking” at different components of the circulatory system, the impact of movements of the animal’s body unrelated to cardiac performance, and many other quality-degrading factors.
As a result, we believe that making assertions about an organism’s physiological state based on the variability of the timing between consecutive peaks is beyond the capabilities of the PULSE-heartbeatr workflow. That is why we only recommend the use of the SD of the intervals between peaks during each split window for assisting in determining the quality of the computed heart rate estimates and filtering out low-quality data points – a task that is far less consequential and that therefore can accept a greater deal of uncertainty.
On top of that, even quantifying the accuracy of the timing of individual heartbeats isn't straightforward, as it would require a reference dataset for which the timing of the heartbeats was definitively known - something that would very rarely be available to researchers, as it would imply acquiring a simultaneous ECG, which is invasive and beyond most researchers' technical capabilities (and likely not feasible even with today's technology, given that the implanted ECG electrodes would need to remain in place for the entire duration of the experiment).
Before moving on to the proper analysis of the results, it is often important to perform two final tweaks to the output of the heartbeatr workflow.
Even individuals from the same species have different basal heart rates. This means that it is often better to compare the rate of change of heart rate instead of the absolute values themselves. To do that, the heart rate estimates must first be normalized using a stable reference period, usually during acclimation.
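Conceptually, the normalization divides each channel's estimates by that channel's mean rate during the reference period (an assumption about pulse_normalize()'s internals, illustrated here with plain dplyr):

# hypothetical illustration: per-channel baseline over the 5 minutes after t0
t0 <- as.POSIXct("2024-10-01 10:52", tz = "UTC")
baseline <- heart_rates |>
  dplyr::filter(keep, time >= t0, time <= t0 + 5 * 60) |>
  dplyr::group_by(id) |>
  dplyr::summarise(hz0 = mean(hz))
# hz_norm would then be hz / hz0 for the matching channel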
heart_rates_norm <- pulse_normalize(
dplyr::filter(heart_rates, keep),
t0 = "2024-10-01 10:52",
span_mins = 5
)
heart_rates_norm
#> # A tibble: 32 × 13
#> i smoothed id time data hz n sd ci
#> <int> <lgl> <chr> <dttm> <list> <dbl> <int> <dbl> <dbl>
#> 1 1 TRUE c03 2024-10-01 10:52:14 <tibble> 0.313 9 0.586 1.15
#> 2 1 TRUE c04 2024-10-01 10:52:14 <tibble> 0.39 11 0.303 0.595
#> 3 1 TRUE c06 2024-10-01 10:52:14 <tibble> 0.405 12 0.045 0.089
#> 4 2 TRUE c04 2024-10-01 10:53:15 <tibble> 0.379 11 0.104 0.204
#> 5 2 TRUE c06 2024-10-01 10:53:15 <tibble> 0.423 12 0.089 0.174
#> 6 3 TRUE c03 2024-10-01 10:54:14 <tibble> 0.13 4 0.202 0.396
#> 7 3 TRUE c04 2024-10-01 10:54:14 <tibble> 0.385 11 0.171 0.335
#> 8 3 TRUE c06 2024-10-01 10:54:14 <tibble> 0.426 12 0.131 0.256
#> 9 4 TRUE c04 2024-10-01 10:55:15 <tibble> 0.338 10 0.131 0.257
#> 10 4 TRUE c06 2024-10-01 10:55:15 <tibble> 0.422 13 0.082 0.16
#> # ℹ 22 more rows
#> # ℹ 4 more variables: d_r <dbl>, d_f <lgl>, keep <lgl>, hz_norm <dbl>
pulse_plot(heart_rates_norm, normalized = FALSE, facets = FALSE)
#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
#> : Chernobyl! trL>n 6
#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
#> : Chernobyl! trL>n 6
#> Warning in sqrt(sum.squares/one.delta): NaNs produced
#> Warning in stats::qt(level/2 + 0.5, pred$df): NaNs produced
#> Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
#> -Inf
pulse_plot(heart_rates_norm, normalized = TRUE, facets = FALSE)
#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
#> : Chernobyl! trL>n 6
#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
#> : Chernobyl! trL>n 6
#> Warning in sqrt(sum.squares/one.delta): NaNs produced
#> Warning in stats::qt(level/2 + 0.5, pred$df): NaNs produced
#> Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
#> -Inf
Note how all channels begin with the same normalized heart rate (around 1), and how the different trajectories then become evident.
Lastly, after filtering out the entries that didn't meet the quality thresholds, we were left with several gaps in the final dataset. One way to handle this is to bin the data over wider time windows (here, pulse_summarise aggregates the roughly one-minute estimates into 5-minute bins via span_mins = 5).
heart_rates_sum <- pulse_summarise(
heart_rates_norm,
FUN = mean,
span_mins = 5,
min_data_points = 0)
heart_rates_sum
#> # A tibble: 9 × 8
#> i id time hz hz_norm n sd ci
#> <dbl> <chr> <dttm> <dbl> <dbl> <int> <dbl> <dbl>
#> 1 1 c03 2024-10-01 10:50:00 0.222 0.920 2 0.129 0.254
#> 2 1 c04 2024-10-01 10:50:00 0.385 1.04 3 0.00551 0.0108
#> 3 1 c06 2024-10-01 10:50:00 0.418 0.991 3 0.0114 0.0223
#> 4 2 c04 2024-10-01 10:55:00 0.331 0.898 5 0.0163 0.0320
#> 5 2 c06 2024-10-01 10:55:00 0.426 1.01 5 0.00846 0.0166
#> 6 3 c03 2024-10-01 10:55:00 0.255 1.06 2 0.0339 0.0665
#> 7 4 c03 2024-10-01 11:00:00 0.184 0.765 2 0.00283 0.00554
#> 8 4 c04 2024-10-01 11:00:00 0.313 0.847 5 0.0245 0.0481
#> 9 4 c06 2024-10-01 11:00:00 0.422 1.00 5 0.0164 0.0322
pulse_plot(heart_rates_sum, normalized = FALSE, facets = FALSE)
#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
#> : span too small. fewer data values than degrees of freedom.
#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
#> : pseudoinverse used at 1.7278e+09
#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
#> : neighborhood radius 303
#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
#> : reciprocal condition number 0
#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
#> : There are other near singularities as well. 91809
#> Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
#> else if (is.data.frame(newdata))
#> as.matrix(model.frame(delete.response(terms(object)), : span too small. fewer
#> data values than degrees of freedom.
#> Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
#> else if (is.data.frame(newdata))
#> as.matrix(model.frame(delete.response(terms(object)), : pseudoinverse used at
#> 1.7278e+09
#> Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
#> else if (is.data.frame(newdata))
#> as.matrix(model.frame(delete.response(terms(object)), : neighborhood radius 303
#> Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
#> else if (is.data.frame(newdata))
#> as.matrix(model.frame(delete.response(terms(object)), : reciprocal condition
#> number 0
#> Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
#> else if (is.data.frame(newdata))
#> as.matrix(model.frame(delete.response(terms(object)), : There are other near
#> singularities as well. 91809
#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
#> : span too small. fewer data values than degrees of freedom.
#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
#> : pseudoinverse used at 1.7278e+09
#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
#> : neighborhood radius 303
#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
#> : reciprocal condition number 0
#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
#> : There are other near singularities as well. 91809
#> Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
#> else if (is.data.frame(newdata))
#> as.matrix(model.frame(delete.response(terms(object)), : span too small. fewer
#> data values than degrees of freedom.
#> Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
#> else if (is.data.frame(newdata))
#> as.matrix(model.frame(delete.response(terms(object)), : pseudoinverse used at
#> 1.7278e+09
#> Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
#> else if (is.data.frame(newdata))
#> as.matrix(model.frame(delete.response(terms(object)), : neighborhood radius 303
#> Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
#> else if (is.data.frame(newdata))
#> as.matrix(model.frame(delete.response(terms(object)), : reciprocal condition
#> number 0
#> Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
#> else if (is.data.frame(newdata))
#> as.matrix(model.frame(delete.response(terms(object)), : There are other near
#> singularities as well. 91809
#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
#> : span too small. fewer data values than degrees of freedom.
#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
#> : pseudoinverse used at 1.7278e+09
#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
#> : neighborhood radius 303
#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
#> : reciprocal condition number 0
#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
#> : There are other near singularities as well. 91809
#> Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
#> else if (is.data.frame(newdata))
#> as.matrix(model.frame(delete.response(terms(object)), : span too small. fewer
#> data values than degrees of freedom.
#> Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
#> else if (is.data.frame(newdata))
#> as.matrix(model.frame(delete.response(terms(object)), : pseudoinverse used at
#> 1.7278e+09
#> Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
#> else if (is.data.frame(newdata))
#> as.matrix(model.frame(delete.response(terms(object)), : neighborhood radius 303
#> Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
#> else if (is.data.frame(newdata))
#> as.matrix(model.frame(delete.response(terms(object)), : reciprocal condition
#> number 0
#> Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
#> else if (is.data.frame(newdata))
#> as.matrix(model.frame(delete.response(terms(object)), : There are other near
#> singularities as well. 91809
#> Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
#> -Inf
#> Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
#> -Inf
#> Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
#> -Inf
Now, there are estimates at regular intervals for all channels.