| Type: | Package |
| Title: | Tidy Differential Privacy |
| Version: | 0.1.0 |
| Description: | A tidy-style interface for applying differential privacy to data frames. Provides pipe-friendly functions to add calibrated noise, compute private statistics, and track privacy budgets using the epsilon-delta differential privacy framework. Implements the Laplace mechanism (Dwork et al. 2006 <doi:10.1007/11681878_14>) and the Gaussian mechanism for achieving differential privacy as described in Dwork and Roth (2014) <doi:10.1561/0400000042>. |
| License: | MIT + file LICENSE |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.3 |
| Imports: | magrittr, stats |
| Suggests: | testthat (≥ 3.0.0), knitr, rmarkdown |
| VignetteBuilder: | knitr |
| URL: | https://github.com/ttarler/tidydp |
| BugReports: | https://github.com/ttarler/tidydp/issues |
| NeedsCompilation: | no |
| Packaged: | 2025-11-23 17:20:57 UTC; ttarler |
| Author: | Thomas Tarler [aut, cre] |
| Maintainer: | Thomas Tarler <ttarler@gmail.com> |
| Repository: | CRAN |
| Date/Publication: | 2025-11-27 19:00:02 UTC |
tidydp: Tidy Differential Privacy
Description
A tidy-style interface for applying differential privacy to data frames. Provides pipe-friendly functions to add calibrated noise, compute private statistics, and track privacy budgets using the epsilon-delta differential privacy framework.
Author(s)
Maintainer: Thomas Tarler ttarler@gmail.com
See Also
Useful links: https://github.com/ttarler/tidydp. Report bugs at https://github.com/ttarler/tidydp/issues.
Pipe operator
Description
See magrittr::%>% for details.
Usage
lhs %>% rhs
Arguments
lhs |
A value or the magrittr placeholder. |
rhs |
A function call using the magrittr semantics. |
Value
The result of calling 'rhs(lhs)'.
Add Gaussian Noise
Description
Adds Gaussian (normal) noise to a numeric value or vector to achieve (epsilon, delta)-differential privacy. Unlike the Laplace mechanism, the Gaussian mechanism requires delta > 0, so use it when a small, nonzero delta is acceptable.
Usage
add_gaussian_noise(x, sensitivity, epsilon, delta = 1e-05)
Arguments
x |
Numeric value or vector to add noise to |
sensitivity |
The L2 sensitivity of the query |
epsilon |
Privacy parameter (smaller = more privacy) |
delta |
Privacy parameter (bound on the probability that the epsilon guarantee fails), typically very small |
Value
Numeric value or vector with Gaussian noise added
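As a rough illustration of how the noise scale relates to the three parameters, the sketch below uses the classical Gaussian-mechanism calibration from Dwork and Roth (2014), sigma = sensitivity * sqrt(2 * log(1.25 / delta)) / epsilon, which is stated for epsilon < 1. The helper name gaussian_sigma_sketch is hypothetical, and the source does not say which calibration add_gaussian_noise() uses internally, so this is an assumption rather than a description of the package.

```r
# Hypothetical helper: classical Gaussian-mechanism calibration
# (Dwork and Roth 2014; stated for 0 < epsilon < 1).
# Assumption: not necessarily what add_gaussian_noise() does internally.
gaussian_sigma_sketch <- function(sensitivity, epsilon, delta = 1e-05) {
  stopifnot(sensitivity > 0, epsilon > 0, delta > 0, delta < 1)
  sensitivity * sqrt(2 * log(1.25 / delta)) / epsilon
}

set.seed(1)
x <- c(10, 20, 30)

# Noise standard deviation for an L2 sensitivity of 1 and epsilon = 0.5
sigma <- gaussian_sigma_sketch(sensitivity = 1, epsilon = 0.5)

# Add independent normal noise to each element
noisy <- x + stats::rnorm(length(x), mean = 0, sd = sigma)
```

Under this calibration, halving epsilon doubles sigma, while shrinking delta increases sigma only logarithmically.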
Add Laplace Noise
Description
Adds Laplace-distributed noise to a numeric value or vector to achieve epsilon-differential privacy. The Laplace mechanism is calibrated to the L1 sensitivity of the query, that is, the maximum absolute change in the query result that a single record can cause.
Usage
add_laplace_noise(x, sensitivity, epsilon)
Arguments
x |
Numeric value or vector to add noise to |
sensitivity |
The sensitivity of the query (maximum change from one record) |
epsilon |
Privacy parameter (smaller = more privacy, more noise) |
Value
Numeric value or vector with Laplace noise added
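The standard Laplace mechanism draws noise with scale sensitivity / epsilon. Base R has no Laplace sampler, so the minimal sketch below draws one through the inverse CDF. The name rlaplace_sketch is hypothetical and this is not claimed to be the package's internal code.

```r
# Hypothetical sampler: Laplace(0, scale) draws via the inverse CDF.
# For u ~ Uniform(-1/2, 1/2), X = -scale * sign(u) * log(1 - 2|u|).
rlaplace_sketch <- function(n, scale) {
  u <- stats::runif(n, min = -0.5, max = 0.5)
  -scale * sign(u) * log(1 - 2 * abs(u))
}

set.seed(42)
sensitivity <- 1     # e.g. a count query
epsilon <- 0.5

# Laplace mechanism: scale grows as epsilon shrinks
scale <- sensitivity / epsilon
noisy <- c(10, 20, 30) + rlaplace_sketch(3, scale)
```

The expected absolute error of each noisy value equals the scale, here 2, which gives a quick way to judge whether a chosen epsilon leaves the result usable.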
Check Privacy Budget
Description
Checks whether a proposed operation would exceed the remaining privacy budget.
Usage
check_privacy_budget(budget, epsilon_required, delta_required = 0)
Arguments
budget |
A privacy budget object |
epsilon_required |
Epsilon required for the operation |
delta_required |
Delta required for the operation (default: 0) |
Value
Logical indicating if budget is sufficient
Examples
budget <- new_privacy_budget(epsilon_total = 1.0)
check_privacy_budget(budget, epsilon_required = 0.5)
Add Differentially Private Noise to Data Frame Columns
Description
Adds calibrated Laplace or Gaussian noise to specified numeric columns in a data frame to achieve differential privacy. This is the primary function for column-level privacy.
Usage
dp_add_noise(
  data,
  columns,
  epsilon,
  delta = NULL,
  lower = NULL,
  upper = NULL,
  mechanism = NULL,
  .budget = NULL
)
Arguments
data |
A data frame |
columns |
Character vector of column names to add noise to |
epsilon |
Privacy parameter (smaller = more privacy, more noise) |
delta |
Privacy parameter for Gaussian mechanism (default: NULL, uses Laplace) |
lower |
Named numeric vector of lower bounds for each column |
upper |
Named numeric vector of upper bounds for each column |
mechanism |
Either "laplace" or "gaussian" (auto-selected based on delta if NULL) |
.budget |
Optional privacy budget object to track expenditure |
Value
Data frame with noise added to specified columns
Examples
data <- data.frame(age = c(25, 30, 35, 40), income = c(50000, 60000, 70000, 80000))
private_data <- data %>%
  dp_add_noise(
    columns = c("age", "income"),
    epsilon = 0.1,
    lower = c(age = 0, income = 0),
    upper = c(age = 100, income = 200000)
  )
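To make the role of lower and upper concrete, here is a minimal base-R sketch of one common way to add bounded Laplace noise per column: clamp each value to its declared range, then add noise with scale (upper - lower) / epsilon. All names ending in _sketch are hypothetical. The source does not state how dp_add_noise() clamps values or divides epsilon across columns, so both choices below are assumptions.

```r
# Hypothetical Laplace(0, scale) sampler via the inverse CDF
rlaplace_sketch <- function(n, scale) {
  u <- stats::runif(n, min = -0.5, max = 0.5)
  -scale * sign(u) * log(1 - 2 * abs(u))
}

# Hypothetical column-level mechanism. Assumptions: values are clamped to
# [lower, upper] and the full epsilon is applied to every column.
add_noise_sketch <- function(data, columns, epsilon, lower, upper) {
  for (col in columns) {
    clamped <- pmin(pmax(data[[col]], lower[[col]]), upper[[col]])
    scale <- (upper[[col]] - lower[[col]]) / epsilon
    data[[col]] <- clamped + rlaplace_sketch(length(clamped), scale)
  }
  data
}

set.seed(7)
data <- data.frame(id = 1:4,
                   age = c(25, 30, 35, 40),
                   income = c(50000, 60000, 70000, 80000))

private_data <- add_noise_sketch(
  data,
  columns = c("age", "income"),
  epsilon = 0.1,
  lower = c(age = 0, income = 0),
  upper = c(age = 100, income = 200000)
)
```

Tighter bounds give a smaller scale and therefore less noise, which is why the bounds should come from domain knowledge rather than from the data itself.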
Differentially Private Count
Description
Computes a differentially private count of rows, optionally grouped by specified columns.
Usage
dp_count(data, epsilon, delta = NULL, group_by = NULL, .budget = NULL)
Arguments
data |
A data frame |
epsilon |
Privacy parameter |
delta |
Privacy parameter (default: NULL, uses Laplace mechanism) |
group_by |
Character vector of column names to group by (optional) |
.budget |
Optional privacy budget object to track expenditure |
Value
Data frame with (possibly grouped) counts
Examples
data <- data.frame(city = c("NYC", "LA", "NYC", "LA", "NYC"),
                   age = c(25, 30, 35, 40, 45))
# Overall count
dp_count(data, epsilon = 0.1)
# Grouped count
data %>% dp_count(epsilon = 0.1, group_by = "city")
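For intuition, a grouped private count can be sketched in base R as a frequency table plus Laplace noise with sensitivity 1. Because each row falls in exactly one group, the per-group counts are computed on disjoint subsets of the data. The helper name is hypothetical, and whether dp_count() applies the full epsilon to each group in this way is an assumption.

```r
# Hypothetical Laplace(0, scale) sampler via the inverse CDF
rlaplace_sketch <- function(n, scale) {
  u <- stats::runif(n, min = -0.5, max = 0.5)
  -scale * sign(u) * log(1 - 2 * abs(u))
}

set.seed(3)
data <- data.frame(city = c("NYC", "LA", "NYC", "LA", "NYC"))
epsilon <- 0.1

# Exact per-group counts (levels are sorted alphabetically: LA, NYC)
counts <- as.data.frame(table(city = data$city),
                        responseName = "n",
                        stringsAsFactors = FALSE)

# Count queries have sensitivity 1, so the Laplace scale is 1 / epsilon
counts$n_private <- counts$n + rlaplace_sketch(nrow(counts), 1 / epsilon)
```

With epsilon = 0.1 the noise scale is 10, which swamps counts this small; noisy counts can also be negative or non-integer unless they are post-processed.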
Differentially Private Mean
Description
Computes a differentially private mean of a numeric column.
Usage
dp_mean(
  data,
  column,
  epsilon,
  delta = NULL,
  lower = NULL,
  upper = NULL,
  group_by = NULL,
  .budget = NULL
)
Arguments
data |
A data frame |
column |
Column name to compute mean of |
epsilon |
Privacy parameter |
delta |
Privacy parameter (default: NULL, uses Laplace mechanism) |
lower |
Lower bound of the data range |
upper |
Upper bound of the data range |
group_by |
Character vector of column names to group by (optional) |
.budget |
Optional privacy budget object to track expenditure |
Value
Data frame with (possibly grouped) private means
Examples
data <- data.frame(city = c("NYC", "LA", "NYC", "LA"),
                   income = c(50000, 60000, 70000, 80000))
data %>% dp_mean("income", epsilon = 0.1, lower = 0, upper = 200000, group_by = "city")
Differentially Private Sum
Description
Computes a differentially private sum of a numeric column.
Usage
dp_sum(
  data,
  column,
  epsilon,
  delta = NULL,
  lower = NULL,
  upper = NULL,
  group_by = NULL,
  .budget = NULL
)
Arguments
data |
A data frame |
column |
Column name to compute sum of |
epsilon |
Privacy parameter |
delta |
Privacy parameter (default: NULL, uses Laplace mechanism) |
lower |
Lower bound of the data range |
upper |
Upper bound of the data range |
group_by |
Character vector of column names to group by (optional) |
.budget |
Optional privacy budget object to track expenditure |
Value
Data frame with (possibly grouped) private sums
Examples
data <- data.frame(city = c("NYC", "LA", "NYC", "LA"),
                   sales = c(100, 200, 150, 250))
data %>% dp_sum("sales", epsilon = 0.1, lower = 0, upper = 1000, group_by = "city")
Create a New Privacy Budget
Description
Initializes a privacy budget tracker for managing epsilon and delta across multiple differentially private operations. The budget uses composition theorems to track cumulative privacy loss.
Usage
new_privacy_budget(epsilon_total, delta_total = 1e-05, composition = "basic")
Arguments
epsilon_total |
Total epsilon budget available |
delta_total |
Total delta budget available (default: 1e-5) |
composition |
Method for budget composition: "basic" or "advanced" (default: "basic") |
Value
A privacy budget object (list with class "privacy_budget")
Examples
budget <- new_privacy_budget(epsilon_total = 1.0, delta_total = 1e-5)
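To show why the composition method matters, the sketch below compares basic composition (epsilons and deltas simply add up) with the advanced composition bound from Dwork and Roth (2014) for k runs of the same (epsilon, delta) mechanism. The function names and the delta_prime slack parameter are hypothetical, and the source does not specify which advanced theorem the "advanced" option implements, so treat this as an assumption.

```r
# Basic composition: k runs of an (epsilon, delta) mechanism
basic_composition_sketch <- function(epsilon, delta, k) {
  c(epsilon = k * epsilon, delta = k * delta)
}

# Advanced composition (Dwork and Roth 2014):
#   epsilon' = epsilon * sqrt(2 k log(1 / delta')) + k * epsilon * (exp(epsilon) - 1)
#   delta_total = k * delta + delta'
advanced_composition_sketch <- function(epsilon, delta, k, delta_prime) {
  stopifnot(delta_prime > 0, delta_prime < 1)
  eps_total <- epsilon * sqrt(2 * k * log(1 / delta_prime)) +
    k * epsilon * (exp(epsilon) - 1)
  c(epsilon = eps_total, delta = k * delta + delta_prime)
}

# 1000 queries at epsilon = 0.01 each
basic <- basic_composition_sketch(epsilon = 0.01, delta = 0, k = 1000)
advanced <- advanced_composition_sketch(epsilon = 0.01, delta = 0,
                                        k = 1000, delta_prime = 1e-6)
```

For many small-epsilon queries the advanced bound on total epsilon is much tighter than the basic one, at the cost of spending some delta. For a handful of queries the basic bound is often the smaller of the two.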
Print Privacy Budget
Description
Prints a summary of a privacy budget object to the console.
Usage
## S3 method for class 'privacy_budget'
print(x, ...)
Arguments
x |
A privacy budget object |
... |
Additional arguments (unused) |
Value
Returns the privacy budget object invisibly. Called primarily for the side effect of printing budget information to the console, including total epsilon and delta budgets, amounts spent, remaining budget, composition method, and number of operations executed.
Calculate L1 Sensitivity for Count Queries
Description
For count queries, the L1 sensitivity is 1: adding or removing one record changes the count by exactly 1.
Usage
sensitivity_count()
Value
Numeric sensitivity value
Calculate L2 Sensitivity for Mean Queries
Description
Computes the sensitivity of a mean query over bounded data, given the data range and the sample size.
Usage
sensitivity_mean(lower, upper, n)
Arguments
lower |
Lower bound of the data range |
upper |
Upper bound of the data range |
n |
Sample size |
Value
Numeric sensitivity value
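Under the substitution model with a fixed, public sample size n, changing one record moves the mean of data bounded in [lower, upper] by at most (upper - lower) / n. The sketch below encodes that formula. The function name is hypothetical, and the source does not state the exact expression sensitivity_mean() returns, so this is an assumption.

```r
# Hypothetical helper: substitution-model sensitivity of a bounded mean.
# Assumption: sensitivity_mean() may use a different expression.
sensitivity_mean_sketch <- function(lower, upper, n) {
  stopifnot(upper >= lower, n >= 1)
  (upper - lower) / n
}

small_sample <- sensitivity_mean_sketch(0, 200000, n = 4)     # 50000
large_sample <- sensitivity_mean_sketch(0, 200000, n = 4000)  # 50
```

The sensitivity shrinks in proportion to 1 / n, so private means become far more accurate as the sample grows. For a single numeric output, the L1 and L2 sensitivities coincide.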
Calculate L1 Sensitivity for Sum Queries
Description
For sum queries with bounded data, the sensitivity is the maximum change in the sum when one record is substituted (changed from any value to any other value in the range). This uses the standard substitution model for differential privacy.
Usage
sensitivity_sum(lower, upper)
Arguments
lower |
Lower bound of the data range |
upper |
Upper bound of the data range |
Value
Numeric sensitivity value
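Following the substitution model described above, swapping one record between the two extremes of the range changes the sum by upper - lower. The sketch below works through the arithmetic for bounds of 0 and 1000 with epsilon = 0.1. The function name is hypothetical.

```r
# Hypothetical helper: substitution-model sensitivity of a bounded sum
sensitivity_sum_sketch <- function(lower, upper) {
  stopifnot(upper >= lower)
  upper - lower
}

s <- sensitivity_sum_sketch(lower = 0, upper = 1000)  # 1000

# Laplace scale implied by this sensitivity at epsilon = 0.1
epsilon <- 0.1
laplace_scale <- s / epsilon                           # about 10000
```

Under the alternative add/remove model, the sensitivity would instead be max(abs(lower), abs(upper)). The two agree when lower is 0 but differ for ranges such as [-500, 500].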
Spend Privacy Budget
Description
Records a privacy expenditure and returns the updated budget.
Usage
spend_privacy_budget(
  budget,
  epsilon_spent,
  delta_spent = 0,
  operation_name = NULL
)
Arguments
budget |
A privacy budget object |
epsilon_spent |
Epsilon spent on the operation |
delta_spent |
Delta spent on the operation (default: 0) |
operation_name |
Name/description of the operation (optional) |
Value
Updated privacy budget object
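The check-then-spend pattern can be sketched with a plain list as a ledger, using basic composition (spent amounts simply add up). Every name below is hypothetical, and the field layout is an assumption rather than the internal structure of the package's "privacy_budget" class.

```r
# Hypothetical ledger using basic composition
new_budget_sketch <- function(epsilon_total, delta_total = 1e-05) {
  list(epsilon_total = epsilon_total, delta_total = delta_total,
       epsilon_spent = 0, delta_spent = 0, operations = character(0))
}

# TRUE if the proposed spend still fits in the remaining budget
can_spend_sketch <- function(budget, epsilon, delta = 0) {
  budget$epsilon_spent + epsilon <= budget$epsilon_total &&
    budget$delta_spent + delta <= budget$delta_total
}

# Returns an updated copy; the caller must reassign the result
spend_sketch <- function(budget, epsilon, delta = 0, name = NA_character_) {
  if (!can_spend_sketch(budget, epsilon, delta)) {
    stop("privacy budget exceeded")
  }
  budget$epsilon_spent <- budget$epsilon_spent + epsilon
  budget$delta_spent <- budget$delta_spent + delta
  budget$operations <- c(budget$operations, name)
  budget
}

budget <- new_budget_sketch(epsilon_total = 1.0)
budget <- spend_sketch(budget, 0.5, name = "count")
budget <- spend_sketch(budget, 0.25, name = "mean")

fits_half <- can_spend_sketch(budget, 0.5)      # FALSE: 0.75 + 0.5 > 1
fits_quarter <- can_spend_sketch(budget, 0.25)  # TRUE: 0.75 + 0.25 <= 1
```

Because R lists have value semantics, a function that returns an updated budget has no effect unless its result is assigned back, as in budget <- spend_sketch(budget, ...).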