Type: | Package |
Title: | Causal Inference with Tree-Based Machine Learning Algorithms |
Version: | 0.1.20 |
Description: | Estimating heterogeneous treatment effects with tree-based machine learning algorithms and visualizing estimated results in flexible and presentation-ready ways. For more information, see Brand, Xu, Koch, and Geraldo (2021) <doi:10.1177/0081175021993503>. Our current package first started as a fork of the 'causalTree' package on 'GitHub' and we greatly appreciate the authors for their extremely useful and free package. |
Depends: | R (≥ 3.6.0) |
Imports: | Rcpp, grf, partykit, data.tree, Matching, dplyr, jsonlite, rpart, rpart.plot, shiny, stringr |
Suggests: | optmatch, haven, foreign, data.table, remotes, party |
License: | GPL-2 | GPL-3 |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.3.2 |
NeedsCompilation: | yes |
Packaged: | 2025-01-13 22:59:20 UTC; xujiahui |
Author: | Jiahui Xu [cre, aut], Tanvi Shinkre [aut], Jennie Brand [aut] |
Maintainer: | Jiahui Xu <jiahuixu@ucla.edu> |
Repository: | CRAN |
Date/Publication: | 2025-01-13 23:40:02 UTC |
Include the Javascript Used in Shiny
Description
Intermediate function that includes the JavaScript needed to visualize tree structures and estimated treatment effects in Shiny.
Usage
bundScript(...)
Arguments
... |
There are no required arguments for this function, but the user can pass arguments to include different CSS files. |
Value
No return value. It is used to pass the Javascript to Shiny.
Causal Effect Regression and Estimation Trees
Description
Fit a causalTree model to get an rpart object.
Usage
causalTree(
formula,
data,
weights,
treatment,
subset,
na.action = na.causalTree,
split.Rule,
split.Honest,
HonestSampleSize,
split.Bucket,
bucketNum = 5,
bucketMax = 100,
cv.option,
cv.Honest,
minsize = 2L,
x = FALSE,
y = TRUE,
propensity,
control,
split.alpha = 0.5,
cv.alpha = 0.5,
cv.gamma = 0.5,
split.gamma = 0.5,
cost,
...
)
Arguments
formula |
a formula, with a response and features but
no interaction terms. If this is a data frame, it is taken as
the model frame (see |
data |
an optional data frame that includes the variables named in the formula. |
weights |
optional case weights. |
treatment |
a vector that indicates the treatment status of each observation. 1 represents treated and 0 represents control. Only binary treatment supported in this version. |
subset |
optional expression saying that only a subset of the rows of the data should be used in the fit. |
na.action |
the default action deletes all observations for which
|
split.Rule |
causalTree splitting options, one of |
split.Honest |
boolean option, |
HonestSampleSize |
number of observations anticipated to be used in honest re-estimation after building the tree. This enters the risk function used in both splitting and cross-validation. |
split.Bucket |
boolean option, |
bucketNum |
number of observations in each bucket when set
|
bucketMax |
Option to choose maximum number of buckets to use in
splitting when set |
cv.option |
cross validation options, one of |
cv.Honest |
boolean option, |
minsize |
in order to split, each leaf must have at least
|
x |
keep a copy of the |
y |
keep a copy of the dependent variable in the result. If
missing and |
propensity |
propensity score used in |
control |
a list of options that control details of the
|
split.alpha |
scale parameter between 0 and 1, used in splitting
risk evaluation function for |
cv.alpha |
scale parameter between 0 and 1, used in cross validation
risk evaluation function for |
cv.gamma , split.gamma |
optional parameters used in evaluating policies. |
cost |
a vector of non-negative costs, one for each variable in the model. Defaults to one for all variables. These are scalings to be applied when considering splits, so the improvement on splitting on a variable is divided by its cost in deciding which split to choose. |
... |
arguments to |
Details
causalTree differs from the rpart function in the rpart package in its
splitting rules and cross-validation methods. See Athey and Imbens,
Recursive Partitioning for Heterogeneous Causal Effects (2016), for details.
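The effect of the cost argument described above can be illustrated with a small base R sketch. The improvement values and variable names are made up, and the selection step is a simplification for illustration, not the package's internal code.

```r
# Hypothetical raw improvements for the best split on each variable.
improvement <- c(x1 = 4.0, x2 = 6.0, x3 = 5.0)

# Hypothetical per-variable costs (the documented default is 1 for all).
cost <- c(x1 = 1, x2 = 2, x3 = 1)

# The improvement is divided by the cost before splits are compared.
scaled <- improvement / cost

# x2 has the largest raw improvement, but its cost of 2 halves it,
# so the cost-scaled comparison picks another variable.
chosen <- names(which.max(scaled))
print(scaled)
print(chosen)
```

With all costs equal to one, the scaled and raw improvements coincide and the choice reduces to the largest raw improvement.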
Value
An object of class rpart. See rpart.object.
References
Breiman L., Friedman J. H., Olshen R. A., and Stone, C. J. (1984) Classification and Regression Trees. Wadsworth.
Athey, S and G Imbens (2016) Recursive Partitioning for Heterogeneous Causal Effects. http://arxiv.org/abs/1504.01132
See Also
honest.causalTree, rpart.control, rpart.object, summary.rpart, rpart.plot
Examples
library("htetree")
library("rpart")
library("rpart.plot")
tree <- causalTree(y~ x1 + x2 + x3 + x4, data = simulation.1,
treatment = simulation.1$treatment,
split.Rule = "CT", cv.option = "CT", split.Honest = TRUE, cv.Honest = TRUE,
split.Bucket = FALSE, xval = 5,
cp = 0, minsize = 20, propensity = 0.5)
opcp <- tree$cptable[,1][which.min(tree$cptable[,4])]
opfit <- prune(tree, opcp)
rpart.plot(opfit)
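The pruning step in the example picks the complexity parameter with the smallest cross-validated error. The sketch below reproduces that selection on a made-up table, assuming the usual rpart cptable layout (column 1 is CP, column 4 is xerror); all numbers are hypothetical.

```r
# Hypothetical cptable with the column layout used by rpart objects.
cptable <- matrix(
  c(0.20, 0, 1.00, 1.02, 0.05,
    0.05, 1, 0.80, 0.85, 0.05,
    0.01, 3, 0.70, 0.90, 0.05,
    0.00, 6, 0.65, 0.97, 0.06),
  ncol = 5, byrow = TRUE,
  dimnames = list(NULL, c("CP", "nsplit", "rel error", "xerror", "xstd"))
)

# Row with the smallest cross-validated error (column 4).
best_row <- which.min(cptable[, 4])

# Complexity parameter of that row (column 1); this is the value
# that would be passed to prune().
opcp <- cptable[best_row, 1]
print(opcp)
```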
Compute the "branches" to be drawn for a causalTree object
Description
Compute the "branches" to be drawn for a causalTree object
Usage
causalTree.branch(x, y, node, branch)
Arguments
x |
covariates |
y |
outcome |
node |
node of the fitted tree |
branch |
branch of the fitted tree |
Value
number of branches to be drawn
Intermediate function for causalTree
Description
Intermediate function for causalTree
Usage
causalTree.control(
minsplit = 20L,
minbucket = round(minsplit/3),
cp = 0,
maxcompete = 4L,
maxsurrogate = 5L,
usesurrogate = 2L,
xval = 10L,
surrogatestyle = 0L,
maxdepth = 30L,
...
)
Arguments
minsplit |
minimum number of splits |
minbucket |
minimum number of bucket |
cp |
default is 0 |
maxcompete |
maximum number of compete |
maxsurrogate |
maximum number of surrogate |
usesurrogate |
initial number of surrogate |
xval |
cross-validation |
surrogatestyle |
the style of surrogate |
maxdepth |
Maximum depth |
... |
arguments to |
Value
A list of control parameters used in causalTree.
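The usage above shows that minbucket is derived from minsplit when it is not given. A minimal base R sketch of that default follows; the helper name default_minbucket is hypothetical and only mirrors the expression in the signature.

```r
# Hypothetical helper that mirrors the default `minbucket = round(minsplit/3)`.
default_minbucket <- function(minsplit = 20L) {
  round(minsplit / 3)
}

# With the default minsplit of 20, 20 / 3 is about 6.67, which rounds to 7.
mb_default <- default_minbucket()

# A larger minsplit raises the derived bucket size accordingly.
mb_30 <- default_minbucket(30L)

print(c(mb_default, mb_30))
```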
Intermediate function for causalTree
Description
Intermediate function for causalTree
Usage
causalTree.matrix(frame)
Arguments
frame |
inherited from data.frame |
Value
A covariate matrix used in the causal regression.
Intermediate function for causalTree
Description
This routine sets up the callback code for user-written split routines in causalTree
Usage
causalTreecallback(mlist, nobs, init)
Arguments
mlist |
a list of user written methods |
nobs |
number of observations |
init |
function name |
Value
split method written by users
Intermediate function for causalTree
Description
Compute the x-y coordinates for a tree
Usage
causalTreeco(tree, parms)
Arguments
tree |
an |
parms |
parms |
Value
the x-y coordinates for a tree
Clear Temporary Files
Description
The files for Shiny are saved in a temporary directory. They can be cleared manually with the 'clearTemp()' function, or they are cleared automatically when you close R.
Usage
clearTemp()
Value
No return value; called to unlink the files under the temporary folder.
Intermediate function for causalTree
Description
Run down the built tree and get the final leaf ids for estimation sample
Usage
est.causalTree(fit, x)
Arguments
fit |
an |
x |
covariates |
Value
Intermediate estimation results for a causalTree object.
Estimate Causal Tree
Description
Estimate causal effects within the leaves of a fitted causalTree using a new data frame.
Usage
estimate.causalTree(
object,
data,
weights,
treatment,
na.action = na.causalTree
)
Arguments
object |
A tree-structured fit |
data |
New data frame to be used for estimating effects within leaves. |
weights |
optional case weights. |
treatment |
The treatment status of observations in the new dataframe, where 1 represents treated and 0 represents control. |
na.action |
the default action deletes all observations for which
|
Details
When a leaf contains only treated or only control cases, the function traces back recursively to the leaf's parent node until it reaches a node that can be used to compute the causal effect. See Athey and Imbens, Machine Learning Methods for Estimating Heterogeneous Causal Effects (2015), for details.
Value
Intermediate estimation results for a causalTree object.
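The fallback described in Details can be sketched in base R. The sketch assumes rpart-style node numbering (the children of node n are 2n and 2n + 1, so the parent is n %/% 2); the data and the helper names is_descendant and effect_at are hypothetical, and this is an illustration of the idea rather than the package's implementation.

```r
# Leaf id, treatment status and outcome for six estimation observations.
leaf      <- c(4, 4, 5, 5, 3, 3)
treatment <- c(1, 1, 1, 0, 1, 0)
y         <- c(5, 7, 6, 2, 9, 4)

# TRUE when `node` is `leaf_id` itself or one of its ancestors.
is_descendant <- function(leaf_id, node) {
  while (leaf_id > node) leaf_id <- leaf_id %/% 2
  leaf_id == node
}

# Difference in mean outcomes at `node`; if one group is missing,
# climb to the parent node and try again.
effect_at <- function(node) {
  in_node <- vapply(leaf, is_descendant, logical(1), node = node)
  treated <- in_node & treatment == 1
  control <- in_node & treatment == 0
  if (any(treated) && any(control)) {
    mean(y[treated]) - mean(y[control])
  } else if (node > 1) {
    effect_at(node %/% 2)
  } else {
    NA_real_  # even the root lacks one of the two groups
  }
}

# Leaf 4 holds only treated cases, so its estimate comes from node 2,
# which pools leaves 4 and 5.
effects <- vapply(c(3, 4, 5), effect_at, numeric(1))
print(effects)
```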
Intermediate function for causalTree
Description
Intermediate function for causalTree
Usage
formatg(x, digits = getOption("digits"), format = paste0("%.", digits, "g"))
Arguments
x |
input training data |
digits |
number of digits to be kept |
format |
format of exported vector |
Value
No return value, called for formatting the exported estimates
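The default format argument in the usage above builds a "%g" style format string from digits. The base R sketch below shows what such a format string does when passed to sprintf; it is not a call to formatg itself, and the input values are made up.

```r
digits <- 3

# Same construction as the default in the usage: "%.3g" for digits = 3.
fmt <- paste0("%.", digits, "g")

# "%g" keeps `digits` significant digits and switches to scientific
# notation for very large or very small magnitudes.
out <- sprintf(fmt, c(3.14159, 0.000123456))
print(fmt)
print(out)
```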
Get the Current Working Directory
Description
Gets the current working directory and sets it as the default directory in which the Shiny files are saved temporarily.
Usage
getDefaultPath()
Value
a temporary file path
Getting Distribution in Treatment and Control Groups
Description
Getting the density of distribution in treatment and control groups, which will be displayed in the
Usage
getDensities(treatment, outcome)
Arguments
treatment |
A character representing the name of treatment indicator. |
outcome |
A character representing the name of outcome variable. |
Value
vector of corresponding densities for each value of outcome vector
Causal Effect Regression and Estimation Trees: One-step honest estimation
Description
Fit a causalTree model to get an honest causal tree, with the tree structure
built on the training sample (including cross-validation) and the leaf
estimates taken from the estimation sample. Returns an rpart object.
Usage
honest.causalTree(
formula,
data,
weights,
treatment,
subset,
est_data,
est_weights,
est_treatment,
est_subset,
na.action = na.causalTree,
split.Rule,
split.Honest,
HonestSampleSize,
split.Bucket,
bucketNum = 10,
bucketMax = 40,
cv.option,
cv.Honest,
minsize = 2L,
model = FALSE,
x = FALSE,
y = TRUE,
propensity,
control,
split.alpha = 0.5,
cv.alpha = 0.5,
cv.gamma = 0.5,
split.gamma = 0.5,
cost,
...
)
Arguments
formula |
a formula, with a response and features but
no interaction terms. If this is a data frame, it is taken as
the model frame (see |
data |
an optional data frame that includes the variables named in the formula. |
weights |
optional case weights. |
treatment |
a vector that indicates the treatment status of each observation. 1 represents treated and 0 represents control. Only binary treatment supported in this version. |
subset |
optional expression saying that only a subset of the rows of the data should be used in the fit. |
est_data |
data frame to be used for leaf estimates; the estimation sample. Must contain the variables used in training the tree. |
est_weights |
optional case weights for estimation sample |
est_treatment |
treatment vector for estimation sample. Must be same length as estimation data. A vector indicates the treatment status of the data, 1 represents treated and 0 represents control. Only binary treatment supported in this version. |
est_subset |
optional expression saying that only a subset of the rows of the estimation data should be used in the fit of the re-estimated tree. |
na.action |
the default action deletes all observations for which
|
split.Rule |
causalTree splitting options, one of |
split.Honest |
boolean option, |
HonestSampleSize |
number of observations anticipated to be used in honest re-estimation after building the tree. This enters the risk function used in both splitting and cross-validation. |
split.Bucket |
boolean option, |
bucketNum |
number of observations in each bucket when set
|
bucketMax |
Option to choose maximum number of buckets to use in
splitting when set |
cv.option |
cross validation options, one of |
cv.Honest |
boolean option, |
minsize |
in order to split, each leaf must have at least
|
model |
model frame of |
x |
keep a copy of the |
y |
keep a copy of the dependent variable in the result. If
missing and |
propensity |
propensity score used in |
control |
a list of options that control details of the
|
split.alpha |
scale parameter between 0 and 1, used in splitting
risk evaluation function for |
cv.alpha |
scale parameter between 0 and 1, used in cross validation
risk evaluation function for |
cv.gamma , split.gamma |
optional parameters used in evaluating policies. |
cost |
a vector of non-negative costs, one for each variable in the model. Defaults to one for all variables. These are scalings to be applied when considering splits, so the improvement on splitting on a variable is divided by its cost in deciding which split to choose. |
... |
arguments to |
Value
An object of class rpart. See rpart.object.
References
Breiman L., Friedman J. H., Olshen R. A., and Stone, C. J. (1984) Classification and Regression Trees. Wadsworth.
Athey, S and G Imbens (2016) Recursive Partitioning for Heterogeneous Causal Effects. http://arxiv.org/abs/1504.01132
See Also
causalTree, estimate.causalTree, rpart.object, summary.rpart, rpart.plot
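The idea of honest estimation (structure from the training sample, leaf estimates from the estimation sample) can be sketched in base R without the package. The simulated data, the single-threshold "tree" and the helper diff_means are all assumptions made for illustration; honest.causalTree builds a full tree with its own splitting rules.

```r
set.seed(42)
n <- 400
d <- data.frame(x = runif(n), w = rbinom(n, 1, 0.5))
# Simulated outcome: the treatment effect is 2 when x > 0.5, else 0.
d$y <- ifelse(d$x > 0.5, 2, 0) * d$w + rnorm(n)

train <- d[1:200, ]    # used only to choose the split
est   <- d[201:400, ]  # used only to estimate leaf effects

# Difference in mean outcomes between treated and control rows.
diff_means <- function(g) mean(g$y[g$w == 1]) - mean(g$y[g$w == 0])

# Stand-in for tree building: choose the threshold on x that best
# separates the two leaf effects, using the training sample only.
candidates <- seq(0.2, 0.8, by = 0.1)
gap <- sapply(candidates, function(s) {
  abs(diff_means(train[train$x > s, ]) - diff_means(train[train$x <= s, ]))
})
threshold <- candidates[which.max(gap)]

# Honest step: re-estimate the leaf effects on the estimation sample.
honest <- c(left  = diff_means(est[est$x <= threshold, ]),
            right = diff_means(est[est$x > threshold, ]))
print(threshold)
print(honest)
```

Because the estimation rows play no part in choosing the threshold, the leaf estimates are not affected by the selection of the split.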
honest re-estimation and change the frame of object using estimation sample
Description
honest re-estimation and change the frame of object using estimation sample
Usage
honest.est.causalTree(fit, x, wt, treatment, y)
Arguments
fit |
an |
x |
input training data |
wt |
optional weights |
treatment |
treatment variable |
y |
outcome variable |
Value
An object of class rpart. See rpart.object.
honest re-estimation and change the frame of object using estimation sample
Description
honest re-estimation and change the frame of object using estimation sample
Usage
honest.est.rparttree(fit, x, wt, y)
Arguments
fit |
an |
x |
input training data |
wt |
optional weights |
y |
outcome variable |
Value
Intermediate estimation results for an honest estimation of causalTree.
Honest recursive partitioning Tree
Description
The recursive partitioning function, for R
Usage
honest.rparttree(
formula,
data,
weights,
subset,
est_data,
est_weights,
na.action = na.rpart,
method,
model = FALSE,
x = FALSE,
y = TRUE,
parms,
control,
cost,
...
)
Arguments
formula |
a formula, with a response and features but
no interaction terms. If this is a data frame, it is taken as
the model frame (see |
data |
an optional data frame that includes the variables named in the formula. |
weights |
optional case weights. |
subset |
optional expression saying that only a subset of the rows of the data should be used in the fit. |
est_data |
data frame to be used for leaf estimates; the estimation sample. Must contain the variables used in training the tree. |
est_weights |
optional case weights for estimation sample |
na.action |
the default action deletes all observations for which
|
method |
one of Alternatively, |
model |
model frame of |
x |
keep a copy of the |
y |
keep a copy of the dependent variable in the result. If
missing and |
parms |
optional parameters for the splitting function. |
control |
a list of options that control details of the
|
cost |
a vector of non-negative costs, one for each variable in the model. Defaults to one for all variables. These are scalings to be applied when considering splits, so the improvement on splitting on a variable is divided by its cost in deciding which split to choose. |
... |
arguments to |
Value
An object of class rpart after running an honest recursive partitioning tree.
Estimate Heterogeneous Treatment Effect via Causal Tree
Description
Estimate heterogeneous treatment effect via causal tree. In each leaf, the treatment effect is the difference of mean outcome in treatment group and control group.
Usage
hte_causalTree(
outcomevariable,
minsize = 20,
crossvalidation = 20,
data,
treatment_indicator,
ps_indicator,
covariates,
negative = FALSE,
drawplot = TRUE,
varlabel = NULL,
maintitle = "Heterogeneous Treatment Effect Estimation",
legend.x = 0.08,
legend.y = 0.25,
check = FALSE,
...
)
Arguments
outcomevariable |
a character representing the column name of the outcome variable. |
minsize |
the minimum number of observations in each leaf. The default is set as 20. |
crossvalidation |
number of cross validations. The default is set as 20. |
data |
a data frame containing the variables in the model. |
treatment_indicator |
a character representing the column name of the treatment indicator. |
ps_indicator |
a character representing the column name of the propensity score. |
covariates |
a vector of column names of all covariates (linear terms and propensity score). |
negative |
a logical value indicating whether we expect the treatment effect to be negative. The default is set as FALSE. |
drawplot |
a logical value indicating whether to plot the model as part of the output. The default is set as TRUE. |
varlabel |
a named vector containing variable labels. |
maintitle |
a character string indicating the main title displayed when plotting the tree and results. The default is set as "Heterogeneous Treatment Effect Estimation". |
legend.x , legend.y |
x and y coordinates used to position the legend. The default is set as (0.08, 0.25). |
check |
if TRUE, generates 100 trees and outputs most common tree structures and their frequency |
... |
further arguments passed to or from other methods. |
Value
predicted treatment effect and the associated tree
Examples
library(rpart)
library(htetree)
hte_causalTree(outcomevariable="outcome",
data=data.frame("confounder"=c(0, 1, 1, 0, 1, 1),
"treatment"=c(0,0,0,1,1,1),
"prop_score"=c(0.4, 0.4, 0.5, 0.6, 0.6, 0.7),
"outcome"=c(1, 2, 2, 1, 4, 4)),
treatment_indicator = "treatment",
ps_indicator = "prop_score",
covariates = "confounder")
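The leaf-level estimate named in the Description, the difference in mean outcomes between treated and control units, can be written out in base R. The sketch reuses the example data and assumes, purely for illustration, that the tree splits on confounder so that each value of confounder is one leaf.

```r
d <- data.frame(
  confounder = c(0, 1, 1, 0, 1, 1),
  treatment  = c(0, 0, 0, 1, 1, 1),
  outcome    = c(1, 2, 2, 1, 4, 4)
)

# Hypothetical leaf assignment: one leaf per value of the confounder.
leaf <- d$confounder

# Treated mean minus control mean within each leaf.
leaf_effect <- sapply(split(d, leaf), function(g) {
  mean(g$outcome[g$treatment == 1]) - mean(g$outcome[g$treatment == 0])
})
print(leaf_effect)
```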
Estimate Heterogeneous Treatment Effect via Random Forest
Description
Estimate heterogeneous treatment effect via random forest. In each leaf, the treatment effect is the difference of mean outcome weighted by inverse propensity scores in treatment group and control group.
Usage
hte_forest(
outcomevariable,
minsize = 20,
crossvalidation = 20,
data = edurose_mediation_20181126,
treatment_indicator = "compcoll25",
ps_indicator = "propsc_com25",
ps_linear = "propsc_com25lin",
covariates = c(linear_terms, ps_indicator),
negative = FALSE,
drawplot = TRUE,
legend.x = 0.08,
legend.y = 0.25,
gf,
...
)
Arguments
outcomevariable |
a character representing the column name of the outcome variable. |
minsize |
the minimum number of observations in each leaf. The default is set as 20. |
crossvalidation |
number of cross validations. The default is set as 20. |
data |
a data frame containing the variables in the model. |
treatment_indicator |
a character representing the column name of the treatment indicator. |
ps_indicator |
a character representing the column name of the propensity score. |
ps_linear |
a character representing name of a column that stores linearized propensity scores. |
covariates |
a vector of column names of all covariates (linear terms and propensity score). |
negative |
a logical value indicating whether we expect the treatment effect to be negative. The default is set as FALSE. |
drawplot |
a logical value indicating whether to plot the model as part of the output. The default is set as TRUE. |
legend.x , legend.y |
x and y coordinates used to position the legend. The default is set as (0.08, 0.25). |
gf |
a fitted generalized random forest object |
... |
further arguments passed to or from other methods. |
Value
A list with three elements. The first is the predicted outcome for each unit.
The second is a causalTree object with the tree split information. The third
is a data.frame summarizing the prediction results.
Estimate Heterogeneous Treatment Effect via Adjusted Causal Tree
Description
Estimate heterogeneous treatment effect via adjusted causal tree. In each leaf, the treatment effect is the difference of mean outcome weighted by inverse propensity scores in treatment group and control group.
Usage
hte_ipw(
outcomevariable,
minsize = 20,
crossvalidation = 20,
data,
treatment_indicator,
ps_indicator,
ps_linear = NULL,
covariates,
negative = FALSE,
drawplot = TRUE,
varlabel = NULL,
maintitle = "Heterogeneous Treatment Effect Estimation",
legend.x = 0.08,
legend.y = 0.25,
check = FALSE,
...
)
Arguments
outcomevariable |
a character representing the column name of the outcome variable. |
minsize |
the minimum number of observations in each leaf. The default is set as 20. |
crossvalidation |
number of cross validations. The default is set as 20. |
data |
a data frame containing the variables in the model. |
treatment_indicator |
a character representing the column name of the treatment indicator. |
ps_indicator |
a character representing the column name of the propensity score. |
ps_linear |
a character representing name of a column that stores linearized propensity scores. |
covariates |
a vector of column names of all covariates (linear terms and propensity score). |
negative |
a logical value indicating whether we expect the treatment effect to be negative. The default is set as FALSE. |
drawplot |
a logical value indicating whether to plot the model as part of the output. The default is set as TRUE. |
varlabel |
a named vector containing variable labels. |
maintitle |
a character string indicating the main title displayed when plotting the tree and results. The default is set as "Heterogeneous Treatment Effect Estimation". |
legend.x , legend.y |
x and y coordinates used to position the legend. The default is set as (0.08, 0.25). |
check |
if TRUE, generates 100 trees and outputs most common tree structures and their frequency |
... |
further arguments passed to or from other methods. |
Value
predicted treatment effect and the associated tree
Examples
library(rpart)
library(htetree)
hte_ipw(outcomevariable="outcome",
data=data.frame("confounder"=c(0, 1, 1, 0, 1, 1),
"treatment"=c(0,0,0,1,1,1), "prop_score"=c(0.4, 0.4, 0.5, 0.6, 0.6, 0.7),
"outcome"=c(1, 2, 2, 1, 4, 4)), treatment_indicator = "treatment",
ps_indicator = "prop_score", covariates = "confounder")
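The inverse-propensity-weighted difference in means named in the Description can be sketched in base R. The sketch uses the normalized form of the estimator (weighted means within each group), which is an assumption; the package may compute the weighted estimate differently. The helper name ipw_effect is hypothetical.

```r
# Weighted treated mean minus weighted control mean.
# Treated units get weight 1 / p, control units 1 / (1 - p).
ipw_effect <- function(y, w, p) {
  treated_mean <- weighted.mean(y[w == 1], 1 / p[w == 1])
  control_mean <- weighted.mean(y[w == 0], 1 / (1 - p[w == 0]))
  treated_mean - control_mean
}

# Data from the example, treated as a single leaf.
y <- c(1, 2, 2, 1, 4, 4)
w <- c(0, 0, 0, 1, 1, 1)
p <- c(0.4, 0.4, 0.5, 0.6, 0.6, 0.7)

weighted_effect <- ipw_effect(y, w, p)

# With a constant propensity score the weights cancel and the estimate
# equals the plain difference in means.
unweighted_effect <- ipw_effect(y, w, rep(0.5, length(y)))
print(c(weighted_effect, unweighted_effect))
```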
Estimate Heterogeneous Treatment Effect via Adjusted Causal Tree
Description
Estimate heterogeneous treatment effect via adjusted causal tree. In each leaf, the treatment effect is estimated by nearest-neighbor (NN) matching.
Usage
hte_match(
outcomevariable,
minsize = 20,
crossvalidation = 20,
data,
treatment_indicator,
ps_indicator,
ps_linear = NULL,
covariates,
negative = FALSE,
drawplot = TRUE,
con.num = 1,
varlabel = NULL,
maintitle = "Heterogeneous Treatment Effect Estimation",
legend.x = 0.08,
legend.y = 0.25,
check = FALSE,
...
)
Arguments
outcomevariable |
a character representing the column name of the outcome variable. |
minsize |
the minimum number of observations in each leaf. The default is set as 20. |
crossvalidation |
number of cross validations. The default is set as 20. |
data |
a data frame containing the variables in the model. |
treatment_indicator |
a character representing the column name of the treatment indicator. |
ps_indicator |
a character representing the column name of the propensity score. |
ps_linear |
a character representing name of a column that stores linearized propensity scores. |
covariates |
a vector of column names of all covariates (linear terms and propensity score). |
negative |
a logical value indicating whether we expect the treatment effect to be negative. The default is set as FALSE. |
drawplot |
a logical value indicating whether to plot the model as part of the output. The default is set as TRUE. |
con.num |
a number indicating the number of units from control groups to be used in matching. |
varlabel |
a named vector containing variable labels. |
maintitle |
a character string indicating the main title displayed when plotting the tree and results. The default is set as "Heterogeneous Treatment Effect Estimation". |
legend.x , legend.y |
x and y coordinates used to position the legend. The default is set as (0.08, 0.25). |
check |
if TRUE, generates 100 trees and outputs most common tree structures and their frequency |
... |
further arguments passed to or from other methods. |
Value
predicted treatment effect and the associated tree
Examples
library(rpart)
library(htetree)
hte_match(outcomevariable="outcome",
data=data.frame("x1"=c(0, 1, 1, 0, 1, 1),"x2"=c(3, 2, 1, 5, 7, 1),
"treatment"=c(0,0,0,1,1,1), "prop_score"=c(0.4, 0.4, 0.5, 0.6, 0.6, 0.7),
"outcome"=c(1, 2, 2, 1, 4, 4)), treatment_indicator = "treatment",
ps_indicator = "prop_score", covariates = c("x1","x2"))
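Nearest-neighbor matching inside a leaf can be sketched in base R. The sketch matches each treated unit to the single control unit with the closest propensity score (with replacement) and averages the outcome differences. This is a simplified stand-in, not the matching routine the package uses, and the helper name nn_match_att is hypothetical.

```r
# One-to-one nearest-neighbor matching on the propensity score.
nn_match_att <- function(y, w, p) {
  treated <- which(w == 1)
  control <- which(w == 0)
  # For each treated unit, index of the control with the closest score.
  matched <- vapply(treated, function(i) {
    control[which.min(abs(p[control] - p[i]))]
  }, integer(1))
  mean(y[treated] - y[matched])
}

# Data from the example, treated as a single leaf.
y <- c(1, 2, 2, 1, 4, 4)
w <- c(0, 0, 0, 1, 1, 1)
p <- c(0.4, 0.4, 0.5, 0.6, 0.6, 0.7)

# Every treated unit is closest to the control with score 0.5 (outcome 2),
# so the matched differences are -1, 2 and 2.
att <- nn_match_att(y, w, p)
print(att)
```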
Visualize the Estimated Results
Description
The function hte_plot takes a model created by causal tree, as well as the
adjusted version, and plots the distribution of the outcome variable in
treated and control groups in each leaf of the tree. This visualization aims
to show how the predicted treatment effect changes with each split in the tree.
Usage
hte_plot(
model,
data,
treatment_indicator = NULL,
outcomevariable,
propensity_score,
plot.title = "Visualization of the Tree"
)
Arguments
model |
a tree model constructed by |
data |
a data frame containing the variables in the model. |
treatment_indicator |
a character representing the column name for the treatment variable in the causal setup. |
outcomevariable |
a character representing the column name of the outcome variable. |
propensity_score |
a character representing the column name of the propensity score. |
plot.title |
character representing the main title of the plot. |
Value
no return value
Visualize the Estimated Results
Description
The function hte_plot_line takes a model created by causal tree, as well as
the adjusted version, and plots the different least squares models used to
estimate heterogeneous treatment effects (HTE) at each node. At each node,
this visualization aims to show how the estimated treatment effect differs
when using ordinary least squares and weighted least squares methods. The
weighted least squares method in this package uses inverse propensity scores
as weights, in order to reduce bias due to confounding variables.
Usage
hte_plot_line(
model,
data,
treatment_indicator = NULL,
outcomevariable,
propensity_score,
plot.title = "Visualization of the Tree",
gamma = 0,
lambda = 0,
...
)
Arguments
model |
a tree model constructed by |
data |
a data frame containing the variables in the model. |
treatment_indicator |
a character representing the column name for the treatment variable in the causal setup. |
outcomevariable |
a character representing the column name of the outcome variable. |
propensity_score |
a character representing the column name of the propensity score. |
plot.title |
character representing the main title of the plot. |
gamma , lambda |
numbers indicating the bias level used in sensitivity analysis |
... |
further arguments passed to or from other methods. |
Value
No return value, used for plotting the estimated results with lines.
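The comparison between ordinary and weighted least squares described above can be reproduced in base R with lm. The data are the small example data used elsewhere in this manual, treated as a single node; the column name ipw is an assumption introduced for the sketch, and the plotting itself is not shown.

```r
d <- data.frame(
  treatment  = c(0, 0, 0, 1, 1, 1),
  prop_score = c(0.4, 0.4, 0.5, 0.6, 0.6, 0.7),
  outcome    = c(1, 2, 2, 1, 4, 4)
)

# Inverse propensity weights: 1 / p for treated, 1 / (1 - p) for control.
d$ipw <- ifelse(d$treatment == 1, 1 / d$prop_score, 1 / (1 - d$prop_score))

# Ordinary least squares: the slope is the plain difference in means.
ols <- lm(outcome ~ treatment, data = d)

# Weighted least squares: the slope is the weighted difference in means.
wls <- lm(outcome ~ treatment, data = d, weights = ipw)

ols_effect <- unname(coef(ols)["treatment"])
wls_effect <- unname(coef(wls)["treatment"])
print(c(ols = ols_effect, wls = wls_effect))
```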
Intermediate function for causalTree
Description
Intermediate function for causalTree
Usage
htetree.anova(y, offset, wt)
Arguments
y |
outcome variable |
offset |
this can be used to specify an a priori known
component
to be included in the linear predictor during fitting. This should be
|
wt |
optional weights |
Value
No return value.
Calculate variable importance
Description
Each primary split is credited with the value of splits$improve. Each surrogate split gets split$adj times the primary split's value.
Usage
importance(fit)
Arguments
fit |
a fitted |
Value
Same as the importance function in rpart.
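The crediting rule in the Description can be worked through on a small table. The splits table below is made up and its layout (node, type, improve, adj columns) is an assumption chosen for readability; it is not the structure of the splits matrix in a fitted object.

```r
# Hypothetical splits: node 1 splits on x1 with surrogates x2 and x3,
# node 2 splits on x2 with surrogate x1.
splits <- data.frame(
  node     = c(1, 1, 1, 2, 2),
  variable = c("x1", "x2", "x3", "x2", "x1"),
  type     = c("primary", "surrogate", "surrogate", "primary", "surrogate"),
  improve  = c(10, NA, NA, 4, NA),
  adj      = c(NA, 0.5, 0.2, NA, 0.25)
)

# Improvement of the primary split at each node, indexed by node id.
primary <- splits[splits$type == "primary", ]
primary_improve <- setNames(primary$improve, primary$node)

# Primary splits get their own improvement; surrogates get adj times
# the improvement of the primary split at the same node.
credit <- ifelse(splits$type == "primary",
                 splits$improve,
                 splits$adj * primary_improve[as.character(splits$node)])

# Sum the credits per variable and sort from most to least important.
var_importance <- sort(tapply(credit, splits$variable, sum), decreasing = TRUE)
print(var_importance)
```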
Causal Effect Regression and Estimation Forests (Tree Ensembles)
Description
Build a random causal forest by fitting a user-selected number of
causalTree models to get an ensemble of rpart objects.
Usage
init.causalForest(
formula,
data,
treatment,
weights = FALSE,
cost = FALSE,
num.trees,
ncov_sample
)
## S3 method for class 'causalForest'
predict(object, newdata, predict.all = FALSE, type = "vector", ...)
causalForest(
formula,
data,
treatment,
na.action = na.causalTree,
split.Rule = "CT",
double.Sample = TRUE,
split.Honest = TRUE,
split.Bucket = FALSE,
bucketNum = 5,
bucketMax = 100,
cv.option = "CT",
cv.Honest = TRUE,
minsize = 2L,
propensity,
control,
split.alpha = 0.5,
cv.alpha = 0.5,
sample.size.total = floor(nrow(data)/10),
sample.size.train.frac = 0.5,
mtry = ceiling(ncol(data)/3),
nodesize = 1,
num.trees = nrow(data),
cost = FALSE,
weights = FALSE,
ncolx,
ncov_sample
)
Arguments
formula |
a formula, with a response and features but no
interaction terms. If this is a data frame, it is taken as the model frame
(see |
data |
an optional data frame that includes the variables named in the formula. |
treatment |
a vector that indicates the treatment status of each observation. 1 represents treated and 0 represents control. Only binary treatment supported in this version. |
weights |
optional case weights. |
cost |
a vector of non-negative costs, one for each variable in the model. Defaults to one for all variables. These are scalings to be applied when considering splits, so the improvement on splitting on a variable is divided by its cost in deciding which split to choose. |
num.trees |
Number of trees to be built in the causal forest |
ncov_sample |
Number of covariates randomly sampled to build each tree in the forest |
object |
a |
newdata |
new data to predict |
predict.all |
If TRUE, return the predicted individual effect for each observation. Otherwise, return the average effect. |
type |
the type of returned object |
... |
arguments to |
na.action |
the default action deletes all observations for which
|
split.Rule |
causalTree splitting options, one of |
double.Sample |
boolean option, |
split.Honest |
boolean option, |
split.Bucket |
boolean option, |
bucketNum |
number of observations in each bucket when set
|
bucketMax |
Option to choose maximum number of buckets to use in
splitting when set |
cv.option |
cross validation options, one of |
cv.Honest |
boolean option, |
minsize |
in order to split, each leaf must have at least
|
propensity |
propensity score used in |
control |
a list of options that control details of the
|
split.alpha |
scale parameter between 0 and 1, used in splitting
risk evaluation function for |
cv.alpha |
scale parameter between 0 and 1, used in cross validation
risk evaluation function for |
sample.size.total |
Sample size used to build each tree in the forest (sampled randomly with replacement). |
sample.size.train.frac |
Fraction of the sample size used for building each tree (training). For example, if sample.size.total is 1000 and sample.size.train.frac is 0.5, then 500 samples will be used to build the tree and the other 500 samples will be used to evaluate the tree. |
mtry |
Number of data features used to build a tree (This variable is not used presently). |
nodesize |
Minimum number of observations for treated and control cases in one leaf node |
ncolx |
Total number of covariates |
Details
CausalForest builds an ensemble of CausalTrees (See Athey and Imbens,
Recursive Partitioning for Heterogeneous Causal
Effects (2016)), by repeated random sampling of the data with replacement.
Further, each tree is built using a randomly sampled subset of all available
covariates. A causal forest object is a list of trees. To predict, call R's
predict function with new test data and the causalForest object (estimated
on the training data) obtained after calling the causalForest function.
During the prediction phase, the average value over all tree predictions
is returned as the final prediction by default.
To return the predictions of each tree in the forest for each test
observation, set the flag predict.all=TRUE.
causalTree differs from the rpart function in the rpart package in its
splitting rules and cross-validation methods. See Athey and Imbens,
Recursive Partitioning for Heterogeneous Causal Effects (2016), and
Wager and Athey, Estimation and Inference of Heterogeneous Treatment
Effects using Random Forests, for details.
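The per-tree sampling and the default averaging described in Details can be sketched in base R. The number of rows, the per-tree prediction matrix and the variable names train.size, est.size and tree.pred are assumptions for illustration; the sketch does not call causalForest and only mirrors the arithmetic of the documented arguments.

```r
set.seed(1)
n <- 10000                                   # hypothetical number of rows

# Defaults from the usage: a tenth of the data per tree, half for training.
sample.size.total <- floor(n / 10)
sample.size.train.frac <- 0.5
train.size <- round(sample.size.train.frac * sample.size.total)
est.size <- sample.size.total - train.size

# Draw the rows for one tree with replacement, then split them into the
# part that builds the tree and the part that evaluates it.
rows <- sample(n, sample.size.total, replace = TRUE)
train.idx <- rows[seq_len(train.size)]
est.idx <- rows[-seq_len(train.size)]

# Made-up predictions: rows are test observations, columns are trees.
tree.pred <- matrix(c(1, 2, 3,
                      2, 4, 6), nrow = 2, byrow = TRUE)

# Default forest prediction: the average over all trees.
forest.pred <- rowMeans(tree.pred)
print(c(train.size, est.size))
print(forest.pred)
```

Keeping the full tree.pred matrix instead of its row means corresponds to asking for the per-tree predictions.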
Value
An object of class rpart. See rpart.object.
References
Breiman L., Friedman J. H., Olshen R. A., and Stone, C. J. (1984) Classification and Regression Trees. Wadsworth.
Athey, S and G Imbens (2016) Recursive Partitioning for Heterogeneous Causal Effects. http://arxiv.org/abs/1504.01132
Wager, S and S Athey (2015) Estimation and Inference of Heterogeneous Treatment Effects using Random Forests. http://arxiv.org/abs/1510.04342
See Also
causalTree, honest.causalTree, rpart.control, rpart.object, summary.rpart, rpart.plot
Visualize Causal Tree and the Estimated Results
Description
An intermediate function used for plotting
Usage
makeplots(
negative,
opfit. = opfit,
trainset,
covariates,
outcomevariable,
data. = data,
hte_effect_setup,
varlabel,
maintitle,
legend.x = 0.8,
legend.y = 0.25,
...
)
Arguments
negative |
a logical value indicating whether we expect the treatment effect to be negative. The default is set as FALSE. |
opfit. |
tree structure generated from causal tree algorithm. |
trainset |
a data frame containing only the variables used in the model, with missing values listwise deleted. |
covariates |
a vector of column names of all covariates (linear terms and propensity score). |
outcomevariable |
a character representing the column name of the outcome variable. |
data. |
a data frame containing the variables in the model. |
hte_effect_setup |
an empty list to store the adjusted treatment effect. |
varlabel |
a named vector containing variable labels. |
maintitle |
a character string indicating the main title displayed when plotting the tree and results. The default is set as "Heterogeneous Treatment Effect Estimation". |
legend.x , legend.y |
x and y coordinates to position the legend. The default is set as (0.8, 0.25). |
... |
further arguments passed to or from other methods. |
Value
A plot visualizing the tree and estimated treatment effect in each node.
NN Matching in Leaves
Description
This intermediate function is used to adjust the heterogeneous treatment effect estimated in each leaf with NN matching.
Usage
matchinleaves(
trainset = match_data,
covariates = covariates,
outcomevariable = outcomevariable,
hte_effect_setup = hte_effect_setup,
treatment_indicator,
con.num = 1,
...
)
Arguments
trainset |
a data frame containing only the variables used in the model, with missing values listwise deleted. |
covariates |
a vector of column names of all covariates (linear terms and propensity score). |
outcomevariable |
a character representing the column name of the outcome variable. |
hte_effect_setup |
an empty list to store the adjusted treatment effect. |
treatment_indicator |
a character representing the column name of the treatment indicator. |
con.num |
a number indicating the number of units from the control group to be used in matching. |
... |
further arguments passed to or from other methods. |
Value
A list for summarizing the results after matching.
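To make the idea of nearest-neighbor (NN) matching within a leaf concrete, here is a minimal base R sketch. It matches each treated unit in one simulated leaf to its con.num closest control units on the propensity score and averages the outcome differences. The helper nn_match_effect, the column names, and the simulated leaf are assumptions for illustration; this is not the routine that matchinleaves calls internally.

```r
set.seed(2)

# Simulated leaf: 20 treated and 60 control units with a propensity score (ps).
leaf <- data.frame(
  treat = rep(c(1, 0), c(20, 60)),
  ps    = c(runif(20, 0.4, 0.8), runif(60, 0.1, 0.8))
)
# Outcome depends on ps, so a raw difference in means is confounded.
leaf$y <- 1.5 * leaf$treat + 2 * leaf$ps + rnorm(80, sd = 0.1)

# Hypothetical helper: NN matching (with replacement) on the propensity score.
nn_match_effect <- function(d, con.num = 1) {
  treated <- d[d$treat == 1, ]
  control <- d[d$treat == 0, ]
  diffs <- vapply(seq_len(nrow(treated)), function(i) {
    # Indices of the con.num controls closest to treated unit i.
    nearest <- order(abs(control$ps - treated$ps[i]))[seq_len(con.num)]
    treated$y[i] - mean(control$y[nearest])
  }, numeric(1))
  mean(diffs)
}

matched <- nn_match_effect(leaf, con.num = 1)
naive   <- mean(leaf$y[leaf$treat == 1]) - mean(leaf$y[leaf$treat == 0])
```

In this simulation the true effect is 1.5, so comparing matched with naive shows how matching within the leaf can pull the estimate back toward the true value.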
Intermediate function for causalTree
Description
Get the model frame of a causalTree object, in the same way as for rpart.
Usage
## S3 method for class 'causalTree'
model.frame(formula, ...)
Arguments
formula |
a formula, with a response but no interaction terms. If this is a data frame, it is taken as the model frame (see model.frame). |
... |
arguments to |
Value
a model frame for causalTree.
Intermediate function for causalTree
Description
Required when missing values are included in the sample.
Usage
na.causalTree(x)
Arguments
x |
covariates |
Value
No return value, used for handling missing values when they are included in the sample.
Intermediate function for hte_plot_line
Description
Plots the different least squares models used to estimate heterogeneous treatment effects (HTE) at each node. At each node, this visualization shows how the estimated treatment effect differs between ordinary least squares and weighted least squares. The weighted least squares method in this package uses inverse propensity scores as weights to reduce bias due to confounding variables.
Usage
plotOutcomes(
treatment,
outcome,
propscores,
confInt = TRUE,
colbyWt = FALSE,
ylab = "",
xlab = "",
title = "",
gamma = 0,
lambda = 0,
...
)
Arguments
treatment |
a character representing the column name for the treatment variable in the causal setup |
outcome |
a character representing the column name of the outcome variable. |
propscores |
a character representing the column name of the propensity score. |
confInt |
a logical value indicating whether to add the 95% confidence interval. The default is set as TRUE. |
colbyWt |
a logical value indicating whether the points are colored according to inverse propensity scores. The default is set as FALSE. |
xlab , ylab , title |
Characters representing the name for x axis, y axis, and main title for each node. |
gamma , lambda |
numbers indicating the bias level used in sensitivity analysis |
... |
further arguments passed to or from other methods. |
Value
A summary table after adjusting the estimates with inverse probability weighting (ipw).
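The contrast between ordinary and inverse-propensity-weighted least squares can be sketched with base R's lm(). The simulated data and the weight formula (1/ps for treated units, 1/(1 - ps) for controls) are assumptions for illustration; the exact weights and model used inside plotOutcomes may differ.

```r
set.seed(3)
n <- 2000
x     <- rnorm(n)                 # confounder
ps    <- plogis(x)                # propensity score P(treat = 1 | x)
treat <- rbinom(n, 1, ps)
y     <- 1 * treat + 2 * x + rnorm(n)   # true treatment effect is 1

# Assumed inverse propensity weights (ATE-style).
ipw <- ifelse(treat == 1, 1 / ps, 1 / (1 - ps))

# Ordinary least squares: ignores confounding by x.
ols <- coef(lm(y ~ treat))["treat"]

# Weighted least squares: reweights units by inverse propensity scores.
wls <- coef(lm(y ~ treat, weights = ipw))["treat"]

round(c(ols = unname(ols), wls = unname(wls)), 2)
```

Because x raises both the chance of treatment and the outcome, the unweighted estimate is pushed away from the true effect, while the weighted estimate should land closer to it.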
Visualize Causal Tree and Treatment Effects via Shiny
Description
Visualize Causal Tree and Treatment Effects via Shiny
Usage
runDynamic(
model,
data,
outcomevariable,
treatment_indicator,
propensity_score = ""
)
Arguments
model |
a tree model constructed by |
data |
a data frame containing the variables in the model. |
outcomevariable |
a character representing the column name of the outcome variable. |
treatment_indicator |
a character representing the column name for the treatment variable in the causal setup. |
propensity_score |
a character representing the column name of the propensity score. |
Value
a Shiny page.
Save Javascript Embedded in Shiny App
Description
Save Javascript Embedded in Shiny App
Usage
saveBCSS(filePath)
Arguments
filePath |
a character string representing the path name to save the files temporarily. |
Value
No return value. It is used to save necessary files temporarily to run Shiny App.
Save Necessary Files to Run Shiny App
Description
This function saves the files needed to run the Shiny app, which visualizes the causal tree and the estimated heterogeneous treatment effects interactively.
Usage
saveFiles(
model,
data,
outcomevariable,
treatment_indicator,
propensity_score = "",
filePath = ""
)
Arguments
model |
a tree model constructed by |
data |
a data frame containing the variables in the model. |
outcomevariable |
a character representing the column name of the outcome variable. |
treatment_indicator |
a character representing the column name for the treatment variable in the causal setup. |
propensity_score |
a character representing the column name of the propensity score. |
filePath |
a character string representing the path name to save the files temporarily. |
Value
No return value. It is used to save necessary files temporarily to run Shiny App.
Save CSS File Embedded in Shiny App
Description
Save CSS File Embedded in Shiny App
Usage
saveGCSS(filePath)
Arguments
filePath |
a character string representing the path name to save the files temporarily. |
Value
No return value. It is used to save necessary files temporarily to run Shiny App.
Save HTML Index Embedded in Shiny App
Description
Save HTML Index Embedded in Shiny App
Usage
saveInd(filePath)
Arguments
filePath |
a character string representing the path name to save the files temporarily. |
Value
No return value. It is used to save necessary files temporarily to run Shiny App.
Save Shiny Server Temporarily
Description
Save Shiny Server Temporarily
Usage
saveServ(filePath)
Arguments
filePath |
a character string representing the path name to save the files temporarily. |
Value
No return value. It is used to save necessary files temporarily to run Shiny App.
Save Shiny UI Temporarily
Description
Save Shiny UI Temporarily
Usage
saveUI(filePath)
Arguments
filePath |
a character string representing the path name to save the files temporarily. |
Value
No return value. It is used to save necessary files temporarily to run Shiny App.
A Simulated Dataset
Description
A simulated dataset inherited from the causalTree package.
Usage
simulation.1
Format
A data frame with 500 observations on the following 12 variables.
x1
a numeric vector
x2
a numeric vector
x3
a numeric vector
x4
a numeric vector
x5
a numeric vector
x6
a numeric vector
x7
a numeric vector
x8
a numeric vector
x9
a numeric vector
x10
a numeric vector
y
a numeric vector
treatment
a numeric vector
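Given the column layout listed above, a model formula with a response and the ten covariates (and no interaction terms) could be assembled as follows. The data frame standin is a randomly generated placeholder with the same 500 x 12 shape, used only so the sketch runs without the package installed; it does not reproduce the values in simulation.1.

```r
set.seed(4)

# Placeholder data with the same column names as simulation.1 (values are random).
standin <- as.data.frame(matrix(rnorm(500 * 10), ncol = 10))
names(standin) <- paste0("x", 1:10)
standin$y <- rnorm(500)
standin$treatment <- rbinom(500, 1, 0.5)

# Build y ~ x1 + ... + x10 from the covariate names.
covariates <- paste0("x", 1:10)
fml <- reformulate(covariates, response = "y")

dim(standin)
fml
```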