| Type: | Package |
| Title: | Factor-Augmented Clustering Tree |
| Date: | 2025-11-10 |
| Version: | 0.1.0 |
| Description: | Implements the Factor-Augmented Clustering Tree (FACT) algorithm for clustering time series data. The method constructs a classification tree where splits are determined by covariates, and the splitting criterion is based on a group factor model representation of the time series within each node. Both threshold-based and permutation-based tests are supported for splitting decisions, with an option for parallel computation. For methodological details, see Hu, Li, Luo, and Wang (2025, in preparation), Factor-Augmented Clustering Tree for Time Series. |
| License: | MIT + file LICENSE |
| Encoding: | UTF-8 |
| Depends: | R (≥ 3.5.0) |
| RoxygenNote: | 7.3.3 |
| Imports: | irlba, foreach, doParallel, parallel, doRNG, mvtnorm |
| Suggests: | knitr, rmarkdown, testthat (≥ 3.0.0) |
| NeedsCompilation: | no |
| Packaged: | 2025-12-05 03:53:21 UTC; hujiaqi |
| Author: | Jiaqi Hu [cre, aut], Ting Li [ctb] (Assisted with methodology), Zidan Luo [ctb] (Assisted with methodology), Xueqin Wang [ctb] (Assisted with methodology) |
| Maintainer: | Jiaqi Hu <hujiaqi@mail.ustc.edu.cn> |
| Repository: | CRAN |
| Date/Publication: | 2025-12-10 21:30:12 UTC |
Correlation-Based Clustering Tree
Description
Builds a binary tree for clustering time series data based on covariates. The splitting criterion minimizes the average absolute Pearson correlation between time series across child nodes.
Usage
COR(X, Y, control = list())
Arguments
X |
A numeric matrix of covariates with dimension N \times p (one row per time series). |
Y |
A numeric matrix of time series data with dimension T \times N (one column per time series). |
control |
A list of control parameters for tree construction:
|
Details
The algorithm recursively partitions the data by finding splits that minimize the average absolute correlation between time series in different child nodes. Statistical significance of each split is assessed via a permutation test.
At each node, the optimal split is found by exhaustively searching over all covariates and candidate split points. The permutation test shuffles the time series labels to generate a null distribution for the test statistic.
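The criterion and the permutation test described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper names `cross_abs_cor` and `perm_pvalue` are hypothetical, the sketch tests one fixed split rather than re-running the split search on each permutation, and it assumes `left` and `right` together cover every column of `Y`.

```r
# Average absolute Pearson correlation between series in two child nodes.
# Y is a T x N matrix (columns are series); left/right are column indices.
cross_abs_cor <- function(Y, left, right) {
  mean(abs(cor(Y[, left, drop = FALSE], Y[, right, drop = FALSE])))
}

# Permutation p-value for a fixed split (hypothetical helper).
# Series labels are shuffled while child sizes stay fixed; a small statistic
# signals heterogeneity, so we count shuffles at least as small as observed.
perm_pvalue <- function(Y, left, right, R = 99) {
  observed <- cross_abs_cor(Y, left, right)
  n <- ncol(Y)
  n_left <- length(left)
  null_stats <- replicate(R, {
    shuffled <- sample(n)
    cross_abs_cor(Y, shuffled[seq_len(n_left)], shuffled[-seq_len(n_left)])
  })
  (1 + sum(null_stats <= observed)) / (R + 1)
}

# Toy data: two groups of 10 series, each driven by its own factor.
set.seed(1)
T_len <- 100
f1 <- rnorm(T_len)
f2 <- rnorm(T_len)
Y <- cbind(
  sapply(1:10, function(i) f1 + 0.5 * rnorm(T_len)),
  sapply(1:10, function(i) f2 + 0.5 * rnorm(T_len))
)
stat <- cross_abs_cor(Y, 1:10, 11:20)
pval <- perm_pvalue(Y, 1:10, 11:20, R = 99)
```

With `R = 99` permutations the smallest attainable p-value is 1/100, which is why the number of permutations bounds the resolution of the test.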
Value
An object of class "FACT" containing:
frame |
A data frame describing the tree structure, with one row per node containing the split variable, split value, test statistic, and p-value. A smaller test statistic suggests more heterogeneity between child nodes. |
membership |
An integer vector of length N indicating the terminal node assignment for each observation. |
control |
The control parameters used. |
terms |
Metadata including covariate names and data dimensions. |
See Also
FACT for factor model-based clustering,
gendata for generating synthetic data,
print.FACT and plot.FACT for visualization.
Examples
# Generate synthetic data
data <- gendata(seed = 42, T = 100, N = c(50, 50, 50, 50))
# Build correlation-based tree
result <- COR(data$X, data$Y, control = list(R = 99, alpha = 0.05))
# Examine results
print(result)
plot(result)
table(result$membership, data$group)
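The cross-tabulation in the last line of the example can be condensed into a single agreement score. The sketch below computes cluster purity, a common summary that is not part of the package. The `membership` and `truth` vectors are made-up values for illustration only.

```r
# Hypothetical terminal-node labels and true group labels for 10 series.
membership <- c(3, 3, 3, 4, 4, 6, 6, 6, 7, 7)
truth <- c(1, 1, 1, 2, 2, 3, 3, 4, 4, 4)

# Contingency table: rows are terminal nodes, columns are true groups.
tab <- table(membership, truth)

# Purity: share of series that belong to the majority group of their node.
purity <- sum(apply(tab, 1, max)) / length(truth)
```

Purity always lies in (0, 1] and rises as the tree grows more leaves, so it is best compared across trees with the same number of terminal nodes.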
Factor-Augmented Clustering Tree
Description
Builds a binary tree for clustering time series data based on covariates, using a group factor model framework. The splitting criterion evaluates whether child nodes exhibit distinct factor structures.
Usage
FACT(
X,
Y,
r_a = 8,
r_b = 4,
method = c("threshold", "permutation"),
control = list()
)
Arguments
X |
A numeric matrix of covariates with dimension N \times p (one row per time series). |
Y |
A numeric matrix of time series data with dimension T \times N (one column per time series). |
r_a |
A positive integer specifying the number of singular vectors to extract from each child node for constructing the projection matrices, default is 8. |
r_b |
A positive integer specifying the number of leading singular values
to sum for the split statistic. Must satisfy |
method |
Character string specifying the splitting decision rule:
|
control |
A list of control parameters for tree construction:
|
Details
The FACT algorithm clusters time series by recursively partitioning them based on their underlying factor structures. At each node, the method:
Searches for the optimal split across all covariates and candidate points.
Computes a test statistic based on the overlap of factor spaces between the two child nodes.
Decides whether to split using either a threshold rule or permutation test.
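One way to read the second step is as a comparison of the subspaces spanned by the leading singular vectors of each child node. The sketch below is an assumption about the form of the statistic, not the package's formula: it takes `r_a` left singular vectors per child, then sums the `r_b` largest singular values of their cross-product (the cosines of the principal angles between the two subspaces). The helper names `leading_space` and `overlap_stat` are hypothetical.

```r
# Leading r_a left singular vectors of a T x n matrix (columns are series).
leading_space <- function(Y, r_a) {
  svd(Y, nu = r_a, nv = 0)$u
}

# Hypothetical overlap statistic: sum of the r_b largest singular values of
# U_L' U_R. Each value is a cosine of a principal angle, so it lies in [0, 1].
overlap_stat <- function(Y_L, Y_R, r_a, r_b) {
  U_L <- leading_space(Y_L, r_a)
  U_R <- leading_space(Y_R, r_a)
  s <- svd(crossprod(U_L, U_R), nu = 0, nv = 0)$d
  sum(s[seq_len(r_b)])
}

# Toy panels: 30 series driven by the given factors plus small noise.
set.seed(2)
T_len <- 200
n <- 30
r <- 2
fac1 <- matrix(rnorm(T_len * r), T_len, r)
fac2 <- matrix(rnorm(T_len * r), T_len, r)
make_panel <- function(fac) {
  loadings <- matrix(rnorm(n * ncol(fac)), n)
  fac %*% t(loadings) + 0.1 * matrix(rnorm(T_len * n), T_len, n)
}

stat_same <- overlap_stat(make_panel(fac1), make_panel(fac1), r_a = 2, r_b = 2)
stat_diff <- overlap_stat(make_panel(fac1), make_panel(fac2), r_a = 2, r_b = 2)
```

Under this reading, children that share a factor space give a statistic near `r_b`, while children with unrelated factors give a much smaller value. That matches the documented convention that a smaller statistic indicates stronger heterogeneity.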
Value
An object of class "FACT" containing:
frame |
A data frame describing the tree structure, with one row per node. Includes split variable, split value, test statistic, and p-value (if applicable). A smaller test statistic indicates stronger evidence of heterogeneous factor structures between child nodes. |
membership |
An integer vector of length N indicating the terminal node assignment for each observation. |
control |
The control parameters used. |
terms |
Metadata including covariate names, data dimensions, and the values of r_a and r_b. |
method |
The splitting method used. |
References
Hu, J., Li, T., Luo, Z., & Wang, X. (2025, in preparation). Factor-Augmented Clustering Tree for Time Series.
See Also
COR for correlation-based clustering,
gendata for generating synthetic data,
print.FACT and plot.FACT for visualization.
Examples
data <- gendata(seed = 123, T = 200, N = c(50, 50, 50, 50))
tree1 <- FACT(data$X, data$Y, r_a = 8, r_b = 4, method = "threshold")
print(tree1)
Calculate Split Statistic
Description
Computes the split statistic for a candidate partition based on the overlap of factor spaces between two child nodes.
Usage
calculate_split_statistic(Y_L, Y_R, r_a, r_b)
Arguments
Y_L |
Matrix of time series in the proposed left node. |
Y_R |
Matrix of time series in the proposed right node. |
r_a |
Number of singular vectors to compute. |
r_b |
Number of eigenvalues to sum for the statistic. |
Value
The calculated split statistic. Returns 'Inf' if a node is too small.
Find the Best Split for a Node
Description
Internal function that iterates over all candidate split points in all covariates and returns the split that minimizes the split statistic.
Usage
find_best_split(X_node, Y_node, indices, r_a, r_b, control)
Arguments
X_node |
Covariate matrix for the current node. |
Y_node |
Time series matrix for the current node. |
indices |
Original indices of the observations in the current node. |
r_a |
Parameter passed to 'calculate_split_statistic'. |
r_b |
Parameter passed to 'calculate_split_statistic'. |
control |
A list of control parameters used when searching for a split. |
Value
A list containing the best split found, including the statistic, split variable, split value, and indices for the left and right children.
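The exhaustive search this function performs can be sketched generically. The code below is an illustration under stated assumptions rather than the package's internals: `search_best_split`, its `stat_fun` argument, the `min_size` guard, and the use of midpoints between distinct covariate values as candidate thresholds are all hypothetical choices.

```r
# Exhaustive search over covariates and candidate thresholds.
# X_node: n x p covariates; Y_node: T x n series (columns match rows of X_node).
# stat_fun(Y_L, Y_R) returns a numeric statistic; smaller is better.
search_best_split <- function(X_node, Y_node, stat_fun, min_size = 5) {
  best <- list(statistic = Inf, variable = NA_integer_, value = NA_real_)
  for (j in seq_len(ncol(X_node))) {
    x <- X_node[, j]
    vals <- sort(unique(x))
    # Midpoints between consecutive distinct values (empty if x is constant).
    cuts <- (vals[-length(vals)] + vals[-1]) / 2
    for (thr in cuts) {
      left <- which(x <= thr)
      right <- which(x > thr)
      # Skip splits that would leave a child node too small.
      if (length(left) < min_size || length(right) < min_size) next
      s <- stat_fun(Y_node[, left, drop = FALSE], Y_node[, right, drop = FALSE])
      if (s < best$statistic) {
        best <- list(statistic = s, variable = j, value = thr,
                     left = left, right = right)
      }
    }
  }
  best
}

# Toy node: two groups separated by the first covariate only.
set.seed(3)
T_len <- 100
n <- 40
group <- rep(1:2, each = n / 2)
f <- matrix(rnorm(T_len * 2), T_len, 2)
Y <- f[, group] + 0.5 * matrix(rnorm(T_len * n), T_len, n)
X <- cbind(x1 = (group - 1) + runif(n), x2 = runif(n))

mean_abs_cor <- function(Y_L, Y_R) mean(abs(cor(Y_L, Y_R)))
best <- search_best_split(X, Y, mean_abs_cor)
```

Passing the statistic as a function keeps the search loop independent of the criterion, so the same loop could serve a correlation-based or a factor-based statistic.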
Find Best Split Using Correlation Criterion
Description
Searches for the optimal split point that minimizes the average absolute correlation between time series across child nodes.
Usage
find_best_split_COR(X_node, Y_node, indices, control)
Arguments
X_node |
Covariate matrix for the current node. |
Y_node |
Time series matrix for the current node. |
indices |
Original indices of the observations in the current node. |
control |
A list of control parameters used when searching for a split. |
Value
A list containing the best split found, including the statistic, split variable, split value, and indices for the left and right children.
Generate Synthetic Group Factor Model Data
Description
Generates synthetic time series data with a multi-group factor structure,
along with associated covariates. Useful for Monte Carlo simulation studies
of the FACT and COR algorithms.
Usage
gendata(
seed = 1,
T = 100,
N = c(100, 100, 100, 100),
r0 = 2,
r = c(2, 2, 2, 2),
M = 4,
sigma = 1,
p = 10,
mu = 3,
type_F = "Independent",
type_X = "Uniform",
type_noise = "Gaussian"
)
Arguments
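seed |
Integer. Random seed for reproducibility. Default: 1. |
T |
Integer. Number of time periods (rows in Y). Default: 100. |
N |
Integer vector of length M giving the number of time series in each group. Default: c(100, 100, 100, 100). |
r0 |
Integer. Number of global factors shared across all groups.
Default: 2. |
r |
Integer vector of length M giving the number of local factors in each group. Default: c(2, 2, 2, 2). |
M |
Integer. Number of groups. Default: 4. |
sigma |
Numeric. Standard deviation of the idiosyncratic noise.
Default: 1. |
p |
Integer. Number of covariates (columns in X). Default: 10. |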
mu |
Numeric. Controls separation between group covariate distributions
when |
type_F |
Character. Correlation structure for local factors:
|
type_X |
Character. Distribution for generating covariates:
|
type_noise |
Character. Distribution for idiosyncratic errors:
|
Details
The data generating process follows a group factor model:
Y_m = G \Lambda_m' + F_m \Gamma_m' + E_m, \quad m = 1, \ldots, M
where:
- G: T \times r_0 matrix of global factors (shared across groups)
- \Lambda_m: N_m \times r_0 global factor loadings for group m
- F_m: T \times r_m matrix of local factors for group m
- \Gamma_m: N_m \times r_m local factor loadings for group m
- E_m: T \times N_m idiosyncratic error matrix
Both global and local factors follow AR(1) processes with coefficient 0.5. Factor loadings are drawn from standard normal distributions.
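The data generating process above can be reproduced in a few lines of base R. This is a simplified sketch, not the body of `gendata`: the helper names `ar1_factors` and `simulate_groups` are hypothetical, and the sketch assumes standard normal AR(1) innovations, Gaussian noise, independent local factors, and no covariates.

```r
# AR(1) factors with coefficient phi: f_t = phi * f_{t-1} + e_t, e_t ~ N(0, 1).
# The initial value and innovation distribution are assumptions of this sketch.
ar1_factors <- function(T_len, r, phi = 0.5) {
  f <- matrix(0, T_len, r)
  f[1, ] <- rnorm(r)
  for (tt in 2:T_len) f[tt, ] <- phi * f[tt - 1, ] + rnorm(r)
  f
}

# Y_m = G Lambda_m' + F_m Gamma_m' + E_m for each group m, bound column-wise.
simulate_groups <- function(T_len, N, r0, r, sigma = 1) {
  G <- ar1_factors(T_len, r0)                       # global factors
  blocks <- lapply(seq_along(N), function(m) {
    Lambda_m <- matrix(rnorm(N[m] * r0), N[m], r0)      # global loadings
    F_m <- ar1_factors(T_len, r[m])                     # local factors
    Gamma_m <- matrix(rnorm(N[m] * r[m]), N[m], r[m])   # local loadings
    E_m <- sigma * matrix(rnorm(T_len * N[m]), T_len, N[m])
    G %*% t(Lambda_m) + F_m %*% t(Gamma_m) + E_m
  })
  list(Y = do.call(cbind, blocks), G = G,
       group = rep(seq_along(N), times = N))
}

set.seed(4)
sim <- simulate_groups(T_len = 100, N = c(30, 20), r0 = 1, r = c(2, 2))
dim(sim$Y)
```

The result has one column per series, so `sim$Y` is T \times N with N = 30 + 20 = 50, and `sim$group` records which block each column came from.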
Value
A list containing:
Y |
A T \times N numeric matrix of time series, where N = \sum_{m=1}^{M} N_m. |
X |
An N \times p numeric matrix of covariates. |
G |
The T \times r_0 matrix of true global factors. |
r0 |
Number of global factors. |
r |
Vector of local factor counts per group. |
group |
Integer vector of length N indicating true group membership (values 1 through M). |
Note
The default covariate generation (type_X = "Uniform" or "Gaussian")
assumes M = 4 groups with a specific hierarchical structure:
groups 1-2 vs 3-4 are separated by the first covariate, and within each pair,
groups are separated by additional covariates.
See Also
FACT for building factor-augmented clustering trees,
COR for correlation-based clustering.
Examples
data <- gendata(seed = 123, T = 200, N = c(100, 50, 50, 200), r0 = 1, r = c(2, 2, 2, 3), M = 4)
Y <- data$Y
X <- data$X
Plot a FACT Tree Object
Description
Generates a visual plot of the FACT tree structure.
Usage
## S3 method for class 'FACT'
plot(x, ...)
Arguments
x |
An object of class 'FACT', typically the result of a call to 'FACT()'. |
... |
Additional arguments (not used). |
Value
No return value, called for side effects (plotting).
Print a FACT Tree Object
Description
Provides a concise textual representation of the FACT tree structure.
Usage
## S3 method for class 'FACT'
print(x, ...)
Arguments
x |
An object of class 'FACT', typically the result of a call to 'FACT()'. |
... |
Additional arguments (not used). |
Value
Invisibly returns the original 'FACT' object.