---
title: "Fuzzy Spectral Clustering with Variable-Weighted Adjacency Matrices"
author: "Jesse S. Ghashti and John R. J. Thompson"
date: "`r format(Sys.Date())`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Fuzzy Spectral Clustering with Variable-Weighted Adjacency Matrices}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.align = "center",
  fig.width = 6,
  fig.height = 5,
  message = FALSE,
  warning = FALSE
)
library(mclust)
library(fclust)
library(ggplot2)
library(patchwork)
library(mvtnorm)
library(stats)
library(knitr)
library(np)
library(MASS)
library(rmarkdown)
```

# Introduction

The **FuzzySpec** package implements the **FVIBES** (Fuzzy Variable-Importance Based Eigenspace Separation) algorithm, a fuzzy spectral clustering procedure that incorporates variable-weighted distance metrics and adaptive adjacency matrix constructions. This package accompanies the paper _Variable-Weighted Adjacency Constructions for Fuzzy Spectral Clustering_ by Ghashti, Hare, and Thompson (2025).

The key features of this package include:

- a variable-weighted distance metric that automatically determines variable importance using nonparametric kernel density estimation,
- an adaptive adjacency construction framework with multiple options for building similarity graphs, including locally-adaptive scaling (Zelnik-Manor and Perona, 2004),
- clustering outputs that return fuzzy membership matrices rather than just hard cluster assignments, and
- built-in synthetic dataset generators for benchmarking fuzzy clustering algorithms.

## Package Overview

FVIBES clustering proceeds in three primary steps:

1. Build an adjacency matrix from the data using `make.adjacency()`.
2. Perform fuzzy spectral clustering using `fuzzy.spectral.clustering()`.
3. Optionally, examine the results with the 2D visualization function `plot.fuzzy()` or compare them to true class labels using `clustering.accuracy()`.

# Installation

Install the latest release version of **FuzzySpec** from [GitHub](https://github.com/ghashti-j/FuzzySpec) with the following:

```{r, eval = FALSE}
library(devtools)
install_github("ghashti-j/FuzzySpec")
library(FuzzySpec)
```

# Sample Usage

The basic steps using the built-in functions are provided below.

1. First, we generate the synthetic dataset `spirals`; see the help file for `gen.fuzzy()` for more options and information.

```{r, fig.align='center'}
set.seed(1)
data <- FuzzySpec::gen.fuzzy(n = 300, dataset = "spirals", noise = 0.15) # data generation
FuzzySpec::plot.fuzzy(data, plotFuzzy = TRUE, colorCluster = TRUE)       # plot data generating process
```

2. Build a variable-weighted locally-adaptive adjacency matrix, corresponding to the adjacency $\mathbf{W}^{(\text{vwla-id})}$ in Ghashti et al. (2025):

```{r, message = FALSE}
W <- FuzzySpec::make.adjacency(
  data = data$X,
  method = "vw",         # variable-weighted distances
  isLocWeighted = TRUE,  # locally-adaptive scaling
  scale = FALSE          # scaling not required for kernel methods
)
```

3. Perform fuzzy spectral clustering given the adjacency matrix $\mathbf{W}$, the number of clusters `k = 3`, and the commonly chosen fuzzy parameter `m = 1.5`. We display the first 5 rows of the membership matrix $\mathbf{U}$:

```{r}
res <- FuzzySpec::fuzzy.spectral.clustering(
  W = W,
  k = 3,
  m = 1.5,
  method = "CM"
)
res$u[1:5,]
```

4. We can compare the hard clustering results to the true class labels:

```{r}
acc <- FuzzySpec::clustering.accuracy(data$y, res$cluster)
cat("Clustering accuracy:", round(acc, 3), "\n")
```

5. We can compare the membership matrix $\mathbf{U}$ determined by FVIBES to the true probabilistic cluster memberships with the function `fari()`, which computes fuzzy generalizations of the Adjusted Rand Index (FARI) based on Frobenius inner products of membership matrices (Andrews, Browne and Hvingelby, 2022).

```{r}
far <- FuzzySpec::fari(data$U, res$u)
cat("FARI:", round(far, 3), "\n")
```

6. Finally, we can visualize the clustering results, where observations are colored by hard cluster label and sized according to the membership matrix $\mathbf{U}$:

```{r, fig.align='center'}
resDF <- list(
  X = data$X,
  U = res$u,
  y = factor(res$cluster),
  k = 3
)
FuzzySpec::plot.fuzzy(resDF, plotFuzzy = TRUE, colorCluster = TRUE)
```

## Adjacency Construction

See the respective help file for each function as needed; here we provide a basic overview of the arguments of `make.adjacency()`. This function allows for flexible adjacency matrix constructions based on Ghashti et al. (2025). The parameters are as follows:

* `method`: distance metric
    + `"eu"`: squared Euclidean distance
    + `"vw"`: variable-weighted distance using kernel density bandwidth estimation
* `isLocWeighted`: scaling approach
    + `TRUE`: locally-adaptive scaling (Zelnik-Manor & Perona, 2004)
    + `FALSE`: global scaling with parameter `sig`
* `isModWeighted`: apply similarity weightings
    + `ModMethod = "snn"`: shared nearest neighbors (Jarvis & Patrick, 1973)
    + `ModMethod = "sim"`: similarity-based weighting
    + `ModMethod = "both"`: combined SNN and SIM
* `isSparse`: returns a sparse matrix when using weightings

**References**

* Andrews, J. L., Browne, R., and C. D. Hvingelby (2022). On Assessments of Agreement Between Fuzzy Partitions. _Journal of Classification, 39_, 326–342.
* J. C. Bezdek (1981). _Pattern Recognition with Fuzzy Objective Function Algorithms_. Plenum Press, New York.
* K. R. Coombes (2025). _Thresher: Threshing and Reaping for Principal Components_. R package version 1.1.5.
* Ferraro, M. B., Giordani, P., and A. Serafini (2019). fclust: An R Package for Fuzzy Clustering. _The R Journal, 11_.
* Ghashti, J. S., Hare, W., and J. R. J. Thompson (2025). Variable-weighted adjacency constructions for fuzzy spectral clustering. Submitted.
* Hayfield, T., and J. S. Racine (2008). Nonparametric Econometrics: The np Package. _Journal of Statistical Software, 27_(5).
* Jarvis, R. A., and A. E. Patrick (1973). Clustering using a similarity measure based on shared near neighbors. _IEEE Transactions on Computers, 22_(11), 1025–1034.
* McLachlan, G., and T. Krishnan (2008). _The EM Algorithm and Extensions_, Second Edition. John Wiley & Sons.
* Ng, A., Jordan, M., and Y. Weiss (2001). On spectral clustering: Analysis and an algorithm. _Advances in Neural Information Processing Systems, 14_.
* Scrucca, L., Fraley, C., Murphy, T. B., and A. E. Raftery (2023). _Model-Based Clustering, Classification, and Density Estimation Using mclust in R_. Chapman & Hall.
* H. Wickham (2016). _ggplot2: Elegant Graphics for Data Analysis_. Springer-Verlag New York.
* Zelnik-Manor, L., and P. Perona (2004). Self-tuning spectral clustering. _Advances in Neural Information Processing Systems, 17_.
* Zhu, Q., Feng, J., and J. Huang (2016). Natural neighbor: A self-adaptive neighborhood method without parameter K. _Pattern Recognition Letters, 80_, 30–36.
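
**Supplementary sketch: locally-adaptive scaling**

The `isLocWeighted = TRUE` option listed under Adjacency Construction refers to the locally-adaptive scaling of Zelnik-Manor and Perona (2004). As a rough illustration of that idea, and not of the package internals, the base-R sketch below builds a self-tuning affinity from plain Euclidean distances. The helper `local_scaled_affinity()`, its argument `K`, and the toy data are hypothetical names introduced here; `K = 7` is the value suggested in the original paper and is not claimed to be a **FuzzySpec** default.

```{r}
# Illustrative sketch only; this is NOT the FuzzySpec implementation.
# Self-tuning affinity of Zelnik-Manor and Perona (2004):
#   W_ij = exp(-d_ij^2 / (sigma_i * sigma_j)),
# where sigma_i is the distance from point i to its K-th nearest neighbour.
local_scaled_affinity <- function(X, K = 7) {
  D <- as.matrix(dist(X))  # pairwise Euclidean distances
  # In each sorted row, position 1 is the point itself (distance 0),
  # so position K + 1 holds the distance to the K-th nearest neighbour.
  sigma <- apply(D, 1, function(d) sort(d)[K + 1])
  W <- exp(-D^2 / outer(sigma, sigma))  # locally scaled Gaussian affinity
  diag(W) <- 0                          # no self-loops
  W
}

set.seed(1)
X_toy <- matrix(rnorm(40), ncol = 2)    # 20 hypothetical 2-D points
W_toy <- local_scaled_affinity(X_toy, K = 7)
dim(W_toy)
isSymmetric(W_toy)
```

Because each pair of points is scaled by `sigma_i * sigma_j`, points in dense regions receive a narrow kernel while points in sparse regions receive a wider one, which is the motivation for local scaling over a single global parameter such as `sig`.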