---
title: "recor: Rearrangement Correlation Coefficient"
author: "Xinbo Ai"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{recor: Rearrangement Correlation Coefficient}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
**Pearson's $r$ is undoubtedly the gold measure for linear dependence. Now, it might be the gold measure also for nonlinear monotone dependence, if adjusted.**
## Overview
*recor* is an R package that implements the **Rearrangement Correlation Coefficient ($r^\#$)**, an adjusted version of Pearson's correlation coefficient designed to measure arbitrary monotone dependence, both linear and nonlinear. Classical correlation coefficients underestimate the strength of nonlinear monotone relationships. $r^\#$ addresses this by replacing the Cauchy-Schwarz bound on the covariance with a tighter bound derived from the rearrangement inequality, which extends the capture range from linear to arbitrary monotone dependence (Ai, 2024).
## Features
- 🎯 **Extended Capture Range**: From linear to arbitrary monotone dependence.
- 📊 **High Precision Measurement**: More accurate strength measurement than classical coefficients.
- 🔄 **Backward Compatibility**: Reverts to Pearson's $r$ in linear scenarios, and to Spearman's $\rho$ when calculated on ranks.
- 🚀 **Efficient Implementation**: Optimized computation with C++ backend.
- 📈 **Multiple Input Support**: Automatically handles various input types (vector, matrix, data.frame) consistently with `stats::cor()`.
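The backward-compatibility property can be checked numerically. The sketch below does not call the package: `r_sharp()` is a hypothetical helper that transcribes the definition of $r^\#$ (see Mathematical Definition) for two numeric vectors, and the simulated data are arbitrary. It assumes the data contain no ties, in which case both sorted rank vectors are $1, \dots, n$.

```r
# Hypothetical helper (not the package's exported function): a direct
# transcription of the definition of r# for two numeric vectors.
r_sharp <- function(x, y) {
  s_xy <- cov(x, y)
  # Sort y decreasingly only when the covariance is negative
  s_xy / abs(cov(sort(x), sort(y, decreasing = s_xy < 0)))
}

set.seed(1)
x <- rnorm(30)
y <- exp(x) + rnorm(30, sd = 0.5)  # continuous data, so ties are not expected

# On ranks without ties, the denominator equals the product of the rank
# standard deviations, so r# coincides with Spearman's rho.
on_ranks <- r_sharp(rank(x), rank(y))
spearman <- cor(x, y, method = "spearman")
all.equal(on_ranks, spearman)

# On an exact linear relationship, r# and Pearson's r agree (here both are -1).
x_lin <- c(1, 2, 3, 4, 5)
y_lin <- 3 - 2 * x_lin
all.equal(r_sharp(x_lin, y_lin), cor(x_lin, y_lin))
```

With tied values the sorted rank vectors of `x` and `y` can differ, so the equality with Spearman's $\rho$ should only be expected to hold exactly in the tie-free case shown here.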
## Quick Start
### Basic Usage
```r
library(recor)
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 6, 8, 10)
recor(x, y)
#> [1] 1
# Nonlinear monotone relationship
x <- c(1, 2, 3, 4, 5)
y <- c(1, 8, 27, 64, 125) # y = x^3
recor(x, y) # Higher value than Pearson's r
#> [1] 1
cor(x, y)
#> [1] 0.9431175
# Matrix example
set.seed(123)
mat <- matrix(rnorm(100), ncol = 5)
colnames(mat) <- LETTERS[1:5]
recor(mat) # 5x5 correlation matrix
#> A B C D E
#> A 1.00000000 -0.09511994 -0.1283021 0.1243721 -0.2328551
#> B -0.09511994 1.00000000 0.1022576 0.2381745 0.3780232
#> C -0.12830211 0.10225762 1.0000000 -0.1523651 -0.3603780
#> D 0.12437205 0.23817455 -0.1523651 1.0000000 -0.1289523
#> E -0.23285513 0.37802315 -0.3603780 -0.1289523 1.0000000
# Two matrices
mat1 <- matrix(rnorm(50), ncol = 5)
mat2 <- matrix(rnorm(50), ncol = 5)
recor(mat1, mat2) # 5x5 cross-correlation matrix
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 0.0001379295 0.019273397 -0.14776094 -0.01203410 0.14712263
#> [2,] -0.0850363746 0.135125063 -0.10799623 0.35026884 0.20233183
#> [3,] -0.2825948208 -0.020383616 -0.31990514 -0.33267352 -0.48254414
#> [4,] 0.4067584970 -0.008022853 0.08223935 0.02728547 0.37567963
#> [5,] 0.5566966868 -0.059564374 0.03296252 0.22249817 -0.03009148
# data.frame
recor(iris[, 1:4])
#> Sepal.Length Sepal.Width Petal.Length Petal.Width
#> Sepal.Length 1.0000000 -0.1210250 0.9156110 0.8445397
#> Sepal.Width -0.1210250 1.0000000 -0.4628225 -0.3909946
#> Petal.Length 0.9156110 -0.4628225 1.0000000 0.9694665
#> Petal.Width 0.8445397 -0.3909946 0.9694665 1.0000000
```
## Theoretical Foundation
### Mathematical Definition
The rearrangement correlation coefficient is based on rearrangement inequality theorems that provide tighter bounds than the Cauchy-Schwarz inequality. Mathematically, for samples $x$ and $y$, it is defined as:
$$r^\#(x, y) = \frac{s_{x,y}}{\left| s_{x^\uparrow, y^\updownarrow} \right|}$$
where:
- $s_{x,y}$ is the sample covariance between $x$ and $y$;
- $x^\uparrow$ denotes the increasing rearrangement of $x$;
- $y^\updownarrow$ denotes either $y^\uparrow$ (the increasing rearrangement of $y$) if $s_{x,y} \ge 0$, or $y^\downarrow$ (the decreasing rearrangement of $y$) if $s_{x,y} < 0$.
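As a worked instance of the definition, the sketch below computes the numerator and denominator by hand for a small made-up sample in which $y$ decreases strictly but nonlinearly with $x$. It uses base R only, and the variable names are illustrative. It also prints the two upper bounds on $|s_{x,y}|$: the rearranged covariance, and the Cauchy-Schwarz bound $s_x s_y$, which sorting leaves unchanged.

```r
x <- c(1, 2, 3, 4)
y <- c(10, 3, 2, 1)               # strictly decreasing in x, but not linear

s_xy <- cov(x, y)                 # numerator: negative for this sample
# y-updown: decreasing rearrangement because s_xy < 0, increasing otherwise
y_arranged <- sort(y, decreasing = s_xy < 0)
s_sorted <- cov(sort(x), y_arranged)

r_sharp <- s_xy / abs(s_sorted)
r_sharp
#> [1] -1

# |s_xy| <= |s_sorted| <= sd(x) * sd(y): the rearrangement bound sits
# between the observed covariance and the Cauchy-Schwarz bound.
c(abs_cov = abs(s_xy),
  rearrangement_bound = abs(s_sorted),
  cauchy_schwarz_bound = sd(x) * sd(y))

cor(x, y)                         # Pearson's r is weaker than -1 here
```

Because the sample is already perfectly monotone, the rearranged covariance equals the observed one and $r^\#$ reaches $-1$, whereas Pearson's $r$ divides by the larger Cauchy-Schwarz bound and stays strictly above $-1$.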
### R Implementation
${r^\# }$ can be computed in R as follows:
```r
recor <- function(x, y = NULL) {
  # r# for a single pair of numeric vectors
  recor_vector <- function(x, y) {
    numerator <- cov(x, y)
    if (numerator >= 0) {
      # Non-negative covariance: pair increasing x with increasing y
      denominator <- abs(cov(
        sort(x, decreasing = FALSE),
        sort(y, decreasing = FALSE)
      ))
    } else {
      # Negative covariance: pair increasing x with decreasing y
      denominator <- abs(cov(
        sort(x, decreasing = FALSE),
        sort(y, decreasing = TRUE)
      ))
    }
    numerator / denominator
  }
  if (is.matrix(x) || is.data.frame(x)) {
    x <- as.matrix(x)
    if (is.null(y)) {
      # One matrix: r# between every pair of columns of x
      p <- ncol(x)
      result <- matrix(1, nrow = p, ncol = p)
      rownames(result) <- colnames(result) <- colnames(x)
      for (i in seq_len(p)) {
        for (j in seq_len(p)) {
          if (i < j) {
            # r# is symmetric, so each pair is computed once
            result[i, j] <- result[j, i] <- recor_vector(x[, i], x[, j])
          }
        }
      }
      return(result)
    } else if (is.matrix(y) || is.data.frame(y)) {
      # Two matrices: r# between columns of x and columns of y
      y <- as.matrix(y)
      if (nrow(x) != nrow(y)) {
        stop("The number of rows of x and y must be the same")
      }
      p <- ncol(x)
      q <- ncol(y)
      result <- matrix(0, nrow = p, ncol = q)
      rownames(result) <- colnames(x)
      colnames(result) <- colnames(y)
      for (i in seq_len(p)) {
        for (j in seq_len(q)) {
          result[i, j] <- recor_vector(x[, i], y[, j])
        }
      }
      return(result)
    }
  }
  # Two vectors: validate the inputs, then compute a single coefficient
  if (is.null(y)) {
    stop("y is needed when x is a vector")
  }
  if (length(x) != length(y)) {
    stop("x and y must have the same length")
  }
  if (length(x) < 2) {
    stop("x and y must have at least two elements")
  }
  recor_vector(x, y)
}
```
Note that the R implementation above is for illustration only. The *recor* package itself uses a highly optimized C++ backend for efficient computation.
### Intuitive Example
Do we need a new monotone measure, given that rank-based measures such as Spearman's $\rho$ can already measure monotone dependence? The answer is yes, in the sense that $r^\#$ has a higher resolution and is more accurate. As a simple example, let $x = (4, 3, 2, 1)$ and
- $y_1 = (5, 4, 3, 2)$
- $y_2 = (5, 4, 3, 3.25)$
- $y_3 = (5, 4, 3, 3.50)$
- $y_4 = (5, 4, 3, 3.75)$
- $y_5 = (5, 4, 3, 4.50)$
Clearly, $y_1$ and $x$ behave in exactly the same way: their values decrease step by step. From $y_2$ to $y_5$, the behavior of $y$ departs more and more from that of $x$. However, Spearman's $\rho$ takes the same value for $y_2$, $y_3$ and $y_4$, because these three vectors have identical ranks. In contrast, $r^\#$ takes a different value in each case and so reflects every one of these differences.
```r
x <- c(4, 3, 2, 1)
y_list <- list(y1 = c(5, 4, 3, 2.00),
y2 = c(5, 4, 3, 3.25),
y3 = c(5, 4, 3, 3.50),
y4 = c(5, 4, 3, 3.75),
y5 = c(5, 4, 3, 4.50))
# recor
lapply(y_list, recor, x)
#> $y1
#> [1] 1
#>
#> $y2
#> [1] 0.9259259
#>
#> $y3
#> [1] 0.8461538
#>
#> $y4
#> [1] 0.76
#>
#> $y5
#> [1] 0.3846154
# cor (Spearman's rho)
lapply(y_list, cor, x, method = "spearman")
#> $y1
#> [1] 1
#>
#> $y2
#> [1] 0.8
#>
#> $y3
#> [1] 0.8
#>
#> $y4
#> [1] 0.8
#>
#> $y5
#> [1] 0.4
```
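Spearman's $\rho$ depends on the data only through their ranks, which is why it cannot separate $y_2$, $y_3$ and $y_4$ in the example above. The short check below, using base R only, prints the ranks of each vector. The name `ranks` is illustrative.

```r
y_list <- list(y1 = c(5, 4, 3, 2.00),
               y2 = c(5, 4, 3, 3.25),
               y3 = c(5, 4, 3, 3.50),
               y4 = c(5, 4, 3, 3.75),
               y5 = c(5, 4, 3, 4.50))

# One column of ranks per vector
ranks <- sapply(y_list, rank)
ranks

# y2, y3 and y4 have the same ranks, so any rank-based measure ties them
identical(ranks[, "y2"], ranks[, "y3"]) &&
  identical(ranks[, "y3"], ranks[, "y4"])
#> [1] TRUE
```

Moving the last value from 3.25 to 3.75 changes the covariance with $x$ but not the ordering, so only a measure computed on the raw values, such as $r^\#$, registers the change.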
## References
Ai, X. (2024). Adjust Pearson's r to Measure Arbitrary Monotone Dependence. In *Advances in Neural Information Processing Systems* (Vol. 37, pp. 37385-37407).
## License
This project is licensed under GPL-3.
## Support & Feedback
- 📧 **Email Support**: axb@bupt.edu.cn
- 🐛 **Issue Reporting**: [GitHub Issues](https://github.com/byaxb/recor/issues)
- 📚 **Documentation**: [Complete Documentation](https://github.com/byaxb/recor/wiki)
## Citation
If you use this package in your research, please cite our work as:
```bibtex
@inproceedings{NEURIPS2024_41c38a83,
author = {Ai, Xinbo},
booktitle = {Advances in Neural Information Processing Systems},
editor = {A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang},
pages = {37385--37407},
publisher = {Curran Associates, Inc.},
title = {Adjust Pearson\textquotesingle s r to Measure Arbitrary Monotone Dependence},
url = {https://proceedings.neurips.cc/paper_files/paper/2024/file/41c38a83bd97ba28505b4def82676ba5-Paper-Conference.pdf},
volume = {37},
year = {2024}
}
```
---
*recor: Making Correlation Measurement More Accurate*