Type: | Package |
Title: | Lab for Matrix Completion and Imputation of Discrete Rating Data |
Version: | 0.1.0 |
Date: | 2025-07-25 |
Description: | Collection of methods for rating matrix completion, which is a statistical framework for recommender systems. Another relevant application is the imputation of rating-scale survey data in the social and behavioral sciences. Note that matrix completion and imputation are synonymous terms used in different streams of the literature. The main functionality implements robust matrix completion for discrete rating-scale data with a low-rank constraint on a latent continuous matrix (Archimbaud, Alfons, and Wilms (2025) <doi:10.48550/arXiv.2412.20802>). In addition, the package provides wrapper functions for 'softImpute' (Mazumder, Hastie, and Tibshirani, 2010, https://www.jmlr.org/papers/v11/mazumder10a.html; Hastie, Mazumder, Lee, Zadeh, 2015, https://www.jmlr.org/papers/v16/hastie15a.html) for easy tuning of the regularization parameter, as well as benchmark methods such as median imputation and mode imputation. |
License: | GPL (≥ 3) |
Encoding: | UTF-8 |
Depends: | R (≥ 3.5.0) |
Imports: | Rcpp, softImpute |
LinkingTo: | Rcpp, RcppArmadillo |
URL: | https://github.com/aalfons/RMCLab |
BugReports: | https://github.com/aalfons/RMCLab/issues |
Author: | Andreas Alfons |
Maintainer: | Andreas Alfons <alfons@ese.eur.nl> |
RoxygenNote: | 7.3.2 |
LazyData: | true |
NeedsCompilation: | yes |
Packaged: | 2025-07-25 18:46:20 UTC; alfons |
Repository: | CRAN |
Date/Publication: | 2025-07-28 19:00:09 UTC |
Lab for Matrix Completion and Imputation of Discrete Rating Data
Description
Collection of methods for rating matrix completion, which is a statistical framework for recommender systems. Another relevant application is the imputation of rating-scale survey data in the social and behavioral sciences. Note that matrix completion and imputation are synonymous terms used in different streams of the literature. The main functionality implements robust matrix completion for discrete rating-scale data with a low-rank constraint on a latent continuous matrix (Archimbaud, Alfons, and Wilms (2025) <doi:10.48550/arXiv.2412.20802>). In addition, the package provides wrapper functions for 'softImpute' (Mazumder, Hastie, and Tibshirani, 2010, <https://www.jmlr.org/papers/v11/mazumder10a.html>; Hastie, Mazumder, Lee, Zadeh, 2015, <https://www.jmlr.org/papers/v16/hastie15a.html>) for easy tuning of the regularization parameter, as well as benchmark methods such as median imputation and mode imputation.
Details
The DESCRIPTION file:
Package: | RMCLab |
Type: | Package |
Title: | Lab for Matrix Completion and Imputation of Discrete Rating Data |
Version: | 0.1.0 |
Date: | 2025-07-25 |
Description: | Collection of methods for rating matrix completion, which is a statistical framework for recommender systems. Another relevant application is the imputation of rating-scale survey data in the social and behavioral sciences. Note that matrix completion and imputation are synonymous terms used in different streams of the literature. The main functionality implements robust matrix completion for discrete rating-scale data with a low-rank constraint on a latent continuous matrix (Archimbaud, Alfons, and Wilms (2025) <doi:10.48550/arXiv.2412.20802>). In addition, the package provides wrapper functions for 'softImpute' (Mazumder, Hastie, and Tibshirani, 2010, <https://www.jmlr.org/papers/v11/mazumder10a.html>; Hastie, Mazumder, Lee, Zadeh, 2015, <https://www.jmlr.org/papers/v16/hastie15a.html>) for easy tuning of the regularization parameter, as well as benchmark methods such as median imputation and mode imputation. |
License: | GPL (>= 3) |
Encoding: | UTF-8 |
Depends: | R (>= 3.5.0) |
Imports: | Rcpp, softImpute |
LinkingTo: | Rcpp, RcppArmadillo |
URL: | https://github.com/aalfons/RMCLab |
BugReports: | https://github.com/aalfons/RMCLab/issues |
Authors@R: | c(person("Andreas", "Alfons", email = "alfons@ese.eur.nl", role = c("aut", "cre"), comment = c(ORCID = "0000-0002-2513-3788")), person("Aurore", "Archimbaud", role = "aut", comment = c(ORCID = "0000-0002-6511-9091"))) |
Author: | Andreas Alfons [aut, cre] (<https://orcid.org/0000-0002-2513-3788>), Aurore Archimbaud [aut] (<https://orcid.org/0000-0002-6511-9091>) |
Maintainer: | Andreas Alfons <alfons@ese.eur.nl> |
RoxygenNote: | 7.3.2 |
LazyData: | true |
Index of help topics:
MovieLensToy Toy example derived from the MovieLens 100K dataset RMCLab-package Lab for Matrix Completion and Imputation of Discrete Rating Data create_splits Create splits of observed data cells for hyperparameter tuning get_completed Extract the completed (imputed) data matrix get_lambda Extract the optimal value of the regularization parameter get_nb_iter Extract the number of iterations lambda_grid Construct grid of values for the regularization parameter median_impute Median imputation mode_impute Mode imputation rdmc Robust discrete matrix completion rdmc_tune Robust discrete matrix completion with hyperparameter tuning soft_impute Matrix completion via nuclear-norm regularization soft_impute_tune Matrix completion via nuclear-norm regularization with hyperparameter tuning validation_control Control objects for hyperparameter validation
Author(s)
Andreas Alfons [aut, cre] (<https://orcid.org/0000-0002-2513-3788>), Aurore Archimbaud [aut] (<https://orcid.org/0000-0002-6511-9091>)
Maintainer: Andreas Alfons <alfons@ese.eur.nl>
References
Archimbaud, A., Alfons, A., and Wilms, I. (2025) Robust Matrix Completion for Discrete Rating-Scale Data. arXiv:2412.20802. doi:10.48550/arXiv.2412.20802.
Hastie, T., Mazumder, R., Lee, J. D. and Zadeh, R. (2015) Matrix Completion and Low-Rank SVD via Fast Alternating Least Squares. Journal of Machine Learning Research, 16(104), 3367–3402.
Mazumder, R., Hastie, T. and Tibshirani, R. (2010) Spectral Regularization Algorithms for Learning Large Incomplete Matrices. Journal of Machine Learning Research, 11(80), 2287–2322.
See Also
Useful links:
Toy example derived from the MovieLens 100K dataset
Description
This dataset is a toy example derived from the MovieLens 100K dataset. The original data were collected via the MovieLens website (movielens.umn.edu) over a seven-month period, from September 19, 1997, to April 22, 1998.
Usage
data("MovieLensToy")
Format
A matrix with 149 rows and 33 variables. Rows represent users and
columns respresent movies. Each cell contains a rating from 1 to 5,
or NA
if the movie was not rated by the user.
The row names correspond to the user ID and the column names to the movie ID in the original MovieLens 100K dataset.
Details
The original MovieLens 100K dataset contains 100,000 ratings on a scale from 1 to 5 provided by 943 users for 1,682 movies.
This toy version has been curated to be used in instantaneously computable examples of the package documentation. It includes only users who provided at least 200 ratings and movies that were rated at least 300 times, ensuring a denser and more informative subset for testing and demonstration purposes.
Source
GroupLens Research, https://grouplens.org/datasets/movielens/100k/
References
F. Maxwell Harper and Joseph A. Konstan (2015) The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS), 5(4), Article 19. doi:10.1145/2827872.
Examples
data("MovieLensToy")
dim(MovieLensToy)
Create splits of observed data cells for hyperparameter tuning
Description
Split the observed cells of a data matrix into training and validation sets
for hyperparameter tuning. Methods are available for repeated holdout
validation and K
-fold cross-validation.
Usage
create_splits(indices, control)
holdout(indices, pct = 0.1, R = 10L)
cv_folds(indices, K = 5L)
Arguments
indices |
an integer vector giving the indices of observed cells in a data matrix. |
control |
a control object inheriting from class
|
pct |
numeric in the interval (0, 1); the percentage of observed cells in the data matrix to be randomly selected into the validation set (defaults to 0.1). |
R |
an integer giving the number of random splits into training and validation sets (defaults to 10). |
K |
an integer giving the number of cross-validation folds (defaults to 5). |
Details
Functions holdout()
and cv_folds()
are wrapper
functions that first call holdout_control()
and
cv_folds_control()
, respectively, before calling
create_splits()
.
Value
A list of index vectors giving the validation sets of the respective replication or cross-validation fold.
Author(s)
Andreas Alfons
See Also
holdout_control()
, cv_folds_control()
,
rdmc_tune()
, soft_impute_tune()
Examples
# toy example derived from MovieLens 100K dataset
data("MovieLensToy")
# set up validation sets so that methods use same data splits
set.seed(20250723)
observed <- which(!is.na(MovieLensToy))
holdout_splits <- holdout(observed, R = 5)
# robust discrete matrix completion with hyperparameter tuning
fit_RDMC <- rdmc_tune(
MovieLensToy,
lambda = fraction_grid(nb_lambda = 6),
splits = holdout_splits
)
# Soft-Impute with discretization step and hyperparameter tuning
fit_SI <- soft_impute_tune(
MovieLensToy,
lambda = fraction_grid(nb_lambda = 6, reverse = TRUE),
splits = holdout_splits
)
# extract optimal values of regularization parameter
get_lambda(fit_RDMC)
get_lambda(fit_SI)
Extract the completed (imputed) data matrix
Description
Extract the completed (i.e., imputed) data matrix from an object returned by a matrix completion algorithm.
Usage
get_completed(object, ...)
## S3 method for class 'rdmc_tuned'
get_completed(object, ...)
## S3 method for class 'rdmc'
get_completed(object, which, ...)
## S3 method for class 'soft_impute_tuned'
get_completed(object, discretized = NULL, ...)
## S3 method for class 'soft_impute'
get_completed(object, which, discretized = NULL, ...)
## S3 method for class 'median_impute'
get_completed(object, discretized = NULL, ...)
## S3 method for class 'mode_impute'
get_completed(object, ...)
get_imputed(object, ...)
Arguments
object |
an object returned by a matrix completion algorithm. |
... |
additional arguments to be passed down to methods. |
which |
an integer specifying the index of the regularization parameter for which to extract the completed data matrix. |
discretized |
a logical indicating if the completed data matrix with
or without the discretization step should be extracted. The default is
|
Value
The completed (i.e., imputed) data matrix.
Note
Matrix completion and imputation are synonymous terms used in
different streams of the literature, hence get_imputed()
is an
alias for get_completed()
with the same functionality.
Author(s)
Andreas Alfons and Aurore Archimbaud
See Also
rdmc_tune()
, soft_impute_tune()
,
median_impute()
, mode_impute()
Examples
# toy example derived from MovieLens 100K dataset
data("MovieLensToy")
# robust discrete matrix completion with hyperparameter tuning
set.seed(20250723)
fit <- rdmc_tune(MovieLensToy,
lambda = fraction_grid(nb_lambda = 6),
splits = holdout_control(R = 5))
# extract completed matrix with optimal regularization parameter
X_hat <- get_completed(fit)
head(X_hat)
# for more examples, see the help files of other functions for
# matrix completion and imputation methods
Extract the optimal value of the regularization parameter
Description
Extract the optimal value of the regularization parameter from an object returned by a matrix completion algorithm.
Usage
get_lambda(object, ...)
## S3 method for class 'rdmc_tuned'
get_lambda(object, ...)
## S3 method for class 'rdmc'
get_lambda(object, relative = TRUE, ...)
## S3 method for class 'soft_impute_tuned'
get_lambda(object, ...)
## S3 method for class 'soft_impute'
get_lambda(object, relative = TRUE, ...)
Arguments
object |
an object returned by a matrix completion algorithm with a regularization parameter. |
... |
additional arguments to be passed down to methods. |
relative |
logical; in case the values of the regularization parameter were given relative to a certain reference value computed from the data at hand, this allows to return the optimal value before or after multiplication with that reference value. |
Value
The optimal value of the regularization parameter.
Author(s)
Andreas Alfons
See Also
rdmc_tune()
, soft_impute_tune()
Examples
# toy example derived from MovieLens 100K dataset
data("MovieLensToy")
# robust discrete matrix completion with hyperparameter tuning
set.seed(20250723)
fit <- rdmc_tune(MovieLensToy,
lambda = fraction_grid(nb_lambda = 6),
splits = holdout_control(R = 5))
# extract optimal value of regularization parameter
get_lambda(fit)
# for more examples, see the help files of other functions for
# matrix completion and imputation methods
Extract the number of iterations
Description
Extract the number of iterations from an object returned by a matrix completion algorithm.
Usage
get_nb_iter(object, ...)
## S3 method for class 'rdmc_tuned'
get_nb_iter(object, ...)
## S3 method for class 'rdmc'
get_nb_iter(object, ...)
Arguments
object |
an object returned by an iterative matrix completion algorithm. |
... |
currently ignored. |
Value
The number of iterations performed in the iterative algorithm.
Author(s)
Andreas Alfons
See Also
Examples
# toy example derived from MovieLens 100K dataset
data("MovieLensToy")
# robust discrete matrix completion with hyperparameter tuning
set.seed(20250723)
fit <- rdmc_tune(MovieLensToy,
lambda = fraction_grid(nb_lambda = 6),
splits = holdout_control(R = 5))
# extract number of iterations with optimal regularization parameter
get_nb_iter(fit)
Construct grid of values for the regularization parameter
Description
Construct a grid of values for the regularization parameter in
rdmc()
or soft_impute()
.
Usage
fraction_grid(
min = 0.01,
max = 1,
nb_lambda = 10L,
log = TRUE,
reverse = FALSE
)
mult_grid(min = 0.05, factor = 1.5, nb_lambda = 10L)
Arguments
min |
numeric; the smallest value of the regularization parameter.
For |
max |
numeric; the largest value of the regularization parameter. It
must be in the interval ( |
nb_lambda |
a positive integer giving the number of values for the regularization parameter to be generated. |
log |
a logical indicating whether the grid of values should be on a
logarithmic scale (defaults to |
reverse |
a logical indicating whether the grid of values should be
in ascending order ( |
factor |
numeric; multiplication factor larger than 1 to be used to
construct the values of the regularization parameter. That is, the second
value is obtained by multiplying |
Details
Function fraction_grid()
generates a grid of values in the interval
(0, 1], either on a logarithmic or linear scale, which rdmc()
and soft_impute()
can relate to a certain reference value
computed from the data at hand.
Function mult_grid()
generates a multiplicative grid in which the
each value is obtained by multiplying the previous value with a specified
factor.
Value
A numeric vector of values for the regularization parameter.
See Also
rdmc()
, rdmc_tune()
,
soft_impute()
, soft_impute_tune()
Examples
fraction_grid()
fraction_grid(log = FALSE)
mult_grid(factor = 2)
Median imputation
Description
Perform median imputation. In case of discrete rating-scale data, a discretization step can be carried out afterwards to make sure that the imputed values are mapped to the rating scale of the observed values (as the median of a given column may lie in between two answer categories in case of an even number of observed values). This is done by randomly sampling from the largest answer category smaller than the median and the smallest answer category larger than the median (for each missing cell).
Usage
median_impute(X, discretize = TRUE, values = NULL)
Arguments
X |
a matrix or data frame with missing values. |
discretize |
a logical indicating whether to include a discretization
step after median imputation (defaults to |
values |
an optional numeric vector giving the possible values of
discrete ratings. This is ignored if |
Value
An object of class "median_impute"
with the following components:
medians |
a numeric vector containing the median of the observed values for each variable. |
X |
a numeric matrix containing the completed (i.e., imputed) data matrix. |
X_discretized |
a numeric matrix containing the completed (i.e.,
imputed) data matrix after the discretization step. This is only returned
if requested via |
The class structure is still experimental and may change in the future.
Use the accessor function get_completed()
to extract the
completed (i.e., imputed) data matrix.
Author(s)
Andreas Alfons
See Also
Examples
# toy example derived from MovieLens 100K dataset
data("MovieLensToy")
# median imputation with discretization step
fit <- median_impute(MovieLensToy, values = 1:5)
# extract discretized completed matrix
X_hat <- get_completed(fit)
head(X_hat)
Mode imputation
Description
Perform mode imputation for discrete data. In case of multiple modes in a given column, one of them is selected at random for each missing cell.
Usage
mode_impute(X, values = NULL)
Arguments
X |
a matrix or data frame of discrete data with missing values. |
values |
an optional numeric vector giving the possible values.
Currently, the possible values are assumed to be the same for all columns.
If |
Value
An object of class "mode_impute"
with the following components:
modes |
a list containing the mode(s) of the observed values for each variable. |
X |
a numeric matrix containing the completed (i.e., imputed) data matrix. |
The class structure is still experimental and may change in the future.
Use the accessor function get_completed()
to extract the
completed (i.e., imputed) data matrix.
Note
The mode is computed as the most frequent value, hence this function is only suitable for discrete data. It does not estimate the mode of a continuous density.
Author(s)
Andreas Alfons
See Also
Examples
# toy example derived from MovieLens 100K dataset
data("MovieLensToy")
# mode imputation
fit <- mode_impute(MovieLensToy, values = 1:5)
# extract completed matrix
X_hat <- get_completed(fit)
head(X_hat)
Robust discrete matrix completion
Description
Perform robust discrete matrix completion with a low-rank constraint on a latent continuous matrix, implemented via an ADMM algorithm.
Usage
rdmc(
X,
values = NULL,
lambda = fraction_grid(),
relative = TRUE,
loss = c("pseudo_huber", "absolute", "truncated"),
loss_const = NULL,
type = "svd",
svd_tol = 1e-04,
rank_max = NULL,
mu = 0.1,
delta = 1.05,
conv_tol = 1e-04,
max_iter = 100L,
L = NULL,
Theta = NULL
)
Arguments
X |
a matrix or data frame of discrete ratings with missing values. |
values |
an optional numeric vector giving the possible values of the
ratings. Currently, these are assumed to be the same for all columns. If
|
lambda |
a numeric vector giving values of the regularization
parameter. See |
relative |
a logical indicating whether the values of the
regularization parameter should be considered relative to a certain
reference value computed from the data at hand. If |
loss |
a character string specifying the robust loss function for the
loss part of the objective function. Possible values are
|
loss_const |
tuning constant for the loss function. For the
pseudo-Huber loss, the default value is the average step size between the
rating categories in |
type |
a character string specifying the type of algorithm for the
low-rank latent continuous matrix. Currently only |
svd_tol |
numeric tolerance for the soft-thresholded SVD step. Only
singular values larger than |
rank_max |
a positive integer giving a rank constraint in the soft-thresholded SVD step for the latent continuous matrix. The default is to use the minimum of the number of rows and columns. |
mu |
numeric; penalty parameter for the discrepancy between the discrete rating matrix and the latent low-rank continuous matrix. It is not recommended to change the default value of 0.1. |
delta |
numeric; update factor for penalty parameter |
conv_tol |
numeric; convergence tolerance for the relative change in the objective function. |
max_iter |
a positive integer specifying the maximum number of iterations. In practice, large gains can often be had in the first few iterations, with subsequent iterations yielding relatively small gains until convergence. Hence the default is to perform at most 10 iterations. |
L , Theta |
starting values for the algorithm. These are not expected
to be set by the user. Instead, it is recommended to call this function
with a grid of values for the regularization parameter |
Details
For the loss part of the objective function, the pseudo-Huber loss
(loss = "pseudo_huber"
) is given by
\rho(x) = \code{loss\_const}^2 (\sqrt{1 + (x/\code{loss\_const})^2} - 1).
The absolute loss
(loss = "absolute"
) is given by
\rho(x) = |x|,
and the truncated absolute loss (loss = "truncated"
) is defined as
\rho(x) = \min (|x|, \code{loss\_const}).
Value
An object of class "rdmc"
with the following components:
lambda |
a numeric vector containing the values of the regularization parameter. |
d_max |
a numeric value with which the values of the regularization
parameter were multiplied. If |
L |
in case of a single value of |
Z |
in case of a single value of |
Theta |
in case of a single value of |
objective |
a numeric vector containing the value of the objective function for each value of the regularization parameter. |
converged |
a logical vector indicating whether the algorithm converged for each value of the regularization parameter. |
nb_iter |
an integer vector containing the number of iterations for each value of the regularization parameter. |
X |
in case of a single value of |
The class structure is still experimental and may change in the future. The following accessor functions are available:
-
get_completed()
to extract the completed (i.e., imputed) data matrix for a specified value of the regularization parameter, -
get_lambda()
to extract the values of the regularization parameter, -
get_nb_iter()
to extract the number of iterations for each value of the regularization parameter.
Author(s)
Andreas Alfons and Aurore Archimbaud
References
Archimbaud, A., Alfons, A., and Wilms, I. (2025) Robust Matrix Completion for Discrete Rating-Scale Data. arXiv:2412.20802. doi:10.48550/arXiv.2412.20802.
See Also
rdmc_tune()
, fraction_grid()
Examples
# toy example derived from MovieLens 100K dataset
data("MovieLensToy")
# robust discrete matrix completion
fit <- rdmc(MovieLensToy)
# extract completed matrix with fifth value of
# regularization parameter
X_hat <- get_completed(fit, which = 5)
head(X_hat)
Robust discrete matrix completion with hyperparameter tuning
Description
Perform robust discrete matrix completion with a low-rank constraint on a latent continuous matrix, implemented via an ADMM algorithm. The regularization parameter is thereby selected via repeated holdout validation or cross-validation.
Usage
rdmc_tune(
X,
values = NULL,
lambda = fraction_grid(),
relative = TRUE,
splits = holdout_control(),
loss = c("pseudo_huber", "absolute", "truncated"),
loss_const = NULL,
...
)
Arguments
X |
a matrix or data frame of discrete ratings with missing values. |
values |
an optional numeric vector giving the possible values of the
ratings. Currently, these are assumed to be the same for all columns. If
|
lambda |
a numeric vector giving values of the regularization
parameter. See |
relative |
a logical indicating whether the values of the
regularization parameter should be considered relative to a certain
reference value computed from the data at hand. If |
splits |
an object inheriting from class |
loss |
a character string specifying the robust loss function for the
loss part of the objective function. Possible values are
|
loss_const |
tuning constant for the loss function. For the
pseudo-Huber loss, the default value is the average step size between the
rating categories in |
... |
additional arguments to be passed down to |
Details
For the loss part of the objective function, the pseudo-Huber loss
(loss = "pseudo_huber"
) is given by
\rho(x) = \code{loss\_const}^2 (\sqrt{1 + (x/\code{loss\_const})^2} - 1).
The absolute loss
(loss = "absolute"
) is given by
\rho(x) = |x|,
and the truncated absolute loss (loss = "truncated"
) is defined as
\rho(x) = \min (|x|, \code{loss\_const}).
Value
An object of class "rdmc_tuned"
with the following components:
lambda |
a numeric vector containing the values of the regularization parameter. |
tuning_loss |
a numeric vector containing the (average) values of the loss function on the validation set(s) for each value of the regularization parameter. |
lambda_opt |
numeric; the optimal value of the regularization parameter. |
fit |
an object of class |
The class structure is still experimental and may change in the future. The following accessor functions are available:
-
get_completed()
to extract the completed (i.e., imputed) data matrix with the optimal value of the regularization parameter, -
get_lambda()
to extract the optimal value of the regularization parameter, -
get_nb_iter()
to extract the number of iterations with the optimal value of the regularization parameter.
Author(s)
Andreas Alfons
References
Archimbaud, A., Alfons, A., and Wilms, I. (2025) Robust Matrix Completion for Discrete Rating-Scale Data. arXiv:2412.20802. doi:10.48550/arXiv.2412.20802.
See Also
rdmc()
, fraction_grid()
,
holdout_control()
, cv_folds_control()
,
create_splits()
Examples
# toy example derived from MovieLens 100K dataset
data("MovieLensToy")
# robust discrete matrix completion with hyperparameter tuning
set.seed(20250723)
fit <- rdmc_tune(MovieLensToy,
lambda = fraction_grid(nb_lambda = 6),
splits = holdout_control(R = 5))
# extract completed matrix with optimal regularization parameter
X_hat <- get_completed(fit)
head(X_hat)
# extract optimal value of regularization parameter
get_lambda(fit)
# extract number of iterations with optimal regularization parameter
get_nb_iter(fit)
Matrix completion via nuclear-norm regularization
Description
Convenience wrapper for softImpute()
that allows
to supply a grid of values for the regularization parameter. Other
noteworthy differences with the original function are that the columns of
the data matrix are centered internally, that some of the default values are
different, and that the output is structured differently. Moreover, in case
of discrete rating-scale data, the wrapper function allows to include a
discretization step after fitting the algorithm to map the imputed values to
the rating scale of the observed values.
Usage
soft_impute(
X,
lambda = fraction_grid(reverse = TRUE),
relative = TRUE,
type = c("svd", "als"),
rank.max = NULL,
thresh = 1e-05,
maxit = 100L,
trace.it = FALSE,
final.svd = TRUE,
discretize = TRUE,
values = NULL
)
Arguments
X |
a matrix or data frame with missing values. |
lambda |
a numeric vector giving values of the regularization
parameter. See |
relative |
a logical indicating whether the values of the
regularization parameter should be considered relative to a certain
reference value computed from the data at hand. If |
type |
a character string specifying the type of algorithm. Possible
values are |
rank.max |
a positive integer giving a rank constraint. See
|
thresh , maxit , trace.it , final.svd |
see
|
discretize |
a logical indicating whether to include a discretization
step after fitting the algorithm (defaults to |
values |
an optional numeric vector giving the possible values of
discrete ratings. This is ignored if |
Value
An object of class "soft_impute"
with the following components:
lambda |
a numeric vector containing the values of the regularization parameter. |
lambda0 |
a numeric value with which the values of the regularization
parameter were multiplied. If |
svd |
in case of a single value of |
X |
in case of a single value of |
X_discretized |
in case of a single value of |
The class structure is still experimental and may change in the future. The following accessor functions are available:
-
get_completed()
to extract the completed (i.e., imputed) data matrix for a specified value of the regularization parameter, -
get_lambda()
to extract the values of the regularization parameter.
Author(s)
Andreas Alfons and Aurore Archimbaud
References
Hastie, T., Mazumder, R., Lee, J. D. and Zadeh, R. (2015) Matrix Completion and Low-Rank SVD via Fast Alternating Least Squares. Journal of Machine Learning Research, 16(104), 3367–3402.
Mazumder, R., Hastie, T. and Tibshirani, R. (2010) Spectral Regularization Algorithms for Learning Large Incomplete Matrices. Journal of Machine Learning Research, 11(80), 2287–2322.
See Also
soft_impute_tune()
, fraction_grid()
Examples
# toy example derived from MovieLens 100K dataset
data("MovieLensToy")
# Soft-Impute with discretization step
fit <- soft_impute(MovieLensToy)
# extract discretized completed matrix with fifth value
# of regularization parameter
X_hat <- get_completed(fit, which = 5)
head(X_hat)
Matrix completion via nuclear-norm regularization with hyperparameter tuning
Description
Perform matrix completion via nuclear-norm regularization based on
softImpute()
. The regularization parameter
is thereby selected via repeated holdout validation or cross-validation.
Note that this uses the convenience wrapper soft_impute()
,
whose default behavior is different from that of the original function.
Usage
soft_impute_tune(
X,
lambda = fraction_grid(reverse = TRUE),
relative = TRUE,
splits = holdout_control(),
...,
discretize = TRUE,
values = NULL
)
Arguments
X |
a matrix or data frame with missing values. |
lambda |
a numeric vector giving values of the regularization
parameter. See |
relative |
a logical indicating whether the values of the
regularization parameter should be considered relative to a certain
reference value computed from the data at hand. If |
splits |
an object inheriting from class |
... |
additional arguments to be passed down to
|
discretize |
a logical indicating whether to include a discretization
step after fitting the algorithm (defaults to |
values |
an optional numeric vector giving the possible values of
discrete ratings. This is ignored if |
Value
An object of class "soft_impute_tuned"
with the following components:
lambda |
a numeric vector containing the values of the regularization parameter. |
tuning_loss |
a numeric vector containing the (average) values of the loss function on the validation set(s) for each value of the regularization parameter. |
lambda_opt |
numeric; the optimal value of the regularization parameter. |
fit |
an object of class |
The class structure is still experimental and may change in the future. The following accessor functions are available:
-
get_completed()
to extract the imputed data matrix (with the optimal value of the regularization parameter), -
get_lambda()
to extract the optimal value of the regularization parameter.
Author(s)
Andreas Alfons
References
Hastie, T., Mazumder, R., Lee, J. D. and Zadeh, R. (2015) Matrix Completion and Low-Rank SVD via Fast Alternating Least Squares. Journal of Machine Learning Research, 16(104), 3367–3402.
Mazumder, R., Hastie, T. and Tibshirani, R. (2010) Spectral Regularization Algorithms for Learning Large Incomplete Matrices. Journal of Machine Learning Research, 11(80), 2287–2322.
See Also
soft_impute()
, fraction_grid()
,
holdout_control()
, cv_folds_control()
,
create_splits()
Examples
# toy example derived from MovieLens 100K dataset
data("MovieLensToy")
# Soft-Impute with discretization step and hyperparameter tuning
set.seed(20250723)
fit <- soft_impute_tune(MovieLensToy,
lambda = fraction_grid(nb_lambda = 6,
reverse = TRUE),
splits = holdout_control(R = 5))
# extract discretized completed matrix with optimal
# regularization parameter
X_hat <- get_completed(fit)
head(X_hat)
# extract optimal value of regularization parameter
get_lambda(fit)
Control objects for hyperparameter validation
Description
Construct control objects for repeated holdout validation or
K
-fold cross-validation.
Usage
holdout_control(pct = 0.1, R = 10L)
cv_folds_control(K = 5L)
Arguments
pct |
numeric in the interval (0, 1); the percentage of observed cells in the data matrix to be randomly selected into the validation set (defaults to 0.1). |
R |
an integer giving the number of random splits into training and validation sets (defaults to 10). |
K |
an integer giving the number of cross-validation folds (defaults to 5). |
Value
An object inheriting from class "split_control"
containing
the relevant information for splitting the the observed cells of a data
matrix into training and validation sets for hyperparameter tuning.
The subclass "holdout_control"
returned by holdout_control()
is a list with components pct
and R
containing the
corresponding argument values after validity checks.
The subclass "cv_folds_control"
returned by cv_folds_control()
is a list with a single component K
containing the corresponding
argument value after validity checks.
See Also
rdmc_tune()
, soft_impute_tune()
Examples
# toy example derived from MovieLens 100K dataset
data("MovieLensToy")
# robust discrete matrix completion with hyperparameter tuning
set.seed(20250723)
fit <- rdmc_tune(MovieLensToy,
lambda = fraction_grid(nb_lambda = 6),
splits = holdout_control(R = 5))
# extract optimal value of regularization parameter
get_lambda(fit)