| Type: | Package | 
| Title: | The Generalized Semi-Supervised Elastic-Net | 
| Version: | 1.0.7 | 
| Date: | 2024-03-04 | 
| Description: | Implements the generalized semi-supervised elastic-net. This method extends the supervised elastic-net problem, and thus it is a practical solution to the problem of feature selection in semi-supervised contexts. Its mathematical formulation is presented from a general perspective, covering a wide range of models. We focus on linear and logistic responses, but the implementation could be easily extended to other losses in generalized linear models. We develop a flexible and fast implementation, written in 'C++' using 'RcppArmadillo' and integrated into R via 'Rcpp' modules. See Culp, M. 2013 <doi:10.1080/10618600.2012.657139> for references on the Joint Trained Elastic-Net. | 
| License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] | 
| Imports: | Rcpp, methods, MASS | 
| Depends: | stats | 
| LinkingTo: | Rcpp, RcppArmadillo | 
| Suggests: | knitr, rmarkdown, glmnet, Metrics, testthat | 
| VignetteBuilder: | knitr | 
| URL: | https://github.com/jlaria/s2net | 
| BugReports: | https://github.com/jlaria/s2net/issues | 
| Encoding: | UTF-8 | 
| RoxygenNote: | 7.2.0 | 
| NeedsCompilation: | yes | 
| Packaged: | 2024-03-31 08:37:38 UTC; root | 
| Author: | Juan C. Laria  | 
| Maintainer: | Juan C. Laria <juank.laria@gmail.com> | 
| Repository: | CRAN | 
| Date/Publication: | 2024-03-31 10:30:02 UTC | 
Class s2net
Description
This is the main class of this package, implemented in C++ and exposed to R through Rcpp modules.
It can be used directly from R, and some generic S4 methods are provided to make it easier to work with.
Methods
- predict
 signature(object = "Rcpp_s2net"): See predict_Rcpp_s2net
Fields
- beta
 Object of class matrix. The fitted model coefficients.
- intercept
 The model intercept.
Class-Based Methods
- initialize(data, loss)
 data: an s2Data object.
 loss: loss function, 0 = linear, 1 = logit.
- setupFista(s2Fista)
 Configures the FISTA internal algorithm.
- predict(newX, type)
 newX: new data matrix to make predictions.
 type: 0 = default, 1 = response, 2 = probs, 3 = class.
- fit(params, frame, proj)
 params: an s2Params object.
 frame: 0 = "JT", 1 = "ExtJT".
 proj: 0 = no, 1 = yes, 2 = auto.
Author(s)
Juan C. Laria
Examples
data("auto_mpg")
train = s2Data(xL = auto_mpg$P1$xL, yL = auto_mpg$P1$yL,  xU = auto_mpg$P1$xU)
# We create the C++ object calling the new method (constructor)
obj = new(s2net, train, 0) # 0 = regression 
obj
# We call the $fit method of obj directly (frame = 1 is "ExtJT", proj = 2 is auto)
obj$fit(s2Params(lambda1 = 0.01, 
                   lambda2 = 0.01, 
                   gamma1 = 0.05, 
                   gamma2 = 100, 
                   gamma3 = 0.05), 1, 2)
# fitted model
obj$beta
# We can test the results using the unlabeled data
test = s2Data(xL = auto_mpg$P1$xU, yL = auto_mpg$P1$yU,  preprocess = train)
ypred = obj$predict(test$xL, 0)
## Not run: 
if(require(ggplot2)){
  ggplot() + 
    aes(x = ypred, y = test$yL) + geom_point() + 
    geom_abline(intercept = 0, slope = 1, linetype = 2)
}
## End(Not run)
Auto MPG Data Set
Description
This dataset was taken from the UCI Machine Learning Repository https://archive.ics.uci.edu/ml/datasets/Auto+MPG, and processed for the semi-supervised setting (Ryan and Culp, 2015).
Usage
data("auto_mpg")
Format
There are two lists that contain partitions from a data frame with 398 observations on the following 9 variables.
- mpg
 a numeric vector
- cylinders
 an ordered factor with levels 3 < 4 < 5 < 6 < 8
- displacement
 a numeric vector
- horsepower
 a numeric vector
- weight
 a numeric vector
- acceleration
 a numeric vector
- year
 a numeric vector
- origin
 a factor
Details
This dataset is a slightly modified version of the dataset provided in the StatLib library. In line with the use by Ross Quinlan (1993) in predicting the attribute "mpg", 8 of the original instances were removed because they had unknown values for the "mpg" attribute. "The data concerns city-cycle fuel consumption in miles per gallon, to be predicted in terms of 3 multivalued discrete and 5 continuous attributes." (Quinlan, 1993)
Source
Quinlan, R. (1993). Combining Instance-Based and Model-Based Learning. In Proceedings of the Tenth International Conference on Machine Learning, 236-243, University of Massachusetts, Amherst. Morgan Kaufmann.
Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml/]. Irvine, CA: University of California, School of Information and Computer Science.
References
Ryan, K. J., & Culp, M. V. (2015). On semi-supervised linear regression in covariate shift problems. The Journal of Machine Learning Research, 16(1), 3183-3217.
Examples
data(auto_mpg)
head(auto_mpg$P1$xL)
S3 Methods for s2netR objects.
Description
Generic predict method. Wrapper for the C++ class method s2net$predict.
Usage
## S3 method for class 's2netR'
predict(object, newX, type = "default", ...)
Arguments
| object | An s2netR object. |
| newX | A matrix with the data to make predictions. It should be in the same scale as the original data. See s2Data. |
| type | Type of predictions. One of "default", "response", "probs", "class". |
| ... | Other parameters passed to predict. |
Value
A column matrix with predictions.
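The prediction types can be related to each other with a small base-R sketch. It does not call s2net: the coefficients, intercept and data below are made-up values, and how the package maps each `type` onto these quantities is an assumption here. It only shows the usual relationship between a linear predictor, a probability and a class label under a logit loss.

```r
# Hypothetical coefficients and intercept (not fitted by s2net)
beta <- c(0.8, -1.2)
intercept <- 0.3
newX <- rbind(c(1, 0), c(0, 1), c(2, 2))

eta <- newX %*% beta + intercept      # linear predictor, a column matrix
probs <- plogis(eta)                  # inverse logit: 1 / (1 + exp(-eta))
label <- ifelse(probs > 0.5, 1, 0)    # class label, thresholding at 0.5

data.frame(eta = as.vector(eta),
           probs = as.vector(probs),
           label = as.vector(label))
```

Each quantity keeps one row per observation in `newX`, which matches the column-matrix shape described above.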
Examples
data("auto_mpg")
train = s2Data(xL = auto_mpg$P1$xL, yL = auto_mpg$P1$yL,  xU = auto_mpg$P1$xU)
model = s2netR(train, 
                s2Params(lambda1 = 0.1, 
                           lambda2 = 0,
                           gamma1 = 0.1,
                           gamma2 = 100,
                           gamma3 = 0.1),
                loss = "linear",
                frame = "ExtJT",
                proj = "auto",
                fista = s2Fista(5000, 1e-7, 1, 0.8))
valid = s2Data(auto_mpg$P1$xU, auto_mpg$P1$yU, preprocess = train)
ypred = predict(model, valid$xL)
## Not run: 
if(require(ggplot2)){
  ggplot() + 
    aes(x = ypred, y = valid$yL) + geom_point() + 
    geom_abline(intercept = 0, slope = 1, linetype = 2)
}
## End(Not run)
Predict method for s2net C++ class.
Description
This function provides an interface in R for the method predict in C++ class s2net.
Usage
predict_Rcpp_s2net(object, newX, type = "default")
Arguments
| object | An object of class Rcpp_s2net. |
| newX | Data to make predictions. Could be a |
| type | Type of predictions. One of "default", "response", "probs", "class". |
Details
This method is included as a high-level wrapper of object$predict().
Value
Returns a column matrix with the same number of rows/observations as newX.
Author(s)
Juan C. Laria
Print methods for S3 objects
Description
Very simple print methods to show basic information about these simple S3 objects.
Usage
## S3 method for class 's2Data'
print(x, ...)
## S3 method for class 's2Fista'
print(x, ...)
Arguments
| x | S3 object of class s2Data or s2Fista. |
| ... | Other parameters passed to print. |
Data wrapper for s2net.
Description
This function preprocesses the data to fit a semi-supervised linear joint trained model.
Usage
s2Data(xL, yL, xU = NULL, preprocess = T)
Arguments
| xL | The labeled data. Could be a |
| yL | The labels associated with xL. |
| xU | The unlabeled data (optional). Could be a |
| preprocess | Should the input data be pre-processed? Possible values are a logical (TRUE by default) or another object of class s2Data, whose pre-processing is then applied to the new data. |
Value
Returns an object of S3 class s2Data with fields
| xL | Transformed labeled data |
| yL | Transformed labels. If |
| xU | Transformed unlabeled data |
| type | Type of task. This one is inferred from the response labels. |
| base | Base category for classification |
In addition, the following attributes are stored.
| pr:rm_cols | logical vector of removed columns |
| pr:center | column center |
| pr:scale | column scale |
| pr:ycenter | yL center. Regression |
| pr:yscale | yL scale. Regression |
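Why the center and scale are stored can be seen in a base-R sketch that does not use s2net. The toy matrices and the choice of estimating the statistics on the training predictors are assumptions made for illustration. The point is that new data should be transformed with the stored training statistics, not with statistics recomputed on the new data.

```r
# Toy predictors (not auto_mpg): validation data are shifted on purpose
set.seed(42)
x_train <- matrix(rnorm(50 * 3, mean = 10, sd = 4), 50, 3)
x_valid <- matrix(rnorm(20 * 3, mean = 12, sd = 4), 20, 3)

# Estimate centering and scaling on the training data only
center <- colMeans(x_train)
spread <- apply(x_train, 2, sd)

# Apply the training statistics to both sets
x_train_std <- scale(x_train, center = center, scale = spread)
x_valid_std <- scale(x_valid, center = center, scale = spread)

colMeans(x_train_std)  # approximately 0 by construction
colMeans(x_valid_std)  # generally not 0: the shift in x_valid is preserved
```

Passing a previously built s2Data object as `preprocess` plays the same role as reusing `center` and `spread` in this sketch.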
Author(s)
Juan C. Laria
Examples
data("auto_mpg")
train = s2Data( xL = auto_mpg$P1$xL,
                  yL = auto_mpg$P1$yL,
                  xU = auto_mpg$P1$xU,
                  preprocess = TRUE )
show(train)
# Notice how ordered factor variable $cylinders is handled 
# .L (linear) .Q (quadratic) .C (cubic) and .^4
head(train$xL) 
# If you want to do validation with the unlabeled data
idx = sample(length(auto_mpg$P1$yU), 200)
train = s2Data(xL = auto_mpg$P1$xL, yL = auto_mpg$P1$yL, xU = auto_mpg$P1$xU[idx, ])
valid = s2Data(xL = auto_mpg$P1$xU[-idx, ], yL = auto_mpg$P1$yU[-idx], preprocess = train)
test = s2Data(xL = auto_mpg$P1$xU[idx, ], yL = auto_mpg$P1$yU[idx], preprocess = train)
train
valid
test
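The `.L`, `.Q`, `.C` and `^4` suffixes mentioned in the example are the names R gives to polynomial contrasts of an ordered factor. The base-R sketch below reproduces that naming with `model.matrix`. Whether s2Data builds its design matrix this way internally is an assumption; the sketch only shows where such column names come from.

```r
# Ordered factor with the same levels as the cylinders variable
cylinders <- factor(c(4, 8, 6, 3, 5, 4),
                    levels = c(3, 4, 5, 6, 8), ordered = TRUE)

# With R's default contrasts, an ordered factor with 5 levels expands into
# 4 orthogonal polynomial terms: linear, quadratic, cubic and 4th degree
design <- model.matrix(~ cylinders)
colnames(design)
```

One ordered factor therefore contributes several numeric columns to the transformed data.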
Hyper-parameter wrapper for FISTA.
Description
This is a very simple function that supplies the hyper-parameters for the Fast Iterative Shrinkage-Thresholding Algorithm (FISTA) that solves the s2net minimization problem.
Usage
s2Fista(MAX_ITER_INNER = 5000, TOL = 1e-07, t0 = 2, step = 0.1, use_warmstart = FALSE)
Arguments
| MAX_ITER_INNER | Number of iterations of FISTA |
| TOL | The relative tolerance. The algorithm stops when the objective does not improve by more than this tolerance. |
| t0 | The initial stepsize for backtracking. |
| step | The scale factor in the stepsize to backtrack until a valid step is found. |
| use_warmstart | Should we use a warm start? |
Value
Returns an object of S3 class s2Fista with the input arguments as fields.
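To show what these hyper-parameters control, here is a minimal base-R sketch of FISTA with backtracking (Beck and Teboulle, 2009) applied to a plain lasso problem. It is not the package's C++ solver and does not minimize the s2net objective. The function `fista_lasso`, the lasso objective and the exact stopping rule are assumptions chosen for illustration; only the argument names mirror s2Fista.

```r
# Soft-thresholding: proximal operator of thresh * ||b||_1
soft_threshold <- function(b, thresh) sign(b) * pmax(abs(b) - thresh, 0)

# Illustrative FISTA for F(b) = 1/(2n) ||y - X b||^2 + lambda ||b||_1
fista_lasso <- function(X, y, lambda, MAX_ITER_INNER = 5000, TOL = 1e-7,
                        t0 = 2, step = 0.1) {
  n <- nrow(X)
  smooth <- function(b) sum((y - X %*% b)^2) / (2 * n)
  grad <- function(b) as.vector(crossprod(X, X %*% b - y)) / n
  objective <- function(b) smooth(b) + lambda * sum(abs(b))

  beta <- rep(0, ncol(X))
  z <- beta            # extrapolated point
  theta <- 1           # momentum sequence
  stepsize <- t0       # initial stepsize for backtracking
  obj_old <- objective(beta)
  for (iter in seq_len(MAX_ITER_INNER)) {
    g <- grad(z)
    fz <- smooth(z)
    repeat {
      beta_new <- soft_threshold(z - stepsize * g, stepsize * lambda)
      d <- beta_new - z
      # Accept the step once the quadratic upper bound holds
      if (smooth(beta_new) <= fz + sum(g * d) + sum(d^2) / (2 * stepsize)) break
      stepsize <- stepsize * step   # otherwise shrink the stepsize
    }
    theta_new <- (1 + sqrt(1 + 4 * theta^2)) / 2
    z <- beta_new + ((theta - 1) / theta_new) * (beta_new - beta)
    beta <- beta_new
    theta <- theta_new
    obj_new <- objective(beta)
    # Stop when the relative change in the objective is below TOL
    if (abs(obj_old - obj_new) <= TOL * max(abs(obj_old), 1)) break
    obj_old <- obj_new
  }
  list(beta = beta, iterations = iter, objective = obj_new)
}

set.seed(1)
X <- matrix(rnorm(100 * 5), 100, 5)
y <- as.vector(X %*% c(2, 0, 0, -1, 0) + rnorm(100, sd = 0.1))
fit <- fista_lasso(X, y, lambda = 0.1)
round(fit$beta, 2)
```

In this sketch a larger `t0` means more backtracking before a step is accepted, while a `step` closer to 1 shrinks the stepsize more gently, giving finer but more numerous backtracking trials.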
References
Beck, A., & Teboulle, M. (2009). A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1), 183-202. doi:10.1137/080716542
Hyper-parameter wrapper for s2net
Description
This is a very simple function that collapses the input parameters into a named vector to supply to C++ methods.
Usage
s2Params(lambda1, lambda2 = 0, gamma1 = 0, gamma2 = 0, gamma3 = 0)
Arguments
| lambda1 | elastic-net regularization parameter - |
| lambda2 | elastic-net regularization parameter - |
| gamma1 | s2net weight hyper-parameter. |
| gamma2 | s2net covariance hyper-parameter (between 1 and |
| gamma3 | s2net shift hyper-parameter (between 0 and 1). |
Value
Returns a named vector of S3 class s2Params.
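As a rough picture of what "a named vector of S3 class" means, the base-R sketch below builds one by hand. The function `make_params` and the class name `s2Params_sketch` are hypothetical stand-ins, not the package source, and s2Params itself may validate or store its inputs differently.

```r
# Hypothetical stand-in for a wrapper like s2Params(); not the package source
make_params <- function(lambda1, lambda2 = 0, gamma1 = 0, gamma2 = 0, gamma3 = 0) {
  values <- c(lambda1 = lambda1, lambda2 = lambda2,
              gamma1 = gamma1, gamma2 = gamma2, gamma3 = gamma3)
  stopifnot(is.numeric(values), length(values) == 5)
  structure(values, class = "s2Params_sketch")
}

p <- make_params(lambda1 = 0.1, gamma2 = 100)
unclass(p)        # the underlying named numeric vector
p[["gamma2"]]     # individual hyper-parameters are accessed by name
```

Keeping all five hyper-parameters in a single named vector makes it easy to pass them as one argument to compiled code.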
The Generalized Semi-Supervised Elastic-Net
Description
Implements the generalized semi-supervised elastic-net. This method extends the supervised elastic-net problem, and thus it is a practical solution to the problem of feature selection in semi-supervised contexts. Its mathematical formulation is presented from a general perspective, covering a wide range of models.  We focus on linear and logistic responses, but the implementation could be easily extended to other losses in generalized linear models. We develop a flexible and fast implementation, written in 'C++' using 'RcppArmadillo' and integrated into R via 'Rcpp' modules. See Culp, M. 2013 <doi:10.1080/10618600.2012.657139> for references on the Joint Trained Elastic-Net.
Details
The DESCRIPTION file:
| Package: | s2net | 
| Type: | Package | 
| Title: | The Generalized Semi-Supervised Elastic-Net | 
| Version: | 1.0.7 | 
| Date: | 2024-03-04 | 
| Authors@R: | c(person("Juan C.", "Laria",, role = c("aut", "cre"), email = "juank.laria@gmail.com", comment = c(ORCID = "0000-0001-7734-9647")), person("Line H.", "Clemmensen",, role = c("aut"), email = "lkhc@dtu.dk")) | 
| Description: | Implements the generalized semi-supervised elastic-net. This method extends the supervised elastic-net problem, and thus it is a practical solution to the problem of feature selection in semi-supervised contexts. Its mathematical formulation is presented from a general perspective, covering a wide range of models. We focus on linear and logistic responses, but the implementation could be easily extended to other losses in generalized linear models. We develop a flexible and fast implementation, written in 'C++' using 'RcppArmadillo' and integrated into R via 'Rcpp' modules. See Culp, M. 2013 <doi:10.1080/10618600.2012.657139> for references on the Joint Trained Elastic-Net. | 
| License: | GPL (>= 2) | 
| Imports: | Rcpp, methods, MASS | 
| Depends: | stats | 
| LinkingTo: | Rcpp, RcppArmadillo | 
| Suggests: | knitr, rmarkdown, glmnet, Metrics, testthat | 
| VignetteBuilder: | knitr | 
| URL: | https://github.com/jlaria/s2net | 
| BugReports: | https://github.com/jlaria/s2net/issues | 
| Encoding: | UTF-8 | 
| RoxygenNote: | 7.2.0 | 
| Author: | Juan C. Laria [aut, cre] (<https://orcid.org/0000-0001-7734-9647>), Line H. Clemmensen [aut] | 
| Maintainer: | Juan C. Laria <juank.laria@gmail.com> | 
Index of help topics:
Rcpp_s2net-class        Class 's2net'
auto_mpg                Auto MPG Data Set
predict.s2netR          S3 Methods for 's2netR' objects.
predict_Rcpp_s2net      Predict method for 's2net' C++ class.
print.s2Data            Print methods for S3 objects
s2Data                  Data wrapper for 's2net'.
s2Fista                 Hyper-parameter wrapper for FISTA.
s2Params                Hyper-parameter wrapper for 's2net'
s2net                   The Generalized Semi-Supervised Elastic-Net
s2netR                  Trains a generalized extended linear joint
                        trained model using semi-supervised data.
simulate_extra          Simulate extrapolated data
simulate_groups         Simulate data (two groups design)
This package includes a very easy-to-use interface for handling data, with the s2Data function. The main function of the package is the s2netR function, which is a wrapper for the Rcpp_s2net (s2net) class. 
Author(s)
Juan C. Laria [aut, cre] (<https://orcid.org/0000-0001-7734-9647>), Line H. Clemmensen [aut]
References
Laria, J.C., L. Clemmensen (2019). A generalized elastic-net for semi-supervised learning of sparse features.
Sogaard Larsen, J. et al. (2019). Semi-supervised covariate shift modelling of spectroscopic data.
Ryan, K. J., & Culp, M. V. (2015). On semi-supervised linear regression in covariate shift problems. The Journal of Machine Learning Research, 16(1), 3183-3217.
Examples
data("auto_mpg")
train = s2Data(xL = auto_mpg$P1$xL, yL = auto_mpg$P1$yL,  xU = auto_mpg$P1$xU)
model = s2netR(train, 
                s2Params(lambda1 = 0.1, 
                           lambda2 = 0,
                           gamma1 = 0.1,
                           gamma2 = 100,
                           gamma3 = 0.1))
# here we tell it to transform the valid data as we did with train.
valid = s2Data(auto_mpg$P1$xU, auto_mpg$P1$yU, preprocess = train) 
ypred = predict(model, valid$xL)
## Not run: 
if(require(ggplot2)){
  ggplot() + 
    aes(x = ypred, y = valid$yL) + geom_point() + 
    geom_abline(intercept = 0, slope = 1, linetype = 2)
}
## End(Not run)
Trains a generalized extended linear joint trained model using semi-supervised data.
Description
This function is a wrapper for the class s2net. It creates the C++ object and fits the model using input data.
Usage
s2netR(data, params, loss = "default", frame = "ExtJT", proj = "auto", 
        fista = NULL, S3 = TRUE)
Arguments
| data | An s2Data object. |
| params | An s2Params object. |
| loss | Loss function. One of "default", "linear", "logit". |
| frame | The semi-supervised frame: "JT", "ExtJT". |
| proj | Should the unlabeled data be shifted to remove the model's effect? One of "no", "yes", "auto". |
| fista | Fista setup parameters. An object of class s2Fista. |
| S3 | Boolean: should the method return an S3 object (default) or a C++ object? |
Value
Returns an object of S3 class s2netR or a C++ object of class s2net
Author(s)
Juan C. Laria
References
Ryan, K. J., & Culp, M. V. (2015). On semi-supervised linear regression in covariate shift problems. The Journal of Machine Learning Research, 16(1), 3183-3217.
Examples
data("auto_mpg")
train = s2Data(xL = auto_mpg$P1$xL, yL = auto_mpg$P1$yL,  xU = auto_mpg$P1$xU)
model = s2netR(train, 
                s2Params(lambda1 = 0.1, 
                           lambda2 = 0,
                           gamma1 = 0.1,
                           gamma2 = 100,
                           gamma3 = 0.1),
                loss = "linear",
                frame = "ExtJT",
                proj = "auto",
                fista = s2Fista(5000, 1e-7, 1, 0.8))
valid = s2Data(auto_mpg$P1$xU, auto_mpg$P1$yU, preprocess = train)
ypred = predict(model, valid$xL)
## Not run: 
if(require(ggplot2)){
  ggplot() + 
    aes(x = ypred, y = valid$yL) + geom_point() + 
    geom_abline(intercept = 0, slope = 1, linetype = 2)
}
## End(Not run)
Simulate extrapolated data
Description
Simulated data scenarios described in the paper from Ryan and Culp (2015).
Usage
simulate_extra(n_source = 100, n_target = 100, p = 1000, shift = 10, 
               scenario = "same", response = "linear", sigma2 = 2.5)
Arguments
| n_source | Number of source samples (labeled) |
| n_target | Number of target samples (unlabeled) |
| p | Number of variables ( |
| shift | The shift applied to the first 10 columns of xU. |
| scenario | Simulation scenario. One of |
| response | Type of response: |
| sigma2 | The variance of the error term, linear response case. |
Value
A list, with
- xL
 data frame with the labeled (source) data
- yL
 labels associated with xL
- xU
 data frame with the unlabeled (target) data
- yU
 labels associated with xU (for validation/testing)
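The structure of the returned list, and the effect of `shift`, can be imitated with a small base-R sketch. This is not the generator used by simulate_extra: the Gaussian predictors, the sparse coefficient vector and the reduced `p` are assumptions for illustration. Only the shift of the first 10 columns of xU and the four list elements follow the description above.

```r
set.seed(0)
n_source <- 100; n_target <- 100; p <- 20; shift <- 10; sigma2 <- 2.5

xL <- matrix(rnorm(n_source * p), n_source, p)   # labeled (source) predictors
xU <- matrix(rnorm(n_target * p), n_target, p)   # unlabeled (target) predictors
xU[, 1:10] <- xU[, 1:10] + shift                 # shift the first 10 columns of xU

beta <- c(rep(1, 5), rep(0, p - 5))              # hypothetical sparse coefficients
yL <- as.vector(xL %*% beta + rnorm(n_source, sd = sqrt(sigma2)))
yU <- as.vector(xU %*% beta + rnorm(n_target, sd = sqrt(sigma2)))

sim <- list(xL = as.data.frame(xL), yL = yL,
            xU = as.data.frame(xU), yU = yU)
round(colMeans(sim$xU)[c(1, 10, 11, 20)], 1)     # shifted vs. unshifted columns
```

The shifted columns of the target data sit far from the source data, which is the covariate-shift setting the simulation is meant to reproduce.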
References
Ryan, K. J., & Culp, M. V. (2015). On semi-supervised linear regression in covariate shift problems. The Journal of Machine Learning Research, 16(1), 3183-3217.
Examples
set.seed(0)
data = simulate_extra()
train = s2Data(data$xL, data$yL, data$xU)
valid = s2Data(data$xU, data$yU, preprocess = train)
model = s2netR(train, s2Params(0.1))
ypred = predict(model, valid$xL)
plot(ypred, valid$yL)
Simulate data (two groups design)
Description
Simulated data scenario described in paper [citation here].
Usage
simulate_groups(n_source = 100, n_target = 100, p = 200, response = "linear")
Arguments
| n_source | Number of labeled observations |
| n_target | Number of unlabeled (target) observations |
| p | Number of variables |
| response | Type of response: |
Value
A list, with
- xL
 data frame with the labeled (source) data
- yL
 labels associated with xL
- xU
 data frame with the unlabeled (target) data
- yU
 labels associated with xU (for validation/testing)
Author(s)
Juan C. Laria