R: Multivariate Gaussian Mixture Model (GMM)

spark.gaussianMixture {SparkR}

R Documentation

Multivariate Gaussian Mixture Model (GMM)

Description

Fits a multivariate Gaussian mixture model against a SparkDataFrame, similarly to mvnormalmixEM() in the R package mixtools. Users can call summary to get a summary of the fitted model, predict to make predictions on new data, and write.ml/read.ml to save/load fitted models.

Usage

spark.gaussianMixture(data, formula, ...)

## S4 method for signature 'SparkDataFrame,formula'
spark.gaussianMixture(data, formula, k = 2, maxIter = 100, tol = 0.01)

## S4 method for signature 'GaussianMixtureModel'
summary(object)

## S4 method for signature 'GaussianMixtureModel'
predict(object, newData)

## S4 method for signature 'GaussianMixtureModel,character'
write.ml(object, path, overwrite = FALSE)

Arguments

`data`	a SparkDataFrame for training.
`formula`	a symbolic description of the model to be fitted. Currently only a few formula operators are supported, including '~', '.', ':', '+', and '-'. Note that spark.gaussianMixture takes no response variable, so the left-hand side of the formula is empty (for example, ~ V1 + V2).
`...`	additional arguments passed to the method.
`k`	number of Gaussian components in the mixture model.
`maxIter`	maximum number of iterations.
`tol`	the convergence tolerance.
`object`	a fitted Gaussian mixture model.
`newData`	a SparkDataFrame for testing.
`path`	the directory where the model is saved.
`overwrite`	whether to overwrite the output path if it already exists. Default is FALSE, which means an exception is thrown if the output path exists.
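To make the roles of k, maxIter and tol concrete, here is a minimal base-R sketch of the expectation-maximization (EM) loop for a Gaussian mixture. It is an illustration under stated assumptions, not SparkR's implementation: the helper names dmvn and em_gmm are hypothetical, the initialisation (evenly spaced rows as starting means, pooled covariance) is a simple choice made here, and stopping once the log-likelihood changes by less than tol is assumed to resemble the criterion Spark uses.

```r
# Minimal EM sketch for a k-component multivariate Gaussian mixture in base R.
# Hypothetical helpers for illustration only; these are not SparkR internals.

# Multivariate normal density N(mu, sigma) evaluated at each row of matrix x.
dmvn <- function(x, mu, sigma) {
  centered <- sweep(x, 2, mu)  # subtract mu from every row
  quad <- rowSums((centered %*% solve(sigma)) * centered)
  exp(-0.5 * quad) / sqrt((2 * pi)^ncol(x) * det(sigma))
}

em_gmm <- function(x, k = 2, maxIter = 100, tol = 0.01) {
  n <- nrow(x)
  # Assumed initialisation: equal weights, k evenly spaced rows as means,
  # and the pooled sample covariance for every component.
  lambda <- rep(1 / k, k)
  mu <- lapply(round(seq(1, n, length.out = k)), function(i) x[i, ])
  sigma <- replicate(k, cov(x), simplify = FALSE)
  loglikPrev <- -Inf
  for (iter in seq_len(maxIter)) {
    # E-step: weighted component densities (n x k) and posterior probabilities.
    dens <- vapply(seq_len(k),
                   function(j) lambda[j] * dmvn(x, mu[[j]], sigma[[j]]),
                   numeric(n))
    loglik <- sum(log(rowSums(dens)))
    posterior <- dens / rowSums(dens)
    # Assumed stopping rule: log-likelihood changed by less than tol.
    if (abs(loglik - loglikPrev) < tol) break
    loglikPrev <- loglik
    # M-step: re-estimate weights, means and covariances from the posteriors.
    nk <- colSums(posterior)
    lambda <- nk / n
    for (j in seq_len(k)) {
      mu[[j]] <- colSums(posterior[, j] * x) / nk[j]
      centered <- sweep(x, 2, mu[[j]])
      sigma[[j]] <- crossprod(centered * posterior[, j], centered) / nk[j]
    }
  }
  list(lambda = lambda, mu = mu, sigma = sigma, loglik = loglik,
       posterior = posterior, iterations = iter)
}
```

For example, on two well-separated 3 x 3 grids of points centred at (0, 0) and (6, 6), em_gmm(x, k = 2) should settle on weights of 0.5 and one mean per grid well before maxIter is reached. A smaller tol makes the loop run longer before it is considered converged.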

Value

spark.gaussianMixture returns a fitted multivariate Gaussian mixture model.

summary returns a summary of the fitted model as a list. The list includes the model's mixing weights (lambda), component means (mu), component covariance matrices (sigma), log-likelihood (loglik), and posterior probabilities of component membership (posterior).
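These fields fit together as follows: the mixture density is a lambda-weighted sum of Gaussians with means mu and covariances sigma, loglik is the log of that density summed over the rows, and posterior holds each row's normalised component probabilities. The base-R sketch below recomputes loglik and posterior from a hypothetical list fit shaped like the summary output, assuming mu is a list of mean vectors and sigma a list of covariance matrices, one per component. It is for illustration only and does not call SparkR; here posterior comes back as a plain R matrix.

```r
# Hypothetical stand-in for summary(model): two components in two dimensions.
fit <- list(lambda = c(0.5, 0.5),
            mu = list(c(0, 0), c(2, 0)),
            sigma = list(diag(2), diag(2)))

# Multivariate normal density N(mu, sigma) evaluated at each row of matrix x.
dmvn <- function(x, mu, sigma) {
  centered <- sweep(x, 2, mu)
  quad <- rowSums((centered %*% solve(sigma)) * centered)
  exp(-0.5 * quad) / sqrt((2 * pi)^ncol(x) * det(sigma))
}

# Hypothetical helper: per-row posterior probabilities and total log-likelihood.
mixtureStats <- function(fit, x) {
  k <- length(fit$lambda)
  weighted <- vapply(seq_len(k),
                     function(j) fit$lambda[j] * dmvn(x, fit$mu[[j]], fit$sigma[[j]]),
                     numeric(nrow(x)))
  list(loglik = sum(log(rowSums(weighted))),
       posterior = weighted / rowSums(weighted))
}

x <- rbind(c(0, 0), c(1, 0))
stats <- mixtureStats(fit, x)
```

The point (1, 0) lies midway between the two means, so its posterior is split evenly between the components, while (0, 0) is assigned to the first component with probability 1 / (1 + exp(-2)).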

predict returns a SparkDataFrame containing predicted labels in a column named "prediction".
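Each predicted label identifies the mixture component with the highest posterior probability for that row. Spark ML numbers components from 0, whereas R indexes lists from 1, so assuming the summary lists components in the same order as the labels, label 0 corresponds to mu[[1]] and sigma[[1]]. A minimal base-R sketch of that mapping, using a hypothetical posterior matrix with one row per observation and one column per component and a hypothetical helper toPredictionLabel:

```r
# Hypothetical posterior probabilities: 3 rows, k = 2 components.
posterior <- rbind(c(0.9, 0.1),
                   c(0.2, 0.8),
                   c(0.5, 0.5))

# Most probable component per row; ties go to the lower index (an assumption).
# Subtracting 1 converts R's 1-based column index to a 0-based label.
toPredictionLabel <- function(posterior) {
  max.col(posterior, ties.method = "first") - 1L
}

labels <- toPredictionLabel(posterior)
# A 0-based label maps back to an R list element as, e.g., mu[[label + 1]].
```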

Note

spark.gaussianMixture since 2.1.0

summary(GaussianMixtureModel) since 2.1.0

predict(GaussianMixtureModel) since 2.1.0

write.ml(GaussianMixtureModel, character) since 2.1.0

Examples

## Not run: 
##D sparkR.session()
##D library(mvtnorm)
##D set.seed(100)
##D a <- rmvnorm(4, c(0, 0))
##D b <- rmvnorm(6, c(3, 4))
##D data <- rbind(a, b)
##D df <- createDataFrame(as.data.frame(data))
##D model <- spark.gaussianMixture(df, ~ V1 + V2, k = 2)
##D summary(model)
##D 
##D # fitted values on training data
##D fitted <- predict(model, df)
##D head(select(fitted, "V1", "prediction"))
##D 
##D # save fitted model to input path
##D path <- "path/to/model"
##D write.ml(model, path)
##D 
##D # can also read back the saved model and print
##D savedModel <- read.ml(path)
##D summary(savedModel)
## End(Not run)

[Package SparkR version 3.2.0 Index]