| subset {SparkR} | R Documentation | 
Return subsets of SparkDataFrame according to given conditions
subset(x, ...)

## S4 method for signature 'SparkDataFrame,numericOrcharacter'
x[[i]]

## S4 replacement method for signature 'SparkDataFrame,numericOrcharacter'
x[[i]] <- value

## S4 method for signature 'SparkDataFrame'
x[i, j, ..., drop = F]

## S4 method for signature 'SparkDataFrame'
subset(x, subset, select, drop = F, ...)
| x | a SparkDataFrame. | 
| ... | currently not used. | 
| i, subset | (Optional) a logical expression to filter on rows. For extract operator [[ and replacement operator [[<-, the indexing parameter for a single Column. | 
| value | a Column, an atomic vector of length 1 used as a literal value, or NULL. If NULL, the specified Column is dropped. | 
| j, select | expression for the single Column or a list of columns to select from the SparkDataFrame. | 
| drop | if TRUE, a Column will be returned if the resulting dataset has only one column. Otherwise, a SparkDataFrame will always be returned. | 
A new SparkDataFrame containing only the rows that meet the condition, restricted to the selected columns.
[[ since 1.4.0
[[<- since 2.1.1
[ since 1.4.0
subset since 1.5.0
Other SparkDataFrame functions: SparkDataFrame-class,
agg, alias,
arrange, as.data.frame,
attach,SparkDataFrame-method,
broadcast, cache,
checkpoint, coalesce,
collect, colnames,
coltypes,
createOrReplaceTempView,
crossJoin, cube,
dapplyCollect, dapply,
describe, dim,
distinct, dropDuplicates,
dropna, drop,
dtypes, except,
explain, filter,
first, gapplyCollect,
gapply, getNumPartitions,
group_by, head,
hint, histogram,
insertInto, intersect,
isLocal, isStreaming,
join, limit,
localCheckpoint, merge,
mutate, ncol,
nrow, persist,
printSchema, randomSplit,
rbind, registerTempTable,
rename, repartition,
rollup, sample,
saveAsTable, schema,
selectExpr, select,
showDF, show,
storageLevel, str,
summary, take,
toJSON, unionByName,
union, unpersist,
withColumn, withWatermark,
with, write.df,
write.jdbc, write.json,
write.orc, write.parquet,
write.stream, write.text
Other subsetting functions: filter,
select
## Not run: 
  # Columns can be selected using [[ and [
  df[[2]] == df[["age"]]
  df[,2] == df[,"age"]
  df[,c("name", "age")]
  # Or to filter rows
  df[df$age > 20,]
  # SparkDataFrame can be subset on both rows and Columns
  df[df$name == "Smith", c(1,2)]
  df[df$age %in% c(19, 30), 1:2]
  subset(df, df$age %in% c(19, 30), 1:2)
  subset(df, df$age %in% c(19), select = c(1,2))
  subset(df, select = c(1,2))
  # Columns can be selected and set
  df[["age"]] <- 23
  df[[1]] <- df$age
  df[[2]] <- NULL # drop column
## End(Not run)
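
The examples above do not exercise the drop argument, so the following is a minimal sketch of how it might be used. It assumes a local Spark installation that SparkR can reach; the `people` data and the variable names `as_df`, `as_col`, and `two_cols` are hypothetical and chosen only for illustration.

```r
library(SparkR)

# Assumption: Spark is installed locally and can be started from SparkR.
sparkR.session(master = "local[1]")

# Hypothetical sample data, not taken from the documentation above.
people <- createDataFrame(data.frame(
  name = c("Smith", "Jones", "Lee"),
  age = c(19, 30, 45),
  stringsAsFactors = FALSE
))

# Default (drop = F): a one-column result is still a SparkDataFrame.
as_df <- subset(people, people$age > 20, select = "name")
class(as_df)

# drop = TRUE: a one-column result is returned as a Column instead.
as_col <- subset(people, people$age > 20, select = "name", drop = TRUE)
class(as_col)

# drop = TRUE does not apply when more than one column is selected.
two_cols <- people[people$age > 20, c("name", "age"), drop = TRUE]
class(two_cols)

sparkR.session.stop()
```

Leaving drop at its default keeps the return type predictable, which is usually easier to work with when the number of selected columns is not known in advance.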