PrefixSpan
A parallel PrefixSpan algorithm to mine frequent sequential patterns.
spark.findFrequentSequentialPatterns returns a complete set of frequent sequential
patterns.
For more details, see PrefixSpan.
Usage
spark.findFrequentSequentialPatterns(data, ...)
# S4 method for SparkDataFrame
spark.findFrequentSequentialPatterns(
  data,
  minSupport = 0.1,
  maxPatternLength = 10L,
  maxLocalProjDBSize = 32000000L,
  sequenceCol = "sequence"
)
Arguments
- data
- A SparkDataFrame. 
- ...
- additional argument(s) passed to the method. 
- minSupport
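- Minimal support level: a pattern is reported as frequent only if it occurs in at least this fraction of the input sequences.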
- maxPatternLength
- Maximal pattern length. 
- maxLocalProjDBSize
- Maximum number of items (including delimiters used in the internal storage format) allowed in a projected database before local processing. 
- sequenceCol
- Name of the sequence column in the dataset.
Value
A complete set of frequent sequential patterns in the input sequences of itemsets.
The returned SparkDataFrame contains a column of sequences and a column of their
corresponding frequencies. Its schema is:
sequence: ArrayType(ArrayType(T)), freq: integer
where T is the item type.
Examples
if (FALSE) {
df <- createDataFrame(list(list(list(list(1L, 2L), list(3L))),
                           list(list(list(1L), list(3L, 2L), list(1L, 2L))),
                           list(list(list(1L, 2L), list(5L))),
                           list(list(list(6L)))),
                      schema = c("sequence"))
frequency <- spark.findFrequentSequentialPatterns(df, minSupport = 0.5, maxPatternLength = 5L,
                                                  maxLocalProjDBSize = 32000000L)
showDF(frequency)
}
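As a rough illustration of what minSupport = 0.5 means for the four sequences above, the following base R sketch counts how many sequences contain a given pattern and compares that count with the threshold. It does not call Spark and is not the PrefixSpan algorithm itself. The helper contains_pattern is hypothetical, and the threshold is assumed to be the ceiling of minSupport times the number of sequences.

```r
# Hypothetical helper (not part of SparkR): TRUE if `pattern`, a list of
# itemsets, occurs in `sequence`, a list of itemsets, in order, with each
# pattern itemset contained in a distinct itemset of the sequence.
contains_pattern <- function(sequence, pattern) {
  pos <- 1L
  for (itemset in sequence) {
    if (pos > length(pattern)) break
    # Greedy match: advance as soon as the current pattern itemset fits.
    if (all(pattern[[pos]] %in% itemset)) pos <- pos + 1L
  }
  pos > length(pattern)
}

# The same four sequences as in the example above, as plain R lists.
sequences <- list(
  list(c(1L, 2L), 3L),
  list(1L, c(3L, 2L), c(1L, 2L)),
  list(c(1L, 2L), 5L),
  list(6L)
)

min_support <- 0.5
# Assumption: minimum number of sequences a pattern must occur in.
min_count <- ceiling(min_support * length(sequences))

# Candidate pattern <(1), (3)>: item 1 followed later by item 3.
pattern <- list(1L, 3L)
support_count <- sum(vapply(sequences, contains_pattern, logical(1),
                            pattern = pattern))
is_frequent <- support_count >= min_count
```

Here the pattern occurs in the first two sequences only, so support_count equals min_count and the pattern counts as frequent under this assumption.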