Spark Streaming (Legacy)#
Core Classes#
StreamingContext | Main entry point for Spark Streaming functionality.
DStream | A Discretized Stream (DStream), the basic abstraction in Spark Streaming, is a continuous sequence of RDDs (of the same type) representing a continuous stream of data (see RDD in the Spark core documentation for more details).
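Together these form the typical program skeleton: a StreamingContext is built from a SparkContext and a batch interval, input sources return DStreams, and the context is then started. A minimal sketch, assuming a local master, 1-second batches, and a socket source on localhost:9999 purely for illustration:

from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext("local[2]", "StreamingQuickStart")   # at least 2 threads: one receiver + processing
ssc = StreamingContext(sc, 1)                          # 1-second batch interval

# lines is a DStream; each batch interval produces one RDD of received lines
lines = ssc.socketTextStream("localhost", 9999)
lines.pprint()

ssc.start()
ssc.awaitTermination()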
Streaming Management#
StreamingContext.addStreamingListener | Add a StreamingListener object for receiving system events related to streaming.
StreamingContext.awaitTermination | Wait for the execution to stop.
StreamingContext.awaitTerminationOrTimeout | Wait for the execution to stop, or until the given timeout elapses.
StreamingContext.checkpoint | Set the context to periodically checkpoint the DStream operations for master fault-tolerance.
StreamingContext.getActive | Return either the currently active StreamingContext (i.e., if there is a context started but not stopped) or None.
StreamingContext.getActiveOrCreate | Either return the active StreamingContext (i.e., one started but not stopped), recreate a StreamingContext from checkpoint data, or create a new StreamingContext using the provided setup function.
StreamingContext.getOrCreate | Either recreate a StreamingContext from checkpoint data or create a new StreamingContext.
StreamingContext.remember | Set each DStream in this context to remember the RDDs it generated in the last given duration.
StreamingContext.sparkContext | Return the SparkContext associated with this StreamingContext.
StreamingContext.start | Start the execution of the streams.
StreamingContext.stop | Stop the execution of the streams, with the option of ensuring all received data has been processed.
StreamingContext.transform | Create a new DStream in which each RDD is generated by applying a function on RDDs of the DStreams.
StreamingContext.union | Create a unified DStream from multiple DStreams of the same type and same slide duration.
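These methods cover the normal lifecycle: enable checkpointing, recover or build the context with getOrCreate, start it, block with awaitTermination or awaitTerminationOrTimeout, and finally stop it. A sketch of that lifecycle, with the checkpoint directory, socket source, and durations as illustrative placeholders:

from pyspark import SparkContext
from pyspark.streaming import StreamingContext

checkpoint_dir = "hdfs:///tmp/streaming-checkpoint"    # placeholder path

def create_context():
    sc = SparkContext("local[2]", "CheckpointedApp")
    ssc = StreamingContext(sc, 5)                      # 5-second batches
    ssc.checkpoint(checkpoint_dir)                     # periodic checkpointing for fault-tolerance
    lines = ssc.socketTextStream("localhost", 9999)
    lines.countByWindow(30, 10).pprint()               # windowed counts require checkpointing
    return ssc

# Recover from checkpoint data if present, otherwise build a fresh context
ssc = StreamingContext.getOrCreate(checkpoint_dir, create_context)
ssc.start()
try:
    ssc.awaitTerminationOrTimeout(60)                  # block for at most 60 seconds
finally:
    ssc.stop(True, True)                               # stop the SparkContext too, gracefully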
Input and Output#
StreamingContext.binaryRecordsStream | Create an input stream that monitors a Hadoop-compatible file system for new files and reads them as flat binary files with records of fixed length.
StreamingContext.queueStream | Create an input stream from a queue of RDDs or a list.
StreamingContext.socketTextStream | Create an input stream from a TCP source hostname:port.
StreamingContext.textFileStream | Create an input stream that monitors a Hadoop-compatible file system for new files and reads them as text files.
DStream.pprint | Print the first num elements of each RDD generated in this DStream.
DStream.saveAsTextFiles | Save each RDD in this DStream as a text file, using the string representation of its elements.
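queueStream is convenient for exercising the API without a live source, since it consumes one pre-built RDD per batch interval. A self-contained sketch that also shows pprint and saveAsTextFiles; the output prefix is a placeholder:

from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext("local[2]", "QueueStreamDemo")
ssc = StreamingContext(sc, 1)

# A queue of pre-built RDDs stands in for a live source; one RDD is consumed per batch
rdd_queue = [sc.parallelize(range(i * 10, (i + 1) * 10)) for i in range(3)]
stream = ssc.queueStream(rdd_queue)

stream.pprint(5)                                   # print the first 5 elements of each batch
stream.saveAsTextFiles("output/batch", "txt")      # one output/batch-<time>.txt directory per batch

ssc.start()
ssc.awaitTerminationOrTimeout(10)
ssc.stop(True, True)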
Transformations and Actions#
DStream.cache | Persist the RDDs of this DStream with the default storage level (MEMORY_ONLY).
DStream.checkpoint | Enable periodic checkpointing of the RDDs of this DStream.
DStream.cogroup | Return a new DStream by applying 'cogroup' between RDDs of this DStream and the other DStream.
DStream.combineByKey | Return a new DStream by applying combineByKey to each RDD.
DStream.context | Return the StreamingContext associated with this DStream.
DStream.count | Return a new DStream in which each RDD has a single element generated by counting each RDD of this DStream.
DStream.countByValue | Return a new DStream in which each RDD contains the counts of each distinct value in each RDD of this DStream.
DStream.countByValueAndWindow | Return a new DStream in which each RDD contains the count of distinct elements in RDDs in a sliding window over this DStream.
DStream.countByWindow | Return a new DStream in which each RDD has a single element generated by counting the number of elements in a window over this DStream.
DStream.filter | Return a new DStream containing only the elements that satisfy the predicate.
DStream.flatMap | Return a new DStream by applying a function to all elements of this DStream, and then flattening the results.
DStream.flatMapValues | Return a new DStream by applying a flatMap function to the value of each key-value pair in this DStream without changing the key.
DStream.foreachRDD | Apply a function to each RDD in this DStream.
DStream.fullOuterJoin | Return a new DStream by applying 'full outer join' between RDDs of this DStream and the other DStream.
DStream.glom | Return a new DStream in which each RDD is generated by applying glom() to each RDD of this DStream.
DStream.groupByKey | Return a new DStream by applying groupByKey on each RDD.
DStream.groupByKeyAndWindow | Return a new DStream by applying groupByKey over a sliding window.
DStream.join | Return a new DStream by applying 'join' between RDDs of this DStream and the other DStream.
DStream.leftOuterJoin | Return a new DStream by applying 'left outer join' between RDDs of this DStream and the other DStream.
DStream.map | Return a new DStream by applying a function to each element of this DStream.
DStream.mapPartitions | Return a new DStream in which each RDD is generated by applying mapPartitions() to each RDD of this DStream.
DStream.mapPartitionsWithIndex | Return a new DStream in which each RDD is generated by applying mapPartitionsWithIndex() to each RDD of this DStream.
DStream.mapValues | Return a new DStream by applying a map function to the value of each key-value pair in this DStream without changing the key.
DStream.partitionBy | Return a copy of the DStream in which each RDD is partitioned using the specified partitioner.
DStream.persist | Persist the RDDs of this DStream with the given storage level.
DStream.reduce | Return a new DStream in which each RDD has a single element generated by reducing each RDD of this DStream.
DStream.reduceByKey | Return a new DStream by applying reduceByKey to each RDD.
DStream.reduceByKeyAndWindow | Return a new DStream by applying incremental reduceByKey over a sliding window.
DStream.reduceByWindow | Return a new DStream in which each RDD has a single element generated by reducing all elements in a sliding window over this DStream.
DStream.repartition | Return a new DStream with an increased or decreased level of parallelism.
DStream.rightOuterJoin | Return a new DStream by applying 'right outer join' between RDDs of this DStream and the other DStream.
DStream.slice | Return all the RDDs between 'begin' and 'end' (both inclusive).
DStream.transform | Return a new DStream in which each RDD is generated by applying a function on each RDD of this DStream.
DStream.transformWith | Return a new DStream in which each RDD is generated by applying a function on each RDD of this DStream and the 'other' DStream.
DStream.union | Return a new DStream by unifying data of another DStream with this DStream.
DStream.updateStateByKey | Return a new "state" DStream where the state for each key is updated by applying the given function on the previous state of the key and the new values of the key.
DStream.window | Return a new DStream in which each RDD contains all the elements seen in a sliding window of time over this DStream.
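Most of these transformations compose like their RDD counterparts; the windowed and stateful ones additionally require a checkpoint directory. A word-count sketch that combines stateless operations (flatMap, map, reduceByKey) with a windowed count (reduceByKeyAndWindow) and a running total (updateStateByKey); the socket source and durations are illustrative:

from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext("local[2]", "TransformationsDemo")
ssc = StreamingContext(sc, 2)                          # 2-second batches
ssc.checkpoint("checkpoint")                           # required by the stateful operations below

lines = ssc.socketTextStream("localhost", 9999)
words = lines.flatMap(lambda line: line.split())       # one line -> many words
pairs = words.map(lambda w: (w, 1))                    # key-value pairs

# Per-batch counts
pairs.reduceByKey(lambda a, b: a + b).pprint()

# Counts over a 30-second window sliding every 10 seconds; the second (inverse)
# function incrementally subtracts counts that slide out of the window
pairs.reduceByKeyAndWindow(lambda a, b: a + b, lambda a, b: a - b, 30, 10).pprint()

# Running totals across all batches
def update(new_values, running_total):
    return sum(new_values) + (running_total or 0)

pairs.updateStateByKey(update).pprint()

ssc.start()
ssc.awaitTermination()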
Kinesis#
KinesisUtils.createStream | Create an input stream that pulls messages from a Kinesis stream.
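A sketch of a Kinesis receiver, assuming the spark-streaming-kinesis-asl package is on the classpath and AWS credentials come from the default provider chain; the application name, stream name, endpoint, and region are placeholders:

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kinesis import KinesisUtils, InitialPositionInStream

sc = SparkContext("local[4]", "KinesisDemo")           # more threads than shards is recommended
ssc = StreamingContext(sc, 2)

stream = KinesisUtils.createStream(
    ssc,
    "my-kinesis-app",                                  # placeholder app name (names the DynamoDB checkpoint table)
    "my-stream",                                       # placeholder Kinesis stream name
    "https://kinesis.us-east-1.amazonaws.com",         # placeholder endpoint
    "us-east-1",                                       # placeholder region
    InitialPositionInStream.LATEST,
    2)                                                 # checkpoint interval in seconds

stream.pprint()
ssc.start()
ssc.awaitTermination()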