pyspark.SparkContext.runJob
SparkContext.runJob(rdd, partitionFunc, partitions=None, allowLocal=False)
Executes the given partitionFunc on the specified set of partitions, returning the result as an array of elements. If ‘partitions’ is not specified, this will run over all partitions.

New in version 1.1.0.

Parameters
rdd : RDD
    target RDD to run tasks on
partitionFunc : function
    a function to run on each partition of the RDD
partitions : list, optional
    set of partitions to run on; some jobs may not want to compute on all partitions of the target RDD, e.g. for operations like first (see the sketch after the Examples)
allowLocal : bool, default False
    this parameter has no effect
 
Returns
list
    results of specified partitions
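The results from each partition are concatenated in partition order. As a minimal sketch of this (assuming the same environment as the examples below, with a SparkContext bound to sc), a partitionFunc that returns a single-element list yields exactly one value per partition:

>>> myRDD = sc.parallelize(range(6), 3)
>>> sc.runJob(myRDD, lambda part: [sum(part)])  # one sum per partition: [0,1], [2,3], [4,5]
[1, 5, 9]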
 
Examples

>>> myRDD = sc.parallelize(range(6), 3)
>>> sc.runJob(myRDD, lambda part: [x * x for x in part])
[0, 1, 4, 9, 16, 25]

>>> myRDD = sc.parallelize(range(6), 3)
>>> sc.runJob(myRDD, lambda part: [x * x for x in part], [0, 2], True)
[0, 1, 16, 25]
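As noted under the partitions parameter, restricting a job to a subset of partitions is useful for first-like operations. A hedged sketch (again assuming a SparkContext bound to sc, and the same even slicing of parallelize as in the examples above) reads only partition 0 to fetch the first element without touching the rest of the RDD:

>>> myRDD = sc.parallelize(range(100), 10)
>>> sc.runJob(myRDD, lambda part: list(part)[:1], [0])  # only partition 0 is computed
[0]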