pyspark.RDD.foldByKey
- RDD.foldByKey(zeroValue, func, numPartitions=None, partitionFunc=<function portable_hash>)
- Merge the values for each key using an associative function “func” and a neutral “zeroValue”, which may be added to the result an arbitrary number of times and must not change the result (e.g., 0 for addition, or 1 for multiplication).
- New in version 1.1.0.
- Parameters
- zeroValue : V
- the initial value for the accumulated result of each partition
- func : function
- a function to combine two V’s into a single one
- numPartitions : int, optional
- the number of partitions in the new RDD
- partitionFunc : function, optional, default portable_hash
- function to compute the partition index
 
- Returns
- Examples
>>> rdd = sc.parallelize([("a", 1), ("b", 1), ("a", 1)])
>>> from operator import add
>>> sorted(rdd.foldByKey(0, add).collect())
[('a', 2), ('b', 1)]
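
The requirement that “zeroValue” be neutral can be illustrated without a Spark cluster. The sketch below is a pure-Python model, not Spark's implementation: the helper `fold_by_key_local` is hypothetical, and it assumes the zero value is folded in once per key per partition before the per-partition results are merged with the same function. Under that assumption, a neutral zero value gives the same answer for any partitioning, while a non-neutral one does not.

```python
from operator import add


def fold_by_key_local(partitions, zero_value, func):
    """Hypothetical pure-Python model of foldByKey (not Spark code).

    Each partition folds its values per key starting from zero_value,
    then the per-partition results are merged per key with func.
    """
    merged = {}
    for part in partitions:
        local = {}
        for key, value in part:
            # The first value seen for a key in this partition starts from zero_value.
            local[key] = func(local.get(key, zero_value), value)
        for key, acc in local.items():
            # Merge this partition's result into the overall result for the key.
            merged[key] = func(merged[key], acc) if key in merged else acc
    return sorted(merged.items())


data = [("a", 1), ("b", 1), ("a", 1)]
one_partition = [data]                    # all records in a single partition
two_partitions = [data[:2], data[2:]]     # the two "a" records are split apart

neutral_one = fold_by_key_local(one_partition, 0, add)
neutral_two = fold_by_key_local(two_partitions, 0, add)
skewed_one = fold_by_key_local(one_partition, 10, add)
skewed_two = fold_by_key_local(two_partitions, 10, add)

print(neutral_one, neutral_two)  # identical: 0 is neutral for addition
print(skewed_one, skewed_two)    # differ: 10 is added once per key per partition
```

With 0 as the zero value, both layouts yield `[('a', 2), ('b', 1)]`, matching the example above. With 10, the single-partition layout yields `[('a', 12), ('b', 11)]` and the two-partition layout yields `[('a', 22), ('b', 11)]`, so the result would depend on how the data happens to be partitioned.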