pyspark.sql.functions.sum_distinct#
pyspark.sql.functions.sum_distinct(col)[source]#

Aggregate function: returns the sum of distinct values in the expression.

New in version 3.2.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
    col : Column or column name
        target column to compute on.

Returns
    Column
        the column for computed results.
Examples

Example 1: Using sum_distinct function on a column with all distinct values

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(1,), (2,), (3,), (4,)], ["numbers"])
>>> df.select(sf.sum_distinct('numbers')).show()
+---------------------+
|sum(DISTINCT numbers)|
+---------------------+
|                   10|
+---------------------+

Example 2: Using sum_distinct function on a column with all duplicate values

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(1,), (1,), (1,), (1,)], ["numbers"])
>>> df.select(sf.sum_distinct('numbers')).show()
+---------------------+
|sum(DISTINCT numbers)|
+---------------------+
|                    1|
+---------------------+

Example 3: Using sum_distinct function on a column with null and duplicate values

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(None,), (1,), (1,), (2,)], ["numbers"])
>>> df.select(sf.sum_distinct('numbers')).show()
+---------------------+
|sum(DISTINCT numbers)|
+---------------------+
|                    3|
+---------------------+

Example 4: Using sum_distinct function on a column with all None values

>>> from pyspark.sql import functions as sf
>>> from pyspark.sql.types import StructType, StructField, IntegerType
>>> schema = StructType([StructField("numbers", IntegerType(), True)])
>>> df = spark.createDataFrame([(None,), (None,), (None,), (None,)], schema=schema)
>>> df.select(sf.sum_distinct('numbers')).show()
+---------------------+
|sum(DISTINCT numbers)|
+---------------------+
|                 NULL|
+---------------------+
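The null-handling behavior shown in the examples above can be summarized as: duplicates are collapsed, nulls are ignored, and an all-null column yields NULL. A minimal plain-Python sketch of these semantics (an illustration only, not Spark's implementation, and `sum_distinct_model` is a hypothetical helper name):

```python
def sum_distinct_model(values):
    """Model of sum(DISTINCT col) semantics: ignore nulls, collapse
    duplicates, and return None (NULL) when no non-null values remain."""
    distinct = {v for v in values if v is not None}  # nulls are ignored
    return sum(distinct) if distinct else None       # all-null -> NULL

# Mirrors Examples 1-4 above:
print(sum_distinct_model([1, 2, 3, 4]))     # 10
print(sum_distinct_model([1, 1, 1, 1]))     # 1
print(sum_distinct_model([None, 1, 1, 2]))  # 3
print(sum_distinct_model([None, None]))     # None
```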