pyspark.sql.functions.count_distinct

pyspark.sql.functions.count_distinct(col: ColumnOrName, *cols: ColumnOrName) → pyspark.sql.column.Column
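Returns a new Column for the distinct count of col or cols.

New in version 3.2.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
- col: Column or str
  first column to compute on.
- cols: Column or str
  other columns to compute on.

Returns
- Column
  the number of distinct combinations of values across the given columns.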
 
Examples

>>> from pyspark.sql import types
>>> from pyspark.sql.functions import count_distinct
>>> df1 = spark.createDataFrame([1, 1, 3], types.IntegerType())
>>> df2 = spark.createDataFrame([1, 2], types.IntegerType())
>>> df1.join(df2).show()
+-----+-----+
|value|value|
+-----+-----+
|    1|    1|
|    1|    2|
|    1|    1|
|    1|    2|
|    3|    1|
|    3|    2|
+-----+-----+
>>> df1.join(df2).select(count_distinct(df1.value, df2.value)).show()
+----------------------------+
|count(DISTINCT value, value)|
+----------------------------+
|                           4|
+----------------------------+
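The result of 4 in the example above can be checked by hand. The following is a minimal pure-Python sketch of the semantics only, not of how Spark computes the aggregate. The names left, right, joined_rows and distinct_pairs are illustrative and do not come from the PySpark API.

```python
from itertools import product

# Same values as df1 and df2 in the example above (illustrative names).
left = [1, 1, 3]
right = [1, 2]

# A join with no condition pairs every left row with every right row,
# which gives the six rows shown by df1.join(df2).show().
joined_rows = list(product(left, right))

# Counting distinct over two columns counts distinct (value, value)
# combinations, so the duplicated pairs (1, 1) and (1, 2) count once each.
distinct_pairs = set(joined_rows)
distinct_count = len(distinct_pairs)

print(sorted(distinct_pairs))
print(distinct_count)
```

Six joined rows collapse to four distinct pairs, which matches the value returned by count_distinct(df1.value, df2.value).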