pyspark.sql.DataFrame.intersectAll
DataFrame.intersectAll(other)
Return a new DataFrame containing rows in both this DataFrame and another DataFrame while preserving duplicates.

This is equivalent to INTERSECT ALL in SQL. As standard in SQL, this function resolves columns by position (not by name); a SQL form of the operation is sketched after the Returns section below.

New in version 2.4.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
    other : DataFrame
        Another DataFrame that needs to be combined.

Returns
    DataFrame
        Combined DataFrame.
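As a cross-check of the SQL equivalence noted above, the sketch below runs the same operation through spark.sql. It is an illustrative addition rather than part of the documented examples, assuming an active SparkSession bound to spark (as in the examples below); the temporary view names t1 and t2 are arbitrary.

>>> df1 = spark.createDataFrame([(1, 2), (1, 2), (3, 4)], ["a", "b"])
>>> df2 = spark.createDataFrame([(1, 2), (1, 2)], ["a", "b"])
>>> df1.createOrReplaceTempView("t1")  # register views so SQL can reference them
>>> df2.createOrReplaceTempView("t2")
>>> spark.sql("SELECT * FROM t1 INTERSECT ALL SELECT * FROM t2").sort("a", "b").show()
+---+---+
|  a|  b|
+---+---+
|  1|  2|
|  1|  2|
+---+---+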
 
Examples

Example 1: Intersecting two DataFrames with the same schema

>>> df1 = spark.createDataFrame([("a", 1), ("a", 1), ("b", 3), ("c", 4)], ["C1", "C2"])
>>> df2 = spark.createDataFrame([("a", 1), ("a", 1), ("b", 3)], ["C1", "C2"])
>>> result_df = df1.intersectAll(df2).sort("C1", "C2")
>>> result_df.show()
+---+---+
| C1| C2|
+---+---+
|  a|  1|
|  a|  1|
|  b|  3|
+---+---+

Example 2: Intersecting two DataFrames with the same schema but partially overlapping rows

>>> df1 = spark.createDataFrame([(1, "A"), (2, "B")], ["id", "value"])
>>> df2 = spark.createDataFrame([(2, "B"), (3, "C")], ["id", "value"])
>>> result_df = df1.intersectAll(df2).sort("id", "value")
>>> result_df.show()
+---+-----+
| id|value|
+---+-----+
|  2|    B|
+---+-----+

Example 3: Intersecting all rows from two DataFrames with mismatched column names

>>> df1 = spark.createDataFrame([(1, 2), (1, 2), (3, 4)], ["A", "B"])
>>> df2 = spark.createDataFrame([(1, 2), (1, 2)], ["C", "D"])
>>> result_df = df1.intersectAll(df2).sort("A", "B")
>>> result_df.show()
+---+---+
|  A|  B|
+---+---+
|  1|  2|
|  1|  2|
+---+---+

Because columns are resolved by position rather than by name, the rows match even though the column names differ, and the result uses the column names of this DataFrame (A and B).
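To isolate the duplicate-preserving behavior, the short sketch below (an illustrative addition, not one of the documented examples) contrasts intersectAll with DataFrame.intersect, which follows INTERSECT DISTINCT semantics and removes duplicates. It reuses the data from Example 1.

>>> df1 = spark.createDataFrame([("a", 1), ("a", 1), ("b", 3), ("c", 4)], ["C1", "C2"])
>>> df2 = spark.createDataFrame([("a", 1), ("a", 1), ("b", 3)], ["C1", "C2"])
>>> df1.intersectAll(df2).count()  # ("a", 1) is kept twice
3
>>> df1.intersect(df2).count()  # duplicates collapsed
2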