pyspark.sql.DataFrame.join

DataFrame.join(other, on=None, how=None)
Joins with another DataFrame, using the given join expression.

New in version 1.3.0.

Parameters
- other : DataFrame
  Right side of the join.
- on : str, list or Column, optional
  A string for the join column name, a list of column names, a join expression (Column), or a list of Columns. If on is a string or a list of strings indicating the name of the join column(s), the column(s) must exist on both sides, and this performs an equi-join.
- how : str, optional
  Default inner. Must be one of: inner, cross, outer, full, fullouter, full_outer, left, leftouter, left_outer, right, rightouter, right_outer, semi, leftsemi, left_semi, anti, leftanti and left_anti.
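The how values listed above fall into a few groups of synonyms. As a rough illustration of what each group returns, here is a plain-Python model that joins two lists of dicts on one shared column. It is only a sketch: CANONICAL and join_rows are hypothetical names, the rows are assumed to mirror df and df2 as inferred from the outputs in the Examples section, cross is left out, and none of this reflects how Spark actually executes a join.

```python
# Plain-Python model of the `how` values. Illustrative only; this is not
# Spark code and says nothing about Spark's execution strategy.

# Hypothetical lookup table: each accepted spelling mapped to one join kind.
CANONICAL = {
    "inner": "inner",
    "outer": "full", "full": "full", "fullouter": "full", "full_outer": "full",
    "left": "left", "leftouter": "left", "left_outer": "left",
    "right": "right", "rightouter": "right", "right_outer": "right",
    "semi": "semi", "leftsemi": "semi", "left_semi": "semi",
    "anti": "anti", "leftanti": "anti", "left_anti": "anti",
}


def join_rows(left, right, key, how="inner"):
    """Equi-join two lists of dicts on the single shared column `key`."""
    kind = CANONICAL[how]
    left_cols = {c for row in left for c in row}
    right_cols = {c for row in right for c in row}
    out = []
    matched_right = set()
    for l_row in left:
        # A None key matches nothing, mirroring SQL equality on NULL.
        hits = [
            i for i, r_row in enumerate(right)
            if l_row[key] is not None and r_row[key] == l_row[key]
        ]
        matched_right.update(hits)
        if kind == "semi":
            # Keep the left row once if it has at least one match.
            if hits:
                out.append(dict(l_row))
        elif kind == "anti":
            # Keep the left row only if it has no match.
            if not hits:
                out.append(dict(l_row))
        elif hits:
            out.extend({**l_row, **right[i]} for i in hits)
        elif kind in ("left", "full"):
            # Unmatched left row: right-only columns are filled with None.
            out.append({**dict.fromkeys(right_cols), **l_row})
    if kind in ("right", "full"):
        # Unmatched right rows: left-only columns are filled with None.
        out.extend(
            {**dict.fromkeys(left_cols), **r_row}
            for i, r_row in enumerate(right)
            if i not in matched_right
        )
    return out


# Assumed data, chosen to mirror df and df2 from the Examples section.
df = [{"age": 2, "name": "Alice"}, {"age": 5, "name": "Bob"}]
df2 = [{"name": "Tom", "height": 80}, {"name": "Bob", "height": 85}]

for how in ("inner", "left_outer", "full_outer", "left_semi", "left_anti"):
    rows = join_rows(df, df2, "name", how)
    print(how, sorted(r["name"] for r in rows))
```

The model treats a None key as matching nothing, which follows SQL equality on NULL. Semi and anti joins return only left-side columns, which is what separates them from inner and left outer joins.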
 
Examples

The following performs a full outer join between df and df2.

>>> from pyspark.sql.functions import desc
>>> df.join(df2, df.name == df2.name, 'outer').select(df.name, df2.height) \
...     .sort(desc("name")).collect()
[Row(name='Bob', height=85), Row(name='Alice', height=None), Row(name=None, height=80)]

>>> df.join(df2, 'name', 'outer').select('name', 'height').sort(desc("name")).collect()
[Row(name='Tom', height=80), Row(name='Bob', height=85), Row(name='Alice', height=None)]

>>> cond = [df.name == df3.name, df.age == df3.age]
>>> df.join(df3, cond, 'outer').select(df.name, df3.age).collect()
[Row(name='Alice', age=2), Row(name='Bob', age=5)]

>>> df.join(df2, 'name').select(df.name, df2.height).collect()
[Row(name='Bob', height=85)]

>>> df.join(df4, ['name', 'age']).select(df.name, df.age).collect()
[Row(name='Bob', age=5)]