pyspark.sql.DataFrameNaFunctions.drop

DataFrameNaFunctions.drop(how='any', thresh=None, subset=None)
Returns a new DataFrame omitting rows with null values. DataFrame.dropna() and DataFrameNaFunctions.drop() are aliases of each other.

New in version 1.3.1.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
how : str, optional, default ‘any’
    The value can be ‘any’ or ‘all’. If ‘any’, drop a row if it contains any nulls. If ‘all’, drop a row only if all its values are null.
thresh : int, optional, default None
    If specified, drop rows that have fewer than thresh non-null values. This overrides the how parameter.
subset : str, tuple or list, optional
    Column name or list of column names to consider.
 
Returns
DataFrame
    A new DataFrame with rows containing null values excluded, according to how, thresh and subset.
 
Examples

>>> from pyspark.sql import Row
>>> df = spark.createDataFrame([
...     Row(age=10, height=80, name="Alice"),
...     Row(age=5, height=None, name="Bob"),
...     Row(age=None, height=None, name="Tom"),
...     Row(age=None, height=None, name=None),
... ])

Example 1: Drop the row if it contains any nulls.

>>> df.na.drop().show()
+---+------+-----+
|age|height| name|
+---+------+-----+
| 10|    80|Alice|
+---+------+-----+

Example 2: Drop the row only if all its values are null.

>>> df.na.drop(how='all').show()
+----+------+-----+
| age|height| name|
+----+------+-----+
|  10|    80|Alice|
|   5|  NULL|  Bob|
|NULL|  NULL|  Tom|
+----+------+-----+

Example 3: Drop rows that have fewer than thresh non-null values.

>>> df.na.drop(thresh=2).show()
+---+------+-----+
|age|height| name|
+---+------+-----+
| 10|    80|Alice|
|  5|  NULL|  Bob|
+---+------+-----+

Example 4: Drop rows with null values in the specified columns.

>>> df.na.drop(subset=['age', 'name']).show()
+---+------+-----+
|age|height| name|
+---+------+-----+
| 10|    80|Alice|
|  5|  NULL|  Bob|
+---+------+-----+
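Additional illustration (not part of the original reference): because DataFrame.dropna() and DataFrameNaFunctions.drop() are aliases, the same arguments can be passed to either method. The sketch below assumes the same DataFrame as above; it also combines how='all' with subset, on the reading that nulls are then only considered in the listed columns.

>>> df.dropna(subset=['age', 'name']).show()  # alias of df.na.drop(subset=['age', 'name'])
+---+------+-----+
|age|height| name|
+---+------+-----+
| 10|    80|Alice|
|  5|  NULL|  Bob|
+---+------+-----+

>>> df.na.drop(how='all', subset=['age', 'height']).show()  # drop only when age and height are both null
+---+------+-----+
|age|height| name|
+---+------+-----+
| 10|    80|Alice|
|  5|  NULL|  Bob|
+---+------+-----+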