pyspark.pandas.DataFrame.boxplot

DataFrame.boxplot(**kwds)

Make a box plot of the Series columns.

Parameters

**kwds: optional: Additional keyword arguments are documented in pyspark.pandas.Series.plot().
precision: scalar, default = 0.01: This argument is used by pandas-on-Spark to compute approximate statistics for building a boxplot. Use smaller values to get more precise statistics (matplotlib-only).
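
As an illustration of why a smaller precision gives tighter statistics, here is a minimal sketch in plain NumPy. It rests on an assumption that is not stated on this page: that precision behaves like a relative rank error, so an approximate quantile over N rows may land up to precision * N ranks away from the exact position. The helper quantile_rank_bounds is hypothetical and is not part of pandas-on-Spark.

```python
import numpy as np

def quantile_rank_bounds(sorted_values, q, precision):
    """Hypothetical helper: range of values an approximate quantile could return.

    Assumes (for illustration only) that the returned rank may be off by up to
    precision * N positions from the exact rank of quantile q.
    """
    n = len(sorted_values)
    target = q * (n - 1)          # exact (fractional) rank of the quantile
    slack = precision * n         # assumed maximum rank error
    lo = int(max(0, np.floor(target - slack)))
    hi = int(min(n - 1, np.ceil(target + slack)))
    return sorted_values[lo], sorted_values[hi]

values = np.arange(1000.0)  # already sorted: 0.0, 1.0, ..., 999.0

# Candidate range for the 1st quartile at the default and at a smaller precision.
coarse = quantile_rank_bounds(values, 0.25, precision=0.01)
fine = quantile_rank_bounds(values, 0.25, precision=0.001)

print("precision=0.01 :", coarse)
print("precision=0.001:", fine)
```

Under that assumption, the interval for precision=0.001 is narrower than the one for the default 0.01, which matches the guidance to use smaller values for more precise statistics.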

Returns

plotly.graph_objs.Figure: Return a custom object when backend != plotly. Return an ndarray when subplots=True (matplotlib-only).

Notes

There are behavior differences between pandas-on-Spark and pandas.

pandas-on-Spark computes approximate statistics - expect differences between pandas and pandas-on-Spark boxplots, especially regarding 1st and 3rd quartiles.
The whis argument is only supported as a single number.
pandas-on-Spark doesn’t support the following arguments (matplotlib-only):
- bootstrap argument is not supported
- autorange argument is not supported
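
To show what a single-number whis controls, here is a minimal sketch in plain NumPy of conventional Tukey-style box plot statistics. It is an illustration only, not the pandas-on-Spark implementation: the helper box_stats is hypothetical, and it uses exact quartiles, whereas pandas-on-Spark computes approximate ones.

```python
import numpy as np

def box_stats(values, whis=1.5):
    """Hypothetical helper: Tukey-style box plot statistics with exact quartiles."""
    values = np.asarray(values, dtype=float)
    q1, med, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1

    # A single-number whis scales the interquartile range to set the fences.
    lower_fence = q1 - whis * iqr
    upper_fence = q3 + whis * iqr

    # Whiskers extend to the most extreme data points inside the fences;
    # everything outside the fences is reported as an outlier.
    inside = values[(values >= lower_fence) & (values <= upper_fence)]
    outliers = values[(values < lower_fence) | (values > upper_fence)]
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "whisker_low": inside.min(),
        "whisker_high": inside.max(),
        "outliers": np.sort(outliers),
    }

stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(stats)
```

With the default whis=1.5, the value 100 falls outside the upper fence and is reported as an outlier, while the whiskers stop at 1 and 9.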

Examples

Draw a box plot from a DataFrame with four columns of randomly generated data.

For Series:

>>> import numpy as np
>>> import pyspark.pandas as ps
>>> data = np.random.randn(25, 4)
>>> df = ps.DataFrame(data, columns=list('ABCD'))
>>> df['A'].plot.box()

This is an unsupported function for DataFrame type