pyspark.pandas.DataFrame.kde
DataFrame.kde(bw_method=None, ind=None, **kwds)
- Generate a Kernel Density Estimate plot using Gaussian kernels.
- Parameters
- bw_method : scalar
- The method used to calculate the estimator bandwidth. See KernelDensity in PySpark for more information. 
- ind : NumPy array or integer, optional
- Evaluation points for the estimated PDF. If None (default), 1000 equally spaced points are used. If ind is a NumPy array, the KDE is evaluated at the points passed. If ind is an integer, ind number of equally spaced points are used. 
- **kwds : optional
- Keyword arguments to pass on to pandas-on-Spark.Series.plot().
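The three forms of ind described above can be illustrated with a small pure-Python sketch. This is not the pandas-on-Spark implementation: the helper name `evaluation_points` is hypothetical, and the choice to extend the grid by half the data range on each side is an assumption borrowed from how pandas builds its default grid.

```python
def evaluation_points(data, ind=None, default_count=1000, pad=0.5):
    """Resolve `ind` into a concrete list of evaluation points (illustrative only)."""
    # An explicit sequence of points is used as given.
    if ind is not None and not isinstance(ind, int):
        return [float(x) for x in ind]

    # None means the default number of points; an integer means that many points.
    count = default_count if ind is None else ind

    # Assumption: the grid spans the data range padded by `pad` times its width.
    lo, hi = min(data), max(data)
    span = hi - lo
    start, stop = lo - pad * span, hi + pad * span

    if count == 1:
        return [start]
    step = (stop - start) / (count - 1)
    return [start + i * step for i in range(count)]


data = [1, 2, 2.5, 3, 3.5, 4, 5]
default_grid = evaluation_points(data)                  # ind=None
five_points = evaluation_points(data, ind=5)            # ind as an integer
explicit = evaluation_points(data, ind=[1, 2, 3])       # ind as explicit points
print(len(default_grid), five_points, explicit)
```

Passing explicit points gives full control over where the density is evaluated, while the integer form only controls the resolution of the grid.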
 
- Returns
- plotly.graph_objs.Figure
- Returns a custom object when backend != plotly. Returns an ndarray when subplots=True (matplotlib only).
 
- Examples
- A scalar bandwidth should be specified. Using a small bandwidth value can lead to over-fitting, while using a large bandwidth value may result in under-fitting:

>>> import pyspark.pandas as ps
>>> s = ps.Series([1, 2, 2.5, 3, 3.5, 4, 5])
>>> s.plot.kde(bw_method=0.3)

>>> s = ps.Series([1, 2, 2.5, 3, 3.5, 4, 5])
>>> s.plot.kde(bw_method=3)

- The ind parameter determines the evaluation points for the plot of the estimated PDF:

>>> s = ps.Series([1, 2, 2.5, 3, 3.5, 4, 5])
>>> s.plot.kde(ind=[1, 2, 3, 4, 5], bw_method=0.3)

- For DataFrame, it works in the same way as Series:

>>> df = ps.DataFrame({
...     'x': [1, 2, 2.5, 3, 3.5, 4, 5],
...     'y': [4, 4, 4.5, 5, 5.5, 6, 6],
... })
>>> df.plot.kde(bw_method=0.3)

>>> df = ps.DataFrame({
...     'x': [1, 2, 2.5, 3, 3.5, 4, 5],
...     'y': [4, 4, 4.5, 5, 5.5, 6, 6],
... })
>>> df.plot.kde(bw_method=3)

>>> df = ps.DataFrame({
...     'x': [1, 2, 2.5, 3, 3.5, 4, 5],
...     'y': [4, 4, 4.5, 5, 5.5, 6, 6],
... })
>>> df.plot.kde(ind=[1, 2, 3, 4, 5, 6], bw_method=0.3)
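To see why the bandwidth matters, the following is a minimal pure-Python sketch of a Gaussian kernel density estimate evaluated at fixed points. It does not call Spark. The helper `gaussian_kde` is hypothetical, and it assumes the scalar bandwidth is used directly as the standard deviation of each Gaussian kernel, which may differ in detail from what the plotting backend does.

```python
import math


def gaussian_kde(data, points, bandwidth):
    """Average of Gaussian kernels centred on each sample (illustrative only)."""
    # Normalisation so that the estimate integrates to 1.
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((p - x) / bandwidth) ** 2) for x in data)
        for p in points
    ]


data = [1, 2, 2.5, 3, 3.5, 4, 5]
points = [1, 2, 3, 4, 5]

narrow = gaussian_kde(data, points, bandwidth=0.3)  # follows individual samples closely
wide = gaussian_kde(data, points, bandwidth=3)      # smooths most structure away

# The spread between highest and lowest density shows how "peaky" each estimate is.
narrow_spread = max(narrow) - min(narrow)
wide_spread = max(wide) - min(wide)
print(narrow_spread > wide_spread)
```

With the small bandwidth the estimate rises sharply near clusters of samples, while the large bandwidth yields an almost flat curve over the same points, which is the over-fitting versus under-fitting trade-off described in the examples.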