pyspark.pandas.groupby.GroupBy.sum
GroupBy.sum(numeric_only: Optional[bool] = True, min_count: int = 0) → FrameLike
Compute sum of group values.

New in version 3.3.0.

Parameters
numeric_only : bool, default True
    Include only float, int, and boolean columns. If None, will attempt to use everything, then use only numeric data. This parameter currently has no effect, since only numeric columns are supported here.

    New in version 3.4.0.
min_count : int, default 0
    The required number of valid values to perform the operation. If fewer than min_count non-NA values are present, the result will be NA (see the sketch after this parameter list).

    New in version 3.4.0.
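As a sketch of how min_count counts non-NA values within each group (the DataFrame and names below are illustrative, not from the original documentation):

>>> import numpy as np
>>> import pyspark.pandas as ps
>>> tmp = ps.DataFrame({"key": ["x", "x", "x", "y"],
...                     "val": [1.0, 2.0, np.nan, 3.0]})
>>> # group "x" has two non-NA values (>= min_count); group "y" has only one
>>> tmp.groupby("key").sum(min_count=2).sort_index()
     val
key
x    3.0
y    NaN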
 
Notes

There is a behavior difference between pandas-on-Spark and pandas: when there is a non-numeric aggregation column, pandas-on-Spark ignores it even if numeric_only is False, as the sketch below illustrates.
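A minimal sketch of the contrast, assuming a recent pandas version in which summing an object column with numeric_only=False concatenates the strings (the DataFrame here is illustrative):

>>> import pandas as pd
>>> import pyspark.pandas as ps
>>> pdf = pd.DataFrame({"A": [1, 1, 2], "B": [10, 20, 30], "D": ["a", "b", "c"]})
>>> pdf.groupby("A").sum(numeric_only=False)  # pandas keeps D: "ab", "c"
    B   D
A
1  30  ab
2  30   c
>>> ps.from_pandas(pdf).groupby("A").sum(numeric_only=False).sort_index()  # D dropped
    B
A
1  30
2  30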
 
Examples

>>> import pyspark.pandas as ps
>>> df = ps.DataFrame({"A": [1, 2, 1, 2], "B": [True, False, False, True],
...                    "C": [3, 4, 3, 4], "D": ["a", "a", "b", "a"]})

>>> df.groupby("A").sum().sort_index()
   B  C
A
1  1  6
2  1  8

>>> df.groupby("D").sum().sort_index()
   A  B   C
D
a  5  2  11
b  1  0   3

>>> df.groupby("D").sum(min_count=3).sort_index()
     A    B     C
D
a  5.0  2.0  11.0
b  NaN  NaN   NaN