pyspark.pandas.groupby.GroupBy.sum
GroupBy.sum(numeric_only: Optional[bool] = True, min_count: int = 0) → FrameLike
Compute sum of group values.

New in version 3.3.0.

Parameters
numeric_only : bool, default True
    Include only float, int, and boolean columns. If None, will attempt to use everything, then use only numeric data. This parameter currently has no effect, since only numeric columns are supported here.

    New in version 3.4.0.
min_count : int, default 0
    The required number of valid values to perform the operation. If fewer than min_count non-NA values are present, the result will be NA (see the sketch after this parameter list).

    New in version 3.4.0.
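As a sketch of how min_count counts non-NA values within each group (the DataFrame and names below are illustrative, not from the original documentation):

>>> import numpy as np
>>> import pyspark.pandas as ps
>>> tmp = ps.DataFrame({"key": ["x", "x", "x", "y"],
...                     "val": [1.0, 2.0, np.nan, 3.0]})
>>> # group "x" has two non-NA values (>= min_count); group "y" has only one
>>> tmp.groupby("key").sum(min_count=2).sort_index()
     val
key
x    3.0
y    NaN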
 
Notes

There is a behavior difference between pandas-on-Spark and pandas: when there is a non-numeric aggregation column, pandas-on-Spark ignores it even if numeric_only is False, as the sketch below illustrates.
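A minimal sketch of the contrast, assuming a recent pandas version in which summing an object column with numeric_only=False concatenates the strings (the DataFrame here is illustrative):

>>> import pandas as pd
>>> import pyspark.pandas as ps
>>> pdf = pd.DataFrame({"A": [1, 1, 2], "B": [10, 20, 30], "D": ["a", "b", "c"]})
>>> pdf.groupby("A").sum(numeric_only=False)  # pandas keeps D: "ab", "c"
    B   D
A
1  30  ab
2  30   c
>>> ps.from_pandas(pdf).groupby("A").sum(numeric_only=False).sort_index()  # D dropped
    B
A
1  30
2  30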
 
Examples

>>> import pyspark.pandas as ps
>>> df = ps.DataFrame({"A": [1, 2, 1, 2], "B": [True, False, False, True],
...                    "C": [3, 4, 3, 4], "D": ["a", "a", "b", "a"]})

>>> df.groupby("A").sum().sort_index()
   B  C
A
1  1  6
2  1  8

>>> df.groupby("D").sum().sort_index()
   A  B   C
D
a  5  2  11
b  1  0   3

>>> df.groupby("D").sum(min_count=3).sort_index()
     A    B     C
D
a  5.0  2.0  11.0
b  NaN  NaN   NaN