pyspark.pandas.DataFrame.cumsum
- DataFrame.cumsum(skipna=True)
- Return cumulative sum over a DataFrame or Series axis.
- Returns a DataFrame or Series of the same size containing the cumulative sum.
- Note
- The current implementation of cumsum uses Spark’s Window without specifying a partition specification. This moves all data into a single partition on a single machine and can cause serious performance degradation. Avoid this method with very large datasets.
- Parameters
- skipna: boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA. 
 
- Returns
- DataFrame or Series
 
- See also
- DataFrame.sum
- Return the sum over DataFrame axis. 
- DataFrame.cummax
- Return cumulative maximum over DataFrame axis. 
- DataFrame.cummin
- Return cumulative minimum over DataFrame axis. 
- DataFrame.cumsum
- Return cumulative sum over DataFrame axis. 
- DataFrame.cumprod
- Return cumulative product over DataFrame axis. 
- Series.sum
- Return the sum over Series axis. 
- Series.cummax
- Return cumulative maximum over Series axis. 
- Series.cummin
- Return cumulative minimum over Series axis. 
- Series.cumsum
- Return cumulative sum over Series axis. 
- Series.cumprod
- Return cumulative product over Series axis. 
- Examples
- >>> import pyspark.pandas as ps
  >>> df = ps.DataFrame([[2.0, 1.0], [3.0, None], [1.0, 0.0]], columns=list('AB'))
  >>> df
       A    B
  0  2.0  1.0
  1  3.0  NaN
  2  1.0  0.0
- By default, iterates over rows and finds the sum in each column.
- >>> df.cumsum()
       A    B
  0  2.0  1.0
  1  5.0  NaN
  2  6.0  1.0
- It works identically in Series.
- >>> df.A.cumsum()
  0    2.0
  1    5.0
  2    6.0
  Name: A, dtype: float64
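The B column in the example shows how skipna=True treats a missing value: the row holding the NaN stays NaN, but the running total carries on past it. The following is a minimal plain-Python sketch of that rule, not the Spark implementation. The helper name `cumsum_model` is hypothetical, and the skipna=False branch assumes pandas-like behaviour, where every value after the first missing one is also NaN.

```python
import math


def cumsum_model(values, skipna=True):
    """Plain-Python model of a cumulative sum over one column.

    Hypothetical helper for illustration only; it is not part of
    pyspark.pandas. None and NaN are both treated as missing.
    """
    out = []
    total = 0.0
    seen_missing = False  # only matters when skipna=False
    for v in values:
        missing = v is None or (isinstance(v, float) and math.isnan(v))
        if missing:
            seen_missing = True
            out.append(math.nan)  # a missing input always yields NaN
        elif seen_missing and not skipna:
            # Assumed pandas-like rule: once a missing value has been
            # seen, skipna=False keeps the result NaN from then on.
            out.append(math.nan)
        else:
            total += v
            out.append(total)
    return out


col_a = cumsum_model([2.0, 3.0, 1.0])                      # column A
col_b = cumsum_model([1.0, None, 0.0])                     # column B, skipna=True
col_b_strict = cumsum_model([1.0, None, 0.0], skipna=False)

print(col_a)         # [2.0, 5.0, 6.0]
print(col_b)         # [1.0, nan, 1.0]
print(col_b_strict)  # [1.0, nan, nan]
```

With skipna=True the model reproduces both columns of the df.cumsum() output shown in the example. An input made up entirely of missing values yields all NaN, matching the parameter description.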