pyspark.pandas.DataFrame.cumprod
- DataFrame.cumprod(skipna=True)
- Return cumulative product over a DataFrame or Series axis.
- Returns a DataFrame or Series of the same size containing the cumulative product.
- Note
- The current implementation of cumprod uses Spark’s Window without specifying a partition specification. This moves all data into a single partition on a single machine and could cause serious performance degradation. Avoid this method with very large datasets.
- Note
- Unlike pandas, pandas-on-Spark emulates the cumulative product with the exp(sum(log(...))) trick. Therefore, it only works for positive numbers.
- Parameters
- skipna: boolean, default True
- Exclude NA/null values. If an entire row/column is NA, the result will be NA. 
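The skipna behavior can be modeled in plain Python. This is a minimal sketch, not pandas-on-Spark code: `cumprod_with_skipna` is a hypothetical helper, NA is represented as `None` or NaN, and the `skipna=False` behavior (every position after the first NA also becomes NA) is assumed to mirror pandas. The input reuses column B from the example at the end of this page.

```python
import math
from typing import List, Optional


def cumprod_with_skipna(values: List[Optional[float]], skipna: bool = True) -> List[float]:
    """Hypothetical plain-Python model of cumprod's skipna semantics."""
    result: List[float] = []
    running = 1.0
    seen_na = False
    for v in values:
        if v is None or math.isnan(v):
            # An NA entry always produces NA at its own position
            result.append(math.nan)
            if not skipna:
                seen_na = True
            continue
        if seen_na:
            # With skipna=False, everything after the first NA is NA as well
            result.append(math.nan)
            continue
        # With skipna=True, NA entries are simply left out of the running product
        running *= v
        result.append(running)
    return result


column_b = [1.0, None, 10.0]
print(cumprod_with_skipna(column_b))                # [1.0, nan, 10.0]
print(cumprod_with_skipna(column_b, skipna=False))  # [1.0, nan, nan]
```

With the default `skipna=True`, the NA in the middle stays NA in the output, but the running product resumes afterwards, which matches the `B` column of `df.cumprod()` in the example.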
 
- Returns
- DataFrame or Series
 
- Raises
- Exception: If any value is less than or equal to 0.
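The restriction to positive numbers follows from the exp(sum(log(...))) trick described in the Note: the logarithm is undefined for 0 and negative values. The sketch below works in plain Python rather than Spark. It computes a running product as the exponential of a running sum of logarithms and compares the result with the exact product from `itertools.accumulate`. `cumprod_via_logs` is a hypothetical helper, and its `ValueError` is an illustrative choice, not the exception type pandas-on-Spark raises.

```python
import math
from itertools import accumulate
from operator import mul


def cumprod_via_logs(values):
    """Hypothetical helper: cumulative product as exp(running sum of logs)."""
    if any(v <= 0 for v in values):
        # log() is undefined for zero and negative numbers, so the trick cannot handle them
        raise ValueError("values must be strictly positive")
    running_log_sums = accumulate(math.log(v) for v in values)
    return [math.exp(s) for s in running_log_sums]


values = [2.0, 3.0, 4.0]
exact = list(accumulate(values, mul))  # [2.0, 6.0, 24.0]
approx = cumprod_via_logs(values)
# The log/exp round trip may introduce tiny floating-point error, so compare approximately
print(all(math.isclose(a, e) for a, e in zip(approx, exact)))  # True

try:
    cumprod_via_logs([2.0, 0.0, 4.0])
except ValueError as err:
    print("error:", err)  # error: values must be strictly positive
```

Because the result passes through `exp` and `log`, values computed this way can differ from an exact running product in the last few bits, even for positive inputs.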
 
- See also
- DataFrame.cummax
- Return cumulative maximum over DataFrame axis. 
- DataFrame.cummin
- Return cumulative minimum over DataFrame axis. 
- DataFrame.cumsum
- Return cumulative sum over DataFrame axis. 
- DataFrame.cumprod
- Return cumulative product over DataFrame axis. 
- Series.cummax
- Return cumulative maximum over Series axis. 
- Series.cummin
- Return cumulative minimum over Series axis. 
- Series.cumsum
- Return cumulative sum over Series axis. 
- Series.cumprod
- Return cumulative product over Series axis. 
- Examples

>>> import pyspark.pandas as ps
>>> df = ps.DataFrame([[2.0, 1.0], [3.0, None], [4.0, 10.0]], columns=list('AB'))
>>> df
     A     B
0  2.0   1.0
1  3.0   NaN
2  4.0  10.0

- By default, iterates over rows and finds the product in each column.

>>> df.cumprod()
      A     B
0   2.0   1.0
1   6.0   NaN
2  24.0  10.0

- It works identically in Series.

>>> df.A.cumprod()
0     2.0
1     6.0
2    24.0
Name: A, dtype: float64