pyspark.pandas.Series.spark.apply
- spark.apply(func: Callable[[pyspark.sql.column.Column], pyspark.sql.column.Column]) → ps.Series
- Applies a function that takes and returns a Spark column. This lets you apply Spark functions and Column APIs natively to the Spark column used internally by a Series or Index.
- Note
- This method discards the index and falls back to the default index. Prefer Series.spark.transform(), or DataFrame.spark.apply() with index_col specified.
- Note
- The output does not need to have the same length as the input. However, a new DataFrame is created internally, so computing with the result requires setting compute.ops_on_diff_frames, even against the DataFrame it originated from, and such operations are expensive. Series.spark.transform() does not have this requirement.
- Parameters
- func : function
- Function to apply to the data. It takes a Spark column and must return a Spark column.
 
- Returns
- Series
 
- Raises
- ValueError: If the output from the function is not a Spark column.
 
- Examples
>>> from pyspark import pandas as ps
>>> from pyspark.sql.functions import count, lit
>>> df = ps.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, columns=["a", "b"])
>>> df
   a  b
0  1  4
1  2  5
2  3  6

>>> df.a.spark.apply(lambda c: count(c))
0    3
Name: a, dtype: int64

>>> df.a.spark.apply(lambda c: c + df.b.spark.column)
0    5
1    7
2    9
Name: a, dtype: int64