pyspark.sql.GroupedData.apply
- GroupedData.apply(udf)
- It is an alias of pyspark.sql.GroupedData.applyInPandas(); however, it takes a pyspark.sql.functions.pandas_udf(), whereas pyspark.sql.GroupedData.applyInPandas() takes a Python native function.
- New in version 2.3.0.
- Changed in version 3.4.0: Support Spark Connect.
- Parameters
- udf : pyspark.sql.functions.pandas_udf()
  - a grouped map user-defined function returned by pyspark.sql.functions.pandas_udf().
- See also
- Notes
  - It is preferred to use pyspark.sql.GroupedData.applyInPandas() over this API. This API will be deprecated in future releases.
- Examples

    >>> from pyspark.sql.functions import pandas_udf, PandasUDFType
    >>> df = spark.createDataFrame(
    ...     [(1, 1.0), (1, 2.0), (2, 3.0), (2, 5.0), (2, 10.0)],
    ...     ("id", "v"))
    >>> @pandas_udf("id long, v double", PandasUDFType.GROUPED_MAP)
    ... def normalize(pdf):
    ...     v = pdf.v
    ...     return pdf.assign(v=(v - v.mean()) / v.std())
    ...
    >>> df.groupby("id").apply(normalize).show()
    +---+-------------------+
    | id|                  v|
    +---+-------------------+
    |  1|-0.7071067811865475|
    |  1| 0.7071067811865475|
    |  2|-0.8320502943378437|
    |  2|-0.2773500981126146|
    |  2| 1.1094003924504583|
    +---+-------------------+
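Since the notes recommend applyInPandas() over this API, the following is a minimal sketch of how the normalization example might look with a plain Python function instead of a pandas_udf. The helper name `apply_normalize` and the local pandas check are illustrative assumptions, not part of the PySpark API; the sketch also assumes the input DataFrame has the same `id` and `v` columns as in the example.

```python
import pandas as pd


def normalize(pdf: pd.DataFrame) -> pd.DataFrame:
    """Standardize column v within one group (plain function, no decorator)."""
    v = pdf.v
    return pdf.assign(v=(v - v.mean()) / v.std())


def apply_normalize(df):
    """Hypothetical helper: df is a Spark DataFrame with columns id and v."""
    # The output schema is passed to applyInPandas directly,
    # rather than to a pandas_udf decorator.
    return df.groupby("id").applyInPandas(normalize, schema="id long, v double")


# Local check of the per-group logic using pandas only (no Spark session needed).
sample = pd.DataFrame({"id": [1, 1, 2, 2, 2], "v": [1.0, 2.0, 3.0, 5.0, 10.0]})
local_result = pd.concat([normalize(group) for _, group in sample.groupby("id")])
print(local_result)
```

With an active Spark session, calling `apply_normalize(df).show()` on the DataFrame from the example should yield the same normalized values, although the row order of the output is not guaranteed.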