pyspark.sql.DataFrame.pandas_api
- DataFrame.pandas_api(index_col=None)
- Converts the existing DataFrame into a pandas-on-Spark DataFrame.
- New in version 3.2.0.
- Changed in version 3.5.0: Supports Spark Connect.
- If a pandas-on-Spark DataFrame is converted to a Spark DataFrame and then back to pandas-on-Spark, it loses the index information. By default, the original index is dropped. If it is kept with to_spark(index_col=...), it comes back as a normal column rather than as the index.
- This method is only available if pandas is installed.
- Parameters
- index_col: str or list of str, optional
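- Name of the Spark column, or list of column names, to use as the index of the resulting pandas-on-Spark DataFrame. If omitted, a default index is attached.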
 
- Returns
- PandasOnSparkDataFrame
 
 - See also
 - pyspark.pandas.frame.DataFrame.to_spark
 - Examples

   >>> df = spark.createDataFrame(
   ...     [(14, "Tom"), (23, "Alice"), (16, "Bob")], ["age", "name"])
   >>> df.pandas_api()
      age   name
   0   14    Tom
   1   23  Alice
   2   16    Bob

 - We can specify the index columns.

   >>> df.pandas_api(index_col="age")
         name
   age
   14     Tom
   23   Alice
   16     Bob