pyspark.pandas.read_parquet#
- pyspark.pandas.read_parquet(path, columns=None, index_col=None, pandas_metadata=False, **options)[source]#
- Load a parquet object from the file path, returning a DataFrame.

- Parameters
- path : string
- File path.
- columns : list, default: None
- If not None, only these columns will be read from the file.
- index_col : str or list of str, optional, default: None
- Index column of table in Spark.
- pandas_metadata : bool, default: False
- If True, try to respect the metadata if the Parquet file is written from pandas.
- options : dict
- All other options passed directly into Spark’s data source.
 
- Returns
- DataFrame
 
- See also

- DataFrame.to_parquet
- read_table
- read_delta
- read_spark_io
- Examples

>>> ps.range(1).to_parquet('%s/read_spark_io/data.parquet' % path)
>>> ps.read_parquet('%s/read_spark_io/data.parquet' % path, columns=['id'])
   id
0   0

- You can preserve the index in the roundtrip as below.

>>> ps.range(1).to_parquet('%s/read_spark_io/data.parquet' % path, index_col="index")
>>> ps.read_parquet('%s/read_spark_io/data.parquet' % path, columns=['id'], index_col="index")
...
       id
index
0       0