pyspark.sql.DataFrameReader
class pyspark.sql.DataFrameReader(spark)

Interface used to load a DataFrame from external storage systems (e.g. file systems, key-value stores, etc.). Use SparkSession.read to access this.

New in version 1.4.0.

Changed in version 3.4.0: Supports Spark Connect.

Methods

- csv(path[, schema, sep, encoding, quote, ...]): Loads a CSV file and returns the result as a DataFrame.
- format(source): Specifies the input data source format.
- jdbc(url, table[, column, lowerBound, ...]): Constructs a DataFrame representing the database table named table, accessible via the JDBC URL url and connection properties.
- json(path[, schema, primitivesAsString, ...]): Loads JSON files and returns the results as a DataFrame.
- load([path, format, schema]): Loads data from a data source and returns it as a DataFrame.
- option(key, value): Adds an input option for the underlying data source.
- options(**options): Adds input options for the underlying data source.
- orc(path[, mergeSchema, pathGlobFilter, ...]): Loads ORC files, returning the result as a DataFrame.
- parquet(*paths, **options): Loads Parquet files, returning the result as a DataFrame.
- schema(schema): Specifies the input schema.
- table(tableName): Returns the specified table as a DataFrame.
- text(paths[, wholetext, lineSep, ...]): Loads text files and returns a DataFrame whose schema starts with a string column named "value", followed by partitioned columns if there are any.
- xml(path[, rowTag, schema, ...]): Loads an XML file and returns the result as a DataFrame.
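Examples

A minimal sketch of the generic load path, chaining format(), schema(), option(), and load(). The input path people.csv and its columns are hypothetical:

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

spark = SparkSession.builder.appName("reader-demo").getOrCreate()

# Supplying an explicit schema avoids an extra pass over the data
# for schema inference.
schema = StructType([
    StructField("name", StringType(), True),
    StructField("age", IntegerType(), True),
])

# Generic entry point: configure the reader, then load.
# "people.csv" is a hypothetical input path.
df = (
    spark.read.format("csv")
    .schema(schema)
    .option("header", True)
    .load("people.csv")
)
df.printSchema()
```

option() and options() take source-specific keys; the header option shown here applies to the CSV source in particular.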
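The format-specific shortcuts (csv(), json(), parquet(), orc(), text(), xml()) wrap the same machinery. A self-contained round trip through json(), writing a small DataFrame to a temporary directory and reading it back:

```python
import tempfile
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

with tempfile.TemporaryDirectory() as d:
    path = f"{d}/people.json"
    # Write a tiny DataFrame as JSON, then read it back with the
    # json() shortcut; the schema is inferred from the data.
    spark.createDataFrame([("Alice", 2), ("Bob", 5)], ["name", "age"]).write.json(path)
    df = spark.read.json(path)
    df.show()
```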
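jdbc() additionally supports partitioned parallel reads via the column/lowerBound/upperBound/numPartitions arguments. A sketch, assuming a hypothetical PostgreSQL database, table, and credentials, with a matching JDBC driver on the Spark classpath:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# All connection details below are placeholders for illustration only.
df = spark.read.jdbc(
    url="jdbc:postgresql://localhost:5432/mydb",  # hypothetical URL
    table="public.people",                        # hypothetical table
    column="id",          # numeric column used to split the read
    lowerBound=1,         # bounds partition the id range; they do
    upperBound=100000,    # not filter rows out of the result
    numPartitions=8,      # number of concurrent partitions/connections
    properties={"user": "spark", "password": "secret"},
)
```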