pyspark.sql.DataFrameReader
class pyspark.sql.DataFrameReader(spark)

Interface used to load a DataFrame from external storage systems (e.g. file systems, key-value stores, etc.). Use SparkSession.read to access this.

New in version 1.4.0.

Changed in version 3.4.0: Supports Spark Connect.

Methods

- csv(path[, schema, sep, encoding, quote, ...]): Loads a CSV file and returns the result as a DataFrame.
- format(source): Specifies the input data source format.
- jdbc(url, table[, column, lowerBound, ...]): Constructs a DataFrame representing the database table named table, accessible via the JDBC URL url and connection properties.
- json(path[, schema, primitivesAsString, ...]): Loads JSON files and returns the results as a DataFrame.
- load([path, format, schema]): Loads data from a data source and returns it as a DataFrame.
- option(key, value): Adds an input option for the underlying data source.
- options(**options): Adds input options for the underlying data source.
- orc(path[, mergeSchema, pathGlobFilter, ...]): Loads ORC files, returning the result as a DataFrame.
- parquet(*paths, **options): Loads Parquet files, returning the result as a DataFrame.
- schema(schema): Specifies the input schema.
- table(tableName): Returns the specified table as a DataFrame.
- text(paths[, wholetext, lineSep, ...]): Loads text files and returns a DataFrame whose schema starts with a string column named "value", followed by partitioned columns if there are any.
- xml(path[, rowTag, schema, ...]): Loads an XML file and returns the result as a DataFrame.
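Examples

A minimal sketch of the generic load path, chaining format(), schema(), option(), and load(). The input path people.csv and its columns are hypothetical:

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

spark = SparkSession.builder.appName("reader-demo").getOrCreate()

# Supplying an explicit schema avoids an extra pass over the data
# for schema inference.
schema = StructType([
    StructField("name", StringType(), True),
    StructField("age", IntegerType(), True),
])

# Generic entry point: configure the reader, then load.
# "people.csv" is a hypothetical input path.
df = (
    spark.read.format("csv")
    .schema(schema)
    .option("header", True)
    .load("people.csv")
)
df.printSchema()
```

option() and options() take source-specific keys; the header option shown here applies to the CSV source in particular.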
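The format-specific shortcuts (csv(), json(), parquet(), orc(), text(), xml()) wrap the same machinery. A self-contained round trip through json(), writing a small DataFrame to a temporary directory and reading it back:

```python
import tempfile
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

with tempfile.TemporaryDirectory() as d:
    path = f"{d}/people.json"
    # Write a tiny DataFrame as JSON, then read it back with the
    # json() shortcut; the schema is inferred from the data.
    spark.createDataFrame([("Alice", 2), ("Bob", 5)], ["name", "age"]).write.json(path)
    df = spark.read.json(path)
    df.show()
```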
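jdbc() additionally supports partitioned parallel reads via the column/lowerBound/upperBound/numPartitions arguments. A sketch, assuming a hypothetical PostgreSQL database, table, and credentials, with a matching JDBC driver on the Spark classpath:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# All connection details below are placeholders for illustration only.
df = spark.read.jdbc(
    url="jdbc:postgresql://localhost:5432/mydb",  # hypothetical URL
    table="public.people",                        # hypothetical table
    column="id",          # numeric column used to split the read
    lowerBound=1,         # bounds partition the id range; they do
    upperBound=100000,    # not filter rows out of the result
    numPartitions=8,      # number of concurrent partitions/connections
    properties={"user": "spark", "password": "secret"},
)
```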