pyspark.pandas.read_json
pyspark.pandas.read_json(path: str, lines: bool = True, index_col: Union[str, List[str], None] = None, **options: Any) → pyspark.pandas.frame.DataFrame
Convert JSON files at the given path to a DataFrame.

Parameters
path : string
    File path.
lines : bool, default True
    Read the file as one JSON object per line. Currently it must always be True.
index_col : str or list of str, optional, default: None
    Index column of the table in Spark.
options : dict
    All other options are passed directly into Spark's data source.
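Since lines=True means the file must be in JSON Lines layout, here is a minimal sketch of what that layout looks like, using only the standard library (the rows mirror the DataFrame from the Examples section; this is an illustration of the format, not pyspark code):

```python
import json

rows = [{"col 1": "a", "col 2": "b"}, {"col 1": "c", "col 2": "d"}]

# Serialize each record as a standalone JSON object on its own line;
# this is the "one JSON object per line" layout lines=True expects.
json_lines = "\n".join(json.dumps(row) for row in rows)
print(json_lines)

# Each line parses independently, which is what allows Spark to split
# and read the file in parallel.
parsed = [json.loads(line) for line in json_lines.splitlines()]
```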
 
Examples

>>> df = ps.DataFrame([['a', 'b'], ['c', 'd']],
...                   columns=['col 1', 'col 2'])

>>> df.to_json(path=r'%s/read_json/foo.json' % path, num_files=1)
>>> ps.read_json(
...     path=r'%s/read_json/foo.json' % path
... ).sort_values(by="col 1")
  col 1 col 2
0     a     b
1     c     d

>>> df.to_json(path=r'%s/read_json/foo.json' % path, num_files=1, lineSep='___')
>>> ps.read_json(
...     path=r'%s/read_json/foo.json' % path, lineSep='___'
... ).sort_values(by="col 1")
  col 1 col 2
0     a     b
1     c     d

You can preserve the index in the roundtrip as below.

>>> df.to_json(path=r'%s/read_json/bar.json' % path, num_files=1, index_col="index")
>>> ps.read_json(
...     path=r'%s/read_json/bar.json' % path, index_col="index"
... ).sort_values(by="col 1")
      col 1 col 2
index
0         a     b
1         c     d