pyspark.SparkContext.sequenceFile
- SparkContext.sequenceFile(path, keyClass=None, valueClass=None, keyConverter=None, valueConverter=None, minSplits=None, batchSize=0)
- Read a Hadoop SequenceFile with arbitrary key and value Writable classes from HDFS, a local file system (available on all nodes), or any Hadoop-supported file system URI. The mechanism is as follows:
- A Java RDD is created from the SequenceFile or other InputFormat, and the key and value Writable classes
- Serialization is attempted via Pyrolite pickling
- If this fails, the fallback is to call ‘toString’ on each key and value
- PickleSerializer is used to deserialize pickled objects on the Python side
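A minimal sketch of the round trip described above, assuming an active SparkContext bound to sc and a hypothetical writable path /tmp/sequencefile_demo; the key and value Writable classes are inferred from the data, and the pairs come back as Python objects via pickling:

>>> rdd = sc.parallelize([(1, "a"), (2, "b"), (3, "c")])
>>> rdd.saveAsSequenceFile("/tmp/sequencefile_demo")  # keys stored as IntWritable, values as Text
>>> sorted(sc.sequenceFile("/tmp/sequencefile_demo").collect())
[(1, 'a'), (2, 'b'), (3, 'c')]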
- Parameters
- path : str
- path to sequencefile
- keyClass : str, optional
- fully qualified classname of key Writable class (e.g. “org.apache.hadoop.io.Text”)
- valueClass : str, optional
- fully qualified classname of value Writable class (e.g. “org.apache.hadoop.io.LongWritable”)
- keyConverter : str, optional
- fully qualified name of a function returning key WritableConverter
- valueConverter : str, optional
- fully qualified name of a function returning value WritableConverter
- minSplits : int, optional
- minimum splits in dataset (default min(2, sc.defaultParallelism))
- batchSize : int, optional
- The number of Python objects represented as a single Java object. (default 0, choose batchSize automatically)
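Examples

A sketch of reading the same hypothetical file with the Writable classes named explicitly and a minimum split count, assuming the path written in the sketch above:

>>> pairs = sc.sequenceFile(
...     "/tmp/sequencefile_demo",
...     keyClass="org.apache.hadoop.io.IntWritable",
...     valueClass="org.apache.hadoop.io.Text",
...     minSplits=4,
... )
>>> sorted(pairs.collect())
[(1, 'a'), (2, 'b'), (3, 'c')]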