pyspark.sql.functions.split

pyspark.sql.functions.split(str: ColumnOrName, pattern: str, limit: int = -1) → pyspark.sql.column.Column
Splits str around matches of the given pattern.

New in version 1.5.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
----------
str : Column or str
    a string expression to split.
pattern : str
    a string representing a regular expression. The regex string should be a
    Java regular expression (see the escaping sketch after this parameter list).
limit : int, optional
    an integer which controls the number of times pattern is applied.

    - limit > 0: the resulting array's length will not be more than limit, and
      the resulting array's last entry will contain all input beyond the last
      matched pattern.
    - limit <= 0: pattern will be applied as many times as possible, and the
      resulting array can be of any size.

    Changed in version 3.0: split now takes an optional limit field. If not
    provided, the default limit value is -1.
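Because pattern is a Java regular expression rather than a literal delimiter, regex metacharacters such as . must be escaped to split on them literally. A minimal sketch of this, assuming a running SparkSession bound to spark; the ip column name and sample address are illustrative, not from this page:

>>> from pyspark.sql.functions import split
>>> df2 = spark.createDataFrame([('192.168.0.1',)], ['ip'])
>>> df2.select(split(df2.ip, r'\.').alias('parts')).collect()  # r'\.' matches a literal dot
[Row(parts=['192', '168', '0', '1'])]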
 
Returns
-------
Column
    array of separated strings.
 
Examples
--------
>>> df = spark.createDataFrame([('oneAtwoBthreeC',)], ['s',])
>>> df.select(split(df.s, '[ABC]', 2).alias('s')).collect()
[Row(s=['one', 'twoBthreeC'])]
>>> df.select(split(df.s, '[ABC]', -1).alias('s')).collect()
[Row(s=['one', 'two', 'three', ''])]
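The returned Column holds an array, so individual pieces can be selected with Column.getItem, a standard PySpark accessor not specific to split. A short sketch reusing the df defined above; the first/second aliases are illustrative:

>>> parts = split(df.s, '[ABC]')
>>> df.select(parts.getItem(0).alias('first'), parts.getItem(1).alias('second')).collect()
[Row(first='one', second='two')]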