pyspark.sql.functions.array
- pyspark.sql.functions.array(*cols)
Collection function: Creates a new array column from the input columns or column names.
New in version 1.4.0.
Changed in version 3.4.0: Supports Spark Connect.
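- Parameters
cols : Column or str
Column names or Column objects to combine into an array, passed either as separate arguments or as a single list (Example 3). Inputs of different types are coerced to a common type where possible (Example 4).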
- Returns
Column
A new Column of array type, where each value is an array containing the corresponding values from the input columns.
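The row-wise behavior described above can be modeled in plain Python by treating a DataFrame as a list of tuples. This is only a sketch: `array_rows` is a hypothetical helper, not part of PySpark, and it ignores type coercion and distributed execution.

```python
from typing import Any, List, Sequence


def array_rows(rows: Sequence[Sequence[Any]], columns: Sequence[str],
               selected: Sequence[str]) -> List[List[Any]]:
    """Hypothetical model of sf.array over local data: for each row, collect
    the values of the selected columns, in the order given, into a new list."""
    positions = [list(columns).index(name) for name in selected]
    return [[row[i] for i in positions] for row in rows]


rows = [("Alice", "doctor"), ("Bob", "engineer")]
print(array_rows(rows, ["name", "occupation"], ["name", "occupation"]))
# [['Alice', 'doctor'], ['Bob', 'engineer']]
```

The order of the selected columns determines the order of the elements in each array, so passing `["occupation", "name"]` would yield `['doctor', 'Alice']` for the first row.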
Examples
Example 1: Basic usage of the array function with column names.
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([("Alice", "doctor"), ("Bob", "engineer")],
...     ("name", "occupation"))
>>> df.select(sf.array('name', 'occupation').alias("arr")).show()
+---------------+
|            arr|
+---------------+
|[Alice, doctor]|
|[Bob, engineer]|
+---------------+
Example 2: Usage of the array function with Column objects.
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([("Alice", "doctor"), ("Bob", "engineer")],
...     ("name", "occupation"))
>>> df.select(sf.array(df.name, df.occupation).alias("arr")).show()
+---------------+
|            arr|
+---------------+
|[Alice, doctor]|
|[Bob, engineer]|
+---------------+
Example 3: A single argument given as a list of column names.
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([("Alice", "doctor"), ("Bob", "engineer")],
...     ("name", "occupation"))
>>> df.select(sf.array(['name', 'occupation']).alias("arr")).show()
+---------------+
|            arr|
+---------------+
|[Alice, doctor]|
|[Bob, engineer]|
+---------------+
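Examples 1 and 3 produce the same result, so the function accepts either several column arguments or one list of them. The sketch below shows one common way a `*cols` function can normalize both call styles; `normalize_cols` is a hypothetical helper for illustration, not PySpark's actual code.

```python
from typing import Any, List


def normalize_cols(*cols: Any) -> List[Any]:
    """Hypothetical helper: accept either several column arguments or a
    single list of them, and return a flat list of columns."""
    if len(cols) == 1 and isinstance(cols[0], list):
        # Called as f([c1, c2, ...]): unwrap the single list argument.
        return list(cols[0])
    # Called as f(c1, c2, ...): the varargs tuple already holds the columns.
    return list(cols)


print(normalize_cols("name", "occupation"))    # ['name', 'occupation']
print(normalize_cols(["name", "occupation"]))  # ['name', 'occupation']
```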
Example 4: Usage of the array function with columns of different types.
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame(
...     [("Alice", 2, 22.2), ("Bob", 5, 36.1)],
...     ("name", "age", "weight"))
>>> df.select(sf.array(['age', 'weight']).alias("arr")).show()
+-----------+
|        arr|
+-----------+
|[2.0, 22.2]|
|[5.0, 36.1]|
+-----------+
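In Example 4 the integer `age` values appear as `2.0` and `5.0` because every element of an array shares one element type, chosen from the types of the input columns. The sketch below models that idea in plain Python, assuming a deliberately simplified rule: `int` and `float` widen to `float`, and any other mix is rejected. `common_type` and `array_with_widening` are hypothetical helpers; Spark's real coercion rules cover many more type combinations.

```python
from typing import Any, List, Sequence


def common_type(types: Sequence[type]) -> type:
    """Simplified, hypothetical stand-in for type widening: int and float
    widen to float; identical types stay as-is; other mixes are rejected."""
    if all(t in (int, float) for t in types):
        return float if float in types else int
    if len(set(types)) == 1:
        return types[0]
    raise TypeError(f"no common type for {list(types)}")


def array_with_widening(rows: Sequence[Sequence[Any]],
                        col_types: Sequence[type]) -> List[List[Any]]:
    """Cast every non-null element to the common type of the input columns,
    so the element type is decided once for the columns, not per row."""
    target = common_type(col_types)
    return [[None if v is None else target(v) for v in row] for row in rows]


rows = [(2, 22.2), (5, 36.1)]  # the age and weight values from Example 4
print(array_with_widening(rows, [int, float]))  # [[2.0, 22.2], [5.0, 36.1]]
```

Deciding the type from the columns rather than from each row's values mirrors how the array column gets a single schema-level element type.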
Example 5: The array function with a column containing null values.
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([("Alice", None), ("Bob", "engineer")],
...     ("name", "occupation"))
>>> df.select(sf.array('name', 'occupation').alias("arr")).show()
+---------------+
|            arr|
+---------------+
|  [Alice, NULL]|
|[Bob, engineer]|
+---------------+
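Example 5 shows that a null input does not make the whole array null; it becomes a null element inside the array. The plain-Python sketch below models that with `None`, and shows how null elements could be filtered out afterwards. Both helpers are hypothetical; in PySpark itself, `array_compact` (Spark 3.4+) removes null elements from an array column.

```python
from typing import Any, List, Sequence


def array_row(values: Sequence[Any]) -> List[Any]:
    """Hypothetical model of one output row: None inputs are kept as None
    elements, and the resulting list itself is never None."""
    return list(values)


def compact(arr: List[Any]) -> List[Any]:
    """Plain-Python analogue of removing null elements from an array."""
    return [v for v in arr if v is not None]


arr = array_row(("Alice", None))
print(arr)           # ['Alice', None]  (displayed as [Alice, NULL] by show())
print(compact(arr))  # ['Alice']
```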