A set of methods for aggregations on a DataFrame, created by [[Dataset#groupBy groupBy]], [[Dataset#cube cube]] or [[Dataset#rollup rollup]] (and also pivot).
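
A grouped dataset is typically obtained from an existing DataFrame. A minimal sketch, assuming the cube and rollup entry points mirror Spark's API and accept column names as strings:

    // Typical ways to obtain a RelationalGroupedDataset
    const byDept = df.groupBy("department");           // simple grouping
    const cubed  = df.cube("department", "region");    // all grouping combinations
    const rolled = df.rollup("department", "region");  // hierarchical subtotals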

Constructors

Properties

groupingExprs: string[] | Column[]
groupType: GroupType
pivotValue: undefined | Aggregate_Pivot = undefined
groupingSets: Column[][] = []

Methods

  • Pivots a column of the current DataFrame and performs the specified aggregation.

    This method is only supported after a groupBy operation. There are two versions of pivot: one with explicit pivot values and one without; the latter first computes the distinct values of the pivot column, so supplying the values explicitly is generally more efficient.

    Parameters

    • pivotColumn: string | Column

      Column name or Column to pivot on

    • Optional values: any[]

      Optional list of values that will be translated to columns in the output DataFrame

    Returns RelationalGroupedDataset

    A new RelationalGroupedDataset with pivot configuration

    // Pivot without values (Spark will compute distinct values)
    df.groupBy("year").pivot("course").sum("earnings")

    // Pivot with explicit values (more efficient)
    df.groupBy("year").pivot("course", ["dotNET", "Java"]).sum("earnings")

  • Apply a function to each group of the DataFrame.

    This method applies a user-defined function to each group. The function receives the group key and an iterator of rows for that group, and should return an iterator of rows.

    Parameters

    • pythonCode: string

      Python code as a string defining the group processing function

    • outputSchema: StructType

      The output schema for the transformed DataFrame

    • pythonVersion: string = '3.11'

      Python version (default: '3.11')

    Returns DataFrame

    A new DataFrame with the function applied to each group

    // Python function: receives the group key and an iterator of the group's rows,
    // and yields one output row per group
    const pythonCode = `
    def group_func(key, rows):
        total = sum(row.value for row in rows)
        yield (key.category, total)
    `;
    // Output schema matching the rows yielded by the Python function
    const schema = DataTypes.createStructType([
      DataTypes.createStructField('category', DataTypes.StringType, false),
      DataTypes.createStructField('total', DataTypes.IntegerType, false),
    ]);
    // Apply the function to each 'category' group of df
    const result = df.groupBy('category').groupMap(pythonCode, schema);
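
    If a specific interpreter is required, the documented pythonVersion parameter can be passed explicitly. A sketch, assuming the runtime has that version available:

    // Same call, pinning the third (pythonVersion) parameter instead of the '3.11' default
    const pinned = df.groupBy('category').groupMap(pythonCode, schema, '3.12');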