Class Aggregator<IN,BUF,OUT>

Object
org.apache.spark.sql.expressions.Aggregator<IN,BUF,OUT>
All Implemented Interfaces:
Serializable, org.apache.spark.sql.internal.UserDefinedFunctionLike

public abstract class Aggregator<IN,BUF,OUT> extends Object implements Serializable, org.apache.spark.sql.internal.UserDefinedFunctionLike
A base class for user-defined aggregations, which can be used in Dataset operations to take all of the elements of a group and reduce them to a single value.

For example, the following aggregator extracts an Int from each instance of a class and sums the values:


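   import org.apache.spark.sql.{Dataset, Encoder, Encoders}
   import org.apache.spark.sql.expressions.Aggregator

   case class Data(i: Int)

   val customSummer = new Aggregator[Data, Int, Int] {
     def zero: Int = 0                                    // empty sum
     def reduce(b: Int, a: Data): Int = b + a.i           // fold one input into the buffer
     def merge(b1: Int, b2: Int): Int = b1 + b2           // combine two partial sums
     def finish(r: Int): Int = r                          // the buffer is already the result
     def bufferEncoder: Encoder[Int] = Encoders.scalaInt
     def outputEncoder: Encoder[Int] = Encoders.scalaInt
   }.toColumn

   val ds: Dataset[Data] = ...
   val aggregated = ds.select(customSummer)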
 

Based loosely on Aggregator from Algebird: https://github.com/twitter/algebird
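
The summing example uses Int for input field, buffer, and output alike. As a minimal sketch of the case where BUF and OUT differ, the aggregator below keeps a running sum and count and only divides in finish. The types Reading and SumCount and the object MeanValue are hypothetical names introduced for this illustration, and returning NaN for an empty buffer is a choice made here, not something the API prescribes.

```scala
import org.apache.spark.sql.{Encoder, Encoders}
import org.apache.spark.sql.expressions.Aggregator

// Hypothetical input and buffer types, introduced only for this sketch.
case class Reading(sensor: String, value: Double)
case class SumCount(sum: Double, count: Long)

object MeanValue extends Aggregator[Reading, SumCount, Double] {
  // Empty buffer: nothing summed, nothing counted.
  def zero: SumCount = SumCount(0.0, 0L)

  // Fold one input element into the running buffer.
  def reduce(b: SumCount, a: Reading): SumCount =
    SumCount(b.sum + a.value, b.count + 1)

  // Combine two partial buffers.
  def merge(b1: SumCount, b2: SumCount): SumCount =
    SumCount(b1.sum + b2.sum, b1.count + b2.count)

  // Turn the final buffer into the result; NaN for an empty buffer is a choice made here.
  def finish(r: SumCount): Double =
    if (r.count == 0) Double.NaN else r.sum / r.count

  // The buffer is a case class, so a product encoder fits; the output is a plain Double.
  def bufferEncoder: Encoder[SumCount] = Encoders.product[SumCount]
  def outputEncoder: Encoder[Double] = Encoders.scalaDouble
}
```

Keeping the division out of reduce and merge means partial buffers stay combinable; only finish produces the value that callers see.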

Since:
1.6.0
  • Constructor Summary

    Constructors
    Constructor      Description
    Aggregator()
  • Method Summary

    Modifier and Type        Method                   Description
    abstract Encoder<BUF>    bufferEncoder()          Specifies the Encoder for the intermediate value type.
    abstract OUT             finish(BUF reduction)    Transform the output of the reduction.
    abstract BUF             merge(BUF b1, BUF b2)    Merge two intermediate values.
    abstract Encoder<OUT>    outputEncoder()          Specifies the Encoder for the final output value type.
    abstract BUF             reduce(BUF b, IN a)      Combine two values to produce a new value.
    TypedColumn<IN,OUT>      toColumn()               Returns this Aggregator as a TypedColumn that can be used in Dataset operations.
    abstract BUF             zero()                   A zero value for this aggregation.

    Methods inherited from class java.lang.Object

    equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

    Methods inherited from interface org.apache.spark.sql.internal.UserDefinedFunctionLike

    name
  • Constructor Details

    • Aggregator

      public Aggregator()
  • Method Details

    • bufferEncoder

      public abstract Encoder<BUF> bufferEncoder()
      Specifies the Encoder for the intermediate value type.
      Returns:
      the Encoder for intermediate values of type BUF
      Since:
      2.0.0
    • finish

      public abstract OUT finish(BUF reduction)
      Transform the output of the reduction.
      Parameters:
      reduction - the final intermediate value, after all inputs have been reduced and merged
      Returns:
      the output value of type OUT
      Since:
      1.6.0
    • merge

      public abstract BUF merge(BUF b1, BUF b2)
      Merge two intermediate values.
      Parameters:
      b1 - an intermediate value
      b2 - another intermediate value
      Returns:
      the merged intermediate value
      Since:
      1.6.0
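
To illustrate why merge exists, the sketch below folds two made-up slices of one group independently and then merges the partial buffers, which is roughly how a distributed aggregation can combine work done on separate partitions. It calls the aggregator's methods directly, without a SparkSession; LongSum and MergeDemo are hypothetical names used only here.

```scala
import org.apache.spark.sql.{Encoder, Encoders}
import org.apache.spark.sql.expressions.Aggregator

// Hypothetical aggregator used only for this illustration: sums Long inputs.
object LongSum extends Aggregator[Long, Long, Long] {
  def zero: Long = 0L
  def reduce(b: Long, a: Long): Long = b + a
  def merge(b1: Long, b2: Long): Long = b1 + b2
  def finish(r: Long): Long = r
  def bufferEncoder: Encoder[Long] = Encoders.scalaLong
  def outputEncoder: Encoder[Long] = Encoders.scalaLong
}

object MergeDemo {
  // Two made-up slices of the same group, each folded on its own starting from zero.
  val part1: Long = Seq(1L, 2L, 3L).foldLeft(LongSum.zero)(LongSum.reduce)
  val part2: Long = Seq(10L, 20L).foldLeft(LongSum.zero)(LongSum.reduce)

  // Merging the partial buffers...
  val merged: Long = LongSum.merge(part1, part2)

  // ...should agree with folding every element in a single pass.
  val singlePass: Long =
    Seq(1L, 2L, 3L, 10L, 20L).foldLeft(LongSum.zero)(LongSum.reduce)
}
```

A merge that disagrees with the single-pass fold would make the result depend on how the data happens to be split, so this equality is a useful property to check in unit tests.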
    • outputEncoder

      public abstract Encoder<OUT> outputEncoder()
      Specifies the Encoder for the final output value type.
      Returns:
      the Encoder for output values of type OUT
      Since:
      2.0.0
    • reduce

      public abstract BUF reduce(BUF b, IN a)
      Combine the intermediate value b with the input element a to produce a new intermediate value. For performance, the function may modify b and return it instead of constructing a new object.
      Parameters:
      b - the current intermediate value
      a - the input element to fold into b
      Returns:
      the updated intermediate value
      Since:
      1.6.0
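
As a sketch of the in-place option described for reduce, the aggregator below uses a mutable set as its buffer and returns the same instance it was given. DistinctWords is a hypothetical name, and the use of Encoders.kryo for the buffer is an assumption made for this example; other encoders may suit a real buffer type better. Because the text above only mentions mutation for reduce, merge copies its first argument here rather than modifying it.

```scala
import org.apache.spark.sql.{Encoder, Encoders}
import org.apache.spark.sql.expressions.Aggregator
import scala.collection.mutable

// Hypothetical aggregator: counts the distinct strings seen in a group.
object DistinctWords extends Aggregator[String, mutable.HashSet[String], Int] {
  // A fresh set on every call, so a buffer that gets mutated is never shared.
  def zero: mutable.HashSet[String] = mutable.HashSet.empty[String]

  // Mutates b in place and returns it, avoiding a copy of the set per element.
  def reduce(b: mutable.HashSet[String], a: String): mutable.HashSet[String] = {
    b += a
    b
  }

  // Conservative choice for this sketch: copy b1 instead of mutating it.
  def merge(b1: mutable.HashSet[String],
            b2: mutable.HashSet[String]): mutable.HashSet[String] = {
    val out = b1.clone()
    out ++= b2
    out
  }

  def finish(r: mutable.HashSet[String]): Int = r.size

  // Assumption: a Kryo-based encoder for the mutable buffer type.
  def bufferEncoder: Encoder[mutable.HashSet[String]] =
    Encoders.kryo[mutable.HashSet[String]]
  def outputEncoder: Encoder[Int] = Encoders.scalaInt
}
```

If zero returned one shared instance instead of a new set, in-place updates from different groups could leak into each other, which is why zero is written as a method that allocates.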
    • toColumn

      public TypedColumn<IN,OUT> toColumn()
      Returns this Aggregator as a TypedColumn that can be used in Dataset operations.
      Returns:
      a TypedColumn<IN,OUT> that wraps this Aggregator
      Since:
      1.6.0
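
The following sketch shows two places where the TypedColumn returned by toColumn might be used: a whole-Dataset select and a per-key agg after groupByKey. Sale, TotalAmount, and ToColumnDemo are hypothetical names, the sample rows are made up, and the caller is assumed to supply a SparkSession (for instance a local one).

```scala
import org.apache.spark.sql.{Dataset, Encoder, Encoders, SparkSession}
import org.apache.spark.sql.expressions.Aggregator

// Hypothetical record type for this sketch.
case class Sale(region: String, amount: Long)

// Hypothetical aggregator: sums the amount field.
object TotalAmount extends Aggregator[Sale, Long, Long] {
  def zero: Long = 0L
  def reduce(b: Long, a: Sale): Long = b + a.amount
  def merge(b1: Long, b2: Long): Long = b1 + b2
  def finish(r: Long): Long = r
  def bufferEncoder: Encoder[Long] = Encoders.scalaLong
  def outputEncoder: Encoder[Long] = Encoders.scalaLong
}

object ToColumnDemo {
  def run(spark: SparkSession): (Long, Map[String, Long]) = {
    import spark.implicits._ // encoders for Sale, String, and tuples

    // Made-up sample data.
    val sales: Dataset[Sale] =
      Seq(Sale("east", 10L), Sale("west", 5L), Sale("east", 7L)).toDS()

    // Wrap the aggregator as a typed column and give the result column a name.
    val total = TotalAmount.toColumn.name("total")

    // Aggregate over the whole Dataset: a Dataset[Long] with a single row.
    val overall: Long = sales.select(total).head()

    // Aggregate per key: a Dataset[(String, Long)] with one row per region.
    val perRegion: Map[String, Long] =
      sales.groupByKey(_.region).agg(total).collect().toMap

    (overall, perRegion)
  }
}
```

The same column value serves both calls; only the surrounding Dataset operation decides whether the group is the whole Dataset or the rows sharing a key.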
    • zero

      public abstract BUF zero()
      A zero value for this aggregation. Should satisfy the property that combining it with any intermediate value b leaves b unchanged (b + zero = b).
      Returns:
      the initial intermediate value
      Since:
      1.6.0
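
The right zero depends on how values are combined, so it is not always the number 0. As a small sketch under that reading of the property, a maximum aggregator needs the smallest Int as its zero; using 0 would give a wrong answer for groups that contain only negative numbers. MaxInt is a hypothetical name used only for this illustration.

```scala
import org.apache.spark.sql.{Encoder, Encoders}
import org.apache.spark.sql.expressions.Aggregator

// Hypothetical aggregator: largest Int in a group.
object MaxInt extends Aggregator[Int, Int, Int] {
  // Identity for max: combining it with any b leaves b unchanged.
  def zero: Int = Int.MinValue
  def reduce(b: Int, a: Int): Int = math.max(b, a)
  def merge(b1: Int, b2: Int): Int = math.max(b1, b2)
  def finish(r: Int): Int = r
  def bufferEncoder: Encoder[Int] = Encoders.scalaInt
  def outputEncoder: Encoder[Int] = Encoders.scalaInt
}
```

One consequence of this choice is that an empty buffer finishes as Int.MinValue; an aggregator that must distinguish "no input" could use an Option-like buffer instead.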