Class DataFrameStatFunctions

Statistical functions for DataFrames.

Index

Constructors

constructor

Methods

cov corr crosstab freqItems sampleBy approxQuantile

Constructors

constructor

new DataFrameStatFunctions(df: DataFrame): DataFrameStatFunctions
Parameters
- df: DataFrame
Returns DataFrameStatFunctions
- Defined in src/org/apache/spark/sql/DataFrameStatFunctions.ts:25

Methods

cov

cov(col1: string, col2: string): Promise<number>
Calculates the sample covariance of two numerical columns of a DataFrame.
Parameters
- col1: string
  the name of the first column
- col2: string
  the name of the second column
Returns Promise<number>
the covariance of the two columns
- Defined in src/org/apache/spark/sql/DataFrameStatFunctions.ts:34
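
To make "sample covariance" concrete, here is a minimal sketch over plain arrays that stand in for the two columns. The helper `sampleCovariance` is hypothetical and is not part of this class; it only illustrates the quantity that `cov` is described as returning, with the usual n - 1 denominator.

```typescript
// Hypothetical helper: illustrates sample covariance on plain arrays.
// It is not the library's implementation.
function sampleCovariance(xs: number[], ys: number[]): number {
  if (xs.length !== ys.length || xs.length < 2) {
    throw new Error("need two equal-length arrays with at least 2 values");
  }
  const n = xs.length;
  const meanX = xs.reduce((a, b) => a + b, 0) / n;
  const meanY = ys.reduce((a, b) => a + b, 0) / n;
  let sum = 0;
  for (let i = 0; i < n; i++) {
    // Accumulate the product of deviations from each mean.
    sum += (xs[i] - meanX) * (ys[i] - meanY);
  }
  // Sample covariance divides by n - 1 rather than n.
  return sum / (n - 1);
}

console.log(sampleCovariance([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])); // 5
```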

corr

corr(col1: string, col2: string, method?: string): Promise<number>
Calculates the correlation of two columns of a DataFrame. Currently only supports the Pearson Correlation Coefficient.
Parameters
- col1: string
  the name of the first column
- col2: string
  the name of the second column
- Optional method: string
  Optional. Currently only supports 'pearson'
Returns Promise<number>
the Pearson Correlation Coefficient as a double
- Defined in src/org/apache/spark/sql/DataFrameStatFunctions.ts:49
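
As a rough illustration of the Pearson correlation coefficient, the sketch below computes it over plain arrays. The helper `pearsonCorrelation` is a hypothetical name used only for this example and does not reflect how the class computes the value.

```typescript
// Hypothetical helper: Pearson correlation coefficient on plain arrays.
// It is not the library's implementation.
function pearsonCorrelation(xs: number[], ys: number[]): number {
  if (xs.length !== ys.length || xs.length < 2) {
    throw new Error("need two equal-length arrays with at least 2 values");
  }
  const n = xs.length;
  const meanX = xs.reduce((a, b) => a + b, 0) / n;
  const meanY = ys.reduce((a, b) => a + b, 0) / n;
  let sxy = 0; // sum of products of deviations
  let sxx = 0; // sum of squared deviations of xs
  let syy = 0; // sum of squared deviations of ys
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - meanX;
    const dy = ys[i] - meanY;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  // The result is NaN when either input has zero variance.
  return sxy / Math.sqrt(sxx * syy);
}

console.log(pearsonCorrelation([1, 2, 3], [2, 4, 6])); // 1
```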

crosstab

crosstab(col1: string, col2: string): DataFrame
Computes a pair-wise frequency table of the given columns. Also known as a contingency table. The first column of each row will be the distinct values of col1 and the column names will be the distinct values of col2. The name of the first column will be col1_col2. Counts will be returned as Longs. Pairs that have no occurrences will have zero as their counts.
Parameters
- col1: string
  The name of the first column. Distinct items will make the first item of each row.
- col2: string
  The name of the second column. Distinct items will make the column names.
Returns DataFrame
A DataFrame containing the contingency table.
- Defined in src/org/apache/spark/sql/DataFrameStatFunctions.ts:65
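
The layout of the contingency table described above can be sketched on local data. This example assumes the input is an array of `[col1Value, col2Value]` pairs; `crosstabLocal` is a hypothetical helper, not the class's method, and it returns plain objects rather than a DataFrame.

```typescript
// Hypothetical helper: builds a contingency table from value pairs.
// It is not the library's implementation.
type CrosstabRow = Record<string, string | number>;

function crosstabLocal(
  pairs: Array<[string, string]>,
  col1: string,
  col2: string,
): CrosstabRow[] {
  // Distinct values, in order of first appearance.
  const rowKeys = Array.from(new Set(pairs.map((p) => p[0])));
  const colKeys = Array.from(new Set(pairs.map((p) => p[1])));

  // Count each (col1 value, col2 value) pair.
  const counts = new Map<string, number>();
  for (const [a, b] of pairs) {
    const key = JSON.stringify([a, b]);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }

  return rowKeys.map((a) => {
    // The first column is named col1_col2 and holds the col1 value.
    const row: CrosstabRow = { [`${col1}_${col2}`]: a };
    for (const b of colKeys) {
      // Pairs that never occur get a count of zero.
      row[b] = counts.get(JSON.stringify([a, b])) ?? 0;
    }
    return row;
  });
}

const table = crosstabLocal(
  [["a", "x"], ["a", "y"], ["a", "x"], ["b", "y"]],
  "k",
  "v",
);
console.log(table);
// [ { k_v: 'a', x: 2, y: 1 }, { k_v: 'b', x: 0, y: 1 } ]
```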

freqItems

freqItems(cols: string[], support?: number): DataFrame
Finds frequent items for columns, possibly with false positives. Uses the frequent element count algorithm proposed by Karp, Schenker, and Papadimitriou (https://doi.org/10.1145/762471.762473). The support should be greater than 1e-4.
Parameters
- cols: string[]
  the names of the columns to search frequent items in
- Optional support: number
  Optional. The minimum frequency for an item to be considered frequent. Should be greater than 1e-4. Default is 1% (0.01).
Returns DataFrame
A local DataFrame with the frequent items in each column.
- Defined in src/org/apache/spark/sql/DataFrameStatFunctions.ts:79
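
The counter-based idea behind frequent element counting, and why it can return false positives, can be sketched for a single column held in a plain array. This is an illustrative sketch of the general technique under simplifying assumptions, not the class's implementation; `frequentItemsLocal` is a hypothetical helper.

```typescript
// Hypothetical helper: counter-based frequent-item search over one array.
// It is not the library's implementation.
function frequentItemsLocal<T>(items: T[], support: number): T[] {
  if (!(support > 0 && support <= 1)) {
    throw new Error("support must be in (0, 1]");
  }
  // With k = ceil(1 / support), k - 1 counters are enough to retain every
  // item that occurs in more than a `support` fraction of the input.
  const capacity = Math.ceil(1 / support) - 1;
  const counters = new Map<T, number>();

  for (const item of items) {
    const current = counters.get(item);
    if (current !== undefined) {
      counters.set(item, current + 1);
    } else if (counters.size < capacity) {
      counters.set(item, 1);
    } else {
      // No free counter: decrement all counters, dropping those at zero.
      counters.forEach((count, key) => {
        if (count === 1) {
          counters.delete(key);
        } else {
          counters.set(key, count - 1);
        }
      });
    }
  }
  // Survivors include every truly frequent item, plus possibly some that
  // are not frequent (false positives).
  return Array.from(counters.keys());
}

console.log(frequentItemsLocal(["a", "b", "a", "c", "a", "d", "a"], 0.5));
// [ 'a' ]
```

A second pass over the data would be needed to remove false positives; this sketch stops after the single counting pass.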

sampleBy

sampleBy(col: Column, fractions: Map<any, number>, seed?: number): DataFrame
Returns a stratified sample without replacement based on the fraction given on each stratum.
Parameters
- col: Column
  column that defines strata
- fractions: Map<any, number>
  sampling fraction for each stratum. If a stratum is not specified, we treat its fraction as zero.
- Optional seed: number
  random seed
Returns DataFrame
a new DataFrame that represents the stratified sample
- Defined in src/org/apache/spark/sql/DataFrameStatFunctions.ts:92
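
To illustrate stratified sampling without replacement, the sketch below keeps each row independently with the probability assigned to its stratum and treats unlisted strata as fraction zero. It assumes rows are plain objects and the stratum column is identified by a key name rather than a `Column`; `sampleByLocal` and `seededRandom` are hypothetical helpers, not part of this class.

```typescript
// Hypothetical stand-ins: rows are plain objects, not DataFrame rows.
type LocalRow = { [key: string]: unknown };

// Small seeded pseudo-random generator (mulberry32-style) so that runs
// with the same seed are repeatable. Returns values in [0, 1).
function seededRandom(seed: number): () => number {
  let state = seed | 0;
  return () => {
    state = (state + 0x6d2b79f5) | 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

function sampleByLocal<R extends LocalRow>(
  rows: R[],
  col: string,
  fractions: Map<unknown, number>,
  seed: number,
): R[] {
  const random = seededRandom(seed);
  return rows.filter((row) => {
    // A stratum that is not listed is treated as fraction zero.
    const fraction = fractions.get(row[col]) ?? 0;
    // Each row is considered once, so no row can be picked twice.
    return random() < fraction;
  });
}

const people = [{ k: "a" }, { k: "b" }, { k: "c" }, { k: "a" }];
const kept = sampleByLocal(
  people,
  "k",
  new Map<unknown, number>([["a", 1], ["b", 0]]),
  42,
);
console.log(kept); // [ { k: 'a' }, { k: 'a' } ]
```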

approxQuantile

approxQuantile(
    cols: string[],
    probabilities: number[],
    relativeError: number,
): DataFrame
Calculates the approximate quantiles of numerical columns of a DataFrame.

The result will be a DataFrame with the same number of columns as cols, where each column contains the approximate quantiles for the corresponding input column.
Parameters
- cols: string[]
  the names of the numerical columns
- probabilities: number[]
  a list of quantile probabilities. Each number must belong to [0, 1]. For example 0 is the minimum, 0.5 is the median, 1 is the maximum.
- relativeError: number
  The relative target precision to achieve (greater than or equal to 0). If set to zero, the exact quantiles are computed, which could be very expensive. Note that values greater than 1 are accepted but give the same result as 1.
Returns DataFrame
a DataFrame with the approximate quantiles
- Defined in src/org/apache/spark/sql/DataFrameStatFunctions.ts:110
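
The meaning of the probabilities, and of a zero `relativeError`, can be illustrated with an exact quantile over a plain array. The rank-tolerance check in the second helper is an assumption about how relative error is interpreted (a returned value's rank may deviate from `p * N` by about `relativeError * N`); neither `exactQuantile` nor `withinRelativeError` is part of this class.

```typescript
// Hypothetical helper: exact quantile by rank, matching relativeError = 0.
// It is not the library's implementation.
function exactQuantile(values: number[], p: number): number {
  if (values.length === 0) {
    throw new Error("values must not be empty");
  }
  if (p < 0 || p > 1) {
    throw new Error("probability must be in [0, 1]");
  }
  const sorted = values.slice().sort((a, b) => a - b);
  // 1-based rank; p = 0 maps to the minimum and p = 1 to the maximum.
  const rank = Math.max(1, Math.ceil(p * sorted.length));
  return sorted[rank - 1];
}

// Assumed interpretation: a candidate is acceptable when its rank lies in
// [floor((p - err) * N), ceil((p + err) * N)].
function withinRelativeError(
  values: number[],
  candidate: number,
  p: number,
  relativeError: number,
): boolean {
  const n = values.length;
  const rank = values.filter((v) => v <= candidate).length;
  const low = Math.floor((p - relativeError) * n);
  const high = Math.ceil((p + relativeError) * n);
  return low <= rank && rank <= high;
}

const data = [5, 1, 4, 2, 3];
console.log(exactQuantile(data, 0));   // 1 (minimum)
console.log(exactQuantile(data, 0.5)); // 3 (median)
console.log(exactQuantile(data, 1));   // 5 (maximum)
```

A larger `relativeError` widens the range of acceptable ranks, which is what lets an approximate algorithm avoid a full sort.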
