The entry point to programming Spark with the Dataset and DataFrame API.

A SparkSession can be used to create DataFrames, register DataFrames as tables, execute SQL over tables, cache tables, and read data from various sources. Create one with the static SparkSession.builder() method.

The SparkSession provides access to:

  • sql - Execute SQL queries and return results as DataFrames
  • read - Read data from external sources (CSV, JSON, Parquet, tables, etc.)
  • createDataFrame - Create DataFrames from in-memory data
  • table - Load tables as DataFrames
  • range - Generate DataFrames with sequences of numbers
  • catalog - Access the Spark catalog for metadata operations
  • udf - Register and manage user-defined functions
  • conf - Configure Spark runtime settings
// Create a SparkSession
const spark = await SparkSession.builder()
  .remote("sc://localhost:15002")
  .appName("MyApp")
  .getOrCreate();

// Execute SQL
const df = await spark.sql("SELECT * FROM users WHERE age > 21");
await df.show();

// Read data
const csvDf = spark.read.csv("data.csv");

// Create a DataFrame from in-memory data
const schema = new StructType([
  new StructField("name", DataTypes.StringType),
  new StructField("age", DataTypes.IntegerType)
]);
const data = [new Row(schema, ["Alice", 30]), new Row(schema, ["Bob", 25])];
const df2 = spark.createDataFrame(data, schema);

Since 1.0.0

Properties

catalog: Catalog
udf: UDFRegistration
client: Client
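
For example, the catalog and udf properties can be used directly from a session. A minimal sketch; the listTables and register signatures shown here mirror Spark's Catalog and UDFRegistration APIs in other languages and are assumptions, not confirmed by this page:

// List catalog tables (assumes Catalog.listTables() exists, as in Spark's Catalog API)
const tables = await spark.catalog.listTables();

// Register a UDF (assumes a register(name, fn, returnType) signature, as in Spark's API)
spark.udf.register("plusOne", (x: number) => x + 1, DataTypes.IntegerType);
const df = await spark.sql("SELECT plusOne(age) AS agePlusOne FROM users");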

Methods

builder

  • Creates a SparkSessionBuilder for constructing a SparkSession.

    Returns SparkSessionBuilder

    A new SparkSessionBuilder instance

    const spark = await SparkSession.builder()
      .remote("sc://localhost:15002")
      .appName("MyApplication")
      .config("spark.sql.shuffle.partitions", "10")
      .getOrCreate();

config

  • get conf(): RuntimeConfig

    Runtime configuration interface for this SparkSession.

    Returns RuntimeConfig

    The RuntimeConfig instance for getting/setting Spark configuration

    await spark.conf.set("spark.sql.shuffle.partitions", "200");
    const value = await spark.conf.get("spark.sql.shuffle.partitions");

core

  • Returns the Spark version this SparkSession is connected to.

    Returns Promise<string>

    A promise that resolves to the Spark version string

    const version = await spark.version();
    console.log(`Connected to Spark ${version}`);
  • Executes a SQL query and returns the result as a DataFrame.

    Parameters

    • sqlStr: string

      The SQL query string to execute

    Returns Promise<DataFrame>

    A promise that resolves to a DataFrame containing the query results

    const df = await spark.sql("SELECT * FROM users WHERE age > 21");
    await df.show();

    // Use with table joins
    const joined = await spark.sql(`
      SELECT u.name, o.amount
      FROM users u
      JOIN orders o ON u.id = o.user_id
    `);
  • Creates a DataFrame from an array of Row objects with a specified schema.

    Parameters

    • data: Row[]

      An array of Row objects

    • schema: StructType

      The schema defining the structure of the DataFrame

    Returns DataFrame

    A new DataFrame containing the provided data

    const schema = new StructType([
      new StructField("name", DataTypes.StringType),
      new StructField("age", DataTypes.IntegerType)
    ]);

    const data = [
      new Row(schema, { name: "Alice", age: 30 }),
      new Row(schema, { name: "Bob", age: 25 })
    ];

    const df = spark.createDataFrame(data, schema);
    await df.show();
  • Creates a DataFrame from an Apache Arrow Table.

    Parameters

    • table: Table

      An Apache Arrow Table object

    • schema: StructType

      The Spark schema defining the structure of the DataFrame

    Returns DataFrame

    A new DataFrame containing the Arrow table data

    import { Table } from 'apache-arrow';

    // Assuming you have an Arrow table
    const arrowTable = ...; // Arrow Table
    const schema = new StructType([...]);

    const df = spark.createDataFrameFromArrowTable(arrowTable, schema);
    await df.show();
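
    If you first need to build an Arrow table in memory, the apache-arrow package's tableFromArrays helper is one way to do it. A minimal sketch; tableFromArrays and the Int32Array column typing come from apache-arrow, and the Arrow Int32 to Spark IntegerType mapping is an assumption:

    import { tableFromArrays } from 'apache-arrow';

    // Build a two-column Arrow table from plain JavaScript arrays
    const arrowTable = tableFromArrays({
      name: ['Alice', 'Bob'],
      age: Int32Array.from([30, 25]) // Int32Array yields an Arrow Int32 column
    });

    const schema = new StructType([
      new StructField("name", DataTypes.StringType),
      new StructField("age", DataTypes.IntegerType) // assumed to match Arrow Int32
    ]);

    const df = spark.createDataFrameFromArrowTable(arrowTable, schema);
    await df.show();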
  • Creates a DataFrame with a single column of Long values from 0 (inclusive) to end (exclusive).

    Parameters

    • end: number | bigint

      The end value (exclusive)

    Returns DataFrame

    A DataFrame with a single column named "id" containing values from 0 to end-1

  • Creates a DataFrame with a single column of Long values.

    Parameters

    • start: number | bigint

      The start value (inclusive)

    • end: number | bigint

      The end value (exclusive)

    Returns DataFrame

    A DataFrame with a single column named "id" containing values from start to end-1

  • Creates a DataFrame with a single column of Long values with a custom step.

    Parameters

    • start: number | bigint

      The start value (inclusive)

    • end: number | bigint

      The end value (exclusive)

    • step: number | bigint

      The increment between consecutive values

    Returns DataFrame

    A DataFrame with a single column named "id"

  • Creates a DataFrame with a single column of Long values with custom step and partitions.

    Parameters

    • start: number | bigint

      The start value (inclusive)

    • end: number | bigint

      The end value (exclusive)

    • step: number | bigint

      The increment between consecutive values

    • numPartitions: number

      The number of partitions for the resulting DataFrame

    Returns DataFrame

    A DataFrame with a single column named "id"

    // Generate 0 to 9
    const df1 = spark.range(10);

    // Generate 5 to 14
    const df2 = spark.range(5, 15);

    // Generate 0, 2, 4, 6, 8
    const df3 = spark.range(0, 10, 2);

    // Generate with 4 partitions
    const df4 = spark.range(0, 100, 1, 4);
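
    The result is an ordinary DataFrame, so it composes with the rest of the API; for example:

    // Display the generated ids 0, 2, 4, 6, 8
    await spark.range(0, 10, 2).show();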

io

  • get read(): DataFrameReader

    Returns a DataFrameReader for reading data from external sources.

    Returns DataFrameReader

    A DataFrameReader instance

    // Read CSV
    const df = spark.read.csv("data.csv");

    // Read JSON
    const df2 = spark.read.json("data.json");

    // Read Parquet with options
    const df3 = spark.read
      .option("mergeSchema", "true")
      .parquet("data.parquet");

    // Read from table
    const df4 = spark.read.table("my_table");
  • Returns the specified table as a DataFrame.

    Parameters

    • name: string

      The name of the table to load

    Returns DataFrame

    A DataFrame representing the table

    const users = spark.table("users");
    await users.show();

    // With database qualifier
    const logs = spark.table("production.logs");