kyuubi

Kyuubi is an enhanced edition of Apache Spark's primordial Thrift JDBC/ODBC Server.

Kyuubi Containerization Guide

This document briefly describes how Kyuubi submits itself to YARN and runs as a containerized service.

Kyuubi supports “client” mode by default, which means that Kyuubi launches a server process on the local node and serves client-side JDBC/ODBC connections. We need to set up the environment and make other preparations on every node that launches a Kyuubi server. This makes deploying Kyuubi servers inconvenient, especially in HA mode, and even more so when the nodes run different Linux releases.

Kyuubi containerization is a much easier way to deploy Kyuubi. It turns each Kyuubi server instance into a containerized, server-less service running in a YARN container.

Service Model

The above picture shows the whole architecture for Kyuubi containerization. The key concept is simple: run the Kyuubi server as a YARN container and serve JDBC/ODBC clients remotely. In such a deployment mode, we do not need to configure each node or make special customizations for complicated situations.

We can use the Client to launch as many Kyuubi servers as we need. Each containerized Kyuubi server is maintained in the YARN cluster as a long-running service.

Glossary

| Name | Description |
| --- | --- |
| Client | Kyuubi YARN Client, with all the information we need to deploy Kyuubi |
| ResourceManager | YARN ResourceManager |
| NodeManager | YARN NodeManager |
| Kyuubi Server | Kyuubi server instance wrapped as KyuubiAppMaster and launched by YARN as an ApplicationMaster container |
| Spark AM | Spark’s ApplicationMaster, acting here as the ExecutorLauncher |
| Spark Executor | A process launched on a NodeManager that runs tasks and keeps data in memory or disk storage across them. Each SparkContext has its own executors. |
| ZooKeeper Service Discovery | ZooKeeper Dynamic Service Discovery, which is useful in Kyuubi containerization because the port of the KyuubiServer frontend service is randomly picked |
| JDBC/ODBC/Thrift Client | Various kinds of clients that talk to the Kyuubi Server |
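
Because the frontend port is picked at random, clients usually locate a server through ZooKeeper rather than through a fixed host and port. The sketch below assembles a Hive JDBC URL that uses ZooKeeper service discovery. The quorum hosts are placeholders, and the namespace `kyuubiserver` is an assumption; use whatever namespace your Kyuubi HA configuration actually registers under.

```shell
#!/bin/sh
# Placeholder values (assumptions): replace with your own ZooKeeper quorum
# and the namespace your Kyuubi servers register under.
ZK_QUORUM="zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181"
ZK_NAMESPACE="kyuubiserver"

# Print a Hive JDBC URL that asks ZooKeeper for a live server instance
# instead of naming a fixed host:port.
build_jdbc_url() {
  printf 'jdbc:hive2://%s/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=%s\n' "$1" "$2"
}

JDBC_URL="$(build_jdbc_url "$ZK_QUORUM" "$ZK_NAMESPACE")"
echo "$JDBC_URL"

# The URL can then be handed to a JDBC client, for example:
#   beeline -u "$JDBC_URL" -n "$USER"
```

With this style of URL the client never needs to know which NodeManager hosts the container or which port it bound to.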

Configurations

The table below lists the server-side configurations that the Kyuubi container uses to launch and size itself.

| Name | Default | Description |
| --- | --- | --- |
| --deploy-mode | client | When set to “cluster”, Kyuubi containerization is enabled |
| spark.driver.memory | 1024m | Kyuubi server container heap size |
| spark.yarn.driver.memoryOverhead | spark.driver.memory * 0.1 | Overhead memory for the Kyuubi server container |
| spark.driver.cores | DEFAULT_RM_SCHEDULER_MAXIMUM_ALLOCATION_VCORES | Kyuubi server container cores |
| spark.yarn.am.extraJavaOptions | (none) | Extra JVM options for the Kyuubi container |
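
To get a feel for how the two memory settings combine, the following sketch estimates the memory requested for the Kyuubi server container as heap plus overhead. It assumes Spark on YARN's usual floor of 384 MiB for the overhead, which is not stated in the table above; YARN may additionally round the request up to its own allocation increment, so treat the result as an approximation.

```shell
#!/bin/sh
# Estimate the container memory request in MiB:
#   heap + max(heap * 0.1, 384)
# The 384 MiB minimum is an assumption taken from Spark-on-YARN behaviour.
container_memory_mb() {
  heap_mb="$1"
  overhead_mb=$(( heap_mb / 10 ))      # integer 10% of the heap
  if [ "$overhead_mb" -lt 384 ]; then
    overhead_mb=384                    # apply the assumed minimum overhead
  fi
  echo $(( heap_mb + overhead_mb ))
}

container_memory_mb 1024   # default heap: prints 1408
container_memory_mb 8192   # larger heap:  prints 9011
```

With the default 1024m heap the overhead floor dominates, so the request is noticeably larger than a plain 10% margin would suggest.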

Launching Kyuubi on YARN

First, please refer to the Kyuubi Deployment Guide in the online documentation to learn how to configure the Kyuubi client.

Then, the only thing we need to do is launch Kyuubi with bin/start-kyuubi.sh and set the deploy mode to “cluster”.

For example:

$ bin/start-kyuubi.sh \
    --master yarn \
    --deploy-mode cluster
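
The sizing options from the Configurations section can be supplied at launch time. The sketch below only prints the command it would run, so it is safe to try anywhere. It assumes that bin/start-kyuubi.sh forwards `--conf` options in the same spark-submit style as `--master` and `--deploy-mode`; the memory and core values are arbitrary examples.

```shell
#!/bin/sh
# Example sizing values (assumptions, not recommendations).
DRIVER_MEMORY="2g"
DRIVER_CORES="2"

# Dry run: print the launch command instead of executing it.
# printf repeats the ' --conf %s' format once per remaining argument.
kyuubi_launch_command() {
  printf '%s' "bin/start-kyuubi.sh --master yarn --deploy-mode cluster"
  printf ' --conf %s' \
    "spark.driver.memory=$DRIVER_MEMORY" \
    "spark.driver.cores=$DRIVER_CORES"
  printf '\n'
}

LAUNCH_CMD="$(kyuubi_launch_command)"
echo "$LAUNCH_CMD"
```

Once the printed command looks right, it can be run from the Kyuubi installation directory.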

Finally, a YARN application of type KYUUBI, named KYUUBI SERVER[version], will be created on the YARN cluster. If we go to the ResourceManager UI, we should see it listed among the applications.

The server log can also be viewed through the ApplicationMaster page.
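
The same check can be done from the command line. The sketch below lists running applications of type KYUUBI with the standard `yarn application -list` command and extracts their application IDs. The helper name `extract_app_ids` is hypothetical, and whether `yarn logs` returns anything for a still-running application depends on the cluster's log aggregation settings, so that step is left as a comment.

```shell
#!/bin/sh
# Hypothetical helper: pull YARN application IDs out of any text on stdin.
extract_app_ids() {
  grep -o 'application_[0-9][0-9]*_[0-9][0-9]*' | sort -u
}

if command -v yarn >/dev/null 2>&1; then
  # Only applications of type KYUUBI that are currently running.
  yarn application -list -appTypes KYUUBI -appStates RUNNING 2>/dev/null |
    extract_app_ids |
    while read -r app_id; do
      echo "Kyuubi server application: $app_id"
      # Fetch aggregated logs if the cluster makes them available:
      #   yarn logs -applicationId "$app_id"
    done
else
  echo "yarn CLI not found; run this on a node with a configured Hadoop client" >&2
fi
```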

Additional Documentation

Building Kyuubi
Kyuubi Deployment Guide
High Availability Guide
Configuration Guide
Authentication/Security Guide
Kyuubi ACL Management Guide
Kyuubi Architecture
Home Page