kyuubi

Kyuubi is an enhanced edition of Apache Spark's original Thrift JDBC/ODBC Server.

Kyuubi Deployment Guide

This document gives a quick overview of how Kyuubi runs on clusters. To describe this topic clearly, we first need to distinguish two separate notions of deploy mode, one for the Spark applications Kyuubi creates and one for the Kyuubi server itself:

1. How Kyuubi submits Spark applications (client mode only)

2. How the Kyuubi server itself is launched (client or cluster mode)

Spark supports several cluster manager types for deployment. A cluster manager is an external service for acquiring resources on the cluster (e.g. Kubernetes, YARN). Spark applications can be submitted to a cluster in one of two deploy modes, distinguished by where the driver process runs: in client mode, the driver is launched outside the cluster; in cluster mode, it runs inside the cluster. The driver is the process in which the SparkContext instance lives.
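To make the two deploy modes concrete, here is a minimal dry-run sketch that only prints the spark-submit invocation for each mode rather than running it. The `--master` and `--deploy-mode` options are standard spark-submit flags; the jar path, main class, helper name `submit_cmd`, and the `/opt/spark` fallback are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print (do not execute) the spark-submit command for a deploy mode.
# APP_JAR and MAIN_CLASS are hypothetical placeholders, not part of Kyuubi.
APP_JAR="/path/to/my-app.jar"
MAIN_CLASS="com.example.MyApp"

submit_cmd() {
  local mode="$1"   # "client": driver outside the cluster; "cluster": driver inside it
  case "$mode" in
    client|cluster) ;;
    *) echo "unknown deploy mode: $mode" >&2; return 1 ;;
  esac
  # /opt/spark is an assumed fallback used only when SPARK_HOME is unset.
  echo "${SPARK_HOME:-/opt/spark}/bin/spark-submit --master yarn --deploy-mode $mode --class $MAIN_CLASS $APP_JAR"
}

submit_cmd client
submit_cmd cluster
```

Only the `--deploy-mode` value differs between the two commands; it decides whether the driver, and therefore the SparkContext, lives on the submitting machine or in a YARN container.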

Unlike ordinary Spark applications, Kyuubi manages multiple SparkContext instances inside the Kyuubi server JVM. Because every driver has to live in that JVM, the current implementation supports submitting Spark applications in client mode only.

The Kyuubi server itself can also be launched in two ways: on a local machine (a.k.a. client mode) or in a YARN container (a.k.a. cluster mode).

Cluster Manager

Although Spark currently supports several cluster managers, such as Standalone, Apache Mesos, Kubernetes, and Hadoop YARN, we chose Hadoop YARN as the first-class cluster manager to gain better compatibility and multi-tenancy support on Hadoop clusters.

Kyuubi cluster mode is only supported on YARN.

Launching Kyuubi Locally

Preparations

Running Kyuubi locally with Spark on YARN requires the following configurations.

Configurations

Make sure that HADOOP_CONF_DIR or YARN_CONF_DIR points to the directory that contains the client-side configuration files for the Hadoop cluster.

For example (in kyuubi-env.sh or spark-env.sh):

export HADOOP_CONF_DIR=/path/to/hadoop/conf

These configurations are used to read and write staging and data files on HDFS and to connect to the YARN ResourceManager.
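Before launching, it can help to confirm that the variable really points at a usable client configuration. The sketch below assumes the directory should contain core-site.xml (used for HDFS access) and yarn-site.xml (used to locate the ResourceManager), the standard Hadoop client files; the helper name `check_hadoop_conf` is hypothetical and not shipped with Kyuubi.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helper (not part of Kyuubi).
# Uses HADOOP_CONF_DIR if set and non-empty, otherwise falls back to YARN_CONF_DIR.
check_hadoop_conf() {
  local dir="${HADOOP_CONF_DIR:-$YARN_CONF_DIR}"
  if [ -z "$dir" ]; then
    echo "neither HADOOP_CONF_DIR nor YARN_CONF_DIR is set" >&2
    return 1
  fi
  local f
  # Assumed minimum: HDFS client settings plus YARN client settings.
  for f in core-site.xml yarn-site.xml; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing $dir/$f" >&2
      return 1
    fi
  done
  echo "Hadoop client configuration found in $dir"
}
```

Running `check_hadoop_conf` in the same shell before `bin/start-kyuubi.sh` gives an early, readable error if either file is missing.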

Kyuubi relies on SPARK_HOME to locate Spark and its dependencies, so export SPARK_HOME in $KYUUBI_HOME/bin/kyuubi-env.sh:

export SPARK_HOME=/the/path/to/a/runnable/spark/binary/dir

To connect to the Hive Metastore correctly, put a properly configured hive-site.xml in the $SPARK_HOME/conf directory.
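The SPARK_HOME and hive-site.xml requirements can be checked together. This is a minimal sketch under two assumptions: that an executable `bin/spark-submit` is a reasonable sign of a runnable Spark binary distribution, and that hive-site.xml must sit directly under `$SPARK_HOME/conf`. The helper name `check_spark_home` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helper (not part of Kyuubi): checks SPARK_HOME for the
# spark-submit launcher and for a hive-site.xml under conf/.
check_spark_home() {
  if [ -z "$SPARK_HOME" ]; then
    echo "SPARK_HOME is not set" >&2
    return 1
  fi
  # Assumption: an executable spark-submit indicates a runnable Spark distribution.
  if [ ! -x "$SPARK_HOME/bin/spark-submit" ]; then
    echo "$SPARK_HOME/bin/spark-submit is missing or not executable" >&2
    return 1
  fi
  # Without hive-site.xml, Spark cannot find the external Hive Metastore.
  if [ ! -f "$SPARK_HOME/conf/hive-site.xml" ]; then
    echo "no hive-site.xml in $SPARK_HOME/conf" >&2
    return 1
  fi
  echo "SPARK_HOME looks usable: $SPARK_HOME"
}
```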

Startup

If this is your first time using Kyuubi, we suggest first running $SPARK_HOME/bin/spark-sql and executing a few test SQL statements to verify that the Spark, YARN, and Hive clients are all ready and configured correctly.
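One way to script that check is sketched below. It relies on spark-sql accepting spark-submit options such as `--master` and the Hive CLI style `-e` option for running quoted statements and exiting. Launching with `--master yarn` submits an application to YARN, and `SHOW DATABASES` queries the Hive Metastore. The function name `smoke_test_spark_sql` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical smoke test (not part of Kyuubi): run trivial statements through
# spark-sql on YARN before starting Kyuubi.
smoke_test_spark_sql() {
  if [ -z "$SPARK_HOME" ]; then
    echo "SPARK_HOME is not set" >&2
    return 1
  fi
  # -e executes the quoted statements and exits; spark-sql's exit status is returned.
  # SHOW DATABASES needs the Hive Metastore; --master yarn needs a reachable ResourceManager.
  "$SPARK_HOME/bin/spark-sql" --master yarn -e "SHOW DATABASES; SELECT 1;"
}
```

If `smoke_test_spark_sql` fails, the error usually points at whichever of the Spark, YARN, or Hive client settings still needs attention.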

Finally, start Kyuubi with bin/start-kyuubi.sh:

$ bin/start-kyuubi.sh \
    --master yarn \
    --deploy-mode client \
    --driver-memory 10g \
    --conf spark.kyuubi.frontend.bind.port=10009

This launches the Kyuubi server on the machine where you run the script; the Spark applications it creates are submitted to YARN in client mode.
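Because Kyuubi keeps the HiveServer2-compatible protocol of Spark's Thrift JDBC/ODBC Server, a JDBC client such as beeline (shipped in `$SPARK_HOME/bin`) should be able to connect to the port set by `spark.kyuubi.frontend.bind.port`. The sketch below assumes a plain `jdbc:hive2://host:port/` URL with no authentication parameters; the helper names `kyuubi_jdbc_url` and `connect_kyuubi` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Kyuubi).
# Build the JDBC URL; defaults match the port 10009 used in the start command.
kyuubi_jdbc_url() {
  local host="${1:-localhost}"
  local port="${2:-10009}"
  echo "jdbc:hive2://${host}:${port}/"
}

# Usage: connect_kyuubi <host> <port> [extra beeline args, e.g. -n <user>]
connect_kyuubi() {
  if [ "$#" -lt 2 ]; then
    echo "usage: connect_kyuubi <host> <port> [beeline args...]" >&2
    return 1
  fi
  local url
  url="$(kyuubi_jdbc_url "$1" "$2")"
  shift 2
  # Remaining arguments are passed to beeline unchanged.
  "$SPARK_HOME/bin/beeline" -u "$url" "$@"
}
```

For example, `connect_kyuubi localhost 10009 -n "$USER"` opens an interactive beeline session; secured clusters will likely need additional URL parameters.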

Additions

Please refer to the Configuration Guide in the online documentation for an overview of how to configure Kyuubi.

Please refer to the Kyuubi Containerization Guide in the online documentation to learn how to run Kyuubi inside a YARN cluster (cluster mode).

Additional Documentation

Building Kyuubi
Configuration Guide
Kyuubi Containerization Guide
High Availability Guide
Authentication/Security Guide
Kyuubi ACL Management Guide
Kyuubi Architecture
Home Page