This document gives a quick overview of how Kyuubi runs on clusters. To describe this clearly, we first need to distinguish two notions of deploy mode, one for Spark and one for Kyuubi:
1. How Kyuubi submits Spark applications (client)
2. How the Kyuubi server itself is launched (client/cluster)
Spark supports several types of cluster manager. The cluster manager is an external service that acquires resources on the cluster (e.g. Kubernetes, YARN). Spark applications can be submitted to a cluster in two deploy modes, distinguished by where the driver process runs. In client mode, the driver is launched outside the cluster, while in cluster mode the driver runs inside it. The driver usually refers to the process where the SparkContext instance lives.
Unlike ordinary Spark applications, Kyuubi manages multiple SparkContext instances inside the Kyuubi server JVM, so the drivers run wherever the server runs. In other words, the current implementation supports submitting Spark applications only in client mode.
The Kyuubi server itself can be launched in two different ways: on a local machine (client mode) or inside a YARN container (cluster mode).
Although Spark currently supports several cluster managers, such as Standalone, Apache Mesos, Kubernetes, and Hadoop YARN, we chose Hadoop YARN as the first-class cluster manager to gain better compatibility and multi-tenancy on Hadoop clusters.
Kyuubi cluster mode is supported only on YARN.
Running Kyuubi on YARN requires the following setup.
Make sure that YARN_CONF_DIR points to the directory that contains the client-side configuration files for the Hadoop cluster.
YARN_CONF_DIR can be set in kyuubi-env.sh or spark-env.sh.
These configurations are used to read/write system staging files and data files to HDFS, and connect to the ResourceManager.
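A minimal sketch of setting YARN_CONF_DIR in kyuubi-env.sh or spark-env.sh might look like the following. The path is a placeholder, not a value from this guide, and should be replaced with the directory holding your cluster's client configuration.

```shell
# Placeholder path (an assumption); point it at the directory that holds
# the Hadoop client files such as core-site.xml, hdfs-site.xml and yarn-site.xml.
export YARN_CONF_DIR=/path/to/hadoop/conf
```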
Kyuubi uses SPARK_HOME to identify Spark and other dependencies, so export it before starting the server.
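As a sketch, the export could look like this. The installation path is a placeholder and an assumption; use the directory where your Spark distribution is unpacked.

```shell
# Placeholder path (an assumption); replace with your Spark installation directory.
export SPARK_HOME=/path/to/spark
```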
To correctly connect the Hive Metastore, we need to configure
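This guide does not show the exact setting at this point. As an assumption, one common arrangement in Spark deployments is a hive-site.xml under $SPARK_HOME/conf that sets hive.metastore.uris. The sketch below writes such a file into a demo directory so that nothing existing is overwritten; CONF_DIR and metastore-host are hypothetical names, and 9083 is the conventional metastore Thrift port.

```shell
# Demo directory (hypothetical); in a real setup this would typically be $SPARK_HOME/conf.
CONF_DIR="${CONF_DIR:-./kyuubi-conf-demo}"
mkdir -p "$CONF_DIR"

# metastore-host is a placeholder for the machine running the Hive Metastore service.
cat > "$CONF_DIR/hive-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://metastore-host:9083</value>
  </property>
</configuration>
EOF
```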
If this is your first time using Kyuubi, we suggest that you first run $SPARK_HOME/bin/spark-sql and execute a few test SQL statements to verify that the Spark, YARN, and Hive clients are all set up correctly.
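The check described above could be sketched as follows. The fallback path and the test statements are illustrative assumptions; any simple query that touches the metastore and launches a YARN application would serve the same purpose.

```shell
# Fall back to a placeholder path (an assumption) when SPARK_HOME is not set.
SPARK_HOME="${SPARK_HOME:-/path/to/spark}"

# Illustrative statements: the first touches the Hive Metastore, the second runs a trivial query.
SMOKE_SQL="SHOW DATABASES; SELECT 1;"

if [ -x "$SPARK_HOME/bin/spark-sql" ]; then
  # Run against YARN in client mode, matching how Kyuubi submits applications.
  "$SPARK_HOME/bin/spark-sql" --master yarn --deploy-mode client -e "$SMOKE_SQL"
else
  echo "spark-sql not found under $SPARK_HOME; set SPARK_HOME first" >&2
fi
```

If the statements return without errors, the Spark, YARN, and Hive client settings are likely usable by Kyuubi as well.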
Finally, start Kyuubi with:
$ bin/start-kyuubi.sh \
    --master yarn \
    --deploy-mode client \
    --driver-memory 10g \
    --conf spark.kyuubi.frontend.bind.port=10009
This launches the Kyuubi server on the machine where you run the script.
Please refer to the Configuration Guide in the online documentation for an overview of how to configure Kyuubi.
Please refer to the Kyuubi Containerization Guide in the online documentation to learn how to enable Kyuubi on a YARN cluster.