Kyuubi is an enhanced edition of Apache Spark's original Thrift JDBC/ODBC Server. It is mainly designed for running SQL directly against a cluster in which all components, including HDFS, YARN, the Hive MetaStore, and Kyuubi itself, are secured. Kyuubi is a Spark SQL Thrift service that guarantees end-to-end multi-tenancy. Please see Kyuubi Architecture to learn more.
Basically, the Thrift JDBC/ODBC Server, Spark SQL's counterpart to Apache Hive's HiveServer2 as an ad-hoc SQL query service, acts as a distributed query engine through its JDBC/ODBC or command-line interface. In this mode, end users or applications can interact with Spark SQL directly to run SQL queries without writing any code. We can build attractive business reports over massive data with BI tools that support JDBC/ODBC connections, such as Tableau, NetEase YouData, and so on. Benefiting from Apache Spark's capabilities, it can achieve much better performance than Apache Hive as a SQL-on-Hadoop service.
Unfortunately, due to the limitations of Spark's own architecture, the Thrift JDBC/ODBC Server falls short of HiveServer2 as an enterprise-class product in a number of areas, such as multi-tenant isolation, authentication/authorization, high concurrency, and high availability. Moreover, the Apache Spark community's support for this module has long been stagnant.
Kyuubi enhances the Thrift JDBC/ODBC Server in several ways to solve these problems, as shown in the following table.
|Features||Spark Thrift Server||Kyuubi||Comments|
|multiple SparkContext||✘||✔||User tagged SparkContext|
|lazy SparkContext||✘||✔||Session level SparkContext|
|SparkContext cache||✘||✔||SparkContext Cache Management|
|dynamic queue||✘||✔||Kyuubi identifies|
|session level configurations||✘||✔||Dynamic Resource Requesting|
|authorization||✘||✔||Kyuubi ACL Management Guide|
|impersonation||✘||✔||Kyuubi fully supports|
|multi tenancy||✘||✔||Based on the above features, Kyuubi is able to run as a multi-tenant server on an LCE-supported YARN cluster.|
|operation log||✘||✔||Kyuubi redirects SQL operation logs to local files, which clients can fetch through an interface.|
|high availability||✘||✔||ZooKeeper Dynamic Service Discovery|
|containerization||✘||✔||Kyuubi Containerization Guide|
|type mapping||✘||✔||Kyuubi converts Spark results/schemas directly to Thrift results/schemas, bypassing Hive-format results.|
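For the high availability row, a client that uses ZooKeeper dynamic service discovery connects to the ZooKeeper quorum rather than to a single server. The sketch below assumes Kyuubi registers itself under a ZooKeeper namespace and that clients use the Hive JDBC driver's serviceDiscoveryMode=zooKeeper URL form; the helper name kyuubi_zk_jdbc_url, the quorum, and the namespace are hypothetical placeholders, not values taken from Kyuubi's configuration.

```shell
#!/usr/bin/env bash
# Sketch only: build a HiveServer2-style JDBC URL that lets the Hive JDBC
# driver look up a live server through ZooKeeper.
# Assumptions: Kyuubi registers under the given namespace; both arguments
# are placeholders to be replaced with your cluster's values.
kyuubi_zk_jdbc_url() {
  local quorum="$1"     # e.g. zk1:2181,zk2:2181,zk3:2181
  local namespace="$2"  # the ZooKeeper namespace the servers register under
  printf 'jdbc:hive2://%s/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=%s\n' \
    "$quorum" "$namespace"
}
```

The URL can then be handed to any HiveServer2-compatible client, for example beeline -u "$(kyuubi_zk_jdbc_url zk1:2181,zk2:2181 my-namespace)".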
Please refer to Building Kyuubi in the online documentation for an overview of how to build Kyuubi.
We can start Kyuubi with the built-in startup script bin/start-kyuubi.sh.
First of all, export
Finally, start Kyuubi with bin/start-kyuubi.sh:
$ bin/start-kyuubi.sh \
    --master yarn \
    --deploy-mode client \
    --driver-memory 10g \
    --conf spark.kyuubi.frontend.bind.port=10009
Run Spark SQL on Kyuubi
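Once the server is running, any HiveServer2-compatible JDBC client should be able to connect to the port set with spark.kyuubi.frontend.bind.port (10009 in the startup command). The following is a minimal sketch under stated assumptions, not Kyuubi's documented procedure: the helper names kyuubi_jdbc_url and kyuubi_connect are hypothetical, the host and Kerberos principal are placeholders, and beeline is assumed to be on the PATH.

```shell
#!/usr/bin/env bash
# Sketch only: build a HiveServer2-style JDBC URL for a Kyuubi server.
# Assumption: port 10009 matches spark.kyuubi.frontend.bind.port from the
# startup command; host and principal are placeholders.
kyuubi_jdbc_url() {
  local host="${1:-localhost}"
  local port="${2:-10009}"
  local principal="${3:-}"
  local url="jdbc:hive2://${host}:${port}/"
  # On a Kerberos-secured cluster the Hive JDBC driver expects ;principal=...
  if [ -n "$principal" ]; then
    url="${url};principal=${principal}"
  fi
  printf '%s\n' "$url"
}

# Hypothetical helper: open an interactive beeline session against Kyuubi.
kyuubi_connect() {
  beeline -u "$(kyuubi_jdbc_url "$@")"
}
```

For example, kyuubi_connect kyuubi-host.example.com 10009 kyuubi/_HOST@EXAMPLE.COM would open a session on a secured cluster, where both the host and the principal stand in for your own values.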
Multi Tenancy Support
Kyuubi may work well with different deployments, such as non-secured YARN, Standalone, Mesos, or even local mode, but it is mainly designed for a secured HDFS/YARN cluster, where its multi-tenancy and security features come into full play.
Suppose that you already have a secured HDFS cluster for deploying Spark, Hive or other applications.
- YARN Secure Containers
Spark on Yarn
- Setup for Spark on YARN: ensure that YARN_CONF_DIR points to the directory that contains the (client-side) configuration files for the Hadoop cluster.
- Configuration of Hive is done by placing your
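Before starting Kyuubi on YARN, it can help to confirm that YARN_CONF_DIR really points at the client-side Hadoop configuration. This is a hedged sketch: the function name check_yarn_conf_dir is hypothetical, and the assumption that core-site.xml and yarn-site.xml are the files to look for may need adjusting for your cluster.

```shell
#!/usr/bin/env bash
# Sketch only: sanity-check that a directory looks like a Hadoop client
# configuration directory.
# Assumption: core-site.xml and yarn-site.xml must be present; extend the
# list (e.g. hdfs-site.xml) to match your cluster.
check_yarn_conf_dir() {
  local dir="$1"
  local f
  [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
  for f in core-site.xml yarn-site.xml; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing $f in $dir" >&2
      return 1
    fi
  done
}
```

A typical use is check_yarn_conf_dir "$YARN_CONF_DIR" && bin/start-kyuubi.sh --master yarn ..., so the server is only started once the configuration directory passes the check.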
Please refer to the Configuration Guide in the online documentation for an overview of how to configure Kyuubi.
Please refer to the Authentication/Security Guide in the online documentation for an overview of how to enable security for Kyuubi.