Spark User Impersonation Configuration


By default, users in Kylo have access to every Hive table that the kylo user can access. By configuring Kylo for a secure Hadoop cluster and enabling user impersonation, users will only have access to the Hive tables accessible to their own accounts. A local Spark shell process is still used for schema detection when uploading a sample file.


This guide assumes that Kylo has already been set up with Kerberos authentication and that each user has an account in the Hadoop cluster.

Kylo Configuration

Kylo needs to launch a separate Spark shell process for each user who is actively performing data transformations. This means the kylo-spark-shell service should no longer be managed by the system.

  1. Stop and disable the system process.
$ service kylo-spark-shell stop
$ chkconfig kylo-spark-shell off
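On systemd-based distributions the equivalent commands would be the following (this assumes a systemd unit with the same kylo-spark-shell name as the SysV service above):

```shell
# Assumes a systemd unit named kylo-spark-shell exists on this host
$ sudo systemctl stop kylo-spark-shell
$ sudo systemctl disable kylo-spark-shell
```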
  2. Add the auth-spark profile. This enables Kylo to create temporary credentials for the Spark shell processes to communicate with the kylo-services process.
$ vim /opt/kylo/kylo-services/conf/

spring.profiles.include = auth-spark, ...
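To confirm the profile change took effect, you can grep the properties files in the conf directory edited above (a glob is used here because the exact filename depends on your install):

```shell
# Verify that auth-spark now appears in the active profile list
$ grep 'spring.profiles.include' /opt/kylo/kylo-services/conf/*.properties
```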
  3. Enable user impersonation. It is recommended to use the yarn-cluster master to ensure that both the Spark driver and executors run under the user's account. Using the local or yarn-client masters is possible but not recommended, because the Spark driver would run as the kylo user.
$ vim /opt/kylo/kylo-services/conf/

# Ensure these two properties are commented out
#spark.shell.server.host
#spark.shell.server.port

# Executes both driver and executors as the user
spark.shell.deployMode = cluster
spark.shell.master = yarn
# Enables user impersonation
spark.shell.proxyUser = true
# Reduces memory requirements and allows Kerberos user impersonation
spark.shell.sparkArgs = --driver-memory 512m --executor-memory 512m --driver-java-options

kerberos.spark.kerberosEnabled = true
kerberos.spark.kerberosPrincipal = kylo
kerberos.spark.keytabLocation = /etc/security/keytabs/kylo.headless.keytab
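Before restarting Kylo, it is worth checking that the principal and keytab configured above can actually obtain a ticket; kinit and klist are the standard Kerberos client tools for this:

```shell
# Obtain a ticket using the configured keytab and principal, then list it
$ kinit -kt /etc/security/keytabs/kylo.headless.keytab kylo
$ klist
```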
  4. Redirect logs to kylo-spark-shell.log. By default the logs are written to kylo-services.log and include the output of every Spark shell process. The configuration below redirects this output to the kylo-spark-shell.log file instead.
$ vim /opt/kylo/kylo-services/conf/

log4j.appender.sparkShellLog.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %t:%c{1}:%L - %m%n
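The ConversionPattern above belongs to an appender named sparkShellLog. A minimal definition backing that appender might look like the following sketch; the appender class and log file path here are assumptions to be adjusted for your install:

```
# Hypothetical file appender for the sparkShellLog name used above
log4j.appender.sparkShellLog=org.apache.log4j.DailyRollingFileAppender
log4j.appender.sparkShellLog.File=/var/log/kylo-services/kylo-spark-shell.log
log4j.appender.sparkShellLog.append=true
log4j.appender.sparkShellLog.layout=org.apache.log4j.PatternLayout
```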
  5. Configure Hadoop to allow Kylo to proxy users.
$ vim /etc/hadoop/conf/core-site.xml
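Hadoop proxy-user permissions are controlled by the standard hadoop.proxyuser.&lt;user&gt;.hosts and hadoop.proxyuser.&lt;user&gt;.groups properties. The wildcard values below are a permissive sketch and should be narrowed to specific hosts and groups in production:

```xml
<!-- Allow the kylo user to impersonate users from any host and any group.
     Replace the wildcards with explicit hosts/groups in production. -->
<property>
  <name>hadoop.proxyuser.kylo.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.kylo.groups</name>
  <value>*</value>
</property>
```

After editing core-site.xml, the affected Hadoop services (at minimum HDFS, and Hive if it is used) need to be restarted for the proxy-user settings to take effect.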