Yarn Cluster Mode Configuration


For the Spark validation processor to run in YARN cluster mode, the JSON policy file must be passed to the cluster, along with the hive-site.xml file. This configuration applies to both HDP and Cloudera clusters.


You must have Kylo installed.

Step 1: Add the Data Nucleus Jars


This step is required only for HDP clusters; it is not needed on Cloudera.

If your Spark processors use Hive, provide the Hive jar dependencies and the hive-site.xml file so that Spark can connect to the correct Hive metastore. To do this, add the following jars to the “Extra Jars” parameter:

/usr/hdp/current/spark-client/lib/datanucleus-api-jdo-x.x.x.jar
/usr/hdp/current/spark-client/lib/datanucleus-core-x.x.x.jar
/usr/hdp/current/spark-client/lib/datanucleus-rdbms-x.x.x.jar
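As a minimal sketch, the “Extra Jars” value is simply the three DataNucleus jar paths joined with commas. The HDP library path comes from the step above; the x.x.x placeholders stand for whatever versions are installed on your cluster.

```shell
# Sketch: assemble the comma-separated "Extra Jars" value for HDP.
# Replace x.x.x with the DataNucleus versions shipped with your Spark client.
SPARK_LIB=/usr/hdp/current/spark-client/lib
EXTRA_JARS="$SPARK_LIB/datanucleus-api-jdo-x.x.x.jar,$SPARK_LIB/datanucleus-core-x.x.x.jar,$SPARK_LIB/datanucleus-rdbms-x.x.x.jar"
echo "$EXTRA_JARS"
```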

Step 2: Add the hive-site.xml File

Locate the “hive-site.xml” file on the cluster. On most distributions it is found under the Hive configuration directory (commonly /etc/hive/conf/hive-site.xml); the exact path depends on your distribution.


Add this file location to the “Extra Files” parameter. To add multiple files, separate them with a comma.
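A small sketch of the “Extra Files” value described above. The /etc/hive/conf path is a common default, not a guarantee, and the second file is a purely hypothetical example included only to show the comma separation.

```shell
# Sketch: the "Extra Files" value; multiple files are separated by commas.
# /etc/hive/conf/hive-site.xml is a common location - adjust for your cluster.
# extra-config.xml is a hypothetical second file illustrating the separator.
EXTRA_FILES="/etc/hive/conf/hive-site.xml,/tmp/extra-config.xml"
echo "$EXTRA_FILES"
```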


Step 3: Validate and Split Records Processor

If you are using the “Validate and Split Records” processor in the standard-ingest template, pass the JSON policy file to the cluster as well.
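Putting the steps together, the “Extra Files” value would carry both hive-site.xml and the JSON policy file. The policy file path below is a hypothetical example; in practice it is the field-policy file generated for your feed.

```shell
# Sketch: "Extra Files" carrying hive-site.xml plus the JSON policy file.
# Both paths are illustrative - substitute the locations on your cluster.
POLICY_FILE=/tmp/kylo-spark/standard-ingest-policy.json   # hypothetical path
EXTRA_FILES="/etc/hive/conf/hive-site.xml,$POLICY_FILE"
echo "$EXTRA_FILES"
```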