Custom Provenance Events

You can use Kylo’s Provenance API to create custom Provenance Events that result in Jobs/Steps in Kylo Operations Manager.

The API allows you to programmatically create provenance events. Kylo ships with three implementations of the service.

There is also a sample Spark application that uses this API.

Example Usage

  1. Add the provenance API implementation as a dependency.
To create a new provenance event you need to include one of the kylo-provenance implementations in your project.
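
As a sketch, the Maven dependency entry might look like the following. The `artifactId` and version property shown here are assumptions for illustration; check your Kylo distribution for the actual coordinates of the implementation you choose.

```xml
<!-- Hypothetical coordinates: verify the artifactId and version against your Kylo build -->
<dependency>
    <groupId>com.thinkbiganalytics.kylo</groupId>
    <artifactId>kylo-provenance-kafka</artifactId>
    <version>${kylo.version}</version>
</dependency>
```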


  2. An example program might look like the following. Complete example code can be found here.
//Get the proper ProvenanceEventService based upon some configuration
ProvenanceEventService provenanceEventService = ProvenanceServiceFactory.getProvenanceEventService(params);

try {
    //Store all the events we want to send to the API in a list
    List<ProvenanceEventRecordDTO> events = new ArrayList<>();

    //Build an event using the ProvenanceEventDtoBuilder
    ProvenanceEventRecordDTO event = new ProvenanceEventDtoBuilder(params.getFeedName(), params.getFlowFileId(), componentName).build();

    // ... do some work ...

    //Record the end time and some attributes to be displayed on the step in Operations Manager
    event.getAttributeMap().put("databases", df.toJSON().collectAsList().toString());

    //Add the event to the list
    events.add(event);

    //When ready, send the events off to the API to be processed
    provenanceEventService.sendEvents(events);
} finally {
    //When done, close the connection to the service
    provenanceEventService.closeConnection();
}

Spark Kafka Example

  1. Build and copy the spark-provenance-app.

  2. Copy the jar and make it available to NiFi (i.e. copy it to /opt/nifi/data/lib/app) and create a versionless symlink:

    ln -s /opt/nifi/data/lib/app/kylo-spark-provenance-app-0.9.1-SNAPSHOT-jar-with-dependencies.jar  kylo-spark-provenance-app-with-dependencies.jar
  3. Import the Sample Spark App with Provenance template. This is an example template that calls the spark-provenance-app built in step 1 and writes out 2 additional steps/provenance events.

  4. Import the kafka_provenance_to_jms feed. This is a system-wide template that listens to 2 Kafka topics, for batch and streaming data, and publishes the events to JMS.


  5. Create a feed using the Sample Spark App with Provenance template. Note that this is a Spark 2 application, so set the Spark home property accordingly.


  • The Sample Spark App with Provenance feed template contains only 4 processors, and thus creates only 4 steps for the job execution in Kylo:
    • GenerateFlowFile
    • Initialize Feed Parameters
    • Spark Provenance
    • Winner Winner
  • The Spark application itself contains provenance code that creates 2 additional steps after the Spark Provenance step for each job:
    • Databases
    • Another Step