NiFi & Kylo Provenance¶
Introduction¶
Kylo uses a custom ProvenanceRepository (KyloPersistentProvenanceEventRepository) to send data from NiFi to Kylo. A custom NiFi nar file https://github.com/Teradata/kylo/tree/master/integrations/nifi/nifi-nar-bundles/nifi-provenance-repo-bundle is used for the ProvenanceRepository.
Setup¶
Edit the nifi.properties file (
/opt/nifi/current/conf/nifi.properties
) and change thenifi.provenance.repository.implementation
property as below:# Provenance Repository Properties #nifi.provenance.repository.implementation=org.apache.nifi.provenance.PersistentProvenanceRepository nifi.provenance.repository.implementation=com.thinkbiganalytics.nifi.provenance.repo.KyloPersistentProvenanceEventRepository
Ensure the correct nars are available in the NiFi classpath. Depending upon the NiFi version there are 2 different nar files that are used. If you use the kylo wizard it will copy the nar files and setup the symlinks to point to the correct nar version for your NiFi installation.
For NiFi 1.0 or 1.1
- kylo-nifi-provenance-repo-v1-nar-<version>.nar
For NiFi 1.2 or 1.3
- kylo-nifi-provenance-repo-v1.2-nar-<version>.nar
- Configure the KyloPersistentProvenanceEventRepository properties: The Provenance Repository uses properties found in the
/opt/nifi/ext-config/config.properties
file. Note: this location is configurable via the System Property
kylo.nifi.configPath
passed into NiFi when it launches. Below are the defaults which are automatically set if the file/properties are not found.Note: the config.properties marked with ## Supports dynamic update below can be updated without restarting NiFi. Every 30 seconds a check is made to see if the config.properties file has been updated.
### jms.activemq.broker.url=tcp://localhost:61616 ## Back up location to write the Feed stats data if NiFi goes down ## *Supports dynamic update* kylo.provenance.cache.location=/opt/nifi/feed-event-statistics.gz ## The maximum number of starting flow files per feed during the given run interval to send to ops manager ## *Supports dynamic update* kylo.provenance.max.starting.events=5 ## The number of starting flow files allowed to be sent through until the throttle mechanism in engaged. # if the feed starting processor gets more than this number of events during a rolling window based upon the kylo.provenance.event.throttle.threshold.time.millis timefame events will be throttled back to 1 per second until its slowed down kylo.provenance.event.count.throttle.threshold=15 ## Throttle timefame used to check the rolling window to determine if rapid fire is occurring kylo.provenance.event.throttle.threshold.time.millis=1000 ## run interval to gather stats and send to ops manager ## *Supports dynamic update* kylo.provenance.run.interval.millis=3000 ## JSON string of the Event Type to Array of Processor classes ## These processors produce orphan child flow files that dont send DROP provenance events for the children. ## Child flow files produced by events matching the EventType and processor class will not be processed ## *Supports dynamic update* kylo.provenance.orphan.child.flowfile.processors={"CLONE":["ConvertCSVToAvro"]}
- Configure the KyloPersistentProvenanceEventRepository properties: The Provenance Repository uses properties found in the
Event Processing¶
When NiFi runs the processors will send provenance events to JMS Queues. Kylo listens on these JMS queues and creates Jobs/Steps and Streaming statistics about each feed and job execution. These are displayed in the Operations Manager.