NiFi & Kylo Reporting Task (deprecated)
Warning
This document applies only to Kylo 0.8.1 and below. For Kylo 0.8.2 and above, refer to NiFi & Kylo Provenance.
Introduction
Kylo communicates with NiFi via a NiFi reporting task. As flow files run through NiFi, each processor creates provenance events that track metadata and status information of a running flow. A NiFi reporting task is used to query for these provenance events and send them to Kylo to display job and step executions in Kylo’s operations manager.
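A minimal sketch of that query-and-send cycle follows, assuming an in-memory list stands in for NiFi's provenance repository. `ProvenanceEvent`, `run_once`, and its parameters are hypothetical names for illustration, not Kylo or NiFi APIs. Each run queries the ids between the last processed event (exclusive) and the current maximum event id (inclusive), sends them in batches no larger than a processing batch size, and returns the highest id sent so the next run resumes from there.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class ProvenanceEvent:
    # Hypothetical, minimal stand-in for a NiFi provenance event.
    event_id: int
    component_id: str  # id of the processor that emitted the event
    event_type: str    # e.g. "CREATE", "DROP"


def run_once(
    repository: List[ProvenanceEvent],
    last_processed_id: int,
    processing_batch_size: int,
    send: Callable[[List[ProvenanceEvent]], None],
) -> int:
    """Send every event newer than last_processed_id; return the new last id."""
    if processing_batch_size <= 0:
        raise ValueError("processing_batch_size must be positive")
    # Snapshot the current maximum id so the range is fixed for this run.
    max_event_id = max((e.event_id for e in repository), default=last_processed_id)
    # Range query: (last_processed_id, max_event_id], in id order.
    pending = sorted(
        (e for e in repository if last_processed_id < e.event_id <= max_event_id),
        key=lambda e: e.event_id,
    )
    # Partition the range so no single send exceeds the batch size.
    for start in range(0, len(pending), processing_batch_size):
        batch = pending[start:start + processing_batch_size]
        send(batch)
        last_processed_id = batch[-1].event_id
    return last_processed_id


# Usage: seven events, of which ids 1-2 were processed on a previous run.
events = [ProvenanceEvent(i, "proc-a", "CREATE") for i in range(1, 8)]
sent: List[List[ProvenanceEvent]] = []
last = run_once(events, last_processed_id=2, processing_batch_size=2, send=sent.append)
print([[e.event_id for e in batch] for batch in sent], last)  # [[3, 4], [5, 6], [7]] 7
```

Tracking the last processed id is what lets successive runs pick up only new events; the "Initial event id value" property described under Reporting Task Properties controls how that id is chosen when the task starts.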
Processing Provenance Events
The Kylo reporting task relies on Kylo to provide feed information, which it uses to augment each provenance event with feed context. It does this through a cache called the “NiFi Flow Cache”, which is maintained by the Kylo Feed Manager and kept in sync with the NiFi reporting task. As feeds are created and updated, Kylo updates this cache, and the changes are synchronized back to the reporting task when it processes provenance events. The reporting task accesses the cache through a REST API.
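Conceptually, the NiFi Flow Cache lets the reporting task answer "which feed does this processor belong to?" without asking Kylo for every event. The sketch below models that idea with a plain dictionary; `FlowCache`, `fetch_from_kylo`, and the `feed_name` field are hypothetical names, and the real cache holds more than a processor-to-feed mapping. It refreshes from Kylo when it sees a processor it does not know; that trigger is an assumption made to keep the example small.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class ProvenanceEvent:
    event_id: int
    component_id: str                # processor that emitted the event
    feed_name: Optional[str] = None  # filled in from the flow cache


class FlowCache:
    """Hypothetical processor-id -> feed-name cache kept in sync with Kylo."""

    def __init__(self, fetch_from_kylo: Callable[[], Dict[str, str]]):
        self._fetch = fetch_from_kylo
        self._processor_to_feed: Dict[str, str] = {}  # starts empty
        self.refresh_count = 0

    def refresh(self) -> None:
        # Replace the local copy with Kylo's current view of its feeds.
        self._processor_to_feed = dict(self._fetch())
        self.refresh_count += 1

    def augment(self, event: ProvenanceEvent) -> ProvenanceEvent:
        feed = self._processor_to_feed.get(event.component_id)
        if feed is None:
            # Unknown processor: a feed may have been created or updated, so sync once.
            self.refresh()
            feed = self._processor_to_feed.get(event.component_id)
        return replace(event, feed_name=feed)


# Usage: a feed is added in Kylo after the cache was first populated.
kylo_view = {"proc-a": "sales.orders"}
cache = FlowCache(lambda: kylo_view)
first = cache.augment(ProvenanceEvent(1, "proc-a"))
kylo_view["proc-b"] = "sales.returns"
second = cache.augment(ProvenanceEvent(2, "proc-b"))
print(first.feed_name, second.feed_name, cache.refresh_count)  # sales.orders sales.returns 2
```

Events from processors that belong to no known feed come back with `feed_name` set to `None`, which is one way a consumer could tell that the cache and Kylo have drifted apart.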
The NiFi Flow Cache REST API
The NiFi Flow Cache REST endpoints let you manage the cache. Kylo and the reporting task keep the cache in sync automatically; if needed, you can use these endpoints to view, manage, and reset the cache.
Note
If the reporting task reports Kylo as “not available”, try resetting the cache with the “reset-cache” endpoint.
Reporting Task Creation
When Kylo starts up, it attempts to automatically create the controller service and reporting task that NiFi needs to communicate with Kylo. If this process fails, or if you want more control, you can create them manually by following the steps below.
Manual Setup¶
To set up the reporting task, click the menu icon in the top right corner and click the “Controller Settings” link.
From there, set up a controller service before adding the reporting task. The controller service allows NiFi to call the Kylo REST endpoints that provide the feed information needed to process NiFi events. Add a new Metadata Provider Selection Service and set its properties so that it can communicate with your Kylo instance.
Next, add the reporting task.
Set the schedule on the reporting task.
Reporting Task Properties
Name | Default Value | Allowable Values | Description |
Metadata Service | | Controller Service API: MetadataProviderService Implementation: | Kylo metadata service. |
Max batch feed events per second | 10 | | The maximum number of events per second that a given feed is allowed to send through to Kylo. This safeguards Kylo against a feed that starts behaving like a stream. Supports Expression Language: true |
JMS event group size | 50 | | The number of events in each group sent to Kylo. This should be less than the processing batch size. Supports Expression Language: true |
Rebuild cache on restart | false | | Whether the cache of flows should be rebuilt every time the reporting task is restarted. By default, the system keeps the cache up to date; setting this to true forces the cache to be rebuilt when the reporting task restarts. Supports Expression Language: true |
Last event id not found value | KYLO | KYLO, ZERO, MAX_EVENT_ID | The initial value to use when there is no minimum value to start the range query from (i.e., when this reporting task has never run before in NiFi). KYLO: query Kylo for the last saved event id and use it as the last event id. ZERO: process all events from id 0 up to the latest event id. MAX_EVENT_ID: set the starting point to the maximum provenance event id. |
Initial event id value | LAST_EVENT_ID | LAST_EVENT_ID, KYLO, MAX_EVENT_ID | The value to use as the minimum of the range of provenance events this task queries when it starts. LAST_EVENT_ID: use the last event successfully processed by this task. This is the default setting. KYLO: query Kylo for the last saved event id and use it as the last event id. MAX_EVENT_ID: process only events with an id greater than the maximum event id in provenance. This value is evaluated each time the reporting task is stopped and restarted, so you can use it to reset the provenance events sent to Kylo. This is not the ideal behavior, since you may lose provenance reporting; use it with caution. |
Processing batch size | 500 | | The maximum number of events to process in a given interval. If a run of this reporting task has more events than this to process, it partitions the list and processes the events in batches of this size to increase throughput to Kylo. Supports Expression Language: true |
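The "Initial event id value" and "Last event id not found value" properties combine into a single decision about where the range query starts. The sketch below is one reading of the descriptions in the table, not Kylo's code: `resolve_start_id` and its parameters are hypothetical, `None` stands for "no saved id", and the result is the first event id (inclusive) to query. Two choices are assumptions of this sketch: both MAX_EVENT_ID options are treated as "start after the current maximum id", and the not-found fallback is also applied when KYLO is chosen but Kylo has no saved id.

```python
from typing import Optional

INITIAL_OPTIONS = {"LAST_EVENT_ID", "KYLO", "MAX_EVENT_ID"}
NOT_FOUND_OPTIONS = {"KYLO", "ZERO", "MAX_EVENT_ID"}


def resolve_start_id(
    initial_event_id_value: str,
    last_event_id_not_found_value: str,
    task_last_event_id: Optional[int],  # None if this task has never run
    kylo_last_event_id: Optional[int],  # None if Kylo has no saved id
    max_event_id: int,                  # highest id in the provenance repository
) -> int:
    """Return the first (inclusive) provenance event id the range query should use."""
    if initial_event_id_value not in INITIAL_OPTIONS:
        raise ValueError(f"unknown initial event id value: {initial_event_id_value}")
    if last_event_id_not_found_value not in NOT_FOUND_OPTIONS:
        raise ValueError(f"unknown not-found value: {last_event_id_not_found_value}")

    if initial_event_id_value == "MAX_EVENT_ID":
        # Skip everything already in provenance; only newer events are processed.
        return max_event_id + 1
    if initial_event_id_value == "KYLO" and kylo_last_event_id is not None:
        return kylo_last_event_id + 1
    if initial_event_id_value == "LAST_EVENT_ID" and task_last_event_id is not None:
        return task_last_event_id + 1

    # No starting point was found: apply "Last event id not found value".
    if last_event_id_not_found_value == "ZERO":
        return 0
    if last_event_id_not_found_value == "KYLO" and kylo_last_event_id is not None:
        return kylo_last_event_id + 1
    # MAX_EVENT_ID, or KYLO with nothing saved in Kylo (assumption of this sketch).
    return max_event_id + 1


# First run of a new task with the default settings; Kylo last saved event 41.
print(resolve_start_id("LAST_EVENT_ID", "KYLO", None, 41, 500))  # 42
# Restart with MAX_EVENT_ID to skip the backlog; events up to 500 are never sent.
print(resolve_start_id("MAX_EVENT_ID", "KYLO", 120, 41, 500))    # 501
```

The second call shows why the table warns about MAX_EVENT_ID: any events already in provenance when the task restarts are skipped rather than reported.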