Features¶

Kylo is a full-featured Data Lake platform built on Apache Hadoop and Spark. Kylo provides a turn-key, business-friendly Data Lake solution enabling data ingest, data preparation, and data discovery.

Features	Description
License	Apache 2.0
Major Features
Data Ingest	Users can easily configure feeds in guided UI
Data Preparation	Visual sql builder and data wrangling
Operations dashboard	Feed health and service monitoring
Global search	Lucene search against data and metadata
Data Processing
Data Ingest	Guided UI for data ingest into Hive (extensible)
Data Export	Export data to RDBMS or other targets
Data Wrangling	Visually wrangle data and build/schedule recipes
PySpark, Spark Jobs	Execute Spark jobs
Custom Pipelines	Build and templatize new pipelines
Feed Chaining	Trigger feeds based on dependencies and rules
Ingest Features
Batch	Batch processing
Streaming	Streaming processing
Snapshot/Incremental Loads	Track highwater using date field or replace target
Schema Discovery	Infer schema from source file samples
Data Validation	Configure field validation in UI
Data Profile	Automatically profile statistics
Data Cleanse/Standardization	Easily configure field standardization rules
Custom Partitioning	Configure Hive partitioning
Ingest Sources
FTP, SFTP	Source from FTP, SFTP
Filesystem	Poll files from a filesystem
HDFS, S3	Extract files from HDFS and S3
RDBMS	Efficiently extract RDBMS data
JMS, KAFKA	Source events from queues
REST, HTTP	Source data from messages
Ingest Targets
HDFS	Store data in HDFS
HIVE	Store data in Hive tables
HBase	Store data in HBase
Ingest Formats
ORC, Parquet, Avro, RCFile, Text	Store data in popular table formats
Format Compression	Specify compression for ORC and Parquet types
Extensible source formats	Ability to define custom schema plug-in Serdes
Metadata
Tag/Glossary	Add tags to feeds for searchability
Business Metadata (extended properties)	Add business-defined fields to feeds
REST API	Powerful REST APIs for automation and integration
Visual Lineage	Explore process lineage
Profile History	View history of profile statistics
Search/Discover	Lucene syntax search against data and metadata
Operational Metadata	Extensive metadata capture
Security
Keberos Support	Supports Kerberized clusters
Obfuscation	Configure field-level data protection
Encryption at Rest	Compatible with HDFS encryption features
Access Control (LDAP, KDC, AD, SSO)	Flexible security options
Data Protection	UI configurable data protection policies
Application Groups, Roles	Admin configured roles
Operations
Dashboard	KPIs, alerts, performance, troubleshooting
Scheduler	Timer, Cron-style based on Quartz engine
SLA Monitoring	Service level agreements tied to feed performance
Alerts	Alerts with integration options to enterprise
Health Monitoring	Quickly identify feed and service health issues
Performance Reporting	Pivot on performance statistics
Scalability
Edge Clustering	Scale edge resources