Features
Kylo is a full-featured Data Lake platform built on Apache Hadoop and Spark. Kylo provides a turn-key, business-friendly Data Lake solution enabling data ingest, data preparation, and data discovery.
Features | Description |
License | Apache 2.0 |
Major Features |
|
Data Ingest | Users can easily configure feeds in a guided UI |
Data Preparation | Visual SQL builder and data wrangling |
Operations Dashboard | Feed health and service monitoring |
Global Search | Lucene search against data and metadata |
Data Processing |
|
Data Ingest | Guided UI for data ingest into Hive (extensible) |
Data Export | Export data to RDBMS or other targets |
Data Wrangling | Visually wrangle data and build/schedule recipes |
PySpark, Spark Jobs | Execute Spark jobs |
Custom Pipelines | Build and templatize new pipelines |
Feed Chaining | Trigger feeds based on dependencies and rules |
Ingest Features |
|
Batch | Batch processing |
Streaming | Streaming processing |
Snapshot/Incremental Loads | Track the high-water mark using a date field, or replace the target |
Schema Discovery | Infer schema from source file samples |
Data Validation | Configure field validation in UI |
Data Profile | Automatically profile statistics |
Data Cleanse/Standardization | Easily configure field standardization rules |
Custom Partitioning | Configure Hive partitioning |
Ingest Sources |
|
FTP, SFTP | Source from FTP, SFTP |
Filesystem | Poll files from a filesystem |
HDFS, S3 | Extract files from HDFS and S3 |
RDBMS | Efficiently extract RDBMS data |
JMS, Kafka | Source events from queues |
REST, HTTP | Source data from messages |
Ingest Targets |
|
HDFS | Store data in HDFS |
Hive | Store data in Hive tables |
HBase | Store data in HBase |
Ingest Formats |
|
ORC, Parquet, Avro, RCFile, Text | Store data in popular table formats |
Format Compression | Specify compression for ORC and Parquet types |
Extensible source formats | Define custom schemas through plug-in SerDes |
Metadata |
|
Tag/Glossary | Add tags to feeds for searchability |
Business Metadata (extended properties) | Add business-defined fields to feeds |
REST API | Powerful REST APIs for automation and integration |
Visual Lineage | Explore process lineage |
Profile History | View history of profile statistics |
Search/Discover | Lucene syntax search against data and metadata |
Operational Metadata | Extensive metadata capture |
Security |
|
Kerberos Support | Supports Kerberized clusters |
Obfuscation | Configure field-level data protection |
Encryption at Rest | Compatible with HDFS encryption features |
Access Control (LDAP, KDC, AD, SSO) | Flexible security options |
Data Protection | UI-configurable data protection policies |
Application Groups, Roles | Admin-configured roles |
Operations |
|
Dashboard | KPIs, alerts, performance, troubleshooting |
Scheduler | Timer and cron-style scheduling based on the Quartz engine |
SLA Monitoring | Service level agreements tied to feed performance |
Alerts | Alerts with options for integration with enterprise systems |
Health Monitoring | Quickly identify feed and service health issues |
Performance Reporting | Pivot on performance statistics |
Scalability |
|
Edge Clustering | Scale edge resources |