Features
Kylo is a full-featured Data Lake platform built on Apache Hadoop and Spark. Kylo provides a turn-key, business-friendly Data Lake solution enabling data ingest, data preparation, and data discovery.
| Features | Description |
| License | Apache 2.0 |
| Major Features | |
| Data Ingest | Users can easily configure feeds in a guided UI |
| Data Preparation | Visual SQL builder and data wrangling |
| Operations Dashboard | Feed health and service monitoring |
| Global Search | Lucene search against data and metadata |
| Data Processing | |
| Data Ingest | Guided UI for data ingest into Hive (extensible) |
| Data Export | Export data to RDBMS or other targets |
| Data Wrangling | Visually wrangle data and build/schedule recipes |
| PySpark, Spark Jobs | Execute Spark jobs |
| Custom Pipelines | Build and templatize new pipelines |
| Feed Chaining | Trigger feeds based on dependencies and rules |
| Ingest Features | |
| Batch | Batch processing |
| Streaming | Streaming processing |
| Snapshot/Incremental Loads | Track the high-water mark using a date field (incremental), or replace the target (snapshot) |
| Schema Discovery | Infer schema from source file samples |
| Data Validation | Configure field validation in UI |
| Data Profile | Automatically profile statistics |
| Data Cleanse/Standardization | Easily configure field standardization rules |
| Custom Partitioning | Configure Hive partitioning |
| Ingest Sources | |
| FTP, SFTP | Source from FTP, SFTP |
| Filesystem | Poll files from a filesystem |
| HDFS, S3 | Extract files from HDFS and S3 |
| RDBMS | Efficiently extract RDBMS data |
| JMS, Kafka | Source events from queues |
| REST, HTTP | Source data from messages |
| Ingest Targets | |
| HDFS | Store data in HDFS |
| Hive | Store data in Hive tables |
| HBase | Store data in HBase |
| Ingest Formats | |
| ORC, Parquet, Avro, RCFile, Text | Store data in popular table formats |
| Format Compression | Specify compression for ORC and Parquet types |
| Extensible source formats | Define custom schema plug-in SerDes |
| Metadata | |
| Tag/Glossary | Add tags to feeds for searchability |
| Business Metadata (extended properties) | Add business-defined fields to feeds |
| REST API | Powerful REST APIs for automation and integration |
| Visual Lineage | Explore process lineage |
| Profile History | View history of profile statistics |
| Search/Discover | Lucene syntax search against data and metadata |
| Operational Metadata | Extensive metadata capture |
| Security | |
| Kerberos Support | Supports Kerberized clusters |
| Obfuscation | Configure field-level data protection |
| Encryption at Rest | Compatible with HDFS encryption features |
| Access Control (LDAP, KDC, AD, SSO) | Flexible security options |
| Data Protection | UI-configurable data protection policies |
| Application Groups, Roles | Admin-configured roles |
| Operations | |
| Dashboard | KPIs, alerts, performance, troubleshooting |
| Scheduler | Timer- and cron-style scheduling based on the Quartz engine |
| SLA Monitoring | Service level agreements tied to feed performance |
| Alerts | Alerts with options for integration with enterprise systems |
| Health Monitoring | Quickly identify feed and service health issues |
| Performance Reporting | Pivot on performance statistics |
| Scalability | |
| Edge Clustering | Scale edge resources |
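
The Snapshot/Incremental Loads row in the table above distinguishes two load strategies. The sketch below illustrates the general idea in plain Python and is not Kylo's implementation: the function names, the `modified` date field, and the in-memory lists standing in for source and target tables are all assumptions made for illustration.

```python
from datetime import datetime


def incremental_load(source_rows, target_rows, high_water, date_field="modified"):
    """Append only rows newer than the stored high-water mark.

    Returns the updated high-water mark (unchanged if nothing new arrived).
    All names here are hypothetical; real feeds would read and write tables.
    """
    new_rows = [
        row for row in source_rows
        if high_water is None or row[date_field] > high_water
    ]
    target_rows.extend(new_rows)
    if new_rows:
        high_water = max(row[date_field] for row in new_rows)
    return high_water


def snapshot_load(source_rows, target_rows):
    """Replace the entire target with the current contents of the source."""
    target_rows[:] = source_rows


if __name__ == "__main__":
    source = [
        {"id": 1, "modified": datetime(2024, 1, 1)},
        {"id": 2, "modified": datetime(2024, 1, 2)},
    ]
    target = []

    # First run: no high-water mark yet, so every row is loaded.
    mark = incremental_load(source, target, None)

    # A new row arrives; only that row is appended on the next run.
    source.append({"id": 3, "modified": datetime(2024, 1, 3)})
    mark = incremental_load(source, target, mark)
    print(len(target), mark)
```

An incremental load keeps the target append-only and relies on the date field increasing over time, while a snapshot load is simpler but rewrites the whole target on every run.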