This page is a guide to preparing your environment for installation.
Supported Operating Systems¶
Supported Hadoop Distributions¶
Edge Node Hardware Requirements¶
Although the hardware requirements depend on the volume of data that will be processed, here are some general recommendations:
- The minimum production recommendation is a 4-core CPU and 16 GB RAM.
- The preferred production recommendation is an 8-core CPU and 32 GB RAM.
Kylo and Apache NiFi can be installed on a single edge node; however, it is recommended that they run on separate edge nodes.
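The recommendations above can be turned into a quick pre-install check. The following is a minimal sketch, not part of Kylo: the function names `classify_edge_node` and `local_hardware` are hypothetical, and the RAM lookup assumes a Linux host where `os.sysconf` exposes `SC_PAGE_SIZE` and `SC_PHYS_PAGES`.

```python
import os

# Thresholds taken from the recommendations above: (label, cores, RAM in GB).
TIERS = [("preferred", 8, 32), ("minimum", 4, 16)]


def classify_edge_node(cores: int, ram_gb: float) -> str:
    """Return the highest recommendation tier the given hardware satisfies."""
    for label, min_cores, min_ram_gb in TIERS:
        if cores >= min_cores and ram_gb >= min_ram_gb:
            return label
    return "below minimum"


def local_hardware() -> tuple:
    """Read the core count and physical RAM (in GiB) of this Linux host."""
    cores = os.cpu_count() or 0
    ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    return cores, ram_bytes / 1024**3


if __name__ == "__main__":
    cores, ram_gb = local_hardware()
    print(f"{cores} cores, {ram_gb:.1f} GiB RAM: {classify_edge_node(cores, ram_gb)}")
```

The operating system usually reports slightly less memory than the nominal size of the installed modules, so a machine sold as "16 GB" may land just under the threshold. Treat a narrow miss as a prompt to check manually rather than as a hard failure.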
Kylo Stack Dependencies¶
Below is a list of some of the major components Kylo uses, along with the versions that Kylo currently supports:
|Persistence||MySQL||5.x (tested with 5.1.73)||Used to store both the ModeShape (JCR 2.0) metadata and the operational relational (Kylo Ops Manager) metadata|
|Persistence||Postgres||9.x||Used to store both the ModeShape (JCR 2.0) metadata and the operational relational (Kylo Ops Manager) metadata|
|Persistence||MS SQL Server||Azure||Used to store both the ModeShape (JCR 2.0) metadata and the operational relational (Kylo Ops Manager) metadata|
|JMS||ActiveMQ||5.x (tested with 5.13.3)||Used to send messages between different modules and to send provenance from NiFi to Kylo|
|NiFi||NiFi||1.0 - 1.5 (HDF 2.0)||Either HDF or open source NiFi works.|
|Spark||Spark Client||1.5.x, 1.6.x, 2.x||NiFi and Kylo have routines that leverage Spark.|
|Hive||Hive||1.2.x+||Required if using Hive and the standard ingest template|
|Hadoop||HDFS||2.7.x+||Required if using Hive and the standard ingest template|
|Java||Java||Java 8_92+||The Kylo install will set up its own Java home so it doesn’t affect any other Java versions running on the machine.|
|Search||Elasticsearch||2.3.x, 5.x||For index and search of Hive metadata and indexing feed data when selected as part of creating a feed|
|Search||Solr||6.5.1 (SolrCloud mode)||For index and search of Hive metadata and indexing feed data when selected as part of creating a feed|
The following tools must be installed on the Linux box before installing the Kylo components:
|Curl (for downloading installation files)|
|RPM or dpkg (for installation)|
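A short script can confirm that the tools listed above are on the `PATH` before you start. This is a sketch under the assumption that the binaries are named `curl`, `rpm`, and `dpkg`; the function name `missing_tools` is hypothetical and only the tools named in the table are checked.

```python
import shutil


def missing_tools(which=shutil.which) -> list:
    """Return the required tools that cannot be found on PATH.

    `which` is injectable so the check can be exercised without
    depending on what is installed on the current machine.
    """
    missing = []
    if which("curl") is None:
        missing.append("curl")
    # Either package manager is sufficient: rpm or dpkg.
    if which("rpm") is None and which("dpkg") is None:
        missing.append("rpm or dpkg")
    return missing


if __name__ == "__main__":
    problems = missing_tools()
    print("All required tools found" if not problems else f"Missing: {problems}")
```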
The new Linux service accounts that are required are listed below. Within enterprises, approvals are often required and lead times to obtain service accounts can be long. Kerberos principals are required where the service interacts with a Kerberized Hadoop cluster. These services are not typically deployed to control and data nodes. The NiFi, ActiveMQ, and Elasticsearch services and the Kylo metastore databases (MySQL or Postgres) are I/O intensive.
|Service||Purpose||Local Linux Users||Local Linux Groups||Keytab file||upn||spn|
|kylo-services||Kylo API Server||kylo||kylo, hdfs or supergroup||/etc/security/keytabs/kylo.service.keytab||*kylo@EXAMPLE.COM*|
|kylo-ui||Provides Kylo feed and operations user interface||kylo||kylo, hdfs or supergroup|
|nifi||Orchestrate data flows||nifi||nifi, hdfs or supergroup||/etc/security/keytabs/nifi.service.keytab||*nifi@EXAMPLE.COM*|
|activemq||Broker messages between components||activemq||activemq|
|elasticsearch||Manages searchable index||elasticsearch||elasticsearch|
|mysql or postgres||Metastore for Kylo feed manager and operational metadata||mysql or postgres||mysql or postgres|
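Because service accounts often take a long time to obtain, it can help to verify them before installation day. The sketch below checks the local users and same-named groups from the table using Python's standard `pwd` and `grp` modules (Unix only). The function names are hypothetical, and membership in `hdfs` or `supergroup` and the keytab files are deliberately not checked here.

```python
import grp
import pwd

# Accounts from the table above; each needs a local user and a same-named group.
REQUIRED_ACCOUNTS = ["kylo", "nifi", "activemq", "elasticsearch"]
# The metastore needs only one of these, depending on the database chosen.
DATABASE_ALTERNATIVES = ["mysql", "postgres"]


def local_user_exists(name: str) -> bool:
    try:
        pwd.getpwnam(name)
        return True
    except KeyError:
        return False


def local_group_exists(name: str) -> bool:
    try:
        grp.getgrnam(name)
        return True
    except KeyError:
        return False


def missing_accounts(has_user=local_user_exists, has_group=local_group_exists) -> list:
    """Return a list of users and groups from the table that do not exist."""
    missing = []
    for name in REQUIRED_ACCOUNTS:
        if not has_user(name):
            missing.append(f"user {name}")
        if not has_group(name):
            missing.append(f"group {name}")
    if not any(has_user(db) for db in DATABASE_ALTERNATIVES):
        missing.append("user mysql or postgres")
    return missing


if __name__ == "__main__":
    for item in missing_accounts():
        print("missing:", item)
```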
You have the flexibility to change the installation locations and service accounts when using the TAR installation method.
Kylo relies heavily on integration with other services. Below is a list of network ports that are required for the standard ingest to work:
|Port||From Service||To Service|
|ALL||kylo-spark-shell||Yarn, data nodes|
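Connectivity between services can be probed from the edge node before installation. The following is a generic sketch that uses only Python's standard `socket` module; it does not assume any particular Kylo port. The function names and the endpoints you pass in are your own, so substitute the hosts and ports that apply to your cluster.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def unreachable(endpoints, probe=can_connect) -> list:
    """Return the (host, port) pairs from `endpoints` that cannot be reached."""
    return [(host, port) for host, port in endpoints if not probe(host, port)]
```

A successful TCP connection only shows that the port is open from this machine; it does not confirm that the service behind it is healthy or that firewall rules allow the reverse direction.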
Default HDFS Locations (for standard ingest)¶
The locations below are configurable. If you plan to use the default locations, they will be created at the paths listed here.
|HDFS Location||Description|
|/archive||Archive original files|
|/etl||Feed processing file location|
|/model.db||Hive feed, invalid, valid, profile location|
|/app/warehouse||Hive feed table final location|
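Before installing, you may want to know whether any of the default locations already exist in HDFS. The sketch below wraps the standard `hdfs dfs -test -d <path>` command, which exits with status 0 when the directory exists. The function names are hypothetical, and the script assumes the `hdfs` client is on the `PATH` and that the current user is allowed to query these paths.

```python
import shutil
import subprocess

# Default locations from the table above.
DEFAULT_LOCATIONS = ["/archive", "/etl", "/model.db", "/app/warehouse"]


def dir_check_command(path: str) -> list:
    """Build the HDFS command that tests whether `path` is an existing directory."""
    return ["hdfs", "dfs", "-test", "-d", path]


def existing_locations(paths=DEFAULT_LOCATIONS, run=subprocess.run) -> dict:
    """Map each path to True if it already exists as a directory in HDFS."""
    return {path: run(dir_check_command(path)).returncode == 0 for path in paths}


if __name__ == "__main__":
    if shutil.which("hdfs") is None:
        print("hdfs client not found on PATH; skipping check")
    else:
        for path, exists in existing_locations().items():
            print(f"{path}: {'exists' if exists else 'not present'}")
```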