Review Dependencies

This page can be used as a guide to prepare your environment for installation.

Supported Operating Systems

Operating System | Versions
RHEL, CentOS | 6.x, 7.x
SUSE | v11
Ubuntu | 16.x, 17.x
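
If you are unsure which distribution an edge node is running, a small check such as the sketch below can confirm it (a minimal Python sketch; newer distributions ship /etc/os-release, while older ones such as RHEL/CentOS 6.x or SUSE 11 only provide files like /etc/redhat-release or /etc/SuSE-release).

    # Print distribution information for this edge node.
    # Newer distributions ship /etc/os-release; older ones (RHEL/CentOS 6.x,
    # SUSE 11) only provide files such as /etc/redhat-release or /etc/SuSE-release.
    from pathlib import Path

    release_files = sorted(Path("/etc").glob("*release*"))
    if not release_files:
        print("No release file found; check your distribution manually.")
    for path in release_files:
        print(f"--- {path} ---")
        print(path.read_text().strip())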

Supported Hadoop Distributions

Platform | Sandbox URL | Version
Hortonworks | https://hortonworks.com/products/sandbox/ | HDP 2.3+
Cloudera | https://www.cloudera.com/downloads/quickstart_vms/5-12.html | 5.8+

Edge Node Hardware Requirements

Although the hardware requirements depend on the volume of data that will be processed, here are some general recommendations (a quick self-check sketch follows the list):

  • Minimum production recommendation: 4 CPU cores, 16 GB RAM.
  • Preferred production recommendation: 8 CPU cores, 32 GB RAM.
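
As a quick way to compare an edge node against these figures, the sketch below reports its core count and total memory (Linux-only, since it reads /proc/meminfo).

    # Report CPU core count and total RAM so they can be compared against the
    # minimum (4 cores / 16 GB) and preferred (8 cores / 32 GB) recommendations.
    import os

    cores = os.cpu_count()

    mem_kb = 0
    with open("/proc/meminfo") as meminfo:
        for line in meminfo:
            if line.startswith("MemTotal:"):
                mem_kb = int(line.split()[1])  # value is reported in kB
                break

    print(f"CPU cores: {cores}")
    print(f"RAM: {mem_kb / 1024 / 1024:.1f} GB")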

Note

Kylo and Apache NiFi can be installed on a single edge node; however, it is recommended that they run on separate edge nodes.

Kylo Stack Dependencies

Below is a list of the major components Kylo uses, along with the versions Kylo currently supports:

Category | Item | Version | Description
Persistence | MySQL | 5.x (tested with 5.1.73) | Used to store both the Modeshape (JCR 2.0) metadata and the Operational Relational (Kylo Ops Manager) metadata
Persistence | Postgres | 9.x | Used to store both the Modeshape (JCR 2.0) metadata and the Operational Relational (Kylo Ops Manager) metadata
Persistence | MS SQL Server | Azure | Used to store both the Modeshape (JCR 2.0) metadata and the Operational Relational (Kylo Ops Manager) metadata
JMS | ActiveMQ | 5.x (tested with 5.13.3) | Used to send messages between different modules and to send provenance from NiFi to Kylo
NiFi | NiFi | 1.0 - 1.3 (HDF 2.0) | Either HDF or open-source NiFi works.
Spark | Spark Client | 1.5.x, 1.6.x, 2.x | NiFi and Kylo have routines that leverage Spark.
Hive | Hive | 1.2.x+ | Required if using Hive and the standard ingest template
Hadoop | HDFS | 2.7.x+ | Required if using Hive and the standard ingest template
Java | Java | Java 8_92+ | The Kylo install will set up its own Java home so it doesn't affect any other Java versions running on the machine.
Search | Elasticsearch | 2.3.x, 5.x | For index and search of Hive metadata and indexing feed data when selected as part of creating a feed
Search | Solr | 6.5.1 (SolrCloud mode) | For index and search of Hive metadata and indexing feed data when selected as part of creating a feed
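
One way to spot-check a few of these components on an edge node is to call their version commands, as in the sketch below (it assumes java, mysql, and spark-submit are on the PATH; extend the list for the components you actually use).

    # Spot-check the versions of a few Kylo stack dependencies.
    # Binaries are assumed to be on the PATH; missing ones are reported, not fatal.
    import shutil
    import subprocess

    checks = {
        "java": ["java", "-version"],                   # expect Java 8
        "mysql": ["mysql", "--version"],                # expect 5.x
        "spark-submit": ["spark-submit", "--version"],  # expect 1.5.x, 1.6.x, or 2.x
    }

    for name, cmd in checks.items():
        if shutil.which(cmd[0]) is None:
            print(f"{name}: not found on PATH")
            continue
        # Some tools (java in particular) print their version to stderr.
        result = subprocess.run(cmd, capture_output=True, text=True)
        lines = (result.stdout or result.stderr).strip().splitlines()
        print(f"{name}: {lines[0] if lines else 'no output'}")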

Linux Tools

Below are the tools that must be installed on the Linux box before installing the Kylo components (a quick presence check follows the list):

  • curl (for downloading installation files)
  • rpm or dpkg (for installation)
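
A minimal pre-flight check for these tools could look like the following sketch, which uses shutil.which to confirm that curl and at least one package manager are available.

    # Confirm the required Linux tools are present before starting the Kylo install.
    import shutil

    print("curl:", "found" if shutil.which("curl") else "MISSING")

    # Either rpm (RHEL/CentOS/SUSE) or dpkg (Ubuntu) is needed for the install.
    if any(shutil.which(pm) for pm in ("rpm", "dpkg")):
        print("package manager: found (rpm or dpkg)")
    else:
        print("package manager: MISSING")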

Service Accounts

The new Linux service accounts required by Kylo are listed below. Within enterprises, obtaining service accounts often requires approvals and long lead times, so plan ahead. Kerberos principals are required where a service interacts with a Kerberized Hadoop cluster. These services are not typically deployed to control or data nodes. The NiFi, ActiveMQ, and Elasticsearch services and the Kylo metastore database (MySQL or Postgres) are I/O intensive.

Service | Purpose | Local Linux Users | Local Linux Groups | Keytab File | UPN | SPN
kylo-services | Kylo API server | kylo | kylo, hdfs or supergroup | /etc/security/keytabs/kylo.service.keytab | kylo@EXAMPLE.COM |
kylo-ui | Provides the Kylo feed and operations user interface | kylo | kylo, hdfs or supergroup | | |
nifi | Orchestrates data flows | nifi | nifi, hdfs or supergroup | /etc/security/keytabs/nifi.service.keytab | nifi@EXAMPLE.COM |
activemq | Brokers messages between components | activemq | activemq | | |
elasticsearch | Manages the searchable index | elasticsearch | elasticsearch | | |
mysql or postgres | Metastore for Kylo feed manager and operational metadata | mysql or postgres | mysql or postgres | | |
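
Because provisioning accounts can take time in an enterprise environment, a quick existence check such as the sketch below (using Python's standard pwd and grp modules) shows which local users and groups are already in place on a node.

    # Check which Kylo-related local service accounts and groups already exist.
    # Adjust the lists to match your deployment (e.g. postgres instead of mysql).
    import grp
    import pwd

    users = ["kylo", "nifi", "activemq", "elasticsearch", "mysql"]
    groups = ["kylo", "nifi", "activemq", "elasticsearch", "hdfs"]

    for user in users:
        try:
            pwd.getpwnam(user)
            print(f"user {user}: exists")
        except KeyError:
            print(f"user {user}: missing")

    for group in groups:
        try:
            grp.getgrnam(group)
            print(f"group {group}: exists")
        except KeyError:
            print(f"group {group}: missing")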

Note

You have the flexibility to change the installation locations and service accounts when using the TAR installation method.

Network Ports

Kylo relies heavily on integration with other services. Below is a list of the network ports that are required for the standard ingest to work:

Required

Port | From Service | To Service
8400 | Browser/NiFi | kylo-ui
8079 | Browser/kylo-services | NiFi
61616 | kylo-services/NiFi | ActiveMQ
3306 | kylo-services/NiFi | MySQL
9200 | kylo-services/NiFi | Elasticsearch
9300 | kylo-services/NiFi | Elasticsearch 2.x
8983 | kylo-services/NiFi | Solr
9983 | kylo-services/NiFi | Solr
10000 | kylo-services/NiFi | HiveServer2
ALL | kylo-spark-shell | YARN, data nodes

Optional

Port | From Service | To Service
8420 | REST Client | kylo-services
8161 | Browser | ActiveMQ Admin
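
A simple way to verify connectivity from an edge node is a socket check like the sketch below; the hostnames are placeholders for your environment, and the port list mirrors the required ports above.

    # Verify that the required service ports are reachable from this host.
    # Replace the placeholder hostnames with the actual hosts in your environment.
    import socket

    endpoints = [
        ("kylo-host", 8400),    # kylo-ui
        ("nifi-host", 8079),    # NiFi
        ("kylo-host", 61616),   # ActiveMQ
        ("kylo-host", 3306),    # MySQL
        ("search-host", 9200),  # Elasticsearch
        ("hive-host", 10000),   # HiveServer2
    ]

    for host, port in endpoints:
        try:
            with socket.create_connection((host, port), timeout=3):
                print(f"{host}:{port} reachable")
        except OSError:
            print(f"{host}:{port} NOT reachable")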

Default HDFS Locations (for standard ingest)

The locations below are configurable. If you plan on using the default locations, they will be created here.

HDFS Location | Description
/archive | Archive of original files
/etl | Feed processing file location
/model.db | Hive feed, invalid, valid, profile location
/app/warehouse | Hive feed table final location
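
If you plan to keep the defaults, the directories can be created ahead of time; the sketch below shells out to the hdfs CLI (it assumes the HDFS client is installed and that the current user is allowed to create these paths, for example after kinit on a Kerberized cluster).

    # Pre-create the default HDFS locations used by the standard ingest template.
    # Assumes the `hdfs` client is on the PATH and the user may create these paths.
    import subprocess

    locations = ["/archive", "/etl", "/model.db", "/app/warehouse"]

    for path in locations:
        subprocess.run(["hdfs", "dfs", "-mkdir", "-p", path], check=True)
        print(f"created (or already present): {path}")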