Postgres Metastore Configuration¶
Introduction¶
Kylo currently requires MySQL for the kylo schema. However, you can configure Kylo to work with a cluster that uses Postgres. We need to make some modifications to support Hive.
Kylo Services Configuration¶
Step 1: Ensure the Postgres driver is on the classpath¶
Ensure the postgres jdbc driver jar file is included in the kylo-services classpath.
Copy the driver jar file to the kylo-services/lib
folder.
Step 2: Update the application.properties¶
For Kylo to connect to a Postgres databases for the Hive metadata you need to change the following section of the kylo-services application.properties file.
hive.metastore.datasource.driverClassName=org.postgresql.Driver
hive.metastore.datasource.url=jdbc:postgresql://<hostname>:5432/hive
hive.metastore.datasource.username=hive
hive.metastore.datasource.password=
hive.metastore.datasource.validationQuery=SELECT 1
hive.metastore.datasource.testOnBorrow=true
Elasticsearch NiFi Template Changes¶
The index_schema_service template is used to query out feed metadata from the Hive tables, which is then stored in elasticsearch so it can be searched for in Kylo. The following steps need to be taken to the template to support Postgres:
Step 1: Copy the Postgres JAR file to NiFi¶
mkdir /opt/nifi/postgres
cp /opt/kylo/kylo-services/lib/postgresql-9.1-901-1.jdbc4.jar
/opt/nifi/postgres
chown -R nifi:users /opt/nifi/postgres
Step 2: Create a Controller Service for Postgres Connection¶
You will need to create an additional database controller services to connect to the second database.
Controller Service Properties:
Controller Service Type: DBCPConnectionPool
Database Connection URL: jdbc:postgresql://<host>:5432/hive
Database Driver Class Name: org.postgresql.Driver
Database Driver Jar URL:
file:///opt/nifi/postgres/postgresql-9.1-901-1.jdbc4.jar Database
User: hive
Password: <password>
Enable the Controller Service.
Step 3: Update “Query Hive Table Metadata” Processor¶
Edit the “Query Hive Table Schema” processor and make two changes:
- Disable the “Query Hive Table Metadata” processor.
- Change the Database Connection Pooling Service to the Postgres Hive controller service created above.
- Update the “SQL select Query” to be a Postgres query.
SELECT d."NAME", d."OWNER_NAME", t."CREATE_TIME", t."TBL_NAME",
t."TBL_TYPE",
c."COLUMN_NAME", c."TYPE_NAME"
FROM "COLUMNS_V2" c
JOIN "SDS" s on s."CD_ID" = c."CD_ID"
JOIN "TBLS" t ON s."SD_ID" =t."SD_ID"
JOIN "DBS" d on d."DB_ID" = t."DB_ID"
where d."NAME" = '${category}' and t."TBL_NAME" like '${feed}';
- Enable the “Query Hive Table Metadata” processor.
- Test a feed to make sure the data is getting indexed.