Cloudera Docker Sandbox Deployment Guide

About

In some cases, you may want to deploy a Cloudera sandbox in AWS so that a team can run a simple proof-of-concept, or to avoid using system resources on a local computer. Cloudera offers a Docker image similar to the Cloudera sandbox that you would otherwise download and install on your computer.

Warning

Once you create the Docker container called “cloudera”, do not remove it unless you intend to delete all of your work and start clean. The instructions below explain how to stop and start an existing container so that your data is retained.

Prerequisites

You need access to an AWS account and permission to create an EC2 instance.

Installation

Step 1: Create an EC2 instance

For this document, we will configure a CoreOS AMI, which is optimized for running Docker images.

  1. Choose an AMI for the region in which you will configure the EC2 instance.

Note

For detailed procedures for configuring the EC2 instance, visit Running CoreOS Container Linux on EC2 on the CoreOS website.

  2. Create the EC2 instance. You might want to add more disk space than the default 8GB.
  3. Configure the EC2 security group to allow inbound traffic on the ports you need (SSH plus the ports published by the container in Step 2).
  4. After the instance starts, log in to it:
$ ssh -i <private_key> core@<IP_ADDRESS>
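
The security group step above can be sketched with the AWS CLI, assuming it is installed and configured on your workstation. The port list is taken from the docker run command in Step 2, plus port 22 for SSH. The group ID and CIDR range shown are placeholders, and the function name is hypothetical. The sketch only prints the commands so that you can review them before running anything.

```shell
# Ports published by the container in Step 2, plus 22 for SSH.
SANDBOX_PORTS="22 80 7180 7187 8079 8161 8400 8888"

# Print one AWS CLI command per port. Nothing is executed here.
# $1 = security group ID, $2 = source CIDR range (both placeholders you supply).
print_ingress_rules() {
  for port in $SANDBOX_PORTS; do
    echo "aws ec2 authorize-security-group-ingress --group-id $1 --protocol tcp --port $port --cidr $2"
  done
}

# Placeholder values; substitute your own group ID and address range.
print_ingress_rules "sg-0123456789abcdef0" "203.0.113.0/24"
```

Restricting the CIDR range to your team's addresses is usually safer than opening the ports to everyone, because the sandbox uses default credentials.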

Step 2: Create Script to Start Docker Container

Create a shell script to start the Docker container. This makes it easier to create a new container if you later decide to delete the existing one and start clean.

  1. Create the script file:
$ vi startCloudera.sh
  2. Add the following:
#!/bin/bash
docker run --name cloudera \
  --hostname=quickstart.cloudera \
  --privileged=true -t -d \
  -p 8888:8888 \
  -p 7180:7180 \
  -p 80:80 \
  -p 7187:7187 \
  -p 8079:8079 \
  -p 8400:8400 \
  -p 8161:8161 \
  cloudera/quickstart:5.7.0-0-beta /usr/bin/docker-quickstart
  3. Make the script executable:
$ chmod 744 startCloudera.sh
  4. Start the container:
$ ./startCloudera.sh
Docker first has to download the image, which is about 4GB, so give it some time.
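
Because removing the container deletes your work, it can help to guard against creating it twice. The following is a minimal sketch, not part of the original procedure: the function names are hypothetical, and it assumes startCloudera.sh is in the current directory on the CoreOS host.

```shell
# Return success if a container with the given name exists (running or stopped).
container_exists() {
  docker ps -a --format '{{.Names}}' | grep -qx "$1"
}

# Start the existing "cloudera" container, or create it if there is none.
start_or_create() {
  if container_exists cloudera; then
    echo "Container exists; starting it and keeping its data."
    docker start cloudera
  else
    echo "No container found; creating a new one."
    ./startCloudera.sh
  fi
}
```

Calling start_or_create instead of running startCloudera.sh directly avoids the name conflict error that docker run reports when a container called cloudera already exists.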

Step 3: Log in to the Cloudera Container and Start Cloudera Manager

  1. Log in to the Docker container:
$ docker exec -it cloudera bash
  2. Start Cloudera Manager:
$ /home/cloudera/cloudera-manager --express
  3. Log in to Cloudera Manager:
<EC2_HOST>:7180 (username/password is cloudera/cloudera)
  4. Start all services in Cloudera Manager.
  5. After the services have started, exit the container to return to the CoreOS host.
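
Cloudera Manager can take a while to come up after it is started. As a rough sketch, assuming curl is available where you run it, you could poll port 7180 until it answers before trying to log in. The function name, attempt count, and delay are illustrative choices rather than values from this guide.

```shell
# Poll a URL until it answers or the attempts run out.
# $1 = URL, $2 = maximum attempts, $3 = seconds to wait between attempts.
wait_for_url() {
  attempt=1
  while [ "$attempt" -le "$2" ]; do
    # curl exits 0 as soon as the server sends any HTTP response.
    if curl -s -o /dev/null --max-time 5 "$1"; then
      echo "up after $attempt attempt(s): $1"
      return 0
    fi
    sleep "$3"
    attempt=$((attempt + 1))
  done
  echo "no response after $2 attempt(s): $1" >&2
  return 1
}

# Example (replace <EC2_HOST> first): wait_for_url "http://<EC2_HOST>:7180" 60 10
```

A response only shows that the web server is listening; the services still have to be started from the Cloudera Manager UI.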

Step 4: Build a Cloudera Distribution of Kylo and Copy it to the Docker Container

  1. Modify the pom.xml file for the kylo-services-app module. Change:
  <dependency>
    <groupId>com.thinkbiganalytics.datalake</groupId>
    <artifactId>kylo-service-monitor-ambari</artifactId>
    <version>0.3.0-SNAPSHOT</version>
  </dependency>

       To

  <dependency>
    <groupId>com.thinkbiganalytics.datalake</groupId>
    <artifactId>kylo-service-monitor-cloudera</artifactId>
    <version>0.3.0-SNAPSHOT</version>
  </dependency>
  2. From the kylo root folder, run:
$ mvn clean install -o -DskipTests
  3. Copy the new RPM file to the CoreOS box:
$ scp -i ~/.ssh/<EC2_PRIVATE_KEY> \
    <DLA_HOME>/install/target/rpm/tkylo/RPMS/noarch/kylo \
    core@<EC2_IP_ADDRESS>:/home/core
  4. From the CoreOS host, copy the RPM file to the Docker container:
$ docker cp /home/core/kylo-<VERSION>.noarch.rpm cloudera:/tmp
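
The pom.xml change in the first item of this step is a single artifact rename, so it can also be made with sed instead of an editor. This is only a sketch: the function name is hypothetical, and the path to the kylo-services-app pom.xml is passed as an argument because its location in your checkout is not stated here.

```shell
# Swap the Ambari service-monitor artifact for the Cloudera one.
# $1 = path to the kylo-services-app pom.xml.
# A backup of the original file is written alongside it with a .bak suffix.
use_cloudera_monitor() {
  sed -i.bak 's/kylo-service-monitor-ambari/kylo-service-monitor-cloudera/' "$1"
}
```

After running it, compare the file with its .bak copy to confirm that only the artifactId line changed before you start the Maven build.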

Step 5: Install Kylo in the Docker Container

  1. Log in to the Cloudera Docker container:
$ docker exec -it cloudera bash

$ cd /tmp
  2. Create Linux users and groups.

    Users and groups are created manually because many organizations have their own user and group management system, so we cannot script this as part of the RPM install.

$ useradd -r -m -s /bin/bash nifi
$ useradd -r -m -s /bin/bash kylo
$ useradd -r -m -s /bin/bash activemq
Verify that the commands above also created a matching group for each user by looking at /etc/group. Some operating systems do not create the groups by default.
$ cat /etc/group
If the groups are missing, run the following:
$ groupadd kylo
$ groupadd nifi
$ groupadd activemq
  3. Follow the instructions in the Deployment Wizard guide to install the RPM and other components.

Note

There is an issue installing the database script, so answer No at the wizard step that asks whether to install it. We will run the script manually instead. This section will be updated when the issue is fixed.

  4. Follow these steps, which are not in the wizard deployment guide but are required to run Kylo in this environment:
    1. Run the database scripts:
$ /opt/kylo/setup/sql/mysql/setup-mysql.sh root cloudera
    2. Edit /opt/kylo/kylo-services/conf/application.properties:

    Make the following changes in addition to the Cloudera-specific changes described in the Appendix section of the wizard deployment guide for Cloudera:

###Ambari Services Check
#ambariRestClientConfig.username=admin
#ambariRestClientConfig.password=admin
#ambariRestClientConfig.serverUrl=http://127.0.0.1:8080/api/v1
#ambari.services.status=HDFS,HIVE,MAPREDUCE2,SQOOP
###Cloudera Services Check
clouderaRestClientConfig.username=cloudera
clouderaRestClientConfig.password=cloudera
clouderaRestClientConfig.serverUrl=127.0.0.1
cloudera.services.status=HDFS/[DATANODE,NAMENODE],HIVE/[HIVEMETASTORE,HIVESERVER2],YARN
##HDFS/[DATANODE,NAMENODE,SECONDARYNAMENODE],HIVE/[HIVEMETASTORE,HIVESERVER2],YARN,SQOOP
    3. Add the “kylo” user to the supergroup:
$ usermod -a -G supergroup kylo
    4. Run the following commands to work around a permissions issue in the Cloudera sandbox, then return to the root shell:
$ su - hdfs
$ hdfs dfs -chmod 775 /
$ exit
  5. Start the Kylo apps:
$ /opt/kylo/start-kylo-apps.sh
  6. Verify that you can log in at <EC2_HOST>:8400 and <EC2_HOST>:8079.
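
The group check from the user-creation item earlier in this step can be scripted so that a group is added only when it is missing. This is a sketch under the assumption that getent and groupadd are available inside the container; the function name is hypothetical.

```shell
# Create a group only if it is not already in the group database.
ensure_group() {
  if getent group "$1" > /dev/null; then
    echo "group $1 already exists"
  else
    groupadd "$1" && echo "group $1 created"
  fi
}

# Run as root inside the container:
# for g in kylo nifi activemq; do ensure_group "$g"; done
```

Unlike a plain groupadd, this can be run repeatedly without producing an error for groups that useradd already created.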

Shutting down the container when not in use

EC2 instances can get expensive to run. If you don’t plan to use the sandbox for a period of time, we recommend shutting down the EC2 instance. The following steps safely shut down the Cloudera sandbox and the CoreOS host.

  1. Log in to Cloudera Manager and stop all services.
  2. On the CoreOS host, run “docker stop cloudera”.
  3. Shut down the EC2 instance.
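
For step 2, a longer grace period than Docker's default gives processes in the container more time to exit cleanly. The sketch below is an assumption-laden example: the 120-second value is arbitrary, the function name is hypothetical, and the services should already have been stopped in Cloudera Manager.

```shell
# Stop the container without removing it, so that its data is kept.
# -t sets how many seconds Docker waits before killing the container.
stop_sandbox() {
  docker stop -t 120 cloudera &&
    echo "Container stopped; it is now safe to shut down the EC2 instance."
}
```

Note that this uses docker stop, not docker rm; removing the container would delete the work stored in it.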

Starting up an Existing EC2 instance and Cloudera Docker Container

  1. Start the EC2 instance.
  2. Log in to the CoreOS host.
  3. Run “docker start cloudera” to start the container.
  4. Open a shell in the Docker container:
$ docker exec -it cloudera bash
  5. Start Cloudera Manager:
$ /home/cloudera/cloudera-manager --express
  6. Log in to Cloudera Manager and start all services.
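
The container and Cloudera Manager parts of this sequence can be combined into one function on the CoreOS host. This is a sketch only: the function name is hypothetical, and it assumes the cloudera-manager script runs correctly without an interactive terminal, which this guide does not confirm.

```shell
# Start the existing container, then start Cloudera Manager inside it.
# The second command runs only if the container started successfully.
start_sandbox() {
  docker start cloudera &&
    docker exec cloudera /home/cloudera/cloudera-manager --express
}
```

Starting the services themselves is still done from the Cloudera Manager UI once it is reachable on port 7180.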