Setting Up Elasticsearch and Kibana on AWS (Part I)

This tutorial is divided into two parts. First, we will set up an Elasticsearch cluster (a group of Elasticsearch servers) using Amazon Web Services (AWS). Second, we will learn how to create a dashboard using Kibana.

Elasticsearch and Kibana are powerful tools for visualizing and processing large amounts of information. Elasticsearch is a JSON store that distributes data across multiple servers for near-real-time responses to queries. Kibana is a tool that leverages Elasticsearch to create visualizations from your dataset.
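
As a quick sketch of what "JSON store" means in practice, storing a document is a single HTTP call. The index and field names below are made up for illustration, and this assumes an Elasticsearch node answering on localhost:9200:

curl -XPOST 'localhost:9200/traffic/vehicle' -d '{"passengers": 3, "recorded_at": "2015-06-01T08:30:00"}'

Elasticsearch stores the JSON as-is and makes it searchable across the cluster almost immediately.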

This combination allows you to quickly sort through large datasets to find patterns or aberrations. For instance, you could use Elasticsearch and Kibana to rapidly create a dashboard that visualizes traffic data over the course of a day, while also displaying what composes the traffic, e.g. the number of passengers per vehicle, and then quickly pivot to display how traffic changes over the seasons. With Elasticsearch offering fast lookups and Kibana creating the visualizations, you can iterate on dashboards over large datasets.

To create our Elasticsearch cluster, we will leverage two AWS services: CloudFormation and OpsWorks. CloudFormation allows you to upload templates ('recipes') that describe new server instances; it's an easy way to automate installing the different dependencies on your server. OpsWorks lets you easily spin up and destroy instances. Let's begin by creating an IAM role in AWS.

Forming The Cloud

[Screenshot: Select AWS Security Essentials]

  1. The first step is to create an IAM role. Sign in to your AWS account (you have one, right? If not, go here).

  2. In the top right, click on your profile name, then click on Security Credentials.

  3. Click on ‘Roles’ in the sidebar and then ‘Create New Role’. Call the role aws-opsworks-ec2-role. On the next page select Amazon EC2, and then select AmazonEC2FullAccess. Finish creating your role.

  4. Go to this GitHub repo, right-click on Raw, and save the file. This is the template that CloudFormation will use to create your servers.

  5. Go back to the AWS console (click on the AWS logo in the nav) and, under Deployment & Management, go to CloudFormation.

  6. Create a stack and name it elastic-search.

  7. Select 'Upload a template to Amazon S3' and upload the template you downloaded from the GitHub repo.

  8. Under Specify Parameters, change DefaultOWRoles to yes.

  9. Change the username and password to your liking. They will be used to access your server later on.

  10. Finish creating the stack. (A command-line equivalent is sketched just below this list.)
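
For those who prefer the terminal, roughly the same stack can be created with the AWS CLI. This is a sketch, not a substitute for the console steps: the template file is the one you downloaded in step 4, and any username/password parameter keys depend on how the template names them, so check the template before running it.

aws cloudformation create-stack \
  --stack-name elastic-search \
  --template-body file://{{downloaded-template}} \
  --parameters ParameterKey=DefaultOWRoles,ParameterValue=yes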

[Screenshot: Review CloudFormation]

Create an SSH Key

You’ll need an SSH key to access your server and upload your data into Elasticsearch.

To create an SSH key, go to the EC2 dashboard and, in the left sidebar, click on Key Pairs. Create a key pair and name it elastic-search. Put the resulting elastic-search.pem in a safe place.
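
If you have the AWS CLI configured, the same key pair can be created and saved in one step (this assumes your CLI defaults to the region your stack lives in):

aws ec2 create-key-pair --key-name elastic-search \
  --query 'KeyMaterial' --output text > elastic-search.pem
chmod 400 elastic-search.pem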

OpsWorks

Next, we need to edit the OpsWorks stack.

  1. Head to the OpsWorks stack by clicking on the AWS logo; you should see elastic-search listed. Click on Actions and select Edit.

  2. Change the default SSH key to the elastic-search key created earlier.

  3. Select aws-opsworks-ec2-role as the default IAM instance profile.

  4. Under configuration management, toggle Manage Berkshelf to yes.

  5. Save the edits.

  6. Create an instance (just a plain instance, not a time-based or load-based one) and select the t2.micro size.

  7. Create two more instances for a total of three instances.

  8. Start all the instances. Don’t worry if the console seems frozen; it takes a while to spin up the servers. (A CLI alternative for creating and starting instances is sketched below.)
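
The CLI alternative mentioned above looks roughly like this; the stack, layer, and instance IDs are placeholders you would copy from the OpsWorks console:

aws opsworks create-instance --stack-id {{stack-id}} \
  --layer-ids {{layer-id}} --instance-type t2.micro
aws opsworks start-instance --instance-id {{instance-id}}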

Mapping to Elasticsearch

After the servers are up, click on a server’s IP address. Go to http://{{server-ip-address}}/_plugin/head and, in the ‘Any Request’ tab, make a PUT request to /shakespeare with this JSON:

{
  "mappings": {
    "_default_": {
      "properties": {
        "speaker": {
          "type": "string", 
          "index": "not_analyzed"
        },
        "play_name": {
          "type": "string", 
          "index": "not_analyzed"
        },
        "line_id": { 
          "type": "integer" 
        },
        "speech_number": { 
          "type": "integer" 
        },
        "text_entry": { 
          "type": "string"
        }
      }
    }
  }
}

[Screenshot: Elasticsearch Mapping]

This creates the index and the field mappings Elasticsearch uses to store the data. The important part to note is “index”: “not_analyzed”. By default, Elasticsearch analyzes strings by breaking them into an array of tokens, e.g. “Toast is good” becomes [“toast”, “is”, “good”], which allows you to search for individual keywords but prevents exact matching against the full value “toast is good”. Setting a field to not_analyzed stores it as a single term, which is what we want for fields like speaker and play_name.
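
If you would rather skip the head plugin, the same mapping can be applied with curl, assuming you saved the JSON above as shakespeare-mapping.json and the cluster answers on port 9200. The second command is an exact-match term query that not_analyzed makes possible; run it after importing the data in the next section:

curl -XPUT 'http://{{server-ip-address}}:9200/shakespeare' --data-binary @shakespeare-mapping.json
curl -XGET 'http://{{server-ip-address}}:9200/shakespeare/_search?pretty' -d '{"query": {"term": {"speaker": "HAMLET"}}}'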

Import the Data

SSH into the server to download the data for Kibana. From the OpsWorks page, select SSH to see your connection details. Open your terminal, go to the folder where your elastic-search.pem resides, and run chmod 400 elastic-search.pem to restrict the key’s permissions so that SSH will accept it.

Then run ssh -i path_to_key/elastic-search.pem ec2-user@{{server-ip-address}} to SSH into your server instance.

Once in the server, download the data that we will visualize.

curl -XGET 'https://www.elastic.co/guide/en/kibana/3.0/snippets/shakespeare.json' > shakespeare.json  

The above command downloads our data and saves it as a JSON file.
Now we need to upload that JSON into Elasticsearch using the bulk API:

curl -XPOST 'localhost:9200/shakespeare/_bulk?pretty' --data-binary @shakespeare.json  
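
To sanity-check the import, you can ask Elasticsearch how many documents the index now holds (the exact count depends on the version of the dataset you downloaded):

curl -XGET 'localhost:9200/shakespeare/_count?pretty'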

And there we go: we have an Elasticsearch cluster with a dataset ready for visualization. The next tutorial will cover working with Kibana. Head to {{ip-address}}/_plugin/kibana4-static/public/#/settings/indices/?_g=() to get started!
