Amazon Redshift Settings

FlyData provides an easy way to transfer data from your Heroku application to your Amazon Redshift cluster. We load Heroku-specific data (access logs and runtime metrics) into dedicated tables and also store all of the raw log output. In addition, we capture specially formatted JSON written to STDOUT and, if a matching table exists on your Amazon Redshift cluster, upload that data automatically for you.

Amazon Redshift features on FlyData for Heroku:

  • It’s free for all users
  • Automatically uploads Heroku-specific logs (access logs and runtime metrics)
  • Uploads JSON-formatted output to an Amazon Redshift cluster
  • Works with your own Amazon Redshift cluster or with our free Redshift sandbox
  • Uploads once an hour to keep your cluster up to date

How to set up

If you haven’t already installed our Heroku add-on, find FlyData on the Heroku add-ons page and install it. Once installed, go to the FlyData settings page, where you will find the Amazon Redshift settings. If you want to store your logs in your own S3 bucket, configure that before enabling Amazon Redshift. Please note that the S3 bucket must be located in the US East (us-east-1) region.

Redshift Settings

Otherwise, enable the Amazon Redshift setting and choose which logs you want to upload automatically. JSON logs are uploaded automatically once you have created the corresponding tables on your Amazon Redshift cluster. Next, either pick our sandbox or provide your own Amazon Redshift cluster details. Click save, and from then on we will automatically upload any new log data to the Amazon Redshift cluster of your choice. For help accessing your cluster, please refer to our documentation here.

JSON support

Along with basic Heroku-specified log data, we also track certain JSON-formatted data. The table must be created on the Amazon Redshift cluster before we can upload to it. The JSON data should follow this format:

{ "YOUR_TABLENAME": {DATA} }

For example, if I wanted to upload data to the table items_bought, I would first create the table on my Amazon Redshift cluster like this:

CREATE TABLE items_bought
  (id INT,
   user_id INT,
   item VARCHAR(255),
   quantity INT );

After creation of the table on my cluster, I would output this from my Heroku application:

{"items_bought":{"id":19, "user_id":2, "item":"orange", "quantity":6} }

If I wanted to upload more than one entry, I could put the entries in an array like this:

{"items_bought":[
   {"id":19, "user_id":2, "item":"orange", "quantity":6},
   {"id":19, "user_id":2, "item":"banana", "quantity":12} ] }

Otherwise, if I wanted to upload to more than one table, I would first create the table on my cluster:

CREATE TABLE user_info (
id INT,
name VARCHAR(255)
);

After the table is created, I could add another key to the JSON like this:

{"items_bought":[
   {"id":19, "user_id":2, "item":"orange", "quantity":6},
   {"id":19, "user_id":2, "item":"banana", "quantity":12} ],
 "user_info":{"id":2, "name":"Matt"} }

In Ruby, you can output this by building a hash of your data and printing it in JSON format like so:

require 'json'
...
data = {"items_bought"=>[
    {"id"=>19, "user_id"=>2, "item"=>"orange", "quantity"=>6},
    {"id"=>19, "user_id"=>2, "item"=>"banana", "quantity"=>12} ],
  "user_info"=>{"id"=>2, "name"=>"Matt"}}
puts data.to_json

If you are creating a table with a timestamp or date data type, you will have to format those values in a specific way for Amazon Redshift. For timestamp, use a “YYYY-MM-DD HH:MM:SS” string. For date, use a “YYYY-MM-DD” string.
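As a quick sketch of how that formatting might look in Ruby (the purchased_at and purchase_date columns here are hypothetical, not part of the earlier table definition), strftime can produce both strings before the hash is serialized:

```ruby
require 'json'

# Hypothetical purchase time; in a real app this would come from your data.
purchased_at = Time.utc(2014, 3, 7, 14, 30, 5)

record = {
  "items_bought" => {
    "id"       => 19,
    "user_id"  => 2,
    "item"     => "orange",
    "quantity" => 6,
    # timestamp column: "YYYY-MM-DD HH:MM:SS"
    "purchased_at"  => purchased_at.strftime("%Y-%m-%d %H:%M:%S"),
    # date column: "YYYY-MM-DD"
    "purchase_date" => purchased_at.strftime("%Y-%m-%d")
  }
}

puts record.to_json
```

The same strftime calls work for Ruby Date and DateTime objects as well.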

After this minor adjustment, we can automatically upload this JSON data that you have printed to a table on the Amazon Redshift cluster you have chosen previously.
