JSON Data Format

FlyData supports the importing of JSON formatted data.

When your data is formatted as JSON, FlyData automatically creates tables and dynamically adds columns when new key-values are received. Dynamic column creation through JSON is one of the most characteristic features of FlyData, allowing you to handle table schemas with much greater flexibility. As this feature is not available to CSV and TSV formats, it is necessary that you output your data in JSON format to take advantage of this feature.

The JSON data should follow the guidelines below:

Basic JSON format

When you first sign up for FlyData, you are prompted to specific a table name to correspond to each of your log files. From there, every time your log files are updated with new JSON data rows, FlyData updates the corresponding Amazon Redshift table, using the new JSON key-value pairs to update columns and values in the table.

{ "[column-name]": [column-value], … }

A line in JSON must start with { and ends with the newline character, \n. If the data contains invalid JSON formatting, FlyData ignores those lines.

Nested JSON format

FlyData also supports JSON data in nested structures that have hashes or arrays of hashes in the value.

When a value is nested, FlyData treats its keys as table name suffixes and processes the values in the same way as that of root key-values. For example, if a JSON record for a "users" table includes a hash value with the key name"address", then the child table takes the name "users_address".

Additionally, FlyData adds an "id" column to all tables in order to maintain the relation of the parent and child tables. FlyData issues the "id" value automatically, unless the record already includes an "id" key and value. In such cases, for the child tables, FlyData creates a "[parent-table-name]_id"column and matches it to the "id" value of the parent table record.

For instance, the following JSON data for a "users" table is stored in two tables, one for "users" and one for "addresses":

// JSON Data
{
  "name": "John",
  "address": {
    "city": "LA"
  }
}

Once FlyData uploads it to Amazon Redshift, the following shows how it appears on the cluster tables:

// Stored Data on the Redshift cluster

// users (parent table)
-----------------
| name | id     |
+------+--------+
| John | 123456 |
-----------------

// users_address (child table)
-------------------
| city | users_id |
+------+----------+
| LA   | 123456   |
-------------------

As you can see, the value of "address" can be an array of hash and all of the elements of the array are processed. However, if the value is an array with non-hash, values are processed in the different way.

The record log columns are as follows:

  • index (store the index of array)
  • value
  • parent_id

Here is an example of "gps_history" table that includes coordinates array values.

// JSON Data
{
  "created_at": "2013-03-12 12:23:45",
  "coordinates": [
    121.01,
    14.51
  ]
}

// Stored Data on the Redshift cluster

// gps_history (parent table)
--------------------------------
|      created_at     |   id   |        
+---------------------+---------
| 2013-03-12 12:23:45 | 123456 |
--------------------------------

// gps_history_coordinates (child table)
-----------------------------------
| index | value  | gps_history_id |
+-------+--------+----------------+
| 0     | 121.01 | 123456         |
+-------+--------+----------------+
| 1     | 14.51  | 123456         |
-----------------------------------

Column Format Type

FlyData detects the format type from the value to create a new table or add a new column. The supported format types are as following.

  • integer (4-byte)
  • big integer (8-byte)
  • float
  • boolean
  • varchar (256-byte)
  • date
  • timestamp

If the value of the unknown key is "null", empty hash, or empty array, that is skipped to add a new column.

Date and Timestamp format

FlyData replaces the text for timestamp or date format fields with the date format that Amazon Redshift supports.

Supported formats for timestamp and date format fields include:

  • Date

     

    YYYY/mm/dd  (ex 2012/05/25)
    YYYY-mm-dd  (ex 2013-04-12)
  • Timestamp

     

    // W3C http://www.w3.org/TR/NOTE-datetime
    YYYY-MM-DDThh:mm:ssTZD (ex 1997-07-16T19:20:01+01:00)
    YYYY-MM-DDThh:mm:ss.sTZD (ex 1997-07-16T19:20.123+01:00)
    
    // Twitter
    ddd MMM dd HH:mm:ss zzzz yyyy (ex Sat Sep 17 14:48:57 +0000 2011)
    
    // Redshift
    YYYY-MM-DD hh:mm:ss (ex 2012-01-23 19:20:15)
    YYYY-MM-DD hh:mm:ss.s (ex 2012-01-23 19:20:15.243)
    
    // Apache
    [DD/MMM/YYYY:HH:mm:ss zzzz] (ex [13/Sep/2006:07:01:53 -0700])
    
    // Others
    YYYY-MM-DD hh:mm:ss zzzz (ex 2012-01-23 19:20:15 +0000)
    YYYY-MM-DD hh:mm:ss.s zzzz (ex 2012-01-23 19:20:15.243 +0000)

Supported characters

Data can contain only UTF-8 characters up to three-bytes long. Invalid characters will be replaced with ‘?’ character.

Settings

You can configure the following options from the FlyData settings page:

  • Auto create table If turned off and no tables to store data, data will be ignored.
  • Auto add columns If "support nested JSON" option is on, "id" and"parent id" column will be added automatically even if this option is off.
  • Support nested JSON If turned off, the value of array or hash will be stored with a serialized text of JSON.

     

    CREATE TABLE your_tablename_here (
       an_integer INT,
       first_string VARCHAR255),
       second_string VARCHAR(255)
    );
 
Have more questions? Submit a request

0 Comments

Please sign in to leave a comment.
Powered by Zendesk