Apache Log Data Format

FlyData supports two commonly used Apache log formats: Common Log Format (CLF) and Combined Log Format.

With these formats, you can upload your Apache access logs to Amazon Redshift and start analyzing the data right away. You don’t even need to create a table in Redshift; FlyData creates it for you.

Standard log attributes

For example, the log data below, in Combined Log Format, is uploaded to the Redshift table as shown:

  • Log Data

     

    127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 
    2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"
  • Data in Redshift

     

    // apache_access_log table
    ------------------------------------------------------
    | COLUMN         | ROW                               |
    +----------------+-----------------------------------+
    | ip             | 127.0.0.1                         |
    | remote_logname | -                                 |
    | remote_user    | frank                             |
    | timestamp      | 2000-10-10 13:55:36               |
    | http_method    | GET                               |
    | resource       | /apache_pb.gif                    |
    | protocol       | HTTP/1.0                          |
    | status         | 200                               |
    | size           | 2326                              |
    | referrer       | http://www.example.com/start.html |
    | user_agent     | Mozilla/4.08 [en] (Win98; I ;Nav) |
    ------------------------------------------------------
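
The mapping above can be sketched in Python. This is only an illustration of how one Combined Log Format line splits into the columns shown in the table, not FlyData's actual parser; the regular expression and the function name parse_access_log_line are assumptions made for this example. The sample table shows the timestamp without its UTC offset, so the sketch keeps the local wall-clock time.

```python
import re
from datetime import datetime

# Assumed pattern for Common/Combined Log Format:
# host ident user [time] "method resource protocol" status size "referrer" "user agent"
ACCESS_LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<remote_logname>\S+) (?P<remote_user>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<http_method>\S+) (?P<resource>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
    # The referrer and user agent are present only in Combined Log Format.
    r'(?: "(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)")?'
)

def parse_access_log_line(line):
    """Split one access log line into the column names used in the table."""
    match = ACCESS_LOG_PATTERN.match(line)
    if match is None:
        raise ValueError("line is not in Common or Combined Log Format")
    row = match.groupdict()
    # Reformat the Apache timestamp, dropping the UTC offset as the table does.
    parsed = datetime.strptime(row["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    row["timestamp"] = parsed.strftime("%Y-%m-%d %H:%M:%S")
    return row

line = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
        '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
        '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"')
row = parse_access_log_line(line)
for column, value in row.items():
    print(f"{column:15} {value}")
```

All values are returned as strings here; how FlyData types each Redshift column is not covered by this sketch.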

 

Request query parameters

In addition to the standard log attributes above, FlyData can store the values of request query parameters (for instance, ?user_id=123) in their own columns in the Redshift table.

If the column for a parameter does not exist yet, FlyData creates it automatically, so you don’t have to worry about the table definition.

  • Log Data

     

    127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /purchase?user_id=293&item_id=102
    HTTP/1.0" 200 2326 "http://www.example.com/store.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"
  • Data in Redshift

     

    // apache_access_log table
    ------------------------------------------------------
    | COLUMN         | ROW                               |
    +----------------+-----------------------------------+
    | ip             | 127.0.0.1                         |
    | remote_logname | -                                 |
    | remote_user    | frank                             |
    | timestamp      | 2000-10-10 13:55:36               |
    | http_method    | GET                               |
    | resource       | /purchase?user_id=293&item_id=102 |
    | protocol       | HTTP/1.0                          |
    | status         | 200                               |
    | size           | 2326                              |
    | referrer       | http://www.example.com/store.html |
    | user_agent     | Mozilla/4.08 [en] (Win98; I ;Nav) |
    | user_id        | 293                               |
    | item_id        | 102                               |
    ------------------------------------------------------
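
As a rough sketch of the idea, the query string of the resource value in the table above can be split into column/value pairs with Python's standard library. The helper names query_parameter_columns and add_missing_columns are hypothetical, and the column list is a plain Python list standing in for the table definition; this is not how FlyData itself is implemented.

```python
from urllib.parse import urlsplit, parse_qsl

def query_parameter_columns(resource):
    """Return the query parameters of a request path as a column -> value dict."""
    query = urlsplit(resource).query
    # parse_qsl decodes percent-escapes; with dict(), a repeated key keeps its last value.
    return dict(parse_qsl(query))

def add_missing_columns(existing_columns, row):
    """Return the column list extended with any keys of row it does not have yet."""
    new_columns = [name for name in row if name not in existing_columns]
    return existing_columns + new_columns

# A shortened, illustrative column list standing in for the table definition.
columns = ["ip", "resource"]
params = query_parameter_columns("/purchase?user_id=293&item_id=102")
columns = add_missing_columns(columns, params)
print(params)
print(columns)
```

Calling add_missing_columns a second time with the same parameters leaves the list unchanged, which mirrors the behaviour described above: a column is created only when it is missing.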

 

FlyData extended format

If you have custom data that fits neither the standard log attributes nor the request query parameters, you can use the FlyData Extended Log Format.

The Extended Log Format adds a double-quoted string to the end of a Common Log Format or Combined Log Format line. The string must consist of key=value pairs joined with &, which is the same format as the request query parameters.

  • Log Data

     

    127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /purchase?user_id=293&item_id=102
    HTTP/1.0" 200 2326 "http://www.example.com/store.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"
    "session_id=rfnq17675gtrfejbtc46n0vi97&response_time=7"

    Here, the last double-quoted string, which contains session_id and response_time, is in the FlyData Extended Log Format. During upload, FlyData creates columns for them in the Amazon Redshift table.

  • Data in Redshift

     

    // apache_access_log table
    ------------------------------------------------------
    | COLUMN         | ROW                               |
    +----------------+-----------------------------------+
    | ip             | 127.0.0.1                         |
    | remote_logname | -                                 |
    | remote_user    | frank                             |
    | timestamp      | 2000-10-10 13:55:36               |
    | http_method    | GET                               |
    | resource       | /purchase?user_id=293&item_id=102 |
    | protocol       | HTTP/1.0                          |
    | status         | 200                               |
    | size           | 2326                              |
    | referrer       | http://www.example.com/store.html |
    | user_agent     | Mozilla/4.08 [en] (Win98; I ;Nav) |
    | user_id        | 293                               |
    | item_id        | 102                               |
    | session_id     | rfnq17675gtrfejbtc46n0vi97        |
    | response_time  | 7                                 |
    ------------------------------------------------------
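
One way to picture the extended field is the Python sketch below. It assumes the extra field can be recognised by counting double-quoted strings: a Common Log Format line has one (the request) and a Combined Log Format line has three (request, referrer, user agent), so one more means two or four. That detection rule and the function name extended_columns are assumptions for illustration only, and the sketch does not handle escaped quotes inside a field.

```python
import re
from urllib.parse import parse_qsl

def extended_columns(line):
    """Return the key/value pairs of a trailing extended field, or {} if there is none."""
    quoted = re.findall(r'"([^"]*)"', line)
    # 1 quoted field = Common, 3 = Combined; one extra field gives 2 or 4.
    if len(quoted) not in (2, 4):
        return {}
    # The extended field uses the same key=value&key=value syntax as a query string.
    return dict(parse_qsl(quoted[-1]))

line = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
        '"GET /purchase?user_id=293&item_id=102 HTTP/1.0" 200 2326 '
        '"http://www.example.com/store.html" "Mozilla/4.08 [en] (Win98; I ;Nav)" '
        '"session_id=rfnq17675gtrfejbtc46n0vi97&response_time=7"')
extra = extended_columns(line)
print(extra)
```

To emit such a field from Apache itself, the usual place would be a custom LogFormat directive that appends one more quoted string to the combined format; which values go into it depends on your application.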