Custom Log Formats

Introduction

Urchin can process virtually any log file format. Once you provide the necessary information about the log format, Urchin reads and parses the raw data according to your configuration. This article describes a step-by-step method for creating custom log formats. Once a custom format is created, it can be selected in the Administration Interface as the format for any log file.

Certain log data is required to create certain reports, so make sure your log contains the minimum required fields for Urchin to process it. If you are unsure, see the Log Files - Logging - Other Webservers document in the Urchin Administration section of the Documentation Center.

Creating Custom Formats

To create a custom log format, create a format specification file in the folder

   [urchin install dir]/lib/custom/logformats/

A sample file called 'custom.lf' is provided. Multiple custom files can be created; each one needs the '.lf' extension for Urchin to recognize it. The default built-in log formats, such as apache, w3c, and netscape, are located in

   [urchin install dir]/lib/reporting/logformats/

Each of these directories contains a list of available fields in its fieldlist.txt file. The custom folder holds the custom fields list and the reporting folder holds the standard fields list. Custom log formats can refer to fields in either list by using the field id numbers. Once a custom format is created, it is available for selection in the Administration Interface. Here are the basic steps for creating a custom format:

Step 1: Copy the example
In the 'lib/custom/logformats' folder of the Urchin distribution, make a copy of the 'custom.lf' file. The new filename should not use spaces, and it needs the '.lf' extension.
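
The copy can also be scripted. The following is a minimal Python sketch and is not part of Urchin; the function name copy_format_template and the install path in the usage comment are hypothetical. It enforces only the two naming rules above (no spaces, '.lf' extension) and refuses to overwrite an existing file.

```python
import shutil
from pathlib import Path


def copy_format_template(logformats_dir, new_name):
    """Copy custom.lf to new_name inside the custom logformats folder.

    logformats_dir is assumed to be [urchin install dir]/lib/custom/logformats.
    """
    if " " in new_name:
        raise ValueError("filename should not use spaces")
    if not new_name.endswith(".lf"):
        raise ValueError("filename needs the '.lf' extension")

    folder = Path(logformats_dir)
    source = folder / "custom.lf"
    target = folder / new_name
    if target.exists():
        # Sketch choice: never clobber a format file that is already there.
        raise FileExistsError(f"{target} already exists")

    shutil.copyfile(source, target)
    return target


# Hypothetical usage; substitute your own install directory:
# copy_format_template("/usr/local/urchin/lib/custom/logformats", "mysite.lf")
```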

Step 2: Set the primary positions
Edit the file created in the above step. The file contains a lot of useful information about each variable. Start by setting the PrimaryPositions variable. This comma-separated list identifies each field (by id number) in the log format and its relative position. Use the fieldlist.txt files in the two directories mentioned above to determine which field ids to use. If you have custom date and time formats that are different from Apache and IIS formats, use field 16 for the date and field 17 for the time. If the date and time are together in one field, just use field 16.
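
To reason about which id belongs where, it can help to line up a sample log entry against the list. The Python sketch below is only an illustration and is not how Urchin itself works internally. It assumes that the Nth entry in the list names the field found in the Nth column of the log line. Ids 16 and 17 are the custom date and time fields described above; 101 and 102 are placeholders, not real Urchin field ids, so look up the real ones in fieldlist.txt. The function name map_fields is hypothetical.

```python
def map_fields(primary_positions, columns):
    """Pair each column of an already-split log line with the field id
    listed at the same position in the comma-separated id list."""
    field_ids = [int(part) for part in primary_positions.split(",")]
    if len(field_ids) != len(columns):
        raise ValueError(
            f"{len(field_ids)} field ids listed but {len(columns)} columns found"
        )
    return dict(zip(field_ids, columns))


# 16 = custom date, 17 = custom time; 101 and 102 are placeholder ids.
positions = "16,17,101,102"
columns = ["2004-03-01", "12:30:05", "192.0.2.10", "/index.html"]

mapping = map_fields(positions, columns)
print(mapping[16])  # 2004-03-01
print(mapping[17])  # 12:30:05
```

A mismatch between the number of ids and the number of columns in a sample line is usually the first sign that a field was missed or that the separator is wrong.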

Step 3: Check the fields separator
Check the FieldSeparator1 variable and set it to the separator between your fields. Use \s for a space and \t for a tab.
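
A quick way to confirm that you have picked the right separator is to split a sample line with it and count the resulting fields. The Python sketch below does this outside Urchin. The names SEPARATOR_TOKENS and split_line are hypothetical, and the sketch makes no attempt to handle quoted fields that contain the separator.

```python
# Map the two escape tokens named above to the characters they stand for.
SEPARATOR_TOKENS = {"\\s": " ", "\\t": "\t"}


def split_line(line, field_separator):
    """Split one log line on the configured separator.

    field_separator is the value as it would be written in the format
    file: the two-character token \\s or \\t, or a literal separator.
    """
    separator = SEPARATOR_TOKENS.get(field_separator, field_separator)
    return line.rstrip("\r\n").split(separator)


tab_fields = split_line("2004-03-01\t12:30:05\t/index.html\n", "\\t")
space_fields = split_line("2004-03-01 12:30:05 /index.html\n", "\\s")
print(len(tab_fields), len(space_fields))  # 3 3
```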

Step 4: Is the HTTP status field available?
If the HTTP status field is available, then leave the StatusRequired variable set to YES. This will separate valid and error hits appropriately. If there is no status in the log, set this to NO so that all hits are considered valid.
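
The effect of this setting can be pictured with the small Python sketch below. It is an illustration only: the rule that 4xx and 5xx codes count as errors is an assumption made for this example, not Urchin's documented rule, and the function name classify_hit is hypothetical.

```python
def classify_hit(status_required, status=None):
    """Return 'valid' or 'error' for a single hit.

    Assumption for this sketch: status codes of 400 and above are errors.
    """
    if not status_required:
        # StatusRequired set to NO: every hit is considered valid.
        return "valid"
    if status is None:
        # Sketch choice: flag the mismatch instead of guessing.
        raise ValueError("StatusRequired is YES but the hit has no status")
    return "error" if status >= 400 else "valid"


print(classify_hit(True, 200))   # valid
print(classify_hit(True, 404))   # error
print(classify_hit(False))       # valid
```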

Step 5: Are you using a custom Date/Time format?
If so, then edit the CustomDateFormat for field 16 and the CustomTimeFormat for field 17. The format is specified using the % variables defined in the Custom Date Format article later in this section of the documentation.
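
Urchin's own % variables are defined in the Custom Date Format article and are not reproduced here. As a familiar analogue only, the Python sketch below shows the general idea of %-style format strings using Python's strptime directives, which may differ from Urchin's. The function name parse_timestamp and the sample values are hypothetical.

```python
from datetime import datetime


def parse_timestamp(date_text, time_text, date_format, time_format):
    """Parse separate date and time fields using %-style format strings.

    The directives here are Python's, used only to illustrate the idea.
    """
    combined_text = f"{date_text} {time_text}"
    combined_format = f"{date_format} {time_format}"
    return datetime.strptime(combined_text, combined_format)


stamp = parse_timestamp("2004-03-01", "12:30:05", "%Y-%m-%d", "%H:%M:%S")
print(stamp.isoformat())  # 2004-03-01T12:30:05
```

Testing a format string against a handful of real log entries in this way can catch a wrong field order or a two-digit year before the log is processed.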

Step 6: Check the other variables
Most of the other variables can be left as they are for most custom log formats, but check the comments provided in the file for each variable to see whether it applies to your situation.

Step 7: Do you have custom calculated fields?
In addition to the format, you can specify custom calculated fields in the specification file. An example with comments is provided at the bottom of the file. A custom calculated field works the same way as the Advanced Filter in the filtering section, except that custom calculated fields are processed first. Please see the article on custom calculated fields for more information.

Save the file, and you are ready to assign it as the format for a log file in the Administration Interface. It will automatically appear as one of the format options in the pull-down menu for log formats.