Working with Log Sources

Overview

You will generally add a Log Source in the course of creating a Profile. A Log Source is Urchin's way of identifying the characteristics of an access log (sometimes called a transfer log) for one of your websites. Access logs contain all the hits, or requests for web documents, that are made to your website. Some of the log file characteristics that are associated with a Log Source are the path to the log file, the format of the log file (e.g. W3C or NCSA), whether the log is local or on a remote system, and whether a filter should be applied to the log file during processing.

An important concept to understand is that Log Sources exist independently of Profiles. Every Profile must have at least one Log Source associated with it to obtain reporting. However, several Profiles could conceivably use the same Log Source. For example, you may want to create multiple Profiles using the same Log Source, but give each Profile a different filter to produce varying report results. So there is not necessarily a 1:1 ratio between Log Sources and Profiles.
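
The relationship between Log Sources and Profiles can be pictured with a small data model. The sketch below is illustrative only: the class names, field names, filter name, and log path are assumptions made for this example, not Urchin's internal structures or configuration syntax. It shows two Profiles sharing one Log Source while applying different filters.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class LogSource:
    """Hypothetical model of the Log Source characteristics listed above."""
    name: str
    path: str                          # path to the access log
    log_format: str = "auto"           # e.g. "auto", "ncsa", "w3c"
    remote: bool = False               # local file or on a remote system
    filter_name: Optional[str] = None  # optional filter applied during processing


@dataclass
class Profile:
    """Hypothetical model of a Profile that references Log Sources."""
    name: str
    log_sources: List[LogSource] = field(default_factory=list)
    filters: List[str] = field(default_factory=list)

    def can_report(self) -> bool:
        # A Profile needs at least one Log Source to obtain reporting.
        return len(self.log_sources) >= 1


# One Log Source (the path is a made-up example) shared by two Profiles.
shared = LogSource(name="main-site", path="/var/log/httpd/access_log",
                   log_format="ncsa")
all_traffic = Profile("All traffic", [shared])
external_only = Profile("External visitors only", [shared],
                        filters=["exclude-internal"])
```

Because both Profiles hold a reference to the same Log Source object, a change to the log path or format would be made in one place only.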

Configuring Log Sources

To add a Log Source to the system, log in to the Urchin administrative system as the administrator and click the Configuration button on the left. Next, click the Log Manager button. To create a new Log Source, click the Add button at the top right of the screen. You will be taken to the Add Log Source Wizard, a simple series of steps designed to help you set up the Log Source quickly and easily. Each screen in the Wizard includes help information that explains the configuration settings displayed on that screen.

In the Log Settings screen you will note that you have to choose a Log Format. This setting tells Urchin how the data in your log file is arranged. It is important that you select the correct format for your log or Urchin will not be able to produce meaningful report data. Urchin understands a default set of log formats that you can choose from via a dropdown menu. They are:

  • Auto: Urchin uses this format to automatically detect NCSA, W3C, Netscape, ELF, and ELF2 log formats. Instead of explicitly selecting one of these, you may choose Auto and Urchin will correctly deduce how to read the data if your log format is in this list.
  • NCSA: Apache modified Extended/Combined format (see Logging - Apache and IIS for a description of this format)
  • W3C: Microsoft IIS servers typically use this format, although other webservers can also be configured to produce W3C logs.
  • Netscape: Netscape and iPlanet servers use this format by default.
  • ELF/ELF2: E-Commerce Log Format; see the specification in the E-commerce Module section for details.
  • Google: If you have licensed the Campaign Tracking Module, use this format for logs containing Google cost-per-click spending data. Note that the Google log format cannot be auto-detected.
  • Overture: If you have licensed the Campaign Tracking Module, use this format for logs containing Overture cost-per-click spending data. Note that the Overture log format cannot be auto-detected.
  • Custom: Although not initially listed in the dropdown menu, you can create your own custom log formats, which will automatically appear in the dropdown menu when properly configured. Please refer to the "Custom Log Formats" article in the Advanced Topics -> Customization section of the Documentation Center.
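
If you are unsure which format your webserver writes, you can inspect the first few lines of the log yourself. The following is a minimal sketch of that check, not Urchin's own detection logic: the function name and the regular expression are assumptions made for illustration. It relies on two well-known traits: W3C extended logs carry a "#Fields:" directive in their header, and NCSA combined entries follow the pattern host, identity, user, bracketed date, quoted request, status, bytes, quoted referrer, quoted user agent.

```python
import re

# Rough pattern for an NCSA combined entry (illustrative, not exhaustive):
# host ident user [date] "request" status bytes "referrer" "user-agent"
NCSA_COMBINED = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} (?:\d+|-) "[^"]*" "[^"]*"'
)


def guess_format(lines):
    """Return "w3c", "ncsa", or "unknown" for an iterable of log lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("#Fields:"):
            return "w3c"  # W3C extended logs declare their fields in a directive
        if line.startswith("#"):
            continue  # other directives such as #Version or #Date
        if NCSA_COMBINED.match(line):
            return "ncsa"
        return "unknown"  # first data line matched neither pattern
    return "unknown"
```

A log that comes back as "unknown" here is a candidate for either reconfiguring the webserver or defining a custom log format.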

If your webserver does not currently produce logs in one of the recognized default formats, you can either reconfigure the webserver to log in one of these formats or create a custom log format that matches how the webserver currently logs. If you choose to reconfigure your webserver logging, W3C or NCSA style logging is recommended.

Load Balancing and Parallel Log Processing

If you have purchased a Load Balancing License, the Log Source Wizard provides a Parallel Log Processing option. When Parallel Log Processing is enabled, Urchin opens all of the log files at once and reads them in a rotating fashion, one section at a time, with each section corresponding to 15 minutes of log activity. Enabling Parallel Log Processing significantly increases performance on load-balanced sites.
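
The rotating read described above can be sketched as follows. This is a simplified illustration under stated assumptions, not Urchin's implementation: the function name is hypothetical, each log is represented as an already-parsed sequence of (timestamp, line) pairs in chronological order, and each 15-minute section starts at the earliest unread timestamp across all logs.

```python
from datetime import datetime, timedelta
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[datetime, str]
SECTION = timedelta(minutes=15)  # one section of log activity


def read_in_rotation(sources: List[Iterable[Record]]) -> Iterator[Record]:
    """Yield records from several logs, one 15-minute section per log in turn."""
    iters = [iter(s) for s in sources]
    # pending[i] is the next unread record of source i, or None when exhausted.
    pending = [next(it, None) for it in iters]
    while any(p is not None for p in pending):
        # The section starts at the earliest unread timestamp across all logs.
        start = min(p[0] for p in pending if p is not None)
        end = start + SECTION
        # Visit each log in rotation and drain its part of the section.
        for i, it in enumerate(iters):
            while pending[i] is not None and pending[i][0] < end:
                yield pending[i]
                pending[i] = next(it, None)


# Two logs from load-balanced servers (timestamps are made-up examples).
t0 = datetime(2004, 1, 1, 0, 0)
server_a = [(t0, "a1"), (t0 + timedelta(minutes=20), "a2")]
server_b = [(t0 + timedelta(minutes=5), "b1"), (t0 + timedelta(minutes=16), "b2")]
order = [line for _, line in read_in_rotation([server_a, server_b])]
# order is ["a1", "b1", "a2", "b2"]
```

Reading in sections keeps hits from different servers close to chronological order without first merging the files into one large log.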