Log Management

Overview

Log management is an important concern when running software such as Urchin. Busy sites build up large log files fairly quickly (in some cases several gigabytes in one month), so log management should be planned carefully. Establishing a standard log rotation practice is recommended; compressing old files and archiving them offline are common practices. Please see the article on Log Rotation Best Practices in this section for further information on establishing such a procedure.

Log management is needed only to control disk usage, not to avoid reprocessing data. Urchin does not need any sort of log rotation to avoid data duplication, because its log tracking capability ensures that previously read log data is not processed again. Since Urchin should never need to re-read a log file once it has been processed, you may, at your discretion, delete each log after a processing run. However, it is common to keep old logs for a specified amount of time for historical or auditing reasons.
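The general idea behind log tracking can be illustrated with a minimal sketch. Urchin's actual mechanism is not described in this article, so the code below is an assumption-based illustration, not Urchin's implementation: it remembers the byte offset reached in a log file (kept in a hypothetical JSON state file) and, on the next run, reads only the complete lines appended after that offset.

```python
import json
import os
from typing import List


def read_new_lines(log_path: str, state_path: str) -> List[str]:
    """Return only the complete lines appended to log_path since the last call.

    The byte offset reached on the previous run is stored in a small JSON
    state file (a hypothetical format chosen for this sketch).
    """
    offset = 0
    if os.path.exists(state_path):
        with open(state_path, "r", encoding="utf-8") as f:
            offset = json.load(f).get("offset", 0)

    # If the log is now smaller than the saved offset, assume it was
    # rotated or truncated and start again from the beginning.
    if os.path.getsize(log_path) < offset:
        offset = 0

    with open(log_path, "rb") as f:
        f.seek(offset)
        data = f.read()

    # Consume only complete lines; a partial trailing line waits for the next run.
    end = data.rfind(b"\n") + 1
    new_lines = data[:end].decode("utf-8", errors="replace").splitlines()

    # Remember how far we got so these lines are never returned again.
    with open(state_path, "w", encoding="utf-8") as f:
        json.dump({"offset": offset + end}, f)
    return new_lines
```

The size check used to detect rotation is deliberately simple: a rotated log that has already grown past the saved offset would go unnoticed, so a sturdier tracker would also compare file identity (for example, inode numbers).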

Managing Logs via Urchin

Each Log Source has a Log Destiny setting with the options Don't Touch, Archive/Compress, and Delete. Once all Profiles that use a Log Source have finished processing, Urchin uses the Log Destiny setting to determine what to do with the log file. The Log Destiny setting is found under the Advanced Settings tab for a given Log Source. If you want to keep your logs for some period of time, setting Log Destiny to Archive/Compress is recommended because it saves disk space. If you are comfortable with a log being removed once it has been processed, you can choose a Log Destiny of Delete. However, this means you will not be able to rerun Urchin against that log in the future unless you have a backup elsewhere.
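The disposition step described above can be modeled with a short sketch. This is a hypothetical model, not Urchin's code: the `profiles_pending` counter, the gzip format, and the `.gz` file name are assumptions made for illustration.

```python
import gzip
import os
import shutil
from enum import Enum
from typing import Optional


class LogDestiny(Enum):
    # The three options offered on a Log Source's Advanced Settings tab.
    DONT_TOUCH = "Don't Touch"
    ARCHIVE_COMPRESS = "Archive/Compress"
    DELETE = "Delete"


def apply_log_destiny(log_path: str, destiny: LogDestiny,
                      profiles_pending: int) -> Optional[str]:
    """Dispose of a processed log.

    Returns the path of the file that remains, or None if the log was deleted.
    profiles_pending is a hypothetical count of Profiles still using the log.
    """
    # Nothing happens until every Profile using this Log Source has finished.
    if profiles_pending > 0 or destiny is LogDestiny.DONT_TOUCH:
        return log_path
    if destiny is LogDestiny.DELETE:
        os.remove(log_path)
        return None
    # ARCHIVE_COMPRESS: write a gzip copy beside the original, then remove the original.
    archive_path = log_path + ".gz"
    with open(log_path, "rb") as src, gzip.open(archive_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(log_path)
    return archive_path
```

Checking the pending count first mirrors the rule that the setting is applied only after all Profiles sharing the Log Source are done, so one Profile cannot remove a log another still needs.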

Considerations

A few special situations should be noted:

  • Do not use the Archive/Compress or Delete options with a Log Source if you are processing live logs. A live log is one that is actively being written to by a webserver. Using these settings with a live log will cause data loss.
  • If Log Destiny for a remotely retrieved Log Source is set to Don't Touch, then that log will grow continually unless there is some process external to Urchin that is handling log management on the machine where the log is created. Since Urchin must transfer a copy of the remote logfile to the local system before processing, as the log file grows it will take Urchin longer and longer to transfer the file. This will have the side effect of lengthening your overall Urchin run time.
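One way to guard against the live-log case is sketched below. It relies on an assumed heuristic that is not part of Urchin: a log modified within the last few minutes (the hypothetical `quiet_seconds` parameter) is treated as live, and a requested Archive/Compress or Delete is downgraded to Don't Touch.

```python
import os
import time


def looks_live(log_path: str, quiet_seconds: float = 300.0) -> bool:
    """Heuristic: treat a log modified within the last quiet_seconds as live."""
    age = time.time() - os.path.getmtime(log_path)
    return age < quiet_seconds


def safe_destiny(log_path: str, requested: str,
                 quiet_seconds: float = 300.0) -> str:
    """Downgrade Archive/Compress or Delete to Don't Touch for a log that looks live."""
    if requested in ("Archive/Compress", "Delete") and looks_live(log_path, quiet_seconds):
        return "Don't Touch"
    return requested
```

A modification-time check can be fooled by a live log that happens to be idle, so the safer practice remains to apply Archive/Compress or Delete only to logs the webserver has already rotated away from.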