In most operating environments, system services and applications such as webservers generate logfiles that record actions and events related to those services. It is also standard practice for the operating system and/or the applications to perform regular maintenance on these logfiles to keep their size in check. This prevents the logfiles from growing without bound and eventually exhausting the system's disk space.
A very common approach to managing logs is to have a regularly scheduled log rotation task that renames the existing logs with a timestamp and then restarts the service or application with a new, zero-length logfile. It is also standard practice for the log rotation task to compress the old logfiles, and to delete logfiles once a certain age or rotation cycle threshold has been reached.
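The compression step mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name compress_log and the .gz naming convention are hypothetical choices, not part of any Urchin tool or operating system utility.

```python
import gzip
import shutil
from pathlib import Path


def compress_log(path: Path) -> Path:
    """Gzip an already-rotated logfile and remove the uncompressed original.

    Hypothetical helper: the ".gz" suffix appended to the rotated name is an
    assumed convention, not something mandated by the webserver or by Urchin.
    """
    target = path.with_name(path.name + ".gz")
    # Stream the file in chunks so large logs are never held in memory at once.
    with path.open("rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    # Only remove the original after the compressed copy has been fully written.
    path.unlink()
    return target
```

Such a helper should only be applied to logs that have already been rotated, never to the live log that the webserver is still writing.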
In the specific case of webserver logs, the rotation is usually handled on a daily basis to ensure that the logs remain at a manageable size. In addition, a daily rotation schedule is generally a good granularity to facilitate post-processing of webserver logs with an analysis tool like Urchin. Some webservers such as Microsoft's IIS have built-in log rotation functionality, which, when enabled, will rotate logs on a daily basis by default. Other webservers such as Apache have no explicit log rotation handler, but provide tools for easily restarting the webserver (without loss of web service) to accommodate the log rotation operation (e.g. apachectl restart).
Log Rotation in Previous Versions of Urchin
Unlike Urchin 4, previous versions of Urchin have no built-in log tracking mechanism to determine which logs have already been processed, so those earlier versions depend heavily on a reliable log rotation scheme to ensure that each log is processed only once. As such, pre-Urchin 4 versions have the option of providing simple log rotation functionality and the ability to restart the webserver as part of the overall processing duties. If this Urchin log rotation mechanism is not utilized, the responsibility for reliable log rotation must be handled completely by an external log management mechanism. This has traditionally been the function of a larger overall system log management scheme provided as part of the operating system (e.g. the open-source "logrotate" found in many Linux distributions).
Log Rotation Practices with Urchin 4
With the advent of Urchin 4, the need for log rotation to avoid duplicate processing of logs has been eliminated thanks to Urchin 4's new Log Tracking technology. This allows Urchin 4 much greater flexibility in processing logs, such as the ability to process "live" logs that are still being written by the webserver, or to process logs that are rotated on a manual/irregular basis.
Important Note: Unlike previous versions of Urchin, Urchin 4 does not provide hooks for invoking a log rotation procedure or restarting a webserver after log rotation tasks have been performed, although certain post-log-processing actions are possible as described below.
While Urchin 4 operation does not require that webserver logs be rotated regularly or at all, it is recommended that a standard log rotation scheme be implemented to ensure smooth operation and to keep the Log Tracking utility from having to do a lot of unnecessary processing. It is much more efficient from both a system and application standpoint to manage several smaller logs than one very large log, as file operations tend to slow considerably as files get larger. Smaller files are also much easier to back up and restore in the event of a disk failure or other system failure.
Log rotation mechanisms needn't be overly complex -- in most cases, a simple shell script or Perl script run daily from cron on UNIX-type systems is all that is necessary. The script merely needs to rotate the existing webserver log and timestamp it (using YYYYMMDD format is recommended), and restart the webserver. Additional logic can be added to prune old logfiles to keep disk space usage in check. A sample log rotation script written in Perl can be downloaded from http://www.urchin.com/support in the Helper Scripts area. This script rotates one or more logs and timestamps them appropriately, then removes logs that are older than a certain number of days (configurable). Note: If you are running IIS on a Windows system, the log rotation functionality is included as part of the IIS management and no external script is needed.
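As an illustration of how little logic such a script needs, here is a minimal Python sketch of the rotate-and-prune steps described above. It is not the WebLogRotate helper script; the function names, the accesslog filename, and the keep_days parameter are all assumptions made for this example. Restarting the webserver is left out because that step is site-specific.

```python
from datetime import date, timedelta
from pathlib import Path


def rotate_log(log: Path, stamp: date) -> Path:
    """Rename log to log.YYYYMMDD and leave a new zero-length file in its place.

    Hypothetical helper. The webserver still has to be restarted afterwards so
    that it starts writing to the new file; that step is not shown here.
    """
    rotated = log.with_name(f"{log.name}.{stamp:%Y%m%d}")
    log.rename(rotated)
    log.touch()
    return rotated


def prune_logs(log: Path, today: date, keep_days: int) -> list[Path]:
    """Delete rotated copies of log whose YYYYMMDD stamp is older than keep_days."""
    cutoff = today - timedelta(days=keep_days)
    prefix = log.name + "."
    removed = []
    # sorted() materializes the listing before any file is deleted.
    for candidate in sorted(log.parent.iterdir()):
        if not candidate.name.startswith(prefix):
            continue
        stamp = candidate.name[len(prefix):len(prefix) + 8]
        if len(stamp) != 8 or not stamp.isdigit():
            continue  # not a timestamped rotation of this log
        try:
            stamp_date = date(int(stamp[:4]), int(stamp[4:6]), int(stamp[6:8]))
        except ValueError:
            continue  # eight digits, but not a valid calendar date
        if stamp_date < cutoff:
            candidate.unlink()
            removed.append(candidate)
    return removed
```

If the script runs from cron just after midnight, passing yesterday's date as the stamp keeps the filename aligned with the day the log actually covers. The same date is then passed to prune_logs together with the number of days of logs to retain.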
Configuring Urchin 4 for Use with Log Rotation
Once you have your log rotation scheme in place, it is a simple matter to configure Urchin to process your rotated logs. When configuring a Log Source, you can either set the Log File Path to a wildcard that matches the time-stamped log filename pattern (e.g. accesslog.* for Apache logs or ex*.log for IIS logs), or you can use Urchin's built-in timestamp pattern matching (e.g. accesslog.YYYYMMDD for Apache, exYYMMDD.log for IIS). When Urchin encounters this pattern, it will substitute yesterday's date for the YYYYMMDD pattern and process the log with the resulting filename (e.g. accesslog.20020617). For further information on the date matching pattern, please see the Knowledgebase article on Usage of YYMMDD syntax in Log Sources.
The wildcard specification has the advantage of allowing you to place a number of unprocessed logs in a single directory and have Urchin process them the next time it runs. This is especially convenient for handling situations where the expected logfiles are not in place when Urchin runs, e.g. due to a remote webserver being down or loss of network connectivity. The disadvantage is that Urchin must open the directory and examine each log file to determine whether it has already been processed, and this can induce significant overhead when many log files reside in the directory. If you deem your log rotation scheme to be reliable, the YYYYMMDD pattern matching scheme is the more efficient method.
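To make the difference between the two Log File Path styles concrete, the following Python sketch mimics the behaviour described above. It is not Urchin's own code: the function names are hypothetical, and the handling of the two-digit-year YYMMDD form is an assumption based on the IIS example in the text.

```python
import fnmatch
from datetime import date, timedelta


def expand_date_pattern(pattern: str, run_date: date) -> str:
    """Substitute the day before run_date for YYYYMMDD (or YYMMDD) in pattern."""
    yesterday = run_date - timedelta(days=1)
    # Check the longer token first, since "YYMMDD" is a substring of "YYYYMMDD".
    if "YYYYMMDD" in pattern:
        return pattern.replace("YYYYMMDD", yesterday.strftime("%Y%m%d"))
    return pattern.replace("YYMMDD", yesterday.strftime("%y%m%d"))


def match_wildcard(pattern: str, filenames: list[str]) -> list[str]:
    """Return every filename matching a shell-style wildcard, in sorted order."""
    return sorted(fnmatch.filter(filenames, pattern))
```

For a run on 18 June 2002, expand_date_pattern("accesslog.YYYYMMDD", date(2002, 6, 18)) yields the single name accesslog.20020617, whereas match_wildcard("accesslog.*", names) returns every rotated log in the listing, each of which would then have to be checked against the log tracking data.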
You may also wish to have Urchin 4 delete or archive/compress the log once it has been processed. Urchin 4.100 and later offer different Log Destiny options that can be set in the Advanced Settings of a Log Source. For more information on these Log Destiny settings, please see the Log Manager: Advanced Settings reference in the Urchin 4 Administrative Manual.
Important! Log Destiny options should not be used with live logs that have not been rotated!
Configuring Log Rotation on UNIX-type systems
Due to the large variation in operating system functionality and webserver configurations, and the high likelihood that log rotation procedures are site-specific, there is no cookbook method for establishing webserver log rotation on UNIX-type systems. However, a sample log rotation script called WebLogRotate is available from the Urchin web site in the Helper Scripts area. This script is written in Perl to make it as portable as possible, and is typically invoked from cron on a daily basis.
Configuring Log Rotation for Windows IIS Webservers
As mentioned above, the management functions of IIS allow for automatic rotation of webserver logs, though this functionality is not enabled by default. Please follow the steps below to configure an IIS webserver for proper log rotation. It is recommended that the logs be rotated daily, and that the rotation be set to happen in relation to local time. By default, IIS will rotate logs at midnight GMT rather than local time.
Under Windows 2000, you should ensure that the IIS webserver is configured properly to do log rotation. This is accomplished using the Computer Management function of Windows 2000. Windows NT and Windows XP utilize a similar procedure. To open Computer Management and establish log rotation, perform the following actions:
This will ensure that IIS rotates the webserver logs on a daily basis just after midnight.
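To see why rotating in relation to local time matters, the short Python sketch below converts the default midnight GMT rollover into local time for a given offset. The function name and the fixed-offset approach are assumptions made for illustration; daylight saving time is deliberately ignored.

```python
from datetime import datetime, timedelta, timezone


def gmt_midnight_as_local(year: int, month: int, day: int,
                          utc_offset_hours: float) -> datetime:
    """Return the local wall-clock time at which midnight GMT occurs.

    Hypothetical helper: utc_offset_hours is a fixed offset from GMT
    (e.g. -8 for US Pacific standard time), with no daylight saving handling.
    """
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    midnight_gmt = datetime(year, month, day, tzinfo=timezone.utc)
    return midnight_gmt.astimezone(local_zone)
```

For a server eight hours behind GMT, gmt_midnight_as_local(2002, 6, 18, -8) falls at 16:00 on 17 June local time, so with the default setting each daily log would span two local calendar days.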