Running Urchin 5 and Urchin 3 simultaneously on a Sun Cobalt system

Introduction

The instructions in this article assume that you currently have Urchin 3 processing your website logs daily, that you've already installed Urchin 5, and that you have run the u3importer utility to import your Urchin 3 configuration and databases into Urchin 5. However, if you understand enough about your Cobalt system's configuration, you can also use these details to set up simultaneous Urchin 3 and Urchin 5 processing manually, whether or not you have already imported your Urchin 3 information.

On Sun Cobalt systems Urchin 5 can be installed and operated alongside an existing Urchin 3 installation. The two packages will be viewed as separate applications by the Cobalt management interface. However, since you will be processing the same set of website logs using both versions, some coordination must be done to ensure that log file handling by one Urchin distribution does not interfere with processing by the other Urchin distribution.

Procedure

Urchin 3 is scheduled to run daily after 4am. The execution time cannot be determined exactly because the Urchin 3 processing must wait for the Cobalt split_logs and log rotation tasks to finish running first, and that timing will vary depending on the amount of web traffic for the system. What can be predicted is that at some point each website's home directory in /home/sites should contain a newly created web.log file. After processing by Urchin 3, these web.log files will be renamed and compressed so that in their final form they will each have the name web.log.YYYYMMDD.gz, where the string YYYYMMDD will actually be a numeric timestamp that includes the year, month, and day. Since these archived logs are created by Urchin 3 with a predictable naming scheme, you can configure Urchin 5 to run after Urchin 3 and look for these archived logs. Please be sure to read the Considerations section of this article for an explanation of the differences in how Urchin 3 and Urchin 5 handle the YYYYMMDD file naming notation.
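As an illustration of the naming scheme described above, the following Python sketch recognizes an archived log name and extracts the date it encodes. The function name archive_date is hypothetical and is not part of Urchin; the only thing taken from this article is the web.log.YYYYMMDD.gz pattern.

```python
import re
from datetime import date, datetime

# Matches names such as web.log.20021120.gz and captures the eight-digit date.
ARCHIVE_RE = re.compile(r"^web\.log\.(\d{8})\.gz$")


def archive_date(filename):
    """Return the date encoded in an archived log name, or None if the name does not fit."""
    match = ARCHIVE_RE.match(filename)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), "%Y%m%d").date()
    except ValueError:
        # Eight digits that do not form a real calendar date.
        return None


print(archive_date("web.log.20021120.gz"))  # 2002-11-20
print(archive_date("web.log"))              # None (not yet archived)
```

A check like this could be used to confirm that Urchin 3 has produced the expected archive before Urchin 5 is scheduled to read it.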

To demonstrate the procedure, suppose you have a website named test.urchin.com on your Cobalt system. Log in via the Urchin 5 administration interface and perform these steps:

  1. Select Configuration, then choose Profiles
  2. Click the Edit button next to test.urchin.com, then select the Log Sources tab
  3. In the Log Sources to Process table, click on the Log Source Name, which will take you to the Log Settings screen for that Log Source
  4. In the Log File Path box, you should see something like /home/sites/site1/logs/web.log. The number 1 in the site1 part of the path will be replaced with whatever number the Cobalt system has assigned for the test.urchin.com website.
  5. Edit the Log File Path and change only the web.log portion so that it reads web.log.YYYYMMDD.gz. The final path will look like /home/sites/site1/logs/web.log.YYYYMMDD.gz
  6. Click the Update button in the lower portion of the screen
  7. Go back to Configuration->Profiles and select the Run button for test.urchin.com.
  8. In the Run/Schedule screen select the Daily button, then set the start time to be well after when your Urchin 3 processing normally finishes. You can determine when that is by examining the Activity screen in your Urchin 3 administration interface.
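The path change made in steps 4 and 5 amounts to replacing only the final web.log component of the Log File Path. A minimal Python sketch of that substitution follows; the helper name archived_log_path is hypothetical and is shown only to make the rule explicit, not as part of any Urchin tool.

```python
import posixpath


def archived_log_path(log_file_path):
    """Swap a trailing web.log for web.log.YYYYMMDD.gz; leave any other path unchanged."""
    directory, name = posixpath.split(log_file_path)
    if name != "web.log":
        # Already converted, or not a Cobalt web.log path at all.
        return log_file_path
    return posixpath.join(directory, "web.log.YYYYMMDD.gz")


print(archived_log_path("/home/sites/site1/logs/web.log"))
# /home/sites/site1/logs/web.log.YYYYMMDD.gz
```

Note that YYYYMMDD is kept as a literal string here, since it is the specifier that Urchin 5 itself expands at run time.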

If you have a large number of sites, a faster way to make the necessary modifications, without editing each Log Source in the graphical interface, is to use the command line utilities. Use the uconf-export tool to save the Urchin 5 configuration, edit a copy of the saved configuration, update the Log File Path references, and then re-import the configuration with uconf-import. You can also use the uconf-schedule utility to schedule Urchin 5 processing for all the sites simultaneously.
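The editing step in the middle of that workflow could be scripted. The Python sketch below assumes, without confirmation from this article, that the exported configuration is a plain text file in which each Log File Path appears literally as /home/sites/<site>/logs/web.log. The function name rewrite_log_paths and the sample lines are hypothetical, and the actual layout of a uconf-export file may differ, so inspect a real export and work on a copy.

```python
import re

# A path of the form /home/sites/<site>/logs/web.log that is not already
# followed by a suffix such as .YYYYMMDD.gz.
WEB_LOG_RE = re.compile(r"(/home/sites/[^/\s]+/logs/web\.log)(?![\w.])")


def rewrite_log_paths(config_text):
    """Return (new_text, number_of_paths_changed) for the given configuration text."""
    return WEB_LOG_RE.subn(r"\1.YYYYMMDD.gz", config_text)


# Hypothetical sample lines; the real export format is an assumption here.
sample = (
    "/home/sites/site1/logs/web.log\n"
    "/home/sites/site2/logs/web.log.YYYYMMDD.gz\n"
)
new_text, changed = rewrite_log_paths(sample)
print(changed)  # 1
```

Returning the number of changed paths makes it easy to compare against the number of sites you expected to update before re-importing.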

Considerations

Since Urchin 3 runs at around 4am, the timestamp on the archived log file it creates will be the current day's date. However, the majority of the data in the log file will be for the previous day. By default, when Urchin 5 is given a Log File Path that includes the YYYYMMDD specifier, it looks for a file with a timestamp for the previous day (the assumption is that your web logs should have a timestamp that indicates the day the data covers, not the day the file was created). This means, for example, that on 11/20/2002 Urchin 3 will create a web.log.20021120.gz file, but Urchin 5 will look for web.log.20021119.gz. Since Urchin 3 archived logs are typically kept for 10 consecutive days, the previous day's archive should always be available. The impact of the way Urchin 5 uses YYYYMMDD is that your Urchin 5 reports will lag one day behind the data that appears in your Urchin 3 reports.
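The one-day offset can be worked through with a short Python sketch. The two function names are hypothetical; they simply restate the behavior described above, in which Urchin 3 stamps the archive with the current day and Urchin 5 looks for the previous day's stamp.

```python
from datetime import date, timedelta


def urchin3_archive_name(run_day):
    """Name of the archive Urchin 3 creates when it runs on run_day."""
    return "web.log.%s.gz" % run_day.strftime("%Y%m%d")


def urchin5_expected_name(run_day):
    """Name Urchin 5 looks for by default when it runs on run_day (previous day's stamp)."""
    previous_day = run_day - timedelta(days=1)
    return "web.log.%s.gz" % previous_day.strftime("%Y%m%d")


run_day = date(2002, 11, 20)
print(urchin3_archive_name(run_day))   # web.log.20021120.gz
print(urchin5_expected_name(run_day))  # web.log.20021119.gz
```

The file Urchin 5 reads on a given day is therefore the one Urchin 3 wrote the day before, which is the source of the one-day lag in the Urchin 5 reports.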
