Perl Helper Script: archive_udata.pl

Location to Download Script

Overview

This Perl script is a working example of a procedure that can be used to perform pruning and compressing operations on Urchin 4 profile data. Significant disk space savings can be realized by implementing database compression and placing a limit on the reporting history for Urchin profiles.

The script implements the three methods of data reduction described in the document "Reducing Disk Storage for Urchin Profile Monthly Databases". It is strongly recommended that you read that document before deploying this script.

Important Notes:

  1. ZIP archiving operations can consume significant CPU and IO resources, especially when transitioning from one month to the next.
  2. It is very important that only one instance of the script be run at any given time. No internal locking is built into the script.
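
Because the script has no internal locking, one way to guard against overlapping runs is to call it through a small wrapper. The following is a minimal sketch, not part of the script itself: the function name run_exclusive, the lock path, and the exit code 75 are all assumptions chosen for illustration.

```shell
# Hypothetical wrapper (not part of archive_udata.pl): run a command only
# if no other run currently holds the lock. mkdir is atomic, so at most
# one caller can create the lock directory.
run_exclusive() {
    lockdir=$1
    shift
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "another run appears to be active ($lockdir exists); skipping" >&2
        return 75
    fi
    status=0
    "$@" || status=$?      # run the wrapped command, remember its exit code
    rmdir "$lockdir"       # release the lock
    return "$status"
}

# Example call (the lock path and script location are assumptions):
#   run_exclusive /var/tmp/archive_udata.lock \
#       /usr/local/urchin4/util/archive_udata.pl --verbose
```

If a run is killed before it releases the lock, the lock directory remains and must be removed by hand before the next run can start.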

Running the Script

In most environments, this script is run daily from an automated scheduler (e.g., cron or the Windows Task Scheduler).

Without any command line arguments, the script will attempt to operate on a default Urchin 4 distribution installed in the /usr/local/urchin4 directory. The default thresholds are:

  • Writable database components are never removed
  • Compress monthly databases after 3 months
  • Remove monthly databases after 13 months

It is also possible to run the script directly on a single Urchin profile's data directory with the --profiledir option. This is very useful in Urchin environments that have implemented the optional Affiliations segmentation; in such instances, the profile data may not be centralized. In this type of configuration, it is necessary to invoke the script once for each profile directory.
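
Where profile data is spread over several directories, the per-profile invocations can be driven from a loop. The sketch below is illustrative only: the function name archive_each_profile is hypothetical, and the command to run is passed in as the first argument rather than assumed. The directories are processed strictly one after another, in keeping with the note that only one instance of the script should run at a time.

```shell
# Hypothetical helper: run the archive command once per profile data
# directory, sequentially. $1 is the command to run (for example the
# path to archive_udata.pl); the remaining arguments are directories.
archive_each_profile() {
    script=$1
    shift
    failures=0
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            echo "skipping $dir: not a directory" >&2
            continue
        fi
        # --profiledir is the documented option for a single profile
        "$script" --profiledir "$dir" || failures=$((failures + 1))
    done
    # Succeed only if every invocation succeeded
    [ "$failures" -eq 0 ]
}
```

Missing directories are skipped with a warning rather than treated as failures; whether that is appropriate depends on the environment.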

Description of Command Line Options

    Usage: archive_udata.pl
           [ --urchinreportdir /path/to/dir || --profiledir /path/to/dir ]
           [ --urchinpath /path/to/urchin ]
           [ --noreadonly || --readonly N ]
           [ --nozip || --zipafter N [--updatezip] ]
           [ --noprune || --keephistory N ]
           [ --verbose ]
           [ --help ]
    
    Where:
      --urchinreportdir  specifies the path to the Urchin reporting directory
                         to clean. Mutually exclusive with the --profiledir
                         option.
                         Default: /usr/local/urchin4/data/reports
      --profiledir       specifies the path to a single Urchin profile data
                         directory to clean. Mutually exclusive with the
                         --urchinreportdir option. No default.
      --urchinpath       specifies the path of the Urchin 4 distribution
                         Default: /usr/local/urchin4
      --readonly N       if specified, Urchin monthly databases "N" months
                         old or older will be converted to readonly status
                         by removing the specific databases used only by the
                         Urchin log processing engine. Mutually exclusive
                         with the --noreadonly option.
                         Default: 0 (never)
      --noreadonly       do not make Urchin databases readonly. Mutually
                         exclusive with the --readonly option.
                         Default: on
      --zipafter N       if specified, Urchin monthly databases "N" months
                         old or older will be compressed and archived in
                         a per-month ZIP archive. Mutually exclusive with
                         the --nozip option.
                         Default: 3
      --updatezip        Update a ZIP archive with newer Urchin database
                         files if they exist. Use with caution!
                         If you need to refresh a ZIP archive, it is better
                         to manually remove the ZIP archive altogether,
                         process the webserver logs, and allow this
                         script to create a new ZIP archive the next
                         time it runs. This prevents ZIP archives from
                         accidentally getting updated with garbage data.
                         Mutually exclusive with the --nozip option.
                         Default: off
      --nozip            do not create ZIP archives of the Urchin databases.
                         Mutually exclusive with the --zipafter option.
                         Default: off
      --keephistory N    if specified, Urchin monthly databases "N" months
                         old or older will be deleted, effectively setting
                         the reporting history to "N" months. This option
                         will cause any matching monthly Urchin databases
                         or ZIP archives to be deleted. Mutually exclusive
                         with the --noprune option.
                         Default: 13
      --noprune          do not delete old Urchin monthly databases.
                         Mutually exclusive with the --keephistory option.
                         Default: off
      --verbose          prints details about cleaning activity
      --help             shows this message
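
To illustrate how the --zipafter and --keephistory thresholds interact, the sketch below classifies a monthly database by its age. It is not taken from the script: the function names, the YYYYMM month format, and the calendar-month age calculation are assumptions, and "N months old or older" is read here as age >= N. The --readonly, --nozip and --noprune options are not modelled.

```shell
# Age in whole calendar months between a database month and the current
# month. Both arguments are YYYYMM strings (an assumed format).
months_old() {
    db=$1
    now=$2
    db_y=${db%??}          # first four characters: year
    db_m=${db#????}        # last two characters: month
    now_y=${now%??}
    now_m=${now#????}
    db_m=${db_m#0}         # strip a leading zero so 08/09 are not read as octal
    now_m=${now_m#0}
    echo $(( (now_y - db_y) * 12 + (now_m - db_m) ))
}

# Decide what would happen to a monthly database of a given age.
# The thresholds default to the documented values 3 and 13.
classify_month() {
    age=$1
    zipafter=${2:-3}
    keephistory=${3:-13}
    if [ "$age" -ge "$keephistory" ]; then
        echo prune         # deleted, whether zipped or not
    elif [ "$age" -ge "$zipafter" ]; then
        echo zip           # compressed into a per-month ZIP archive
    else
        echo keep          # left untouched
    fi
}
```

Under these assumptions and the default thresholds, a database for January 2004 examined in April 2004 is 3 months old and would be zipped, while one for March 2003 is 13 months old and would be pruned.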