Perl Helper Script: archive_udata.pl
Location to Download Script
This Perl script is a working example of a procedure that can be used to perform pruning and compressing operations on Urchin 4 profile data. Significant disk space savings can be realized by implementing database compression and placing a limit on the reporting history for Urchin profiles.
The script implements the three methods of data reduction described in the Reducing Disk Storage for Urchin Profile Monthly Databases document. It is strongly recommended that this document be perused before implementing this script.
- ZIP archiving operations can consume significant CPU and IO resources, especially when transitioning from one month to the next.
- It is very important that only one instance of the script be run at any given time. No internal locking is built into the script.
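Because the script has no internal locking, one way to enforce the single-instance rule is to call it through a small wrapper. The following is a minimal sketch, not part of archive_udata.pl: the run_locked function and the LOCKDIR path are hypothetical names chosen for illustration, and the sketch relies only on mkdir being atomic.

```shell
#!/bin/sh
# Hypothetical wrapper: serialize runs with a lock directory.
# LOCKDIR and run_locked are illustrative names, not part of archive_udata.pl.
LOCKDIR="${LOCKDIR:-/tmp/archive_udata.lock}"

run_locked() {
    # mkdir is atomic: it succeeds for exactly one caller at a time.
    if ! mkdir "$LOCKDIR" 2>/dev/null; then
        echo "another instance appears to be running; exiting" >&2
        return 1
    fi
    # Run the given command, remember its exit status, then release the lock.
    status=0
    "$@" || status=$?
    rmdir "$LOCKDIR"
    return "$status"
}
```

A scheduler entry could then call something like run_locked perl /path/to/archive_udata.pl --verbose. Note that this sketch does not clean up after a crash: if the wrapper is killed mid-run, the lock directory has to be removed by hand.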
Running the Script
In most environments, this script is run daily from an automated scheduler (e.g., cron or the Windows Task Scheduler).
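Without any command line arguments, the script will attempt to operate on a default Urchin 4 distribution installed in the /usr/local/urchin4 directory. The default thresholds are:
- Read-only conversion: never (--readonly defaults to 0)
- ZIP archiving: monthly databases 3 months old or older (--zipafter 3)
- Pruning: monthly databases 13 months old or older are deleted (--keephistory 13)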
It is also possible to run the script directly on a single Urchin profile's data directory with the --profiledir option. This is very useful in Urchin environments that have implemented the optional Affiliations segmentation; in such instances, the profile data may not be centralized. In this type of configuration, it will be necessary to invoke a separate instance of this script for each profile directory.
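For the per-profile case, the separate invocations can be driven by a simple loop. The sketch below is an assumption about how one might do this, not something shipped with the script: archive_each_profile is a hypothetical helper, and the PERL variable is an illustrative override for the interpreter. The directories are processed one after another, which also respects the one-instance-at-a-time rule.

```shell
# archive_each_profile SCRIPT DIR...
# Hypothetical helper: run the archive script once per profile directory.
archive_each_profile() {
    script="$1"
    shift
    failures=0
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            echo "skipping missing profile directory: $dir" >&2
            failures=$((failures + 1))
            continue
        fi
        # Sequential on purpose: each run finishes before the next starts.
        "${PERL:-perl}" "$script" --profiledir "$dir" || failures=$((failures + 1))
    done
    # Succeed only if every directory was processed without error.
    [ "$failures" -eq 0 ]
}
```

It might be called as archive_each_profile /path/to/archive_udata.pl /path/to/profileA /path/to/profileB, where the profile paths are placeholders for the actual profile data directories.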
Description of Command Line Options
Usage: archive_udata.pl [ --urchinreportdir /path/to/dir || --profiledir /path/to/dir ]
                        [ --urchinpath /path/to/urchin ]
                        [ --noreadonly || --readonly N ]
                        [ --nozip || --zipafter N [--updatezip] ]
                        [ --noprune || --keephistory N ]
                        [ --verbose ] [ --help ]

Where:

  --urchinreportdir   specifies the path to the Urchin reporting directory
                      to clean. Mutually exclusive with the --profiledir
                      option.
                      Default: /usr/local/urchin4/data/reports

  --profiledir        specifies the path to a single Urchin profile data
                      directory to clean. Mutually exclusive with the
                      --urchinreportdir option.
                      No default.

  --urchinpath        specifies the path of the Urchin 4 distribution.
                      Default: /usr/local/urchin4

  --readonly N        if specified, Urchin monthly databases "N" months old
                      or older will be converted to readonly status by
                      removing the specific databases used only by the
                      Urchin log processing engine. Mutually exclusive with
                      the --noreadonly option.
                      Default: 0 (never)

  --noreadonly        do not make Urchin databases readonly. Mutually
                      exclusive with the --readonly option.
                      Default: on

  --zipafter N        if specified, Urchin monthly databases "N" months old
                      or older will be compressed and archived in a
                      per-month ZIP archive. Mutually exclusive with the
                      --nozip option.
                      Default: 3

  --updatezip         update a ZIP archive with newer Urchin database files
                      if they exist. Use with caution! If you need to
                      refresh a ZIP archive, it is better to manually remove
                      the ZIP archive altogether, process the webserver
                      logs, and allow this script to create a new ZIP
                      archive the next time it runs. This prevents ZIP
                      archives from accidentally getting updated with
                      garbage data. Mutually exclusive with the --nozip
                      option.
                      Default: off

  --nozip             do not create ZIP archives of the Urchin databases.
                      Mutually exclusive with the --zipafter option.
                      Default: off

  --keephistory N     if specified, Urchin monthly databases "N" months old
                      or older will be deleted, effectively setting the
                      reporting history to "N" months. This option will
                      cause any matching monthly Urchin databases or ZIP
                      archives to be deleted. Mutually exclusive with the
                      --noprune option.
                      Default: 13

  --noprune           do not delete old Urchin monthly databases. Mutually
                      exclusive with the --keephistory option.
                      Default: off

  --verbose           prints details about cleaning activity

  --help              shows this message
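
To illustrate how the "N months old or older" thresholds interact, the sketch below classifies a monthly database by its age. It is only a model of the documented option semantics, not code from archive_udata.pl: classify_month is a hypothetical function, the age in whole months is taken as an input because this document does not show how the script measures it, and the readonly step is left out since it defaults to never.

```shell
# classify_month AGE [ZIPAFTER] [KEEPHISTORY]
# Hypothetical model of the documented thresholds; AGE is in whole months.
classify_month() {
    age="$1"
    zipafter="${2:-3}"       # documented default for --zipafter
    keephistory="${3:-13}"   # documented default for --keephistory
    if [ "$age" -ge "$keephistory" ]; then
        # Pruning removes matching databases and ZIP archives alike.
        echo delete
    elif [ "$age" -ge "$zipafter" ]; then
        echo zip
    else
        echo keep
    fi
}
```

Under the default thresholds, this model keeps months aged 0 to 2 uncompressed, zips months aged 3 to 12, and deletes anything aged 13 or more.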