udb-sanitizer: Database Maintenance Utility

Overview

The Urchin Database Maintenance Utility, udb-sanitizer, provides a means of checking the Urchin 5 profile databases and performing various maintenance operations on these databases.

The types of operations that udb-sanitizer can perform are:

  • Check the integrity of Urchin monthly profile databases
  • Rebuild Urchin monthly profile database headers and indexes
  • Roll back databases to a previous saved backup state
  • Delete profile data for a day, multiple days, or an entire month

Usage

udb-sanitizer is located in the util directory of the Urchin 5 distribution.

Usage of the utility is as follows:

  udb-sanitizer [-h] (prints usage message and exits)
  udb-sanitizer [-v] (prints version and exits)
  udb-sanitizer -p profile [-d YYYYMM[DD]] [-bfhiprqx] [-z [-e DD]]
where:
   -b  go directly to rollback option
   -d  specifies year-month and optionally the day to operate on
   -e  with z and d options, zero multiple days (range d->e) in same month
   -f  force action to occur without confirmation
   -h  print this help information
   -i  go directly to rebuild-index option
   -p  specifies name of profile (required)
   -r  go directly to remove option
   -q  quiet mode, suppress output except for critical user confirmation
   -x  go directly to rebuild-header option
   -z  go directly to zero-day option
Note: When udb-sanitizer is called with options that do not completely describe what action to take, it prompts the user for the additional input it needs. To have an action performed without user interaction, combine the "-d" option with one of the -i, -r, -x, or -z options, adding "-f" to skip the confirmation prompt. Rollback (-b) is the exception: it always prompts for the ZIP archive to roll back to if any exist.

Operation

In normal operation, udb-sanitizer is invoked from a command shell and interactively prompts the user for the actions to take. For each month of Urchin reporting data that the utility finds, it will present the following interactive menu:

   Options:
      1. Rollback data to state before last run
      2. Delete this month entirely
      3. Rebuild header to match data
      4. Rebuild indexes
      5. Zero out one or more days
   Please choose 1-5 or press return to do nothing:
If no action is desired for the currently selected month, pressing the Enter/Return key causes the utility to move on to the next chronological month that contains data and display the same menu.

Actions associated with the options presented above are:

  1. Data rollback
    The utility reverts all reporting data for a profile to that contained in a ZIP archive. The user is presented with a list of ZIP archive backups to choose from. The ZIP archives are named using the convention "YYYYMM-backup-YYYYMMDDHHMMSS.zip", where the first YYYYMM is the month of data being backed up (e.g. 200309 is September 2003) and YYYYMMDDHHMMSS is the timestamp of when the ZIP archive was created. Use this timestamp to decide which ZIP archive to roll back to. Note that udb-sanitizer cannot perform a rollback based solely on command line arguments; it always prompts for the ZIP archive to roll back to if any exist. If no ZIP backup archives exist, the utility prints a diagnostic to that effect and exits.

  2. Delete monthly data
    All data for a particular profile for the specified month is removed. This option is useful for zeroing out the statistics for a month when the data is incorrect (e.g. the wrong filters were applied or the wrong logs were processed), or when advanced profile parameters such as the click path depth or referral level have changed and that month's Urchin reporting data should be regenerated to reflect the change. This action can be performed without user interaction by invoking udb-sanitizer with the "-f", "-r" and "-d" arguments, e.g.
      udb-sanitizer -f -r -d 200309 -p mysite.com
      
  3. Rebuild database headers
    This causes the utility to read the Urchin database tables directly and rebuild the database headers based on the data found. This should only be done if udb-sanitizer finds a discrepancy between the headers and the data. WARNING: if the database headers do not match the data, this typically indicates some type of database corruption; in this case, the prudent course of action is to completely remove the data for that month and reprocess the corresponding webserver logs. When that is not possible, rebuilding the headers may be the only way to resuscitate the databases so that the Urchin log processing and reporting engines can work with them, but it is not guaranteed to fix corruption. This action can be performed without user interaction by invoking udb-sanitizer with the "-f", "-x" and "-d" arguments, e.g.
      udb-sanitizer -f -x -d 200309 -p mysite.com
      
  4. Rebuild database indexes
    This causes the utility to read the Urchin database tables directly and rebuild the database indexes based on the data found. This should only be done if udb-sanitizer finds a discrepancy between the indexes and the data. NOTE: the warning given above about corruption in the database headers applies to this option as well. This action can be performed without user interaction by invoking udb-sanitizer with the "-f", "-i" and "-d" arguments, e.g.
      udb-sanitizer -f -i -d 200309 -p mysite.com
      
  5. Zero data for one or more days
    This option allows data for selected days within the month to be zeroed out, thereby allowing Urchin log processing to be rerun for those days only (e.g. urchin -p profile -d YYYYMMDD). This action can be performed without user interaction to zero out a single day by invoking udb-sanitizer with the "-f", "-z" and "-d" arguments, e.g.
      udb-sanitizer -f -z -d 20030907 -p mysite.com
      
    and for multiple days by including the "-e" argument as well to specify an end date, e.g.
      udb-sanitizer -f -z -d 20030907 -e 10 -p mysite.com
      
    which will zero out data for September 7th through the 10th. This is more efficient than invoking udb-sanitizer once per day, as the database indexes and headers are checked only once. The index/header check can take a noticeable amount of time on profiles with a lot of data.
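
The single-day and multi-day forms of the zero-day command differ only in the "-e" argument, so a wrapper script can assemble either one from the same inputs. The following is a minimal sketch and not part of Urchin: zero_days_cmd is a hypothetical helper that only prints the command line, and its check that the end day is not earlier than the start day is an assumption about what a sensible range is, not documented udb-sanitizer behavior.

```shell
# zero_days_cmd PROFILE YYYYMMDD [END_DD]
# Hypothetical helper: prints (does not run) the udb-sanitizer command line
# for zeroing a single day, or days START..END_DD within the same month.
zero_days_cmd() {
    profile=$1
    start=$2
    end=${3:-}

    case "$start" in
        [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) ;;
        *) echo "start date must be YYYYMMDD: $start" >&2; return 1 ;;
    esac

    # No end day given: single-day form.
    if [ -z "$end" ]; then
        echo "udb-sanitizer -f -z -d $start -p $profile"
        return 0
    fi

    case "$end" in
        [0-9][0-9]) ;;
        *) echo "end day must be DD: $end" >&2; return 1 ;;
    esac

    # Assumption: a range must not run backwards. Strip one leading zero so
    # the two day values compare as plain decimal numbers.
    start_day=${start#??????}
    if [ "${end#0}" -lt "${start_day#0}" ]; then
        echo "end day $end is before start day $start_day" >&2
        return 1
    fi

    echo "udb-sanitizer -f -z -d $start -e $end -p $profile"
}

zero_days_cmd mysite.com 20030907 10
```

Printing the command first makes it easy to review before any data is zeroed; the printed line can then be run as-is from the shell.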

Considerations

  • Invoking udb-sanitizer without specific dates on profiles with a lot of historical data can be time-consuming, as the utility must open the databases for each month, perform sanity checks, and then present the menu of actions.
  • Actions that delete daily or monthly data cannot be undone! The only recourse is to reprocess the webserver logs for that time period to repopulate the profile databases. Use these options with care.
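
When scripting around the rollback archives described under option 1, the "YYYYMM-backup-YYYYMMDDHHMMSS.zip" naming convention can be decoded mechanically. The sketch below assumes only that convention; describe_backup is a hypothetical helper, the file name in the example call is made up for illustration, and nothing here inspects the archive contents or calls udb-sanitizer.

```shell
# describe_backup FILE
# Hypothetical helper: splits a name of the form
# YYYYMM-backup-YYYYMMDDHHMMSS.zip into the month of data backed up and the
# archive's creation time, printed as "YYYYMM YYYY-MM-DD HH:MM:SS".
describe_backup() {
    name=${1##*/}                       # drop any leading directory
    if ! printf '%s\n' "$name" | grep -Eq '^[0-9]{6}-backup-[0-9]{14}\.zip$'; then
        echo "not a backup archive name: $name" >&2
        return 1
    fi
    month=${name%%-*}                   # first YYYYMM: month of data backed up
    stamp=${name##*-}                   # YYYYMMDDHHMMSS.zip
    stamp=${stamp%.zip}                 # YYYYMMDDHHMMSS
    # Insert separators so the creation timestamp is easier to read.
    created=$(printf '%s\n' "$stamp" |
        sed 's/^\(....\)\(..\)\(..\)\(..\)\(..\)\(..\)$/\1-\2-\3 \4:\5:\6/')
    printf '%s %s\n' "$month" "$created"
}

describe_backup 200309-backup-20031001021500.zip
```

For the example name, the helper prints "200309 2003-10-01 02:15:00": a backup of the September 2003 data that was created on October 1, 2003.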