How do I get Urchin to read a webserver log from standard input (stdin)?

In some circumstances it is useful to have the Urchin log processing engine read webserver log data directly from standard input (stdin) rather than from a file. Urchin supports this through a Log Source that is configured to read from standard input.

To set this up, first create a new Log Source named "stdin" in the Log Source Manager using the following parameters:

Log Settings

  • Log File Location: local
  • Log File Path: -
  • Log Format: auto

Advanced Settings

  • Log Destiny: Don't Touch
  • Query Token: ?
  • URI Stem to Lower Case: off

NOTE: it is very important that the "Log File Path" be set to a single dash "-" character (no quotes).

Now, for any profiles that you want to read logs from "stdin", simply choose your new "stdin" log source in the Profile -> Log Source configuration screen.

Important Note: Profiles that are configured with a "stdin" log source cannot be run via the Urchin Scheduler, since the scheduler has no way of properly associating the log input stream. The only way to process logs for such a profile is to run Urchin manually. As a very simple example, here is how you would invoke Urchin:

  1. Open a command shell window on the Urchin system
  2. Change to the "bin" directory of the Urchin distribution
  3. Run Urchin, redirecting the log file to standard input:

    • [UNIX-type systems] "./urchin -p myprofile < /path/to/webserver.log"
    • [Windows systems] "urchin -p myprofile < C:\path\to\webserver.log"

Using "stdin" as a log source gives you a great deal of flexibility, since you can pre-process a log before Urchin reads it.

Additional examples

Extract hits for a single virtual host from a log containing hits from many different virtual hosts, and feed the results to Urchin:

grep " www.mydomain.com " /path/to/alldomains.log | urchin -p mydomain.com 
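
The grep pattern above matches the hostname wherever it appears on the line surrounded by spaces. If the log records the virtual host in a known column, matching only that column is stricter. The following is a minimal sketch, not part of Urchin: "filter_vhost" is a hypothetical helper name, and the default of column 2 is an assumption that must be adjusted to the actual log format.

```shell
# filter_vhost HOST [FIELD]
# Hypothetical helper: reads log lines on stdin and prints only those
# whose FIELD-th whitespace-separated field equals HOST exactly.
# FIELD defaults to 2, which is an assumption about the log layout.
filter_vhost() {
    awk -v host="$1" -v col="${2:-2}" '$col == host'
}
```

With the function defined in the current shell, it could be used as "filter_vhost www.mydomain.com < /path/to/alldomains.log | urchin -p mydomain.com".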

Extract raw webserver hits from a relational database using some sort of embedded SQL script, and feed them to Urchin for processing:

sql-extract --date 20040302 --domain mydomain.com | urchin -p mydomain.com
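
Another pre-processing step that fits this pattern is combining several log files, some of them gzip-compressed, into one stream. The following is a minimal sketch under stated assumptions: "cat_logs" is a hypothetical helper name, and compressed files are assumed to be recognizable by a ".gz" suffix.

```shell
# cat_logs FILE...
# Hypothetical helper: writes each named log to stdout in the order
# given, decompressing any file whose name ends in .gz (assumed to be
# gzip-compressed) and copying all other files unchanged.
cat_logs() {
    for f in "$@"; do
        case "$f" in
            *.gz) gzip -dc "$f" ;;
            *)    cat "$f" ;;
        esac
    done
}
```

With the function defined in the current shell, it could be used as "cat_logs /path/to/webserver.log.1.gz /path/to/webserver.log | urchin -p myprofile".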