How do I get Urchin to read a webserver log from standard input (stdin)?

In some situations you may want the Urchin log processing engine to read webserver log data directly from standard input (stdin) instead of from a file on disk. Urchin 5.000 and later support configuring a Log Source that reads from standard input.

Important: Because of a bug in Urchin 5.500, standard input log sources do not work properly in that release. Upgrade to Urchin 5.501 or later to use this feature.

To set this up, first create a new Log Source named "stdin" in the Log Source Manager with the following parameters:

Log Settings
  Log File Location:  local
  Log File Path:      -
  Log Format:         auto

Advanced Settings
  Log Destiny:              Don't Touch
  Query Token:              ?
  URI Stem to Lower Case:   off

NOTE: It is very important that "Log File Path" be set to a single dash character, "-" (without the quotes). The dash is the conventional Unix notation for standard input.

Next, for each profile that should read its logs from stdin, select the new "stdin" log source on the Profile -> Log Source configuration screen.

Important Note: Profiles configured with a "stdin" log source cannot be run by the Urchin Scheduler, because the scheduler has no way to associate an input stream with the job. The only way to process logs for such a profile is to run Urchin manually. As a very simple example, here is how you would invoke Urchin:

  1. Open a command shell window on the Urchin system
  2. Change to the "bin" directory of the Urchin distribution
  3. [UNIX-type systems] Run urchin with "./urchin -p myprofile < /path/to/webserver.log"
  4. [Windows systems] Run urchin with "urchin -p myprofile < C:\path\to\webserver.log"
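
On UNIX-type systems, steps 2 and 3 can be wrapped in a small shell function so the same command can be reused for different profiles and logs. This is a minimal sketch under stated assumptions: the function name run_urchin_stdin and its three arguments (bin directory, profile name, log file path) are hypothetical; only the "./urchin -p <profile>" invocation comes from this article.

```shell
#!/bin/sh
# Sketch only: the function name and argument order are hypothetical.
# Runs "./urchin -p PROFILE" from the Urchin bin directory with the log
# file supplied on standard input (steps 2 and 3 above).
# Returns Urchin's exit status, or 1 if the log cannot be read.
run_urchin_stdin() {
    urchin_bin_dir=$1   # hypothetical: path to the Urchin "bin" directory
    urchin_profile=$2   # a profile that uses the "stdin" log source
    urchin_log=$3       # webserver log to feed to Urchin

    # Check the log up front so a typo produces a clear error message.
    if [ ! -r "$urchin_log" ]; then
        echo "run_urchin_stdin: cannot read log file: $urchin_log" >&2
        return 1
    fi

    # The subshell keeps the caller's working directory unchanged. The
    # redirection is attached to the subshell, so the log is opened before
    # the cd and a relative log path still works.
    (
        cd "$urchin_bin_dir" || exit 1
        ./urchin -p "$urchin_profile"
    ) < "$urchin_log"
}
```

Example call, where /usr/local/urchin/bin is a hypothetical install location: run_urchin_stdin /usr/local/urchin/bin myprofile /path/to/webserver.log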

Using "stdin" as a log source gives you a great deal of flexibility, because you can pre-process a log before Urchin reads it. Additional examples:

# Extract hits for a single virtual host from a log composed of
# hits from many different virtual hosts, and feed the results to Urchin
# (-F makes grep treat the dots in the host name literally)
grep -F " www.mydomain.com " /path/to/alldomains.log | urchin -p mydomain.com

# Extract raw webserver hits from a relational database using some sort
# of embedded SQL script and feed them to Urchin for processing
sql-extract --date 20040302 --domain mydomain.com | urchin -p mydomain.com
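
The grep example matches the host name, surrounded by spaces, anywhere on a line. If your log format puts the virtual host in a whitespace-separated field of its own, comparing that one field exactly is stricter. The sketch below assumes such a format; the helper name filter_vhost is hypothetical, and the field number is an assumption you must check against your own log format.

```shell
#!/bin/sh
# Sketch only: filter_vhost is a hypothetical helper, and the field number
# depends entirely on your log format.
# Prints the lines whose FIELD-th whitespace-separated field is exactly HOST.
# Usage: filter_vhost HOST FIELD [FILE...]   (reads stdin if no FILE given)
filter_vhost() {
    fv_host=$1
    fv_field=$2
    shift 2
    # Exact string comparison on a single field, so "www.mydomain.com" does
    # not match "shop.www.mydomain.com" or a referrer containing the name.
    awk -v host="$fv_host" -v field="$fv_field" '$field == host' "$@"
}
```

For example, assuming the virtual host is the first field: filter_vhost www.mydomain.com 1 /path/to/alldomains.log | urchin -p mydomain.com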
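
Another common pre-processing step is feeding rotated and compressed logs to Urchin as a single stream, without first writing uncompressed copies to disk. A minimal sketch, assuming compressed logs are gzip files ending in ".gz"; the helper name cat_logs is hypothetical.

```shell
#!/bin/sh
# Sketch only: cat_logs is a hypothetical helper name.
# Writes each file to standard output in the order given, decompressing
# files whose names end in ".gz" on the fly. Stops at the first file that
# cannot be read and returns 1.
cat_logs() {
    for cl_file in "$@"; do
        case $cl_file in
            *.gz) gzip -dc "$cl_file" || return 1 ;;
            *)    cat "$cl_file" || return 1 ;;
        esac
    done
}
```

For example, assuming a rotation scheme in which higher-numbered files are older, listing them in this order sends the hits to Urchin oldest first: cat_logs /path/to/webserver.log.2.gz /path/to/webserver.log.1 /path/to/webserver.log | urchin -p myprofile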