DynamicURL Filters (deprecated)

Note
Urchin 5 and later display query terms in a separate report by default. Unlike versions 4 and earlier, it is not necessary to create a filter to display query strings. However, the data is no longer displayed in the 'Pages' report. Urchin 5 displays it in a drill-down report titled 'Page Query Terms', which can be found under the 'Pages & Files' report menu.

Many sites today use CGI, ASP, or another scripting mechanism to provide dynamic content. Often, a single script is used to deliver multiple pages of information. While this can be a handy way to track user sessions or provide "live" content, it poses an additional challenge for meaningful reporting.

By default, Urchin strips all the parameters associated with a page request (e.g. those that would typically be used with a CGI or ASP) and stores only the pathname of the page requested in its database. The DynamicURL filtering feature allows you to use regular expressions to selectively capture these parameters and present them in an intuitive way.

As an example, a CGI script might be used to deliver information about all products in a catalog. The script draws from a database, and uses parameters passed through the request to determine which product to display. The resulting hit in the webserver log for this request might look like:

    /cgi-bin/showProduct.cgi?sessionId=123456789&productId=knobs
    
    |______________________| |_________________________________|
    

Under normal operation, Urchin will record that the showProduct.cgi page was requested, and the "?" and all parameters after it will be stripped. By using a DynamicURL filter, Urchin can store some or all of the parameters and produce a unique page record based on the parameter list.
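The default behavior described above can be illustrated with a minimal Python sketch. This is not Urchin's code; the function name `default_page` is hypothetical and only mimics the effect of keeping the pathname and dropping the parameters.

```python
def default_page(request: str) -> str:
    """Keep only the pathname: drop the '?' and everything after it."""
    path, _sep, _params = request.partition("?")
    return path


request = "/cgi-bin/showProduct.cgi?sessionId=123456789&productId=knobs"
print(default_page(request))  # prints "/cgi-bin/showProduct.cgi"
```

Every request to the script collapses to the same page record, which is why the individual products cannot be told apart in the report without a filter.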

Now in this example, we don't necessarily want to capture the entire second part of the request because of the "sessionId." Let's assume that this parameter changes for each visit and we get 30,000 visits per day. Including this piece of information would create far too many unique pages and render the Pages reporting useless. Instead we just want to capture the "productId" and report only on that information.

    /cgi-bin/showProduct.cgi?sessionId=123456789&productId=knobs
    

We may still want to know which script was used as well as which product was involved in the request. By using a DynamicURL filter, we can capture multiple parts of the request and recombine them into a new, formatted request ready for reporting. Here is an example of a filter that could be used with the page request above:

    (/cgi-bin/showProduct\.cgi\?).*productId=(.*)
    

This regular expression will match the above request no matter what the values of sessionId and productId are. The parentheses capture the parts of the request that we want to keep for reporting. The effective request of the above example would look like:

    /cgi-bin/showProduct.cgi/knobs
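
To see which parts the parentheses capture, the filter can be tried with Python's `re` module. This is only a sketch: how Urchin recombines the captured parts is not described here, so the helper `apply_dynamic_url` (a hypothetical name) assumes that the trailing "?" is dropped from each captured part and that the parts are joined with "/", which is chosen to reproduce the effective request shown above.

```python
import re

# The filter from the text; the dot in ".cgi" is escaped so it matches literally.
FILTER = re.compile(r"(/cgi-bin/showProduct\.cgi\?).*productId=(.*)")


def apply_dynamic_url(request: str, pattern: re.Pattern = FILTER) -> str:
    """Rewrite a request from its captured groups (assumed recombination rule)."""
    match = pattern.search(request)
    if match is None:
        return request  # no match: the request is left unmodified
    # Assumption: drop the trailing "?" of a captured part, join parts with "/".
    parts = [group.rstrip("?") for group in match.groups() if group]
    return "/".join(parts)


request = "/cgi-bin/showProduct.cgi?sessionId=123456789&productId=knobs"
print(FILTER.search(request).groups())
# prints ('/cgi-bin/showProduct.cgi?', 'knobs')
print(apply_dynamic_url(request))
# prints /cgi-bin/showProduct.cgi/knobs
```

The sessionId value is consumed by the `.*` between the two groups, so it never reaches the rewritten request.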
    

Up to 5 sets of parentheses can be used, and multiple filters can be applied. If a request does not match a DynamicURL filter, it is left unmodified but is still included in the reporting. This allows you to use a separate DynamicURL filter for each area of a site. Keep in mind that there is a slight performance hit for each filter used.
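
The rules in the paragraph above can be sketched in Python as follows. Several points are assumptions made for illustration and are not confirmed by this document: that filters are tried in order and the first match wins, that captured parts are joined with "/" after dropping a trailing "?", and the second filter for a `/news.asp` script, which is entirely hypothetical.

```python
import re

MAX_GROUPS = 5  # limit on sets of parentheses stated in the text


def compile_filter(expression: str) -> re.Pattern:
    """Compile a filter, rejecting expressions with more than MAX_GROUPS groups."""
    pattern = re.compile(expression)
    if pattern.groups > MAX_GROUPS:
        raise ValueError(f"at most {MAX_GROUPS} sets of parentheses are allowed")
    return pattern


def rewrite(request: str, filters: list) -> str:
    """Apply the first matching filter; leave non-matching requests unmodified."""
    for pattern in filters:  # each extra filter costs one more regex test per hit
        match = pattern.search(request)
        if match:
            # Assumed recombination rule: strip trailing "?", join with "/".
            return "/".join(g.rstrip("?") for g in match.groups() if g)
    return request


filters = [
    compile_filter(r"(/cgi-bin/showProduct\.cgi\?).*productId=(.*)"),
    compile_filter(r"(/news\.asp\?).*story=([^&]*)"),  # hypothetical second site area
]

for request in (
    "/cgi-bin/showProduct.cgi?sessionId=1&productId=knobs",
    "/news.asp?story=42&ref=home",
    "/index.html",
):
    print(rewrite(request, filters))
```

The loop makes the performance note concrete: a request that matches nothing is tested against every filter before it is passed through unchanged.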

Note that DynamicURL filters can only be applied to the base URL and query string that form the page request. They cannot be used to filter referrals or any other fields in the log file. Also, when DynamicURL filters and FilterIn/FilterOut are used together, the DynamicURL filters are applied after the other filters, so consider how one set of filters affects the others when choosing what to filter.

Examples

Example 1: We want to capture all the specific article IDs in the Urchin 4 report for help.urchin.com. Here's a sample of what the Request portion of the hit looks like in the log file:

    GET /knowledge.cgi?cmd=2&id=767

The proper DynamicURL filter to extract the article ID is:

    (/knowledge\.cgi\?)cmd=2&(id=.*)

and this produces Top Pages reports that look like:

    1. /knowledge.cgi         1,081   46.43%
    2. /knowledge.cgi/id=767    244   10.48%
    3. /knowledge.cgi/id=807    136    5.84%
    4. /knowledge.cgi/id=768     50    2.15%
    5. /knowledge.cgi/id=777     40    1.72%

Example 2: We want to capture all the search keywords used in the Urchin 4 report for help.urchin.com. Here's a sample of what the Request portion of the hit looks like in the log file:

    GET /knowledge.cgi?cmd=1&S_STYPE=1&S_ASTYPE=0&s_cat=&s_keyword=utm

The proper DynamicURL filter to extract the keyword information is:

    (/knowledge\.cgi\?).*s_(keyword=[^&]*)

and this produces Top Pages reports that look like:

    1. /knowledge.cgi                     1,373   68.65%
    2. /knowledge.cgi/keyword=utm            29    1.45%
    3. /knowledge.cgi/keyword=default+page   18    0.90%
    4. /knowledge.cgi/keyword=no+referral    11    0.55%
    5. /knowledge.cgi/keyword=scheduler      10    0.50%
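
As a rough check of both example filters, the Python sketch below runs them over a handful of requests and tallies the resulting pages in the style of a Top Pages report. The request list is made-up illustrative input, not real log data, and the recombination rule (drop the trailing "?", join captured parts with "/") and the first-match-wins order are assumptions chosen to reproduce the page names shown in the reports above.

```python
import re
from collections import Counter

FILTERS = [
    re.compile(r"(/knowledge\.cgi\?)cmd=2&(id=.*)"),        # Example 1
    re.compile(r"(/knowledge\.cgi\?).*s_(keyword=[^&]*)"),  # Example 2
]


def page_for(request: str) -> str:
    """Return the page name a request would be counted under."""
    for pattern in FILTERS:
        match = pattern.search(request)
        if match:
            return "/".join(g.rstrip("?") for g in match.groups() if g)
    # No filter matched: fall back to the default of keeping only the pathname.
    return request.partition("?")[0]


# Illustrative input only; these hits are invented for the sketch.
SAMPLE_REQUESTS = [
    "/knowledge.cgi?cmd=2&id=767",
    "/knowledge.cgi?cmd=2&id=767",
    "/knowledge.cgi?cmd=2&id=807",
    "/knowledge.cgi?cmd=1&S_STYPE=1&S_ASTYPE=0&s_cat=&s_keyword=utm",
    "/knowledge.cgi?cmd=3",
    "/knowledge.cgi",
]

report = Counter(page_for(request) for request in SAMPLE_REQUESTS)
total = sum(report.values())
for rank, (page, hits) in enumerate(report.most_common(), start=1):
    print(f"{rank}. {page:<32}{hits:>6}{hits / total:>9.2%}")
```

Requests that match neither filter are counted under plain /knowledge.cgi, which is consistent with that entry remaining at the top of both sample reports.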