
What my web server log analysis program doesn't tell me

My website hosting company lets me access the raw webserver log files and analog reports. The logfiles have all the information but are useless without further processing (well, they are useful when debugging webapps during development, but that is a different matter). The pre-configured analog reports are better but don't have much visual appeal. They also don't let me drill down further (for example, they list the top 30 search terms but won't let me see the complete list).
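(For reference, a raw log entry — assuming the usual Apache-style combined format — looks something like the line below; the host, URL, and referrer here are invented for illustration.)

    192.0.2.1 - - [01/Nov/2005:19:25:43 -0800] "GET /blog/index.html HTTP/1.1" 200 5120 "http://www.google.com/search?q=awstats" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"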

So, I installed and configured the open source log analyzer awstats. The generated reports are much nicer to look at and allow me to drill down for more information. A full report includes information on:


  1. Number of unique visitors, visits, pages accessed, hits, and data transferred in a given month or day; also, hits and data transferred in a given hour.

  2. Hits and data transferred by specific robots and spiders, in a given month.

  3. Distribution of visit duration, in a given month.

  4. Hits and data transferred by file type, in a given month.

  5. URLs sorted by view count, in a given month.

  6. Operating Systems and Browsers used to access the website, in a given month.

  7. Referring search engines and pages, sorted by view count, in a given month.

  8. Search keywords and phrases, sorted by usage count, in a given month.

Note that most of the information is organized by month, and it is not easy (or even possible) to specify a different time period. What if I want to look at referrers for a particular link on a particular day or week? What if I want to see the distribution of view counts for a particular page over the last year? What if I want to know the dates on which Googlebot (or MSN or whatever) indexed my site over the last year? The possibilities are endless.

Answers to these, and many more similar queries, can easily be obtained by doing a more thorough analysis of the logfiles. Perhaps I could parse the log entries, import them into MySQL, and issue SQL queries against the relevant tables. I could even build a web interface on top of it using something like Ruby On Rails. Should make for a fun hobby project.
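As a very rough sketch of what the loading step and one such query might look like, here is a small Python script. Everything in it is a placeholder of my own choosing — the hits table layout, the file names, and the use of Python with the standard-library sqlite3 module instead of MySQL and Rails — it just illustrates the idea.

    import re
    import sqlite3

    # Apache-style "combined" log format:
    # host ident user [time] "request" status bytes "referer" "user-agent"
    LINE = re.compile(
        r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
        r'"(?P<method>\S+) (?P<url>\S+) [^"]*" '
        r'(?P<status>\d{3}) (?P<bytes>\S+) '
        r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"')

    db = sqlite3.connect("weblog.db")
    db.execute("""CREATE TABLE IF NOT EXISTS hits
                  (host TEXT, time TEXT, url TEXT, status INTEGER,
                   bytes INTEGER, referer TEXT, agent TEXT)""")

    with open("access_log") as log:
        for line in log:
            m = LINE.match(line)
            if m is None:
                continue  # skip lines that don't parse
            db.execute("INSERT INTO hits VALUES (?, ?, ?, ?, ?, ?, ?)",
                       (m.group("host"), m.group("time"), m.group("url"),
                        int(m.group("status")),
                        0 if m.group("bytes") == "-" else int(m.group("bytes")),
                        m.group("referer"), m.group("agent")))
    db.commit()

    # Example query: on which dates did Googlebot fetch anything?
    for (day,) in db.execute(
            "SELECT DISTINCT substr(time, 1, 11) FROM hits "
            "WHERE agent LIKE '%Googlebot%' ORDER BY 1"):
        print(day)

The other questions reduce to similar queries: filter referrers by URL and date for the link I care about, or group hits for a single page by month over the year.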
