Web Server Log Files
A web server log file is a text file that the web server writes to as
it handles requests. Log files record a variety of data about each
request made to your web server. Some examples of the data collected
and stored are: date, time, client IP address, referrer, user agent,
service name, server name, and server IP.
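As a rough illustration of how these fields can be read back out of a log, the following Python sketch parses lines in the W3C Extended Log File Format, where a "#Fields:" directive names the columns. The field names follow the convention used by Microsoft IIS; the sample entry, host names, and addresses are made up for the example, and your own server's field list may well differ.

```python
def parse_w3c_log(lines):
    """Yield one dict per request from W3C extended log lines."""
    fields = []
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The directive names the columns of the entries that follow.
            fields = line[len("#Fields:"):].split()
        elif line and not line.startswith("#") and fields:
            # Entries are space-separated; spaces inside a value are
            # written as "+" in this format, so a plain split is enough.
            yield dict(zip(fields, line.split()))


# Hypothetical sample data, not taken from a real server.
sample = [
    "#Fields: date time c-ip cs(Referer) cs(User-Agent) s-sitename s-computername s-ip",
    "2024-01-15 10:32:07 203.0.113.9 http://example.com/ Mozilla/5.0+(Windows) W3SVC1 WEB01 192.0.2.10",
]

records = list(parse_w3c_log(sample))
print(records[0]["c-ip"], records[0]["cs(Referer)"])
```

Each record is a plain dictionary keyed by field name, so later analysis steps can pick out whichever fields they need.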
Your server logs act as a visitor sign-in sheet. They can answer
questions such as: Who visits your website? What browsers do they use?
Where do they go in your site? What pages do they view? Your server log
files can tell you:
- What pages get the most and the least traffic
- What sites refer visitors to your site
- What pages your visitors view
- What browsers and operating systems are used to access your site
- When search robots and directory editors visit your site
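The questions in the list above mostly come down to counting. Here is a minimal sketch, assuming the log has already been parsed into one dictionary per request; the keys "page", "referrer", and "user_agent" and all sample values are hypothetical.

```python
from collections import Counter

# Hypothetical parsed requests; a real log would supply these values.
requests = [
    {"page": "/index.html", "referrer": "http://search.example/", "user_agent": "Firefox"},
    {"page": "/products.html", "referrer": "http://search.example/", "user_agent": "Chrome"},
    {"page": "/index.html", "referrer": "http://blog.example/", "user_agent": "Chrome"},
    {"page": "/index.html", "referrer": "-", "user_agent": "ExampleBot"},
]


def tally(records, key):
    """Count how often each value of `key` appears across the records."""
    return Counter(record[key] for record in records)


page_counts = tally(requests, "page")
referrer_counts = tally(requests, "referrer")
browser_counts = tally(requests, "user_agent")

most_viewed = page_counts.most_common(1)[0]    # page with the most traffic
least_viewed = page_counts.most_common()[-1]   # page with the least traffic
print(most_viewed, least_viewed)
```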
The above data can help you identify specific problems on your
website. If you have many visitors but few sales, check your server logs
to learn how many visitors view your product offerings. Do you need to
measure the ROI of your search marketing campaigns? Your server logs can
reveal the traffic and conversions generated by those campaigns.
Advantages of Web Server Log Files
Data Ownership: Your web server creates the log
file as requests are served, and the data is collected and stored on
your own equipment. Regardless of server location, your log files can be
stored on the same network serving your web pages (unless your website
is on a shared hosting environment).
Data Collection Flexibility: Your web servers can be
instructed to collect specific data while ignoring other data. This
gives you the ability to choose the information, file types, server
errors, redirects, etc., that you want to analyze.
Easy Implementation: No page tagging or other page coding is needed.
Database Integration: Some web servers can write
log data directly to a database application. If you have advanced SQL
developers, you can answer many of your web analytics questions without
using an expensive analytics application.
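To give a sense of the kind of question SQL can answer once log data sits in a database, here is a small sketch using Python's built-in sqlite3 module. The table name, column names, and rows are assumptions made for the example, not a schema that any particular web server produces.

```python
import sqlite3

# In-memory database standing in for wherever the log data is stored.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE requests (log_date TEXT, client_ip TEXT, page TEXT, status INTEGER)"
)

# Hypothetical log rows.
rows = [
    ("2024-01-15", "203.0.113.9", "/index.html", 200),
    ("2024-01-15", "203.0.113.9", "/products.html", 200),
    ("2024-01-15", "198.51.100.4", "/index.html", 200),
    ("2024-01-15", "198.51.100.4", "/missing.html", 404),
]
conn.executemany("INSERT INTO requests VALUES (?, ?, ?, ?)", rows)

# Successful page views, most requested first.
top_pages = conn.execute(
    "SELECT page, COUNT(*) AS hits FROM requests "
    "WHERE status = 200 GROUP BY page ORDER BY hits DESC, page"
).fetchall()
print(top_pages)
```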
Ability to Measure Robot Traffic: It is important to
exclude robot and spider traffic from your reports on website use.
However, it can be useful to have the robot activity data when analyzing
the effect of SEO efforts as this indicates indexing frequency.
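One simple way to separate robot traffic from visitor traffic is to look at the user agent string. The sketch below assumes that robots identify themselves with words such as "bot", "crawler", or "spider", which many do but not all; the hint list and the sample requests are hypothetical.

```python
from collections import Counter

# Assumed substrings that suggest a robot; not an exhaustive list.
ROBOT_HINTS = ("bot", "crawler", "spider")


def is_robot(user_agent):
    """Guess whether a user agent string belongs to a robot."""
    lowered = user_agent.lower()
    return any(hint in lowered for hint in ROBOT_HINTS)


# Hypothetical (date, user agent) pairs taken from a log.
requests = [
    ("2024-01-15", "Mozilla/5.0 (Windows NT 10.0) Firefox/121.0"),
    ("2024-01-15", "ExampleBot/1.0 (+http://bot.example/info)"),
    ("2024-01-16", "ExampleBot/1.0 (+http://bot.example/info)"),
    ("2024-01-16", "Mozilla/5.0 (Macintosh) Safari/605.1.15"),
]

# Requests to keep in reports on website use.
visitor_requests = [r for r in requests if not is_robot(r[1])]

# Robot requests per day, a rough indicator of indexing frequency.
robot_visits_per_day = Counter(day for day, ua in requests if is_robot(ua))
print(len(visitor_requests), dict(robot_visits_per_day))
```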
Disadvantages of Web Server Log Files
Proxy Caching: Proxy servers speed delivery of web
pages to users but have a negative effect on web server log files,
because a request answered from a proxy’s cache never reaches the web
server. In caching, the requested information is kept on the Internet
Service Provider’s machines, which are closer to the user (to speed
delivery), rather than being fetched from the content’s original site.
Browser Caching: Browser caching refers to the
ability of your web browser to store frequently or recently viewed
information on your computer’s hard drive for speedy retrieval. This is
why you get immediate delivery with your “back” and “forward” buttons.
Because the page is served from the user’s hard drive, no request
reaches the web server, so the view is never recorded in the web server
log files. Since users frequently use their back/forward buttons,
the log file gives an incomplete picture of visitor navigation.
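The effect can be seen in a toy simulation. This is a deliberately simplified sketch: the Server and CachingBrowser classes are hypothetical, and a real browser decides what to cache based on HTTP caching headers rather than storing every page forever.

```python
class Server:
    """Stand-in for a web server that logs every request it receives."""

    def __init__(self):
        self.log = []

    def get(self, page):
        self.log.append(page)
        return f"<contents of {page}>"


class CachingBrowser:
    """Stand-in for a browser that reuses pages it has already fetched."""

    def __init__(self, server):
        self.server = server
        self.cache = {}

    def view(self, page):
        if page not in self.cache:
            # Only a cache miss produces a request the server can log.
            self.cache[page] = self.server.get(page)
        return self.cache[page]


server = Server()
browser = CachingBrowser(server)

# The visitor's real path, including two "back"-style revisits.
path = ["/index.html", "/products.html", "/index.html", "/products.html", "/checkout.html"]
for page in path:
    browser.view(page)

print(len(path), "pages viewed,", len(server.log), "requests logged")
```

The visitor viewed five pages, but the server log holds only the three first-time requests, so the revisits are invisible to any log-based report.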
IP Address as Unique Identifier: Since the IP address
is always available to the web server log file, one would think this
would be a good way to identify unique visitors. This is not so,
because proxy servers are frequently used to pass requests for
information to web servers. The result is that many different users are
identified by the same IP address.
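A small made-up example shows how far off an IP-based count can be. The "person" column below is hypothetical ground truth that a web server never actually sees; it is included only to compare against the count the log would give.

```python
# Hypothetical (client IP as logged, actual person) pairs.
# The first three people share one proxy, so they share one IP address.
requests = [
    ("203.0.113.50", "alice"),
    ("203.0.113.50", "bob"),
    ("203.0.113.50", "carol"),
    ("198.51.100.7", "dave"),
]

unique_ips = {ip for ip, _ in requests}
actual_visitors = {person for _, person in requests}

print(len(unique_ips), "unique IPs for", len(actual_visitors), "actual visitors")
```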
This issue and the proxy and browser caching issues are the
most serious disadvantages of using web server log files as a data
source. Estimates of the information lost have been pegged at 40 percent
or higher. If your web analytics provider relies on log file analysis,
this would translate to a traffic undercount of 40 percent or more for
your website.
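To make the arithmetic concrete: if a fraction of requests never reaches the log, the logged count is the actual count multiplied by one minus that fraction. The sketch below applies the 40 percent figure to a hypothetical logged count of 6,000 page views; the function name and the count are assumptions for illustration.

```python
def estimate_actual_traffic(logged_count, loss_fraction):
    """Estimate real traffic from a logged count and an assumed loss rate."""
    if not 0 <= loss_fraction < 1:
        raise ValueError("loss_fraction must be at least 0 and below 1")
    # logged = actual * (1 - loss), so actual = logged / (1 - loss)
    return logged_count / (1 - loss_fraction)


logged = 6000  # hypothetical page views found in the log
estimated = estimate_actual_traffic(logged, 0.40)
print(round(estimated))
```

Under that assumption, 6,000 logged page views would correspond to roughly 10,000 actual page views.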
Upfront Costs: When using a web server log file
analyzer, you must purchase all the software, hardware, and expertise in
advance. This differs from the application service provider (ASP) model
used for client-side data collection, where you pay a monthly fee.