Your web metrics are all wrong, and they'll never be all right!

You have a web site, so what? Four years and more than 100 site analyses later, I've come to the conclusion that web analytics will never be an exact science, even though its influence on decision making keeps growing. Some issues are inherent in web data analysis and make it difficult to get accurate, insightful data, and most people are not aware of many of them. The purpose of this article is to make you aware of these pitfalls so you can account for their impact or fix them. This week, we'll look at some of the ways your data can be skewed by the very technology it seeks to leverage.

Below, I list some of the major technology issues that may prevent you from having 100% accurate web traffic data, and I illustrate how each can affect your business.

1. AOL Proxy servers

"When a member (AOL user) requests multiple documents for multiple URLs (web pages, PDFs, etc.), each request may come from a different proxy server (a different IP address). Since one proxy server can have multiple members going to one site, webmasters should not make assumptions about the relationship between members and proxy servers when designing their web site."

Implication on your business: Because each request from a single AOL member may arrive from a different IP address, one visit that spans several pages can show up in an IP-based report as several visitors who each loaded a single page. Conversely, because one proxy can carry many members, several real visitors can be merged into one. When I see a lot of single-page loads of the home page, I start considering minor changes to get people to click further, so incorrect data like this can lead you to make the wrong decisions. Reports that quantify the number of visitors, or unique users, can be highly inaccurate if you do not account for or fix the AOL proxy server issue.

Many corporations and some ISPs also use proxy servers, but they rarely pose a problem, because relatively few users reach web sites through non-AOL proxy servers. For AOL, however, it is a huge problem: AOL drives up to 50% of the traffic on some sites I analyze.

2. Random spiders

This is good news! Using a tool that automatically recognizes and filters out automated spiders brings you closer to reporting, analyzing, and making decisions on 100% correct data, which, as I mentioned earlier, should always be a goal, even though it is an elusive one.

Here's the bad news: any programmer can create a spider and send it to your site. There are thousands of unknown spiders, and some of them are crawling your site and inflating your web data as you read this article. These unknown spiders usually get past the filters in the average web analytics tool because they don't identify themselves as spiders; instead, they appear as regular users.

E-mail harvesters are one example of random spiders. An e-mail harvester is an automated program built to traverse the web looking for e-mail addresses to add to its database and spam later. Ever wonder how a spammer got your e-mail address? One popular method is through automated spiders like e-mail harvesters.
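The two kinds of spider filtering discussed in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how SurfAid or any other product actually works. The `Hit` record layout, the spider tokens, and the threshold of 30 pages in 60 seconds are all hypothetical values chosen for the example. The first layer is a User-Agent check, which only catches spiders that identify themselves. The second is a page-rate check: it flags any visitor that loads more pages in a short window than a person plausibly could, so it can catch some spiders that pose as ordinary browsers.

```python
from collections import defaultdict
from dataclasses import dataclass

# Illustrative, incomplete list of substrings that self-identifying spiders
# put in their User-Agent header. Real lists are far longer and change often.
KNOWN_SPIDER_TOKENS = ("googlebot", "slurp", "msnbot")


@dataclass
class Hit:
    """One page load from a log file (hypothetical record layout)."""
    visitor: str      # visitor key, e.g. a cookie ID or an IP address
    timestamp: float  # seconds since the epoch
    user_agent: str


def is_known_spider(hit: Hit) -> bool:
    """Layer 1: catch spiders that announce themselves in the User-Agent."""
    ua = hit.user_agent.lower()
    return any(token in ua for token in KNOWN_SPIDER_TOKENS)


def too_fast_visitors(hits, max_pages=30, window_seconds=60.0):
    """Layer 2: return visitors who loaded more than max_pages pages
    within any span of window_seconds (a sliding-window count)."""
    times = defaultdict(list)
    for hit in hits:
        times[hit.visitor].append(hit.timestamp)

    flagged = set()
    for visitor, stamps in times.items():
        stamps.sort()
        start = 0
        for end in range(len(stamps)):
            # Move the window's left edge until it spans <= window_seconds.
            while stamps[end] - stamps[start] > window_seconds:
                start += 1
            if end - start + 1 > max_pages:
                flagged.add(visitor)
                break
    return flagged


def filter_hits(hits, max_pages=30, window_seconds=60.0):
    """Drop known spiders first, then visitors whose page rate looks automated."""
    remaining = [h for h in hits if not is_known_spider(h)]
    suspicious = too_fast_visitors(remaining, max_pages, window_seconds)
    return [h for h in remaining if h.visitor not in suspicious]


if __name__ == "__main__":
    sample = [
        Hit("reader", 0.0, "Mozilla/5.0"),
        Hit("reader", 45.0, "Mozilla/5.0"),
        Hit("crawler", 3.0, "Googlebot/2.1"),
    ]
    # A disguised spider: browser-like User-Agent, 40 pages in 40 seconds.
    sample += [Hit("harvester", float(i), "Mozilla/4.0 (compatible; MSIE 6.0)")
               for i in range(40)]
    kept = filter_hits(sample)
    print(sorted({h.visitor for h in kept}))  # prints ['reader']
```

Tuning the thresholds is a trade-off. Set them too low and fast human readers are discarded; set them too high and slower spiders slip through. Neither check catches a spider that claims to be a browser and also crawls slowly.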
Implication on your business: Think about it. If you had just started an ad campaign or a new marketing initiative and a significant amount of traffic suddenly came to your site, you might attribute this success to the campaign. In reality, the increase in your data may have been the result of an unknown and unfiltered spider. Worse, if you assumed the campaign was a success, you might extend it and spend more of your budget on it, essentially throwing money out the window.

NOTE: If your web analytics solution does not read log files but instead uses a small piece of JavaScript on each page, you may not have this issue. Spiders typically don't execute JavaScript, so they will not register in your web site analysis reports.

I recently spoke with a representative at IBM who works with their SurfAid analytics program. He explained that their software uses logic to filter out automated spiders automatically: if a user loads a given number of pages within a certain period of time, that user can be filtered out. IBM's SurfAid team also tracks the growing list of spiders and updates the software to filter them out. This is the first program I have seen that recognizes the importance of automatically filtering out suspicious activity, which, left in, can lead to highly inaccurate data.

3. Frames

Implication on your business:

4. Flash & Dynamic sites
On many Flash and dynamic sites, the URL does not change as the user navigates. No matter how many different pages a user views, it appears as if they loaded the home page over and over again. Fortunately, not all Flash and dynamic sites are programmed this way, but many still are, and for those sites true analysis can be difficult and sometimes impossible. Analyzing conversion metrics, ROI for various marketing campaigns, top entry points into your site, user paths through the site, fall-off rates, and many other essential Internet business metrics is usually not possible without paying for additional programming changes to correct this one-page site dilemma.

5. Sharing secure certificates

(I don't mean a user who goes from http://www.mysite.com to https://www.mysite.com, but one who goes from http://www.mysite.com to https://secure057.notmysite.net.)

When secure transactions happen on someone else's server, where you are "sharing" a secure certificate with others, you do not have access to that log data.

Implication on your business: Getting this data from a shared hosting environment could prove costly, and may even be impossible, depending on how flexible your web site host is.

The #1 question some of you may have asked after last week's article is:

Here's my answer: You are already halfway there!

1. Use cookies or unique user logins

Using cookies with your analysis tool may take some time to configure, but stick with it. It has the added benefit of alleviating the proxy server issue. I feel much more confident in web data when cookie technology is used in conjunction with a web analytics tool. You can go one step further and use unique user logins, but unfortunately this isn't always feasible. However, if you are analyzing the effectiveness of an intranet site, where users have to log in to get access, unique user logins will give you data that is even more accurate than cookies.

Remember: Some cookies will occasionally be rejected, or deleted by cookie washers (software designed to clean all cookies off a computer), but the vast majority won't be, so your data should be fairly solid.

2. Educate yourself on the basics of programming

Learn just enough about web programming to be able to explain the pitfalls to your programmers. Being able to "talk the talk" will help your programmers avoid some of the issues that come with dynamic and framed web sites. If your site uses Flash, explain to your designer that the URL in the address bar should change each time a link is clicked. You don't need to understand how this gets done, but you do need to be able to describe these kinds of changes in language your programmers and designers will understand.

3. Analyze your data for anomalies

4. Focus on what matters

Having a tool that aggregates a basic but highly important statistic for auditing purposes is very helpful. It helps ensure that your data is accurate, and that the methods used to collect it are sound and can be replicated.

6. Get an experienced Internet metrics analyst

· The programmer understands what technologies can be implemented to provide accurate data that the business feels comfortable using to make decisions. They fully understand the implications of web site design on web analysis.

As someone who has analyzed log files for over four years, I still have to refer to my checklist of things to look out for when performing analysis. The technology is always changing, and every site is programmed differently.

Bonus: