How we count “visits”
This article was updated February 3rd, 2014.
Our pricing is partially based on the number of monthly visits to your site, so we’d better have an accurate definition of “visits.” This is an interesting question anyway, because it’s one of the primary web analytics metrics. But it’s harder to define than it seems.
There are two fundamental questions:
- How should a “visit” be defined?
- How do you measure “visits” in practice?
Defining a “visit”
Let’s just write down some events that we think should and shouldn’t be a visit:
- When a human being first arrives on the site and loads the page, staying there for 31 seconds, that’s a visit.
- If that same human then clicks a link and sees another page, that’s not a new visit; that’s part of the same visit.
- If that same human doesn’t have cookies or JavaScript enabled, all of that should still count as one visit.
- If that same human loads the site with a different browser, that’s still not a new visit; that’s part of the same visit.
- If that same human bookmarks the site, then 11 days later comes back to the site, that is a new visit.
- When a robot loads the site (like a Google or Bing search bot), that’s a visit, but if one robot scans 100 pages quickly, that’s one visit. (You might disagree that a robot is a “visit,” but consider that from a hosting perspective, we still had to process and serve all those pages just like it was a human being, so from a cost or scaling perspective, bots count the same as humans.)
- If a robot scans 20,000 pages over the course of a month, that’s not just one visit. It shouldn’t be 20,000 visits, but neither should it be 1. Something in the range of 100-1,000 visits is acceptable.
- There are additional cases too where the “right thing to do” is less clear. For example, take the case of a “quick bounce.” Suppose a human clicks a link to the site, then before the site has a chance to load the human clicks “back.” Does that count as a visit? Our servers still had to render and attempt to return the page, so in that sense “yes.” But a human didn’t see the site and Google Analytics isn’t going to see that hit, so in that sense “no.” Because we need the notion of a “visit” to correspond to “the amount of computing resources required to serve traffic,” we round off in favor of saying “yes.”
So rather than attempting to write down an exact definition of a “visit,” we’ll just say that whatever it is, it has to be consistent with all the notions written above.
Exception: We do NOT count “image visits” towards traffic charges.
There’s a special kind of “visit” as defined above which we do NOT count towards your account. This is a visit which hits only static content (usually an image), but doesn’t hit a normal page on your site.
This is common with things like not using a CDN, getting hot-linked, Twitter campaigns, and embedding images into email campaigns. While this does represent real traffic to your site, and real cost on our side to serve it, we also appreciate that sometimes this is out of your control, and that it’s less expensive for us to serve static content than it is to serve dynamic content.
If you get a lot of this sort of traffic, we’ll reach out to you to understand what’s happening, and see if we can work together to create a solution that doesn’t involve so much traffic, such as enabling our CDN, getting you signed up for a service like CloudFlare, moving content to a content service like S3, and so forth. But we won’t charge you extra.
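To make the distinction concrete, here’s a minimal sketch of how a visitor’s requests could be classified as a static-only “image visit.” The extension list and input shape are illustrative assumptions, not WP Engine’s actual rules:

```python
# Illustrative only: this extension list is an assumption, not WP Engine's
# actual classification of static content.
STATIC_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".css", ".js", ".ico", ".svg"}

def is_image_visit(paths_hit):
    """Return True when every request from this visitor was for static
    content, i.e. the visitor never loaded a normal (dynamic) page.

    `paths_hit` is a list of request paths for one visitor, e.g. parsed
    out of the access log.
    """
    return all(
        any(path.lower().endswith(ext) for ext in STATIC_EXTENSIONS)
        for path in paths_hit
    )
```

A hot-linked image, for example, produces a visitor whose only request is something like `/wp-content/uploads/photo.jpg`, which this check would exclude from billing.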
Measuring a “visit”
This is where things get tricky.
It’s tempting to say “Whatever Google Analytics says is the ‘number of visitors’ in a month, that’s the number of visits in a month.” But it’s clear that this metric does not satisfy the definition above. GA doesn’t measure bot traffic or “quick bounces.” And GA would double-count the case of a human using two browsers or (sometimes) who has cookies disabled.
We also need something clear and simple, so it’s trivial to compute and easy to analyze when the behavior isn’t what we expect.
So we’ve settled on this metric:
We take the number of unique IP addresses seen in a 24-hour period as the number of “visits” to the site during that period. The number of “visits” in a given month is the sum of those daily visits during that month.
Does this satisfy the conditions above?
- Yes, because the first arrival comes from an IP address we haven’t seen that day, so it counts as one visit.
- Yes, because the next page load comes from the same IP address, so it won’t be counted again.
- Yes, because we’re not using cookies or JavaScript or any other feature of the browser.
- Yes, because it’s tied to the network, not the browser, so a second browser doesn’t create a second visit.
- Yes, because we reset our notion of “unique IP address” every day, so a return 11 days later is a new visit.
- Yes, because robots and humans are treated the same; both have an IP address, and 100 quick page loads from one robot are still one IP that day.
- Yes, because a robot uses the same few IP addresses, so its hits are consolidated within one day but count again each subsequent day, putting a month-long scan in a reasonable range.
- Yes, because even on a quick bounce we’ll see the hit in our logs.
This does mean there are some cases where you could theoretically argue we’re counting visitors too often. For example, a person visits a site from work, then drives home and visits the site again later that day. That will count as two visits because the IP addresses will be different. But we’d argue: (a) that doesn’t happen much, (b) it’s not terribly unreasonable for that to count as two visits, and (c) those events are counter-balanced by times where we count only one visit where really it’s two.
As an example of that last point, what if two people in the same office visit a site from two computers? That should be two visits; even Google Analytics would count it as two. But we count it only as one because their IP addresses (from our perspective) are the same. So the cases where we count too few are counterbalanced — to the first approximation anyway — by those where we count too many, and therefore we think this is still a fair metric.
Thanks for your nice explanation, Jason. I’ve been having conversations with WPEngine engineers about this for the last month and this matches what I’ve been told, with a few additional explanations, for which I’m grateful.
One issue that is perplexing to those of us running WP sites is the number of “unsolicited” and “useless” bots that parasitically survive by feeding on the “Twitter ecosystem.” Many of these bots are experimental, hit us from multiple (sometimes -many-) IP addresses (most frequently based on Amazon AWS) and may spider hundreds or thousands of pages very quickly. They place great stress on our site (or on WPEngine in this case) and since they do not identify themselves in many cases (User-Agent:), we cannot perceive any value in serving pages to them. All they do is hit pages and walk all around the site hitting page after page, which is the most expensive kind of object to serve, and they generally do not hit graphics at all, so it’s clear they are not “real browsers” (human beings).
So, when we tweet something and include a link to our WPEngine site, we may be hit with hundreds of bot requests within 10 to 30 seconds. Almost all unsolicited. And all of which end up “costing us” (and WPEngine of course) by adding to the load on the site.
Operating my own WP servers, which I have done for years, I see this kind of behavior in logs and I firewall these guys out when they do not properly identify themselves or do not obey robots.txt. That is a constant, intensive, and ongoing battle. But on WPEngine I 1) don’t have logs, 2) can’t firewall them, and 3) they cost me incrementally if they hit from enough IP addresses on enough days. And they end up kicking me and my customers into a higher service tier.
My thought has been to firewall out all “bot-like” requests that do not present a useful User-Agent: … and perhaps that is a way of applying pressure to these guys to get them to properly ID, but it would have to be done on a larger scale to make a difference. Otherwise they just move to more IP addresses and change their User-Agent: to look more like a real browser (which some of them do).
There is a further issue, which is that if someone wants to “attack” a site because it espouses views they do not like, one way to attack it is increase (useless) traffic in such a way to cause it to scale up its number of servers and/or its level of service until the sponsor can no longer afford the cost. This is called an “Economic Denial of Sustainability” attack or EDoS attack. I can give you several examples of this from my personal experience, since we deal with clients who have been attacked in this fashion. If WPEngine has not seen this yet, you may rest assured that at some point you will. (It depends a bit on your client base, of course.)
You are clearly (all) thinking about this, but what are your ideas specifically about all of these seemingly-useless bots?
Hey Sky,
I wanted to follow up on this question. In general, what you do to attract traffic varies site by site, and of course that’s an independent decision that WP Engine does not monitor. We want to make sure all our customers can employ their preferred strategies to grow site traffic, whether that’s via Twitter strategies or otherwise. Bots, in any case, will be a part of managing and growing a site.
However, with that in mind, this is a question best suited to directly engaging our sales team, which I understand you may have already done. I’m going to close comments on this particular post because I want to make sure that questions like this one, which may be relevant to the particulars of one site or another, can be fully addressed directly one-on-one.
If you have more followup questions, please let us know at [email protected].
Thanks again!
-Austin Gunter
This is an interesting approach to a problem with a lot of technical and social pitfalls. Initially, I was concerned that you would miss a tremendous number of visits where users were behind a gateway.
But, I went back to check, and the EFF’s Panopticlick (https://panopticlick.eff.org/) browser fingerprinting dataset did not see a huge number of visitors behind gateways, and their audience is skewed toward the technical end of the spectrum.
From the PDF: “We saw interleaved cookies from 2,585 IP addresses, which was 3.5% of the total number of IP addresses that exhibited either multiple signatures or multiple cookies.”
Thanks for sharing.
Thanks for clarifying what a “visit” is. I was curious about the metric you used since pageviews are very different than visitor counts in Google Analytics, for example, but that might even be thought of as a new visit since each pageview requires additional resources.
If someone is browsing exactly when the 24-hour period restarts, there’s the chance their unique IP will get logged twice, correct? (Edge case, but I thought I’d ask).
If you exceed your visitor count of the personal plan for a single month, are you automatically moved to the professional plan?
Is the visitor count displayed anywhere in the dashboard so you might have an idea when to upgrade plans?
This is a very reasonable, and well-articulated, policy. (The anti-example would be the Google Maps rep I asked about what constitutes a map view; I felt like I was speaking Venusian trying to get even an approximation so I could budget.) The key point is you’ve got no motivation for nickel-and-diming.
Whether it is “correct” is not terribly important.
It is important that it is something that is
a) measurable (by you)
b) consistent (doesn’t give different results for similar traffic)
c) relatively easy to understand (from a customer perspective)
d) relatively consistent with how other people measure “visits” (to give it some credibility)
Google Analytics fails on a) (since you don’t have access to Analytics data for customers).
Your previous measure, where you used something akin to “server request” (if I understood right) was wildly off the mark on c).
This new measure seems reasonable.
If you then start giving customers more than just a single figure (“Usage – Last 31 Days”), say daily numbers and at least a yearly history in addition to the monthly number, that would be even better.
With the added benefit that customers could compare it to their own stats from e.g. Analytics, which should show similar trends even if the exact numbers are different.
How do you count visits that come through a proxy server, such as those used by large companies and ISPs? In those cases, doesn’t each computer on the network appear to the web server as if its traffic originated from a single IP address?
The answer is in the post:
“As an example of that last point, what if two people in the same office visit a site from two computers? That should be two visits; even Google Analytics would count it as two. But we count it only as one because their IP addresses (from our perspective) are the same.”
My guess is that many of your customers are bloggers, most of whom monetize their content through advertising. The ones that are making money from their blogs understand the importance of the term RPM, revenue per thousand. So if that is a term that everyone understands, why not just price the service in a similar manner? Instead of pricing by visits, why not price by pageviews? As a publisher, I should be most concerned with three things: creating good content, the RPM of that content, and the cost of serving those pages.
I found that the “visits” stat I know about is not the same as WP Engine sees it when I first contacted sales. This article helped a lot in understanding how “visits” are counted and I can see the logic in all arguments, but when I come to the point of deciding to go either way, the “hidden” visits (as mentioned by @Sky) are coming out of left field for most of us.
I became aware of those when testing CloudFlare for the first time and noticed a drop in my stats. At first there was a bit of panic, but then I realized that it was actually doing the job it should: blocking those unsolicited and potentially harmful visits, which saves resources and speeds up site performance.
I did not use it for long due to other conflicts with my WP site setup, but I keep thinking about getting back to it, especially in tandem with WP Engine, so that the total visits count will reflect the true human visitor count.
It might be better if WP Engine puts a system in place that will allow customers to filter / firewall these extra visits so that they do not get counted as they are today. This blog post is great and should be linked to the visits number in the pricing plan so people are more aware of it – I was surprised to learn the first time that my GA visits stat is one or two tiers lower than what I actually need 😉
The main problem with this situation is that you can’t have a stable, predictable notion of your costs if you can’t track the hidden visits metrics.
Does that make sense?
A very good explanation of what a visit is. I think using IPs as a metric is a fair deal. Thanks
One issue that concerns me is the number of useless bots (anything not related to search) that scan WP sites. I don’t perceive any value in serving pages to them. And they are coming from all over the world. Do you firewall out these useless requests?
I hope this is easy to answer. It’s clear that you guys must measure the visits on our sites to see if we are going over the limit of our account or not. So: is there anywhere we can look up the number of visits to our site in our WPE account? I know you must have the numbers, but I don’t know if we can easily see where we are at in a given month. If there is no way for us to access this in our account, let me be the first to suggest this as something you guys start doing…
Thanks!
Gerard
Hey Gerard! You can find the visits for each of your sites in the User Portal.
https://my.wpestaging.qa/
Hi there,
How about bots that no one wants on their sites? Are those considered visits?
1. Bots that are not associated with search engines?
2. Brute force attack bots . . . Hundreds of thousands of IP addresses are used now for brute force attacks. All automated. NONE of them real visitors. Are they considered a real visit?
3. Bots from countries where I don’t want visitors? Bots from counties that are known for trying to exploit websites in an automated fashion.
4. Bots from compromised IP addresses looking to compromise more sites and IP addresses.
Lastly, and I don’t mean to sound ethnocentric here, but I have an English-language site. I want English-language speakers on my site. That’s my market. I don’t mind if real people from other countries visit my sites. But I don’t want or need bots from any other country visiting my site. I don’t need them and I don’t want them, because 99% of these bots are simply looking for exploits and clean IP addresses to do more exploiting and hacking and stealing.
Regards,
Jeff
I think a lot of us in the comments are looking at the WPEngine visitors metric as a “this is how many people are coming to MY site” number, when the metric is really about:
1. Your plan allowance
2. Server load
The metric is from WPEngine’s perspective, on their hardware. It is not a measurement you show to advertisers for example.
It’s interesting because my stats in WPEngine are 3723.0/day whereas my WordPress.com stats are around 100 which include page views and my Google Analytics are around 500 as they count unique visits not pageviews.
I noticed when using the staging URL for WordPress.com stats, my numbers were 4 times higher than what they are for my main site.
Why is there such a huge difference between WPEngine stats and the others?
While I’m happy you have this long explanation here, it still doesn’t make a lot of sense. I have 2,555 visits per day on WP Engine and 500-600 real humans on the site daily inside Clicky, and maybe a few more than that in Google Analytics. In the error logs I see tons of spammers trying to hit (join / login) pages that don’t exist. Is there a better way you can break down that extra 2,000 visits a day? Maybe a breakdown of bots vs. crawlers vs. humans vs. spammers?
Hi Lauren,
That’s a great feature request. And in fact, we do now offer that breakdown in the User Portal (http://my.wpestaging.qa). The “Overages Report” has lots of helpful and actionable data: mobile vs. non-mobile traffic; breakdown of bot traffic by User Agent, country, etc; breakdowns of normal traffic and feed traffic by those same lines; your most common static and dynamic requests; etc. It’s an awesome tool, and we hope it helps!
What if we experience a DDoS attack or something where zombie machines continue pinging the site? This is similar to Lauren Albrecht’s question about spammers, but the answer of being able to see the visitor breakdown doesn’t give much of an answer on dealing with that.
Thanks.
Thanks for your question Vinny. We have some scripts that look for attacks like this and block them. Initially those single IPs may get counted as one visit, but if they hit the URL over and over they won’t be counted again. We also know when a DDoS is happening, and will contact customers when their site is being targeted in an attack. Hope that helps! Kirby
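For illustration, the kind of check Kirby describes can be sketched as a simple per-IP threshold over a day’s worth of requests. The threshold and input shape here are assumptions for the sake of the example, not WP Engine’s actual scripts:

```python
from collections import Counter

def flag_flood_ips(hits, threshold=1000):
    """Flag IPs whose request count in one day exceeds a threshold.

    `hits` is a list of client IPs, one entry per request, as might be
    extracted from a day's access log. Under the unique-IP metric an
    attacking IP counts as at most one visit regardless of how many
    requests it makes; flagging lets sufficiently noisy IPs be blocked
    outright. The threshold value is an illustrative assumption.
    """
    counts = Counter(hits)
    return {ip for ip, n in counts.items() if n > threshold}
```

The key property matches the policy in the reply above: repeated hits from one IP never add visits, and extreme repeat offenders can be identified and blocked.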
I appreciate that you need to determine visitors in a way that makes sense to you, but I do have some concerns.
Firstly, there are a lot of bots on the net that no one (except AWstats) counts as visitors, and it seems from other commenters in this post that there is a significant difference between what Google (for instance) counts as a visitor and what you do.
But more significantly, I was wondering how you treat DDoS attacks, which can come from very large botnets. If someone decides they don’t like my website, they can cost me a huge amount of money by directing thousands of bots to my site over a number of days. If a 20K botnet attacks, it would overwhelm the smallest plan in a day.
Or can it?
Hi John,
I can definitely sympathize with your concerns, and we thank you for taking the time to share them.
There are obviously a lot of analytics tools out there. Google Analytics, the gold standard, is a great tool – we won’t argue. GA counts the humans to whom you can serve ads; bots, for example, are ignored because you can’t advertise to a robot… cue Robot Apocalypse music. So GA is designed as an advertising tool; it’s not intended to reflect the actual hits that the server processes, or the amount of resources needed to serve them. The server access logs provide us with authoritative data that truly reflects the load on the server, data not available from any other source.
We’re never surprised when GA numbers differ from the stats drawn from server access logs. It’s a discrepancy that every host must navigate. While we’d love to work out an analytic that a) correlates to GA by some predictable ratio, while b) still accurately reflecting server load, we simply haven’t found that metric yet.
In the event that your site is under attack, you can trust us to defend you and not penalize you financially for having been targeted. We’re flexible enough that we can approach each customer’s issue case-by-case, and react fairly and accordingly.
I hope that addresses your concern. Let me know if there’s anything else we can do.
Alex
This looks like a great opportunity for you to add a kind of robots.txt brute-force protection service with a built-in honeytrap. Much more innovative and attractive than playing a numbers game.
We have a handful of visitors and a shedful of bots. I am put off by your visitors policy as we have a fixed budget. We have to stay away from WP Engine while it is bots that are determining our plan.
Shame.
Hi Dave,
We definitely hear you re: your desire to block unwanted/abusive bots – and of course not pay for them. You’ve raised a great question, and I’ve attempted to break it down below.
There’s a big difference between: a) legitimate bots that are critical for your search engine rankings; and b) abusive bots that hog resources, threaten your site, and rack up visits. Our own Tomas Puig dives into the importance of the former in his post here: https://wpestaging.qa/2013/10/22/robots-may-feelings-eyes/. For the latter, our Support Team can help you out with IP blocks and other security recommendations. In fact, we already have automated systems in place that protect you from a subset of known malicious bots.
For a typical site on our platform, bot visits might account for 10-20% of the total visits, with a vast majority from legitimate search engines. Which means in most cases, unwanted bots represent a negligible portion of the traffic you’re paying for.
Fortunately, our monthly traffic analysis breaks down that bot traffic by country, User Agents, assets crawled, etc. In the event that you’re seeing patterns that you’d like to block or investigate, please don’t hesitate to reach out to our Support Team.
Our method of counting bot visits is actually more forgiving (for the customer) than a bandwidth measure, as an alternative example. Bots may crawl hundreds/thousands of pages very quickly, demanding far more server resources than a human. We regularly see patterns where legitimate bots may account for about 10% of a client’s “Visits” count, but they account for 40% of the hits/requests on that site. A bandwidth measure (or pageviews or similar method) would charge the client far more for those bot visits than does our count of unique visitors.
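The arithmetic behind that comparison is easy to check. This sketch plugs in illustrative figures consistent with the pattern described above (bots around 10% of visits but 40% of hits); the numbers are examples, not actual WP Engine data:

```python
def bot_shares(human_visits, bot_visits, hits_per_human, hits_per_bot):
    """Compare a bot's share of billing under a unique-visitor metric
    vs a hits/pageview-style metric. All inputs are illustrative
    assumptions, not WP Engine figures.
    """
    visit_total = human_visits + bot_visits
    hit_total = human_visits * hits_per_human + bot_visits * hits_per_bot
    share_of_visits = bot_visits / visit_total          # what we bill on
    share_of_hits = bot_visits * hits_per_bot / hit_total  # a hits-based alternative
    return share_of_visits, share_of_hits
```

With, say, 9,000 human visits at 5 pages each and 1,000 bot visits at 30 pages each, bots are 10% of visits but 40% of hits, so a hits- or bandwidth-based price would charge roughly four times as much for the same bot traffic.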
Please let me know how I can help in the future. And thanks so much for sharing your feedback.
Alex
Howdy,
I have looked at WPengine in the past and was put off by many of the same issues as have been listed here repeatedly (most especially the issue re: unwanted bots).
This little snippet from your reply to Dave’s concern above was invaluable:
“…our Support Team can help you out with IP blocks and other security recommendations. In fact, we already have automated systems in place that protect you from a subset of known malicious bots. […] Fortunately, our monthly traffic analysis breaks down that bot traffic by country, User Agents, assets crawled, etc. In the event that you’re seeing patterns that you’d like to block or investigate, please don’t hesitate to reach out to our Support Team.”
This should be emphasized on your visit policy page.
For many non-techy users, it is not obvious that such solutions are possible. Making it crystal clear that unwanted bot traffic can be mitigated and/or prevented once it has actually occurred might allow many people to feel more secure in their cost forecasting.
I myself chose Pagely over WPengine based in large part on this unwanted bot traffic point, but generally because their service seemed to give me a more predictable cost picture. However, due to some ongoing technical issues, I am (again!!!) on the hunt for some awesome hosting.
The ability to work with Support to investigate and block or otherwise treat unwanted traffic (bots/countries/etc) is a game changer. I wish that this point had been made clearer back when I first looked at WPengine.
Cheers,
Max
(seoactivist)
Love your service, love the performance, couldn’t live with the ambiguous pricing given the odd definition of “visitor”. If you guys ever go to a CPU usage pricing or something a bit easier to understand, I’m back hosting with you guys! I just can’t host when I’m paying for what “might” be bots or Google images.
Hey Adriel, we’re going to be updating the way we count visits in a blog post this week. Stay tuned!
-Austin
Will my visiting my own webpage register as a visit? Not that it matters in the larger picture of thousands of visits, but I’ve always been curious if the software excludes the count when it’s the owner/admins page.
Hey Steven, if you hit your site, it’s hitting the server, so yes you would count as one IP address in a 24 hour period.
I’m experiencing more than a 10x discrepancy between page views in GA and “visits” in WPEngine.
My site gets comparatively little traffic, with less than 3,000 page views for the previous month, yet somehow I have exceeded my WPE plan, with more than 32,000 “visits” logged in my dashboard, costing me extra money in overage visits!
Not sure how this is possible. I’ve lodged a support ticket to try and get it sorted out.
Hi Damien,
Our Support Team will take care of your support ticket as soon as they can.
In the meantime, you can see a very detailed analysis of your visitors by going to the User Portal “Home” tab and clicking “Overages Report” in the “Overages Last Period” box on the right side.
– Kirby
Hi
I love WP Engine, but because of this policy perhaps I have to move to Pagely, who do not count pageviews, for the same price (though without Git and staging). But for the personal plan I suppose few people use those.
I really hope you can change this setup.