Referrer data and keywords
When you click from
https://yoast.com, your browser tells the website you went to (yoast.com in this case), where you came from. It does this through an HTTP header called the referrer. The referrer holds the URL of the previous page you were on. So if the previous page you were on was a search result page, it could look like this:
If you clicked on that search result, and came to yoast.com, I could “parse” that referrer. I could check whether it holds a q variable and then see what you searched for. This is what analytics packages have been doing for quite a while now: they keep a list of websites that are search engines and then parse the referrer data for visits from those search engines to obtain the searched for keywords. So your analytics rely on the existence of that referrer to determine the keywords people searched for when they came to your site. And this is where a search engine moving to HTTPS starts giving some trouble.
HTTP, HTTPS and referrer data
The HTTPS protocol is designed as such that if you go from an HTTPS page to an HTTP page, you lose all referrer data. That’s necessary because you’re going from an encrypted to an unencrypted connection and if you’d pass data along there, you’d be breaking the security. If you go from HTTP to HTTPS or from HTTPS to HTTPS, this is not the case and the referrer is thus kept intact.
So if all search engines were on HTTPS and your site wasn’t, you’d never get keyword data. The solution for that is simple though: move your website to HTTPS and you’d suddenly have all your data back. This is the case with Bing’s HTTPS implementation: if you search on it and go to an HTTPS page from their results, the keyword data is all there, as you’d expect.
Google’s not provided
“But, but, but” I hear you think: would moving to HTTPS get me all my Google keywords as well? No. Google is doing some trickery when you click on a URL, they actually redirect you through another URL so that the site you visit does get referrer data (showing that you came from Google). They hide the keyword though, as they say that’s private data. Even if you think they’re right that keywords are private data, the wrong bit about what Google is doing is that they are still sending your keyword data to AdWords advertisers. I’ve written about that before in stronger words. If they were truly concerned about your privacy they’d hide that data too.
I’d argue, in fact, that Google is breaking the web more than Bing here: even though I’m going from HTTPS to HTTP, Google is telling the website I visited that I came from Google. It shouldn’t. That’s just wrong.
Is this “right” in the first place?
I’ve been thinking a lot about this. Of course, as a marketer, I love keyword data. I love knowing what people searched for, I love being able to profile based on that. But is it right? Let’s compare it with a real world case: say that you’re shopping in a mall. You leave store A, and they put a sticker on your back. You enter store B and the shopkeeper there takes the sticker from your back and can see what you looked for in store A. You would argue against that, wouldn’t you? Now if you walk from section to section in a store and the shopkeeper can see that and help you based on that, there’s arguably not that much wrong with that.
Of course there’s more to this, in real life a shopkeep can see you, your clothes, your behaviour etc. And of course, shopkeepers target on that too. Targeting always happens, perhaps it’s just that people should be more aware of this. In quite a few cases, it might actually be deemed helpful by the user too.
Another thing I should mention here is EFF’s HTTPS everywhere project (of which I used the logo in the top of this post), which helps you use HTTPS on websites that have HTTPS for users but don’t default to it.
Should we all go to HTTPS with our websites?
Now that Bing has launched its HTTPS version (even though the vast, vast majority of their users still get the HTTP version by default as you have to switch to it yourself), it makes even more sense to move your website to HTTPS.
Here at Yoast.com we’ve always had every page that contained a contact form and our checkout pages on HTTPS and everything else on HTTP. The reason for this was that HTTPS was slower than HTTP and we’d rather not put everything on HTTPS because of that. Google’s recent work on SPDY actually negates most of that speed issue though, if your hosting party supports it. You can find the hosts we know well and recommend here.
There’s another issue with mixed HTTP / HTTPS websites: they’re horrible to maintain when you’re on WordPress because WordPress mostly sucks at it. When you’re on an HTTPS page all internal links will be HTTPS and vice versa, which is annoying for search engines too.
So we’ll be changing, moving everything to HTTPS somewhere in the coming weeks. My suggestion is you do that too. If we’re all on HTTPS, we all get referrer data from each other (for now at least), we get keyword data from search engines like Bing that play nice and we get a more secure web. I’d say that’s a win-win situation. I’d love to hear what you think!