Trouble In the House of Google

Let's look at where stackoverflow.com traffic came from for the year of 2010.


This is a companion discussion topic for the original blog entry at: http://www.codinghorror.com/blog/2011/01/trouble-in-the-house-of-google.html

I’m pretty sure I would buy a stress ball shaped like your head.

Anecdotally I had the same problem the other day. Horror of horrors Bing was better in the end!

This is a really interesting article and reflects what I have been feeling for a while, that the relentless and exponential rise in SEO activity would eventually start to affect the usefulness of Google.
In a weird way it feels analogous to when people say that Windows gets more virus attacks because it has a bigger audience than other operating systems, rather than necessarily being less secure (I’m not saying it isn’t).
In other words, perhaps Bing has an advantage here because it is less targeted by SEO activity, the same way that non-Windows operating systems don’t suffer the same level of virus attacks?

You’re confusing a model with reality: Google isn’t gravity, they’re trying to model it using their ranking algorithm. And as your experience clearly shows, it is currently modelling gravity (i.e. the “reality” of which sites are relevant and which aren’t) badly.

To stay in your metaphor, it’s clearly time for a paradigm shift, a kind of Einstein of web ranking algorithms. Sounds like a very interesting thesis topic. cstheory.SE, anyone?

I recall that for a while mirror sites started edging out Wikipedia. Google apparently added a bonus for Wikipedia just to force it to the top over its clones. Is it reasonable to expect them to do that for everyone? I don’t know.

This is exactly the reason why I supported webmasters.se so adamantly. Where else could someone go to get great expert help for problems like this?

I’ve often wondered about the case of Wikipedia, and how scrapers that simply syndicate it are quickly penalized. I think Google has some sort of ‘special intervention’ when it comes to Wikipedia. I just hope they afford the same luxury to Stack Exchange.

I have been frustrated with this in recent months also. A scraping website has recently copied the top posts from my blog verbatim without even attributing the original. Google somehow ranks the copies with a higher page rank, causing a significant drop in my stats.

The worst thing is that you are mostly powerless in this situation; who do you complain to, Google?

I was recently shown an application for blind testing search engines ( http://blindsearch.fejus.com/ ), and was surprised to find that Bing and Yahoo often delivered better results than Google.

There are search terms that do not list Wikipedia content on the first page, but scraped content as the first result, e.g. http://www.google.com/search?q=elvett+semic

I have tested it from Germany, if it makes any difference.

It is because, unlike with Wikipedia, people don’t link back to Stack Exchange. They just want the info and then close the tab, so the original content provider doesn’t gain anything.

But we should try to find out why the spammers are doing better than you. What are they doing better? You should ask them.

In a TechCrunch post (http://techcrunch.com/2011/01/01/why-we-desperately-need-a-new-and-better-google-2), Vivek Wadhwa said the same thing. Trouble over Google’s head?

This is a really interesting article. More and more recently these sites are taking priority, but I have only seen it with Stack Overflow related content. I always end up ignoring them and trying to find the Stack Overflow link, as ultimately the number of adverts shows you the content didn’t originate on that site. But as other people have said, I think it would be interesting to find out why these sites are taking priority.

Wait - “broken” doesn’t mean “broken beyond repair”, let alone that somebody else would be able to provide a better fix.

At first sight this is merely a problem of Google neglecting to use a specific bit of information in their filtering, namely who copied content from whom. It won’t be possible to do this automatically with 100% accuracy, but I guess you can mostly fix the problem like this: take all content from two webpages as prepared for indexing (that is, basically the plaintext conversion), compare them for similarity, and when they are found to be similar, downgrade the ranking of the one that appeared most recently (which is not so very easy to determine).

The basic problem I see is that even with smart optimization techniques (e.g. limit comparisons to pages with similar sets of keywords) the similarity testing required probably won’t scale. However, there are ways around that too (e.g. don’t calculate it at indexing time but spread the calculation over queries at querying time, to make ranking progressively smarter), though I have no idea if Google’s infrastructure allows stuff like this.
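
To make the idea above concrete, here is a minimal sketch in Python, not a description of how Google or any real search engine works. It assumes each page carries a `first_seen` timestamp and a base `score`; both fields, the `Page` class, the word-shingle Jaccard measure, and the `threshold` and `penalty` values are hypothetical choices for illustration. The pairwise loop is exactly the part that would not scale to the whole web.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    text: str        # plaintext as prepared for indexing
    first_seen: int  # hypothetical: when the crawler first saw this text
    score: float     # hypothetical base ranking score


def shingles(text, k=3):
    """Return the set of overlapping k-word windows in the text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |intersection| / |union|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def downgrade_copies(pages, threshold=0.8, penalty=0.5):
    """Penalize the later-seen page of every near-duplicate pair.

    Compares every pair of pages (quadratic in the number of pages),
    applies the penalty at most once per page, and returns the pages
    sorted by their adjusted score, best first.
    """
    sigs = {p.url: shingles(p.text) for p in pages}
    penalized = set()
    for i, a in enumerate(pages):
        for b in pages[i + 1:]:
            if jaccard(sigs[a.url], sigs[b.url]) >= threshold:
                later = a if a.first_seen > b.first_seen else b
                if later.url not in penalized:
                    later.score *= penalty
                    penalized.add(later.url)
    return sorted(pages, key=lambda p: p.score, reverse=True)
```

In practice the hard part is the one already noted: `first_seen` reflects when a crawler found a page, not when the content was actually written, so a scraper that gets crawled before the original would win under this scheme.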

What we need is a quality measure for search results, and this can only be provided by the user. The best way to collect user feedback is by making the process social.

Google’s previous vote up/down feature was lacking in social interaction. It could, for example, show how many users voted up or down for a certain site.

It’s something annoying indeed. There will always be ways of cheating the system, but I think Google should close the holes at a faster pace.

Google should buy delicious and add relevance based on the number of bookmarks and its tags. =)
When search results show only a bunch of ad-based sites with no real content, I go to delicious, and the same search most of the time brings helpful results.

Thank you for this. For nearly two years I’ve seen an increasing deterioration in the usefulness of Google results and thought that was what was happening to the web in general (“blogging is dead,” for example). The few times I used another search engine, the results were better, and that confused me, given my past experience with how good Google once was. At least now I know it’s not the web in general and it’s not been my imagination: It’s been Google.

I used to work at a search engine company, so I know this is a very difficult problem. I honestly do believe Google is trying hard, but clearly not hard enough, and you have to wonder if it’s partly because they are making money in many cases from those scraper sites (i.e. they have Google ads).

My site, http://www.ausedcar.com, used to be in the top 10 on Google for the very popular keywords “used cars”, but I’ve been knocked back to around #45 over the years simply because everyone in front of me uses black hat SEO. Including major million-dollar companies! The only difference is that Google is not going to blacklist a major company, while they probably would blacklist me, so there is essentially nothing for the little guy to do other than hope Google will give him some scraps off the table.

The fact that you simply can’t do a product review search on Google anymore without getting spam is serious trouble for Google. Unfortunately Bing seems to copy the same algorithm; if they were the “anti-spam” engine, they could gain real ground.

There’s one thing that I never understand from critics of Google (especially in the search domain). When you say:

“Still, looking at the statistics, it’s hard to avoid the obvious conclusion. I’ve been told many times that Google isn’t a monopoly, but they apparently play one on the internet. You are perfectly free to switch to whichever non-viable alternative web search engine you want at any time. Just breathe in that sweet freedom, folks.”

What do you mean?

Do you expect that a better search engine will just materialize out of thin air? What are you expecting? That perhaps Google should fund their competition? Maybe they should hold back a bit on improving their technology and let the other poor souls catch up?

One thing I dislike very much is when people feel entitled, like it’s their birthright, to something they should work hard for. Have all the free internet services gotten you into the mindset of a spoiled consumer? This is not how an entrepreneur thinks, and this kind of thinking is not going to make the world go round.

Yes, you are right, you are being “churlish”. Show me a better search engine and then we’ll talk.