Trouble In the House of Google

It seems like search result quality went down with the recent real-time search update. Maybe things like real-time Twitter search beating Google made them rush out a non-bulletproof real-time algorithm. Most likely this will improve with time, as long as search is still the highest priority at Google.

Google is not gravity; it is a part (only a part) of the mechanics of natural selection (Darwin is The Dude, not Newton). The ecosystem has changed, which is all too predictable, but the law (force?) is still there. That something else will have to be added to the picture, possibly on top of Google - probably inside it too - is pretty obvious: we are still at the rehearsal phase of the Web. And Google still has a lot to show on the infrastructure front. Also, we need to think differently about our own data, and we will.

But I have to say, agreeing with you, that it is not the semantics; the secret is still in the syntax. Syntax is our stuff.

Maybe it’s time to use blekko?

Just thinking out loud, but Google did make some changes this year… adding Caffeine and all.

I wholly agree about the poor recent performance of Google’s search results. I was recently looking for binaural audio (I have my doubts, but was curious), and the results Google returned were garbage, with many results hosting identical content. The scammers are currently winning, or Google has failed to continue implementing its core concept.

That being said, in the search realm, this seems similar to what happened in the earlier days when Lycos promoted itself as having more indexed pages than any other search engine. That was a relatively easy benchmark to beat, so Lycos died rapidly. If someone else comes along with a much better search algorithm than Google at this stage, they just might gain momentum like the early Google did. Or, more likely, the wealthy Bank of Google will buy them out.

Great post, and a great idea, Iraê, to add Delicious into the algorithm.

I agree that we should be using other search engines, and I also like comparison engines such as http://blindsearch.fejus.com/, but I believe it will take a huge upheaval to change the behaviour we have learned as a species over the last 10 years on the scale needed to alter the statistics above.

Google has fallen foul of its own algorithmic success and is at a stage where diversification seems to be their strategy, rather than adding other methods for improving results. Personally I can’t see Google winning, and I am keen to see what new players will come into this space. I believe more complex algorithms that search multiple resources and think/calculate longer before results are returned could be the way to go.

While annoying when it happens, this is really nothing new. Several years back, Google’s search results were inundated with parked domains serving only ads and other such useless pages. Google eventually cleaned up their algorithm. Sometimes their ranking system is gamed; other times they may make changes that reduce the quality of rankings.

I considered switching to a new search engine, and before I could find a better one, they fixed the problem.

Complain and then give them a chance to improve. It’s a well-established cycle that works very well for them.

I’ve been noticing for some time now that the results have not been as good as they were, and I regularly find myself trying more and more detailed terms or other search filters to help find what I’m looking for; often it is not in the first 3-5 results.

Right now, my default engine is Bing, and I’m finding better results there for some queries, but they haven’t indexed as much of the web as Google has, so Bing’s results are good but still need more time to develop.

Google, on the other hand, has been getting less and less reliable, and I’m finding more and more content scrapers showing up in the top results. From what I can tell, I’ve been noticing the decline in quality ever since the Mayday update last year. Since then, the quality of results for longer-tail stuff has been a lot more inconsistent, and there seems to be a lot more non-relevant data showing up in longer-tail searches.

Also, I don’t like Google trying to show me results before I’ve even typed in what I want. If I’m looking for a new car and type ‘new cars’ into Google, they give me ‘netflix’ for the first letter (n), ‘netflix’ again with the second letter added (ne), ‘New York Times’ when I type the third letter (new), ‘New York Times’ again when I add the space (new ), and if I add the first letter of the next word I still don’t get what I’m looking for: ‘New Century Bus’ for (new c), ‘New Carrollton Metro Station’ for the next letter (new ca), and it still doesn’t get the search right even when I type the last letter. So now I’ve typed ‘new car’ and the instant results are for ‘New Carrollton’, with nothing about new cars coming up. If I look at their suggestion list, the result I actually wanted shows up as #3 on the list, but the problem is that I typed the exact term I wanted to search for and, instead of what I wanted, I got some other search that isn’t even close to relevant or of any quality for me.

With this in mind, I personally think their instant search is a joke, a waste of time, and a big nuisance: I got 7 different search results for what I typed in, and not one of them was actually relevant or what I was looking for. What a big mistake Google made in releasing what I call their ‘Crystal Ball’ search, where they try to predict what you want but don’t do a very good job of it.

Great blog post. I can’t imagine what effect an increase in that 88.2% would have if Google makes a better model.

Jeff,
Do you continue to interact with Matt Cutts on this matter? Matt’s opinion would be most interesting and most convincing.
You have not provided any material evidence - any particular search, etc. - to prove what you said in this article. Do you have any statistics? I admire StackOverflow, and I know that you are a well-known person in Web/programming circles, but lately it has become very fashionable to attack Google.

A somewhat related article (no real data either) was published on TechCrunch at http://techcrunch.com/2011/01/01/why-we-desperately-need-a-new-and-better-google-2/

I’m pretty sure there will be related discussion on Google Buzz, hopefully with Google employees, including Matt Cutts, at
http://goo.gl/6eVTw
http://goo.gl/xCDsD

Thanks

Over the last year or so I’ve noticed Google’s results getting worse. I basically taught myself design and front-end development by googling. Now, when I try to google the most basic of searches, I have a hard time finding the good content that was once right in front of me.

I’ve had to type more detailed searches, use the timeline features in the left sidebar, and I’ve started using Delicious as a search engine for web-related stuff more and more.

An article I read last night mentioned blekko.com (oftentimes it’s much easier to find what I’m looking for there) - I tried it out a few months ago, but added it to my bookmarks bar recently and have been getting used to it over the last few days.

I’m glad other people are starting to notice Google’s bad results; hopefully the more people talk about it, the more Google will work on improving and getting back to the search results of a year or two ago, when I was actually able to find stuff… quickly.

I also noticed ‘stack scrapers’ recently and figured you guys would be a little upset. It’s not their users creating the content, so why the hell should they make money piggybacking off of you? But yes, if someone scrapes your content, the original content should always be placed first in the results.

Good luck, hope you guys get it resolved and hope I get my google back.

I don’t think we should be surprised by this. Google might say that they’re in business to make the world a better place, but let’s be honest… they have a responsibility to their stockholders to be as profitable as possible, and I believe that letting some of the scrapers move to the top of the list can only be padding the bottom line at Google. After all, how many of those scraper sites have placed AdSense ads on their sites?

So, let’s see:

  1. Google’s primary income comes from AdSense ads.
  2. StackOverflow doesn’t have AdSense ads.
  3. efreedom has Google AdSense ads.

If you were Google, which site would you want people to go to?

You do the math.

Perfect example: I just searched “Android tablet” and clicked on the Google News tab. The first link I’m offered is for a site, TMCnet.com. It talks about Toshiba’s new tablet being launched later this year, but throughout the article it constantly links back to the “REAL” sources of the article, Engadget and CrunchGear.

So how does an article by TMCnet.com, which is basically regurgitating what the two more legitimate sources are saying, JUMP AHEAD of the actual sources? How is that possible? Interestingly, I didn’t find any Google AdSense ads on their site, so that’s not the motivation in this case.

Yes, converging feelings about social vs. algorithmic search; I would say it is a recurrent rhetoric now at each year’s end.

At last year’s passage (2009/2010) we had predictions and high praise of “real-time search”, which was supposed to make Google bite the dust.

Where is real-time search now? Where is real-time social search, for that matter? What was the point of it, and what were the results?

I have long held that Search (i.e., Google) would eventually bend to an anti-network effect, where the SEO gamers would eventually win and smaller search engines would flourish. I’m fond of DuckDuckGo right now. However, Google has gotten its edge back several times over the last few years, and I wouldn’t count them out. I agree that Google is currently losing, and I recommend people try the blind search tests themselves.

I believe Google’s strength will come in personalized search results, and I don’t think Google is using personalization as much as they need to. If I bypass all the other links and go direct to StackOverflow every time, it would seem that - for me - this should work itself out quickly. I would be interested in whether your experiments with StackOverflow were with “clean accounts” or crusty accounts like mine, where the searcher is a known technologist.
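
(For what it’s worth, here is a minimal sketch of the kind of personalization I mean, assuming hypothetical result objects with a base_score and a domain; the names and weighting are purely illustrative, not anything Google has described.)

    from collections import Counter

    def boost_by_history(results, clicked_domains, alpha=0.5):
        # clicked_domains: domains this user has clicked through to
        # from past result pages. The more often I pick a domain
        # (say, stackoverflow.com), the more it floats up for me.
        counts = Counter(clicked_domains)
        total = sum(counts.values()) or 1

        def personalized_score(r):
            return r.base_score + alpha * counts[r.domain] / total

        return sorted(results, key=personalized_score, reverse=True)

With something like this, bypassing the scrapers and going straight to StackOverflow every time really would work itself out quickly - for me.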

Regarding the earlier comment about categorization: Google’s Caffeine architecture, and the search results currently being returned, show a high amount of diversity. They’re clearly categorizing and showing “best in category” in the front-page results. This effect is positive for a number of kinds of search, but negative for “tight searches” (like ‘iphone 4 covers’, where you might end up with an iPad cover taking a slot due to a diversity algorithm, to the point where only 4 or 5 of the delivered results are tight).
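
(To make the diversity idea concrete, here’s a rough sketch of how a per-category cap could crowd tight results off the front page; the Result shape and the cap value are my own assumptions, not Google’s actual method.)

    from collections import defaultdict

    def diversify(results, max_per_category=2):
        # results: a relevance-ordered list of objects with a .category
        # attribute (a stand-in for whatever internal taxonomy the
        # engine uses). Capping each category keeps the front page
        # varied, but on a tight query it demotes relevant items
        # that happen to fall past the cap.
        seen = defaultdict(int)
        picked, demoted = [], []
        for r in results:
            if seen[r.category] < max_per_category:
                picked.append(r)
                seen[r.category] += 1
            else:
                demoted.append(r)  # pushed down, not dropped
        return picked + demoted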

An improvement I would like to see in Google is a Google Labs experiment with a prominent “more like this” button. I’d rather do my first search and drill down, and it’s clear Google has the categories and pre-calculated math to do so.

If I were out to game Google right now, I’d be building a very human-like browser (or using Mechanical Turk) to search for terms and click on my links. I suspect Google has greatly raised the priority of link clicks in their reputation scheme, and gaming that system wouldn’t be terribly hard. The benefit of blending in personalization is that, hopefully, I don’t look like most Mechanical Turks.

@Konrad: “To stay in your metaphor, it’s clearly time for a paradigm shift, a kind of Einstein of web ranking algorithms.”

That’s not staying in any metaphor, bud. It’s all over the place. =)

I’d written about the intrinsic flaw of algorithmic search a while back; this may be of interest:

http://lesswrong.com/lw/28r/is_google_paperclipping_the_web_the_perils_of/

Not meant as an insult, but it’s very, very difficult for Google to decide if a site is a content farm, a ripoff, or “valid” content. How should Google decide whether a link at StackOverflow comes from an SEO idiot or is valid? Whether a link on del.icio.us/digg/reddit is valid or simple SEO?

But I’d agree that Google should react faster. In particular, ripoffs that do not conform to the CC license could be detected (at least mostly) automatically.
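
(One plausible way such detection could work, sketched below: compare word shingles of a suspected ripoff against the original; near-total overlap on a CC-licensed page with no attribution is a strong signal. The shingle size and the 0.8 threshold are guesses on my part.)

    def shingles(text, k=8):
        # Overlapping k-word windows; copies share most of these.
        words = text.lower().split()
        return {' '.join(words[i:i + k])
                for i in range(max(len(words) - k + 1, 0))}

    def copy_score(original, suspect, k=8):
        # Jaccard similarity of the two shingle sets, in [0, 1].
        a, b = shingles(original, k), shingles(suspect, k)
        if not a or not b:
            return 0.0
        return len(a & b) / len(a | b)

    # copy_score(original_text, suspect_text) > 0.8  =>  likely ripoff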

A social approach is the solution, but it MUST be designed so that it cannot be gamed. The best way to do that is to allow me to vote search results up or down, to let me blacklist/whitelist sites, and to OPTIONALLY include my friends’ blacklists and whitelists.

It is this component of including friends, i.e. people I already trust, that ensures that it won’t be gamed. If a friend of mine tries to game me, he/she won’t be my friend for long. So it’s self-policing.
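
(A minimal sketch of what that could look like, with made-up names: each user carries a whitelist and a blacklist of domains, my own lists are absolute, and my friends’ lists merely nudge the score.)

    def rerank(results, me, friends, friend_weight=0.5):
        # me / friends: objects with .whitelist and .blacklist domain sets;
        # results: objects with a .domain and a .base_score.
        def score(r):
            if r.domain in me.blacklist:
                return float('-inf')   # my own blacklist is absolute
            s = r.base_score + (1.0 if r.domain in me.whitelist else 0.0)
            for f in friends:          # friends only nudge, never veto
                if r.domain in f.whitelist:
                    s += friend_weight
                if r.domain in f.blacklist:
                    s -= friend_weight
            return s
        return sorted(results, key=score, reverse=True)

The key design choice is that friend lists are weighted, not absolute: a single compromised or careless friend can only nudge my rankings, never take them over.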