Trouble In the House of Google

I’m a member of StackOverflow, but I’ve gotten into the habit of using Google to search for results on StackOverflow. When I remember, I scope the search to just that site with the site:stackoverflow.com operator.

But, as with your Amazon search experience, I’d just search StackOverflow directly if I felt its search results were as good as Google’s. (I haven’t compared recently.)

Moving away from content-based ranking feels scary to me. I’d rather have things stay as they are. Or, if you will, give the social search engine as an optional approach to enrich the algorithmic search.

I feel the problem with Google, however, is not the algorithms but the absence of essential information that can no longer be ignored; i.e., Google has to stop presenting results as a veritable shopping list and seriously consider introducing categories into its search engine.

This much was attempted by Cuil, and it was my favorite feature of that otherwise failed attempt at producing an alternative to Google. Backed by the kind of intelligent algorithms Google is capable of, scrapers wouldn’t be able to avoid being moved into their own category, away from normal searches.

<conspiracytheory>It seems to happen every year in the lead-up to Christmas, when companies need to rank well (and will pay for AdWords if they aren’t ranking).</conspiracytheory>

I remember very well that I asked some friends if they’d noticed how Google results deteriorated the day Google Instant search was deployed.
Another clear difference is the way Apple SDK documentation no longer appears among Google search results (although in that case it may very well be Apple’s decision).

This may be overly simplistic, but could the magic dial Google needs to turn simply be how much the date something was published adds to the ranking?

On most searches, say for “lady gaga”, they should return the content with the most current date.

On a search for “binding a select list with MVC.Net”, the scraped site is going to have a more recent date but should be ranked lower.
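A minimal sketch of what that dial could look like, assuming a hypothetical scorer where the weight given to freshness depends on the query category. All names and numbers here are illustrative, not anything Google actually does:

```python
import math
import time

def score(relevance, published_ts, freshness_weight, now=None, half_life_days=30.0):
    """Blend content relevance with recency.

    freshness_weight is the 'dial': near 1.0 for news-style queries
    ("lady gaga"), near 0.0 for evergreen technical queries
    ("binding a select list with MVC.Net").
    """
    now = now or time.time()
    age_days = max(0.0, (now - published_ts) / 86400.0)
    # Exponential decay: 1.0 for content published today, 0.5 after half_life_days.
    recency = math.exp(-math.log(2) * age_days / half_life_days)
    return (1.0 - freshness_weight) * relevance + freshness_weight * recency

# With the freshness dial turned down for a technical query, a fresh
# scraper copy with lower relevance loses to the older original:
original = score(relevance=0.9, published_ts=time.time() - 300 * 86400, freshness_weight=0.1)
scraper = score(relevance=0.7, published_ts=time.time() - 2 * 86400, freshness_weight=0.1)
print(original > scraper)  # True
```

Turning the same dial up (freshness_weight near 1.0) flips the outcome, which is the behavior the “lady gaga” example wants.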

Either way, this problem seems to show up when using Google to research technical solutions more so than other things.

I link to StackOverflow quite a bit in my Buzz. I wonder if Google has thought of using Buzz as an input? :)

Google has been giving terrible results for product searches for a long time now. I tried to research a new dishwasher a few years ago and Google was a mess.

Google’s main problem right now is that they think they can stay ahead of the black hat SEO with algorithms. Take it from somebody who used to work in the anti-spam and anti-virus world … you can’t! You need to use more than algorithms, in particular you need to use feedback from your users.

Google has a huge database of data on its users. Google knows I’ve been using Google and Gmail for years, that I’m a real person, that I use Google search dozens of times a day, and that I’m pretty technical. If Google had a “this is spam” or “this is not useful” button, they would have millions of pre-validated curators to help them filter results.
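A rough sketch of what “pre-validated curators” could mean in practice: weight each spam vote by a per-account trust score derived from account age and activity. The trust formula and thresholds below are made up for illustration:

```python
def trust(account_age_years, searches_per_day):
    """Crude, illustrative trust score: long-lived, active accounts count more."""
    return min(account_age_years / 5.0, 1.0) * min(searches_per_day / 10.0, 1.0)

def spam_score(votes):
    """votes: list of (account_age_years, searches_per_day, voted_spam) tuples.

    Returns the trust-weighted fraction of voters flagging a result as spam,
    so a botnet of brand-new accounts moves the needle far less than a
    handful of long-time Gmail veterans.
    """
    total = sum(trust(age, spd) for age, spd, _ in votes)
    if total == 0:
        return 0.0
    flagged = sum(trust(age, spd) for age, spd, voted in votes if voted)
    return flagged / total

# 100 fresh sockpuppet accounts flagging spam barely register against
# 3 established users who didn't flag it:
sockpuppets = [(0.01, 1, True)] * 100
veterans = [(8, 20, False)] * 3
print(spam_score(sockpuppets + veterans))
```

The same weighting also blunts the obvious counterattack (mass-registering accounts to bury competitors), which is the gaming concern raised elsewhere in this thread.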

I’ve been seeing a ton of efreedom.com & questionhub.com data dumps of stackoverflow outranking you all through the holiday break.

I don’t think “social” is the answer to better search results. In fact, I think it makes it worse.

  • First of all, there is no reason to believe that spammers cannot manipulate a social search engine. Just look at sites like Digg.com where at one point people were getting paid to blindly vote content up.

  • Even with educated, moral users and the spammers removed, social search ranking would most likely turn results into popularity indicators, and the most popular result is not necessarily the best one.

  • It is questionable in the first place whether users would rank search engine results at all. They just want the result.

I’m hoping this recent problem can be solved by an algorithm, as I place more trust in that than in mankind itself.

The scrapers are probably doing a lot of SEO optimization. It is time for Stack Overflow to hire some SEO services. Wikipedia doesn’t monetize in any way other than donations, whereas Stack Overflow does display ads of its own, so why not hire someone to do SEO and stay on top?

As I understand it, you’re talking about scrapers that try to cheat Google’s algorithms (if the algorithms change, the cheating will eventually evolve), but you’re blaming Google and not the scrapers??

What about spam? Whose fault is that?

(just to be clear, I see google in this case more as a victim than as a villain)

I hate to play devil’s advocate here, but I’ve noticed that efreedom.com, one of the SO scrapers, actually provides significantly better search and related-question generation than SO does. There’s a fine line between being a leech and being a value-added content aggregator.


Google has always been about organizing oceans of chaos into something manageable and searchable. Lately they’ve failed in two respects. First, their searches are finding more noise and less signal. Google is supposed to be chock full of the best minds on the Internet, and their algorithms are being beaten soundly. Not only has SO been squeezed out of the rankings, but I found Christmas shopping online to be much worse this year than last, because what I wanted was buried under crap search results. Maybe they’ve stopped trying or caring about search.

Secondly, Google is the 800 lb gorilla of the Internet. If they wanted, they could simply crush anything that opposed them. I’ve been dying for a “never show me results from this site again” button in Google’s search results. One click and the offensive scrapers go away.

Bing, on the other hand, seems active in tweaking and refining results…

I’ve been dying for a “never show me results from this site again” button in Google’s search results.

Indeed. A content-based algorithmic search with the addition of user tools would be the way to go. I’d really like to manage my own search results, and I don’t mean by voting for links, as Google experimented with some time ago in their social search services. That won’t solve the problems and will introduce new ones (like social engineering or regional/cultural encroachment).

The current model is growing stale, and the “market” of content consumers is becoming less relevant in Google search results. That relevance was once the great novelty of Google and what elevated them to their present status.

Reminds me of Ben Croshaw’s comments on user-created video game DLC: “…and don’t tell me user ranking is the answer, because anything that references Naruto will automatically get five stars.”

@Clintp Interestingly, that “never show me results from this site again” button used to be in Google’s results but isn’t anymore. I miss it! Meanwhile, we can try this extension: https://chrome.google.com/extensions/detail/ddgjlkmkllmpdhegaliddgplookikmjf

I’ve noticed Google results being gamed for at least the last six months. Too many sites devoid of actual content are being listed at or near the top of the results. software.informer.com was the first I noticed, but it’s only gotten worse.

Lately I’ve switched to duckduckgo.com and msdn for any technical searches. MSDN even includes stackoverflow results!

This is an interesting twist on the software monoculture problem. By being the overwhelming favorite, Google gives spammers a single target to focus on.

On the other hand, unlike viruses, there are no inherent platform differences that prevent spammers from tweaking their content scrapers to poison other search engines as well. So if Bing were to gain more market share its results would likely start to drown in noise as well unless they have some secret sauce (or armies of content reviewers) to defend the walls against the barbarian hordes of spammers.

The problem with adding a button that says “this web site is useful” or “report abuse” is that it shifts the battle to gaming that metric instead.

The other problem is that, from the algorithm’s perspective, information is information; who cares if some content scraper serves up information copied from someone else, as long as you the searcher get the answer you’re looking for? In some cases, the sites in question could be legitimate mirrors. When discussing flaws in the ranking algorithm, this goes to the heart of how you phrase the question: assuming there are no flaws in the software, it is probably performing as intended, and this is essentially a garbage in/garbage out problem. Unfortunately, there is a lot of garbage on the Internet.

I do find that if I search for a product I end up getting dreck back, but on the flip side it is dreck offering to sell me the product, which is a reasonable assumption. The search for iPhone cases mentioned above is a good case in point.

I will admit to being somewhat of a Google fanboy, but then I also have a lot of patience and am willing to venture as far as page 20 in search of useful results. I’m also making friends with Google Shopping, not to mention continuing to read magazines; I find that helps me find products to search for.

Personally, I’ve never had a problem searching for technical data, but I understand that if you’re producing it, this could be an issue given the amount of dreck and scraping that is served up.

I’m sure they’ll get on top of it; otherwise people will vote with their feet, so to speak.