Trouble In the House of Google

Jeff,
Out of the 88.2%, how much of the traffic was previously browsing stackoverflow, then went to Google to run a search, then came back to stackoverflow?

I know frequently when I’m looking for an answer to a question I use Google search rather than the stackoverflow search because it works well (for me) and I’m familiar with how Google behaves. I think the SO search probably works fine, but it’s not what I’m familiar with. Also, very often I don’t know whether I want to search stackoverflow or serverfault, so again Google works better. I know each site has its intended audience/purpose, but very often there are nuggets of information that exist on the opposite site. By altering the “inurl:” parameter on Google I can quickly look at a particular site.
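A minimal sketch of that trick, assuming you just want to flip between the sibling sites without retyping the query; the search terms and domains below are only examples, and Google’s site: operator does the same domain-restriction job as inurl: here:

```python
from urllib.parse import urlencode

def site_restricted_query(terms, domain):
    """Build a Google search URL restricted to a single domain via the site: operator."""
    return "https://www.google.com/search?" + urlencode({"q": f"{terms} site:{domain}"})

# Swap the domain to hop between the sibling sites without retyping the query.
print(site_restricted_query("nginx reverse proxy timeout", "stackoverflow.com"))
print(site_restricted_query("nginx reverse proxy timeout", "serverfault.com"))
```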

Also, within the 88% aren’t there people that Googled “stackoverflow” rather than typing it into the address bar? Technically Google would be the traffic source on these, but I wouldn’t categorize them the same.

@Alexandronov:

Anecdotally I had the same problem the other day. Horror of horrors Bing was better in the end!

Hmm… Why “horror of horrors”? :confused: Shouldn’t it rather be: Good thing Bing was better in the end?

I honestly wonder why Google has never implemented a rating system - one where users could rate search results. Perhaps something like a scale ranging from A (“exactly what I was looking for”) to F (“Yuck! Obvious scraper/scammer/spammer! - never even show me anything from that domain again!”)

Ok, I am a blekko.com employee (and a former Google employee :slight_smile: so take my comment with a grain of salt. But one of the reasons I joined up with Blekko is that I believe Rich Skrenta’s fundamental tenet that “algorithmic” search was always a hoax. (He doesn’t state it like that, but it is what the reasoning comes out to :-).

Basically “algorithmic” search is code for looking for “signals” (which is code for an HTML construction that indicates value or intent) and applying those signals to a list of possible results. Back in the way-back times, that meant everyone had a ‘links’ page of sites they thought were the coolest/best. Google could scrape those, infer the intent, and then rank based on linkage (the original Backrub algorithm).
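A minimal sketch of that linkage-based ranking idea (a PageRank-style power iteration; the toy link graph below is made up purely for illustration and is not the actual Backrub implementation):

```python
# Toy link graph: who links to whom. Invented purely for illustration.
links = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
    "d.example": ["c.example"],  # d only links out; nobody links back to it
}

def link_rank(links, damping=0.85, iterations=50):
    """PageRank-style power iteration: a page's score is fed by the pages linking to it."""
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for page, outlinks in links.items():
            share = rank[page] / len(outlinks) if outlinks else 0.0
            for target in outlinks:
                new_rank[target] += damping * share
        rank = new_rank
    return rank

for page, score in sorted(link_rank(links).items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```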

The Achilles heel of algorithmic search is this: “What if you have people who lie?” Which is to say that an algorithm cannot tell, a priori, whether the web page it is scanning, which was written by a human for human consumption, was written “from the heart” (which is to say original content, original expression) or “from the wallet” (which is to say written to meet specific keyword and phrase requirements). Since human labor on the Internet is cheap, an algorithm based on inferring human intent cannot discriminate between “good” humans and “bad” humans.

Blekko’s premise is that people know good content, and that a small fraction of those people are willing to take a bit of time to identify the content that is “best” for a given category. Blekko enables that understanding to be codified into slashtags, which are a community resource. Thus a small fraction of people with good taste can create a much better search experience for everyone.

Of course there remains the question of why evil humans can’t do the same thing to Blekko, by creating their own slashtags stuffed with their ad-revenue-generating content. The answer is that while such slashtags can be created, you as the user decide which (if any) slashtags you want to use to filter your results. If you try a user’s slashtag and find it is full of spammy links, you don’t have to use it; better still, you can use it as an “anti-filter”, which is to say exclude any sites that spammer has in their slashtag from the returned results.
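A sketch of how that filter/anti-filter behaviour could work in principle - the function name and the domain lists are mine, not Blekko’s actual implementation; a slashtag is treated simply as a curated set of domains:

```python
from urllib.parse import urlparse

def apply_slashtag(results, slashtag_domains, anti=False):
    """Keep only results from the curated domains, or, with anti=True,
    drop them instead (the 'anti-filter' described above)."""
    def domain_of(url):
        host = urlparse(url).netloc.lower()
        return host[4:] if host.startswith("www.") else host
    if anti:
        return [r for r in results if domain_of(r) not in slashtag_domains]
    return [r for r in results if domain_of(r) in slashtag_domains]

results = [
    "https://stackoverflow.com/questions/123/some-question",
    "https://spammy-scraper.example/questions-123-answers",
]
print(apply_slashtag(results, {"stackoverflow.com"}))                  # normal slashtag
print(apply_slashtag(results, {"spammy-scraper.example"}, anti=True))  # anti-filter
```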

Unlike the curated directory which Yahoo! pioneered in the ’90s, Blekko crawls the web like an algorithmic search engine, and then sieves the results through a community-constructed quality filter. The goal is a scalable, robust search engine with consistently high-quality results. Content farms and content duplicators have to fool a human to get into a slashtag, and fooling humans, thankfully, continues to be an unsolved problem.

Great discussion by the way on this.
–Chuck

“I greatly admire Google” - Except for that part where for political reasons they tamper with search results. OH! And their usage of data “accidentally” acquired by Street View cars. And how about how for years they collaborated with authoritarian governments like China to censor free-speech.

Definitely admirable. Definitely NOT evil.

I’m looking for useful information on baby monitors. Sadly, SEO devils know this, and have made it very difficult to find anything real.

If you want a nice tour of these crap sites, try searching for “The Summer Infant Best View color video baby monitor is one of the highest rated video monitors”, and use the quotes so you get exact matching on the phrase.

It takes you on a nice tour of content replicators and scrapers. There are automatic variations as well – “highest rated video monitors” becomes “best rated video monitors”, and so forth.

If you break it down to searching for duplicate sentences, I bet you find enormous numbers of duplicators very quickly.
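A crude sketch of that idea, assuming you have the text of the original page and of a suspected copy in hand (the sample strings below are invented): split both into sentences and count the exact overlap - a scraper that reposted the content verbatim will share most of them.

```python
import re

def sentences(text):
    """Very rough sentence splitter: break on ., ! or ? followed by whitespace,
    and ignore short fragments that would match by accident."""
    parts = re.split(r"(?<=[.!?])\s+", text)
    return {p.strip().lower() for p in parts if len(p.strip()) > 30}

def shared_sentences(original, suspect):
    """Return (number of duplicated sentences, total sentences in the original)."""
    a, b = sentences(original), sentences(suspect)
    return len(a & b), len(a)

original = ("The Summer Infant Best View color video baby monitor is one of the "
            "highest rated video monitors. It has a five inch screen and night vision.")
suspect = ("Cheap monitors here!!! The Summer Infant Best View color video baby monitor "
           "is one of the highest rated video monitors. Buy now.")

overlap, total = shared_sentences(original, suspect)
print(f"{overlap} of {total} sentences duplicated")
```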

@Chuck,

I really admire Blekko. It’s been a few years since I started complaining about the vertical shopping list that search engines have become. I always identified the lack of categorized searching as one of the main problems (namely the ability to trim my search results based on categorized information as opposed to search terms), but I always bumped into a wall when I was asked how it would ever be possible for a search engine to categorize the web. The answer was obvious: it wouldn’t be the search engine doing it, it would be the users. I just couldn’t imagine how. Your slashtag solution is tremendously elegant. It’s essentially a categorized search in the hands of the user, allowing them to trust a community effort to categorize search results, but also to create their own, which can be made private if they so wish, for the maximum customization of search results possible.

This is the type of innovation that defined Google back in 1998. An innovation that I no longer expect from this company, which I predict will lose its dominance of the web search engine market sometime in the next 10 years because it is falling prey to the exact same vices displayed by the companies it displaced back in 2000 when it became a phenomenon: Google’s corporate nature slows down new developments, and its commitment to its current winning strategies clouds its vision of the future (and of the present complaints that are starting to emerge). Without competition, Google has been falling behind in user expectations and admiration, to the point of having become a common target of criticism.

I’m not saying, however, that you guys are the solution. I’d hope you are, because your current approach really strikes a chord with how I personally see the web search requirements of the decade that is just starting. The SEO button and the /rank slashtag are also a boon that cannot be overstated. The fact you folks chose the angel funding route also gives me some confidence in the ability of your project to stay afloat in bad weather. As far as I’m concerned, I’ll do my part by using it as much as possible, creating slashtags when needed, and essentially being part of what I hope will become a growing community. As I said, I don’t trust the current solutions to be the ones that will take the web search engine to new heights. They are becoming old and disconnected from their users’ requirements.

I’m really happy Matt Cutts is aware of this problem now, because it is HUGE.

I recently had to 301 redirect two domains to new domain names, due to a copyright issue with the domain name. The domains were trusted, 2 years old, with excellent rankings. Four weeks after the redirect, the rankings were completely gone. Nada. Zilch.

Here’s the kicker. All those scraper sites that loved the old domains so much are now OUTRANKING the new 301’ed domains.
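For anyone in the same boat, a quick way to sanity-check the redirects themselves (the domain and path below are placeholders, not the poster’s actual sites): confirm the old URLs answer with a single 301, not a 302 or a chain, and that Location points at the matching path on the new domain.

```python
import http.client
from urllib.parse import urlparse

def first_hop(url):
    """Request a URL without following redirects; return (status code, Location header)."""
    parts = urlparse(url)
    conn_cls = http.client.HTTPSConnection if parts.scheme == "https" else http.client.HTTPConnection
    conn = conn_cls(parts.netloc, timeout=10)
    conn.request("HEAD", parts.path or "/")
    resp = conn.getresponse()
    location = resp.getheader("Location")
    conn.close()
    return resp.status, location

# Placeholder URL: expect something like (301, "http://new-domain.example/some/old/page")
print(first_hop("http://old-domain.example/some/old/page"))
```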

Everyone is talking about this on the Webmaster World forums. I can’t find stuff as easily on Google anymore. The long tail has been cut off, and the Google rabbit doesn’t know its head from its tail(!) Finding anything complex just doesn’t work like it used to. From looking for new drivers to finding solutions to PHP coding problems, it takes AGES to find what I need.

The relevancy signals are all screwed up. G, I hope you get this fixed, not for my sites but for everyone! Bing is very, very, VERY attractive right now!

@InsomniacGeek put it in very transactional terms, but let me back up a step and suggest that what’s happening is what the world wants to happen, and you simply don’t like it. (I’m not a fan, either, but let’s see if we understand it.)

Just as TV has democratized from the three networks, with programmers guessing what people would want to watch, out to today’s 108 channels being overrun by YouTube.

The news of today is the same story. From a couple of national authorities, plus a local rag that ran the wire stories plus the want ads, blogs such as this have arisen and are being monetized by scrapers and packagers. This is NOT because Google is evil (a separate question); it’s because the efreedoms of the world actually collect repeat hits from viewers who are indiscriminately happy to see a story of interest, and don’t give a damn about cultivating sources, depth, communities, etc. No repeat hits, no eyeballs for sale == no aggregators.

I’ll assert (at least for discussion) that the aggregators are performing exactly the service that the lumpen proletariat wants. And Google is merely delivering on its promise of “relevancy” for the majority of its visitors. Just not what you or I think is “relevant.”

Unfortunately I get the feeling that Google has started to go the Microsoft root - forgetting about polishing the core functionality of the product and just adding bloat upon bloat. The awful site preview ‘feature’ is an example of this - you can’t even turn it off.

I mean route of course :wink:

@Coruskate: wikipedia actually is in (a pretty big) part a “scraping” website, one that aggregates texts that were published elsewhere, indexes them, enriches them, updates them, and so on. This is perfectly legitimate when the original content was Creative Commons (which, by the way, is not always true).

So, by many metrics, wikipedia is “better” for the consumer than the (often obscure, poorly laid out, poorly edited, maybe even advertising infested) websites that originally published the content. Being the original publisher doesn’t automatically qualify you as “better” from the consumer point of view - even if I agree that original content producers should be rewarded.

Another way to look at this is as an incentive issue, rather than an algorithm issue. A large part of the incentive for spammers is Google itself, in the form of AdSense. They could easily reduce spam by having more stringent AdSense publisher guidelines, but of course they won’t, because this affects their (Google’s) bottom line.

Basically Google’s incentives are as aligned with the MFA spammers as they are with the consumer (if not more so), especially given the lack of viable alternatives (although thankfully Bing and Blekko are gradually getting there).

I unfortunately have to second your perception of the decreasing quality of Google results. Just yesterday I needed a SIP VOIP provider, but almost all the results were spam or sites looking very shoddy. I chose the one that seemed to be the least untruthful of them (fonosip.com), but it appears that I was ripped off. I bought credits and registered my SIP device in their network, but cannot make any calls. They do not reply to my e-mails and their twitter account (@fonosip) seems to be pure automated spam to farm links… Guess I just lost 15 USD, but losing faith in Google was much worse than that.

Funnily enough - before Christmas I was hunting down some obscure jquery/ASP.net compatibility bugs, and I was delighted to find so many results on Google related to my problem. Imagine my surprise when it was just one SE post, scraped and reposted over and over and over. And the SE post was actually about 3/4 of the way down the page, so I only found it later on in my search. I then started noticing a LOT of scraper sites popping up higher than SE. Only in the last month or so have I noticed this though.

One reason that Google will produce “worse” SEO-spammed results than other search engines (which nobody seems to have noted in my quick scan) is the obvious one.

Spammers don’t gain jack from trying to optimize their Bing or Yahoo results.

Google has the most impact, and thus it’s the thing they’ll put the vast majority of their effort into gaming.

Maybe we should outlaw the root of all this: affiliate programs. Linkshare, Amazon and other big sellers are causing this since they are giving content farms a raison d’être.

I agree with Robert Osborne and Jasonharrop – Google must use feedback from their users:

  1. Google Toolbar data – the more users visit a certain page or subdomain, the better.

  2. Google Search result clicks – the more users click a search result, the better the destination subdomain is.

  3. Google Search result views – the more users see a result without clicking it, the worse the subdomain is. Ideally, every appearance in the search results should end up in an actual web site visit (if possible).

The only reasonable explanation of why Google does not efficiently do that is … deteriorating quality of management in their search team.
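A toy sketch of how signals 2 and 3 from the list above could be combined into a per-subdomain score (the impression and click counts are invented, and the smoothing constants are arbitrary):

```python
# Invented impression/click counts for two subdomains, purely for illustration.
stats = {
    "stackoverflow.com":    {"impressions": 10_000, "clicks": 4_200},
    "scraper-farm.example": {"impressions": 10_000, "clicks": 150},
}

def clickthrough_score(impressions, clicks, prior_clicks=10, prior_impressions=100):
    """Smoothed click-through rate: results that are shown but skipped drag the score
    down, and the prior keeps low-traffic subdomains from being judged on a few views."""
    return (clicks + prior_clicks) / (impressions + prior_impressions)

for domain, s in stats.items():
    print(f"{domain}: {clickthrough_score(s['impressions'], s['clicks']):.3f}")
```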

I also noticed that search results weren’t quite what I expected lately, so what I did was to set the search to display 50 results per page instead of 10, to be able to quickly scan through more results.
With the right search words and a quick scan I can find what I need, even if not in the first 4-5 results…
I always hope to see better.

I think the problem isn’t the search engine; the problem is that advertising is allowed on such sites. It generates value for no one, especially not the advertising firm.

How about the ad networks get a hold of themselves and stop accepting any webpage as a potential ad spot? If there were a policy that said ads would not be put up on sites that scrape content from other sites just to get web hits, then this problem wouldn’t occur. In fact, ads would become more valuable, so such a policy could be a win-win.