Trouble In the House of Google

As was said way upthread… I see this as a manifestation of the Windows/Mac malware thing. All the bad guys are optimizing their dark SEO for Google, not Bing. If they decided to focus on Bing, and had time to build up the kind of extensive knowledge they now have of Google's internals, the same thing would happen to our bingy buddy.

A thought: does Google factor domain registration age into its algorithm? This seems like a reasonably accurate heuristic for telling original content from scrapers. Obviously a “reused” scraper domain would be a problem.
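
A minimal sketch of how such a heuristic might look, assuming the registration dates are already available from WHOIS or a registrar feed (all domains and dates below are invented for illustration):

```python
from datetime import date

# Toy illustration: given several pages carrying the same content, prefer
# the one hosted on the longest-registered domain. The registration dates
# here are made up; in practice they would come from WHOIS data.
candidates = {
    "original-blog.example.com": date(2004, 3, 15),
    "scraper-farm.example.net":  date(2010, 11, 2),
    "another-copy.example.org":  date(2010, 12, 28),
}

def likely_original(pages: dict[str, date]) -> str:
    """Pick the page whose domain was registered earliest."""
    return min(pages, key=pages.get)

print(likely_original(candidates))  # -> original-blog.example.com
```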

I remember seeing this auction on flippa a while back:
https://flippa.com/auctions/102189/1-iPhone-case-site-11kmonth-profits–2-million–pageviewsmonth

That's the number one search result for “iphone 4 case”, above Apple, above Amazon… crazy. Their auction description gives more detail on their SEO efforts.

I troll AM forums like WickedFire, where I often find insightful threads, like this one about Overstock dominating the SERPs for very generic keywords. Why is the #1 result for “watches”, “luggage”, and “crib sets” Overstock? Aren't there more deserving and relevant results? Are Google and Overstock profit sharing?
http://www.wickedfire.com/shooting-shit/111865-fuck-overstock.html

As a web developer and SEO enthusiast, I’ve been increasingly surprised at how hard it is to find anything on Google anymore. This holiday season was particularly frustrating. After three or four attempts to find a Tiffany’s bracelet, I gave up and went over to Bing, where I actually found several pages of relevant content to choose from.

One of my professors in college told us about some theory (can't remember the name right now) where, as you try to narrow down your hypothesis to get more and more specific, at some point you actually become less and less effective at what you're trying to achieve. He used the visualization of an hourglass: as you narrow your results, you reach a finite point (the apex where the sand drops into the next chamber), after which you get further away from what you're trying to achieve.

To me, this is where Google is right now. They're trying WAY too hard to keep generating revenue while delivering the most personalized, specific results to the user. People are catching on and they're gaming the system, without penalty. The fact is that Google is broken, and it will take a while before it's fixed.

Well, my blog / site is a lot smaller than yours, and I have put zero effort so far into SEO, but I get very little traffic from search. Most of my traffic comes from Twitter, Hacker News and DZone, with DZone being the biggest contributor.

In any event, I do think you’re right about the content farms and other “spam” clogging Google. It seems we need some kind of reverse Turing Test. A person can easily tell a chatbot from a human conversation partner, but can a few cubic miles of MapReduce engines?

Well, unless you count sports reporting, that is. :wink:

http://borasky-research.net/2010/12/30/sure-why-not-five-predictions-for-2011/

I have been disturbed by Google for some time and in actual fact never use it as a search engine. From a hidden program installer on my computer to the new Google Chrome, which outright says it's keeping all, and I mean all, of your information in a cloud. It won't even let you download programs. I also noticed recently that Microsoft gave the makers of a game information about how many players were playing the game for the month as well as how long they played it. It seems to me every iota of privacy is disappearing. Quite frankly it scares me silly that people continue to believe it will do no harm. What if the US government or some other controlling force demands the info be handed over? The same goes for the new Apple patent on the application they intend to put in your phones. There is really only one person you can trust with your personal information, and that is yourself. I am not a scaremonger, but I have seen what harm a dictator can do. The only way to keep yourself and your information safe is to keep it off the net entirely.

When I have a programming problem, I google for ‘{error message} {platform}’, get the useless results, and then google for ‘{error message} {platform} stackoverflow’ and I get a lot of good SO results and nothing else. In that respect, the system works, but yeah - you guys are missing out on a hell of a lot of good traffic.
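
That manual workaround is easy to automate. A tiny sketch of the idea, using Google's standard site: operator instead of just appending the word “stackoverflow” (the function name and example query are mine):

```python
from urllib.parse import quote_plus

def stackoverflow_search_url(error_message: str, platform: str) -> str:
    """Build a Google query restricted to Stack Overflow results."""
    query = f"{error_message} {platform} site:stackoverflow.com"
    return "https://www.google.com/search?q=" + quote_plus(query)

# Example: search for an ASP.NET error only on Stack Overflow.
print(stackoverflow_search_url("NullReferenceException", "asp.net"))
```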

The Internet would be a much better place if we finally deployed some cryptography-based ranking/kudos solution. I don’t know – something like http://www.bitcoin.org/ but for rankings. Something that would make SEO impossible altogether.

Social search will work until the minute it won't work anymore. It won't take long before spammers and scrapers find a way to beat the system, like they managed to beat algorithmic search engines.

It's a lot easier to beat a social system than an algorithmic one. I'm just saying: humans like to see the dancing bunnies.

+1 to personal website blacklists: I’ve felt a need for them for years. If I take the time to click a link and discover that a website is somewhere I never wish to visit again, I would like to be able to leverage that investment.

+1 also to being able to “follow” other people’s blacklists. It would be very dangerous, though, to make the stats about following public: it would probably create a gravity effect towards the most followed blacklisters, who could then become too powerful - and be tempted to monetize that power, as it happens on many social networks. This should be of interest to Google - it would also allow them to have better social graph data.

If enough “trust communities” grow, this should also lower the incentive to game the system, making SEO-only websites less lucrative.

Talking about the scrapers, there are two very different situations here. The first one is where the content is Creative Commons (or similar) and legitimately reproduced. In this case, there is no reason why Google should automatically give the first publisher a better ranking: if one of the scrapers published it in a “better” (whatever the metric) way for the searcher, why shouldn’t it get a higher placement? It would be very nice, though, if Google aggregated the similar pages, as it does in News: it is very annoying to click through several copycat links in a search.

The other case is when someone steals copyrighted content. This should definitely be penalized, and in theory the law should take care of it. Considering the reality of things, though, it would also be very much in Google’s interest to help original content producers protect themselves, giving them an incentive to produce even more good content. Considering how fast and how often they crawl the web, Google could very often find out who really published first, and if there were a meta tag declaring the copyright and license, it could at least warn the original publisher, if not find a way to penalize the thief.
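
Google's internals aren't public, but the “who published first” idea is easy to illustrate. A hypothetical sketch, assuming the crawler recorded when it first saw each page (all URLs, text, and timestamps below are invented):

```python
import hashlib
from datetime import datetime

# Group near-identical pages by a fingerprint of their normalized text,
# then treat the earliest-crawled URL in each group as the probable original.
crawled_pages = [
    ("https://author.example.com/post",     "How to fix the widget bug ...", datetime(2010, 6, 1, 9, 30)),
    ("https://scraper-a.example.net/post",  "How to fix the widget bug ...", datetime(2010, 6, 2, 14, 5)),
    ("https://other.example.org/unrelated", "A completely different article", datetime(2010, 6, 1, 8, 0)),
]

def fingerprint(text: str) -> str:
    """Crude duplicate detector: hash of lowercased, whitespace-collapsed text."""
    return hashlib.sha1(" ".join(text.lower().split()).encode()).hexdigest()

groups: dict[str, list[tuple[datetime, str]]] = {}
for url, text, first_seen in crawled_pages:
    groups.setdefault(fingerprint(text), []).append((first_seen, url))

for copies in groups.values():
    copies.sort()  # earliest first-crawl time first
    original = copies[0][1]
    duplicates = [url for _, url in copies[1:]]
    print(f"{original} looks original; later copies: {duplicates}")
```

A real system would need fuzzier duplicate detection and far more careful timestamping, but the ordering idea is the same.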

I think what this shows is that over the long term, Search as we know it is broken.

The only way out will be “hybrid curation” – basically back to the Yahoo model at the highest level – with algorithmic results (à la Google) into the depths of the curated high-level web sites.

Amazon is a great place to research products even if you don’t shop there because there is curation and great reviews.

[dc]

Well, Matt C did say that Google had taken resources away from some aspects of antispam, and that those would be returning this year.

One of the biggest flaws of Google is that it gives way too much importance to domain names. If you search for ‘iphone 4 case’ you see plenty of websites like iphone4case.com, getiphonecase.com, iphone4gcasereview.com, www.iphone-4g-case.net, www.4iphonecases.com, etc. I don’t understand why the domain name is such an important criterion for ranking search results. That needs an immediate fix.

Interesting article. My responses to a couple of the comments:

First: “[…] where the content is Creative Commons (or similar) and legitimately reproduced. In this case, there is no reason why Google should automatically give the first publisher a better ranking: if one of the scrapers published it in a “better” (whatever the metric) way for the searcher, why shouldn’t it get a higher placement?”

As a consumer I would rather reward (with traffic) the content creators for making knowledge available to mankind than I would the scrapers who have not generated anything new. By doing this, I assume, I am encouraging them to continue to create content - looking at the scrapers instead is less likely to have that effect. So, the scrapers are by definition not “better” and if the metrics think they are then the metrics are broken.

Second point: the categorisation into “social” and “algorithmic” search seems to me terminologically inexact when a key element of the “algorithmic” search is which sites have incoming links from other people. If those links are put there by people, that’s a pretty social algorithm :wink: Perhaps the distinction would better be drawn between “anonymous” and “personal social” search.

@Vasuadiga

Well, it really isn’t the domain name that is influencing the results. Along with that domain comes a legion of SEO techniques that are actually responsible for the website’s placement. There never was, and still isn’t, any reason to believe that the domain name itself factors into a website’s rank, nor would that make any sense. What happens instead is that a domain name like iphone4case.com facilitates the creation of link anchor text that may be more relevant to Google’s algorithms (it is believed that link anchor text is important).

So with a domain like that, the owner is effectively creating a commercial name that reads as “iPhone 4 Case”. Contrast that with the same business had it been named mobileshell.com. When someone links to either business, the link anchor text and surrounding text could read as:

  • Find your iPhone cases at iPhone 4 Case
  • Find your iPhone cases at Mobile Shell

In the first case, both the commercial name and the anchor text accurately reflect the business, whereas the more creative second option produces anchor text that doesn’t. So when searching for the company name, “Mobile Shell” may produce a lot of false positives with links to military or engineering topics, whereas “iPhone 4 Case” will not. On the other hand, when searching for the more generic term “iPhone cases”, the first company is at an advantage because there’s a real chance that the vast majority of anchor text linking to its website includes these exact terms (the plural form is largely ignored by Google).
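
A toy illustration of that anchor-text effect, if you imagine scoring each site by how many of its inbound anchors contain the query as an exact phrase (the link data and scoring are invented; real ranking uses far more signals):

```python
# Invented inbound-link anchor text for two hypothetical sites.
inbound_anchors = {
    "iphone4case.example.com": ["iPhone 4 Case", "iPhone 4 Case", "cheap iPhone 4 case deals"],
    "mobileshell.example.com": ["Mobile Shell", "great iPhone cases", "Mobile Shell"],
}

query = "iphone 4 case"

def exact_match_score(anchors: list[str], query: str) -> int:
    """Count inbound anchors containing the query as a phrase (case-insensitive)."""
    return sum(query in anchor.lower() for anchor in anchors)

for site, anchors in inbound_anchors.items():
    print(site, exact_match_score(anchors, query))
# The keyword-named business wins on this one signal, even though the
# domain string itself was never consulted.
```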

Have you tried Googling “sugar bowl” lately? Very misleading first listing.

How does Bing fare in all this??
Is Bing equally scraper-infested??

Should and could Google and Bing et al. create scraper blacklists similar to anti-spam blacklists??

I have not yet experienced this scraping, etc. – are such issues related to how general or specific one’s search terms are??

Thank you, Tom

Crowdsourcing is the answer, IMO. If there’s one thing Google has, it’s a lot of users. Whatever happened to SearchWiki (http://googleblog.blogspot.com/2008/11/searchwiki-make-search-your-own.html)? It had a pleasant user interface, well-integrated with the results. I feel that such a mechanism, with a reputation system of some sort (perhaps subscribing to weighting results as edited by trusted groups of users) could drastically improve search result quality.
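
A toy sketch of what “weighting results as edited by trusted groups of users” could mean in practice (the groups, weights, and votes are all invented):

```python
# Each vote on a search result is scaled by how much the searcher trusts
# the group the voter belongs to.
trust_weights = {"friends": 1.0, "topic-experts": 0.8, "everyone-else": 0.1}

votes = [  # (voter's group, +1 promote / -1 bury) for one result
    ("friends", -1),
    ("topic-experts", -1),
    ("everyone-else", +1),
    ("everyone-else", +1),
]

score = sum(trust_weights[group] * vote for group, vote in votes)
print(score)  # negative: trusted users buried the result despite anonymous upvotes
```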

From Google’s perspective:
How about letting users vote to bury sites that just copy content?

From the web browser’s perspective:
How about making a plugin to preprocess Google results, filtering out sites on a blacklist? (A sketch of this idea follows this comment.)

From a user’s perspective:
Use alternatives to Google (Yahoo, Bing, etc.). The fewer people use Google, the more Google is forced to improve. Google replaced Yahoo, but it can be replaced too if it doesn’t listen to its users.
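
A minimal sketch of that browser-plugin idea. A real extension would filter the result page's DOM; this just shows the core host-matching logic against a personal blacklist (domains are placeholders):

```python
from urllib.parse import urlparse

# Personal blacklist of hosts whose results should be hidden.
BLACKLIST = {"scraper-farm.example.net", "copycat.example.org"}

def filter_results(result_urls: list[str]) -> list[str]:
    """Drop any result whose hostname is blacklisted (or a subdomain of one)."""
    kept = []
    for url in result_urls:
        host = urlparse(url).hostname or ""
        if any(host == bad or host.endswith("." + bad) for bad in BLACKLIST):
            continue
        kept.append(url)
    return kept

results = [
    "https://stackoverflow.com/questions/12345",
    "https://scraper-farm.example.net/questions-12345-copied",
]
print(filter_results(results))  # the scraped copy is dropped
```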

@Scott Willeke, Thanks! I installed that blacklist plugin. I’ve been wanting such an extension for some time: https://chrome.google.com/extensions/detail/ddgjlkmkllmpdhegaliddgplookikmjf

I want to acknowledge Google as an innovative company that almost single-handedly made the world wide web useful. As of last year they’d crawled over 1 trillion unique URLs, an astounding amount of noise to sift through. I admire their engineering ethos and feel their business largely adheres to “don’t be evil”.

That said, there is a real, serious problem with result quality. Google is a victim of its own success. The ecosystem it created is so profitable that Google has to spend an inordinate amount of time (possibly 50% of engineering?) keeping webmasters honest. Pick your metaphor – traders gaming the stock market or bacteria becoming antibiotic-resistant – bad websites are out-evolving Google.

Ranking knowledge has become ubiquitous, and sadly, knowledge of how to game an engine has become more important to content sites than writing valid, expert content. It’s not just the spammers, malware sites, and scraper sites writing worthless keyword-stuffed content and buying links.

Google also made a deal with the McContent devil, Demand Media: http://techcrunch.com/2009/12/13/the-end-of-hand-crafted-content/

Demand buys up search queries and pays writers a paltry sum (dollars) to write poorly researched content on subject areas they often have little to no experience in. Demand makes a few ad dollars per article, with traffic driven exclusively by search (I’ve never met anyone who goes directly to eHow.com to browse). In turn, Google takes a cut of the AdWords dollars. In the short run, Google’s bottom line looks better, especially on YouTube, a site they’ve had trouble monetizing. Demand runs eHow, but you’ll find equally vapid content on Q&A sites like Wikia and Yahoo Answers.

Google needs to respond or their flagship search will suffer. The solution will be complex and multifaceted. In addition to small, incremental changes, I think Google will need to make some seismic ones. Google will face cries of injustice from “content producers” in the gray areas, but they need to stand tough.

I run the SEO program at a large US media organization, NPR. From the beginning, we’ve stayed above board: fixing coding issues, working on syndication, and training our writers on the very basics. We write first for humans. That ensures that Google crawls us adequately, but we do lose traffic to sites that out-SEO us, legitimately or otherwise. My long view is that the current state of search is not sustainable, and any effort we spend beyond the basics comes at the expense of other products we could build.

It’s easy for my organization to take this tack, however, because we’re a well-known brand and we can focus on other channels, such as social media and viral sites. Content producers should think about the tradeoffs they make when going for broke on SEO – it’s impossible to quantify the traffic you don’t get from Facebook/Twitter when you water down your content.